git@vger.kernel.org mailing list mirror (one of many)
* git merge commits are non-deterministic? what changed?
@ 2012-11-09 13:31 Ulrich Spörlein
  2012-11-09 15:04 ` Andreas Schwab
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Spörlein @ 2012-11-09 13:31 UTC
  To: git

Hi all,

I'm running a couple of conversions from SVN to git, using a slightly
hacked version of svn2git (because it can cope with multiple branches
and is several orders of magnitude faster than git-svn).

Anyway, when doing some verification runs, using the same version of
svn2git, but different versions of git, I get different commit hashes,
and I tracked it down to the ordering of the parents inside a merge
commit.

git version 1.7.9.2
% git show --format=raw e209a83|head
commit e209a83c1e0a387c88a44f3a8f2be2670ed85eae
tree de2d7c6726a45428d4a310da2acd8839daf9f85f
parent 5fba0401c23a594e4ad5e807bf14a5439645a358
parent 25062ba061871945759b3baa833fe64969383e40
parent 89bebeef185ed08424fc548f8569081c6add2439
parent c7d5f60d3a7e2e3c4da23b157c62504667344438
parent e7bc108f0d6a394050818a4af64a59094d3c793e
parent 48231afadc40013e6bfda56b04a11ee3a602598f
author rgrimes <rgrimes@FreeBSD.org> 739897097 +0000
committer rgrimes <rgrimes@FreeBSD.org> 739897097 +0000

vs

git version 1.8.0
% git show --format=raw 42f0fad|head
commit 42f0fadccab6eefc7ffdc1012345b42ad45e36c2
tree de2d7c6726a45428d4a310da2acd8839daf9f85f
parent 5fba0401c23a594e4ad5e807bf14a5439645a358
parent 25062ba061871945759b3baa833fe64969383e40
parent 89bebeef185ed08424fc548f8569081c6add2439
parent 48231afadc40013e6bfda56b04a11ee3a602598f
parent c7d5f60d3a7e2e3c4da23b157c62504667344438
parent e7bc108f0d6a394050818a4af64a59094d3c793e
author rgrimes <rgrimes@FreeBSD.org> 739897097 +0000
committer rgrimes <rgrimes@FreeBSD.org> 739897097 +0000

I haven't verified whether that ordering is stable within a git
version, but the fact that it changed across versions clearly means that
I cannot depend on this currently (I have never seen this problem in two
years, so I blame git 1.8.0 ...)

Two questions:
1. Can we impose a stable ordering of the commits being recorded in a
merge commit? Listing parents in chronological order or something like
that.

2. Why the hell is the commit hash dependent on the ordering of the
parent commits? IMHO it should sort the set of parents before
calculating the hash ...

Help?
Uli
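
[Editor's note] One way to confirm that the two commits above differ only
in parent order is to compare the parent lists both as sequences and as
sets. The following is a minimal Python sketch, not part of the original
message; the parents_of helper is a hypothetical name, and the input is
the header text exactly as quoted above.

```python
def parents_of(raw_commit):
    """Return the parent hashes of a raw commit, in recorded order."""
    parents = []
    for line in raw_commit.splitlines():
        if not line.strip():
            break  # a blank line ends the header block
        if line.startswith("parent "):
            parents.append(line.split()[1])
    return parents


# Header lines copied from the two `git show --format=raw` outputs above.
GIT_1_7_9_2 = """\
commit e209a83c1e0a387c88a44f3a8f2be2670ed85eae
tree de2d7c6726a45428d4a310da2acd8839daf9f85f
parent 5fba0401c23a594e4ad5e807bf14a5439645a358
parent 25062ba061871945759b3baa833fe64969383e40
parent 89bebeef185ed08424fc548f8569081c6add2439
parent c7d5f60d3a7e2e3c4da23b157c62504667344438
parent e7bc108f0d6a394050818a4af64a59094d3c793e
parent 48231afadc40013e6bfda56b04a11ee3a602598f
"""

GIT_1_8_0 = """\
commit 42f0fadccab6eefc7ffdc1012345b42ad45e36c2
tree de2d7c6726a45428d4a310da2acd8839daf9f85f
parent 5fba0401c23a594e4ad5e807bf14a5439645a358
parent 25062ba061871945759b3baa833fe64969383e40
parent 89bebeef185ed08424fc548f8569081c6add2439
parent 48231afadc40013e6bfda56b04a11ee3a602598f
parent c7d5f60d3a7e2e3c4da23b157c62504667344438
parent e7bc108f0d6a394050818a4af64a59094d3c793e
"""

old, new = parents_of(GIT_1_7_9_2), parents_of(GIT_1_8_0)
same_set = set(old) == set(new)    # same six parents?
same_order = old == new            # recorded in the same order?
# Index of the first position where the two lists disagree.
first_diff = next(i for i, (a, b) in enumerate(zip(old, new)) if a != b)
print(same_set, same_order, first_diff)  # prints: True False 3
```

The tree line is identical in both commits, so the content is the same;
only the order of the last three parents differs.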


* Re: git merge commits are non-deterministic? what changed?
  2012-11-09 13:31 git merge commits are non-deterministic? what changed? Ulrich Spörlein
@ 2012-11-09 15:04 ` Andreas Schwab
  2012-11-09 15:42   ` Ulrich Spörlein
  0 siblings, 1 reply; 9+ messages in thread
From: Andreas Schwab @ 2012-11-09 15:04 UTC
  To: Ulrich Spörlein; +Cc: git

Ulrich Spörlein <uqs@spoerlein.net> writes:

> Two questions:
> 1. Can we impose a stable ordering of the commits being recorded in a
> merge commit? Listing parents in chronological order or something like
> that.

The order is determined by the order the refs are given to git merge (or
git commit-tree when using the plumbing).

> 2. Why the hell is the commit hash dependent on the ordering of the
> parent commits? IMHO it should sort the set of parents before
> calculating the hash ...

What would be the sort key?

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


* Re: git merge commits are non-deterministic? what changed?
  2012-11-09 15:04 ` Andreas Schwab
@ 2012-11-09 15:42   ` Ulrich Spörlein
  2012-11-09 15:52     ` Matthieu Moy
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Spörlein @ 2012-11-09 15:42 UTC
  To: Andreas Schwab; +Cc: git

On Fri, 2012-11-09 at 16:04:31 +0100, Andreas Schwab wrote:
> Ulrich Spörlein <uqs@spoerlein.net> writes:
> 
> > Two questions:
> > 1. Can we impose a stable ordering of the commits being recorded in a
> > merge commit? Listing parents in chronological order or something like
> > that.
> 
> The order is determined by the order the refs are given to git merge (or
> git commit-tree when using the plumbing).
> 
> > 2. Why the hell is the commit hash dependent on the ordering of the
> > parent commits? IMHO it should sort the set of parents before
> > calculating the hash ...
> 
> What would be the sort key?

Trivially, the hash of the parents itself. So you'd always get

...
parent 0000
parent 1111
parent aaaa
parent ffff

hth
Uli


* Re: git merge commits are non-deterministic? what changed?
  2012-11-09 15:42   ` Ulrich Spörlein
@ 2012-11-09 15:52     ` Matthieu Moy
  2012-11-09 16:16       ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Matthieu Moy @ 2012-11-09 15:52 UTC
  To: Ulrich Spörlein; +Cc: Andreas Schwab, git

Ulrich Spörlein <uqs@spoerlein.net> writes:

>> > 2. Why the hell is the commit hash dependent on the ordering of the
>> > parent commits? IMHO it should sort the set of parents before
>> > calculating the hash ...
>> 
>> What would be the sort key?
>
> Trivially, the hash of the parents itself. So you'd always get
>
> ...
> parent 0000
> parent 1111
> parent aaaa
> parent ffff

That would change the behavior of --first-parent. Or you'd need to
compute the sha1 of the sorted list, but keep the unsorted one in the
commit. Possible, but weird ;-).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
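
[Editor's note] The effect on --first-parent can be seen with the parent
list from the first message in this thread. This is a small Python sketch
added for illustration; the hashes are abbreviated to seven characters,
and the variable names are hypothetical.

```python
# Parents of the git 1.7.9.2 merge commit, abbreviated, in recorded order.
recorded = ["5fba040", "25062ba", "89bebee", "c7d5f60", "e7bc108", "48231af"]

# --first-parent follows the first entry of the recorded list.
first_parent_recorded = recorded[0]

# If git sorted parents by hash, a different commit would come first.
first_parent_if_sorted = sorted(recorded)[0]

print(first_parent_recorded, first_parent_if_sorted)  # prints: 5fba040 25062ba
```

Sorting would therefore send a first-parent walk down a different line of
history for this commit.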


* Re: git merge commits are non-deterministic? what changed?
  2012-11-09 15:52     ` Matthieu Moy
@ 2012-11-09 16:16       ` Jeff King
  2012-11-09 18:27         ` Ulrich Spörlein
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2012-11-09 16:16 UTC
  To: Matthieu Moy; +Cc: Ulrich Spörlein, Andreas Schwab, git

On Fri, Nov 09, 2012 at 04:52:48PM +0100, Matthieu Moy wrote:

> Ulrich Spörlein <uqs@spoerlein.net> writes:
> 
> >> > 2. Why the hell is the commit hash dependent on the ordering of the
> >> > parent commits? IMHO it should sort the set of parents before
> >> > calculating the hash ...
> >> 
> >> What would be the sort key?
> >
> > Trivially, the hash of the parents itself. So you'd always get
> >
> > ...
> > parent 0000
> > parent 1111
> > parent aaaa
> > parent ffff
> 
> That would change the behavior of --first-parent. Or you'd need to
> compute the sha1 of the sorted list, but keep the unsorted one in the
> commit. Possible, but weird ;-).

Right. The reason that merge parents are stored in the order given on
the command line is not random or because it was not considered. It
encodes a valuable piece of information: did the user merge "foo" into
"bar", or did they merge "bar" into "foo"?

So I think this discussion is going in the wrong direction; git should
never sort the parents, because the order is meaningful. The original
complaint was that a run of svn2git produced different results on two
different git versions. The important question to me is: did svn2git
feed the parents to git in the same order?

If it did, and git produced different results, then that is a serious
bug.

If it did not, then the issue needs to be resolved in svn2git (which
_may_ want to sort the parents that it feeds to git, but it would depend
on whether the order it is currently presenting is meaningful).

-Peff
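
[Editor's note] Why the hash follows the parent order: a commit id is the
SHA-1 of the header "commit <size>\0" followed by the commit text, and the
parent lines are part of that text in the order given. The Python sketch
below was added for illustration. The commit_id helper is a hypothetical
name, and the commit message is a placeholder, so the resulting ids will
not match the ones quoted earlier in the thread.

```python
import hashlib


def commit_id(tree, parents, author, committer, message):
    """Hash a commit the way git does: sha1("commit <size>\\0" + text)."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]  # order is preserved as given
    lines += ["author " + author, "committer " + committer, "", message]
    text = "\n".join(lines).encode("utf-8")
    header = b"commit " + str(len(text)).encode("ascii") + b"\0"
    return hashlib.sha1(header + text).hexdigest()


TREE = "de2d7c6726a45428d4a310da2acd8839daf9f85f"
IDENT = "rgrimes <rgrimes@FreeBSD.org> 739897097 +0000"
P1 = "5fba0401c23a594e4ad5e807bf14a5439645a358"
P2 = "25062ba061871945759b3baa833fe64969383e40"
MESSAGE = "placeholder message\n"  # assumption: the real message is unknown

id_a = commit_id(TREE, [P1, P2], IDENT, IDENT, MESSAGE)
id_b = commit_id(TREE, [P2, P1], IDENT, IDENT, MESSAGE)
print(id_a != id_b)  # same tree and same parents, but a different id
```

Feeding the same parents in the same order always yields the same id, so a
converter that orders its parents deterministically gets repeatable hashes.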


* Re: git merge commits are non-deterministic? what changed?
  2012-11-09 16:16       ` Jeff King
@ 2012-11-09 18:27         ` Ulrich Spörlein
  2012-11-12 11:27           ` Michael J Gruber
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Spörlein @ 2012-11-09 18:27 UTC
  To: Jeff King; +Cc: Matthieu Moy, Andreas Schwab, git

On Fri, 2012-11-09 at 11:16:47 -0500, Jeff King wrote:
> On Fri, Nov 09, 2012 at 04:52:48PM +0100, Matthieu Moy wrote:
> 
> > Ulrich Spörlein <uqs@spoerlein.net> writes:
> > 
> > >> > 2. Why the hell is the commit hash dependent on the ordering of the
> > >> > parent commits? IMHO it should sort the set of parents before
> > >> > calculating the hash ...
> > >> 
> > >> What would be the sort key?
> > >
> > > Trivially, the hash of the parents itself. So you'd always get
> > >
> > > ...
> > > parent 0000
> > > parent 1111
> > > parent aaaa
> > > parent ffff
> > 
> > That would change the behavior of --first-parent. Or you'd need to
> > compute the sha1 of the sorted list, but keep the unsorted one in the
> > commit. Possible, but weird ;-).
> 
> Right. The reason that merge parents are stored in the order given on
> the command line is not random or because it was not considered. It
> encodes a valuable piece of information: did the user merge "foo" into
> "bar", or did they merge "bar" into "foo"?
> 
> So I think this discussion is going in the wrong direction; git should
> never sort the parents, because the order is meaningful. The original
> complaint was that a run of svn2git produced different results on two
> different git versions. The important question to me is: did svn2git
> feed the parents to git in the same order?
> 
> If it did, and git produced different results, then that is a serious
> bug.
> 
> If it did not, then the issue needs to be resolved in svn2git (which
> _may_ want to sort the parents that it feeds to git, but it would depend
> on whether the order it is currently presenting is meaningful).

Yeah, thanks, looks like I have some more work to do. I don't quite get
how it could come up with a different order, seeing that it is using svn
as the base.

Will run some more experiments, thanks for the info so far.

Cheers,
Uli


* Re: git merge commits are non-deterministic? what changed?
  2012-11-09 18:27         ` Ulrich Spörlein
@ 2012-11-12 11:27           ` Michael J Gruber
  2012-11-20 16:22             ` Ulrich Spörlein
  0 siblings, 1 reply; 9+ messages in thread
From: Michael J Gruber @ 2012-11-12 11:27 UTC
  To: Ulrich Spörlein; +Cc: Jeff King, Matthieu Moy, Andreas Schwab, git

Ulrich Spörlein venit, vidit, dixit 09.11.2012 19:27:
> On Fri, 2012-11-09 at 11:16:47 -0500, Jeff King wrote:
>> On Fri, Nov 09, 2012 at 04:52:48PM +0100, Matthieu Moy wrote:
>>
>>> Ulrich Spörlein <uqs@spoerlein.net> writes:
>>>
>>>>>> 2. Why the hell is the commit hash dependent on the ordering of the
>>>>>> parent commits? IMHO it should sort the set of parents before
>>>>>> calculating the hash ...
>>>>>
>>>>> What would be the sort key?
>>>>
>>>> Trivially, the hash of the parents itself. So you'd always get
>>>>
>>>> ...
>>>> parent 0000
>>>> parent 1111
>>>> parent aaaa
>>>> parent ffff
>>>
>>> That would change the behavior of --first-parent. Or you'd need to
>>> compute the sha1 of the sorted list, but keep the unsorted one in the
>>> commit. Possible, but weird ;-).
>>
>> Right. The reason that merge parents are stored in the order given on
>> the command line is not random or because it was not considered. It
>> encodes a valuable piece of information: did the user merge "foo" into
>> "bar", or did they merge "bar" into "foo"?
>>
>> So I think this discussion is going in the wrong direction; git should
>> never sort the parents, because the order is meaningful. The original
>> complaint was that a run of svn2git produced different results on two
>> different git versions. The important question to me is: did svn2git
>> feed the parents to git in the same order?
>>
>> If it did, and git produced different results, then that is a serious
>> bug.
>>
>> If it did not, then the issue needs to be resolved in svn2git (which
>> _may_ want to sort the parents that it feeds to git, but it would depend
>> on whether the order it is currently presenting is meaningful).
> 
> Yeah, thanks, looks like I have some more work to do. I don't quite get
> how it could come up with a different order, seeing that it is using svn
> as the base.
> 
> Will run some more experiments, thanks for the info so far.

There was a change in the order in which "git cherry-pick A B C" applies
the commits. It's the only ordering-affecting change in 1.8.0 that I can
think of right now.

Michael


* Re: git merge commits are non-deterministic? what changed?
  2012-11-12 11:27           ` Michael J Gruber
@ 2012-11-20 16:22             ` Ulrich Spörlein
  2012-11-20 20:39               ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Ulrich Spörlein @ 2012-11-20 16:22 UTC
  To: Michael J Gruber; +Cc: Jeff King, Matthieu Moy, Andreas Schwab, git

On Mon, 2012-11-12 at 12:27:31 +0100, Michael J Gruber wrote:
> Ulrich Spörlein venit, vidit, dixit 09.11.2012 19:27:
> > On Fri, 2012-11-09 at 11:16:47 -0500, Jeff King wrote:
> >> On Fri, Nov 09, 2012 at 04:52:48PM +0100, Matthieu Moy wrote:
> >>
> >>> Ulrich Spörlein <uqs@spoerlein.net> writes:
> >>>
> >>>>>> 2. Why the hell is the commit hash dependent on the ordering of the
> >>>>>> parent commits? IMHO it should sort the set of parents before
> >>>>>> calculating the hash ...
> >>>>>
> >>>>> What would be the sort key?
> >>>>
> >>>> Trivially, the hash of the parents itself. So you'd always get
> >>>>
> >>>> ...
> >>>> parent 0000
> >>>> parent 1111
> >>>> parent aaaa
> >>>> parent ffff
> >>>
> >>> That would change the behavior of --first-parent. Or you'd need to
> >>> compute the sha1 of the sorted list, but keep the unsorted one in the
> >>> commit. Possible, but weird ;-).
> >>
> >> Right. The reason that merge parents are stored in the order given on
> >> the command line is not random or because it was not considered. It
> >> encodes a valuable piece of information: did the user merge "foo" into
> >> "bar", or did they merge "bar" into "foo"?
> >>
> >> So I think this discussion is going in the wrong direction; git should
> >> never sort the parents, because the order is meaningful. The original
> >> complaint was that a run of svn2git produced different results on two
> >> different git versions. The important question to me is: did svn2git
> >> feed the parents to git in the same order?
> >>
> >> If it did, and git produced different results, then that is a serious
> >> bug.
> >>
> >> If it did not, then the issue needs to be resolved in svn2git (which
> >> _may_ want to sort the parents that it feeds to git, but it would depend
> >> on whether the order it is currently presenting is meaningful).
> > 
> > Yeah, thanks, looks like I have some more work to do. I don't quite get
> > how it could come up with a different order, seeing that it is using svn
> > as the base.
> > 
> > Will run some more experiments, thanks for the info so far.
> 
> There was a change in the order in which "git cherry-pick A B C" applies
> > the commits. It's the only ordering-affecting change in 1.8.0 that I can
> think of right now.

Just to wrap this up: it was, of course, a "feature" of the converter
that resulted in this unrepeatable behavior. The SVN API makes use of
apr_hashes, which were traversed in arbitrary order, hence SVN commits
spanning multiple git branches would be handled in a non-deterministic
order, leading to randomly ordered parent objects for later git merge
commits.

It is still debatable whether a merge commit should have a
list-of-parents or a set-of-parents. Changing it to a set-of-parents
(with a well-defined hash function) would have made this problem go
away.

But this will never be changed, it would break the fundamental git
storage model as it is in place now.

Cheers,
Uli
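
[Editor's note] The general remedy for this class of problem is to impose
an explicit order before iterating over a hash-based container. The Python
sketch below was added for illustration and is not svn2git code. The
function and branch names are hypothetical, and it assumes that the branch
being merged into should stay first while the remaining parents carry no
meaningful order.

```python
def ordered_parents(tips, target, touched):
    """Build a repeatable parent list for a merge commit.

    tips:    dict mapping branch name -> tip commit id
    target:  branch receiving the merge (kept as first parent)
    touched: set of branch names involved; set iteration order is arbitrary
    """
    # Sort the other branches by name so the result no longer depends on
    # the iteration order of the hash-based container.
    others = sorted(name for name in touched if name != target)
    return [tips[target]] + [tips[name] for name in others]


tips = {"master": "m1", "vendor/a": "a1", "vendor/b": "b1"}
run_1 = ordered_parents(tips, "master", {"vendor/b", "master", "vendor/a"})
run_2 = ordered_parents(tips, "master", {"vendor/a", "vendor/b", "master"})
print(run_1 == run_2, run_1)  # prints: True ['m1', 'a1', 'b1']
```

Whether sorting by name is acceptable depends on whether the original order
carried meaning, as noted earlier in the thread.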


* Re: git merge commits are non-deterministic? what changed?
  2012-11-20 16:22             ` Ulrich Spörlein
@ 2012-11-20 20:39               ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2012-11-20 20:39 UTC
  To: Ulrich Spörlein
  Cc: Michael J Gruber, Jeff King, Matthieu Moy, Andreas Schwab, git

Ulrich Spörlein <uqs@spoerlein.net> writes:

> But this will never be changed, it would break the fundamental git
> storage model as it is in place now.

It doesn't just break "storage model", but more importantly, it
breaks the semantics.

Imagine that things started breaking after merging your topic branch
'foo' to the integration branch 'master', and how people would
perceive the situation.  Everybody would say your topic 'foo' broke
the build.  Nobody except you would say, even if the tip of your
topic 'foo' alone works perfectly, merging the 'master' to your
topic 'foo' broke that topic.  The topic should have been adjusted
to the updated baseline, that is the 'master' branch before this
merge since your topic 'foo' forked off of it, before or during the
merge.

To express what was merged into what, the order of parents in the
commit is fundamentally a part of what a commit is.
