git@vger.kernel.org mailing list mirror (one of many)
* The criss-cross merge case
@ 2005-04-27 20:25 Bram Cohen
  2005-04-27 23:32 ` Daniel Barkalow
  0 siblings, 1 reply; 10+ messages in thread
From: Bram Cohen @ 2005-04-27 20:25 UTC
  To: git

Here's an example of where simple three-way merge can't do the right
thing. Each letter represents a snapshot of the history, and time goes
downwards. The numbers after some letters refer to which line number was
modified at that time.


A
|\
| \
|  \
|   \
|    \
|     \
|      \
B8      C3
|\     /|
| \   / |
|  \ /  |
|   X   |
|  / \  |
| /   \ |
|/     \|
D8      E3
 \      |
  \     |
   \    |
    \   |
     \  |
      \ |
       \|
        ?

In this case the ? should have a clean merge with the D version of line 8
(because it was made with the B version already in the history) and the E
version of line 3 (because it was made with the C version already in the
history).

The problem is that there's no single ancestor for the three-way merge
which does the right thing. If one picks B, then there will be an
unnecessary merge conflict at line 3, because D will have the C version
and E will have the E version but B will have neither. Likewise if one
picks C, there will be an unnecessary conflict at line 8 because D will
have the D version and E will have the B version but C will have neither.
Picking A will cause unnecessary conflicts on *both* lines.
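The point about there being no single good ancestor can be made concrete. Here is a small sketch (my own illustrative code, not git's internals) that models the diagram as a DAG and computes the set of "best" common ancestors of D and E, i.e. common ancestors not dominated by another common ancestor; it comes back with both B and C:

```python
# Model the snapshots in the diagram above as a DAG. The parents dict
# and function names are illustrative, not any tool's data structures.
parents = {
    "A": [],
    "B": ["A"], "C": ["A"],   # two branches off A
    "D": ["B", "C"],          # D is B's line with C merged in
    "E": ["C", "B"],          # E is C's line with B merged in
}

def ancestors(node):
    """Every ancestor of node, including node itself."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def merge_bases(x, y):
    """Common ancestors of x and y not dominated by another one."""
    common = ancestors(x) & ancestors(y)
    return {n for n in common
            if not any(n != m and n in ancestors(m) for m in common)}

print(sorted(merge_bases("D", "E")))  # → ['B', 'C']: no single best ancestor
```

A three-way merge has to pick just one of those two, and whichever it picks produces the spurious conflicts described below.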

The problem can actually be much worse than a simple unnecessary conflict,
because if the later updates were strict undos of the earlier updates,
then picking either B or C will merge something *wrong*. Using A as the
ancestor will keep that from happening, but it also maximizes unnecessary
conflicts.

Note that the above criss-cross case only involves two branches, using the
methodology of each one modifying their own section and pulling in old
versions of the other one from time to time. Cogito's interface encourages
exactly this workflow, which is not a bad thing from a workflow
perspective, but does make it hit this case regularly.

The way Git handles this currently is very bad, because it forces the
common ancestor to be from the same snapshot across all files, so this
problem will happen if the modifications are made even in different files,
not just different lines within the same file. That could be improved
greatly by finding an LCA for each file individually, which is what
Monotone does. Darcs, Codeville, and all the Arch descendants have better
merge algorithms which don't have to pick a single common ancestor.

-Bram


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: The criss-cross merge case
  2005-04-27 20:25 Bram Cohen
@ 2005-04-27 23:32 ` Daniel Barkalow
  2005-04-28  0:43   ` Tupshin Harper
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Barkalow @ 2005-04-27 23:32 UTC
  To: Bram Cohen; +Cc: git

On Wed, 27 Apr 2005, Bram Cohen wrote:

> The way Git handles this currently is very bad, because it forces the
> common ancestor to be from the same snapshot across all files, so this
> problem will happen if the modifications are made even in different files,
> not just different lines within the same file. That could be improved
> greatly by finding an LCA for each file individually, which is what
> Monotone does.

The git core is perfectly sufficient for getting all LCAs or
per-file best LCAs; merge-base doesn't bother, currently, because the
deficiencies of "merge" (a.k.a. diff3) are worse than the issues with
choosing a suboptimal LCA.

My plan is to implement multi-file diff and merge with a suffix tree-based
algorithm, and then revisit the history stuff once we have a merger that
can do sensible things with this information.

Note that the present very bad merger actually seems to be sufficient
for the Linux kernel, where patches from different sides of a merge are
generally either unrelated or are identical, and, otherwise, they tend to
be true conflicts where people fixed the same bug independently in
different ways.

> Darcs, Codeville, and all the Arch descendants have better merge
> algorithms which don't have to pick a single common ancestor.

I've been looking at Darcs (which seems to have a good method, although I
think the underlying diff isn't great), and Codeville still doesn't have
any documentation. Arch's method is strictly weaker than 3-way merge, and
generates more rejects (not even conflicts) in my experience than even
CVS. 

	-Daniel
*This .sig left intentionally blank*



* Re: The criss-cross merge case
  2005-04-27 23:32 ` Daniel Barkalow
@ 2005-04-28  0:43   ` Tupshin Harper
  2005-04-28  1:16     ` Daniel Barkalow
  0 siblings, 1 reply; 10+ messages in thread
From: Tupshin Harper @ 2005-04-28  0:43 UTC
  To: Daniel Barkalow; +Cc: Bram Cohen, git

Daniel Barkalow wrote:

>I've been looking at Darcs (which seems to have a good method, although I
>think the underlying diff isn't great), and Codeville still doesn't have
>any documentation. Arch's method is strictly weaker than 3-way merge, and
>generates more rejects (not even conflicts) in my experience than even
>CVS. 
>
>	-Daniel
>  
>
Can you clarify what you mean by darcs' underlying diff not being that
great? It seems to function pretty much identically to gnu diff. In what
way would you want the underlying diff to be improved?

-Tupshin


* Re: The criss-cross merge case
  2005-04-28  0:43   ` Tupshin Harper
@ 2005-04-28  1:16     ` Daniel Barkalow
  2005-04-28  2:15       ` Benedikt Schmidt
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Barkalow @ 2005-04-28  1:16 UTC
  To: Tupshin Harper; +Cc: Bram Cohen, git

On Wed, 27 Apr 2005, Tupshin Harper wrote:

> Can you clarify what you mean by darcs' underlying diff not being that
> great? It seems to function pretty much identically to gnu diff. In what
> way would you want the underlying diff to be improved?

GNU diff uses an algorithm which is tuned to handle finding the shortest
diff among a large set of similar-length alternatives while comparing
files which have a lot of repeated lines. The author of the paper it cites
is really thinking about diffing DNA sequences or similar things. It also
can't detect content moves, which are a common thing to have, and which
will be important in the long run, when we're trying to track
modifications to content which also moved from place to place.

	-Daniel
*This .sig left intentionally blank*



* Re: The criss-cross merge case
  2005-04-28  1:16     ` Daniel Barkalow
@ 2005-04-28  2:15       ` Benedikt Schmidt
  2005-04-28  2:19         ` Daniel Barkalow
  0 siblings, 1 reply; 10+ messages in thread
From: Benedikt Schmidt @ 2005-04-28  2:15 UTC
  To: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> On Wed, 27 Apr 2005, Tupshin Harper wrote:
>
>> Can you clarify what you mean by darcs' underlying diff not being that
>> great? It seems to function pretty much identically to gnu diff. In what
>> way would you want the underlying diff to be improved?
>
> GNU diff uses an algorithm which is tuned to handle finding the shortest
> diff among a large set of similar-length alternatives while comparing
> files which have a lot of repeated lines. The author of the paper it cites
> is really thinking about diffing DNA sequences or similar things.

AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
to an earlier paper by the same author titled
"A File Comparison Program" - Miller, Myers - 1985.

Can you be more specific why the algorithm is a bad choice (performance,
quality of diff output)?

> It also can't detect content moves, which are a common thing to have, and
> which will be important in the long run, when we're trying to track
> modifications to content which also moved from place to place.

Ok, darcs doesn't handle block moves, so there is no need for an algorithm that
supports them (yet). Is there any free SCM that has support for block moves at
the moment? It seems like clearcase detects them, but I don't know where it
takes advantage of it.

Benedikt

[1] http://citeseer.ist.psu.edu/myers86ond.html



* Re: The criss-cross merge case
  2005-04-28  2:15       ` Benedikt Schmidt
@ 2005-04-28  2:19         ` Daniel Barkalow
  2005-04-28 11:16           ` David Roundy
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Barkalow @ 2005-04-28  2:19 UTC
  To: Benedikt Schmidt; +Cc: git

On Thu, 28 Apr 2005, Benedikt Schmidt wrote:

> AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
> to an earlier paper by the same author titled
> "A File Comparison Program" - Miller, Myers - 1985.

GNU diff is based on a better algorithm than traditional diff, reportedly,
but there are better algorithms still, developed since, at least according
to a brief literature search on Google Scholar. (bdiff and vdelta, for
example, which can identify block moves as well.)

> Can you be more specific why the algorithm is a bad choice (performance,
> quality of diff output)?

I suspect that the speed is suboptimal (for the cases under which it is
actually used). The quality of the output is about ideal, lacking a
representation for block moves, but I'm hoping to have a diff/merge set
that handles block moves effectively, even if it can't report them in diff
format. I'm also hoping for an annotate function that could use block
moves.

> Ok, darcs doesn't handle block moves, so there is no need for an algorithm that
> supports them (yet). Is there any free SCM that has support for block moves at
> the moment? It seems like clearcase detects them, but I don't know where it
> takes advantage of it.

I would think that darcs would be able to do neat things in its merger if
it knew about block moves. Obviously, it only makes sense to add support
for identifying them and using them at the same time.
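A crude way of "identifying them" can be sketched on top of an ordinary line diff: the diff reports a moved block as a delete in one place plus an insert in another, so a post-pass can pair those up. This is only a naive illustration using Python's difflib; the function name and the two-line minimum are my own choices, not any SCM's actual algorithm.

```python
from difflib import SequenceMatcher

def find_block_moves(old, new, min_lines=2):
    """Pair up identical delete/insert runs as candidate block moves."""
    sm = SequenceMatcher(None, old, new)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in sm.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append(tuple(old[i1:i2]))
        if tag in ("insert", "replace"):
            inserted.append(tuple(new[j1:j2]))
    # a "move" here: the same run of lines deleted once and inserted once
    return [list(block) for block in deleted
            if block in inserted and len(block) >= min_lines]

old = ["a = 1", "b = 2", "def util():", "    return 3", "print(a)"]
new = ["def util():", "    return 3", "a = 1", "b = 2", "print(a)"]
print(find_block_moves(old, new))  # → [['def util():', '    return 3']]
```

A real merger would of course need to handle blocks that moved *and* changed, which this pairing trick cannot see.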

	-Daniel
*This .sig left intentionally blank*



* Re: The criss-cross merge case
  2005-04-28  2:19         ` Daniel Barkalow
@ 2005-04-28 11:16           ` David Roundy
  0 siblings, 0 replies; 10+ messages in thread
From: David Roundy @ 2005-04-28 11:16 UTC
  To: git

On Wed, Apr 27, 2005 at 10:19:17PM -0400, Daniel Barkalow wrote:
> On Thu, 28 Apr 2005, Benedikt Schmidt wrote:
> > Ok, darcs doesn't handle block moves, so there is no need for an
> > algorithm that supports them (yet). Is there any free SCM that has
> > support for block moves at the moment? It seems like clearcase detects
> > them, but I don't know where it takes advantage of it.
> 
> I would think that darcs would be able to do neat things in its merger if
> it knew about block moves. Obviously, it only makes sense to add support
> for identifying them and using them at the same time.

Indeed, handling block moves would definitely be *very* nice.  An ancient
version of darcs actually did this (it's not in the current darcs history,
since it was so ancient and buggy), although it had a terrible diff
algorithm.  But I really didn't understand the theory back then, and when I
rewrote everything, I never added the block moves back in.  They complicate
conflict situations a bit, and once I found that darcs was actually
useable, I started focusing on other issues.  (Most recently efficiency
issues.)
-- 
David Roundy
http://www.darcs.net


* Re: The criss-cross merge case
@ 2005-04-28 14:25 Adam J. Richter
  2005-04-29 12:19 ` Wayne Scott
  0 siblings, 1 reply; 10+ messages in thread
From: Adam J. Richter @ 2005-04-28 14:25 UTC
  To: ry102; +Cc: barkalow, bram, droundry, git, tupshin

On 2005-04-28, Benedikt Schmidt wrote:
>AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
>to an earlier paper by the same author titled
>"A File Comparison Program" - Miller, Myers - 1985.
[...]
>[1] http://citeseer.ist.psu.edu/myers86ond.html

	Monotone apparently uses a further acceleration of that algorithm
from the 1989 paper, also co-authored by Myers, "An O(NP) Sequence
Comparison Algorithm" by Sun Wu, Udi Manber, and Gene Myers.
http://www.eecs.berkeley.edu/~gene/Papers/np_diff.pdf .  The Monotone
implementation was apparently a port of an implementation originally
written in Scheme by Aubrey Jaffer.

	I don't fully understand the 1989 paper, but I get the
general impression that it is a small change to the previous algorithm
(the one in GNU diff) that might be a 30 line patch if someone
got around to submitting it, and seems to make the code run more
than twice as fast in practice.  One of these days, I will probably get
around to coding up a patch to GNU diff if nobody beats me to it.

	Making diff run faster may have at least one potentially useful
benefit for merging.  A faster diff makes it more practical to run diff
on smaller units of comparison.  I posted a note here before about
converting the input files to diff3 to have just one character per
line, and then undoing that transformation of the result to produce
a character based merge that seemed to work pretty well in the
couple of tests that I tried.
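For anyone who wants to try it, that transformation might be sketched like this (illustrative Python; the NL placeholder for literal newlines and the file names in the comment are assumptions of mine, and conflict markers in diff3's output would need extra care when imploding):

```python
# Give diff3 one character per line, then invert the transform on the
# merged result. A robust version would need a real escaping scheme.
NL = "\\n"   # stands in for a literal newline in the exploded form

def explode(text):
    """Rewrite text so each character sits on its own line."""
    return "".join((NL if c == "\n" else c) + "\n" for c in text)

def implode(exploded):
    """Invert explode() on the (merged) one-char-per-line text."""
    return "".join("\n" if line == NL else line
                   for line in exploded.split("\n"))

s = "int x;\nint y;\n"
assert implode(explode(s)) == s   # round-trips cleanly

# In between, one would run something like:
#   diff3 -m mine.chars base.chars yours.chars > merged.chars
# and implode() the result to obtain the character-based merge.
```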

                    __     ______________ 
Adam J. Richter        \ /
adam@yggdrasil.com      | g g d r a s i l


* Re: The criss-cross merge case
  2005-04-28 14:25 Adam J. Richter
@ 2005-04-29 12:19 ` Wayne Scott
  0 siblings, 0 replies; 10+ messages in thread
From: Wayne Scott @ 2005-04-29 12:19 UTC
  To: Adam J. Richter; +Cc: ry102, barkalow, bram, droundry, git, tupshin

On 4/28/05, Adam J. Richter <adam@yggdrasil.com> wrote:
> On 2005-04-28, Benedikt Schmidt wrote:
> >AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
> >to an earlier paper by the same author titled
> >"A File Comparison Program" - Miller, Myers - 1985.
> [...]
> >[1] http://citeseer.ist.psu.edu/myers86ond.html
> 
>         Monotone apparently uses a further acceleration of that algorithm
> from the 1989 paper, also co-authored by Myers, "An O(NP) Sequence
> Comparison Algorithm" by Sun Wu, Udi Manber, and Gene Myers.
> http://www.eecs.berkeley.edu/~gene/Papers/np_diff.pdf .  The Monotone
> implementation was apparently a port of an implementation originally
> written in Scheme by Aubrey Jaffer.
> 
>         I don't fully understand the 1989 paper, but I get the
> general impression that it is a small change to the previous algorithm
> (the one in GNU diff) that might be a 30 line patch if someone
> got around to submitting it, and seems to make the code run more
> than twice as fast in practice.  One of these days, I will probably get
> around to coding up a patch to GNU diff if nobody beats me to it.
> 
>         Making diff run faster may have at least one potentially useful
> benefit for merging.  A faster diff makes it more practical to run diff
> on smaller units of comparison.  I posted a note here before about
> converting the input files to diff3 to have just one character per
> line, and then undoing that transformation of the result to produce
> a character based merge that seemed to work pretty well in the
> couple of tests that I tried.

I just read that paper and unless I am mistaken, it already describes
the basis for how GNU diff works.  I don't think anything in that
paper would make it faster.

I also don't find anything to suggest the Monotone guys have rewritten
diff.  Just some notes from graydon noting that Python's difflib uses a
non-optimal diff that is faster in some cases.

-Wayne


* Re: The criss-cross merge case
@ 2005-04-30 12:32 Adam J. Richter
  0 siblings, 0 replies; 10+ messages in thread
From: Adam J. Richter @ 2005-04-30 12:32 UTC
  To: wsc9tt; +Cc: barkalow, bram, droundry, git, ry102, tupshin

On Fri, 29 Apr 2005 07:19:18 -0500, Wayne Scott wrote:
>On 4/28/05, Adam J. Richter <adam@yggdrasil.com> wrote:
>> On 2005-04-28, Benedikt Schmidt wrote:
>> >AFAIK the paper mentioned in the GNU diff sources [1] is an improvement
>> >to an earlier paper by the same author titled
>> >"A File Comparison Program" - Miller, Myers - 1985.
>> [...]
>> >[1] http://citeseer.ist.psu.edu/myers86ond.html
>> 
>>         Monotone apparently uses a further acceleration of that algorithm
>> from the 1989 paper, also co-authored by Myers, "An O(NP) Sequence
>> Comparison Algorithm" by Sun Wu, Udi Manber, and Gene Myers.
>> http://www.eecs.berkeley.edu/~gene/Papers/np_diff.pdf .  The Monotone
>> implementation was apparently a port of an implementation originally
>> written in Scheme by Aubrey Jaffer.
>> 
>>         I don't fully understand the 1989 paper, but I get the
>> general impression that it is a small change to the previous algorithm
>> (the one in GNU diff) that might be a 30 line patch if someone
>> got around to submitting it, and seems to make the code run more
>> than twice as fast in practice.  One of these days, I will probably get
>> around to coding up a patch to GNU diff if nobody beats me to it.
>> 
>>         Making diff run faster may have at least one potentially useful
>> benefit for merging.  A faster diff makes it more practical to run diff
>> on smaller units of comparison.  I posted a note here before about
>> converting the input files to diff3 to have just one character per
>> line, and then undoing that transformation of the result to produce
>> a character based merge that seemed to work pretty well in the
>> couple of tests that I tried.

>I just read that paper and unless I am mistaken, it already describes
>the basis for how GNU diff works.  I don't think anything in that
>paper would make it faster.
>
>I also don't find anything to suggest the Monotone guys have rewritten
>diff.  Just some notes from graydon noting that Python's difflib uses a
>non-optimal diff that is faster in some cases.

	In terminology that can only be understood by reading
the 1985 paper, the 1989 paper describes a possible reduction
in the number of diagonals in the edit graph that iterations of the
1989 algorithm have to consider.  I say "possible reduction" because
the reduction can be zero in the worst case, although I get the
impression that it should be a reduction of 50% or better
typically, and it makes the case where the change is just
a bunch of inserts run in linear time.
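For reference, the 1989 O(NP) algorithm is compact enough to sketch. The code below is my straightforward transcription of the paper's pseudocode, returning the insert/delete edit distance D = delta + 2P; it is not GNU diff's or monotone's actual code.

```python
def onp_distance(a, b):
    """Wu-Manber-Myers O(NP) insert/delete edit distance."""
    if len(a) > len(b):          # the paper assumes len(a) <= len(b)
        a, b = b, a
    n, m = len(a), len(b)
    delta = m - n
    offset = n + 1               # shift diagonal index k so it is >= 0
    fp = [-1] * (n + m + 3)      # furthest point reached on each diagonal

    def snake(k, y):
        # slide along diagonal k as long as the sequences agree
        x = y - k
        while x < n and y < m and a[x] == b[y]:
            x += 1
            y += 1
        return y

    p = -1
    while True:
        p += 1
        for k in range(-p, delta):
            fp[k + offset] = snake(k, max(fp[k - 1 + offset] + 1,
                                          fp[k + 1 + offset]))
        for k in range(delta + p, delta, -1):
            fp[k + offset] = snake(k, max(fp[k - 1 + offset] + 1,
                                          fp[k + 1 + offset]))
        fp[delta + offset] = snake(delta, max(fp[delta - 1 + offset] + 1,
                                              fp[delta + 1 + offset]))
        if fp[delta + offset] >= m:
            return delta + 2 * p  # distance counting only inserts/deletes
```

When the change is pure inserts the loop finishes at p == 0, which is the linear-time case mentioned above.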

	I believe that the longest common subsequence finder
at the core of GNU diff does not currently perform this optimization,
but the one in monotone-0.18/lcs.{cc,hh} does.

                    __     ______________
Adam J. Richter        \ /
adam@yggdrasil.com      | g g d r a s i l

