git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git merge -s subtree seems to be broken.
@ 2018-07-31 14:09 George Shammas
  2018-07-31 15:03 ` George Shammas
  0 siblings, 1 reply; 17+ messages in thread
From: George Shammas @ 2018-07-31 14:09 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 871 bytes --]

At work, we recently updated from a massively old version of git (1.7.10)
to 2.18. There are a few code bases that use subtrees, and they seem to
have completely broke when trying to merge in updates.

I have confirmed that it works correctly in 1.7.10.  The 2.18 behavior is
clearly incorrect.

git init
echo init > test
git add test
git commit -m init

git remote add tig https://github.com/jonas/tig.git
git fetch tig
git merge -s ours --no-commit --allow-unrelated-histories tig-2.3.0
git read-tree --prefix=src/ -u tig-2.3.0
git commit -m "Get upstream tig-2.3.0"
# Notice how the history are merged, and that the source from the upstream
repo is in src

echo update > test
git commit -a -m "test"

git merge -s subtree tig-2.4.0
# Boom, in 2.18 instead of merging into the subtree, it just deletes
everything in the repository, which is clearly the wrong behavior.

[-- Attachment #2: Type: text/html, Size: 1212 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 14:09 git merge -s subtree seems to be broken George Shammas
@ 2018-07-31 15:03 ` George Shammas
  2018-07-31 15:50   ` Jeff King
  2018-07-31 15:53   ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: George Shammas @ 2018-07-31 15:03 UTC (permalink / raw)
  To: git

[-- Attachment #1: Type: text/plain, Size: 1204 bytes --]

Bisecting around, this might be the commit that introduced the breakage.

https://github.com/git/git/commit/d8febde

I really hope that it hasn't been broken for 5 years and I am just doing
something wrong.

On Tue, Jul 31, 2018 at 10:09 AM George Shammas <georgyo@gmail.com> wrote:

> At work, we recently updated from a massively old version of git (1.7.10)
> to 2.18. There are a few code bases that use subtrees, and they seem to
> have completely broke when trying to merge in updates.
>
> I have confirmed that it works correctly in 1.7.10.  The 2.18 behavior is
> clearly incorrect.
>
> git init
> echo init > test
> git add test
> git commit -m init
>
> git remote add tig https://github.com/jonas/tig.git
> git fetch tig
> git merge -s ours --no-commit --allow-unrelated-histories tig-2.3.0
> git read-tree --prefix=src/ -u tig-2.3.0
> git commit -m "Get upstream tig-2.3.0"
> # Notice how the history are merged, and that the source from the upstream
> repo is in src
>
> echo update > test
> git commit -a -m "test"
>
> git merge -s subtree tig-2.4.0
> # Boom, in 2.18 instead of merging into the subtree, it just deletes
> everything in the repository, which is clearly the wrong behavior.
>

[-- Attachment #2: Type: text/html, Size: 1863 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 15:03 ` George Shammas
@ 2018-07-31 15:50   ` Jeff King
  2018-07-31 16:08     ` Junio C Hamano
  2018-08-01  0:58     ` René Scharfe
  2018-07-31 15:53   ` Junio C Hamano
  1 sibling, 2 replies; 17+ messages in thread
From: Jeff King @ 2018-07-31 15:50 UTC (permalink / raw)
  To: George Shammas; +Cc: René Scharfe, git

On Tue, Jul 31, 2018 at 11:03:17AM -0400, George Shammas wrote:

> Bisecting around, this might be the commit that introduced the breakage.
> 
> https://github.com/git/git/commit/d8febde
> 
> I really hope that it hasn't been broken for 5 years and I am just doing
> something wrong.

Unfortunately, I think it has been broken for five years.

The problem introduced in that commit is that each iteration through the
loop advances the tree pointers. But when we're walking two lists and
see that one omits an entry the other has, we have to advance _one_ list
and keep the other where it is. So if we instrument the score_*
functions to see which ones trigger, your reproduction gives this with
the original code:

  warning: scoring trees:
    8e12d9b6bc57fe6308315914628dd4fd7665ca59
    aed1d7c5809e53d49b52c43a6103827046d60286
  warning: score_matches: .bookignore
  warning: score_matches: .gitignore
  warning: score_matches: .mailmap
  warning: score_differs: .travis.yml
  warning: score_matches: COPYING
  warning: score_differs: INSTALL.adoc
  warning: score_differs: Makefile
  warning: score_differs: NEWS.adoc
  warning: score_differs: README.adoc
  warning: score_missing: appveyor.yml
  warning: score_matches: autogen.sh
  warning: score_matches: book.json
  [...]

and the new one does:

  warning: scoring trees:
    8e12d9b6bc57fe6308315914628dd4fd7665ca59
    aed1d7c5809e53d49b52c43a6103827046d60286
  warning: score_matches: .bookignore
  warning: score_matches: .gitignore
  warning: score_matches: .mailmap
  warning: score_differs: .travis.yml
  warning: score_matches: COPYING
  warning: score_differs: INSTALL.adoc
  warning: score_differs: Makefile
  warning: score_differs: NEWS.adoc
  warning: score_differs: README.adoc
  warning: score_missing: appveyor.yml
  warning: score_missing: autogen.sh
  warning: score_missing: book.json

We're fine at first, but as soon as one tree has appveyor.yml and the
other doesn't, we get out of sync. We compare "autogen" and "appveyor",
and realize that they do not match. But then we need to increment
pointer for the tree with "appveyor" only, and leave the other in place,
at which point we'd realize that they both have "autogen". Instead, we
increment both, and after that we compare "autogen.sh" to "book.json",
and so on.

So the assertion in that commit message that "the calls to
update_tree_entry() are not needed any more" is just wrong. We have
decide whether to call it based on the "cmp" value.

I quoted your original reproduction below for the benefit of René
(cc'd).

-Peff

> On Tue, Jul 31, 2018 at 10:09 AM George Shammas <georgyo@gmail.com> wrote:
> 
> > At work, we recently updated from a massively old version of git (1.7.10)
> > to 2.18. There are a few code bases that use subtrees, and they seem to
> > have completely broke when trying to merge in updates.
> >
> > I have confirmed that it works correctly in 1.7.10.  The 2.18 behavior is
> > clearly incorrect.
> >
> > git init
> > echo init > test
> > git add test
> > git commit -m init
> >
> > git remote add tig https://github.com/jonas/tig.git
> > git fetch tig
> > git merge -s ours --no-commit --allow-unrelated-histories tig-2.3.0
> > git read-tree --prefix=src/ -u tig-2.3.0
> > git commit -m "Get upstream tig-2.3.0"
> > # Notice how the history are merged, and that the source from the upstream
> > repo is in src
> >
> > echo update > test
> > git commit -a -m "test"
> >
> > git merge -s subtree tig-2.4.0
> > # Boom, in 2.18 instead of merging into the subtree, it just deletes
> > everything in the repository, which is clearly the wrong behavior.
> >

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 15:03 ` George Shammas
  2018-07-31 15:50   ` Jeff King
@ 2018-07-31 15:53   ` Junio C Hamano
  2018-07-31 15:56     ` George Shammas
  2018-07-31 16:15     ` Jeff King
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2018-07-31 15:53 UTC (permalink / raw)
  To: René Scharfe, George Shammas; +Cc: git

George Shammas <georgyo@gmail.com> writes:

> Bisecting around, this might be the commit that introduced the breakage.
>
> https://github.com/git/git/commit/d8febde

Interesting.  I've never used the "-s subtree" strategy without
"-Xsubtree=..." to explicitly tell where the thing should go for a
long time, so I am not surprised if I did not notice if an update to
the heuristics made long time ago had affected tree matching.

d8febde3 ("match-trees: simplify score_trees() using tree_entry()",
2013-03-24) does touch the area that may affect the subtree matching
behaviour.

Because it is an update to heuristics, and as such, we need to be
careful when saying it is or is not "broken".  Some heuristics may
work better with your particular case, and may do worse with other
cases.

But from the log message description, it looks like it was meant to
be a no-op simplification rewrite that should not affect the outcome,
so it is a bit surprising.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 15:53   ` Junio C Hamano
@ 2018-07-31 15:56     ` George Shammas
  2018-07-31 16:15     ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: George Shammas @ 2018-07-31 15:56 UTC (permalink / raw)
  To: gitster; +Cc: l.s.r, git

[-- Attachment #1: Type: text/plain, Size: 1152 bytes --]

While debugging this, I did try -X subtree=src/ however the effect was the
same.

On Tue, Jul 31, 2018 at 11:53 AM Junio C Hamano <gitster@pobox.com> wrote:

> George Shammas <georgyo@gmail.com> writes:
>
> > Bisecting around, this might be the commit that introduced the breakage.
> >
> > https://github.com/git/git/commit/d8febde
>
> Interesting.  I've never used the "-s subtree" strategy without
> "-Xsubtree=..." to explicitly tell where the thing should go for a
> long time, so I am not surprised if I did not notice if an update to
> the heuristics made long time ago had affected tree matching.
>
> d8febde3 ("match-trees: simplify score_trees() using tree_entry()",
> 2013-03-24) does touch the area that may affect the subtree matching
> behaviour.
>
> Because it is an update to heuristics, and as such, we need to be
> careful when saying it is or is not "broken".  Some heuristics may
> work better with your particular case, and may do worse with other
> cases.
>
> But from the log message description, it looks like it was meant to
> be a no-op simplification rewrite that should not affect the outcome,
> so it is a bit surprising.
>

[-- Attachment #2: Type: text/html, Size: 1668 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 15:50   ` Jeff King
@ 2018-07-31 16:08     ` Junio C Hamano
  2018-08-01  0:58     ` René Scharfe
  1 sibling, 0 replies; 17+ messages in thread
From: Junio C Hamano @ 2018-07-31 16:08 UTC (permalink / raw)
  To: Jeff King; +Cc: George Shammas, René Scharfe, git

Jeff King <peff@peff.net> writes:

> The problem introduced in that commit is that each iteration through the
> loop advances the tree pointers.

Ah, indeed.  

The original used tree_entry_extract() and update_tree_entry()
separately, but the update does tree_entry() on both sides.

> So the assertion in that commit message that "the calls to
> update_tree_entry() are not needed any more" is just wrong. We have
> decide whether to call it based on the "cmp" value.

Yup.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 15:53   ` Junio C Hamano
  2018-07-31 15:56     ` George Shammas
@ 2018-07-31 16:15     ` Jeff King
  2018-07-31 17:17       ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff King @ 2018-07-31 16:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, George Shammas, git

On Tue, Jul 31, 2018 at 08:53:23AM -0700, Junio C Hamano wrote:

> George Shammas <georgyo@gmail.com> writes:
> 
> > Bisecting around, this might be the commit that introduced the breakage.
> >
> > https://github.com/git/git/commit/d8febde
> 
> Interesting.  I've never used the "-s subtree" strategy without
> "-Xsubtree=..." to explicitly tell where the thing should go for a
> long time, so I am not surprised if I did not notice if an update to
> the heuristics made long time ago had affected tree matching.
> 
> d8febde3 ("match-trees: simplify score_trees() using tree_entry()",
> 2013-03-24) does touch the area that may affect the subtree matching
> behaviour.
> 
> Because it is an update to heuristics, and as such, we need to be
> careful when saying it is or is not "broken".  Some heuristics may
> work better with your particular case, and may do worse with other
> cases.
> 
> But from the log message description, it looks like it was meant to
> be a no-op simplification rewrite that should not affect the outcome,
> so it is a bit surprising.

Yeah, this is definitely not "well, the heuristic changed a bit". It's
just broken. This fixes it, but we should probably add a test.

diff --git a/match-trees.c b/match-trees.c
index 4cdeff53e1..730fff4cfb 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -83,34 +83,40 @@ static int score_trees(const struct object_id *hash1, const struct object_id *ha
 	int score = 0;
 
 	for (;;) {
-		struct name_entry e1, e2;
-		int got_entry_from_one = tree_entry(&one, &e1);
-		int got_entry_from_two = tree_entry(&two, &e2);
 		int cmp;
 
-		if (got_entry_from_one && got_entry_from_two)
-			cmp = base_name_entries_compare(&e1, &e2);
-		else if (got_entry_from_one)
+		if (one.size && two.size)
+			cmp = base_name_entries_compare(&one.entry, &two.entry);
+		else if (one.size)
 			/* two lacks this entry */
 			cmp = -1;
-		else if (got_entry_from_two)
+		else if (two.size)
 			/* two has more entries */
 			cmp = 1;
 		else
 			break;
 
-		if (cmp < 0)
+		if (cmp < 0) {
 			/* path1 does not appear in two */
-			score += score_missing(e1.mode, e1.path);
-		else if (cmp > 0)
+			score += score_missing(one.entry.mode, one.entry.path);
+			update_tree_entry(&one);
+			continue;
+		} else if (cmp > 0) {
 			/* path2 does not appear in one */
-			score += score_missing(e2.mode, e2.path);
-		else if (oidcmp(e1.oid, e2.oid))
+			score += score_missing(two.entry.mode, two.entry.path);
+			update_tree_entry(&two);
+			continue;
+		} if (oidcmp(one.entry.oid, two.entry.oid)) {
 			/* they are different */
-			score += score_differs(e1.mode, e2.mode, e1.path);
-		else
+			score += score_differs(one.entry.mode, two.entry.mode,
+					       one.entry.path);
+		} else {
 			/* same subtree or blob */
-			score += score_matches(e1.mode, e2.mode, e1.path);
+			score += score_matches(one.entry.mode, two.entry.mode,
+					       one.entry.path);
+		}
+		update_tree_entry(&one);
+		update_tree_entry(&two);
 	}
 	free(one_buf);
 	free(two_buf);

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 16:15     ` Jeff King
@ 2018-07-31 17:17       ` Junio C Hamano
  2018-07-31 17:23         ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Junio C Hamano @ 2018-07-31 17:17 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, George Shammas, git

Jeff King <peff@peff.net> writes:

> +...
> +		} else if (cmp > 0) {
>  			/* path2 does not appear in one */
> +			score += score_missing(two.entry.mode, two.entry.path);
> +			update_tree_entry(&two);
> +			continue;
> +		} if (oidcmp(one.entry.oid, two.entry.oid)) {

As the earlier ones do the "continue at the end of the block", this
does not affect the correctness, but I think you either meant "else if"
or a fresh "if/else" that is disconnected from the previous if/else if/...
chain.



>  			/* they are different */
> ...
> +			score += score_differs(one.entry.mode, two.entry.mode,
> +					       one.entry.path);
> +		} else {


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 17:17       ` Junio C Hamano
@ 2018-07-31 17:23         ` Jeff King
  2018-07-31 19:04           ` Jeff King
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff King @ 2018-07-31 17:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, George Shammas, git

On Tue, Jul 31, 2018 at 10:17:15AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +...
> > +		} else if (cmp > 0) {
> >  			/* path2 does not appear in one */
> > +			score += score_missing(two.entry.mode, two.entry.path);
> > +			update_tree_entry(&two);
> > +			continue;
> > +		} if (oidcmp(one.entry.oid, two.entry.oid)) {
> 
> As the earlier ones do the "continue at the end of the block", this
> does not affect the correctness, but I think you either meant "else if"
> or a fresh "if/else" that is disconnected from the previous if/else if/...
> chain.

Yes, thanks. I actually started to write it without the "continue" at
all, and a big "else" that checked the "we have both" case. But I backed
that out (in favor of a smaller diff), and forgot to add back in the
"else if".

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 17:23         ` Jeff King
@ 2018-07-31 19:04           ` Jeff King
  2018-07-31 19:52             ` George Shammas
  2018-07-31 21:06             ` Junio C Hamano
  0 siblings, 2 replies; 17+ messages in thread
From: Jeff King @ 2018-07-31 19:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, George Shammas, git

On Tue, Jul 31, 2018 at 01:23:04PM -0400, Jeff King wrote:

> On Tue, Jul 31, 2018 at 10:17:15AM -0700, Junio C Hamano wrote:
> 
> > Jeff King <peff@peff.net> writes:
> > 
> > > +...
> > > +		} else if (cmp > 0) {
> > >  			/* path2 does not appear in one */
> > > +			score += score_missing(two.entry.mode, two.entry.path);
> > > +			update_tree_entry(&two);
> > > +			continue;
> > > +		} if (oidcmp(one.entry.oid, two.entry.oid)) {
> > 
> > As the earlier ones do the "continue at the end of the block", this
> > does not affect the correctness, but I think you either meant "else if"
> > or a fresh "if/else" that is disconnected from the previous if/else if/...
> > chain.
> 
> Yes, thanks. I actually started to write it without the "continue" at
> all, and a big "else" that checked the "we have both" case. But I backed
> that out (in favor of a smaller diff), and forgot to add back in the
> "else if".

So here it is fixed, and with a commit message. I'm not happy to omit a
regression test, but I actually couldn't come up with a minimal one that
tickled the problem, because we're playing around with heuristics. So I
compensated by probably over-explaining in the commit message. But
clearly this is not a well-tested code path given the length of time
between introducing and detecting the bug.

-- >8 --
Subject: [PATCH] score_trees(): fix iteration over trees with missing entries

In score_trees(), we walk over two sorted trees to find
which entries are missing or have different content between
the two.  So if we have two trees with these entries:

  one   two
  ---   ---
  a     a
  b     c
  c     d

we'd expect the loop to:

  - compare "a" to "a"

  - compare "b" to "c"; because these are sorted lists, we
    know that the second tree does not have "b"

  - compare "c" to "c"

  - compare "d" to end-of-list; we know that the first tree
    does not have "d"

And prior to d8febde370 (match-trees: simplify score_trees()
using tree_entry(), 2013-03-24) that worked. But after that
commit, we mistakenly increment the tree pointers for every
loop iteration, even when we've processed the entry for only
one side. As a result, we end up doing this:

  - compare "a" to "a"

  - compare "b" to "c"; we know that we do not have "b", but
    we still increment both tree pointers; at this point
    we're out of sync and all further comparisons are wrong

  - compare "c" to "d" and mistakenly claim that the second
    tree does not have "c"

  - exit the loop, mistakenly not realizing that the first
    tree does not have "d"

So contrary to the claim in d8febde370, we really do need to
manually use update_tree_entry(), because advancing the tree
pointer depends on the entry comparison.

That means we must stop using tree_entry() to access each
entry, since it auto-advances the pointer. Instead:

  - we'll use tree_desc.size directly to know if there's
    anything left to look at (which is what tree_entry() was
    doing under the hood)

  - rather than do an extra struct assignment to "e1" and
    "e2", we can just access the "entry" field of tree_desc
    directly

That makes us a little more intimate with the tree_desc
code, but that's not uncommon for its callers.

There's no regression test here, as it's a little tricky to
trigger this with a minimal example. The user-visible effect
is that the heuristics fail to correlate two trees that
should be. But in a minimal example, there aren't a lot of
other trees to match, so we often end up doing the right
thing anyway.

A real-world example (from the original bug report) is:

-- >8 --
git init repo
cd repo

echo init >file
git add file
git commit -m init

git remote add tig https://github.com/jonas/tig.git
git fetch tig
git merge -s ours --no-commit --allow-unrelated-histories tig-2.3.0
git read-tree --prefix=src/ -u tig-2.3.0
git commit -m 'get upstream tig-2.3.0'

echo update >file
git commit -a -m update

git merge -s subtree tig-2.4.0
-- 8< --

Before this patch, we fail to realize that the tig-2.4.0
content should go into the "src" directory.

Signed-off-by: Jeff King <peff@peff.net>
---
 match-trees.c | 43 ++++++++++++++++++++++++++-----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/match-trees.c b/match-trees.c
index 4cdeff53e1..37653308d3 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -83,34 +83,43 @@ static int score_trees(const struct object_id *hash1, const struct object_id *ha
 	int score = 0;
 
 	for (;;) {
-		struct name_entry e1, e2;
-		int got_entry_from_one = tree_entry(&one, &e1);
-		int got_entry_from_two = tree_entry(&two, &e2);
 		int cmp;
 
-		if (got_entry_from_one && got_entry_from_two)
-			cmp = base_name_entries_compare(&e1, &e2);
-		else if (got_entry_from_one)
+		if (one.size && two.size)
+			cmp = base_name_entries_compare(&one.entry, &two.entry);
+		else if (one.size)
 			/* two lacks this entry */
 			cmp = -1;
-		else if (got_entry_from_two)
+		else if (two.size)
 			/* two has more entries */
 			cmp = 1;
 		else
 			break;
 
-		if (cmp < 0)
+		if (cmp < 0) {
 			/* path1 does not appear in two */
-			score += score_missing(e1.mode, e1.path);
-		else if (cmp > 0)
+			score += score_missing(one.entry.mode, one.entry.path);
+			update_tree_entry(&one);
+		} else if (cmp > 0) {
 			/* path2 does not appear in one */
-			score += score_missing(e2.mode, e2.path);
-		else if (oidcmp(e1.oid, e2.oid))
-			/* they are different */
-			score += score_differs(e1.mode, e2.mode, e1.path);
-		else
-			/* same subtree or blob */
-			score += score_matches(e1.mode, e2.mode, e1.path);
+			score += score_missing(two.entry.mode, two.entry.path);
+			update_tree_entry(&two);
+		} else {
+			/* path appears in both */
+			if (oidcmp(one.entry.oid, two.entry.oid)) {
+				/* they are different */
+				score += score_differs(one.entry.mode,
+						       two.entry.mode,
+						       one.entry.path);
+			} else {
+				/* same subtree or blob */
+				score += score_matches(one.entry.mode,
+						       two.entry.mode,
+						       one.entry.path);
+			}
+			update_tree_entry(&one);
+			update_tree_entry(&two);
+		}
 	}
 	free(one_buf);
 	free(two_buf);
-- 
2.18.0.796.g4bfd63b683


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 19:04           ` Jeff King
@ 2018-07-31 19:52             ` George Shammas
  2018-07-31 20:40               ` Jeff King
  2018-07-31 21:06             ` Junio C Hamano
  1 sibling, 1 reply; 17+ messages in thread
From: George Shammas @ 2018-07-31 19:52 UTC (permalink / raw)
  To: peff; +Cc: Junio C Hamano, l.s.r, git

[-- Attachment #1: Type: text/plain, Size: 7840 bytes --]

This is the fastest I ever seen an open source project respond to an issue
I reported. Thanks for being awesome!

On Tue, Jul 31, 2018 at 3:05 PM Jeff King <peff@peff.net> wrote:

> On Tue, Jul 31, 2018 at 01:23:04PM -0400, Jeff King wrote:
>
> > On Tue, Jul 31, 2018 at 10:17:15AM -0700, Junio C Hamano wrote:
> >
> > > Jeff King <peff@peff.net> writes:
> > >
> > > > +...
> > > > +         } else if (cmp > 0) {
> > > >                   /* path2 does not appear in one */
> > > > +                 score += score_missing(two.entry.mode,
> two.entry.path);
> > > > +                 update_tree_entry(&two);
> > > > +                 continue;
> > > > +         } if (oidcmp(one.entry.oid, two.entry.oid)) {
> > >
> > > As the earlier ones do the "continue at the end of the block", this
> > > does not affect the correctness, but I think you either meant "else if"
> > > or a fresh "if/else" that is disconnected from the previous if/else
> if/...
> > > chain.
> >
> > Yes, thanks. I actually started to write it without the "continue" at
> > all, and a big "else" that checked the "we have both" case. But I backed
> > that out (in favor of a smaller diff), and forgot to add back in the
> > "else if".
>
> So here it is fixed, and with a commit message. I'm not happy to omit a
> regression test, but I actually couldn't come up with a minimal one that
> tickled the problem, because we're playing around with heuristics. So I
> compensated by probably over-explaining in the commit message. But
> clearly this is not a well-tested code path given the length of time
> between introducing and detecting the bug.
>
> -- >8 --
> Subject: [PATCH] score_trees(): fix iteration over trees with missing
> entries
>
> In score_trees(), we walk over two sorted trees to find
> which entries are missing or have different content between
> the two.  So if we have two trees with these entries:
>
>   one   two
>   ---   ---
>   a     a
>   b     c
>   c     d
>
> we'd expect the loop to:
>
>   - compare "a" to "a"
>
>   - compare "b" to "c"; because these are sorted lists, we
>     know that the second tree does not have "b"
>
>   - compare "c" to "c"
>
>   - compare "d" to end-of-list; we know that the first tree
>     does not have "d"
>
> And prior to d8febde370 (match-trees: simplify score_trees()
> using tree_entry(), 2013-03-24) that worked. But after that
> commit, we mistakenly increment the tree pointers for every
> loop iteration, even when we've processed the entry for only
> one side. As a result, we end up doing this:
>
>   - compare "a" to "a"
>
>   - compare "b" to "c"; we know that we do not have "b", but
>     we still increment both tree pointers; at this point
>     we're out of sync and all further comparisons are wrong
>
>   - compare "c" to "d" and mistakenly claim that the second
>     tree does not have "c"
>
>   - exit the loop, mistakenly not realizing that the first
>     tree does not have "d"
>
> So contrary to the claim in d8febde370, we really do need to
> manually use update_tree_entry(), because advancing the tree
> pointer depends on the entry comparison.
>
> That means we must stop using tree_entry() to access each
> entry, since it auto-advances the pointer. Instead:
>
>   - we'll use tree_desc.size directly to know if there's
>     anything left to look at (which is what tree_entry() was
>     doing under the hood)
>
>   - rather than do an extra struct assignment to "e1" and
>     "e2", we can just access the "entry" field of tree_desc
>     directly
>
> That makes us a little more intimate with the tree_desc
> code, but that's not uncommon for its callers.
>
> There's no regression test here, as it's a little tricky to
> trigger this with a minimal example. The user-visible effect
> is that the heuristics fail to correlate two trees that
> should be. But in a minimal example, there aren't a lot of
> other trees to match, so we often end up doing the right
> thing anyway.
>
> A real-world example (from the original bug report) is:
>
> -- >8 --
> git init repo
> cd repo
>
> echo init >file
> git add file
> git commit -m init
>
> git remote add tig https://github.com/jonas/tig.git
> git fetch tig
> git merge -s ours --no-commit --allow-unrelated-histories tig-2.3.0
> git read-tree --prefix=src/ -u tig-2.3.0
> git commit -m 'get upstream tig-2.3.0'
>
> echo update >file
> git commit -a -m update
>
> git merge -s subtree tig-2.4.0
> -- 8< --
>
> Before this patch, we fail to realize that the tig-2.4.0
> content should go into the "src" directory.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  match-trees.c | 43 ++++++++++++++++++++++++++-----------------
>  1 file changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/match-trees.c b/match-trees.c
> index 4cdeff53e1..37653308d3 100644
> --- a/match-trees.c
> +++ b/match-trees.c
> @@ -83,34 +83,43 @@ static int score_trees(const struct object_id *hash1,
> const struct object_id *ha
>         int score = 0;
>
>         for (;;) {
> -               struct name_entry e1, e2;
> -               int got_entry_from_one = tree_entry(&one, &e1);
> -               int got_entry_from_two = tree_entry(&two, &e2);
>                 int cmp;
>
> -               if (got_entry_from_one && got_entry_from_two)
> -                       cmp = base_name_entries_compare(&e1, &e2);
> -               else if (got_entry_from_one)
> +               if (one.size && two.size)
> +                       cmp = base_name_entries_compare(&one.entry,
> &two.entry);
> +               else if (one.size)
>                         /* two lacks this entry */
>                         cmp = -1;
> -               else if (got_entry_from_two)
> +               else if (two.size)
>                         /* two has more entries */
>                         cmp = 1;
>                 else
>                         break;
>
> -               if (cmp < 0)
> +               if (cmp < 0) {
>                         /* path1 does not appear in two */
> -                       score += score_missing(e1.mode, e1.path);
> -               else if (cmp > 0)
> +                       score += score_missing(one.entry.mode,
> one.entry.path);
> +                       update_tree_entry(&one);
> +               } else if (cmp > 0) {
>                         /* path2 does not appear in one */
> -                       score += score_missing(e2.mode, e2.path);
> -               else if (oidcmp(e1.oid, e2.oid))
> -                       /* they are different */
> -                       score += score_differs(e1.mode, e2.mode, e1.path);
> -               else
> -                       /* same subtree or blob */
> -                       score += score_matches(e1.mode, e2.mode, e1.path);
> +                       score += score_missing(two.entry.mode,
> two.entry.path);
> +                       update_tree_entry(&two);
> +               } else {
> +                       /* path appears in both */
> +                       if (oidcmp(one.entry.oid, two.entry.oid)) {
> +                               /* they are different */
> +                               score += score_differs(one.entry.mode,
> +                                                      two.entry.mode,
> +                                                      one.entry.path);
> +                       } else {
> +                               /* same subtree or blob */
> +                               score += score_matches(one.entry.mode,
> +                                                      two.entry.mode,
> +                                                      one.entry.path);
> +                       }
> +                       update_tree_entry(&one);
> +                       update_tree_entry(&two);
> +               }
>         }
>         free(one_buf);
>         free(two_buf);
> --
> 2.18.0.796.g4bfd63b683
>
>

[-- Attachment #2: Type: text/html, Size: 10318 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 19:52             ` George Shammas
@ 2018-07-31 20:40               ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2018-07-31 20:40 UTC (permalink / raw)
  To: George Shammas; +Cc: Junio C Hamano, l.s.r, git

On Tue, Jul 31, 2018 at 03:52:26PM -0400, George Shammas wrote:

> This is the fastest I ever seen an open source project respond to an issue
> I reported. Thanks for being awesome!

You're welcome. My speed is an inverse to how embarrassingly long we
carried the bug for. ;)

> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  match-trees.c | 43 ++++++++++++++++++++++++++-----------------
> >  1 file changed, 26 insertions(+), 17 deletions(-)

Sorry, I meant to actually add a:

  Reported-by: George Shammas <georgyo@gmail.com>

here.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 19:04           ` Jeff King
  2018-07-31 19:52             ` George Shammas
@ 2018-07-31 21:06             ` Junio C Hamano
  2018-08-01  0:58               ` René Scharfe
  2018-08-02 18:45               ` Jeff King
  1 sibling, 2 replies; 17+ messages in thread
From: Junio C Hamano @ 2018-07-31 21:06 UTC (permalink / raw)
  To: Jeff King; +Cc: René Scharfe, George Shammas, git

Jeff King <peff@peff.net> writes:

> On Tue, Jul 31, 2018 at 01:23:04PM -0400, Jeff King wrote:
> ...
> So here it is fixed, and with a commit message. I'm not happy to omit a
> regression test, but I actually couldn't come up with a minimal one that
> tickled the problem, because we're playing around with heuristics. So I
> compensated by probably over-explaining in the commit message. But

Have you tried to apply the message yourself?  I'll fix it up but
the hint to answer that question is in two extra pair of scissors.

> clearly this is not a well-tested code path given the length of time
> between introducing and detecting the bug.

Thanks for writing it up.  The patch itself still looks correct, too.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 15:50   ` Jeff King
  2018-07-31 16:08     ` Junio C Hamano
@ 2018-08-01  0:58     ` René Scharfe
  1 sibling, 0 replies; 17+ messages in thread
From: René Scharfe @ 2018-08-01  0:58 UTC (permalink / raw)
  To: Jeff King, George Shammas; +Cc: git

Am 31.07.2018 um 17:50 schrieb Jeff King:
> On Tue, Jul 31, 2018 at 11:03:17AM -0400, George Shammas wrote:
> 
>> Bisecting around, this might be the commit that introduced the breakage.
>>
>> https://github.com/git/git/commit/d8febde
>>
>> I really hope that it hasn't been broken for 5 years and I am just doing
>> something wrong.
> 
> Unfortunately, I think it has been broken for five years.

I don't remember this change at all. :-(  Sorry for the trouble, everyone.
I should feel ashamed, but I'm only staring in bewilderment.

René

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 21:06             ` Junio C Hamano
@ 2018-08-01  0:58               ` René Scharfe
  2018-08-02 18:58                 ` Jeff King
  2018-08-02 18:45               ` Jeff King
  1 sibling, 1 reply; 17+ messages in thread
From: René Scharfe @ 2018-08-01  0:58 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: George Shammas, git

Am 31.07.2018 um 23:06 schrieb Junio C Hamano:
> Jeff King <peff@peff.net> writes:
> 
>> On Tue, Jul 31, 2018 at 01:23:04PM -0400, Jeff King wrote:
>> ...
>> So here it is fixed, and with a commit message. I'm not happy to omit a
>> regression test, but I actually couldn't come up with a minimal one that
>> tickled the problem, because we're playing around with heuristics.
How about something like this? (squashable)

---
 t/t6029-merge-subtree.sh | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/t/t6029-merge-subtree.sh b/t/t6029-merge-subtree.sh
index 3e692454a7..474a850de6 100755
--- a/t/t6029-merge-subtree.sh
+++ b/t/t6029-merge-subtree.sh
@@ -29,6 +29,34 @@ test_expect_success 'subtree available and works like recursive' '
 
 '
 
+test_expect_success 'setup branch sub' '
+	git checkout --orphan sub &&
+	git rm -rf . &&
+	test_commit foo
+'
+
+test_expect_success 'setup branch main' '
+	git checkout -b main master &&
+	git merge -s ours --no-commit --allow-unrelated-histories sub &&
+	git read-tree --prefix=dir/ -u sub &&
+	git commit -m "initial merge of sub into main" &&
+	test_path_is_file dir/foo.t &&
+	test_path_is_file hello
+'
+
+test_expect_success 'update branch sub' '
+	git checkout sub &&
+	test_commit bar
+'
+
+test_expect_success 'update branch main' '
+	git checkout main &&
+	git merge -s subtree sub -m "second merge of sub into main" &&
+	test_path_is_file dir/bar.t &&
+	test_path_is_file dir/foo.t &&
+	test_path_is_file hello
+'
+
 test_expect_success 'setup' '
 	mkdir git-gui &&
 	cd git-gui &&
-- 
2.18.0

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-07-31 21:06             ` Junio C Hamano
  2018-08-01  0:58               ` René Scharfe
@ 2018-08-02 18:45               ` Jeff King
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff King @ 2018-08-02 18:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: René Scharfe, George Shammas, git

On Tue, Jul 31, 2018 at 02:06:12PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Tue, Jul 31, 2018 at 01:23:04PM -0400, Jeff King wrote:
> > ...
> > So here it is fixed, and with a commit message. I'm not happy to omit a
> > regression test, but I actually couldn't come up with a minimal one that
> > tickled the problem, because we're playing around with heuristics. So I
> > compensated by probably over-explaining in the commit message. But
> 
> Have you tried to apply the message yourself?  I'll fix it up but
> the hint to answer that question is in two extra pair of scissors.

Heh, thank you for noticing. I actually wondered about that while
writing it and meant to test, but then got distracted.

I wonder "am --scissors" should actually look for the _first_ scissors.
I guess that has the opposite problem, which is that we might include
too much cruft in an email that uses scissors. Perhaps "too much" is a
better failure mode than "too little", though.

-Peff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: git merge -s subtree seems to be broken.
  2018-08-01  0:58               ` René Scharfe
@ 2018-08-02 18:58                 ` Jeff King
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff King @ 2018-08-02 18:58 UTC (permalink / raw)
  To: René Scharfe; +Cc: Junio C Hamano, George Shammas, git

On Wed, Aug 01, 2018 at 02:58:50AM +0200, René Scharfe wrote:

> Am 31.07.2018 um 23:06 schrieb Junio C Hamano:
> > Jeff King <peff@peff.net> writes:
> > 
> >> On Tue, Jul 31, 2018 at 01:23:04PM -0400, Jeff King wrote:
> >> ...
> >> So here it is fixed, and with a commit message. I'm not happy to omit a
> >> regression test, but I actually couldn't come up with a minimal one that
> >> tickled the problem, because we're playing around with heuristics.
> How about something like this? (squashable)

Thanks. This is quite similar to what I tried, but I had started a new
script, and I suspect that what is in t6029's master branch tickles the
heuristic in a more interesting way. Or possibly I just botched my
attempt, though I did spend quite a bit of time fiddling with it. The
master branch seems to only contain one file, "hello". Hmph.

At any rate, it does trigger the bug and demonstrate the fix, so let's
go with it. I see Junio already squashed the tests into
jk/merge-subtree-heuristics, but I think we can cut down the commit
message a bit (and stop claiming that we don't have a test ;) ).  Like
so (and yes, this version omits the extra scissors):

-- >8 --
Subject: score_trees(): fix iteration over trees with missing entries

In score_trees(), we walk over two sorted trees to find
which entries are missing or have different content between
the two.  So if we have two trees with these entries:

  one   two
  ---   ---
  a     a
  b     c
  c     d

we'd expect the loop to:

  - compare "a" to "a"

  - compare "b" to "c"; because these are sorted lists, we
    know that the second tree does not have "b"

  - compare "c" to "c"

  - compare "d" to end-of-list; we know that the first tree
    does not have "d"

And prior to d8febde370 (match-trees: simplify score_trees()
using tree_entry(), 2013-03-24) that worked. But after that
commit, we mistakenly increment the tree pointers for every
loop iteration, even when we've processed the entry for only
one side. As a result, we end up doing this:

  - compare "a" to "a"

  - compare "b" to "c"; we know that we do not have "b", but
    we still increment both tree pointers; at this point
    we're out of sync and all further comparisons are wrong

  - compare "c" to "d" and mistakenly claim that the second
    tree does not have "c"

  - exit the loop, mistakenly not realizing that the first
    tree does not have "d"

So contrary to the claim in d8febde370, we really do need to
manually use update_tree_entry(), because advancing the tree
pointer depends on the entry comparison.

That means we must stop using tree_entry() to access each
entry, since it auto-advances the pointer. Instead:

  - we'll use tree_desc.size directly to know if there's
    anything left to look at (which is what tree_entry() was
    doing under the hood)

  - rather than do an extra struct assignment to "e1" and
    "e2", we can just access the "entry" field of tree_desc
    directly

That makes us a little more intimate with the tree_desc
code, but that's not uncommon for its callers.

The included test shows off the bug by adding a new entry
"bar.t", which sorts early in the tree and de-syncs the
comparison for "foo.t", which comes after.

Reported-by: George Shammas <georgyo@gmail.com>
Helped-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
---
 match-trees.c            | 43 ++++++++++++++++++++++++----------------
 t/t6029-merge-subtree.sh | 28 ++++++++++++++++++++++++++
 2 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/match-trees.c b/match-trees.c
index 4cdeff53e1..37653308d3 100644
--- a/match-trees.c
+++ b/match-trees.c
@@ -83,34 +83,43 @@ static int score_trees(const struct object_id *hash1, const struct object_id *ha
 	int score = 0;
 
 	for (;;) {
-		struct name_entry e1, e2;
-		int got_entry_from_one = tree_entry(&one, &e1);
-		int got_entry_from_two = tree_entry(&two, &e2);
 		int cmp;
 
-		if (got_entry_from_one && got_entry_from_two)
-			cmp = base_name_entries_compare(&e1, &e2);
-		else if (got_entry_from_one)
+		if (one.size && two.size)
+			cmp = base_name_entries_compare(&one.entry, &two.entry);
+		else if (one.size)
 			/* two lacks this entry */
 			cmp = -1;
-		else if (got_entry_from_two)
+		else if (two.size)
 			/* two has more entries */
 			cmp = 1;
 		else
 			break;
 
-		if (cmp < 0)
+		if (cmp < 0) {
 			/* path1 does not appear in two */
-			score += score_missing(e1.mode, e1.path);
-		else if (cmp > 0)
+			score += score_missing(one.entry.mode, one.entry.path);
+			update_tree_entry(&one);
+		} else if (cmp > 0) {
 			/* path2 does not appear in one */
-			score += score_missing(e2.mode, e2.path);
-		else if (oidcmp(e1.oid, e2.oid))
-			/* they are different */
-			score += score_differs(e1.mode, e2.mode, e1.path);
-		else
-			/* same subtree or blob */
-			score += score_matches(e1.mode, e2.mode, e1.path);
+			score += score_missing(two.entry.mode, two.entry.path);
+			update_tree_entry(&two);
+		} else {
+			/* path appears in both */
+			if (oidcmp(one.entry.oid, two.entry.oid)) {
+				/* they are different */
+				score += score_differs(one.entry.mode,
+						       two.entry.mode,
+						       one.entry.path);
+			} else {
+				/* same subtree or blob */
+				score += score_matches(one.entry.mode,
+						       two.entry.mode,
+						       one.entry.path);
+			}
+			update_tree_entry(&one);
+			update_tree_entry(&two);
+		}
 	}
 	free(one_buf);
 	free(two_buf);
diff --git a/t/t6029-merge-subtree.sh b/t/t6029-merge-subtree.sh
index 3e692454a7..474a850de6 100755
--- a/t/t6029-merge-subtree.sh
+++ b/t/t6029-merge-subtree.sh
@@ -29,6 +29,34 @@ test_expect_success 'subtree available and works like recursive' '
 
 '
 
+test_expect_success 'setup branch sub' '
+	git checkout --orphan sub &&
+	git rm -rf . &&
+	test_commit foo
+'
+
+test_expect_success 'setup branch main' '
+	git checkout -b main master &&
+	git merge -s ours --no-commit --allow-unrelated-histories sub &&
+	git read-tree --prefix=dir/ -u sub &&
+	git commit -m "initial merge of sub into main" &&
+	test_path_is_file dir/foo.t &&
+	test_path_is_file hello
+'
+
+test_expect_success 'update branch sub' '
+	git checkout sub &&
+	test_commit bar
+'
+
+test_expect_success 'update branch main' '
+	git checkout main &&
+	git merge -s subtree sub -m "second merge of sub into main" &&
+	test_path_is_file dir/bar.t &&
+	test_path_is_file dir/foo.t &&
+	test_path_is_file hello
+'
+
 test_expect_success 'setup' '
 	mkdir git-gui &&
 	cd git-gui &&
-- 
2.18.0.800.g770d9f3396




^ permalink raw reply related	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-08-02 18:58 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-31 14:09 git merge -s subtree seems to be broken George Shammas
2018-07-31 15:03 ` George Shammas
2018-07-31 15:50   ` Jeff King
2018-07-31 16:08     ` Junio C Hamano
2018-08-01  0:58     ` René Scharfe
2018-07-31 15:53   ` Junio C Hamano
2018-07-31 15:56     ` George Shammas
2018-07-31 16:15     ` Jeff King
2018-07-31 17:17       ` Junio C Hamano
2018-07-31 17:23         ` Jeff King
2018-07-31 19:04           ` Jeff King
2018-07-31 19:52             ` George Shammas
2018-07-31 20:40               ` Jeff King
2018-07-31 21:06             ` Junio C Hamano
2018-08-01  0:58               ` René Scharfe
2018-08-02 18:58                 ` Jeff King
2018-08-02 18:45               ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).