git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Weird revision walk behaviour
@ 2018-05-23 17:10 SZEDER Gábor
  2018-05-23 17:32 ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: SZEDER Gábor @ 2018-05-23 17:10 UTC (permalink / raw)
  To: Git mailing list

There is this topic 'jt/partial-clone-proto-v2' currently cooking in
'next' and pointing to ba95710a3b ({fetch,upload}-pack: support filter
in protocol v2, 2018-05-03).  This topic is built on top of the merge
commit ea44c0a594 (Merge branch 'bw/protocol-v2' into
jt/partial-clone-proto-v2, 2018-05-02), which gives me the creeps,
because it shows up in some pathspec-limited revision walks where in
my opinion it should not:

  $ git log --oneline master..ba95710a3b -- ci/
  ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2

But as far as I can tell, there are no changes in the 'ci/' directory
on any of the merge's parents:

  $ git log --oneline master..ea44c0a594^1 -- ci/
  # Nothing.
  $ git log --oneline master..ea44c0a594^2 -- ci/
  # Nothing!

And to add to my confusion:

  $ git log -1 --oneline master@{1.week.ago}
  ccdcbd54c4 The fifth batch for 2.18
  $ git log --oneline master@{1.week.ago}..ea44c0a594 -- ci/
  ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2
  $ git log -1 --oneline master@{3.week.ago}
  1f1cddd558 The fourth batch for 2.18
  $ git log --oneline master@{3.week.ago}..ea44c0a594 -- ci/
  # Nothing, as it is supposed to be, IMHO.

This is not specific to the 'ci/' directory, it seems that any
untouched directory does the trick:

  $ git log --oneline master..ea44c0a594 -- contrib/coccinelle/ t/lib-httpd/
  ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2
  $ git log --oneline master..ea44c0a594^1 -- contrib/coccinelle/ t/lib-httpd/
  # Nothing.
  $ git log --oneline master..ea44c0a594^2 -- contrib/coccinelle/ t/lib-httpd/
  # Nothing.
  $ git log --oneline master@{3.week.ago}..ea44c0a594 --
contrib/coccinelle/ t/lib-httpd/
  # Nothing, but this is what I would expect.

I get the same behavior with Git built from current master and from
past releases as well (tried it as far back as v2.0.0).

So...  what's going on here? :)
A bug?  Or am I missing something?  Some history simplification corner
case that I'm unaware of?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-23 17:10 Weird revision walk behaviour SZEDER Gábor
@ 2018-05-23 17:32 ` Jeff King
  2018-05-23 17:35   ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2018-05-23 17:32 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Kevin Bracey, Git mailing list

On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote:

>   $ git log --oneline master..ba95710a3b -- ci/
>   ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2
> 
> But as far as I can tell, there are no changes in the 'ci/' directory
> on any of the merge's parents:
> 
>   $ git log --oneline master..ea44c0a594^1 -- ci/
>   # Nothing.
>   $ git log --oneline master..ea44c0a594^2 -- ci/
>   # Nothing!

Hmm. That commit does touch "ci/" with respect to one of its parents.
It should get simplified away because it completely matches the other
parent, so it does sound like a bug.

> This is not specific to the 'ci/' directory, it seems that any
> untouched directory does the trick:
> 
>   $ git log --oneline master..ea44c0a594 -- contrib/coccinelle/ t/lib-httpd/
>   ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2

Both of those directories also differ between one parent. If you try it
with "contrib/remote-helpers", which does not, then the commit does not
appear.

So it does seem like a bug where we should be simplifying away the merge
but are not (or I'm missing the corner case, too ;) ).

> I get the same behavior with Git built from current master and from
> past releases as well (tried it as far back as v2.0.0).

I keep some older builds around, and it does not reproduce with v1.6.6.3
(that's my usual goto for "old"). Bisecting turns up d0af663e42
(revision.c: Make --full-history consider more merges, 2013-05-16).  It
looks like an unintended change (the commit message claims that the
non-full-history case shouldn't be affected).

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-23 17:32 ` Jeff King
@ 2018-05-23 17:35   ` Jeff King
  2018-05-24 18:54     ` Kevin Bracey
  2018-05-24 20:26     ` Kevin Bracey
  0 siblings, 2 replies; 12+ messages in thread
From: Jeff King @ 2018-05-23 17:35 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, Kevin Bracey, Git mailing list

On Wed, May 23, 2018 at 01:32:46PM -0400, Jeff King wrote:

> On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote:
> 
> >   $ git log --oneline master..ba95710a3b -- ci/
> >   ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2
> > 
> > But as far as I can tell, there are no changes in the 'ci/' directory
> > on any of the merge's parents:
> > 
> >   $ git log --oneline master..ea44c0a594^1 -- ci/
> >   # Nothing.
> >   $ git log --oneline master..ea44c0a594^2 -- ci/
> >   # Nothing!
> 
> Hmm. That commit does touch "ci/" with respect to one of its parents.
> It should get simplified away because it completely matches the other
> parent, so it does sound like a bug.
> 
> > This is not specific to the 'ci/' directory, it seems that any
> > untouched directory does the trick:
> > 
> >   $ git log --oneline master..ea44c0a594 -- contrib/coccinelle/ t/lib-httpd/
> >   ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2
> 
> Both of those directories also differ between one parent. If you try it
> with "contrib/remote-helpers", which does not, then the commit does not
> appear.
> 
> So it does seem like a bug where we should be simplifying away the merge
> but are not (or I'm missing the corner case, too ;) ).
> 
> > I get the same behavior with Git built from current master and from
> > past releases as well (tried it as far back as v2.0.0).
> 
> I keep some older builds around, and it does not reproduce with v1.6.6.3
> (that's my usual goto for "old"). Bisecting turns up d0af663e42
> (revision.c: Make --full-history consider more merges, 2013-05-16).  It
> looks like an unintended change (the commit message claims that the
> non-full-history case shouldn't be affected).

There's more discussion in the thread at:

  https://public-inbox.org/git/1366658602-12254-1-git-send-email-kevin@bracey.fi/

I haven't absorbed it all yet, but I'm adding Junio to the cc.

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-23 17:35   ` Jeff King
@ 2018-05-24 18:54     ` Kevin Bracey
  2018-05-24 20:26     ` Kevin Bracey
  1 sibling, 0 replies; 12+ messages in thread
From: Kevin Bracey @ 2018-05-24 18:54 UTC (permalink / raw)
  To: Jeff King, SZEDER Gábor; +Cc: Junio C Hamano, Git mailing list

On 23/05/2018 20:35, Jeff King wrote:
> There's more discussion in the thread at:
>
>    https://public-inbox.org/git/1366658602-12254-1-git-send-email-kevin@bracey.fi/
>
> I haven't absorbed it all yet, but I'm adding Junio to the cc.

Just to ack that I've seen the discussion, but I can't identify the 
code's reasoning at the moment. My recollection is that I accepted while 
coming up with the algorithm that it might err slightly on the side of 
false positives in the display - there were some merge cases I was 
unable to fully distinguish whether or not the merge had lost a change 
it shouldn't have done, and if I was uncertain I'd rather show it than not.

The first commit was not originally intended to alter behaviour for 
anything other than --full-history, but later in the chain there was 
specific consideration into tracking the path to the specified "bottom" 
commit. It may be that's part of what's happening here.

Kevin





^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-23 17:35   ` Jeff King
  2018-05-24 18:54     ` Kevin Bracey
@ 2018-05-24 20:26     ` Kevin Bracey
  2018-05-27 17:37       ` Kevin Bracey
  1 sibling, 1 reply; 12+ messages in thread
From: Kevin Bracey @ 2018-05-24 20:26 UTC (permalink / raw)
  To: Jeff King, SZEDER Gábor; +Cc: Junio C Hamano, Git mailing list

On 23/05/2018 20:35, Jeff King wrote:
> On Wed, May 23, 2018 at 01:32:46PM -0400, Jeff King wrote:
>
>> On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote:
>>
>>>    $ git log --oneline master..ba95710a3b -- ci/
>>>    ea44c0a594 Merge branch 'bw/protocol-v2' into jt/partial-clone-proto-v2
>>>
>>>
>>> I keep some older builds around, and it does not reproduce with v1.6.6.3
>>> (that's my usual goto for "old"). Bisecting turns up d0af663e42
>>> (revision.c: Make --full-history consider more merges, 2013-05-16).  It
>>> looks like an unintended change (the commit message claims that the
>>> non-full-history case shouldn't be affected).
> There's more discussion in the thread at:
>
>    https://public-inbox.org/git/1366658602-12254-1-git-send-email-kevin@bracey.fi/
>
> I haven't absorbed it all yet, but I'm adding Junio to the cc.
>

In this case, we're hitting a merge commit which is not on master, but 
it has two parents which both are. Which, IIRC, means the merge commit 
is INTERESTING with two UNINTERESTING parents; and we are TREESAME to 
only one of them.

The commit changing the logic of TREESAME you identified believes that 
those TREESAME changes for merges which were intended to improve fuller 
history modes shouldn't affect the simple history "because partially 
TREESAME merges are turned into normal commits". Clearly that didn't 
happen here.

I think we need to look at why that isn't happening, and if it can be 
made to happen. The problem is that this commit is effectively the base 
of the graph - it's got a double-connection to the UNINTERESTING set, 
and maybe that prevented the simple history "follow 1 TREESAME" logic 
from kicking in. Maybe it won't follow 1 TREESAME to UNINTERESTING.

I know there were quite a few changes later in the series to try to 
reconcile the simple and full history, for the cases where the simple 
history takes a weird path because of its love of TREESAME parents, 
hiding evil merges. But I believe the simple history behaviour was 
supposed to remain as-is - take first TREESAME always.

Kevin



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-24 20:26     ` Kevin Bracey
@ 2018-05-27 17:37       ` Kevin Bracey
  2018-05-28 22:06         ` SZEDER Gábor
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Bracey @ 2018-05-27 17:37 UTC (permalink / raw)
  To: Jeff King, SZEDER Gábor; +Cc: Junio C Hamano, Git mailing list

On 24/05/2018 23:26, Kevin Bracey wrote:
>
>>> On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote:
>>>
>>>>    $ git log --oneline master..ba95710a3b -- ci/
>>>>    ea44c0a594 Merge branch 'bw/protocol-v2' into 
>>>> jt/partial-clone-proto-v2
>>>>
> In this case, we're hitting a merge commit which is not on master, but 
> it has two parents which both are. Which, IIRC, means the merge commit 
> is INTERESTING with two UNINTERESTING parents; and we are TREESAME to 
> only one of them.
>
> The commit changing the logic of TREESAME you identified believes that 
> those TREESAME changes for merges which were intended to improve 
> fuller history modes shouldn't affect the simple history "because 
> partially TREESAME merges are turned into normal commits". Clearly 
> that didn't happen here.
>
Haven't currently got a development environment set up here, but I've 
been looking at the code.Here's a proposal, untested, as a potential 
starting point if anyone wants to consider a proper patch.

The simplify_history first-scan logic never actually turned merges into 
simple commits unless they were TREESAME to a relevant/interesting 
parent.  Anything where the TREESAME parent was UNINTERESTING was 
retained as a merge, but had its TREESAME flag set, and that permitted 
later simplification.

With the redefinition of the TREESAME flag, this merge commit is no 
longer TREESAME, and as the decoration logic to refine TREESAME isn't 
active for simplify_history, it doesn't get cleaned up (even if it would 
be in full history?)

I think the answer may be to add an extra post-process step on the 
initial loop to handle this special case. Something like:

         case REV_TREE_SAME:
             if (!revs->simplify_history || !relevant_commit(p)) {
                 /* Even if a merge with an uninteresting
                  * side branch brought the entire change
                  * we are interested in, we do not want
                  * to lose the other branches of this
                  * merge, so we just keep going.
                  */
                 if (ts)
                     ts->treesame[nth_parent] = 1;
+               /* But we note it for potential later simplification */
+               if (!treesame_parent)
+                    treesame_parent = p;
                 continue;
              }

...

After loop:

+     if (relevant_parents == 0 && revs->simplify_history && 
treesame_parent) {
+           treesame_parent->next = NULL;// Repeats code from loop - 
share somehow?
+           commit->parents = treesame_parent;
+           commit->object.flags |= TREESAME;
+           return;
+    }

      /*
       * TREESAME is straightforward for single-parent commits. For merge

The other option would be to take off the " || !relevant_commit(p)" 
test, but I'm assuming that is still needed for other cases.

Kevin



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-27 17:37       ` Kevin Bracey
@ 2018-05-28 22:06         ` SZEDER Gábor
  2018-05-29  6:11           ` Kevin Bracey
  2018-05-29 21:04           ` Jeff King
  0 siblings, 2 replies; 12+ messages in thread
From: SZEDER Gábor @ 2018-05-28 22:06 UTC (permalink / raw)
  To: Kevin Bracey
  Cc: SZEDER Gábor, Jeff King, Junio C Hamano, Git mailing list


> On 24/05/2018 23:26, Kevin Bracey wrote:
> >
> >>> On Wed, May 23, 2018 at 07:10:58PM +0200, SZEDER Gábor wrote:
> >>>
> >>>>    $ git log --oneline master..ba95710a3b -- ci/
> >>>>    ea44c0a594 Merge branch 'bw/protocol-v2' into 
> >>>> jt/partial-clone-proto-v2
> >>>>
> > In this case, we're hitting a merge commit which is not on master, but 
> > it has two parents which both are.

Indeed, I didn't notice this important detail amidst the complexity of
the git history.  Thanks, with this info I could come up with a small
test case to demonstrate the issue, see below.

> Which, IIRC, means the merge commit 
> > is INTERESTING with two UNINTERESTING parents; and we are TREESAME to 
> > only one of them.
> >
> > The commit changing the logic of TREESAME you identified believes that 
> > those TREESAME changes for merges which were intended to improve 
> > fuller history modes shouldn't affect the simple history "because 
> > partially TREESAME merges are turned into normal commits". Clearly 
> > that didn't happen here.
> >
> Haven't currently got a development environment set up here, but I've 
> been looking at the code.Here's a proposal, untested, as a potential 
> starting point if anyone wants to consider a proper patch.
> 
> The simplify_history first-scan logic never actually turned merges into 
> simple commits unless they were TREESAME to a relevant/interesting 
> parent.  Anything where the TREESAME parent was UNINTERESTING was 
> retained as a merge, but had its TREESAME flag set, and that permitted 
> later simplification.
> 
> With the redefinition of the TREESAME flag, this merge commit is no 
> longer TREESAME, and as the decoration logic to refine TREESAME isn't 
> active for simplify_history, it doesn't get cleaned up (even if it would 
> be in full history?)
> 
> I think the answer may be to add an extra post-process step on the 
> initial loop to handle this special case. Something like:
> 
>          case REV_TREE_SAME:
>              if (!revs->simplify_history || !relevant_commit(p)) {
>                  /* Even if a merge with an uninteresting
>                   * side branch brought the entire change
>                   * we are interested in, we do not want
>                   * to lose the other branches of this
>                   * merge, so we just keep going.
>                   */
>                  if (ts)
>                      ts->treesame[nth_parent] = 1;
> +               /* But we note it for potential later simplification */
> +               if (!treesame_parent)
> +                    treesame_parent = p;
>                  continue;
>               }
> 
> ...
> 
> After loop:
> 
> +     if (relevant_parents == 0 && revs->simplify_history && 
> treesame_parent) {
> +           treesame_parent->next = NULL;// Repeats code from loop - 
> share somehow?
> +           commit->parents = treesame_parent;
> +           commit->object.flags |= TREESAME;
> +           return;
> +    }
> 
>       /*
>        * TREESAME is straightforward for single-parent commits. For merge

So, without investing nearly enough time to understand what is going
on, I massaged the above diffs into this:

  ---  >8 ---

diff --git a/revision.c b/revision.c
index 4e0e193e57..0ddd2c1e8a 100644
--- a/revision.c
+++ b/revision.c
@@ -605,7 +605,7 @@ static inline int limiting_can_increase_treesame(const struct rev_info *revs)
 
 static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 {
-	struct commit_list **pp, *parent;
+	struct commit_list **pp, *parent, *treesame_parents = NULL;
 	struct treesame_state *ts = NULL;
 	int relevant_change = 0, irrelevant_change = 0;
 	int relevant_parents, nth_parent;
@@ -672,6 +672,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		switch (rev_compare_tree(revs, p, commit)) {
 		case REV_TREE_SAME:
 			if (!revs->simplify_history || !relevant_commit(p)) {
+				struct commit_list *tp;
 				/* Even if a merge with an uninteresting
 				 * side branch brought the entire change
 				 * we are interested in, we do not want
@@ -680,6 +681,13 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 				 */
 				if (ts)
 					ts->treesame[nth_parent] = 1;
+				/* But we note it for potential later
+				 * simplification
+				 */
+				tp = treesame_parents;
+				treesame_parents = xmalloc(sizeof(*treesame_parents));
+				treesame_parents->item = p;
+				treesame_parents->next = tp;
 				continue;
 			}
 			parent->next = NULL;
@@ -716,6 +724,14 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		die("bad tree compare for commit %s", oid_to_hex(&commit->object.oid));
 	}
 
+	if (relevant_parents == 0 && revs->simplify_history &&
+	    treesame_parents) {
+		commit->parents = treesame_parents;
+		commit->object.flags |= TREESAME;
+		return;
+	} else
+		free_commit_list(treesame_parents);
+
 	/*
 	 * TREESAME is straightforward for single-parent commits. For merge
 	 * commits, it is most useful to define it so that "irrelevant"

  ---  >8 ---

FWIW, the test suite passes with the above patch applied.

And here is the small PoC test case to illustrate the issue, which
fails without but succeeds with the above patch.  Eventually it should
be part of 't6012-rev-list-simplify.sh', of course, but I haven't
looked into that yet.


  ---  >8 ---

diff --git a/t/t9999-weird-revision-walk-behaviour.sh b/t/t9999-weird-revision-walk-behaviour.sh
new file mode 100755
index 0000000000..22820f845b
--- /dev/null
+++ b/t/t9999-weird-revision-walk-behaviour.sh
@@ -0,0 +1,38 @@
+#!/bin/sh
+
+test_description='PoC weird revision walk behaviour test'
+
+. ./test-lib.sh
+
+# Create the following history, i.e. where both parents of merge 'M1'
+# are in 'master':
+#
+#   B---M2   master
+#  / \ /
+# A   X
+#  \ / \
+#   C---M1   b2
+#
+# and modify 'file' in commits 'A' and 'B', so one of 'M1's parents
+# ('B') is TREESAME wrt. 'file'.
+test_expect_success 'setup' '
+	test_commit initial file &&	# A
+	test_commit modified file &&	# B
+	git checkout -b b1 master^ &&
+	test_commit other-file &&	# C
+	git checkout -b b2 master &&
+	git merge --no-ff b1 &&		# M1
+	git checkout master &&
+	git merge --no-ff b1		# M2
+'
+
+test_expect_success 'debug' '
+	git log --oneline --graph --all
+'
+
+test_expect_success "\"Merge branch 'b1' into b2\" should not be shown" '
+	git log master..b2 -- file >actual &&
+	test_must_be_empty actual
+'
+
+test_done



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-28 22:06         ` SZEDER Gábor
@ 2018-05-29  6:11           ` Kevin Bracey
  2018-05-29 21:04           ` Jeff King
  1 sibling, 0 replies; 12+ messages in thread
From: Kevin Bracey @ 2018-05-29  6:11 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Jeff King, Junio C Hamano, Git mailing list

On 29/05/2018 01:06, SZEDER Gábor wrote:
>
> So, without investing nearly enough time to understand what is going
> on, I massaged the above diffs into this:
Cool.

> +				treesame_parents = xmalloc(sizeof(*treesame_parents));
There's no need to actually record a list here. This is just for the 
simple history. We are only interested into becoming a non-merge to 1 
treesame parent, so I think we just need to record a pointer to the 
first one we see, just as this would exit immediately for the first 
relevant one. For the full-history case, we already have the full "which 
parents are treesame" recording mechanism just above, but it only kicks 
in for merge commits and only when settings require it. Adding a malloc 
here would be significant machinery overhead.
> FWIW, the test suite passes with the above patch applied.
I doubt there's an existing case like this anywhere in the revision test 
suite :) . And this patch is focused enough that it *should* only be 
changing the behaviour of this very specific case. As such, it does feel 
a little like a kludge, but I think it's fine because it's aligning the 
simple-history analysis with the "analyse relevant parents if any, else 
analyse irrelevant" rule of the full-history.
>
> And here is the small PoC test case to illustrate the issue, which
> fails without but succeeds with the above patch.  Eventually it should
> be part of 't6012-rev-list-simplify.sh', of course, but I haven't
> looked into that yet.
It may be there's enough criss-crossy history to test here to merit 
breaking out into a second test series.

+#   B---M2   master
+#  / \ /
+# A   X
+#  \ / \
+#   C---M1   b2
+#
+# and modify 'file' in commits 'A' and 'B', so one of 'M1's parents
+# ('B') is TREESAME wrt. 'file'.

I guess we'll be wanting test cases for A..B2, B..B2 and C..B2, and some 
where the the base is "some other child of B or C".  "B..B2" is no 
longer a pure set subtraction for simplification as B is UNINTERESTING 
(ie not in the set) but RELEVANT (because you named it as a bottom 
commit), so B..B2 actually still leaves M1 with 2 relevant parents. 
You'd want test cases covering B relevant+C irrelevant and B 
irrelevant+C relevant, which means subtracting them without naming them 
- so name a child of one.

And then we need to think about whether we want it displayed in each of 
the other modes for each of those queries...

Kevin


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-28 22:06         ` SZEDER Gábor
  2018-05-29  6:11           ` Kevin Bracey
@ 2018-05-29 21:04           ` Jeff King
  2018-05-30  8:20             ` Kevin Bracey
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2018-05-29 21:04 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Kevin Bracey, Junio C Hamano, Git mailing list

On Tue, May 29, 2018 at 12:06:51AM +0200, SZEDER Gábor wrote:

> diff --git a/revision.c b/revision.c
> index 4e0e193e57..0ddd2c1e8a 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -605,7 +605,7 @@ static inline int limiting_can_increase_treesame(const struct rev_info *revs)
>  
>  static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  {
> -	struct commit_list **pp, *parent;
> +	struct commit_list **pp, *parent, *treesame_parents = NULL;
>  	struct treesame_state *ts = NULL;
>  	int relevant_change = 0, irrelevant_change = 0;
>  	int relevant_parents, nth_parent;
> @@ -672,6 +672,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  		switch (rev_compare_tree(revs, p, commit)) {
>  		case REV_TREE_SAME:
>  			if (!revs->simplify_history || !relevant_commit(p)) {
> +				struct commit_list *tp;
>  				/* Even if a merge with an uninteresting
>  				 * side branch brought the entire change
>  				 * we are interested in, we do not want
> @@ -680,6 +681,13 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  				 */
>  				if (ts)
>  					ts->treesame[nth_parent] = 1;
> +				/* But we note it for potential later
> +				 * simplification
> +				 */
> +				tp = treesame_parents;
> +				treesame_parents = xmalloc(sizeof(*treesame_parents));
> +				treesame_parents->item = p;
> +				treesame_parents->next = tp;
>  				continue;
>  			}

We hit this "if" if !relevant_commit(p), which I think is what we want.
But we'd also hit it if !revs->simplify_history. Would we want to avoid
doing the simplification in that case?

I guess later we do:

> @@ -716,6 +724,14 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  		die("bad tree compare for commit %s", oid_to_hex(&commit->object.oid));
>  	}
>  
> +	if (relevant_parents == 0 && revs->simplify_history &&
> +	    treesame_parents) {
> +		commit->parents = treesame_parents;
> +		commit->object.flags |= TREESAME;
> +		return;
> +	} else
> +		free_commit_list(treesame_parents);
> +

...which blocks the !simplify_history case from triggering. But then we
could avoid the allocation above in that case, I think (though I agree
with Kevin's later email that we may not need it at all).

Do we even need to do the parent rewriting here? By definition those
parents aren't interesting, and we're TREESAME to whatever is in
treesame_parents. So conceptually it seems like we just need a flag "I
found a treesame parent", but we only convert that into a TREESAME flag
if there are no relevant parents.

I wouldn't be surprised, though, if some code path really cares whether
we've simplified to a single uninteresting parent here, versus
simplifying to a root commit (I admit that the simplification code is
one of the areas of Git I'm least familiar with).

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-29 21:04           ` Jeff King
@ 2018-05-30  8:20             ` Kevin Bracey
  2018-05-31  5:43               ` Jeff King
  0 siblings, 1 reply; 12+ messages in thread
From: Kevin Bracey @ 2018-05-30  8:20 UTC (permalink / raw)
  To: Jeff King, SZEDER Gábor; +Cc: Junio C Hamano, Git mailing list

On 30/05/2018 00:04, Jeff King wrote:
>
> Do we even need to do the parent rewriting here? By definition those
> parents aren't interesting, and we're TREESAME to whatever is in
> treesame_parents. So conceptually it seems like we just need a flag "I
> found a treesame parent", but we only convert that into a TREESAME flag
> if there are no relevant parents.

I think it's necessary to make the rules consistent. To mark the commit 
as TREESAME here when it's not TREESAME to all its parents would be 
inconsistent with the definition of the TREESAME flag used everywhere else:

* Original definition: "A commit is TREESAME if it is treesame to any 
parent"
* d0af66 definition: "A commit is TREESAME if it is treesame to all parents"
* Current 4d8266 definition: "A commit is TREESAME if it is treesame to 
all relevant parents; if no relevant parents then if it is treesame to 
all (irrelevant) parents."

The current problem is that the node is not marked TREESAME, but that's 
consistent with the definition. I think we do have to rewrite the commit 
so it is TREESAME as per the definition. Not flag it as TREESAME in 
violation of it.

It's possible you *could* get away with just flagging, because we never 
recompute the TREESAME flag in simple history mode. But it would be a 
cheat, and it may have other side effects. It means this node would 
remain a special rare case for others to trip up on later.  And I don't 
think it simplifies the scan. Remembering 
"pointer-to-first-treesame-parent" (not a list) for the rewrite is no 
more complex than remembering "bool-there-was-a-treesame-parent".  (A 
bool is what earlier code did - it worked for the original TREESAME 
definition. My patch series dropped that bool without replacement - 
missing this all-irrelevant case).

In the simple history mode, the assumption is we're "simplifying away 
merges up-front" here; we won't (and can't) rewrite parents later in a 
way that needs to recompute TREESAME. In the initial scan when all 
parents are relevant and we matched one, the commit became TREESAME as 
per the new definition immediately because of the rewrite.  This applies 
the equivalent rewrite when no relevant parents, consistent with the 
general concept, and without changing the TREESAME definition.

Kevin



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-30  8:20             ` Kevin Bracey
@ 2018-05-31  5:43               ` Jeff King
  2018-05-31 14:54                 ` Kevin Bracey
  0 siblings, 1 reply; 12+ messages in thread
From: Jeff King @ 2018-05-31  5:43 UTC (permalink / raw)
  To: Kevin Bracey; +Cc: SZEDER Gábor, Junio C Hamano, Git mailing list

On Wed, May 30, 2018 at 11:20:40AM +0300, Kevin Bracey wrote:

> On 30/05/2018 00:04, Jeff King wrote:
> > 
> > Do we even need to do the parent rewriting here? By definition those
> > parents aren't interesting, and we're TREESAME to whatever is in
> > treesame_parents. So conceptually it seems like we just need a flag "I
> > found a treesame parent", but we only convert that into a TREESAME flag
> > if there are no relevant parents.
> 
> I think it's necessary to make the rules consistent. To mark the commit as
> TREESAME here when it's not TREESAME to all its parents would be
> inconsistent with the definition of the TREESAME flag used everywhere else:
> 
> * Original definition: "A commit is TREESAME if it is treesame to any
> parent"
> * d0af66 definition: "A commit is TREESAME if it is treesame to all parents"
> * Current 4d8266 definition: "A commit is TREESAME if it is treesame to all
> relevant parents; if no relevant parents then if it is treesame to all
> (irrelevant) parents."
> 
> The current problem is that the node is not marked TREESAME, but that's
> consistent with the definition. I think we do have to rewrite the commit so
> it is TREESAME as per the definition. Not flag it as TREESAME in violation
> of it.

If there are zero parents (neither relevant nor irrelevant), is it still
TREESAME? I would say in theory yes. So what I was proposing would be to
rewrite the parents to the empty set.

But anyway, I agree with you that the first-treesame-parent strategy is
not any more complex than the boolean, and is probably less likely to
cause unintended headaches later on.

What next here? It looks like we have a proposed solution. Do you want
to try to work up a set of tests based on what you wrote earlier?

I'd also love to hear from Junio as the expert in this area, but I think
he's been a bit busy with maintainer stuff recently. So maybe I should
just be patient. :)

-Peff

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Weird revision walk behaviour
  2018-05-31  5:43               ` Jeff King
@ 2018-05-31 14:54                 ` Kevin Bracey
  0 siblings, 0 replies; 12+ messages in thread
From: Kevin Bracey @ 2018-05-31 14:54 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Junio C Hamano, Git mailing list

On 31/05/2018 08:43, Jeff King wrote:
>
> If there are zero parents (neither relevant nor irrelevant), is it still
> TREESAME? I would say in theory yes.

Not sure - I think roots are such a special case that TREESAME 
effectively doesn't matter. We always test for roots first.
>   So what I was proposing would be to
> rewrite the parents to the empty set.
That feels a bit radical - I believe we need to retain (some) parent 
information for modes that show it (eg the dangling unfilled circles in 
gitk). And making it a root I think could cause other problems with 
making it look like we have a disjoint history. I believe the next 
simplification step may be trying to follow down to the common root.
> What next here? It looks like we have a proposed solution. Do you want
> to try to work up a set of tests based on what you wrote earlier?
I was hoping Gábor would carry on, as he's made a start... I was just 
planning to back-seat drive.
> I'd also love to hear from Junio as the expert in this area, but I think
> he's been a bit busy with maintainer stuff recently. So maybe I should
> just be patient. :)
>
Likewise - I have been quite deep into this, but it was a quite short 
window of investigation a long time ago, and I've not looked at it 
since. Would like input from someone with more active knowledge.

Kevin


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-05-31 15:10 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-05-23 17:10 Weird revision walk behaviour SZEDER Gábor
2018-05-23 17:32 ` Jeff King
2018-05-23 17:35   ` Jeff King
2018-05-24 18:54     ` Kevin Bracey
2018-05-24 20:26     ` Kevin Bracey
2018-05-27 17:37       ` Kevin Bracey
2018-05-28 22:06         ` SZEDER Gábor
2018-05-29  6:11           ` Kevin Bracey
2018-05-29 21:04           ` Jeff King
2018-05-30  8:20             ` Kevin Bracey
2018-05-31  5:43               ` Jeff King
2018-05-31 14:54                 ` Kevin Bracey

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).