git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* History over-simplification
@ 2007-09-23  4:46 Shawn O. Pearce
  2007-09-25 22:42 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Shawn O. Pearce @ 2007-09-23  4:46 UTC (permalink / raw)
  To: Linus Torvalds, Junio C Hamano; +Cc: git

This is an interesting problem for me.  We had a bad merge in the
repository that produced this particular gitk graph:

  http://www.spearce.org/2007/07/ugly-gitk.gif

The project maintainer tried to locate the commits that altered
a given file with `gitk -- path` and found only one commit
(the introduction of the path) but expected to at least find
a modification from a side branch.  What he was looking for
specifically was the merge (or revert) that threw away that
side branch's change as the change wasn't supposed to have been
reverted/lost.

He knew what the commit from the side branch was (the author of
it still had a branch pointing at the commit).  But try finding
the proper child commit in gitk without a path limiter in that
graph I link to above.  If you don't remember what a mess it is,
click through briefly to refresh your memory.  The path limiter is
really required here to boil the graph down to something a human
can reason with.

Adding in --full-history *almost* gave us the data he needed.
It did find the modification on the side branch but it couldn't
point to the child commit which reverted the path.  It turns
out the bad change was an octopus merge commit done incorrectly.
Yes, really.  Don't bother telling me how bad octopus merges are.
This problem happens in a two parent merge as well.

Below is a test which shows the problem.  Without the associated
patch to revision.c we cannot find the ours merge that trashed the
side branch's change.  In this toy repository its no big deal to
find the ours merge and figure out what happened.  In the production
repository we were looking at above its impossible without support
from Git.

I don't really like the patch to revision.c because it winds up
showing trivial merges too.  What I really want is to have the
"--full-history" option include a merge if either of the following
is true:

 a) The resulting path does not match _any_ of the parents.  We
    already handle this case correctly in revision.c and correctly
    show the "evil" merge.

 b) The resulting path matches one of the parents but not one of
    the others.  In such a case the merge should still be output if
    a 3-way read-tree would not have chosen this result by default.

The problem is computing b) from within the revision walker.
You can't compute the merge bases while inside the inner loop of the
revision walker itself, its already busy and the flags are already
in-use on all objects.  Worse commit parents are being rewritten
in ways that may break merge base computation.  Oh, and lets not
even talk about how expensive that merge base computation is if we
were to perform it on every merge.  :-\

Thoughts?


diff --git a/revision.c b/revision.c
index 33d092c..aca62a6 100644
--- a/revision.c
+++ b/revision.c
@@ -365,7 +365,7 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 		}
 		die("bad tree compare for commit %s", sha1_to_hex(commit->object.sha1));
 	}
-	if (tree_changed && !tree_same)
+	if (tree_changed && (!tree_same || !revs->simplify_history))
 		commit->object.flags |= TREECHANGE;
 }
 
diff --git a/t/t6008-rev-list-toosimple.sh b/t/t6008-rev-list-toosimple.sh
new file mode 100755
index 0000000..696b057
--- /dev/null
+++ b/t/t6008-rev-list-toosimple.sh
@@ -0,0 +1,41 @@
+#!/bin/sh
+
+test_description='test git-rev-list history over simplification'
+. ./test-lib.sh
+
+test_expect_success setup '
+	echo BASE >content &&
+	git add content &&
+	git commit -m BASE &&
+	base=$(git rev-parse --verify HEAD) &&
+
+	test_tick &&
+	git checkout -b side &&
+	echo SIDE >content &&
+	git commit -m SIDE content &&
+	side=$(git rev-parse --verify HEAD) &&
+
+	test_tick &&
+	git checkout master &&
+	echo OTHER >other &&
+	git add other &&
+	git commit -m OTHER &&
+
+	test_tick &&
+	git merge -s ours side &&
+	keep=$(git rev-parse --verify HEAD) &&
+	test OTHER = $(git cat-file blob HEAD:other) &&
+	test BASE = $(git cat-file blob HEAD:content)
+'
+
+cat >expect <<EOF
+$keep
+$base
+$side
+EOF
+test_expect_success 'revert merge in history' '
+	git rev-list --full-history HEAD -- content >actual &&
+	git diff expect actual
+'
+
+test_done

-- 
Shawn.

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: History over-simplification
  2007-09-23  4:46 History over-simplification Shawn O. Pearce
@ 2007-09-25 22:42 ` Junio C Hamano
  2007-09-26 15:55   ` Shawn O. Pearce
  0 siblings, 1 reply; 5+ messages in thread
From: Junio C Hamano @ 2007-09-25 22:42 UTC (permalink / raw)
  To: Shawn O. Pearce; +Cc: Linus Torvalds, git

"Shawn O. Pearce" <spearce@spearce.org> writes:

> I don't really like the patch to revision.c because it winds up
> showing trivial merges too.  What I really want is to have the
> "--full-history" option include a merge if either of the following
> is true:
>
>  a) The resulting path does not match _any_ of the parents.  We
>     already handle this case correctly in revision.c and correctly
>     show the "evil" merge.
>
>  b) The resulting path matches one of the parents but not one of
>     the others.  In such a case the merge should still be output if
>     a 3-way read-tree would not have chosen this result by default.

I am not sure (b) is useful in general.  Merging two branches
that fix the same issue but in different ways (think: 'maint'
and 'master' have different infrastructure and a fix initially
made on 'master' was backported to 'maint', and then later
'maint' needed to be merged to 'master' to carry forward other
fixes) is a norm, and in such cases taking the version from the
merged-to branch is almost always what happens.

Also it sounds to me by "if read-tree would not have chosen this
result by default" you mean this feature would not just need to
run merge-base but also recursive merge-base synthesis, and also
recreate the structural merge (aka "rename detection") there as
well.  Even if (b) is useful, it sounds like a very expensive
option, and the current merge-recursive code is structured in
such a way to be easily reused for this purpose.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: History over-simplification
  2007-09-25 22:42 ` Junio C Hamano
@ 2007-09-26 15:55   ` Shawn O. Pearce
  0 siblings, 0 replies; 5+ messages in thread
From: Shawn O. Pearce @ 2007-09-26 15:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, git

Junio C Hamano <gitster@pobox.com> wrote:
> "Shawn O. Pearce" <spearce@spearce.org> writes:
> 
> > I don't really like the patch to revision.c because it winds up
> > showing trivial merges too.  What I really want is to have the
> > "--full-history" option include a merge if either of the following
> > is true:
> >
> >  a) The resulting path does not match _any_ of the parents.  We
> >     already handle this case correctly in revision.c and correctly
> >     show the "evil" merge.
> >
> >  b) The resulting path matches one of the parents but not one of
> >     the others.  In such a case the merge should still be output if
> >     a 3-way read-tree would not have chosen this result by default.
> 
> I am not sure (b) is useful in general.  Merging two branches
> that fix the same issue but in different ways (think: 'maint'
> and 'master' have different infrastructure and a fix initially
> made on 'master' was backported to 'maint', and then later
> 'maint' needed to be merged to 'master' to carry forward other
> fixes) is a norm, and in such cases taking the version from the
> merged-to branch is almost always what happens.

(b) is useful, even in the case you just described.  Try doing the
above backport today.  Run `git log maint -- foo.c` and find the
backported commit B.  Now try to see if that commit is in master in
foo.c with `git log master -- foo.c`.  B won't appear as the merge
of maint into master chose master's revision and maint's history
is pruned away.

Now if the backported change was logically the same but was rewritten
considerably it might take you a while to figure out why a merge
discarded the change B.  The merge commit might actually describe
why in its message, but you can't find that particular merge commit
today, even with --full-history.  Not without my one-line patch.

Unfortunately my one-line patch causes all of the merges that the
path is involved in to be output.  That can be quite a lot of them.
 
> Also it sounds to me by "if read-tree would not have chosen this
> result by default" you mean this feature would not just need to
> run merge-base but also recursive merge-base synthesis, and also
> recreate the structural merge (aka "rename detection") there as
> well.  Even if (b) is useful, it sounds like a very expensive
> option, and the current merge-recursive code is structured in
> such a way to be easily reused for this purpose.

I think you meant here that it is *not* structured in a way to be
easily reused for this purpose.  That code relies heavily on the
index to allow it to create that synthetic merge base.  Making it
callable from within the revision walker would not be a small change.

But as you say, running merge-recursive here would be a very
expensive option.  Which is why I was saying the 3-way read-tree
result as that is cheaper to compute.  However the read-tree result
can be different from the merge-recursive result (think renames)
and yet both are still "trivial" resolutions that the user was
never involved in.

Of course other kinds of trivial merges (where changes from the
two branches are in the same file but far enough part it can be
easily merged as a 3-way file merge) appear to the path limiter
as though they were an evil merge, and such merges are output.

So maybe having --full-history output all merges that affected
that path really is the right choice.

-- 
Shawn.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: History over-simplification
@ 2007-09-27  4:56 linux
  2007-09-27  5:34 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: linux @ 2007-09-27  4:56 UTC (permalink / raw)
  To: git, spearce; +Cc: gitster, linux

>>  b) The resulting path matches one of the parents but not one of
>>     the others.  In such a case the merge should still be output if
>>     a 3-way read-tree would not have chosen this result by default.

> I am not sure (b) is useful in general.  Merging two branches
> that fix the same issue but in different ways (think: 'maint'
> and 'master' have different infrastructure and a fix initially
> made on 'master' was backported to 'maint', and then later
> 'maint' needed to be merged to 'master' to carry forward other
> fixes) is a norm, and in such cases taking the version from the
> merged-to branch is almost always what happens.
>
> Also it sounds to me by "if read-tree would not have chosen this
> result by default" you mean this feature would not just need to
> run merge-base but also recursive merge-base synthesis, and also
> recreate the structural merge (aka "rename detection") there as
> well.  Even if (b) is useful, it sounds like a very expensive
> option, and the current merge-recursive code is structured in
> such a way to be easily reused for this purpose.

These seem like roundabout ways of expressing the requirement.
The desired rule is "no silent changes in the simplified history".
E.g. if the revision history of a particular file is:

      +-C-+
     /     \
A---B       B---D
     \     /
      +-B-+

Then the normal code will notice that there are no changes on the
lower branch, prune the merge entirely, and simplify to A---B---C---D.
This is, however, misleading; the true history is A---B---C---B---D.
A merge must be shown unless it matches a *non-eliminated* ancestor.

The point is that this isn't expressed in terms of what merge-base would
do, but in terms of what the path limiter is in the process of doing.


Now, I haven't dived into the Deep Magic of revision.c to figure out
where to put this into the code, unfortunately.


Another way to say it is that the desired simplified history
is achieved when you have exhausted the following two rules:
- You may delete any revision which has only one ancestor and
  is identical (after path-limiting) to that ancestor.
  (Descendants are implicitly linked to that ancestor.)
- You may delete any ancestor link A---B if there is an
  alternative directed path between A and B.
  (Fast-forward rule.)
Given unlabeled ancestor links, there is a unique fixed point.

The current code is willing to delete a revision that is
identical to *any* ancestor, even deleted ones.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: History over-simplification
  2007-09-27  4:56 linux
@ 2007-09-27  5:34 ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2007-09-27  5:34 UTC (permalink / raw)
  To: linux; +Cc: git, spearce

linux@horizon.com writes:

> These seem like roundabout ways of expressing the requirement.
> The desired rule is "no silent changes in the simplified history".
> E.g. if the revision history of a particular file is:
>
>       +-C-+
>      /     \
> A---B       B---D
>      \     /
>       +-B-+
>
> Then the normal code will notice that there are no changes on the
> lower branch, prune the merge entirely, and simplify to A---B---C---D.
> This is, however, misleading; the true history is A---B---C---B---D.

Do you denote the same content with the same letter in the
above?

If so, wouldn't the simplified history be just A-B-D?

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2007-09-27  5:34 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-09-23  4:46 History over-simplification Shawn O. Pearce
2007-09-25 22:42 ` Junio C Hamano
2007-09-26 15:55   ` Shawn O. Pearce
  -- strict thread matches above, loose matches on Subject: below --
2007-09-27  4:56 linux
2007-09-27  5:34 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).