git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH] revision: --include-diversions adds helpful merges
@ 2020-04-08  1:22 Derrick Stolee via GitGitGadget
  2020-04-08  1:30 ` Junio C Hamano
                   ` (3 more replies)
  0 siblings, 4 replies; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-04-08  1:22 UTC (permalink / raw)
  To: git; +Cc: peff, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The default file history simplification of "git log -- <path>" or
"git rev-list -- <path>" focuses on providing the smallest set of
commits that first contributed a change. The revision walk greatly
restricts the set of walked commits by visiting only the first
TREESAME parent of a merge commit, when one exists. This means
that portions of the commit-graph are not walked, which can be a
performance benefit, but can also "hide" commits that added changes
but were ignored by a merge resolution.

The --full-history option modifies this by walking all commits and
reporting a merge commit as "interesting" if it has _any_ parent
that is not TREESAME. This tends to be an over-representation of
important commits, especially in an environment where most merge
commits are created by pull request completion.

Suppose we have a commit A and we create a commit B on top that
changes our file. When we merge the pull request, we create a merge
commit M. If no one else changed the file in the first-parent
history between M and A, then M will not be TREESAME to its first
parent, but will be TREESAME to B. Thus, the simplified history
will be "B". However, M will appear in the --full-history mode.

However, suppose that a number of topics T1, T2, ..., Tn were
created based on commits C1, C2, ..., Cn between A and M as
follows:

  A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
   \     \     \            \  /      /    /            /
    \     \__.. \            \/ ..__T1    /           Tn
     \           \__..       /\     ..__T2           /
      \_____________________B  \____________________/

If the commits T1, T2, ... Tn did not change the file, then all of
P1 through Pn will be TREESAME to their first parent, but not
TREESAME to their second. This means that all of those merge commits
appear in the --full-history view, with edges that immediately
collapse into the lower history without introducing interesting
single-parent commits.

The --simplify-merges option was introduced to remove these extra
merge commits. By noticing that the rewritten parents are reachable
from their first parents, those edges can be simplified away. Finally,
the commits now look like single-parent commits that are TREESAME to
their "only" parent. Thus, they are removed and this issue does not
cause issues anymore. However, this also ends up removing the commit
M from the history view! Even worse, the --simplify-merges option
requires walking the entire history before returning a single result.

Many Git users are using Git alongside a Git service that provides
code storage alongside a code review tool commonly called "Pull
Requests" or "Merge Requests" against a target branch.  When these
requests are accepted and merged, they typically create a merge
commit whose first parent is the previous branch tip and the second
parent is the tip of the topic branch used for the request. This
presents a valuable order to the parents, but also makes that merge
commit slightly special. Users may want to see not only which
commits changed a file, but which pull requests merged those commits
into their branch. In the previous example, this would mean the
users want to see the merge commit "M" in addition to the single-
parent commit "C".

Users are even more likely to want these merge commits when they
use pull requests to merge into a feature branch before merging that
feature branch into their trunk.

In some sense, users are asking for the "first" merge commit to
bring in the change to their branch. As long as the parent order is
consistent, this can be handled with the following rule:

  Include a merge commit if it is not TREESAME to its first
  parent, but is TREESAME to a later parent.

I call such merge commits "diversions" because they divert the
history walk away from the first-parent history. As such, this
change adds the "--include-diversions" option to rev-list and log.
To test these options, extend the standard test example to include
a merge commit that is not TREESAME to its first parent. It is
surprising that that option was not already in the example, as it
is instructive.

In particular, this extension demonstrates a common issue with file
history simplification. When a user resolves a merge conflict using
"-Xours" or otherwise ignoring one side of the conflict, they create
a TREESAME edge that probably should not be TREESAME. This leads
users to become frustrated and complain that "my change disappeared!"
In my experience, showing them history with --full-history and
--simplify-merges quickly reveals the problematic merge. As mentioned,
this option is expensive to compute. The --include-diversions option
_might_ show the merge commit (usually titled "resolving conflicts")
more quickly. Of course, this depends on the user having the correct
parent order, which is backwards when using "git pull".

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    Add a new history mode
    
    This --include-diversions option could use a better name.
    
    An experienced developer in the Windows OS Engineering Systems team
    pointed out how hard it is to find out when a change was "introduced" in
    the Windows OS repo. Due to their multi-leveled, long-lived branch
    organization, a commit could be part of hundreds of pull requests as the
    branches are merged across the organization.
    
    My default answer was "this is complicated not because of Git, but
    because of how you are branching." I then tried to explain how finding
    the "first merge" to include a commit is incredibly difficult and
    requires performing multiple reachability queries. As I was working it
    out on paper, I realized that was true if we relied only on the
    commit-graph shape to inform our qurey.
    
    If we use the TREESAME information, then suddenly we get a much clearer
    picture! Let's simply pick out those merge commits that "introduced a
    change" because they are TREESAME to a non-first-parent (and not
    TREESAME to the first parent).
    
    My name of "diversions" could probably use some work, but I like the
    basic concept of this option.
    
    I welcome any and all feedback. Thanks!
    
    -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-599%2Fderrickstolee%2Fnew-history-mode-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-599/derrickstolee/new-history-mode-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/599

 Documentation/rev-list-options.txt |  6 ++++
 revision.c                         | 16 ++++++++++-
 revision.h                         |  1 +
 t/t6012-rev-list-simplify.sh       | 44 ++++++++++++++++++++++++++++++
 4 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index bfd02ade991..0c878be94a9 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -342,6 +342,12 @@ Default mode::
 	branches if the end result is the same (i.e. merging branches
 	with the same content)
 
+--include-diversions::
+	Include all commits from the default mode, but also any merge
+	commits that are not TREESAME to the first parent but are
+	TREESAME to a later parent. This mode is helpful for showing
+	the merge commits that "first introduced" a change to a branch.
+
 --full-history::
 	Same as the default mode, but does not prune some history.
 
diff --git a/revision.c b/revision.c
index 8136929e236..915d8febdc4 100644
--- a/revision.c
+++ b/revision.c
@@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 			}
 			parent->next = NULL;
 			commit->parents = parent;
-			commit->object.flags |= TREESAME;
+
+			/*
+			 * A merge commit is a "diversion" if it is not
+			 * TREESAME to its first parent but is TREESAME
+			 * to a later parent. In the simplified history,
+			 * we "divert" the history walk to the later
+			 * parent. These commits are shown when "diversions"
+			 * is enabled, so do not mark the object as
+			 * TREESAME here.
+			 */
+			if (!revs->diversions || !nth_parent)
+				commit->object.flags |= TREESAME;
+
 			return;
 
 		case REV_TREE_NEW:
@@ -2265,6 +2277,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--full-diff")) {
 		revs->diff = 1;
 		revs->full_diff = 1;
+	} else if (!strcmp(arg, "--include-diversions")) {
+		revs->diversions = 1;
 	} else if (!strcmp(arg, "--full-history")) {
 		revs->simplify_history = 0;
 	} else if (!strcmp(arg, "--relative-date")) {
diff --git a/revision.h b/revision.h
index 475f048fb61..f06a73cbcd8 100644
--- a/revision.h
+++ b/revision.h
@@ -129,6 +129,7 @@ struct rev_info {
 			no_walk:2,
 			remove_empty_trees:1,
 			simplify_history:1,
+			diversions:1,
 			topo_order:1,
 			simplify_merges:1,
 			simplify_by_decoration:1,
diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh
index a10f0df02b0..9c91226f737 100755
--- a/t/t6012-rev-list-simplify.sh
+++ b/t/t6012-rev-list-simplify.sh
@@ -154,4 +154,48 @@ test_expect_success '--full-diff is not affected by --parents' '
 	test_cmp expected actual
 '
 
+#
+# Modify the test repo to add a merge whose first parent is not TREESAME
+# but whose second parent is TREESAME
+#
+# A--B----------G--H--I--K--L--N
+#  \  \           /     /     /
+#   \  \         /     /     /
+#    C------E---F     J     /
+#     \  \_/               /
+#      \                  /
+#       M-----------------
+test_expect_success 'expand graph' '
+	git switch -c branchM C &&
+	echo "new data" >file &&
+	git add file &&
+	test_tick &&
+	test_commit M &&
+
+	git checkout master &&
+	git merge -Xtheirs branchM -m "N" &&
+	note N
+'
+
+check_result 'M C A' -- file
+check_result 'N M C A' --include-diversions -- file
+
+check_result 'N M L K J I H F E D C G B A' --full-history --topo-order
+check_result 'N M L K I H G F E D C B J A' --full-history
+check_result 'N M L K I H G F E D C B J A' --full-history --date-order
+check_result 'N M L K I H G F E D B C J A' --full-history --author-date-order
+check_result 'N M K I H E C B A' --full-history -- file
+check_result 'N M K I H E C B A' --full-history --topo-order -- file
+check_result 'N M K I H E C B A' --full-history --date-order -- file
+check_result 'N M K I H E B C A' --full-history --author-date-order -- file
+check_result 'N M I E C B A' --simplify-merges -- file
+check_result 'N M I E C B A' --simplify-merges --topo-order -- file
+check_result 'N M I E C B A' --simplify-merges --date-order -- file
+check_result 'N M I E B C A' --simplify-merges --author-date-order -- file
+check_result 'M C A' --topo-order -- file
+check_result 'M C A' --date-order -- file
+check_result 'M C A' --author-date-order -- file
+check_result 'H' --first-parent -- another-file
+check_result 'H' --first-parent --topo-order -- another-file
+
 test_done

base-commit: 274b9cc25322d9ee79aa8e6d4e86f0ffe5ced925
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08  1:22 [PATCH] revision: --include-diversions adds helpful merges Derrick Stolee via GitGitGadget
@ 2020-04-08  1:30 ` Junio C Hamano
  2020-04-08  1:39   ` Derrick Stolee
  2020-04-08  2:13 ` brian m. carlson
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2020-04-08  1:30 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, peff, me, Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>     This --include-diversions option could use a better name.

True, but I do not think of a better (or for that matter a worse)
one.  

As a new feature, I think this is a reasonable thing to want,
especially it is in line with the push in the past few years to
treat the first parent history specially.

I wonder how this would interact with the ancestry-path option?
That one also, like the simplify-merges option, needs a limited
traversal, and if this new mode can do without a limited traversal
(in other words, the output can be done incrementally from the tip)
and achieve something similar to what these other options wanted to
show, that would be great.

Thanks.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08  1:30 ` Junio C Hamano
@ 2020-04-08  1:39   ` Derrick Stolee
  2020-04-08 15:28     ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2020-04-08  1:39 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, peff, me, Derrick Stolee

On 4/7/2020 9:30 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>     This --include-diversions option could use a better name.
> 
> True, but I do not think of a better (or for that matter a worse)
> one.  
> 
> As a new feature, I think this is a reasonable thing to want,
> especially it is in line with the push in the past few years to
> treat the first parent history specially.
> 
> I wonder how this would interact with the ancestry-path option?
> That one also, like the simplify-merges option, needs a limited
> traversal, and if this new mode can do without a limited traversal
> (in other words, the output can be done incrementally from the tip)
> and achieve something similar to what these other options wanted to
> show, that would be great.

You're right. I briefly considered the --ancestry-path option before
realizing that would get a huge set of commits (for example: every
topic based on the branch after the pull request was merged).

The --include-diversions works incrementally like simplified merges.
Based on the implementation, it would not change the results when
added to a --full-history query. This makes sense: a diversion would
appear in the --full-history results, anyway.

It is worth adding tests for the combination with --ancestry-path
and --simplify-merges, as the --include-diversions option would
add results to those queries.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08  1:22 [PATCH] revision: --include-diversions adds helpful merges Derrick Stolee via GitGitGadget
  2020-04-08  1:30 ` Junio C Hamano
@ 2020-04-08  2:13 ` brian m. carlson
  2020-04-08 18:48 ` Jeff King
  2020-04-09  0:01 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  3 siblings, 0 replies; 23+ messages in thread
From: brian m. carlson @ 2020-04-08  2:13 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, peff, me, Derrick Stolee

[-- Attachment #1: Type: text/plain, Size: 4814 bytes --]

On 2020-04-08 at 01:22:03, Derrick Stolee via GitGitGadget wrote:
> The --simplify-merges option was introduced to remove these extra
> merge commits. By noticing that the rewritten parents are reachable
> from their first parents, those edges can be simplified away. Finally,
> the commits now look like single-parent commits that are TREESAME to
> their "only" parent. Thus, they are removed and this issue does not
> cause issues anymore. However, this also ends up removing the commit
> M from the history view! Even worse, the --simplify-merges option
> requires walking the entire history before returning a single result.

I was not aware --simplify-merges required walking the entire history.
Now I know why my alias using it performs so poorly on $HUGE_REPOSITORY
with thousands of extra backmerges at $DAYJOB.

Thanks for teaching me something.

> Many Git users are using Git alongside a Git service that provides
> code storage alongside a code review tool commonly called "Pull
> Requests" or "Merge Requests" against a target branch.  When these
> requests are accepted and merged, they typically create a merge
> commit whose first parent is the previous branch tip and the second
> parent is the tip of the topic branch used for the request. This
> presents a valuable order to the parents, but also makes that merge
> commit slightly special. Users may want to see not only which
> commits changed a file, but which pull requests merged those commits
> into their branch. In the previous example, this would mean the
> users want to see the merge commit "M" in addition to the single-
> parent commit "C".

I should not hesitate to point out that this history is also true of the
Git Project's repository, although of course the merges may be of less
interest here.

> In some sense, users are asking for the "first" merge commit to
> bring in the change to their branch. As long as the parent order is
> consistent, this can be handled with the following rule:
> 
>   Include a merge commit if it is not TREESAME to its first
>   parent, but is TREESAME to a later parent.
> 
> I call such merge commits "diversions" because they divert the
> history walk away from the first-parent history. As such, this
> change adds the "--include-diversions" option to rev-list and log.
> To test these options, extend the standard test example to include
> a merge commit that is not TREESAME to its first parent. It is
> surprising that that option was not already in the example, as it
> is instructive.
> 
> In particular, this extension demonstrates a common issue with file
> history simplification. When a user resolves a merge conflict using
> "-Xours" or otherwise ignoring one side of the conflict, they create
> a TREESAME edge that probably should not be TREESAME. This leads
> users to become frustrated and complain that "my change disappeared!"
> In my experience, showing them history with --full-history and
> --simplify-merges quickly reveals the problematic merge. As mentioned,
> this option is expensive to compute. The --include-diversions option
> _might_ show the merge commit (usually titled "resolving conflicts")
> more quickly. Of course, this depends on the user having the correct
> parent order, which is backwards when using "git pull".

I can't comment on the contents of the patch, since I'm really not at
all familiar with the revision machinery, but I do think this change is a
good idea.  I see this as a very common use case, and I think this
commit message explains the rationale well.

> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>     Add a new history mode
> 
>     This --include-diversions option could use a better name.

As we all know, I'm terrible at naming things, so I have no suggestions
here.  I'm happy with the name as it stands, but am of course open to
other ideas.

> diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
> index bfd02ade991..0c878be94a9 100644
> --- a/Documentation/rev-list-options.txt
> +++ b/Documentation/rev-list-options.txt
> @@ -342,6 +342,12 @@ Default mode::
>  	branches if the end result is the same (i.e. merging branches
>  	with the same content)
> 
> +--include-diversions::
> +	Include all commits from the default mode, but also any merge
> +	commits that are not TREESAME to the first parent but are
> +	TREESAME to a later parent. This mode is helpful for showing
> +	the merge commits that "first introduced" a change to a branch.

I wasn't sure if this use of "TREESAME" would be confusing, but it looks
like we already use it extensively throughout the documentation, so it's
probably fine.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 263 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08  1:39   ` Derrick Stolee
@ 2020-04-08 15:28     ` Derrick Stolee
  2020-04-08 19:46       ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2020-04-08 15:28 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, peff, me, Derrick Stolee, brian m. carlson

On 4/7/2020 9:39 PM, Derrick Stolee wrote:
> On 4/7/2020 9:30 PM, Junio C Hamano wrote:
>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>>     This --include-diversions option could use a better name.
>>
>> True, but I do not think of a better (or for that matter a worse)
>> one.  

Here are some alternative names:

	--audit-merges
	--audit-trunk
	--first-merges
	--subtle-merges
	--more-merges

The "audit" name implies some of the intent: we are trying to
audit which merge commits introduced these changes. The --audit-trunk
implies we are using a trunk-based workflow where parent order is
critical. However, "trunk" may be confusing when there are multiple
long-lived branches.

A "first merge" is confusing when we see a sequence of multiple
diversion merges (I include a test with this exact situation in
my next version.)

"subtle" is a bit wishy-washy.

"--more-merges" is not very specific. The way we are adding
merges to the final result may not be the only way we want to
add "more" merges in the future.

So, I think "--audit-merges" is the best of these alternatives.
I'd be happy to be overruled with a different option. Hopefully,
these options inspire better ideas from the community.

>> As a new feature, I think this is a reasonable thing to want,
>> especially it is in line with the push in the past few years to
>> treat the first parent history specially.
>>
>> I wonder how this would interact with the ancestry-path option?
>> That one also, like the simplify-merges option, needs a limited
>> traversal, and if this new mode can do without a limited traversal
>> (in other words, the output can be done incrementally from the tip)
>> and achieve something similar to what these other options wanted to
>> show, that would be great.
> 
> You're right. I briefly considered the --ancestry-path option before
> realizing that would get a huge set of commits (for example: every
> topic based on the branch after the pull request was merged).
> 
> The --include-diversions works incrementally like simplified merges.
> Based on the implementation, it would not change the results when
> added to a --full-history query. This makes sense: a diversion would
> appear in the --full-history results, anyway.
> 
> It is worth adding tests for the combination with --ancestry-path
> and --simplify-merges, as the --include-diversions option would
> add results to those queries.

My gitgitgadget PR [1] is updated with tests and some new logic to
handle the new option along with --simplify-merges. The situation was
a bit subtle, so my next version will include a significant update to
the rev-list documentation under the "History Simplification" mode.

I'll give things some time to calm on the name of the option before
sending a v2.

My v2 also includes adding a new object flag "DIVERSION" to track
these commits from the TREESAME calculation through the simplify-merges
logic. When I was adding a new flag, I realized that I already
messed up the 32-bit alignment of "struct object" when adding the
TOPO_ORDER flags. The parsed, type, and flags bitfields add up to
33 bits!

A solution would include pulling the TOPO_ORDER_* flags to be bits
22 and 23 instead of 26 and 27, but that would collide with what is
happening in builtin/show-branch.c. But then I saw the following
comment in builtin/show-branch.c:

/*
 * TODO: convert this use of commit->object.flags to commit-slab
 * instead to store a pointer to ref name directly. Then use the same
 * UNINTERESTING definition from revision.h here.
 */

Is anyone interested in tackling this problem? I don't see any
test failures when I swap the TOPO_ORDER_ flag locations, but
that might just mean that show-branch isn't tested enough.

Thanks,
-Stolee

[1] https://github.com/gitgitgadget/git/pull/599

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08  1:22 [PATCH] revision: --include-diversions adds helpful merges Derrick Stolee via GitGitGadget
  2020-04-08  1:30 ` Junio C Hamano
  2020-04-08  2:13 ` brian m. carlson
@ 2020-04-08 18:48 ` Jeff King
  2020-04-09  0:01 ` [PATCH v2] " Derrick Stolee via GitGitGadget
  3 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2020-04-08 18:48 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget; +Cc: git, me, Derrick Stolee

On Wed, Apr 08, 2020 at 01:22:03AM +0000, Derrick Stolee via GitGitGadget wrote:

> The default file history simplification of "git log -- <path>" or
> "git rev-list -- <path>" focuses on providing the smallest set of
> commits that first contributed a change. The revision walk greatly
> restricts the set of walked commits by visiting only the first
> TREESAME parent of a merge commit, when one exists. This means
> that portions of the commit-graph are not walked, which can be a
> performance benefit, but can also "hide" commits that added changes
> but were ignored by a merge resolution.
> [...]

Thanks for a really great description of the problem.

Playing around with the patch, I found one curiosity. Try this:

  git log --graph --oneline origin -- GIT-VERSION-GEN >old
  git log --graph --oneline --include-diversions \
                            origin -- GIT-VERSION-GEN >new
  diff -u old new

The first hunk has:

  @@ -70,6 +70,7 @@
   * 20769079d2 Git 2.12-rc2
   * 5588dbffbd Git 2.12-rc1
   * 6e3a7b3398 Git 2.12-rc0
  +* 0a45050a14 Merge branch 'rj/git-version-gen-do-not-force-abbrev'
   * a7659747c2 GIT-VERSION-GEN: do not force abbreviation length used by 'describe'
   * 8d7a455ed5 Start post 2.11 cycle
   * 454cb6bd52 Git 2.11

which makes sense. That merge brought in a7659747c2, and the other side
hadn't touched it. But I can't tell from the output how the two are
related. Nor can I just add "-p" to the invocation; after we've
simplified, it has only a single parent, but it's TREESAME to that
parent. So it has no diff.

I actually think the most descriptive output here would be something
like:

  * 6e3a7b3398 Git 2.12-rc0
  * 0a45050a14 Merge branch 'rj/git-version-gen-do-not-force-abbrev'
  |\
  | * a7659747c2 GIT-VERSION-GEN: do not force abbreviation length used by 'describe'
  |/
  * Start post 2.11 cycle

I.e., leaving both parents intact for a "diversion" merge. But maybe
that would have secondary effects in other places.

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08 15:28     ` Derrick Stolee
@ 2020-04-08 19:46       ` Junio C Hamano
  2020-04-08 20:05         ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2020-04-08 19:46 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Derrick Stolee via GitGitGadget, git, peff, me, Derrick Stolee,
	brian m. carlson

Derrick Stolee <stolee@gmail.com> writes:

> On 4/7/2020 9:39 PM, Derrick Stolee wrote:
>> On 4/7/2020 9:30 PM, Junio C Hamano wrote:
>>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>>     This --include-diversions option could use a better name.
>>>
>>> True, but I do not think of a better (or for that matter a worse)
>>> one.  
>
> Here are some alternative names:
>
> 	--audit-merges
> 	--audit-trunk
> 	--first-merges
> 	--subtle-merges
> 	--more-merges
>
> The "audit" name implies some of the intent: we are trying to
> audit which merge commits introduced these changes. The --audit-trunk
> implies we are using a trunk-based workflow where parent order is
> critical. However, "trunk" may be confusing when there are multiple
> long-lived branches.

Our features written with the intent to be useful for one purpose
often end up being used for purposes other than what the feature was
originally written for (the "--pickaxe" has always been a bitter
example of this for me).

For that reason, I am a bit hesitant to endorse "audit" exactly
because of the implication of "intent".

We usually try to give a single simplest history that explains how
the current content in the path came to be.  For that, commit M in
the illustration in your original message does not really help when
we ignore which parent chain owns the history, but in practice, many
workflows treat the first parent chain as something special, so we
want to show, not just the individual changes that matter, where the
changes are introduced in the first-parent chain.

I wonder if there is a simple-enough phrase to convey what the
latter half of above sentence says.  "include" and "keep" are both
good verbs---normally we discard these merges, because they do not
contribute at the level of individual changes, but with the option,
we "include" or "keep" these merges in the output.  It's not like
we keep _all_ the merges, but selected merges only.  How do we
decide which merges to keep?

I guess your "--first-merges" came from such a line of thought, and
is the closest among the five to what I have in mind, but it drops
too many words and loses too much meaning.  

"--keep-first-parent-merges", perhaps?





^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08 19:46       ` Junio C Hamano
@ 2020-04-08 20:05         ` Jeff King
  2020-04-08 20:22           ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2020-04-08 20:05 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Derrick Stolee via GitGitGadget, git, me,
	Derrick Stolee, brian m. carlson

On Wed, Apr 08, 2020 at 12:46:41PM -0700, Junio C Hamano wrote:

> Our features written with the intent to be useful for one purpose
> often end up being used for purposes other than what the feature was
> originally written for (the "--pickaxe" has always been a bitter
> example of this for me).
> 
> For that reason, I am a bit hesitant to endorse "audit" exactly
> because of the implication of "intent".

Yeah, I agree with this.

> I wonder if there is a simple-enough phrase to convey what the
> latter half of above sentence says.  "include" and "keep" are both
> good verbs---normally we discard these merges, because they do not
> contribute at the level of individual changes, but with the option,
> we "include" or "keep" these merges in the output.  It's not like
> we keep _all_ the merges, but selected merges only.  How do we
> decide which merges to keep?
> 
> I guess your "--first-merges" came from such a line of thought, and
> is the closest among the five to what I have in mind, but it drops
> too many words and loses too much meaning.  
> 
> "--keep-first-parent-merges", perhaps?

FWIW, this name left me more confused, because "first-parent merges"
isn't an already-defined term I knew. And it seems like all merges have
a first parent. Having read the patch description, I guess it's "a merge
which isn't TREESAME to its first-parent".

I can't think of a more succinct way to name that, though. And possibly
if we gave that definition in the documentation, that would be enough.
The name doesn't have to be a complete description; it only has to make
sense once you know what you're trying to do (and be memorable enough).

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08 20:05         ` Jeff King
@ 2020-04-08 20:22           ` Derrick Stolee
  2020-04-08 21:35             ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2020-04-08 20:22 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: Derrick Stolee via GitGitGadget, git, me, Derrick Stolee,
	brian m. carlson

On 4/8/2020 4:05 PM, Jeff King wrote:
> On Wed, Apr 08, 2020 at 12:46:41PM -0700, Junio C Hamano wrote:
> 
>> Our features written with the intent to be useful for one purpose
>> often end up being used for purposes other than what the feature was
>> originally written for (the "--pickaxe" has always been a bitter
>> example of this for me).
>>
>> For that reason, I am a bit hesitant to endorse "audit" exactly
>> because of the implication of "intent".
> 
> Yeah, I agree with this.
> 
>> I wonder if there is a simple-enough phrase to convey what the
>> latter half of above sentence says.  "include" and "keep" are both
>> good verbs---normally we discard these merges, because they do not
>> contribute at the level of individual changes, but with the option,
>> we "include" or "keep" these merges in the output.  It's not like
>> we keep _all_ the merges, but selected merges only.  How do we
>> decide which merges to keep?
>>
>> I guess your "--first-merges" came from such a line of thought, and
>> is the closest among the five to what I have in mind, but it drops
>> too many words and loses too much meaning.  
>>
>> "--keep-first-parent-merges", perhaps?
> 
> FWIW, this name left me more confused, because "first-parent merges"
> isn't an already-defined term I knew. And it seems like all merges have
> a first parent. Having read the patch description, I guess it's "a merge
> which isn't TREESAME to its first-parent".
> 
> I can't think of a more succinct way to name that, though. And possibly
> if we gave that definition in the documentation, that would be enough.
> The name doesn't have to be a complete description; it only has to make
> sense once you know what you're trying to do (and be memorable enough).

Then I suppose we should focus on naming merge commits with this property:

  A merge commit that is not TREESAME to its first parent (but is TREESAME
  to a later parent).

The part in parentheses may be optional, because a merge commit that is
not TREESAME to any parent will be included by every history mode.

In my latest attempt at documentation, I called these merges "diverters"
yet still used "--include-diversions". Here are a few other words that we
could use:

 * diverters or diversions
 * redirects
 * switches (think railroad switch). Synonym: exchange
 * detours

The "switch" or "exchange" words are probably bad because they have
noun _and_ verb forms.

Or we could look again at the history results as a whole to find
inspiration for the command-line option:

 * --merge-trail

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08 20:22           ` Derrick Stolee
@ 2020-04-08 21:35             ` Junio C Hamano
  2020-04-08 23:59               ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2020-04-08 21:35 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, me,
	Derrick Stolee, brian m. carlson

Derrick Stolee <stolee@gmail.com> writes:

> Then I suppose we should focus on naming merge commits with this property:
>
>   A merge commit that is not TREESAME to its first parent (but is TREESAME
>   to a later parent).
>
> The part in parentheses may be optional, because a merge commit that is
> not TREESAME to any parent will be included by every history mode.

A merge that is TREESAME to its first parent does not introduce
anything new to the mainline (as far as the paths that match the
pathspec are concerned).  We are trying to find names to call merges
that are not those no-op merges.  Hmph...

> In my latest attempt at documentation, I called these merges "diverters"
> yet still used "--include-diversions". Here are a few other words that we
> could use:
>
>  * diverters or diversions
>  * redirects
>  * switches (think railroad switch). Synonym: exchange
>  * detours

...none of the above tells me that they are not no-op (in other
words, they do something meaningful), so I must be coming from
a direction different from you are.  What redirects from what other
thing, for example?



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08 21:35             ` Junio C Hamano
@ 2020-04-08 23:59               ` Derrick Stolee
  2020-04-09  0:08                 ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2020-04-08 23:59 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, me,
	Derrick Stolee, brian m. carlson

On 4/8/2020 5:35 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>> Then I suppose we should focus on naming merge commits with this property:
>>
>>   A merge commit that is not TREESAME to its first parent (but is TREESAME
>>   to a later parent).
>>
>> The part in parentheses may be optional, because a merge commit that is
>> not TREESAME to any parent will be included by every history mode.
> 
> A merge that is TREESAME to its first parent does not introduce
> anything new to the mainline (as far as the paths that match the
> pathspec are concerned).  We are trying to find names to call merges
> that are not those no-op merges.  Hmph...

There are three situations for a merge commit:

1. TREESAME to _all_ parents. These are not included.
2. not TREESAME to _all_ parents. These are already included.
3. TREESAME to some, but not TREESAME to others.

The third mode is the one that default mode will drop, but --full-history
will include. The new mode will include some of these (the ones that are
NOT TREESAME to their first parent).

>> In my latest attempt at documentation, I called these merges "diverters"
>> yet still used "--include-diversions". Here are a few other words that we
>> could use:
>>
>>  * diverters or diversions
>>  * redirects
>>  * switches (think railroad switch). Synonym: exchange
>>  * detours
> 
> ...none of the above tells me that they are not no-op (in other
> words, they do something meaningful), so I must be coming from
> a direction different from you are.  What redirects from what other
> thing, for example?

The merges do something meaningful: they "merge in" a "real" change.

I'll just submit my v2 as-is, which includes a significant change to
the documentation that should make things more clear.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2] revision: --include-diversions adds helpful merges
  2020-04-08  1:22 [PATCH] revision: --include-diversions adds helpful merges Derrick Stolee via GitGitGadget
                   ` (2 preceding siblings ...)
  2020-04-08 18:48 ` Jeff King
@ 2020-04-09  0:01 ` Derrick Stolee via GitGitGadget
  2020-04-10 12:19   ` [PATCH v3] revision: --show-pulls " Derrick Stolee via GitGitGadget
  3 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-04-09  0:01 UTC (permalink / raw)
  To: git; +Cc: peff, me, Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The default file history simplification of "git log -- <path>" or
"git rev-list -- <path>" focuses on providing the smallest set of
commits that first contributed a change. The revision walk greatly
restricts the set of walked commits by visiting only the first
TREESAME parent of a merge commit, when one exists. This means
that portions of the commit-graph are not walked, which can be a
performance benefit, but can also "hide" commits that added changes
but were ignored by a merge resolution.

The --full-history option modifies this by walking all commits and
reporting a merge commit as "interesting" if it has _any_ parent
that is not TREESAME. This tends to be an over-representation of
important commits, especially in an environment where most merge
commits are created by pull request completion.

Suppose we have a commit A and we create a commit B on top that
changes our file. When we merge the pull request, we create a merge
commit M. If no one else changed the file in the first-parent
history between M and A, then M will not be TREESAME to its first
parent, but will be TREESAME to B. Thus, the simplified history
will be "B". However, M will appear in the --full-history mode.

However, suppose that a number of topics T1, T2, ..., Tn were
created based on commits C1, C2, ..., Cn between A and M as
follows:

  A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
   \     \     \            \  /      /    /            /
    \     \__.. \            \/ ..__T1    /           Tn
     \           \__..       /\     ..__T2           /
      \_____________________B  \____________________/

If the commits T1, T2, ... Tn did not change the file, then all of
P1 through Pn will be TREESAME to their first parent, but not
TREESAME to their second. This means that all of those merge commits
appear in the --full-history view, with edges that immediately
collapse into the lower history without introducing interesting
single-parent commits.

The --simplify-merges option was introduced to remove these extra
merge commits. By noticing that the rewritten parents are reachable
from their first parents, those edges can be simplified away. Finally,
the commits now look like single-parent commits that are TREESAME to
their "only" parent. Thus, they are removed and this issue does not
cause issues anymore. However, this also ends up removing the commit
M from the history view! Even worse, the --simplify-merges option
requires walking the entire history before returning a single result.

Many Git users are using Git alongside a Git service that provides
code storage alongside a code review tool commonly called "Pull
Requests" or "Merge Requests" against a target branch.  When these
requests are accepted and merged, they typically create a merge
commit whose first parent is the previous branch tip and the second
parent is the tip of the topic branch used for the request. This
presents a valuable order to the parents, but also makes that merge
commit slightly special. Users may want to see not only which
commits changed a file, but which pull requests merged those commits
into their branch. In the previous example, this would mean the
users want to see the merge commit "M" in addition to the single-
parent commit "C".

Users are even more likely to want these merge commits when they
use pull requests to merge into a feature branch before merging that
feature branch into their trunk.

In some sense, users are asking for the "first" merge commit to
bring in the change to their branch. As long as the parent order is
consistent, this can be handled with the following rule:

  Include a merge commit if it is not TREESAME to its first
  parent, but is TREESAME to a later parent.

I call such merge commits "diversions" because they divert the
history walk away from the first-parent history. As such, this
change adds the "--include-diversions" option to rev-list and log.
To test these options, extend the standard test example to include
a merge commit that is not TREESAME to its first parent. It is
surprising that that option was not already in the example, as it
is instructive.

In particular, this extension demonstrates a common issue with file
history simplification. When a user resolves a merge conflict using
"-Xours" or otherwise ignoring one side of the conflict, they create
a TREESAME edge that probably should not be TREESAME. This leads
users to become frustrated and complain that "my change disappeared!"
In my experience, showing them history with --full-history and
--simplify-merges quickly reveals the problematic merge. As mentioned,
this option is expensive to compute. The --include-diversions option
_might_ show the merge commit (usually titled "resolving conflicts")
more quickly. Of course, this depends on the user having the correct
parent order, which is backwards when using "git pull".

There are some special considerations when combining the
--include-diversions option with --simplify-merges. This requires
adding a new DIVERSION object flag to store the information from
the initial TREESAME comparisons. This helps avoid dropping those
commits in later filters. This is covered by a test, including how
the parents can be simplified. Since "struct object" has already
ruined its 32-bit alignment by using 33 bits across parsed, type,
and flags member, let's not make it worse. DIVERSION is used in
revision.c with the same value (1u<<15) as REACHABLE in
commit-graph.c. The REACHABLE flag is only used when writing a
commit-graph file, and a revision walk using --include-diversions
does not happen in the same process. Care must be taken in the
future to ensure this remains the case.

Update Documentation/rev-list-options.txt with significant details
around this option. This requires updating the example in the
History Simplification section to demonstrate some of the problems
with TREESAME second parents.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    Add a new history mode
    
    This --include-diversions option could use a better name.
    
    An experienced developer in the Windows OS Engineering Systems team
    pointed out how hard it is to find out when a change was "introduced" in
    the Windows OS repo. Due to their multi-leveled, long-lived branch
    organization, a commit could be part of hundreds of pull requests as the
    branches are merged across the organization.
    
    My default answer was "this is complicated not because of Git, but
    because of how you are branching." I then tried to explain how finding
    the "first merge" to include a commit is incredibly difficult and
    requires performing multiple reachability queries. As I was working it
    out on paper, I realized that was true if we relied only on the
    commit-graph shape to inform our qurey.
    
    If we use the TREESAME information, then suddenly we get a much clearer
    picture! Let's simply pick out those merge commits that "introduced a
    change" because they are TREESAME to a non-first-parent (and not
    TREESAME to the first parent).
    
    My name of "diversions" could probably use some work, but I like the
    basic concept of this option.
    
    I welcome any and all feedback. Thanks!
    
    UPDATES in v2:
    
     * The functionality is a bit more complicated to work with
       --simplify-merges.
       
       
     * The documentation is significantly expanded with an example that
       highlights the shortcomings of the default history simplification.
       
       
     * The documented example is used in the test script instead of simply
       extending the previous example.
       
       
    
    -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-599%2Fderrickstolee%2Fnew-history-mode-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-599/derrickstolee/new-history-mode-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/599

Range-diff vs v1:

 1:  8bf93688392 ! 1:  146c443e7f1 revision: --include-diversions adds helpful merges
     @@ Commit message
          more quickly. Of course, this depends on the user having the correct
          parent order, which is backwards when using "git pull".
      
     +    There are some special considerations when combining the
     +    --include-diversions option with --simplify-merges. This requires
     +    adding a new DIVERSION object flag to store the information from
     +    the initial TREESAME comparisons. This helps avoid dropping those
     +    commits in later filters. This is covered by a test, including how
     +    the parents can be simplified. Since "struct object" has already
     +    ruined its 32-bit alignment by using 33 bits across parsed, type,
     +    and flags member, let's not make it worse. DIVERSION is used in
     +    revision.c with the same value (1u<<15) as REACHABLE in
     +    commit-graph.c. The REACHABLE flag is only used when writing a
     +    commit-graph file, and a revision walk using --include-diversions
     +    does not happen in the same process. Care must be taken in the
     +    future to ensure this remains the case.
     +
     +    Update Documentation/rev-list-options.txt with significant details
     +    around this option. This requires updating the example in the
     +    History Simplification section to demonstrate some of the problems
     +    with TREESAME second parents.
     +
          Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
      
       ## Documentation/rev-list-options.txt ##
     @@ Documentation/rev-list-options.txt: Default mode::
       --full-history::
       	Same as the default mode, but does not prune some history.
       
     +@@ Documentation/rev-list-options.txt: Note the major differences in `N`, `P`, and `Q` over `--full-history`:
     +   parent and is TREESAME.
     + --
     + 
     +-Finally, there is a fifth simplification mode available:
     ++There is another simplification mode available:
     + 
     + --ancestry-path::
     + 	Limit the displayed commits to those directly on the ancestry
     +@@ Documentation/rev-list-options.txt: option does. Applied to the 'D..M' range, it results in:
     + 				L--M
     + -----------------------------------------------------------------------
     + 
     ++Before discussing another option, `--include-diversions`, we need to
     ++create a new example history.
     +++
     ++A common problem users face when looking at simplified history is that a
     ++commit they know changed a file somehow does not appear in the file's
     ++simplified history. Let's demonstrate a new example and show how options
     ++such as `--full-history` and `--simplify-merges` works in that case:
     +++
     ++-----------------------------------------------------------------------
     ++	  .-A---M-----C--N---O---P
     ++	 /     / \  \  \/   /   /
     ++	I     B   \  R-'`-Z'   /
     ++	 \   /     \/         /
     ++	  \ /      /\        /
     ++	   `---X--'  `---Y--'
     ++-----------------------------------------------------------------------
     +++
     ++For this example, suppose `I` created `file.txt` which was modified by
     ++`A`, `B`, and `X` in different ways. The single-parent commits `C`, `Z`,
     ++and `Y` do not change `file.txt`. The merge commit `M` was created by
     ++resolving the merge conflict to include both changes from `A` and `B`
     ++and hence is not TREESAME to either. The merge commit `R`, however, was
     ++created by ignoring the contents of `file.txt` at `M` and taking only
     ++the contents of `file.txt` at `X`. Hence, `R` is TREESAME to `X` but not
     ++`M`. Finally, the natural merge resolution to create `N` is to take the
     ++contents of `file.txt` at `R`, so `N` is TREESAME to `R` but not `C`.
     ++The merge commits `O` and `P` are TREESAME to their first parents, but
     ++not to their second parents, `Z` and `Y` respectively.
     +++
     ++When using the default mode, `N` and `R` both have a TREESAME parent, so
     ++those edges are walked and the others are ignored. The resulting history
     ++graph is:
     +++
     ++-----------------------------------------------------------------------
     ++	I---X
     ++-----------------------------------------------------------------------
     +++
     ++When using `--full-history`, Git walks every edge. This will discover
     ++the commits `A` and `B` and the merge `M`, but also will reveal the
     ++merge commits `O` and `P`. With parent rewriting, the resulting graph is:
     +++
     ++-----------------------------------------------------------------------
     ++	  .-A---M--------N---O---P
     ++	 /     / \  \  \/   /   /
     ++	I     B   \  R-'`--'   /
     ++	 \   /     \/         /
     ++	  \ /      /\        /
     ++	   `---X--'  `------'
     ++-----------------------------------------------------------------------
     +++
     ++Here, the merge commits `O` and `P` contribute extra noise, as they did
     ++not actually contribute a change to `file.txt`. They only merged a topic
     ++that was based on an older version of `file.txt`. This is a common
     ++issue in repositories using a workflow where many contributors work in
     ++parallel and merge their topic branches along a single trunk: manu
     ++unrelated merges appear in the `--full-history` results.
     +++
     ++When using the `--simplify-merges` option, the commits `O` and `P`
     ++disappear from the results. This is because the rewritten second parents
     ++of `O` and `P` are reachable from their first parents. Those edges are
     ++removed and then the commits look like single-parent commits that are
     ++TREESAME to their parent. This also happens to the commit `N`, resulting
     ++in a history view as follows:
     +++
     ++-----------------------------------------------------------------------
     ++	  .-A---M--.
     ++	 /     /    \
     ++	I     B      R
     ++	 \   /      /
     ++	  \ /      /
     ++	   `---X--'
     ++-----------------------------------------------------------------------
     +++
     ++In this view, we see all of the important single-parent changes from
     ++`A`, `B`, and `X`. We also see the carefully-resolved merge `M` and the
     ++not-so-carefully-resolved merge `R`. This is usually enough information
     ++to determine why the commits `A` and `B` "disappeared" from history in
     ++the default view. However, there are a few issues with this approach.
     +++
     ++The first issue is performance. Unlike any previous option, the
     ++`--simplify-merges` option requires walking the entire commit history
     ++before returning a single result. This can make the option difficult to
     ++use for very large repositories.
     +++
     ++The second issue is one of auditing. When many contributors are working
     ++on the same repository, it is important which merge commits introduced
     ++a change into an important branch. The problematic merge `R` above is
     ++not likely to be the merge commit that was used to merge into an
     ++important branch. Instead, the merge `N` was used to merge `R` and `X`
     ++into the important branch. This commit may have information about why
     ++the change `X` came to override the changes from `A` and `B` in its
     ++commit message.
     +++
     ++The `--include-diversions` option helps with both of these issues. A
     ++merge commit is considered a "diverter" if it is not TREESAME to its
     ++first parent but is TREESAME to a later parent. These merges "divert"
     ++the history walk to a second parent instead of continuing along the
     ++first-parent history as expected. When using `--include-diversions`
     ++on this example (and no other options) the resulting graph is:
     +++
     ++-----------------------------------------------------------------------
     ++	I---X---R---N
     ++-----------------------------------------------------------------------
     +++
     ++Here, the merge commits `R` and `N` are included because they diverted
     ++the walk away from their first-parent history. They are the reason the
     ++commits `A` and `B` do not appear in the history.
     +++
     ++When `--include-diversions` is paired with `--simplify-merges`, the
     ++graph includes all of the necessary information:
     +++
     ++-----------------------------------------------------------------------
     ++	  .-A---M--.   N
     ++	 /     /    \ /
     ++	I     B      R
     ++	 \   /      /
     ++	  \ /      /
     ++	   `---X--'
     ++-----------------------------------------------------------------------
     +++
     ++Notice that since `M` is reachable from `R`, the edge from `N` to `M`
     ++was simplified away. However, `N` still appears in the history as an
     ++important commit because it would divert a simplified history walk.
     ++
     + The `--simplify-by-decoration` option allows you to view only the
     + big picture of the topology of the history, by omitting commits
     + that are not referenced by tags.  Commits are marked as !TREESAME
     +
     + ## object.h ##
     +@@ object.h: struct object_array {
     + 
     + /*
     +  * object flag allocation:
     +- * revision.h:               0---------10                              25----28
     ++ * revision.h:               0---------10         15                   25----28
     +  * fetch-pack.c:             01
     +  * negotiator/default.c:       2--5
     +  * walker.c:                 0-2
      
       ## revision.c ##
      @@ revision.c: static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
     @@ revision.c: static void try_to_simplify_commit(struct rev_info *revs, struct com
       			return;
       
       		case REV_TREE_NEW:
     +@@ revision.c: static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
     + 				relevant_change = 1;
     + 			else
     + 				irrelevant_change = 1;
     ++
     ++			if (!nth_parent)
     ++				commit->object.flags |= DIVERSION;
     ++
     + 			continue;
     + 		}
     + 		die("bad tree compare for commit %s", oid_to_hex(&commit->object.oid));
      @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
       	} else if (!strcmp(arg, "--full-diff")) {
       		revs->diff = 1;
     @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, cons
       	} else if (!strcmp(arg, "--full-history")) {
       		revs->simplify_history = 0;
       	} else if (!strcmp(arg, "--relative-date")) {
     +@@ revision.c: static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c
     + 	if (!cnt ||
     + 	    (commit->object.flags & UNINTERESTING) ||
     + 	    !(commit->object.flags & TREESAME) ||
     +-	    (parent = one_relevant_parent(revs, commit->parents)) == NULL)
     ++	    (parent = one_relevant_parent(revs, commit->parents)) == NULL ||
     ++	    (revs->diversions && (commit->object.flags & DIVERSION)))
     + 		st->simplified = commit;
     + 	else {
     + 		pst = locate_simplify_state(revs, parent);
     +@@ revision.c: enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
     + 			/* drop merges unless we want parenthood */
     + 			if (!want_ancestry(revs))
     + 				return commit_ignore;
     ++
     ++			if (revs->diversions && (commit->object.flags & DIVERSION))
     ++				return commit_show;
     ++
     + 			/*
     + 			 * If we want ancestry, then need to keep any merges
     + 			 * between relevant commits to tie together topology.
      
       ## revision.h ##
     +@@
     + #define SYMMETRIC_LEFT	(1u<<8)
     + #define PATCHSAME	(1u<<9)
     + #define BOTTOM		(1u<<10)
     ++
     ++/* WARNING: This is also used as REACHABLE in commit-graph.c. */
     ++#define DIVERSION	(1u<<15)
     + /*
     +  * Indicates object was reached by traversal. i.e. not given by user on
     +  * command-line or stdin.
     +@@
     +  */
     + #define NOT_USER_GIVEN	(1u<<25)
     + #define TRACK_LINEAR	(1u<<26)
     +-#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
     ++#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | DIVERSION)
     + 
     + #define TOPO_WALK_EXPLORED	(1u<<27)
     + #define TOPO_WALK_INDEGREE	(1u<<28)
      @@ revision.h: struct rev_info {
       			no_walk:2,
       			remove_empty_trees:1,
     @@ t/t6012-rev-list-simplify.sh: test_expect_success '--full-diff is not affected b
       '
       
      +#
     -+# Modify the test repo to add a merge whose first parent is not TREESAME
     -+# but whose second parent is TREESAME
     ++# Create a new history to demonstrate the value of --include-diversions
     ++# with respect to the subtleties of simplified history, --full-history,
     ++# and --simplify-merges.
     ++#
     ++#   .-A---M-----C--N---O---P
     ++#  /     / \  \  \/   /   /
     ++# I     B   \  R-'`-Z'   /
     ++#  \   /     \/         /
     ++#   \ /      /\        /
     ++#    `---X--'  `---Y--'
      +#
     -+# A--B----------G--H--I--K--L--N
     -+#  \  \           /     /     /
     -+#   \  \         /     /     /
     -+#    C------E---F     J     /
     -+#     \  \_/               /
     -+#      \                  /
     -+#       M-----------------
     -+test_expect_success 'expand graph' '
     -+	git switch -c branchM C &&
     -+	echo "new data" >file &&
     ++# This example is explained in Documentation/rev-list-options.txt
     ++
     ++test_expect_success 'rebuild repo' '
     ++	rm -rf .git * &&
     ++	git init &&
     ++	git switch -c main &&
     ++
     ++	echo base >file &&
      +	git add file &&
     -+	test_tick &&
     -+	test_commit M &&
     ++	test_commit I &&
      +
     -+	git checkout master &&
     -+	git merge -Xtheirs branchM -m "N" &&
     -+	note N
     ++	echo A >file &&
     ++	git add file &&
     ++	test_commit A &&
     ++
     ++	git switch -c branchB I &&
     ++	echo B >file &&
     ++	git add file &&
     ++	test_commit B &&
     ++
     ++	git switch main &&
     ++	test_must_fail git merge -m "M" B &&
     ++	echo A >file &&
     ++	echo B >>file &&
     ++	git add file &&
     ++	git merge --continue &&
     ++	note M &&
     ++
     ++	echo C >other &&
     ++	git add other &&
     ++	test_commit C &&
     ++
     ++	git switch -c branchX I &&
     ++	echo X >file &&
     ++	git add file &&
     ++	test_commit X &&
     ++
     ++	git switch -c branchR M &&
     ++	git merge -m R -Xtheirs X &&
     ++	note R &&
     ++
     ++	git switch main &&
     ++	git merge -m N R &&
     ++	note N &&
     ++
     ++	git switch -c branchY M &&
     ++	echo Y >y &&
     ++	git add y &&
     ++	test_commit Y &&
     ++
     ++	git switch -c branchZ C &&
     ++	echo Z >z &&
     ++	git add z &&
     ++	test_commit Z &&
     ++
     ++	git switch main &&
     ++	git merge -m O Z &&
     ++	note O &&
     ++
     ++	git merge -m P Y &&
     ++	note P
      +'
      +
     -+check_result 'M C A' -- file
     -+check_result 'N M C A' --include-diversions -- file
     -+
     -+check_result 'N M L K J I H F E D C G B A' --full-history --topo-order
     -+check_result 'N M L K I H G F E D C B J A' --full-history
     -+check_result 'N M L K I H G F E D C B J A' --full-history --date-order
     -+check_result 'N M L K I H G F E D B C J A' --full-history --author-date-order
     -+check_result 'N M K I H E C B A' --full-history -- file
     -+check_result 'N M K I H E C B A' --full-history --topo-order -- file
     -+check_result 'N M K I H E C B A' --full-history --date-order -- file
     -+check_result 'N M K I H E B C A' --full-history --author-date-order -- file
     -+check_result 'N M I E C B A' --simplify-merges -- file
     -+check_result 'N M I E C B A' --simplify-merges --topo-order -- file
     -+check_result 'N M I E C B A' --simplify-merges --date-order -- file
     -+check_result 'N M I E B C A' --simplify-merges --author-date-order -- file
     -+check_result 'M C A' --topo-order -- file
     -+check_result 'M C A' --date-order -- file
     -+check_result 'M C A' --author-date-order -- file
     -+check_result 'H' --first-parent -- another-file
     -+check_result 'H' --first-parent --topo-order -- another-file
     ++check_result 'X I' -- file
     ++check_result 'N R X I' --include-diversions -- file
     ++
     ++check_result 'P O N R X M B A I' --full-history --topo-order -- file
     ++check_result 'N R X M B A I' --simplify-merges --topo-order --include-diversions -- file
     ++check_result 'R X M B A I' --simplify-merges --topo-order -- file
     ++check_result 'N M A I' --first-parent -- file
     ++check_result 'N M A I' --first-parent --include-diversions -- file
     ++
     ++# --ancestry-path implies --full-history
     ++check_result 'P O N R M' --topo-order \
     ++	--ancestry-path A..HEAD -- file
     ++check_result 'P O N R M' --topo-order \
     ++	--include-diversions \
     ++	--ancestry-path A..HEAD -- file
     ++check_result 'P O N R M' --topo-order \
     ++	--full-history \
     ++	--ancestry-path A..HEAD -- file
     ++check_result 'R M' --topo-order \
     ++	--simplify-merges \
     ++	--ancestry-path A..HEAD -- file
     ++check_result 'N R M' --topo-order \
     ++	--simplify-merges --include-diversions \
     ++	--ancestry-path A..HEAD -- file
     ++
     ++test_expect_success 'log --graph --simplify-merges --include-diversions' '
     ++	cat >expect <<-\EOF &&
     ++	* N
     ++	*   R
     ++	|\  
     ++	| * X
     ++	* |   M
     ++	|\ \  
     ++	| * | B
     ++	| |/  
     ++	* / A
     ++	|/  
     ++	* I
     ++	EOF
     ++	git log --graph --pretty="%s" \
     ++		--simplify-merges --include-diversions \
     ++		-- file >actual &&
     ++	test_cmp expect actual
     ++'
      +
       test_done


 Documentation/rev-list-options.txt | 132 ++++++++++++++++++++++++++++-
 object.h                           |   2 +-
 revision.c                         |  27 +++++-
 revision.h                         |   6 +-
 t/t6012-rev-list-simplify.sh       | 120 ++++++++++++++++++++++++++
 5 files changed, 282 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index bfd02ade991..858352bf351 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -342,6 +342,12 @@ Default mode::
 	branches if the end result is the same (i.e. merging branches
 	with the same content)
 
+--include-diversions::
+	Include all commits from the default mode, but also any merge
+	commits that are not TREESAME to the first parent but are
+	TREESAME to a later parent. This mode is helpful for showing
+	the merge commits that "first introduced" a change to a branch.
+
 --full-history::
 	Same as the default mode, but does not prune some history.
 
@@ -534,7 +540,7 @@ Note the major differences in `N`, `P`, and `Q` over `--full-history`:
   parent and is TREESAME.
 --
 
-Finally, there is a fifth simplification mode available:
+There is another simplification mode available:
 
 --ancestry-path::
 	Limit the displayed commits to those directly on the ancestry
@@ -573,6 +579,130 @@ option does. Applied to the 'D..M' range, it results in:
 				L--M
 -----------------------------------------------------------------------
 
+Before discussing another option, `--include-diversions`, we need to
+create a new example history.
++
+A common problem users face when looking at simplified history is that a
+commit they know changed a file somehow does not appear in the file's
+simplified history. Let's demonstrate a new example and show how options
+such as `--full-history` and `--simplify-merges` works in that case:
++
+-----------------------------------------------------------------------
+	  .-A---M-----C--N---O---P
+	 /     / \  \  \/   /   /
+	I     B   \  R-'`-Z'   /
+	 \   /     \/         /
+	  \ /      /\        /
+	   `---X--'  `---Y--'
+-----------------------------------------------------------------------
++
+For this example, suppose `I` created `file.txt` which was modified by
+`A`, `B`, and `X` in different ways. The single-parent commits `C`, `Z`,
+and `Y` do not change `file.txt`. The merge commit `M` was created by
+resolving the merge conflict to include both changes from `A` and `B`
+and hence is not TREESAME to either. The merge commit `R`, however, was
+created by ignoring the contents of `file.txt` at `M` and taking only
+the contents of `file.txt` at `X`. Hence, `R` is TREESAME to `X` but not
+`M`. Finally, the natural merge resolution to create `N` is to take the
+contents of `file.txt` at `R`, so `N` is TREESAME to `R` but not `C`.
+The merge commits `O` and `P` are TREESAME to their first parents, but
+not to their second parents, `Z` and `Y` respectively.
++
+When using the default mode, `N` and `R` both have a TREESAME parent, so
+those edges are walked and the others are ignored. The resulting history
+graph is:
++
+-----------------------------------------------------------------------
+	I---X
+-----------------------------------------------------------------------
++
+When using `--full-history`, Git walks every edge. This will discover
+the commits `A` and `B` and the merge `M`, but also will reveal the
+merge commits `O` and `P`. With parent rewriting, the resulting graph is:
++
+-----------------------------------------------------------------------
+	  .-A---M--------N---O---P
+	 /     / \  \  \/   /   /
+	I     B   \  R-'`--'   /
+	 \   /     \/         /
+	  \ /      /\        /
+	   `---X--'  `------'
+-----------------------------------------------------------------------
++
+Here, the merge commits `O` and `P` contribute extra noise, as they did
+not actually contribute a change to `file.txt`. They only merged a topic
+that was based on an older version of `file.txt`. This is a common
+issue in repositories using a workflow where many contributors work in
+parallel and merge their topic branches along a single trunk: manu
+unrelated merges appear in the `--full-history` results.
++
+When using the `--simplify-merges` option, the commits `O` and `P`
+disappear from the results. This is because the rewritten second parents
+of `O` and `P` are reachable from their first parents. Those edges are
+removed and then the commits look like single-parent commits that are
+TREESAME to their parent. This also happens to the commit `N`, resulting
+in a history view as follows:
++
+-----------------------------------------------------------------------
+	  .-A---M--.
+	 /     /    \
+	I     B      R
+	 \   /      /
+	  \ /      /
+	   `---X--'
+-----------------------------------------------------------------------
++
+In this view, we see all of the important single-parent changes from
+`A`, `B`, and `X`. We also see the carefully-resolved merge `M` and the
+not-so-carefully-resolved merge `R`. This is usually enough information
+to determine why the commits `A` and `B` "disappeared" from history in
+the default view. However, there are a few issues with this approach.
++
+The first issue is performance. Unlike any previous option, the
+`--simplify-merges` option requires walking the entire commit history
+before returning a single result. This can make the option difficult to
+use for very large repositories.
++
+The second issue is one of auditing. When many contributors are working
+on the same repository, it is important which merge commits introduced
+a change into an important branch. The problematic merge `R` above is
+not likely to be the merge commit that was used to merge into an
+important branch. Instead, the merge `N` was used to merge `R` and `X`
+into the important branch. This commit may have information about why
+the change `X` came to override the changes from `A` and `B` in its
+commit message.
++
+The `--include-diversions` option helps with both of these issues. A
+merge commit is considered a "diverter" if it is not TREESAME to its
+first parent but is TREESAME to a later parent. These merges "divert"
+the history walk to a second parent instead of continuing along the
+first-parent history as expected. When using `--include-diversions`
+on this example (and no other options) the resulting graph is:
++
+-----------------------------------------------------------------------
+	I---X---R---N
+-----------------------------------------------------------------------
++
+Here, the merge commits `R` and `N` are included because they diverted
+the walk away from their first-parent history. They are the reason the
+commits `A` and `B` do not appear in the history.
++
+When `--include-diversions` is paired with `--simplify-merges`, the
+graph includes all of the necessary information:
++
+-----------------------------------------------------------------------
+	  .-A---M--.   N
+	 /     /    \ /
+	I     B      R
+	 \   /      /
+	  \ /      /
+	   `---X--'
+-----------------------------------------------------------------------
++
+Notice that since `M` is reachable from `R`, the edge from `N` to `M`
+was simplified away. However, `N` still appears in the history as an
+important commit because it would divert a simplified history walk.
+
 The `--simplify-by-decoration` option allows you to view only the
 big picture of the topology of the history, by omitting commits
 that are not referenced by tags.  Commits are marked as !TREESAME
diff --git a/object.h b/object.h
index 2dbabfca0ab..b22328b8383 100644
--- a/object.h
+++ b/object.h
@@ -59,7 +59,7 @@ struct object_array {
 
 /*
  * object flag allocation:
- * revision.h:               0---------10                              25----28
+ * revision.h:               0---------10         15                   25----28
  * fetch-pack.c:             01
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
diff --git a/revision.c b/revision.c
index 8136929e236..96f66400659 100644
--- a/revision.c
+++ b/revision.c
@@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 			}
 			parent->next = NULL;
 			commit->parents = parent;
-			commit->object.flags |= TREESAME;
+
+			/*
+			 * A merge commit is a "diversion" if it is not
+			 * TREESAME to its first parent but is TREESAME
+			 * to a later parent. In the simplified history,
+			 * we "divert" the history walk to the later
+			 * parent. These commits are shown when "diversions"
+			 * is enabled, so do not mark the object as
+			 * TREESAME here.
+			 */
+			if (!revs->diversions || !nth_parent)
+				commit->object.flags |= TREESAME;
+
 			return;
 
 		case REV_TREE_NEW:
@@ -897,6 +909,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 				relevant_change = 1;
 			else
 				irrelevant_change = 1;
+
+			if (!nth_parent)
+				commit->object.flags |= DIVERSION;
+
 			continue;
 		}
 		die("bad tree compare for commit %s", oid_to_hex(&commit->object.oid));
@@ -2265,6 +2281,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--full-diff")) {
 		revs->diff = 1;
 		revs->full_diff = 1;
+	} else if (!strcmp(arg, "--include-diversions")) {
+		revs->diversions = 1;
 	} else if (!strcmp(arg, "--full-history")) {
 		revs->simplify_history = 0;
 	} else if (!strcmp(arg, "--relative-date")) {
@@ -3019,7 +3037,8 @@ static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c
 	if (!cnt ||
 	    (commit->object.flags & UNINTERESTING) ||
 	    !(commit->object.flags & TREESAME) ||
-	    (parent = one_relevant_parent(revs, commit->parents)) == NULL)
+	    (parent = one_relevant_parent(revs, commit->parents)) == NULL ||
+	    (revs->diversions && (commit->object.flags & DIVERSION)))
 		st->simplified = commit;
 	else {
 		pst = locate_simplify_state(revs, parent);
@@ -3602,6 +3621,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
 			/* drop merges unless we want parenthood */
 			if (!want_ancestry(revs))
 				return commit_ignore;
+
+			if (revs->diversions && (commit->object.flags & DIVERSION))
+				return commit_show;
+
 			/*
 			 * If we want ancestry, then need to keep any merges
 			 * between relevant commits to tie together topology.
diff --git a/revision.h b/revision.h
index 475f048fb61..f3c28e5f1c1 100644
--- a/revision.h
+++ b/revision.h
@@ -34,6 +34,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
+
+/* WARNING: This is also used as REACHABLE in commit-graph.c. */
+#define DIVERSION	(1u<<15)
 /*
  * Indicates object was reached by traversal. i.e. not given by user on
  * command-line or stdin.
@@ -43,7 +46,7 @@
  */
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | DIVERSION)
 
 #define TOPO_WALK_EXPLORED	(1u<<27)
 #define TOPO_WALK_INDEGREE	(1u<<28)
@@ -129,6 +132,7 @@ struct rev_info {
 			no_walk:2,
 			remove_empty_trees:1,
 			simplify_history:1,
+			diversions:1,
 			topo_order:1,
 			simplify_merges:1,
 			simplify_by_decoration:1,
diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh
index a10f0df02b0..23226f2144c 100755
--- a/t/t6012-rev-list-simplify.sh
+++ b/t/t6012-rev-list-simplify.sh
@@ -154,4 +154,124 @@ test_expect_success '--full-diff is not affected by --parents' '
 	test_cmp expected actual
 '
 
+#
+# Create a new history to demonstrate the value of --include-diversions
+# with respect to the subtleties of simplified history, --full-history,
+# and --simplify-merges.
+#
+#   .-A---M-----C--N---O---P
+#  /     / \  \  \/   /   /
+# I     B   \  R-'`-Z'   /
+#  \   /     \/         /
+#   \ /      /\        /
+#    `---X--'  `---Y--'
+#
+# This example is explained in Documentation/rev-list-options.txt
+
+test_expect_success 'rebuild repo' '
+	rm -rf .git * &&
+	git init &&
+	git switch -c main &&
+
+	echo base >file &&
+	git add file &&
+	test_commit I &&
+
+	echo A >file &&
+	git add file &&
+	test_commit A &&
+
+	git switch -c branchB I &&
+	echo B >file &&
+	git add file &&
+	test_commit B &&
+
+	git switch main &&
+	test_must_fail git merge -m "M" B &&
+	echo A >file &&
+	echo B >>file &&
+	git add file &&
+	git merge --continue &&
+	note M &&
+
+	echo C >other &&
+	git add other &&
+	test_commit C &&
+
+	git switch -c branchX I &&
+	echo X >file &&
+	git add file &&
+	test_commit X &&
+
+	git switch -c branchR M &&
+	git merge -m R -Xtheirs X &&
+	note R &&
+
+	git switch main &&
+	git merge -m N R &&
+	note N &&
+
+	git switch -c branchY M &&
+	echo Y >y &&
+	git add y &&
+	test_commit Y &&
+
+	git switch -c branchZ C &&
+	echo Z >z &&
+	git add z &&
+	test_commit Z &&
+
+	git switch main &&
+	git merge -m O Z &&
+	note O &&
+
+	git merge -m P Y &&
+	note P
+'
+
+check_result 'X I' -- file
+check_result 'N R X I' --include-diversions -- file
+
+check_result 'P O N R X M B A I' --full-history --topo-order -- file
+check_result 'N R X M B A I' --simplify-merges --topo-order --include-diversions -- file
+check_result 'R X M B A I' --simplify-merges --topo-order -- file
+check_result 'N M A I' --first-parent -- file
+check_result 'N M A I' --first-parent --include-diversions -- file
+
+# --ancestry-path implies --full-history
+check_result 'P O N R M' --topo-order \
+	--ancestry-path A..HEAD -- file
+check_result 'P O N R M' --topo-order \
+	--include-diversions \
+	--ancestry-path A..HEAD -- file
+check_result 'P O N R M' --topo-order \
+	--full-history \
+	--ancestry-path A..HEAD -- file
+check_result 'R M' --topo-order \
+	--simplify-merges \
+	--ancestry-path A..HEAD -- file
+check_result 'N R M' --topo-order \
+	--simplify-merges --include-diversions \
+	--ancestry-path A..HEAD -- file
+
+test_expect_success 'log --graph --simplify-merges --include-diversions' '
+	cat >expect <<-\EOF &&
+	* N
+	*   R
+	|\  
+	| * X
+	* |   M
+	|\ \  
+	| * | B
+	| |/  
+	* / A
+	|/  
+	* I
+	EOF
+	git log --graph --pretty="%s" \
+		--simplify-merges --include-diversions \
+		-- file >actual &&
+	test_cmp expect actual
+'
+
 test_done

base-commit: 274b9cc25322d9ee79aa8e6d4e86f0ffe5ced925
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-08 23:59               ` Derrick Stolee
@ 2020-04-09  0:08                 ` Junio C Hamano
  2020-04-09 11:52                   ` Derrick Stolee
  2020-04-09 14:28                   ` Philip Oakley
  0 siblings, 2 replies; 23+ messages in thread
From: Junio C Hamano @ 2020-04-09  0:08 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, me,
	Derrick Stolee, brian m. carlson

Derrick Stolee <stolee@gmail.com> writes:

>>> In my latest attempt at documentation, I called these merges "diverters"
>>> yet still used "--include-diversions". Here are a few other words that we
>>> could use:
>>>
>>>  * diverters or diversions
>>>  * redirects
>>>  * switches (think railroad switch). Synonym: exchange
>>>  * detours
>> 
>> ...none of the above tells me that they are not no-op (in other
>> words, they do something meaningful), so I must be coming from
>> a direction different from you are.  What redirects from what other
>> thing, for example?
>
> The merges do something meaningful: they "merge in" a "real" change.

Yes, but "redirect", "switch", "detour", or "divert" do not quite
mean "merging in a real change", at least to me.

> I'll just submit my v2 as-is, which includes a significant change to
> the documentation that should make things more clear.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-09  0:08                 ` Junio C Hamano
@ 2020-04-09 11:52                   ` Derrick Stolee
  2020-04-09 14:28                   ` Philip Oakley
  1 sibling, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2020-04-09 11:52 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, me,
	Derrick Stolee, brian m. carlson

On 4/8/2020 8:08 PM, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
> 
>>>> In my latest attempt at documentation, I called these merges "diverters"
>>>> yet still used "--include-diversions". Here are a few other words that we
>>>> could use:
>>>>
>>>>  * diverters or diversions
>>>>  * redirects
>>>>  * switches (think railroad switch). Synonym: exchange
>>>>  * detours
>>>
>>> ...none of the above tells me that they are not no-op (in other
>>> words, they do something meaningful), so I must be coming from
>>> a direction different from you are.  What redirects from what other
>>> thing, for example?
>>
>> The merges do something meaningful: they "merge in" a "real" change.
> 
> Yes, but "redirect", "switch", "detour", or "divert" do not quite
> mean "merging in a real change", at least to me.

Makes sense to me. The way you explain why certain words don't work
for you helps me think of new words to describe these merges:

 * signposts
 * guides
 * signals

For the argument, we cwould add "-merge" to each of these, such as
"--signpost-merges" or "--signal-merges".

I'm going to keep replying to this thread with ideas until someone
says "This one makes sense to me" or an equivalent. Alternatively,
someone else could present an idea and then I get to say "Aha!
That captures this concept clearly with an obvious metaphor!"

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-09  0:08                 ` Junio C Hamano
  2020-04-09 11:52                   ` Derrick Stolee
@ 2020-04-09 14:28                   ` Philip Oakley
  2020-04-09 15:56                     ` Junio C Hamano
  1 sibling, 1 reply; 23+ messages in thread
From: Philip Oakley @ 2020-04-09 14:28 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, me,
	Derrick Stolee, brian m. carlson

Hi

On 09/04/2020 01:08, Junio C Hamano wrote:
> Derrick Stolee <stolee@gmail.com> writes:
>
>>>> In my latest attempt at documentation, I called these merges "diverters"
>>>> yet still used "--include-diversions". Here are a few other words that we
>>>> could use:
>>>>
>>>>  * diverters or diversions
>>>>  * redirects
>>>>  * switches (think railroad switch). Synonym: exchange
>>>>  * detours
>>> ...none of the above tells me that they are not no-op (in other
>>> words, they do something meaningful), so I must be coming from
>>> a direction different from you are.  What redirects from what other
>>> thing, for example?
>> The merges do something meaningful: they "merge in" a "real" change.
> Yes, but "redirect", "switch", "detour", or "divert" do not quite
> mean "merging in a real change", at least to me.
>
>> I'll just submit my v2 as-is, which includes a significant change to
>> the documentation that should make things more clear.
> Thanks.
Can I suggest "--side-merges" as a possible descriptor for these
non-mainline diversions?

My thesaurus had suggested detour and sidetracked, which led to the
side-merge view.

Philip

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-09 14:28                   ` Philip Oakley
@ 2020-04-09 15:56                     ` Junio C Hamano
  2020-04-09 17:20                       ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2020-04-09 15:56 UTC (permalink / raw)
  To: Philip Oakley
  Cc: Derrick Stolee, Jeff King, Derrick Stolee via GitGitGadget, git,
	me, Derrick Stolee, brian m. carlson

Philip Oakley <philipoakley@iee.email> writes:

>> Yes, but "redirect", "switch", "detour", or "divert" do not quite
>> mean "merging in a real change", at least to me.
>>
>>> I'll just submit my v2 as-is, which includes a significant change to
>>> the documentation that should make things more clear.
>> Thanks.
> Can I suggest "--side-merges" as a possible descriptor for these
> non-mainline diversions?
>
> My thesaurus had suggested detour and sidetracked, which led to the
> side-merge view.

Ahh, sorry Derrick for being slow and thanks Philip for repeating
"diversion", as the word did not click for me at all when I saw the
patch and wrote my response.

But I think it started slowly to dawn on me.  

Does it come from the worldview where we want to follow the "trunk"
but because when we notice at a merge that we got everything that
matters to us from a side branch, we switch the track out of the
mainline and from then on follow that side branch?  Switching the
track and following the side branch happens silently with the
default "history simplification", but the new feature shows where
that side-tracking happens more prominently---is that where the
words "divert" etc. come from?

Then I can understand how these candidate words may have place in
describing the situation we want to use the feature; I am not yet
convinced any of the concrete option names floated on the thread (or
what I can come up with right now) would be clear to our target
audiences, but at least I am not as confused as I was before.

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-09 15:56                     ` Junio C Hamano
@ 2020-04-09 17:20                       ` Derrick Stolee
  2020-04-09 18:24                         ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee @ 2020-04-09 17:20 UTC (permalink / raw)
  To: Junio C Hamano, Philip Oakley
  Cc: Jeff King, Derrick Stolee via GitGitGadget, git, me,
	Derrick Stolee, brian m. carlson, Johannes Schindelin

On 4/9/2020 11:56 AM, Junio C Hamano wrote:
> Philip Oakley <philipoakley@iee.email> writes:
> 
>>> Yes, but "redirect", "switch", "detour", or "divert" do not quite
>>> mean "merging in a real change", at least to me.
>>>
>>>> I'll just submit my v2 as-is, which includes a significant change to
>>>> the documentation that should make things more clear.
>>> Thanks.
>> Can I suggest "--side-merges" as a possible descriptor for these
>> non-mainline diversions?
>>
>> My thesaurus had suggested detour and sidetracked, which led to the
>> side-merge view.
> 
> Ahh, sorry Derrick for being slow and thanks Philip for repeating
> "diversion", as the word did not click for me at all when I saw the
> patch and wrote my response.
> 
> But I think it started slowly to dawn on me.  
> 
> Does it come from the worldview where we want to follow the "trunk"
> but because when we notice at a merge that we got everything that
> matters to us from a side branch, we switch the track out of the
> mainline and from then on follow that side branch?  Switching the
> track and following the side branch happens silently with the
> default "history simplification", but the new feature shows where
> that side-tracking happens more prominently---is that where the
> words "divert" etc. come from?
> 
> Then I can understand how these candidate words may have place in
> describing the situation we want to use the feature; I am not yet
> convinced any of the concrete option names floated on the thread (or
> what I can come up with right now) would be clear to our target
> audiences, but at least I am not as confused as I was before.

After thinking about all the great responses here, and having a
chat with Dscho about this, then taking a break, I had an "Aha!"
moment. We should call this option:

	--show-pulls

The direction here is important. Let's look at a potential
"git log --graph --oneline" output to explore this idea:

	* (master) A
	|\
	| * (feature) B
	| |\
	| | *   (topic) C
	| | |\
	| | | |
	| | * | D
	| | | |
	| * | | E
	| | | |
	* | | | F
	| |_|/
	|/| |
	* | |   G
	|/ /
	* /     H
	|/
	*       I

I use (master), (feature), and (topic) to decorate branches
that are updated only by "git commit" or "git pull". The
file 'foo' was created by commit I.

In this graph, the single-parent commits G and D change 'foo'.
The commit G enters master using "git commit".

The commit G enters topic using "git pull" starting from D. The
developer on that branch resolves conflicts by taking the version
of 'foo' from D. Thus C is TREESAME to D but not G.

The commit B is created by running "git pull topic" from the
feature branch.

The commit A is created by running "git pull feature" from the
master branch.

Thus, A and B "pulled" the change into their branches. The
commit C "pulled" G into the branch, but did not keep the change
to 'foo'.

Thus 'git log --graph --oneline master -- foo' would output:


	* D
	* I

'git log --graph --oneline --show-pulls master -- foo' shows:

	* A
	* B
	* D
	* I

'git log --graph --oneline --full-history -- foo' shows:

	* (master) A
	|\
	| * (feature) B
	| |\
	| | *   (topic) C
	| | |\
	| | | |
	| | * | D
	| |_|/
	|/| |
	* | |   G
	|/ /
	| /
	|/
	*       I

'git log --graph --oneline --full-history --simplify-merges -- foo'
would show:

	*   C
	|\
	* | D
	| |
	| * G
	|/
	*   I

'git log --graph --oneline --full-history --simplify-merges --show-pulls -- foo'
would show:

	* A
	* B
	*   C
	|\
	* | D
	| |
	| * G
	|/
	*   I

In conclusion, I think "--show-pulls" provides the right context for these
extra merges to show in the history view. It also roots these merges in a
Git-native name (that also happens to evoke the "pull request" concept that
is _not_ native to Git).

What do you think?

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-09 17:20                       ` Derrick Stolee
@ 2020-04-09 18:24                         ` Jeff King
  2020-04-09 18:55                           ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Jeff King @ 2020-04-09 18:24 UTC (permalink / raw)
  To: Derrick Stolee
  Cc: Junio C Hamano, Philip Oakley, Derrick Stolee via GitGitGadget,
	git, me, Derrick Stolee, brian m. carlson, Johannes Schindelin

On Thu, Apr 09, 2020 at 01:20:57PM -0400, Derrick Stolee wrote:

> In conclusion, I think "--show-pulls" provides the right context for these
> extra merges to show in the history view. It also roots these merges in a
> Git-native name (that also happens to evoke the "pull request" concept that
> is _not_ native to Git).
> 
> What do you think?

Yeah, after reading more of the thread, I think the simplest way to
think about is "keep merges that pulled in something" with the
implication of "(even if the other side didn't touch anything)".

And "something you pulled" is a sensible way to think of that. So
--show-pulls makes sense to me. Or if we really want to tie it in to
simplification, --no-simplify-pulls. But that's more awkward to type,
and none of the existing simplification options use the word simplify. ;)

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-09 18:24                         ` Jeff King
@ 2020-04-09 18:55                           ` Junio C Hamano
  2020-04-09 19:21                             ` Jeff King
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2020-04-09 18:55 UTC (permalink / raw)
  To: Jeff King
  Cc: Derrick Stolee, Philip Oakley, Derrick Stolee via GitGitGadget,
	git, me, Derrick Stolee, brian m. carlson, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> On Thu, Apr 09, 2020 at 01:20:57PM -0400, Derrick Stolee wrote:
>
>> In conclusion, I think "--show-pulls" provides the right context for these
>> extra merges to show in the history view. It also roots these merges in a
>> Git-native name (that also happens to evoke the "pull request" concept that
>> is _not_ native to Git).
>> 
>> What do you think?
>
> Yeah, after reading more of the thread, I think the simplest way to
> think about is "keep merges that pulled in something" with the
> implication of "(even if the other side didn't touch anything)".

Isn't it more like "even if our side didn't touch anything", though?

If a merge pulled in something, the other side by definition did
something (i.e. what was pulled in); if we did something since they
forked, we would have shown the merge without this patch---the only
new behaviour we are adding is to show the merge even when our side
didn't touch since they forked---so far we never showed that merge,
but now with this option we would when we are asked to.

I agree that "this is showing pulls" is an easy way to explain.

> And "something you pulled" is a sensible way to think of that. So
> --show-pulls makes sense to me. Or if we really want to tie it in to
> simplification, --no-simplify-pulls. But that's more awkward to type,
> and none of the existing simplification options use the word simplify. ;)

;-)
 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH] revision: --include-diversions adds helpful merges
  2020-04-09 18:55                           ` Junio C Hamano
@ 2020-04-09 19:21                             ` Jeff King
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff King @ 2020-04-09 19:21 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Derrick Stolee, Philip Oakley, Derrick Stolee via GitGitGadget,
	git, me, Derrick Stolee, brian m. carlson, Johannes Schindelin

On Thu, Apr 09, 2020 at 11:55:51AM -0700, Junio C Hamano wrote:

> > On Thu, Apr 09, 2020 at 01:20:57PM -0400, Derrick Stolee wrote:
> >
> >> In conclusion, I think "--show-pulls" provides the right context for these
> >> extra merges to show in the history view. It also roots these merges in a
> >> Git-native name (that also happens to evoke the "pull request" concept that
> >> is _not_ native to Git).
> >> 
> >> What do you think?
> >
> > Yeah, after reading more of the thread, I think the simplest way to
> > think about is "keep merges that pulled in something" with the
> > implication of "(even if the other side didn't touch anything)".
> 
> Isn't it more like "even if our side didn't touch anything", though?

I meant the _other_ other. :) I.e., the other one that is not what just
got pulled in. Which is the first parent. ;)

So yes, I think we are on the same page, and I just said it badly. Using
"our side" is better than trying to double-negate "other".

-Peff

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v3] revision: --show-pulls adds helpful merges
  2020-04-09  0:01 ` [PATCH v2] " Derrick Stolee via GitGitGadget
@ 2020-04-10 12:19   ` Derrick Stolee via GitGitGadget
  2020-04-10 20:06     ` Junio C Hamano
  0 siblings, 1 reply; 23+ messages in thread
From: Derrick Stolee via GitGitGadget @ 2020-04-10 12:19 UTC (permalink / raw)
  To: git
  Cc: peff, me, philipoakley, sandals, Johannes.Schindelin,
	Derrick Stolee, Derrick Stolee

From: Derrick Stolee <dstolee@microsoft.com>

The default file history simplification of "git log -- <path>" or
"git rev-list -- <path>" focuses on providing the smallest set of
commits that first contributed a change. The revision walk greatly
restricts the set of walked commits by visiting only the first
TREESAME parent of a merge commit, when one exists. This means
that portions of the commit-graph are not walked, which can be a
performance benefit, but can also "hide" commits that added changes
but were ignored by a merge resolution.

The --full-history option modifies this by walking all commits and
reporting a merge commit as "interesting" if it has _any_ parent
that is not TREESAME. This tends to be an over-representation of
important commits, especially in an environment where most merge
commits are created by pull request completion.

Suppose we have a commit A and we create a commit B on top that
changes our file. When we merge the pull request, we create a merge
commit M. If no one else changed the file in the first-parent
history between M and A, then M will not be TREESAME to its first
parent, but will be TREESAME to B. Thus, the simplified history
will be "B". However, M will appear in the --full-history mode.

However, suppose that a number of topics T1, T2, ..., Tn were
created based on commits C1, C2, ..., Cn between A and M as
follows:

  A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
   \     \     \            \  /      /    /            /
    \     \__.. \            \/ ..__T1    /           Tn
     \           \__..       /\     ..__T2           /
      \_____________________B  \____________________/

If the commits T1, T2, ... Tn did not change the file, then all of
P1 through Pn will be TREESAME to their first parent, but not
TREESAME to their second. This means that all of those merge commits
appear in the --full-history view, with edges that immediately
collapse into the lower history without introducing interesting
single-parent commits.

The --simplify-merges option was introduced to remove these extra
merge commits. By noticing that the rewritten parents are reachable
from their first parents, those edges can be simplified away. Finally,
the commits now look like single-parent commits that are TREESAME to
their "only" parent. Thus, they are removed and this issue does not
cause issues anymore. However, this also ends up removing the commit
M from the history view! Even worse, the --simplify-merges option
requires walking the entire history before returning a single result.

Many Git users are using Git alongside a Git service that provides
code storage alongside a code review tool commonly called "Pull
Requests" or "Merge Requests" against a target branch.  When these
requests are accepted and merged, they typically create a merge
commit whose first parent is the previous branch tip and the second
parent is the tip of the topic branch used for the request. This
presents a valuable order to the parents, but also makes that merge
commit slightly special. Users may want to see not only which
commits changed a file, but which pull requests merged those commits
into their branch. In the previous example, this would mean the
users want to see the merge commit "M" in addition to the single-
parent commit "C".

Users are even more likely to want these merge commits when they
use pull requests to merge into a feature branch before merging that
feature branch into their trunk.

In some sense, users are asking for the "first" merge commit to
bring in the change to their branch. As long as the parent order is
consistent, this can be handled with the following rule:

  Include a merge commit if it is not TREESAME to its first
  parent, but is TREESAME to a later parent.

These merges look like the merge commits that would result from
running "git pull <topic>" on a main branch. Thus, the option to
show these commits is called "--show-pulls". This has the added
benefit of showing the commits created by closing a pull request or
merge request on any of the Git hosting and code review platforms.

To test these options, extend the standard test example to include
a merge commit that is not TREESAME to its first parent. It is
surprising that that option was not already in the example, as it
is instructive.

In particular, this extension demonstrates a common issue with file
history simplification. When a user resolves a merge conflict using
"-Xours" or otherwise ignoring one side of the conflict, they create
a TREESAME edge that probably should not be TREESAME. This leads
users to become frustrated and complain that "my change disappeared!"
In my experience, showing them history with --full-history and
--simplify-merges quickly reveals the problematic merge. As mentioned,
this option is expensive to compute. The --show-pulls option
_might_ show the merge commit (usually titled "resolving conflicts")
more quickly. Of course, this depends on the user having the correct
parent order, which is backwards when using "git pull master" from a
topic branch.

There are some special considerations when combining the --show-pulls
option with --simplify-merges. This requires adding a new PULL_MERGE
object flag to store the information from the initial TREESAME
comparisons. This helps avoid dropping those commits in later filters.
This is covered by a test, including how the parents can be simplified.
Since "struct object" has already ruined its 32-bit alignment by using
33 bits across parsed, type, and flags member, let's not make it worse.
PULL_MERGE is used in revision.c with the same value (1u<<15) as
REACHABLE in commit-graph.c. The REACHABLE flag is only used when
writing a commit-graph file, and a revision walk using --show-pulls
does not happen in the same process. Care must be taken in the future
to ensure this remains the case.

Update Documentation/rev-list-options.txt with significant details
around this option. This requires updating the example in the
History Simplification section to demonstrate some of the problems
with TREESAME second parents.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
    Add a new history mode
    
    This --include-diversions option could use a better name.
    
    An experienced developer in the Windows OS Engineering Systems team
    pointed out how hard it is to find out when a change was "introduced" in
    the Windows OS repo. Due to their multi-leveled, long-lived branch
    organization, a commit could be part of hundreds of pull requests as the
    branches are merged across the organization.
    
    My default answer was "this is complicated not because of Git, but
    because of how you are branching." I then tried to explain how finding
    the "first merge" to include a commit is incredibly difficult and
    requires performing multiple reachability queries. As I was working it
    out on paper, I realized that was true if we relied only on the
    commit-graph shape to inform our qurey.
    
    If we use the TREESAME information, then suddenly we get a much clearer
    picture! Let's simply pick out those merge commits that "introduced a
    change" because they are TREESAME to a non-first-parent (and not
    TREESAME to the first parent).
    
    My name of "diversions" could probably use some work, but I like the
    basic concept of this option.
    
    I welcome any and all feedback. Thanks!
    
    UPDATES in v2:
    
     * The functionality is a bit more complicated to work with
       --simplify-merges.
       
       
     * The documentation is significantly expanded with an example that
       highlights the shortcomings of the default history simplification.
       
       
     * The documented example is used in the test script instead of simply
       extending the previous example.
       
       
    
    UPDATES in v3:
    
     * Renamed things appropriately for the shift from
       "--include-diversions" to "--show-pulls".
       
       
     * Slight rewording in the documentation to remove references to
       "diverters".
       
       
    
    I don't recommend looking at the range-diff for this version.
    
    -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-599%2Fderrickstolee%2Fnew-history-mode-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-599/derrickstolee/new-history-mode-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/599

Range-diff vs v2:

 1:  146c443e7f1 ! 1:  19026f4d4c0 revision: --include-diversions adds helpful merges
     @@ Metadata
      Author: Derrick Stolee <dstolee@microsoft.com>
      
       ## Commit message ##
     -    revision: --include-diversions adds helpful merges
     +    revision: --show-pulls adds helpful merges
      
          The default file history simplification of "git log -- <path>" or
          "git rev-list -- <path>" focuses on providing the smallest set of
     @@ Commit message
            Include a merge commit if it is not TREESAME to its first
            parent, but is TREESAME to a later parent.
      
     -    I call such merge commits "diversions" because they divert the
     -    history walk away from the first-parent history. As such, this
     -    change adds the "--include-diversions" option to rev-list and log.
     +    These merges look like the merge commits that would result from
     +    running "git pull <topic>" on a main branch. Thus, the option to
     +    show these commits is called "--show-pulls". This has the added
     +    benefit of showing the commits created by closing a pull request or
     +    merge request on any of the Git hosting and code review platforms.
     +
          To test these options, extend the standard test example to include
          a merge commit that is not TREESAME to its first parent. It is
          surprising that that option was not already in the example, as it
     @@ Commit message
          users to become frustrated and complain that "my change disappeared!"
          In my experience, showing them history with --full-history and
          --simplify-merges quickly reveals the problematic merge. As mentioned,
     -    this option is expensive to compute. The --include-diversions option
     +    this option is expensive to compute. The --show-pulls option
          _might_ show the merge commit (usually titled "resolving conflicts")
          more quickly. Of course, this depends on the user having the correct
     -    parent order, which is backwards when using "git pull".
     +    parent order, which is backwards when using "git pull master" from a
     +    topic branch.
      
     -    There are some special considerations when combining the
     -    --include-diversions option with --simplify-merges. This requires
     -    adding a new DIVERSION object flag to store the information from
     -    the initial TREESAME comparisons. This helps avoid dropping those
     -    commits in later filters. This is covered by a test, including how
     -    the parents can be simplified. Since "struct object" has already
     -    ruined its 32-bit alignment by using 33 bits across parsed, type,
     -    and flags member, let's not make it worse. DIVERSION is used in
     -    revision.c with the same value (1u<<15) as REACHABLE in
     -    commit-graph.c. The REACHABLE flag is only used when writing a
     -    commit-graph file, and a revision walk using --include-diversions
     -    does not happen in the same process. Care must be taken in the
     -    future to ensure this remains the case.
     +    There are some special considerations when combining the --show-pulls
     +    option with --simplify-merges. This requires adding a new PULL_MERGE
     +    object flag to store the information from the initial TREESAME
     +    comparisons. This helps avoid dropping those commits in later filters.
     +    This is covered by a test, including how the parents can be simplified.
     +    Since "struct object" has already ruined its 32-bit alignment by using
     +    33 bits across parsed, type, and flags member, let's not make it worse.
     +    PULL_MERGE is used in revision.c with the same value (1u<<15) as
     +    REACHABLE in commit-graph.c. The REACHABLE flag is only used when
     +    writing a commit-graph file, and a revision walk using --show-pulls
     +    does not happen in the same process. Care must be taken in the future
     +    to ensure this remains the case.
      
          Update Documentation/rev-list-options.txt with significant details
          around this option. This requires updating the example in the
     @@ Documentation/rev-list-options.txt: Default mode::
       	branches if the end result is the same (i.e. merging branches
       	with the same content)
       
     -+--include-diversions::
     ++--show-pulls::
      +	Include all commits from the default mode, but also any merge
      +	commits that are not TREESAME to the first parent but are
      +	TREESAME to a later parent. This mode is helpful for showing
     @@ Documentation/rev-list-options.txt: option does. Applied to the 'D..M' range, it
       				L--M
       -----------------------------------------------------------------------
       
     -+Before discussing another option, `--include-diversions`, we need to
     ++Before discussing another option, `--show-pulls`, we need to
      +create a new example history.
      ++
      +A common problem users face when looking at simplified history is that a
     @@ Documentation/rev-list-options.txt: option does. Applied to the 'D..M' range, it
      +the change `X` came to override the changes from `A` and `B` in its
      +commit message.
      ++
     -+The `--include-diversions` option helps with both of these issues. A
     -+merge commit is considered a "diverter" if it is not TREESAME to its
     -+first parent but is TREESAME to a later parent. These merges "divert"
     -+the history walk to a second parent instead of continuing along the
     -+first-parent history as expected. When using `--include-diversions`
     -+on this example (and no other options) the resulting graph is:
     ++The `--show-pulls` option helps with both of these issues by adding more
     ++merge commits to the history results. If a merge is not TREESAME to its
     ++first parent but is TREESAME to a later parent, then that merge is
     ++treated as if it "pulled" the change from another branch. When using
     ++`--show-pulls` on this example (and no other options) the resulting
     ++graph is:
      ++
      +-----------------------------------------------------------------------
      +	I---X---R---N
      +-----------------------------------------------------------------------
      ++
     -+Here, the merge commits `R` and `N` are included because they diverted
     -+the walk away from their first-parent history. They are the reason the
     -+commits `A` and `B` do not appear in the history.
     ++Here, the merge commits `R` and `N` are included because they pulled
     ++the commits `X` and `R` into the base branch, respectively. These
     ++merges are the reason the commits `A` and `B` do not appear in the
     ++default history.
      ++
     -+When `--include-diversions` is paired with `--simplify-merges`, the
     ++When `--show-pulls` is paired with `--simplify-merges`, the
      +graph includes all of the necessary information:
      ++
      +-----------------------------------------------------------------------
     @@ Documentation/rev-list-options.txt: option does. Applied to the 'D..M' range, it
      ++
      +Notice that since `M` is reachable from `R`, the edge from `N` to `M`
      +was simplified away. However, `N` still appears in the history as an
     -+important commit because it would divert a simplified history walk.
     ++important commit because it "pulled" the change `R` into the main
     ++branch.
      +
       The `--simplify-by-decoration` option allows you to view only the
       big picture of the topology of the history, by omitting commits
     @@ revision.c: static void try_to_simplify_commit(struct rev_info *revs, struct com
      +			 * TREESAME to its first parent but is TREESAME
      +			 * to a later parent. In the simplified history,
      +			 * we "divert" the history walk to the later
     -+			 * parent. These commits are shown when "diversions"
     ++			 * parent. These commits are shown when "show_pulls"
      +			 * is enabled, so do not mark the object as
      +			 * TREESAME here.
      +			 */
     -+			if (!revs->diversions || !nth_parent)
     ++			if (!revs->show_pulls || !nth_parent)
      +				commit->object.flags |= TREESAME;
      +
       			return;
     @@ revision.c: static void try_to_simplify_commit(struct rev_info *revs, struct com
       				irrelevant_change = 1;
      +
      +			if (!nth_parent)
     -+				commit->object.flags |= DIVERSION;
     ++				commit->object.flags |= PULL_MERGE;
      +
       			continue;
       		}
     @@ revision.c: static int handle_revision_opt(struct rev_info *revs, int argc, cons
       	} else if (!strcmp(arg, "--full-diff")) {
       		revs->diff = 1;
       		revs->full_diff = 1;
     -+	} else if (!strcmp(arg, "--include-diversions")) {
     -+		revs->diversions = 1;
     ++	} else if (!strcmp(arg, "--show-pulls")) {
     ++		revs->show_pulls = 1;
       	} else if (!strcmp(arg, "--full-history")) {
       		revs->simplify_history = 0;
       	} else if (!strcmp(arg, "--relative-date")) {
     @@ revision.c: static struct commit_list **simplify_one(struct rev_info *revs, stru
       	    !(commit->object.flags & TREESAME) ||
      -	    (parent = one_relevant_parent(revs, commit->parents)) == NULL)
      +	    (parent = one_relevant_parent(revs, commit->parents)) == NULL ||
     -+	    (revs->diversions && (commit->object.flags & DIVERSION)))
     ++	    (revs->show_pulls && (commit->object.flags & PULL_MERGE)))
       		st->simplified = commit;
       	else {
       		pst = locate_simplify_state(revs, parent);
     @@ revision.c: enum commit_action get_commit_action(struct rev_info *revs, struct c
       			if (!want_ancestry(revs))
       				return commit_ignore;
      +
     -+			if (revs->diversions && (commit->object.flags & DIVERSION))
     ++			if (revs->show_pulls && (commit->object.flags & PULL_MERGE))
      +				return commit_show;
      +
       			/*
     @@ revision.h
       #define BOTTOM		(1u<<10)
      +
      +/* WARNING: This is also used as REACHABLE in commit-graph.c. */
     -+#define DIVERSION	(1u<<15)
     ++#define PULL_MERGE	(1u<<15)
       /*
        * Indicates object was reached by traversal. i.e. not given by user on
        * command-line or stdin.
     @@ revision.h
       #define NOT_USER_GIVEN	(1u<<25)
       #define TRACK_LINEAR	(1u<<26)
      -#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
     -+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | DIVERSION)
     ++#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | PULL_MERGE)
       
       #define TOPO_WALK_EXPLORED	(1u<<27)
       #define TOPO_WALK_INDEGREE	(1u<<28)
     @@ revision.h: struct rev_info {
       			no_walk:2,
       			remove_empty_trees:1,
       			simplify_history:1,
     -+			diversions:1,
     ++			show_pulls:1,
       			topo_order:1,
       			simplify_merges:1,
       			simplify_by_decoration:1,
     @@ t/t6012-rev-list-simplify.sh: test_expect_success '--full-diff is not affected b
       '
       
      +#
     -+# Create a new history to demonstrate the value of --include-diversions
     ++# Create a new history to demonstrate the value of --show-pulls
      +# with respect to the subtleties of simplified history, --full-history,
      +# and --simplify-merges.
      +#
     @@ t/t6012-rev-list-simplify.sh: test_expect_success '--full-diff is not affected b
      +'
      +
      +check_result 'X I' -- file
     -+check_result 'N R X I' --include-diversions -- file
     ++check_result 'N R X I' --show-pulls -- file
      +
      +check_result 'P O N R X M B A I' --full-history --topo-order -- file
     -+check_result 'N R X M B A I' --simplify-merges --topo-order --include-diversions -- file
     ++check_result 'N R X M B A I' --simplify-merges --topo-order --show-pulls -- file
      +check_result 'R X M B A I' --simplify-merges --topo-order -- file
      +check_result 'N M A I' --first-parent -- file
     -+check_result 'N M A I' --first-parent --include-diversions -- file
     ++check_result 'N M A I' --first-parent --show-pulls -- file
      +
      +# --ancestry-path implies --full-history
      +check_result 'P O N R M' --topo-order \
      +	--ancestry-path A..HEAD -- file
      +check_result 'P O N R M' --topo-order \
     -+	--include-diversions \
     ++	--show-pulls \
      +	--ancestry-path A..HEAD -- file
      +check_result 'P O N R M' --topo-order \
      +	--full-history \
     @@ t/t6012-rev-list-simplify.sh: test_expect_success '--full-diff is not affected b
      +	--simplify-merges \
      +	--ancestry-path A..HEAD -- file
      +check_result 'N R M' --topo-order \
     -+	--simplify-merges --include-diversions \
     ++	--simplify-merges --show-pulls \
      +	--ancestry-path A..HEAD -- file
      +
     -+test_expect_success 'log --graph --simplify-merges --include-diversions' '
     ++test_expect_success 'log --graph --simplify-merges --show-pulls' '
      +	cat >expect <<-\EOF &&
      +	* N
      +	*   R
     @@ t/t6012-rev-list-simplify.sh: test_expect_success '--full-diff is not affected b
      +	* I
      +	EOF
      +	git log --graph --pretty="%s" \
     -+		--simplify-merges --include-diversions \
     ++		--simplify-merges --show-pulls \
      +		-- file >actual &&
      +	test_cmp expect actual
      +'


 Documentation/rev-list-options.txt | 134 ++++++++++++++++++++++++++++-
 object.h                           |   2 +-
 revision.c                         |  27 +++++-
 revision.h                         |   6 +-
 t/t6012-rev-list-simplify.sh       | 120 ++++++++++++++++++++++++++
 5 files changed, 284 insertions(+), 5 deletions(-)

diff --git a/Documentation/rev-list-options.txt b/Documentation/rev-list-options.txt
index bfd02ade991..04ad7dd36eb 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -342,6 +342,12 @@ Default mode::
 	branches if the end result is the same (i.e. merging branches
 	with the same content)
 
+--show-pulls::
+	Include all commits from the default mode, but also any merge
+	commits that are not TREESAME to the first parent but are
+	TREESAME to a later parent. This mode is helpful for showing
+	the merge commits that "first introduced" a change to a branch.
+
 --full-history::
 	Same as the default mode, but does not prune some history.
 
@@ -534,7 +540,7 @@ Note the major differences in `N`, `P`, and `Q` over `--full-history`:
   parent and is TREESAME.
 --
 
-Finally, there is a fifth simplification mode available:
+There is another simplification mode available:
 
 --ancestry-path::
 	Limit the displayed commits to those directly on the ancestry
@@ -573,6 +579,132 @@ option does. Applied to the 'D..M' range, it results in:
 				L--M
 -----------------------------------------------------------------------
 
+Before discussing another option, `--show-pulls`, we need to
+create a new example history.
++
+A common problem users face when looking at simplified history is that a
+commit they know changed a file somehow does not appear in the file's
+simplified history. Let's demonstrate a new example and show how options
+such as `--full-history` and `--simplify-merges` works in that case:
++
+-----------------------------------------------------------------------
+	  .-A---M-----C--N---O---P
+	 /     / \  \  \/   /   /
+	I     B   \  R-'`-Z'   /
+	 \   /     \/         /
+	  \ /      /\        /
+	   `---X--'  `---Y--'
+-----------------------------------------------------------------------
++
+For this example, suppose `I` created `file.txt` which was modified by
+`A`, `B`, and `X` in different ways. The single-parent commits `C`, `Z`,
+and `Y` do not change `file.txt`. The merge commit `M` was created by
+resolving the merge conflict to include both changes from `A` and `B`
+and hence is not TREESAME to either. The merge commit `R`, however, was
+created by ignoring the contents of `file.txt` at `M` and taking only
+the contents of `file.txt` at `X`. Hence, `R` is TREESAME to `X` but not
+`M`. Finally, the natural merge resolution to create `N` is to take the
+contents of `file.txt` at `R`, so `N` is TREESAME to `R` but not `C`.
+The merge commits `O` and `P` are TREESAME to their first parents, but
+not to their second parents, `Z` and `Y` respectively.
++
+When using the default mode, `N` and `R` both have a TREESAME parent, so
+those edges are walked and the others are ignored. The resulting history
+graph is:
++
+-----------------------------------------------------------------------
+	I---X
+-----------------------------------------------------------------------
++
+When using `--full-history`, Git walks every edge. This will discover
+the commits `A` and `B` and the merge `M`, but also will reveal the
+merge commits `O` and `P`. With parent rewriting, the resulting graph is:
++
+-----------------------------------------------------------------------
+	  .-A---M--------N---O---P
+	 /     / \  \  \/   /   /
+	I     B   \  R-'`--'   /
+	 \   /     \/         /
+	  \ /      /\        /
+	   `---X--'  `------'
+-----------------------------------------------------------------------
++
+Here, the merge commits `O` and `P` contribute extra noise, as they did
+not actually contribute a change to `file.txt`. They only merged a topic
+that was based on an older version of `file.txt`. This is a common
+issue in repositories using a workflow where many contributors work in
+parallel and merge their topic branches along a single trunk: manu
+unrelated merges appear in the `--full-history` results.
++
+When using the `--simplify-merges` option, the commits `O` and `P`
+disappear from the results. This is because the rewritten second parents
+of `O` and `P` are reachable from their first parents. Those edges are
+removed and then the commits look like single-parent commits that are
+TREESAME to their parent. This also happens to the commit `N`, resulting
+in a history view as follows:
++
+-----------------------------------------------------------------------
+	  .-A---M--.
+	 /     /    \
+	I     B      R
+	 \   /      /
+	  \ /      /
+	   `---X--'
+-----------------------------------------------------------------------
++
+In this view, we see all of the important single-parent changes from
+`A`, `B`, and `X`. We also see the carefully-resolved merge `M` and the
+not-so-carefully-resolved merge `R`. This is usually enough information
+to determine why the commits `A` and `B` "disappeared" from history in
+the default view. However, there are a few issues with this approach.
++
+The first issue is performance. Unlike any previous option, the
+`--simplify-merges` option requires walking the entire commit history
+before returning a single result. This can make the option difficult to
+use for very large repositories.
++
+The second issue is one of auditing. When many contributors are working
+on the same repository, it is important which merge commits introduced
+a change into an important branch. The problematic merge `R` above is
+not likely to be the merge commit that was used to merge into an
+important branch. Instead, the merge `N` was used to merge `R` and `X`
+into the important branch. This commit may have information about why
+the change `X` came to override the changes from `A` and `B` in its
+commit message.
++
+The `--show-pulls` option helps with both of these issues by adding more
+merge commits to the history results. If a merge is not TREESAME to its
+first parent but is TREESAME to a later parent, then that merge is
+treated as if it "pulled" the change from another branch. When using
+`--show-pulls` on this example (and no other options) the resulting
+graph is:
++
+-----------------------------------------------------------------------
+	I---X---R---N
+-----------------------------------------------------------------------
++
+Here, the merge commits `R` and `N` are included because they pulled
+the commits `X` and `R` into the base branch, respectively. These
+merges are the reason the commits `A` and `B` do not appear in the
+default history.
++
+When `--show-pulls` is paired with `--simplify-merges`, the
+graph includes all of the necessary information:
++
+-----------------------------------------------------------------------
+	  .-A---M--.   N
+	 /     /    \ /
+	I     B      R
+	 \   /      /
+	  \ /      /
+	   `---X--'
+-----------------------------------------------------------------------
++
+Notice that since `M` is reachable from `R`, the edge from `N` to `M`
+was simplified away. However, `N` still appears in the history as an
+important commit because it "pulled" the change `R` into the main
+branch.
+
 The `--simplify-by-decoration` option allows you to view only the
 big picture of the topology of the history, by omitting commits
 that are not referenced by tags.  Commits are marked as !TREESAME
diff --git a/object.h b/object.h
index 2dbabfca0ab..b22328b8383 100644
--- a/object.h
+++ b/object.h
@@ -59,7 +59,7 @@ struct object_array {
 
 /*
  * object flag allocation:
- * revision.h:               0---------10                              25----28
+ * revision.h:               0---------10         15                   25----28
  * fetch-pack.c:             01
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
diff --git a/revision.c b/revision.c
index 8136929e236..f89dd6caa55 100644
--- a/revision.c
+++ b/revision.c
@@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 			}
 			parent->next = NULL;
 			commit->parents = parent;
-			commit->object.flags |= TREESAME;
+
+			/*
+			 * A merge commit is a "diversion" if it is not
+			 * TREESAME to its first parent but is TREESAME
+			 * to a later parent. In the simplified history,
+			 * we "divert" the history walk to the later
+			 * parent. These commits are shown when "show_pulls"
+			 * is enabled, so do not mark the object as
+			 * TREESAME here.
+			 */
+			if (!revs->show_pulls || !nth_parent)
+				commit->object.flags |= TREESAME;
+
 			return;
 
 		case REV_TREE_NEW:
@@ -897,6 +909,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
 				relevant_change = 1;
 			else
 				irrelevant_change = 1;
+
+			if (!nth_parent)
+				commit->object.flags |= PULL_MERGE;
+
 			continue;
 		}
 		die("bad tree compare for commit %s", oid_to_hex(&commit->object.oid));
@@ -2265,6 +2281,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--full-diff")) {
 		revs->diff = 1;
 		revs->full_diff = 1;
+	} else if (!strcmp(arg, "--show-pulls")) {
+		revs->show_pulls = 1;
 	} else if (!strcmp(arg, "--full-history")) {
 		revs->simplify_history = 0;
 	} else if (!strcmp(arg, "--relative-date")) {
@@ -3019,7 +3037,8 @@ static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c
 	if (!cnt ||
 	    (commit->object.flags & UNINTERESTING) ||
 	    !(commit->object.flags & TREESAME) ||
-	    (parent = one_relevant_parent(revs, commit->parents)) == NULL)
+	    (parent = one_relevant_parent(revs, commit->parents)) == NULL ||
+	    (revs->show_pulls && (commit->object.flags & PULL_MERGE)))
 		st->simplified = commit;
 	else {
 		pst = locate_simplify_state(revs, parent);
@@ -3602,6 +3621,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
 			/* drop merges unless we want parenthood */
 			if (!want_ancestry(revs))
 				return commit_ignore;
+
+			if (revs->show_pulls && (commit->object.flags & PULL_MERGE))
+				return commit_show;
+
 			/*
 			 * If we want ancestry, then need to keep any merges
 			 * between relevant commits to tie together topology.
diff --git a/revision.h b/revision.h
index 475f048fb61..70899eb246c 100644
--- a/revision.h
+++ b/revision.h
@@ -34,6 +34,9 @@
 #define SYMMETRIC_LEFT	(1u<<8)
 #define PATCHSAME	(1u<<9)
 #define BOTTOM		(1u<<10)
+
+/* WARNING: This is also used as REACHABLE in commit-graph.c. */
+#define PULL_MERGE	(1u<<15)
 /*
  * Indicates object was reached by traversal. i.e. not given by user on
  * command-line or stdin.
@@ -43,7 +46,7 @@
  */
 #define NOT_USER_GIVEN	(1u<<25)
 #define TRACK_LINEAR	(1u<<26)
-#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
+#define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR | PULL_MERGE)
 
 #define TOPO_WALK_EXPLORED	(1u<<27)
 #define TOPO_WALK_INDEGREE	(1u<<28)
@@ -129,6 +132,7 @@ struct rev_info {
 			no_walk:2,
 			remove_empty_trees:1,
 			simplify_history:1,
+			show_pulls:1,
 			topo_order:1,
 			simplify_merges:1,
 			simplify_by_decoration:1,
diff --git a/t/t6012-rev-list-simplify.sh b/t/t6012-rev-list-simplify.sh
index a10f0df02b0..b6fa43ace01 100755
--- a/t/t6012-rev-list-simplify.sh
+++ b/t/t6012-rev-list-simplify.sh
@@ -154,4 +154,124 @@ test_expect_success '--full-diff is not affected by --parents' '
 	test_cmp expected actual
 '
 
+#
+# Create a new history to demonstrate the value of --show-pulls
+# with respect to the subtleties of simplified history, --full-history,
+# and --simplify-merges.
+#
+#   .-A---M-----C--N---O---P
+#  /     / \  \  \/   /   /
+# I     B   \  R-'`-Z'   /
+#  \   /     \/         /
+#   \ /      /\        /
+#    `---X--'  `---Y--'
+#
+# This example is explained in Documentation/rev-list-options.txt
+
+test_expect_success 'rebuild repo' '
+	rm -rf .git * &&
+	git init &&
+	git switch -c main &&
+
+	echo base >file &&
+	git add file &&
+	test_commit I &&
+
+	echo A >file &&
+	git add file &&
+	test_commit A &&
+
+	git switch -c branchB I &&
+	echo B >file &&
+	git add file &&
+	test_commit B &&
+
+	git switch main &&
+	test_must_fail git merge -m "M" B &&
+	echo A >file &&
+	echo B >>file &&
+	git add file &&
+	git merge --continue &&
+	note M &&
+
+	echo C >other &&
+	git add other &&
+	test_commit C &&
+
+	git switch -c branchX I &&
+	echo X >file &&
+	git add file &&
+	test_commit X &&
+
+	git switch -c branchR M &&
+	git merge -m R -Xtheirs X &&
+	note R &&
+
+	git switch main &&
+	git merge -m N R &&
+	note N &&
+
+	git switch -c branchY M &&
+	echo Y >y &&
+	git add y &&
+	test_commit Y &&
+
+	git switch -c branchZ C &&
+	echo Z >z &&
+	git add z &&
+	test_commit Z &&
+
+	git switch main &&
+	git merge -m O Z &&
+	note O &&
+
+	git merge -m P Y &&
+	note P
+'
+
+check_result 'X I' -- file
+check_result 'N R X I' --show-pulls -- file
+
+check_result 'P O N R X M B A I' --full-history --topo-order -- file
+check_result 'N R X M B A I' --simplify-merges --topo-order --show-pulls -- file
+check_result 'R X M B A I' --simplify-merges --topo-order -- file
+check_result 'N M A I' --first-parent -- file
+check_result 'N M A I' --first-parent --show-pulls -- file
+
+# --ancestry-path implies --full-history
+check_result 'P O N R M' --topo-order \
+	--ancestry-path A..HEAD -- file
+check_result 'P O N R M' --topo-order \
+	--show-pulls \
+	--ancestry-path A..HEAD -- file
+check_result 'P O N R M' --topo-order \
+	--full-history \
+	--ancestry-path A..HEAD -- file
+check_result 'R M' --topo-order \
+	--simplify-merges \
+	--ancestry-path A..HEAD -- file
+check_result 'N R M' --topo-order \
+	--simplify-merges --show-pulls \
+	--ancestry-path A..HEAD -- file
+
+test_expect_success 'log --graph --simplify-merges --show-pulls' '
+	cat >expect <<-\EOF &&
+	* N
+	*   R
+	|\  
+	| * X
+	* |   M
+	|\ \  
+	| * | B
+	| |/  
+	* / A
+	|/  
+	* I
+	EOF
+	git log --graph --pretty="%s" \
+		--simplify-merges --show-pulls \
+		-- file >actual &&
+	test_cmp expect actual
+'
+
 test_done

base-commit: 274b9cc25322d9ee79aa8e6d4e86f0ffe5ced925
-- 
gitgitgadget

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] revision: --show-pulls adds helpful merges
  2020-04-10 12:19   ` [PATCH v3] revision: --show-pulls " Derrick Stolee via GitGitGadget
@ 2020-04-10 20:06     ` Junio C Hamano
  2020-04-10 21:43       ` Derrick Stolee
  0 siblings, 1 reply; 23+ messages in thread
From: Junio C Hamano @ 2020-04-10 20:06 UTC (permalink / raw)
  To: Derrick Stolee via GitGitGadget
  Cc: git, peff, me, philipoakley, sandals, Johannes.Schindelin,
	Derrick Stolee

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

>   A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
>    \     \     \            \  /      /    /            /
>     \     \__.. \            \/ ..__T1    /           Tn
>      \           \__..       /\     ..__T2           /
>       \_____________________B  \____________________/
>
> If the commits T1, T2, ... Tn did not change the file, then all of
> P1 through Pn will be TREESAME to their first parent, but not
> TREESAME to their second. This means that all of those merge commits
> appear in the --full-history view, with edges that immediately
> collapse into the lower history without introducing interesting
> single-parent commits.
>
> The --simplify-merges option was introduced to remove these extra
> merge commits. By noticing that the rewritten parents are reachable
> from their first parents, those edges can be simplified away. Finally,
> the commits now look like single-parent commits that are TREESAME to
> their "only" parent. Thus, they are removed and this issue does not
> cause issues anymore. However, this also ends up removing the commit
> M from the history view! Even worse, the --simplify-merges option
> requires walking the entire history before returning a single result.

True.  It is not advisable to use --simplify-merges unless you are
limiting the history at the bottom for that reason.

> In some sense, users are asking for the "first" merge commit to
> bring in the change to their branch. As long as the parent order is
> consistent, this can be handled with the following rule:
>
>   Include a merge commit if it is not TREESAME to its first
>   parent, but is TREESAME to a later parent.

"but is" -> "even if it is" would make it a bit more accurate, I
would think.  Normally, such a merge that is treesame to some side
branch is omitted from the output, but the rule wants it to be shown
even if all the changes were brought in from one single parent.

> Update Documentation/rev-list-options.txt with significant details
> around this option. This requires updating the example in the
> History Simplification section to demonstrate some of the problems
> with TREESAME second parents.

Good.

> diff --git a/revision.c b/revision.c
> index 8136929e236..f89dd6caa55 100644
> --- a/revision.c
> +++ b/revision.c
> @@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  			}
>  			parent->next = NULL;
>  			commit->parents = parent;
> -			commit->object.flags |= TREESAME;
> +
> +			/*
> +			 * A merge commit is a "diversion" if it is not
> +			 * TREESAME to its first parent but is TREESAME
> +			 * to a later parent. In the simplified history,
> +			 * we "divert" the history walk to the later
> +			 * parent. These commits are shown when "show_pulls"
> +			 * is enabled, so do not mark the object as
> +			 * TREESAME here.

As we no longer use the word "diversion", this explanation should
shift the focus from defining the word "diversion" and giving its
background to explaining why the above parent rewriting is done and
why the TREESAME marking is conditional, e.g.

      			The tree of the merge and of the parent are
      			the same; from here on, we stop following
      			histories of all other parents but this one
      			by culling commit->parents list.  We also
      			normally mark the merge commit TREESAME as
      			the merge itself did not introduce any
      			change relative to the parent, but we
      			refrain from doing so for the first parent
      			under "--show-pulls" mode, in order to let
      			the output phase to show the merge, which is
      			the last commit before we divert our walk to
      			a side history.

or something along that line.

> +			if (!revs->show_pulls || !nth_parent)
> +				commit->object.flags |= TREESAME;
> +
>  			return;

> @@ -897,6 +909,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>  				relevant_change = 1;
>  			else
>  				irrelevant_change = 1;
> +
> +			if (!nth_parent)
> +				commit->object.flags |= PULL_MERGE;

For a three-parent merge that brings in changes to the first parent,
if the result matches the second parent, we would have returned in
the previous hunk before having a chance to inspect the third one
and marking the merge result with PULL_MERGE, but if you swap the
order of the second and the third parent, the second parent, which
has different tree from the result, would not return in the previous
hunk and cause the merge with PULL_MERGE here.  And then when we
inspect the third parent, the previous hunk's return would kick in.
So for two merges that merge exactly the same two branches on top of
exactly the same commit on the mainline, you sometimes mark the
result with PULL_MERGE and sometimes don't, depending on the order
of the second and the third parent.

That feels iffy.  Treating the first parent differently from others
is what we intend to do with this change, bhut this hunk treats the
other parents differently depending on the merge order.

> @@ -3019,7 +3037,8 @@ static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c
>  	if (!cnt ||
>  	    (commit->object.flags & UNINTERESTING) ||
>  	    !(commit->object.flags & TREESAME) ||
> -	    (parent = one_relevant_parent(revs, commit->parents)) == NULL)
> +	    (parent = one_relevant_parent(revs, commit->parents)) == NULL ||
> +	    (revs->show_pulls && (commit->object.flags & PULL_MERGE)))
>  		st->simplified = commit;

... hence, wouldn't we see different result here ...

> @@ -3602,6 +3621,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
>  			/* drop merges unless we want parenthood */
>  			if (!want_ancestry(revs))
>  				return commit_ignore;
> +
> +			if (revs->show_pulls && (commit->object.flags & PULL_MERGE))
> +				return commit_show;

... and also here?

Thanks.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v3] revision: --show-pulls adds helpful merges
  2020-04-10 20:06     ` Junio C Hamano
@ 2020-04-10 21:43       ` Derrick Stolee
  0 siblings, 0 replies; 23+ messages in thread
From: Derrick Stolee @ 2020-04-10 21:43 UTC (permalink / raw)
  To: Junio C Hamano, Derrick Stolee via GitGitGadget
  Cc: git, peff, me, philipoakley, sandals, Johannes.Schindelin,
	Derrick Stolee

On 4/10/2020 4:06 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>>   A----C1----C2--- ... ---Cn----M------P1---P2--- ... ---Pn
>>    \     \     \            \  /      /    /            /
>>     \     \__.. \            \/ ..__T1    /           Tn
>>      \           \__..       /\     ..__T2           /
>>       \_____________________B  \____________________/
>>
>> If the commits T1, T2, ... Tn did not change the file, then all of
>> P1 through Pn will be TREESAME to their first parent, but not
>> TREESAME to their second. This means that all of those merge commits
>> appear in the --full-history view, with edges that immediately
>> collapse into the lower history without introducing interesting
>> single-parent commits.
>>
>> The --simplify-merges option was introduced to remove these extra
>> merge commits. By noticing that the rewritten parents are reachable
>> from their first parents, those edges can be simplified away. Finally,
>> the commits now look like single-parent commits that are TREESAME to
>> their "only" parent. Thus, they are removed and this issue does not
>> cause issues anymore. However, this also ends up removing the commit
>> M from the history view! Even worse, the --simplify-merges option
>> requires walking the entire history before returning a single result.
> 
> True.  It is not advisable to use --simplify-merges unless you are
> limiting the history at the bottom for that reason.

I will modify my advice to include this "limiting the history at the
bottom" advice.

>> In some sense, users are asking for the "first" merge commit to
>> bring in the change to their branch. As long as the parent order is
>> consistent, this can be handled with the following rule:
>>
>>   Include a merge commit if it is not TREESAME to its first
>>   parent, but is TREESAME to a later parent.
> 
> "but is" -> "even if it is" would make it a bit more accurate, I
> would think.  Normally, such a merge that is treesame to some side
> branch is omitted from the output, but the rule wants it to be shown
> even if all the changes were brought in from one single parent.

This is an excellent clarification, since the way I wrote it does not
describe every merge that we include. I wrote it to include the "extra"
merges that are included. A better strategy is to include a complete
description of every merge commit that appears in the output, as you
defined it.

>> Update Documentation/rev-list-options.txt with significant details
>> around this option. This requires updating the example in the
>> History Simplification section to demonstrate some of the problems
>> with TREESAME second parents.
> 
> Good.
> 
>> diff --git a/revision.c b/revision.c
>> index 8136929e236..f89dd6caa55 100644
>> --- a/revision.c
>> +++ b/revision.c
>> @@ -870,7 +870,19 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>>  			}
>>  			parent->next = NULL;
>>  			commit->parents = parent;
>> -			commit->object.flags |= TREESAME;
>> +
>> +			/*
>> +			 * A merge commit is a "diversion" if it is not
>> +			 * TREESAME to its first parent but is TREESAME
>> +			 * to a later parent. In the simplified history,
>> +			 * we "divert" the history walk to the later
>> +			 * parent. These commits are shown when "show_pulls"
>> +			 * is enabled, so do not mark the object as
>> +			 * TREESAME here.
> 
> As we no longer use the word "diversion", this explanation should
> shift the focus from defining the word "diversion" and giving its
> background to explaining why the above parent rewriting is done and
> why the TREESAME marking is conditional, e.g.

Sorry about this. I intended to replace all references to this old
terminology, but failed.

>       			The tree of the merge and of the parent are
>       			the same; from here on, we stop following
>       			histories of all other parents but this one
>       			by culling commit->parents list.  We also
>       			normally mark the merge commit TREESAME as
>       			the merge itself did not introduce any
>       			change relative to the parent, but we
>       			refrain from doing so for the first parent
>       			under "--show-pulls" mode, in order to let
>       			the output phase to show the merge, which is
>       			the last commit before we divert our walk to
>       			a side history.
> 
> or something along that line.

This is pretty good. I plan to take it with very minor modifications in my
next version.

> 
>> +			if (!revs->show_pulls || !nth_parent)
>> +				commit->object.flags |= TREESAME;
>> +
>>  			return;
> 
>> @@ -897,6 +909,10 @@ static void try_to_simplify_commit(struct rev_info *revs, struct commit *commit)
>>  				relevant_change = 1;
>>  			else
>>  				irrelevant_change = 1;
>> +
>> +			if (!nth_parent)
>> +				commit->object.flags |= PULL_MERGE;
> 
> For a three-parent merge that brings in changes to the first parent,
> if the result matches the second parent, we would have returned in
> the previous hunk before having a chance to inspect the third one
> and marking the merge result with PULL_MERGE, but if you swap the
> order of the second and the third parent, the second parent, which
> has different tree from the result, would not return in the previous
> hunk and cause the merge with PULL_MERGE here.  And then when we
> inspect the third parent, the previous hunk's return would kick in.
> So for two merges that merge exactly the same two branches on top of
> exactly the same commit on the mainline, you sometimes mark the
> result with PULL_MERGE and sometimes don't, depending on the order
> of the second and the third parent.
> 
> That feels iffy.  Treating the first parent differently from others
> is what we intend to do with this change, bhut this hunk treats the
> other parents differently depending on the merge order.

It is worth adding a test with an octopus merge to really be clear
about the intended behavior. When describing the concept, I've been
careful to say "TREESAME to the first parent" and "not TREESAME to
a later parent" instead of assuming two parents. In the default
mode, we should not reach here unless we found a TREESAME parent
that was not the first parent, since we stop visiting parents after
we find any TREESAME parent. But this would change for --full-history
or --simplify-merges.

>> @@ -3019,7 +3037,8 @@ static struct commit_list **simplify_one(struct rev_info *revs, struct commit *c
>>  	if (!cnt ||
>>  	    (commit->object.flags & UNINTERESTING) ||
>>  	    !(commit->object.flags & TREESAME) ||
>> -	    (parent = one_relevant_parent(revs, commit->parents)) == NULL)
>> +	    (parent = one_relevant_parent(revs, commit->parents)) == NULL ||
>> +	    (revs->show_pulls && (commit->object.flags & PULL_MERGE)))
>>  		st->simplified = commit;
> 
> ... hence, wouldn't we see different result here ...
> 
>> @@ -3602,6 +3621,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi
>>  			/* drop merges unless we want parenthood */
>>  			if (!want_ancestry(revs))
>>  				return commit_ignore;
>> +
>> +			if (revs->show_pulls && (commit->object.flags & PULL_MERGE))
>> +				return commit_show;
> 
> ... and also here?

If there is a flaw in how I set PULL_MERGE, then hopefully the fix is where
I assign the PULL_MERGE flag and these lines do not need to change.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2020-04-10 21:43 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-08  1:22 [PATCH] revision: --include-diversions adds helpful merges Derrick Stolee via GitGitGadget
2020-04-08  1:30 ` Junio C Hamano
2020-04-08  1:39   ` Derrick Stolee
2020-04-08 15:28     ` Derrick Stolee
2020-04-08 19:46       ` Junio C Hamano
2020-04-08 20:05         ` Jeff King
2020-04-08 20:22           ` Derrick Stolee
2020-04-08 21:35             ` Junio C Hamano
2020-04-08 23:59               ` Derrick Stolee
2020-04-09  0:08                 ` Junio C Hamano
2020-04-09 11:52                   ` Derrick Stolee
2020-04-09 14:28                   ` Philip Oakley
2020-04-09 15:56                     ` Junio C Hamano
2020-04-09 17:20                       ` Derrick Stolee
2020-04-09 18:24                         ` Jeff King
2020-04-09 18:55                           ` Junio C Hamano
2020-04-09 19:21                             ` Jeff King
2020-04-08  2:13 ` brian m. carlson
2020-04-08 18:48 ` Jeff King
2020-04-09  0:01 ` [PATCH v2] " Derrick Stolee via GitGitGadget
2020-04-10 12:19   ` [PATCH v3] revision: --show-pulls " Derrick Stolee via GitGitGadget
2020-04-10 20:06     ` Junio C Hamano
2020-04-10 21:43       ` Derrick Stolee

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).