git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: jnareb@gmail.com, Junio C Hamano <gitster@pobox.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 1/1] commit: don't use generation numbers if not needed
Date: Thu, 30 Aug 2018 05:58:09 -0700 (PDT)	[thread overview]
Message-ID: <efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.28.git.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

In 3afc679b "commit: use generations in paint_down_to_common()",
the queue in paint_down_to_common() was changed to use a priority
order based on generation number before commit date. This served
two purposes:

 1. When generation numbers are present, the walk guarantees
    correct topological relationships, regardless of clock skew in
    commit dates.

 2. It enables short-circuiting the walk when the min_generation
    parameter is added in d7c1ec3e "commit: add short-circuit to
    paint_down_to_common()". This short-circuit helps commands
    like 'git branch --contains' from needing to walk to a merge
    base when we know the result is false.

The commit message for 3afc679b includes the following sentence:

    This change does not affect the number of commits that are
    walked during the execution of paint_down_to_common(), only
    the order that those commits are inspected.

This statement is incorrect. Because it changes the order in which
the commits are inspected, it changes the order they are added to
the queue, and hence can change the number of loops before the
queue_has_nonstale() method returns true.

This change makes a concrete difference depending on the topology
of the commit graph. For instance, computing the merge-base between
consecutive versions of the Linux kernel has no effect for versions
after v4.9, but 'git merge-base v4.8 v4.9' presents a performance
regression:

    v2.18.0: 0.122s
v2.19.0-rc1: 0.547s
       HEAD: 0.127s

To determine that this was simply an ordering issue, I inserted
a counter within the while loop of paint_down_to_common() and
found that the loop runs 167,468 times in v2.18.0 and 635,579
times in v2.19.0-rc1.

The topology of this case can be described in a simplified way
here:

  v4.9
   |  \
   |   \
  v4.8  \
   | \   \
   |  \   |
  ...  A  B
   |  /  /
   | /  /
   |/__/
   C

Here, the "..." means "a very long line of commits". By generation
number, A and B have generation one more than C. However, A and B
have commit date higher than most of the commits reachable from
v4.8. When the walk reaches v4.8, we realize that it has PARENT1
and PARENT2 flags, so everything it can reach is marked as STALE,
including A. B has only the PARENT1 flag, so is not STALE.

When paint_down_to_common() is run using
compare_commits_by_commit_date, A and B are removed from the queue
early and C is inserted into the queue. At this point, C and the
rest of the queue entries are marked as STALE. The loop then
terminates.

When paint_down_to_common() is run using
compare_commits_by_gen_then_commit_date, B is removed from the
queue only after the many commits reachable from v4.8 are explored.
This causes the loop to run longer. The reason for this regression
is simple: the queue order is intended to not explore a commit
until everything that _could_ reach that commit is explored. From
the information gathered by the original ordering, we have no
guarantee that there is not a commit D reachable from v4.8 that
can also reach B. We gained absolute correctness in exchange for
a performance regression.

The performance regression is probably the worse option, since
these incorrect results in paint_down_to_common() are rare. The
topology required for the performance regression are less rare,
but still require multiple merge commits where the parents differ
greatly in generation number. In our example above, the commit A
is as important as the commit B to demonstrate the problem, since
otherwise the commit C will sit in the queue as non-stale just as
long in both orders.

The solution provided uses the min_generation parameter to decide
if we should use generation numbers in our ordering. When
min_generation is equal to zero, it means that the caller has no
known cutoff for the walk, so we should rely on our commit-date
heuristic as before; this is the case with merge_bases_many().
When min_generation is non-zero, then the caller knows a valuable
cutoff for the short-circuit mechanism; this is the case with
remove_redundant() and in_merge_bases_many().

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 1a6e632185..449c1f4920 100644
--- a/commit.c
+++ b/commit.c
@@ -874,6 +874,9 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n,
 	int i;
 	uint32_t last_gen = GENERATION_NUMBER_INFINITY;
 
+	if (!min_generation)
+		queue.compare = compare_commits_by_commit_date;
+
 	one->object.flags |= PARENT1;
 	if (!n) {
 		commit_list_append(one, &result);
@@ -891,7 +894,7 @@ static struct commit_list *paint_down_to_common(struct commit *one, int n,
 		struct commit_list *parents;
 		int flags;
 
-		if (commit->generation > last_gen)
+		if (min_generation && commit->generation > last_gen)
 			BUG("bad generation skip %8x > %8x at %s",
 			    commit->generation, last_gen,
 			    oid_to_hex(&commit->object.oid));
-- 
gitgitgadget

  reply	other threads:[~2018-08-30 12:58 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-30 12:58 [PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base' Derrick Stolee via GitGitGadget
2018-08-30 12:58 ` Derrick Stolee via GitGitGadget [this message]
2018-10-06 16:54   ` [PATCH 1/1] commit: don't use generation numbers if not needed Jakub Narebski
2018-08-30 15:26 ` [PATCH 0/1] v2.19.0-rc1 Performance Regression in 'git merge-base' Junio C Hamano
2018-10-06 16:09 ` Jakub Narebski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=efa3720fb40638e5d61c6130b55e3348d8e4339e.1535633886.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jnareb@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).