git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: szeder.dev@gmail.com, avarab@gmail.com,
	abhishekkumar8222@gmail.com, me@ttaylorr.com,
	Derrick Stolee <dstolee@microsoft.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 2/2] commit-reach: use fast logic in repo_in_merge_base
Date: Wed, 17 Jun 2020 17:24:29 +0000	[thread overview]
Message-ID: <ead5a0d9bd3b5c4a8c3fde970941c0e4a10025ab.1592414670.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.664.git.1592414670.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

The repo_is_descendant_of() method is aware of the existence of the
commit-graph file. It checks for generation_numbers_enabled() before
deciding on using can_all_from_reach() or repo_in_merge_bases()
depending on the situation. The reason here is that can_all_from_reach()
uses a depth-first search that is limited by the minimum generation
number of the target commits, and that algorithm can be very slow when
generation numbers are not present. The alternative uses
paint_down_to_common() which will walk the entire merge-base boundary,
which is typically slower.

This method is used by commands like "git tag --contains" and "git
branch --contains" for very fast results when a commit-graph file
exists. Unfortunately, it is _not_ used in commands like "git merge-base
--is-ancestor" which is doing an even simpler request.

This issue was raised recently [1] with respect to a change to how
generation numbers are stored, but was also reported much earlier [2]
before commit-reach.c existed to simplify these reachability queries.

[1] https://lore.kernel.org/git/20200607195347.GA8232@szeder.dev/
[2] https://lore.kernel.org/git/87608bawoa.fsf@evledraar.gmail.com/

The root cause is that builtin/merge-base.c has a method
handle_is_ancestor() that calls in_merge_bases(), an older version of
repo_in_merge_bases(). It would be better if we have every caller to
in_merge_bases() use the logic in can_all_from_reach() when possible.

This is where things get a little tricky: repo_is_descendant_of() calls
repo_in_merge_bases() in the non-generation numbers enabled case! If we
simply update repo_in_merge_bases() to call repo_is_descendant_of()
instead of repo_in_merge_bases_many(), then we will get a recursive call
loop. Thankfully, this is caught by the test suite in the default mode
(i.e. GIT_TEST_COMMIT_GRAPH=0).

The trick, then, is to make the non-generation number case for
repo_is_descendant_of() call repo_in_merge_bases_many() directly,
skipping the non-_many version. This allows us to take advantage of this
faster code path, when possible.

The easiest way to measure the performance impact is to test the
following command on the Linux kernel repository:

	git merge-base --is-ancestor <A> <B>

  | A    | B    | Time Before | Time After |
  |------|------|-------------|------------|
  | v3.0 | v5.7 | 0.459s      | 0.028s     |
  | v4.0 | v5.7 | 0.267s      | 0.021s     |
  | v5.0 | v5.7 | 0.074s      | 0.013s     |

Note that each of these samples return success. The old code performed
the same operation when <A> and <B> are swapped. However,
can_all_from_reach() will return immediately if the generation numbers
show that <A> has larger generation number than <B>. Thus, the time for
the swapped case is universally 0.004s in each case.

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 13722430aa5..43e303d5f25 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -303,7 +303,7 @@ static int repo_is_descendant_of(struct repository *r,
 
 			other = with_commit->item;
 			with_commit = with_commit->next;
-			if (repo_in_merge_bases(r, other, commit))
+			if (repo_in_merge_bases_many(r, other, 1, &commit))
 				return 1;
 		}
 		return 0;
@@ -355,7 +355,15 @@ int repo_in_merge_bases(struct repository *r,
 			struct commit *commit,
 			struct commit *reference)
 {
-	return repo_in_merge_bases_many(r, commit, 1, &reference);
+	int res;
+	struct commit_list *list = NULL;
+	struct commit_list **next = &list;
+
+	next = commit_list_append(commit, next);
+	res = repo_is_descendant_of(r, reference, list);
+	free_commit_list(list);
+
+	return res;
 }
 
 struct commit_list *reduce_heads(struct commit_list *heads)
-- 
gitgitgadget

  parent reply	other threads:[~2020-06-17 17:24 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-17 17:24 [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Derrick Stolee via GitGitGadget
2020-06-17 17:24 ` [PATCH 1/2] commit-reach: create repo_is_descendant_of() Derrick Stolee via GitGitGadget
2020-06-29 13:40   ` Taylor Blau
2020-06-30  1:45     ` Derrick Stolee
2020-06-17 17:24 ` Derrick Stolee via GitGitGadget [this message]
2020-06-17 20:46 ` [PATCH 0/2] Accelerate "git merge-base --is-ancestor" Junio C Hamano
2020-06-18  1:37   ` Derrick Stolee
2020-06-19  6:10 ` Abhishek Kumar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ead5a0d9bd3b5c4a8c3fde970941c0e4a10025ab.1592414670.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=abhishekkumar8222@gmail.com \
    --cc=avarab@gmail.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).