git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
	"Jeff King" <peff@peff.net>,
	"Phillip Wood" <phillip.wood123@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Derrick Stolee" <derrickstolee@github.com>
Subject: [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases()
Date: Mon, 20 Mar 2023 11:26:55 +0000	[thread overview]
Message-ID: <87fe9676aec5224d00b786515c822975b2fce9b6.1679311616.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1489.v4.git.1679311615.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

Both 'git for-each-ref --merged=<X>' and 'git branch --merged=<X>' use
the ref-filter machinery to select references or branches (respectively)
that are reachable from a set of commits presented by one or more
--merged arguments. This happens within reach_filter(), which uses the
revision-walk machinery to walk history in a standard way.

However, the commit-reach.c file is full of custom searches that are
more efficient, especially for reachability queries that can terminate
early when reachability is discovered. Add a new
tips_reachable_from_bases() method to commit-reach.c and call it from
within reach_filter() in ref-filter.c. This affects both 'git branch'
and 'git for-each-ref' as tested in p1500-graph-walks.sh.

For the Linux kernel repository, we take an already-fast algorithm and
make it even faster:

Test                                            HEAD~1  HEAD
-------------------------------------------------------------------
1500.5: contains: git for-each-ref --merged     0.13    0.02 -84.6%
1500.6: contains: git branch --merged           0.14    0.02 -85.7%
1500.7: contains: git tag --merged              0.15    0.03 -80.0%

(Note that we remove the iterative 'git rev-list' test from p1500
because it no longer makes sense as a comparison to 'git for-each-ref'
and would just waste time running it for these comparisons.)

The algorithm is implemented in commit-reach.c in the method
tips_reachable_from_base(). This method takes a string_list of tips and
assigns the 'util' for each item with the value 1 if the base commit can
reach those tips.

Like other reachability queries in commit-reach.c, the fastest way to
search for "can A reach B?" is to do a depth-first search up to the
generation number of B, preferring to explore first parents before later
parents. While we must walk all reachable commits up to that generation
number when the answer is "no", the depth-first search can answer "yes"
much faster than other approaches in most cases.

This search becomes trickier when there are multiple targets for the
depth-first search. The commits with lower generation number are more
likely to be within the history of the start commit, but we don't want
to waste time searching commits of low generation number if the commit
target with lowest generation number has already been found.

The trick here is to take the input commits and sort them by generation
number in ascending order. Track the index within this order as
min_generation_index. When we find a commit, if its index in the list is
equal to min_generation_index, then we can increase the generation
number boundary of our search to the next-lowest value in the list.

With this mechanism, the number of commits to search is minimized with
respect to the depth-first search heuristic. We will walk all commits up
to the minimum generation number of a commit that is _not_ reachable
from the start, but we will walk only the necessary portion of the
depth-first search for the reachable commits of lower generation.

Add extra tests for this behavior in t6600-test-reach.sh as the
interesting data shape of that repository can sometimes demonstrate
corner case bugs.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c              | 113 ++++++++++++++++++++++++++++++++++++
 commit-reach.h              |   9 +++
 ref-filter.c                |  20 ++-----
 t/perf/p1500-graph-walks.sh |  15 +++--
 t/t6600-test-reach.sh       |  83 ++++++++++++++++++++++++++
 5 files changed, 219 insertions(+), 21 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index cd990dce16a..c1edeb46106 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -1044,3 +1044,116 @@ void ahead_behind(struct repository *r,
 	clear_bit_arrays(&bit_arrays);
 	clear_prio_queue(&queue);
 }
+
+struct commit_and_index {
+	struct commit *commit;
+	unsigned int index;
+	timestamp_t generation;
+};
+
+static int compare_commit_and_index_by_generation(const void *va, const void *vb)
+{
+	const struct commit_and_index *a = (const struct commit_and_index *)va;
+	const struct commit_and_index *b = (const struct commit_and_index *)vb;
+
+	if (a->generation > b->generation)
+		return 1;
+	if (a->generation < b->generation)
+		return -1;
+	return 0;
+}
+
+void tips_reachable_from_bases(struct repository *r,
+			       struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark)
+{
+	struct commit_and_index *commits;
+	size_t min_generation_index = 0;
+	timestamp_t min_generation;
+	struct commit_list *stack = NULL;
+
+	if (!bases || !tips || !tips_nr)
+		return;
+
+	/*
+	 * Do a depth-first search starting at 'bases' to search for the
+	 * tips. Stop at the lowest (un-found) generation number. When
+	 * finding the lowest commit, increase the minimum generation
+	 * number to the next lowest (un-found) generation number.
+	 */
+
+	CALLOC_ARRAY(commits, tips_nr);
+
+	for (size_t i = 0; i < tips_nr; i++) {
+		commits[i].commit = tips[i];
+		commits[i].index = i;
+		commits[i].generation = commit_graph_generation(tips[i]);
+	}
+
+	/* Sort with generation number ascending. */
+	QSORT(commits, tips_nr, compare_commit_and_index_by_generation);
+	min_generation = commits[0].generation;
+
+	while (bases) {
+		repo_parse_commit(r, bases->item);
+		commit_list_insert(bases->item, &stack);
+		bases = bases->next;
+	}
+
+	while (stack) {
+		int explored_all_parents = 1;
+		struct commit_list *p;
+		struct commit *c = stack->item;
+		timestamp_t c_gen = commit_graph_generation(c);
+
+		/* Does it match any of our tips? */
+		for (size_t j = min_generation_index; j < tips_nr; j++) {
+			if (c_gen < commits[j].generation)
+				break;
+
+			if (commits[j].commit == c) {
+				tips[commits[j].index]->object.flags |= mark;
+
+				if (j == min_generation_index) {
+					unsigned int k = j + 1;
+					while (k < tips_nr &&
+					       (tips[commits[k].index]->object.flags & mark))
+						k++;
+
+					/* Terminate early if all found. */
+					if (k >= tips_nr)
+						goto done;
+
+					min_generation_index = k;
+					min_generation = commits[k].generation;
+				}
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			repo_parse_commit(r, p->item);
+
+			/* Have we already explored this parent? */
+			if (p->item->object.flags & SEEN)
+				continue;
+
+			/* Is it below the current minimum generation? */
+			if (commit_graph_generation(p->item) < min_generation)
+				continue;
+
+			/* Ok, we will explore from here on. */
+			p->item->object.flags |= SEEN;
+			explored_all_parents = 0;
+			commit_list_insert(p->item, &stack);
+			break;
+		}
+
+		if (explored_all_parents)
+			pop_commit(&stack);
+	}
+
+done:
+	free(commits);
+	repo_clear_commit_marks(r, SEEN);
+}
diff --git a/commit-reach.h b/commit-reach.h
index f708c46e523..d6321ae700e 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -135,4 +135,13 @@ void ahead_behind(struct repository *r,
 		  struct commit **commits, size_t commits_nr,
 		  struct ahead_behind_count *counts, size_t counts_nr);
 
+/*
+ * For all tip commits, add 'mark' to their flags if and only if they
+ * are reachable from one of the commits in 'bases'.
+ */
+void tips_reachable_from_bases(struct repository *r,
+			       struct commit_list *bases,
+			       struct commit **tips, size_t tips_nr,
+			       int mark);
+
 #endif
diff --git a/ref-filter.c b/ref-filter.c
index 62135f649ec..c724ff94113 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2396,33 +2396,22 @@ static void reach_filter(struct ref_array *array,
 			 struct commit_list *check_reachable,
 			 int include_reached)
 {
-	struct rev_info revs;
 	int i, old_nr;
 	struct commit **to_clear;
-	struct commit_list *cr;
 
 	if (!check_reachable)
 		return;
 
 	CALLOC_ARRAY(to_clear, array->nr);
-
-	repo_init_revisions(the_repository, &revs, NULL);
-
 	for (i = 0; i < array->nr; i++) {
 		struct ref_array_item *item = array->items[i];
-		add_pending_object(&revs, &item->commit->object, item->refname);
 		to_clear[i] = item->commit;
 	}
 
-	for (cr = check_reachable; cr; cr = cr->next) {
-		struct commit *merge_commit = cr->item;
-		merge_commit->object.flags |= UNINTERESTING;
-		add_pending_object(&revs, &merge_commit->object, "");
-	}
-
-	revs.limited = 1;
-	if (prepare_revision_walk(&revs))
-		die(_("revision walk setup failed"));
+	tips_reachable_from_bases(the_repository,
+				  check_reachable,
+				  to_clear, array->nr,
+				  UNINTERESTING);
 
 	old_nr = array->nr;
 	array->nr = 0;
@@ -2446,7 +2435,6 @@ static void reach_filter(struct ref_array *array,
 		clear_commit_marks(merge_commit, ALL_REV_FLAGS);
 	}
 
-	release_revisions(&revs);
 	free(to_clear);
 }
 
diff --git a/t/perf/p1500-graph-walks.sh b/t/perf/p1500-graph-walks.sh
index 439a448c2e6..e14e7620cce 100755
--- a/t/perf/p1500-graph-walks.sh
+++ b/t/perf/p1500-graph-walks.sh
@@ -35,11 +35,16 @@ test_perf 'ahead-behind counts: git tag' '
 	xargs git tag -l --format="%(ahead-behind:HEAD)" <tags
 '
 
-test_perf 'ahead-behind counts: git rev-list' '
-	for r in $(cat refs)
-	do
-		git rev-list --count "HEAD..$r" || return 1
-	done
+test_perf 'contains: git for-each-ref --merged' '
+	git for-each-ref --merged=HEAD --stdin <refs
+'
+
+test_perf 'contains: git branch --merged' '
+	xargs git branch --merged=HEAD <branches
+'
+
+test_perf 'contains: git tag --merged' '
+	xargs git tag --merged=HEAD <tags
 '
 
 test_done
diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 0cb50797ef7..b330945f497 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -529,4 +529,87 @@ test_expect_success 'for-each-ref ahead-behind:none' '
 		--format="%(refname) %(ahead-behind:commit-8-4)" --stdin
 '
 
+test_expect_success 'for-each-ref merged:linear' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	refs/heads/commit-2-1
+	refs/heads/commit-5-1
+	refs/heads/commit-9-1
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-1-3
+	refs/heads/commit-1-5
+	refs/heads/commit-1-8
+	EOF
+	run_all_modes git for-each-ref --merged=commit-1-9 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:all' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-2-4
+	refs/heads/commit-4-2
+	refs/heads/commit-4-4
+	EOF
+	run_all_modes git for-each-ref --merged=commit-5-5 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref ahead-behind:some' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref --merged=commit-9-6 \
+		--format="%(refname)" --stdin
+'
+
+test_expect_success 'for-each-ref merged:some, multibase' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-5-3
+	refs/heads/commit-7-8
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	cat >expect <<-\EOF &&
+	refs/heads/commit-1-1
+	refs/heads/commit-4-8
+	refs/heads/commit-5-3
+	EOF
+	run_all_modes git for-each-ref \
+		--merged=commit-5-8 \
+		--merged=commit-8-5 \
+		--format="%(refname)" \
+		--stdin
+'
+
+test_expect_success 'for-each-ref merged:none' '
+	cat >input <<-\EOF &&
+	refs/heads/commit-7-5
+	refs/heads/commit-4-8
+	refs/heads/commit-9-9
+	EOF
+	>expect &&
+	run_all_modes git for-each-ref --merged=commit-8-4 \
+		--format="%(refname)" --stdin
+'
+
 test_done
-- 
gitgitgadget

      parent reply	other threads:[~2023-03-20 11:27 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
2023-03-06 18:48   ` Junio C Hamano
2023-03-07  0:40     ` Taylor Blau
2023-03-08 22:14       ` Derrick Stolee
2023-03-08 22:56         ` Junio C Hamano
2023-03-06 14:06 ` [PATCH 2/8] ahead-behind: parse tip references Derrick Stolee via GitGitGadget
2023-03-07  0:43   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 3/8] ahead-behind: implement --ignore-missing option Derrick Stolee via GitGitGadget
2023-03-07  0:46   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 4/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 5/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-06 18:52   ` Junio C Hamano
2023-03-07  0:50     ` Taylor Blau
2023-03-06 14:06 ` [PATCH 7/8] ahead-behind: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-07  1:05   ` Taylor Blau
2023-03-09 17:32     ` Derrick Stolee
2023-03-06 14:06 ` [PATCH 8/8] ahead-behind: add --contains mode Derrick Stolee via GitGitGadget
2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
2023-03-06 20:18   ` Derrick Stolee
2023-03-06 22:24     ` Junio C Hamano
2023-03-07  0:36   ` Taylor Blau
2023-03-09  9:20     ` Jeff King
2023-03-09 21:51       ` Junio C Hamano
2023-03-07  0:33 ` Taylor Blau
2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-10 18:08     ` Junio C Hamano
2023-03-13 10:31     ` Phillip Wood
2023-03-13 13:33       ` Derrick Stolee
2023-03-13 21:10         ` Taylor Blau
2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason
2023-03-15 17:17       ` Jeff King
2023-03-15 17:49     ` Jeff King
2023-03-15 19:24       ` Junio C Hamano
2023-03-15 19:44         ` Jeff King
2023-03-10 17:20   ` [PATCH v2 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-10 17:21   ` [PATCH v2 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-10 17:21   ` [PATCH v2 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-15 13:50     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:03       ` Junio C Hamano
2023-03-15 16:13         ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-10 19:09     ` Junio C Hamano
2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:01       ` Junio C Hamano
2023-03-15 16:12         ` Derrick Stolee
2023-03-15 16:11       ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-15 14:13     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:17       ` Derrick Stolee
2023-03-15 16:18         ` Derrick Stolee
2023-03-10 19:16   ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Junio C Hamano
2023-03-10 19:25     ` Derrick Stolee
2023-03-15 17:31       ` Jeff King
2023-03-15 17:44         ` Derrick Stolee
2023-03-15 19:34         ` Junio C Hamano
2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
2023-03-15 13:54     ` Derrick Stolee
2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-15 18:06       ` Jeff King
2023-03-15 19:14         ` Junio C Hamano
2023-03-15 22:41       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-15 22:49       ` Jonathan Tan
2023-03-17 18:30         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-15 22:58       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-15 23:28       ` Jonathan Tan
2023-03-17 18:44         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 1/9] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 2/9] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 3/9] commit-graph: refactor compute_topological_levels() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 5/9] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-20 20:40         ` Jonathan Tan
2023-03-20 11:26       ` [PATCH v4 8/9] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` Derrick Stolee via GitGitGadget [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fe9676aec5224d00b786515c822975b2fce9b6.1679311616.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=avarab@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=phillip.wood123@gmail.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).