git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: "Michael Haggerty" <mhagger@alum.mit.edu>,
	me@ttaylorr.com, peff@peff.net, gitster@pobox.com,
	"René Scharfe" <l.s.r@web.de>,
	"Derrick Stolee" <stolee@gmail.com>,
	"Derrick Stolee" <derrickstolee@github.com>,
	"Derrick Stolee" <dstolee@microsoft.com>
Subject: [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant()
Date: Mon, 01 Feb 2021 12:47:26 +0000	[thread overview]
Message-ID: <83feabeebb5f035059758fba1ca5cf74f3a22c91.1612183647.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.852.v2.git.1612183647.gitgitgadget@gmail.com>

From: Derrick Stolee <dstolee@microsoft.com>

Reachability algorithms in commit-reach.c frequently benefit from using
the first-parent history as a heuristic for satisfying reachability
queries. The most obvious example was implemented in 4fbcca4e
(commit-reach: make can_all_from_reach... linear, 2018-07-20).

Update the walk in remove_redundant() to use this same heuristic. Here,
we are walking starting at the parents of the input commits. Sort those
parents and walk from the highest generation to lower. Each time, use
the heuristic of searching the first parent history before continuing to
expand the walk.

The order in which we explore the commits matters, so update
compare_commits_by_gen to break generation number ties with commit date.
This has no effect when the commits are in a commit-graph file with
corrected commit dates computed, but it will assist when the commits are
in the region "above" the commit-graph with "infinite" generation
number. Note that we cannot shift to use
compare_commits_by_gen_then_commit_date as the method prototype is
different. We use compare_commits_by_gen for QSORT() as opposed to as a
priority function.

The important piece is to ensure we short-circuit the walk when we find
that there is a single non-redundant commit. This happens frequently
when looking for merge-bases or comparing several tags with 'git
merge-base --independent'. Use a new count 'count_still_independent' and
if that hits 1 we can stop walking.

To update 'count_still_independent' properly, we add use of the RESULT
flag on the input commits. Then we can detect when we reach one of these
commits and decrease the count. We need to remove the RESULT flag at
that moment because we might re-visit that commit when popping the
stack.

We use the STALE flag to mark parents that have been added to the new
walk_start list, but we need to clear that flag before we start walking
so those flags don't halt our depth-first-search walk.

On my copy of the Linux kernel repository, the performance of 'git
merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11
seconds.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 72 +++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 56 insertions(+), 16 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 7bf52e94429..d3a6e2bdd04 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -29,6 +29,10 @@ static int compare_commits_by_gen(const void *_a, const void *_b)
 		return -1;
 	if (generation_a > generation_b)
 		return 1;
+	if (a->date < b->date)
+		return -1;
+	if (a->date > b->date)
+		return 1;
 	return 0;
 }
 
@@ -231,11 +235,10 @@ static int remove_redundant_no_gen(struct repository *r,
 static int remove_redundant_with_gen(struct repository *r,
 				     struct commit **array, int cnt)
 {
-	int i, count_non_stale = 0;
+	int i, count_non_stale = 0, count_still_independent = cnt;
 	timestamp_t min_generation = GENERATION_NUMBER_INFINITY;
 	struct commit **walk_start;
 	size_t walk_start_nr = 0, walk_start_alloc = cnt;
-	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 
 	ALLOC_ARRAY(walk_start, walk_start_alloc);
 
@@ -245,6 +248,7 @@ static int remove_redundant_with_gen(struct repository *r,
 		timestamp_t generation;
 
 		repo_parse_commit(r, array[i]);
+		array[i]->object.flags |= RESULT;
 		parents = array[i]->parents;
 
 		while (parents) {
@@ -253,7 +257,6 @@ static int remove_redundant_with_gen(struct repository *r,
 				parents->item->object.flags |= STALE;
 				ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc);
 				walk_start[walk_start_nr++] = parents->item;
-				prio_queue_put(&queue, parents->item);
 			}
 			parents = parents->next;
 		}
@@ -264,26 +267,63 @@ static int remove_redundant_with_gen(struct repository *r,
 			min_generation = generation;
 	}
 
-	/* push the STALE bits up to min generation */
-	while (queue.nr) {
-		struct commit_list *parents;
-		struct commit *c = prio_queue_get(&queue);
+	QSORT(walk_start, walk_start_nr, compare_commits_by_gen);
 
-		repo_parse_commit(r, c);
+	/* remove STALE bit for now to allow walking through parents */
+	for (i = 0; i < walk_start_nr; i++)
+		walk_start[i]->object.flags &= ~STALE;
 
-		if (commit_graph_generation(c) < min_generation)
-			continue;
+	/*
+	 * Start walking from the highest generation. Hopefully, it will
+	 * find all other items during the first-parent walk, and we can
+	 * terminate early. Otherwise, we will do the same amount of work
+	 * as before.
+	 */
+	for (i = walk_start_nr - 1; i >= 0 && count_still_independent > 1; i--) {
+		/* push the STALE bits up to min generation */
+		struct commit_list *stack = NULL;
 
-		parents = c->parents;
-		while (parents) {
-			if (!(parents->item->object.flags & STALE)) {
-				parents->item->object.flags |= STALE;
-				prio_queue_put(&queue, parents->item);
+		commit_list_insert(walk_start[i], &stack);
+		walk_start[i]->object.flags |= STALE;
+
+		while (stack) {
+			struct commit_list *parents;
+			struct commit *c = stack->item;
+
+			repo_parse_commit(r, c);
+
+			if (c->object.flags & RESULT) {
+				c->object.flags &= ~RESULT;
+				if (--count_still_independent <= 1)
+					break;
 			}
-			parents = parents->next;
+
+			if (commit_graph_generation(c) < min_generation) {
+				pop_commit(&stack);
+				continue;
+			}
+
+			parents = c->parents;
+			while (parents) {
+				if (!(parents->item->object.flags & STALE)) {
+					parents->item->object.flags |= STALE;
+					commit_list_insert(parents->item, &stack);
+					break;
+				}
+				parents = parents->next;
+			}
+
+			/* pop if all parents have been visited already */
+			if (!parents)
+				pop_commit(&stack);
 		}
+		free_commit_list(stack);
 	}
 
+	/* clear result */
+	for (i = 0; i < cnt; i++)
+		array[i]->object.flags &= ~RESULT;
+
 	/* rearrange array */
 	for (i = count_non_stale = 0; i < cnt; i++) {
 		if (!(array[i]->object.flags & STALE))
-- 
gitgitgadget


  parent reply	other threads:[~2021-02-01 12:49 UTC|newest]

Thread overview: 36+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-28 16:24 [PATCH 0/3] Speed up remove_redundant() Derrick Stolee via GitGitGadget
2021-01-28 16:24 ` [PATCH 1/3] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
2021-01-28 20:51   ` Junio C Hamano
2021-01-29 17:11     ` René Scharfe
2021-01-31  3:52       ` Derrick Stolee
2021-01-31 10:20         ` René Scharfe
2021-01-31  3:59     ` Derrick Stolee
2021-01-31 20:13       ` Derrick Stolee
2021-01-31 20:25       ` Junio C Hamano
2021-02-01  3:55         ` Derrick Stolee
2021-01-29 17:10   ` René Scharfe
2021-01-28 16:24 ` [PATCH 2/3] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
2021-01-28 16:24 ` [PATCH 3/3] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
2021-01-28 20:20 ` [PATCH 0/3] Speed up remove_redundant() Junio C Hamano
2021-02-01 12:47 ` [PATCH v2 0/5] " Derrick Stolee via GitGitGadget
2021-02-01 12:47   ` [PATCH v2 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
2021-02-01 19:51     ` Junio C Hamano
2021-02-01 12:47   ` [PATCH v2 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-01 16:12     ` René Scharfe.
2021-02-01 16:31       ` Derrick Stolee
2021-02-01 12:47   ` [PATCH v2 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
2021-02-01 12:47   ` Derrick Stolee via GitGitGadget [this message]
2021-02-01 20:05     ` [PATCH v2 4/5] commit-reach: use heuristic in remove_redundant() Junio C Hamano
2021-02-01 21:02       ` Derrick Stolee
2021-02-01 12:47   ` [PATCH v2 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget
2021-02-03 15:59     ` Taylor Blau
2021-02-01 15:48   ` [PATCH v2 0/5] Speed up remove_redundant() Derrick Stolee
2021-02-18 23:25     ` Junio C Hamano
2021-02-19 12:17       ` Derrick Stolee
2021-02-20  3:32         ` Junio C Hamano
2021-02-19 12:34   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 1/5] commit-reach: reduce requirements for remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 2/5] commit-reach: use one walk in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 3/5] commit-reach: move compare_commits_by_gen Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 4/5] commit-reach: use heuristic in remove_redundant() Derrick Stolee via GitGitGadget
2021-02-19 12:34     ` [PATCH v3 5/5] commit-reach: stale commits may prune generation further Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=83feabeebb5f035059758fba1ca5cf74f3a22c91.1612183647.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=me@ttaylorr.com \
    --cc=mhagger@alum.mit.edu \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).