git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, me@ttaylorr.com, vdye@github.com,
	Jeff King <peff@peff.net>,
	Derrick Stolee <derrickstolee@github.com>,
	Derrick Stolee <derrickstolee@github.com>
Subject: [PATCH v2 6/8] commit-reach: implement ahead_behind() logic
Date: Fri, 10 Mar 2023 17:21:01 +0000	[thread overview]
Message-ID: <9831c23eadbcd990ca09628e5846056e4879ee3d.1678468864.git.gitgitgadget@gmail.com> (raw)
In-Reply-To: <pull.1489.v2.git.1678468863.gitgitgadget@gmail.com>

From: Derrick Stolee <derrickstolee@github.com>

Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.

The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array, using the
new ahead_behind_count struct, indicates which commits from that initial
array form the base/tip pair for the ahead/behind count it will store.

This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.

This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.

It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.

Now, let's discuss the ahead/behind counting algorithm.

Each commit in the input commit list is associated with a bit position
indicating "the ith commit can reach this commit". Each of these commits
is associated with a bitmap with its position flipped on and then
placed in a queue for walking commit history. We walk commits by popping
the commit with maximum generation number out of the queue, guaranteeing
that we will never walk a child of that commit in any future steps.

As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.

The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count.  This increment is the reason we _absolutely need_ to
walk commits at most once.

The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 commit-reach.c | 96 ++++++++++++++++++++++++++++++++++++++++++++++++++
 commit-reach.h | 30 ++++++++++++++++
 2 files changed, 126 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index 2e33c599a82..338ca8084b2 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -8,6 +8,7 @@
 #include "revision.h"
 #include "tag.h"
 #include "commit-reach.h"
+#include "ewah/ewok.h"
 
 /* Remember to update object flag allocation in object.h */
 #define PARENT1		(1u<<16)
@@ -941,3 +942,98 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 
 	return found_commits;
 }
+
+define_commit_slab(bit_arrays, struct bitmap *);
+static struct bit_arrays bit_arrays;
+
+static void insert_no_dup(struct prio_queue *queue, struct commit *c)
+{
+	if (c->object.flags & PARENT2)
+		return;
+	prio_queue_put(queue, c);
+	c->object.flags |= PARENT2;
+}
+
+static struct bitmap *init_bit_array(struct commit *c, int width)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		*bitmap = bitmap_word_alloc(width);
+	return *bitmap;
+}
+
+static void free_bit_array(struct commit *c)
+{
+	struct bitmap **bitmap = bit_arrays_at(&bit_arrays, c);
+	if (!*bitmap)
+		return;
+	bitmap_free(*bitmap);
+	*bitmap = NULL;
+}
+
+void ahead_behind(struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr)
+{
+	struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+	size_t width = (commits_nr + BITS_IN_EWORD - 1) / BITS_IN_EWORD;
+	size_t i;
+
+	if (!commits_nr || !counts_nr)
+		return;
+
+	for (i = 0; i < counts_nr; i++) {
+		counts[i].ahead = 0;
+		counts[i].behind = 0;
+	}
+
+	ensure_generations_valid(commits, commits_nr);
+
+	init_bit_arrays(&bit_arrays);
+
+	for (i = 0; i < commits_nr; i++) {
+		struct commit *c = commits[i];
+		struct bitmap *bitmap = init_bit_array(c, width);
+
+		bitmap_set(bitmap, i);
+		insert_no_dup(&queue, c);
+	}
+
+	while (queue_has_nonstale(&queue)) {
+		struct commit *c = prio_queue_get(&queue);
+		struct commit_list *p;
+		struct bitmap *bitmap_c = init_bit_array(c, width);
+
+		for (i = 0; i < counts_nr; i++) {
+			int reach_from_tip = !!bitmap_get(bitmap_c, counts[i].tip_index);
+			int reach_from_base = !!bitmap_get(bitmap_c, counts[i].base_index);
+
+			if (reach_from_tip ^ reach_from_base) {
+				if (reach_from_base)
+					counts[i].behind++;
+				else
+					counts[i].ahead++;
+			}
+		}
+
+		for (p = c->parents; p; p = p->next) {
+			struct bitmap *bitmap_p;
+
+			parse_commit(p->item);
+
+			bitmap_p = init_bit_array(p->item, width);
+			bitmap_or(bitmap_p, bitmap_c);
+
+			if (bitmap_popcount(bitmap_p) == commits_nr)
+				p->item->object.flags |= STALE;
+
+			insert_no_dup(&queue, p->item);
+		}
+
+		free_bit_array(c);
+	}
+
+	/* STALE is used here, PARENT2 is used by insert_no_dup(). */
+	repo_clear_commit_marks(the_repository, PARENT2 | STALE);
+	clear_bit_arrays(&bit_arrays);
+	clear_prio_queue(&queue);
+}
diff --git a/commit-reach.h b/commit-reach.h
index 148b56fea50..f871b5dcce9 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -104,4 +104,34 @@ struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
 					 struct commit **to, int nr_to,
 					 unsigned int reachable_flag);
 
+struct ahead_behind_count {
+	/**
+	 * As input, the *_index members indicate which positions in
+	 * the 'tips' array correspond to the tip and base of this
+	 * comparison.
+	 */
+	size_t tip_index;
+	size_t base_index;
+
+	/**
+	 * These values store the computed counts for each side of the
+	 * symmetric difference:
+	 *
+	 * 'ahead' stores the number of commits reachable from the tip
+	 * and not reachable from the base.
+	 *
+	 * 'behind' stores the number of commits reachable from the base
+	 * and not reachable from the tip.
+	 */
+	unsigned int ahead;
+	unsigned int behind;
+};
+
+/*
+ * Given an array of commits and an array of ahead_behind_count pairs,
+ * compute the ahead/behind counts for each pair.
+ */
+void ahead_behind(struct commit **commits, size_t commits_nr,
+		  struct ahead_behind_count *counts, size_t counts_nr);
+
 #endif
-- 
gitgitgadget


  parent reply	other threads:[~2023-03-10 17:21 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06 14:06 [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 1/8] ahead-behind: create empty builtin Derrick Stolee via GitGitGadget
2023-03-06 18:48   ` Junio C Hamano
2023-03-07  0:40     ` Taylor Blau
2023-03-08 22:14       ` Derrick Stolee
2023-03-08 22:56         ` Junio C Hamano
2023-03-06 14:06 ` [PATCH 2/8] ahead-behind: parse tip references Derrick Stolee via GitGitGadget
2023-03-07  0:43   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 3/8] ahead-behind: implement --ignore-missing option Derrick Stolee via GitGitGadget
2023-03-07  0:46   ` Taylor Blau
2023-03-06 14:06 ` [PATCH 4/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 5/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-06 14:06 ` [PATCH 6/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-06 18:52   ` Junio C Hamano
2023-03-07  0:50     ` Taylor Blau
2023-03-06 14:06 ` [PATCH 7/8] ahead-behind: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-07  1:05   ` Taylor Blau
2023-03-09 17:32     ` Derrick Stolee
2023-03-06 14:06 ` [PATCH 8/8] ahead-behind: add --contains mode Derrick Stolee via GitGitGadget
2023-03-06 18:26 ` [PATCH 0/8] ahead-behind: new builtin for counting multiple commit ranges Junio C Hamano
2023-03-06 20:18   ` Derrick Stolee
2023-03-06 22:24     ` Junio C Hamano
2023-03-07  0:36   ` Taylor Blau
2023-03-09  9:20     ` Jeff King
2023-03-09 21:51       ` Junio C Hamano
2023-03-07  0:33 ` Taylor Blau
2023-03-10 17:20 ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-10 18:08     ` Junio C Hamano
2023-03-13 10:31     ` Phillip Wood
2023-03-13 13:33       ` Derrick Stolee
2023-03-13 21:10         ` Taylor Blau
2023-03-15 13:37     ` Ævar Arnfjörð Bjarmason
2023-03-15 17:17       ` Jeff King
2023-03-15 17:49     ` Jeff King
2023-03-15 19:24       ` Junio C Hamano
2023-03-15 19:44         ` Jeff King
2023-03-10 17:20   ` [PATCH v2 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-10 17:20   ` [PATCH v2 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-10 17:21   ` [PATCH v2 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-10 17:21   ` Derrick Stolee via GitGitGadget [this message]
2023-03-15 13:50     ` [PATCH v2 6/8] commit-reach: implement ahead_behind() logic Ævar Arnfjörð Bjarmason
2023-03-15 16:03       ` Junio C Hamano
2023-03-15 16:13         ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-10 19:09     ` Junio C Hamano
2023-03-15 13:57     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:01       ` Junio C Hamano
2023-03-15 16:12         ` Derrick Stolee
2023-03-15 16:11       ` Derrick Stolee
2023-03-10 17:21   ` [PATCH v2 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-15 14:13     ` Ævar Arnfjörð Bjarmason
2023-03-15 16:17       ` Derrick Stolee
2023-03-15 16:18         ` Derrick Stolee
2023-03-10 19:16   ` [PATCH v2 0/8] ref-filter: ahead/behind counting, faster --merged option Junio C Hamano
2023-03-10 19:25     ` Derrick Stolee
2023-03-15 17:31       ` Jeff King
2023-03-15 17:44         ` Derrick Stolee
2023-03-15 19:34         ` Junio C Hamano
2023-03-15 13:22   ` Ævar Arnfjörð Bjarmason
2023-03-15 13:54     ` Derrick Stolee
2023-03-15 17:45   ` [PATCH v3 " Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 1/8] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-15 18:06       ` Jeff King
2023-03-15 19:14         ` Junio C Hamano
2023-03-15 22:41       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 2/8] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 3/8] commit-graph: combine generation computations Derrick Stolee via GitGitGadget
2023-03-15 22:49       ` Jonathan Tan
2023-03-17 18:30         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 4/8] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-15 22:58       ` Jonathan Tan
2023-03-15 17:45     ` [PATCH v3 5/8] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 6/8] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-15 23:28       ` Jonathan Tan
2023-03-17 18:44         ` Derrick Stolee
2023-03-15 17:45     ` [PATCH v3 7/8] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-15 17:45     ` [PATCH v3 8/8] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget
2023-03-20 11:26     ` [PATCH v4 0/9] ref-filter: ahead/behind counting, faster --merged option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 1/9] for-each-ref: add --stdin option Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 2/9] for-each-ref: explicitly test no matches Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 3/9] commit-graph: refactor compute_topological_levels() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 4/9] commit-graph: simplify compute_generation_numbers() Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 5/9] commit-graph: return generation from memory Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 6/9] commit-graph: introduce `ensure_generations_valid()` Taylor Blau via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 7/9] commit-reach: implement ahead_behind() logic Derrick Stolee via GitGitGadget
2023-03-20 20:40         ` Jonathan Tan
2023-03-20 11:26       ` [PATCH v4 8/9] for-each-ref: add ahead-behind format atom Derrick Stolee via GitGitGadget
2023-03-20 11:26       ` [PATCH v4 9/9] commit-reach: add tips_reachable_from_bases() Derrick Stolee via GitGitGadget

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9831c23eadbcd990ca09628e5846056e4879ee3d.1678468864.git.gitgitgadget@gmail.com \
    --to=gitgitgadget@gmail.com \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).