git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: git@vger.kernel.org
Cc: Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: [PATCH 23/24] pseudo-merge: implement support for finding existing merges
Date: Wed, 20 Mar 2024 18:06:07 -0400	[thread overview]
Message-ID: <4ae4f0eaae5ebe9495968e8585f4b2692d2cbec2.1710972293.git.me@ttaylorr.com> (raw)
In-Reply-To: <cover.1710972293.git.me@ttaylorr.com>

This patch implements support for reusing existing pseudo-merge commits
when writing bitmaps when there is an existing pseudo-merge bitmap which
has exactly the same set of parents as one that we are about to write.

Note that unstable pseudo-merges are likely to change between
consecutive repacks, and so are generally poor candidates for reuse.
However, stable pseudo-merges (see the configuration option
'bitmapPseudoMerge.<name>.stableThreshold') are by definition unlikely
to change between runs (as they represent long-running branches).

Because there is no index from a *set* of pseudo-merge parents to a
matching pseudo-merge bitmap, we have to construct the bitmap
corresponding to the set of parents for each pending pseudo-merge commit
and see if a matching bitmap exists.

This is technically quadratic in the number of pseudo-merges, but is OK
in practice for a couple of reasons:

  - non-matching pseudo-merge bitmaps are rejected quickly as soon as
    they differ in a single bit

  - already-matched pseudo-merge bitmaps are discarded from subsequent
    rounds of search

  - the number of pseudo-merges is generally small, even for large
    repositories

In order to do this, implement (a) a function that finds a matching
pseudo-merge given some uncompressed bitset describing its parents, (b)
a function that computes the bitset of parents for a given pseudo-merge
commit, and (c) call that function before computing the set of reachable
objects for some pending pseudo-merge.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c             | 15 ++++++--
 pack-bitmap.c                   | 32 +++++++++++++++++
 pack-bitmap.h                   |  2 ++
 pseudo-merge.c                  | 55 ++++++++++++++++++++++++++++
 pseudo-merge.h                  |  7 ++++
 t/t5333-pseudo-merge-bitmaps.sh | 64 +++++++++++++++++++++++++++++++++
 6 files changed, 173 insertions(+), 2 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 2d1b202fcd9..fdd84d31a68 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -19,6 +19,10 @@
 #include "tree-walk.h"
 #include "pseudo-merge.h"
 #include "oid-array.h"
+#include "config.h"
+#include "alloc.h"
+#include "refs.h"
+#include "strmap.h"
 
 struct bitmapped_commit {
 	struct commit *commit;
@@ -443,6 +447,7 @@ static int fill_bitmap_tree(struct bitmap *bitmap,
 }
 
 static int reused_bitmaps_nr;
+static int reused_pseudo_merge_bitmaps_nr;
 
 static int fill_bitmap_commit(struct bb_commit *ent,
 			      struct commit *commit,
@@ -467,7 +472,7 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			struct bitmap *remapped = bitmap_new();
 
 			if (commit->object.flags & BITMAP_PSEUDO_MERGE)
-				old = NULL;
+				old = pseudo_merge_bitmap_for_commit(old_bitmap, c);
 			else
 				old = bitmap_for_commit(old_bitmap, c);
 			/*
@@ -478,7 +483,10 @@ static int fill_bitmap_commit(struct bb_commit *ent,
 			if (old && !rebuild_bitmap(mapping, old, remapped)) {
 				bitmap_or(ent->bitmap, remapped);
 				bitmap_free(remapped);
-				reused_bitmaps_nr++;
+				if (commit->object.flags & BITMAP_PSEUDO_MERGE)
+					reused_pseudo_merge_bitmaps_nr++;
+				else
+					reused_bitmaps_nr++;
 				continue;
 			}
 			bitmap_free(remapped);
@@ -604,6 +612,9 @@ int bitmap_writer_build(struct packing_data *to_pack)
 			    the_repository);
 	trace2_data_intmax("pack-bitmap-write", the_repository,
 			   "building_bitmaps_reused", reused_bitmaps_nr);
+	trace2_data_intmax("pack-bitmap-write", the_repository,
+			   "building_bitmaps_pseudo_merge_reused",
+			   reused_pseudo_merge_bitmaps_nr);
 
 	stop_progress(&writer.progress);
 
diff --git a/pack-bitmap.c b/pack-bitmap.c
index be65f637cf5..5a5f8b7e69f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1316,6 +1316,37 @@ static struct bitmap *find_boundary_objects(struct bitmap_index *bitmap_git,
 	return cb.base;
 }
 
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit)
+{
+	struct commit_list *p;
+	struct bitmap *parents;
+	struct pseudo_merge *match = NULL;
+
+	if (!bitmap_git->pseudo_merges.nr)
+		return NULL;
+
+	parents = bitmap_new();
+
+	for (p = commit->parents; p; p = p->next) {
+		int pos = bitmap_position(bitmap_git, &p->item->object.oid);
+		if (pos < 0 || pos >= bitmap_num_objects(bitmap_git))
+			goto done;
+
+		bitmap_set(parents, pos);
+	}
+
+	match = pseudo_merge_for_parents(&bitmap_git->pseudo_merges,
+						parents);
+
+done:
+	bitmap_free(parents);
+	if (match)
+		return pseudo_merge_bitmap(&bitmap_git->pseudo_merges, match);
+
+	return NULL;
+}
+
 static void unsatisfy_all_pseudo_merges(struct bitmap_index *bitmap_git)
 {
 	uint32_t i;
@@ -2808,6 +2839,7 @@ void free_bitmap_index(struct bitmap_index *b)
 		 */
 		close_midx_revindex(b->midx);
 	}
+	free_pseudo_merge_map(&b->pseudo_merges);
 	free(b);
 }
 
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 25d3b8e604a..0fefef39bec 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -119,6 +119,8 @@ int rebuild_bitmap(const uint32_t *reposition,
 		   struct bitmap *dest);
 struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
+struct ewah_bitmap *pseudo_merge_bitmap_for_commit(struct bitmap_index *bitmap_git,
+						   struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 				  unsigned int indexed_commits_nr);
 int bitmap_writer_build(struct packing_data *to_pack);
diff --git a/pseudo-merge.c b/pseudo-merge.c
index e111c9cd1a6..9e21fbb5062 100644
--- a/pseudo-merge.c
+++ b/pseudo-merge.c
@@ -682,3 +682,58 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 
 	return ret;
 }
+
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents)
+{
+	struct pseudo_merge *match = NULL;
+	size_t i;
+
+	if (!pm->nr)
+		return NULL;
+
+	/*
+	 * NOTE: this loop is quadratic in the worst-case (where no
+	 * matching pseudo-merge bitmaps are found), but in practice
+	 * this is OK for a few reasons:
+	 *
+	 *   - Rejecting pseudo-merge bitmaps that do not match the
+	 *     given commit is done quickly (i.e. `bitmap_equals_ewah()`
+	 *     returns early when we know the two bitmaps aren't equal.
+	 *
+	 *   - Already matched pseudo-merge bitmaps (which we track with
+	 *     the `->satisfied` bit here) are skipped as potential
+	 *     candidates.
+	 *
+	 *   - The number of pseudo-merges should be small (in the
+	 *     hundreds for most repositories).
+	 *
+	 * If in the future this semi-quadratic behavior does become a
+	 * problem, another approach would be to keep track of which
+	 * pseudo-merges are still "viable" after enumerating the
+	 * pseudo-merge commit's parents:
+	 *
+	 *   - A pseudo-merge bitmap becomes non-viable when the bit(s)
+	 *     corresponding to one or more parent(s) of the given
+	 *     commit are not set in a candidate pseudo-merge's commits
+	 *     bitmap.
+	 *
+	 *   - After processing all bits, enumerate the remaining set of
+	 *     viable pseudo-merge bitmaps, and check that their
+	 *     popcount() matches the number of parents in the given
+	 *     commit.
+	 */
+	for (i = 0; i < pm->nr; i++) {
+		struct pseudo_merge *candidate = use_pseudo_merge(pm, &pm->v[i]);
+		if (!candidate || candidate->satisfied)
+			continue;
+		if (!bitmap_equals_ewah(parents, candidate->commits))
+			continue;
+
+		match = candidate;
+		match->satisfied = 1;
+		break;
+	}
+
+	return match;
+}
diff --git a/pseudo-merge.h b/pseudo-merge.h
index cc14e947e86..33acd00a3e5 100644
--- a/pseudo-merge.h
+++ b/pseudo-merge.h
@@ -208,4 +208,11 @@ int cascade_pseudo_merges(const struct pseudo_merge_map *pm,
 			  struct bitmap *result,
 			  struct bitmap *roots);
 
+/*
+ * Returns a pseudo-merge which contains the exact set of commits
+ * listed in the "parents" bitamp, or NULL if none could be found.
+ */
+struct pseudo_merge *pseudo_merge_for_parents(const struct pseudo_merge_map *pm,
+					      struct bitmap *parents);
+
 #endif
diff --git a/t/t5333-pseudo-merge-bitmaps.sh b/t/t5333-pseudo-merge-bitmaps.sh
index 909c17e301e..531f1924af4 100755
--- a/t/t5333-pseudo-merge-bitmaps.sh
+++ b/t/t5333-pseudo-merge-bitmaps.sh
@@ -22,6 +22,10 @@ test_pseudo_merges_cascades () {
 	test_trace2_data bitmap pseudo_merges_cascades "$1"
 }
 
+test_pseudo_merges_reused () {
+	test_trace2_data pack-bitmap-write building_bitmaps_pseudo_merge_reused "$1"
+}
+
 tag_everything () {
 	git rev-list --all --no-object-names >in &&
 	perl -lne '
@@ -322,4 +326,64 @@ test_expect_success 'pseudo-merge overlap stale traversal' '
 	)
 '
 
+test_expect_success 'pseudo-merge reuse' '
+	git init pseudo-merge-reuse &&
+	(
+		cd pseudo-merge-reuse &&
+
+		stable="1641013200" && # 2022-01-01
+		unstable="1672549200" && # 2023-01-01
+
+		for date in $stable $unstable
+		do
+			test_commit_bulk --date "$date +0000" 128 &&
+			test_tick || return 1
+		done &&
+
+		tag_everything &&
+
+		git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=1 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 2 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.before &&
+		test_pseudo_merge_commits 1 >unstable-oids.before &&
+
+		: >trace2.txt &&
+		GIT_TRACE2_EVENT=$PWD/trace2.txt git \
+			-c bitmapPseudoMerge.test.pattern="refs/tags/" \
+			-c bitmapPseudoMerge.test.maxMerges=2 \
+			-c bitmapPseudoMerge.test.threshold=now \
+			-c bitmapPseudoMerge.test.stableThreshold=$(($unstable - 1)) \
+			-c bitmapPseudoMerge.test.stableSize=512 \
+			repack -adb &&
+
+		test_pseudo_merges_reused 1 <trace2.txt &&
+
+		test_pseudo_merges >merges &&
+		test_line_count = 3 merges &&
+
+		test_pseudo_merge_commits 0 >stable-oids.after &&
+		for i in 1 2
+		do
+			test_pseudo_merge_commits $i || return 1
+		done >unstable-oids.after &&
+
+		sort -u <stable-oids.before >expect &&
+		sort -u <stable-oids.after >actual &&
+		test_cmp expect actual &&
+
+		sort -u <unstable-oids.before >expect &&
+		sort -u <unstable-oids.after >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
-- 
2.44.0.303.g1dc5e5b124c



  parent reply	other threads:[~2024-03-20 22:07 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-20 22:04 [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Taylor Blau
2024-03-20 22:05 ` [PATCH 01/24] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-03-21 21:24   ` Junio C Hamano
2024-03-21 22:13     ` Taylor Blau
2024-03-21 22:22       ` Junio C Hamano
2024-03-20 22:05 ` [PATCH 02/24] config: repo_config_get_expiry() Taylor Blau
2024-04-10 17:54   ` Jeff King
2024-04-29 19:39     ` Taylor Blau
2024-03-20 22:05 ` [PATCH 03/24] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-04-10 18:05   ` Jeff King
2024-04-29 19:47     ` Taylor Blau
2024-03-20 22:05 ` [PATCH 04/24] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
2024-04-10 18:06   ` Jeff King
2024-03-20 22:05 ` [PATCH 05/24] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-04-10 18:10   ` Jeff King
2024-03-20 22:05 ` [PATCH 06/24] pseudo-merge.ch: initial commit Taylor Blau
2024-03-20 22:05 ` [PATCH 07/24] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 08/24] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-03-20 22:05 ` [PATCH 09/24] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-03-20 22:05 ` [PATCH 10/24] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 11/24] pack-bitmap-write.c: select " Taylor Blau
2024-03-20 22:05 ` [PATCH 12/24] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-03-20 22:05 ` [PATCH 13/24] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-03-20 22:05 ` [PATCH 14/24] pseudo-merge: scaffolding for reads Taylor Blau
2024-03-20 22:05 ` [PATCH 15/24] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-03-20 22:05 ` [PATCH 16/24] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-03-20 22:05 ` [PATCH 17/24] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-03-20 22:05 ` [PATCH 18/24] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-03-20 22:05 ` [PATCH 19/24] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
2024-03-20 22:05 ` [PATCH 20/24] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-03-20 22:06 ` [PATCH 21/24] pack-bitmap: extra trace2 information Taylor Blau
2024-03-20 22:06 ` [PATCH 22/24] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-03-20 22:06 ` Taylor Blau [this message]
2024-03-20 22:06 ` [PATCH 24/24] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
2024-03-21 19:50 ` [PATCH 00/24] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
2024-04-29 20:42 ` [PATCH v2 00/23] " Taylor Blau
2024-04-29 20:42   ` [PATCH v2 01/23] Documentation/technical: describe pseudo-merge bitmaps format Taylor Blau
2024-05-06 11:52     ` Patrick Steinhardt
2024-05-06 16:37       ` Taylor Blau
2024-05-10 11:46         ` Patrick Steinhardt
2024-05-13 19:47           ` Taylor Blau
2024-05-14  6:33             ` Patrick Steinhardt
2024-04-29 20:43   ` [PATCH v2 02/23] ewah: implement `ewah_bitmap_is_subset()` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 03/23] pack-bitmap: drop unused `max_bitmaps` parameter Taylor Blau
2024-04-29 20:43   ` [PATCH v2 04/23] pack-bitmap: move some initialization to `bitmap_writer_init()` Taylor Blau
2024-05-06 11:52     ` Patrick Steinhardt
2024-05-06 18:24       ` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 05/23] pseudo-merge.ch: initial commit Taylor Blau
2024-04-29 20:43   ` [PATCH v2 06/23] pack-bitmap-write: support storing pseudo-merge commits Taylor Blau
2024-05-06 11:52     ` Patrick Steinhardt
2024-05-06 18:48       ` Taylor Blau
2024-05-10 11:47         ` Patrick Steinhardt
2024-05-13 18:42     ` Jeff King
2024-05-13 20:19       ` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 07/23] pack-bitmap: implement `bitmap_writer_has_bitmapped_object_id()` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 08/23] pack-bitmap: make `bitmap_writer_push_bitmapped_commit()` public Taylor Blau
2024-05-13 18:50     ` Jeff King
2024-05-14  0:54       ` Taylor Blau
2024-04-29 20:43   ` [PATCH v2 09/23] pseudo-merge: implement support for selecting pseudo-merge commits Taylor Blau
2024-05-06 11:53     ` Patrick Steinhardt
2024-05-06 19:58       ` Taylor Blau
2024-05-13 19:03     ` Jeff King
2024-05-14  0:58       ` Taylor Blau
2024-05-16  8:07         ` Jeff King
2024-05-16 22:43           ` Junio C Hamano
2024-04-29 20:43   ` [PATCH v2 10/23] pack-bitmap-write.c: select " Taylor Blau
2024-05-06 11:53     ` Patrick Steinhardt
2024-05-06 20:05       ` Taylor Blau
2024-05-10 11:47         ` Patrick Steinhardt
2024-04-29 20:43   ` [PATCH v2 11/23] pack-bitmap-write.c: write pseudo-merge table Taylor Blau
2024-04-29 20:43   ` [PATCH v2 12/23] pack-bitmap: extract `read_bitmap()` function Taylor Blau
2024-04-29 20:43   ` [PATCH v2 13/23] pseudo-merge: scaffolding for reads Taylor Blau
2024-04-29 20:43   ` [PATCH v2 14/23] pack-bitmap.c: read pseudo-merge extension Taylor Blau
2024-04-29 20:44   ` [PATCH v2 15/23] pseudo-merge: implement support for reading pseudo-merge commits Taylor Blau
2024-04-29 20:44   ` [PATCH v2 16/23] ewah: implement `ewah_bitmap_popcount()` Taylor Blau
2024-04-29 20:44   ` [PATCH v2 17/23] pack-bitmap: implement test helpers for pseudo-merge Taylor Blau
2024-04-29 20:44   ` [PATCH v2 18/23] t/test-lib-functions.sh: support `--date` in `test_commit_bulk()` Taylor Blau
2024-04-29 20:44   ` [PATCH v2 19/23] pack-bitmap.c: use pseudo-merges during traversal Taylor Blau
2024-04-29 20:44   ` [PATCH v2 20/23] pack-bitmap: extra trace2 information Taylor Blau
2024-04-29 20:44   ` [PATCH v2 21/23] ewah: `bitmap_equals_ewah()` Taylor Blau
2024-04-29 20:44   ` [PATCH v2 22/23] pseudo-merge: implement support for finding existing merges Taylor Blau
2024-04-29 20:44   ` [PATCH v2 23/23] t/perf: implement performace tests for pseudo-merge bitmaps Taylor Blau
2024-04-30 20:03   ` [PATCH v2 00/23] pack-bitmap: pseudo-merge reachability bitmaps Junio C Hamano
2024-05-01 14:40     ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4ae4f0eaae5ebe9495968e8585f4b2692d2cbec2.1710972293.git.me@ttaylorr.com \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=newren@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).