git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/22] multi-pack reachability bitmaps
@ 2021-04-09 18:10 Taylor Blau
  2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
                   ` (21 more replies)
  0 siblings, 22 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This series implements multi-pack reachability bitmaps. It is based on
'master' after merging 'tb/pack-preferred-tips-to-give-bitmap'.

This is an extension of the classic single-pack bitmaps. Instead of
mapping between objects and bit positions according to each object's
pack-relative position, multi-pack bitmaps use each object's position in
a kind of "pseudo pack".

The pseudo pack doesn't refer to a physical packfile, but instead a
conceptual ordering of objects in a multi-pack index. This ordering is
reflected in the MIDX's .rev file, which is used extensively to power
multi-pack bitmaps.

This somewhat lengthy series is organized as follows:

  - The first eight patches are cleanup and preparation.

  - The next three patches factor out functions which have different
    implementations based on whether a bitmap is tied to a pack or MIDX.

  - The next two patches implement support for reading and writing
    multi-pack bitmaps.

  - The remaining tests prepare for a new
    GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP mode of running the test
    suite, and add new tests covering the new multi-pack bitmap
    behavior.

You can experiment with the new functionality by running "git
multi-pack-index write --bitmap", which updates the multi-pack index (if
necessary), and writes out a corresponding .bitmap file. Eventually,
support for invoking the above during "git repack" will be introduced,
but this is done in a separate series.

These patches have been extracted from a version which has been running
on every repository on GitHub for the past few weeks.

Thanks in advance for your review (including on all of the many series leading
up to this one).

Jeff King (1):
  t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP

Taylor Blau (21):
  pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  pack-bitmap-write.c: free existing bitmaps
  Documentation: build 'technical/bitmap-format' by default
  Documentation: describe MIDX-based bitmaps
  midx: make a number of functions non-static
  midx: clear auxiliary .rev after replacing the MIDX
  midx: respect 'core.multiPackIndex' when writing
  pack-bitmap.c: introduce 'bitmap_num_objects()'
  pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  pack-bitmap: read multi-pack bitmaps
  pack-bitmap: write multi-pack bitmaps
  t5310: move some tests to lib-bitmap.sh
  t/helper/test-read-midx.c: add --checksum mode
  t5326: test multi-pack bitmap behavior
  t5319: don't write MIDX bitmaps in t5319
  t7700: update to work with MIDX bitmap test knob
  midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  p5310: extract full and partial bitmap tests
  p5326: perf tests for MIDX bitmaps

 Documentation/Makefile                       |   1 +
 Documentation/git-multi-pack-index.txt       |  12 +-
 Documentation/technical/bitmap-format.txt    |  72 ++-
 Documentation/technical/multi-pack-index.txt |  10 +-
 builtin/multi-pack-index.c                   |   2 +
 builtin/pack-objects.c                       |  15 +-
 builtin/repack.c                             |  13 +-
 ci/run-build-and-tests.sh                    |   1 +
 midx.c                                       | 216 ++++++++-
 midx.h                                       |   5 +
 pack-bitmap-write.c                          |  79 +++-
 pack-bitmap.c                                | 463 +++++++++++++++++--
 pack-bitmap.h                                |   8 +-
 packfile.c                                   |   2 +-
 t/README                                     |   4 +
 t/helper/test-read-midx.c                    |  16 +-
 t/lib-bitmap.sh                              | 216 +++++++++
 t/perf/lib-bitmap.sh                         |  69 +++
 t/perf/p5310-pack-bitmaps.sh                 |  65 +--
 t/perf/p5326-multi-pack-bitmaps.sh           |  43 ++
 t/t5310-pack-bitmaps.sh                      | 208 +--------
 t/t5319-multi-pack-index.sh                  |   3 +-
 t/t5326-multi-pack-bitmaps.sh                | 278 +++++++++++
 t/t7700-repack.sh                            |  18 +-
 24 files changed, 1435 insertions(+), 384 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

-- 
2.31.1.163.ga65ce7f831

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
                   ` (20 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The special `--test-bitmap` mode of `git rev-list` is used to compare
the result of an object traversal with a bitmap to check its integrity.
This mode does not, however, assert that the types of reachable objects
are stored correctly.

Harden this mode by teaching it to also check that each time an object's
bit is marked, the corresponding bit should be set in exactly one of the
type bitmaps (whose type matches the object's true type).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3ed15431cd..d45e91db1e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1263,10 +1263,52 @@ void count_bitmap_commit_list(struct bitmap_index *bitmap_git,
 struct bitmap_test_data {
 	struct bitmap_index *bitmap_git;
 	struct bitmap *base;
+	struct bitmap *commits;
+	struct bitmap *trees;
+	struct bitmap *blobs;
+	struct bitmap *tags;
 	struct progress *prg;
 	size_t seen;
 };
 
+static void test_bitmap_type(struct bitmap_test_data *tdata,
+			     struct object *obj, int pos)
+{
+	enum object_type bitmap_type = OBJ_NONE;
+	int bitmaps_nr = 0;
+
+	if (bitmap_get(tdata->commits, pos)) {
+		bitmap_type = OBJ_COMMIT;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->trees, pos)) {
+		bitmap_type = OBJ_TREE;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->blobs, pos)) {
+		bitmap_type = OBJ_BLOB;
+		bitmaps_nr++;
+	}
+	if (bitmap_get(tdata->tags, pos)) {
+		bitmap_type = OBJ_TAG;
+		bitmaps_nr++;
+	}
+
+	if (!bitmap_type)
+		die("object %s not found in type bitmaps",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmaps_nr > 1)
+		die("object %s does not have a unique type",
+		    oid_to_hex(&obj->oid));
+
+	if (bitmap_type != obj->type)
+		die("object %s: real type %s, expected: %s",
+		    oid_to_hex(&obj->oid),
+		    type_name(obj->type),
+		    type_name(bitmap_type));
+}
+
 static void test_show_object(struct object *object, const char *name,
 			     void *data)
 {
@@ -1276,6 +1318,7 @@ static void test_show_object(struct object *object, const char *name,
 	bitmap_pos = bitmap_position(tdata->bitmap_git, &object->oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&object->oid));
+	test_bitmap_type(tdata, object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1290,6 +1333,7 @@ static void test_show_commit(struct commit *commit, void *data)
 				     &commit->object.oid);
 	if (bitmap_pos < 0)
 		die("Object not in bitmap: %s\n", oid_to_hex(&commit->object.oid));
+	test_bitmap_type(tdata, &commit->object, bitmap_pos);
 
 	bitmap_set(tdata->base, bitmap_pos);
 	display_progress(tdata->prg, ++tdata->seen);
@@ -1337,6 +1381,10 @@ void test_bitmap_walk(struct rev_info *revs)
 
 	tdata.bitmap_git = bitmap_git;
 	tdata.base = bitmap_new();
+	tdata.commits = ewah_to_bitmap(bitmap_git->commits);
+	tdata.trees = ewah_to_bitmap(bitmap_git->trees);
+	tdata.blobs = ewah_to_bitmap(bitmap_git->blobs);
+	tdata.tags = ewah_to_bitmap(bitmap_git->tags);
 	tdata.prg = start_progress("Verifying bitmap entries", result_popcnt);
 	tdata.seen = 0;
 
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
  2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-16  2:46   ` Jonathan Tan
  2021-04-09 18:10 ` [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps Taylor Blau
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

The set of objects covered by a bitmap must be closed under
reachability, since it must be the case that there is a valid bit
position assigned for every possible reachable object (otherwise the
bitmaps would be incomplete).

Pack bitmaps are never written from 'git repack' unless repacking
all-into-one, and so we never write non-closed bitmaps.

But multi-pack bitmaps change this, since it isn't known whether the
set of objects in the MIDX is closed under reachability until walking
them. Plumb through a bit that is set when a reachable object isn't
found.

As soon as a reachable object isn't found in the set of objects to
include in the bitmap, bitmap_writer_build() knows that the set is not
closed, and so it now fails gracefully.

(The new conditional in builtin/pack-objects.c:bitmap_writer_build()
guards against other failure modes, but is never triggered here, because
of the all-into-one detail above. This return value will be important to
check from the multi-pack index caller.)

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |  3 +-
 pack-bitmap-write.c    | 76 +++++++++++++++++++++++++++++-------------
 pack-bitmap.h          |  2 +-
 3 files changed, 56 insertions(+), 25 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index a1e33d7507..5205dde2e1 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1116,7 +1116,8 @@ static void write_pack_file(void)
 
 				bitmap_writer_show_progress(progress);
 				bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1);
-				bitmap_writer_build(&to_pack);
+				if (bitmap_writer_build(&to_pack) < 0)
+					die(_("failed to write bitmap index"));
 				bitmap_writer_finish(written_list, nr_written,
 						     tmpname.buf, write_bitmap_options);
 				write_bitmap_index = 0;
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 88d9e696a5..e829c46649 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
 	writer.selected_nr++;
 }
 
-static uint32_t find_object_pos(const struct object_id *oid)
+static uint32_t find_object_pos(const struct object_id *oid, int *found)
 {
 	struct object_entry *entry = packlist_find(writer.to_pack, oid);
 
 	if (!entry) {
-		die("Failed to write bitmap index. Packfile doesn't have full closure "
+		if (found)
+			*found = 0;
+		warning("Failed to write bitmap index. Packfile doesn't have full closure "
 			"(object %s is missing)", oid_to_hex(oid));
+		return 0;
 	}
 
+	if (found)
+		*found = 1;
 	return oe_in_pack_pos(writer.to_pack, entry);
 }
 
@@ -331,9 +336,10 @@ static void bitmap_builder_clear(struct bitmap_builder *bb)
 	bb->commits_nr = bb->commits_alloc = 0;
 }
 
-static void fill_bitmap_tree(struct bitmap *bitmap,
-			     struct tree *tree)
+static int fill_bitmap_tree(struct bitmap *bitmap,
+			    struct tree *tree)
 {
+	int found;
 	uint32_t pos;
 	struct tree_desc desc;
 	struct name_entry entry;
@@ -342,9 +348,11 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	 * If our bit is already set, then there is nothing to do. Both this
 	 * tree and all of its children will be set.
 	 */
-	pos = find_object_pos(&tree->object.oid);
+	pos = find_object_pos(&tree->object.oid, &found);
+	if (!found)
+		return -1;
 	if (bitmap_get(bitmap, pos))
-		return;
+		return 0;
 	bitmap_set(bitmap, pos);
 
 	if (parse_tree(tree) < 0)
@@ -355,11 +363,15 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	while (tree_entry(&desc, &entry)) {
 		switch (object_type(entry.mode)) {
 		case OBJ_TREE:
-			fill_bitmap_tree(bitmap,
-					 lookup_tree(the_repository, &entry.oid));
+			if (fill_bitmap_tree(bitmap,
+					     lookup_tree(the_repository, &entry.oid)) < 0)
+				return -1;
 			break;
 		case OBJ_BLOB:
-			bitmap_set(bitmap, find_object_pos(&entry.oid));
+			pos = find_object_pos(&entry.oid, &found);
+			if (!found)
+				return -1;
+			bitmap_set(bitmap, pos);
 			break;
 		default:
 			/* Gitlink, etc; not reachable */
@@ -368,15 +380,18 @@ static void fill_bitmap_tree(struct bitmap *bitmap,
 	}
 
 	free_tree_buffer(tree);
+	return 0;
 }
 
-static void fill_bitmap_commit(struct bb_commit *ent,
-			       struct commit *commit,
-			       struct prio_queue *queue,
-			       struct prio_queue *tree_queue,
-			       struct bitmap_index *old_bitmap,
-			       const uint32_t *mapping)
+static int fill_bitmap_commit(struct bb_commit *ent,
+			      struct commit *commit,
+			      struct prio_queue *queue,
+			      struct prio_queue *tree_queue,
+			      struct bitmap_index *old_bitmap,
+			      const uint32_t *mapping)
 {
+	int found;
+	uint32_t pos;
 	if (!ent->bitmap)
 		ent->bitmap = bitmap_new();
 
@@ -401,11 +416,16 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		 * Mark ourselves and queue our tree. The commit
 		 * walk ensures we cover all parents.
 		 */
-		bitmap_set(ent->bitmap, find_object_pos(&c->object.oid));
+		pos = find_object_pos(&c->object.oid, &found);
+		if (!found)
+			return -1;
+		bitmap_set(ent->bitmap, pos);
 		prio_queue_put(tree_queue, get_commit_tree(c));
 
 		for (p = c->parents; p; p = p->next) {
-			int pos = find_object_pos(&p->item->object.oid);
+			pos = find_object_pos(&p->item->object.oid, &found);
+			if (!found)
+				return -1;
 			if (!bitmap_get(ent->bitmap, pos)) {
 				bitmap_set(ent->bitmap, pos);
 				prio_queue_put(queue, p->item);
@@ -413,8 +433,12 @@ static void fill_bitmap_commit(struct bb_commit *ent,
 		}
 	}
 
-	while (tree_queue->nr)
-		fill_bitmap_tree(ent->bitmap, prio_queue_get(tree_queue));
+	while (tree_queue->nr) {
+		if (fill_bitmap_tree(ent->bitmap,
+				     prio_queue_get(tree_queue)) < 0)
+			return -1;
+	}
+	return 0;
 }
 
 static void store_selected(struct bb_commit *ent, struct commit *commit)
@@ -432,7 +456,7 @@ static void store_selected(struct bb_commit *ent, struct commit *commit)
 	kh_value(writer.bitmaps, hash_pos) = stored;
 }
 
-void bitmap_writer_build(struct packing_data *to_pack)
+int bitmap_writer_build(struct packing_data *to_pack)
 {
 	struct bitmap_builder bb;
 	size_t i;
@@ -441,6 +465,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 	struct prio_queue tree_queue = { NULL };
 	struct bitmap_index *old_bitmap;
 	uint32_t *mapping;
+	int closed = 1; /* until proven otherwise */
 
 	writer.bitmaps = kh_init_oid_map();
 	writer.to_pack = to_pack;
@@ -463,8 +488,11 @@ void bitmap_writer_build(struct packing_data *to_pack)
 		struct commit *child;
 		int reused = 0;
 
-		fill_bitmap_commit(ent, commit, &queue, &tree_queue,
-				   old_bitmap, mapping);
+		if (fill_bitmap_commit(ent, commit, &queue, &tree_queue,
+				       old_bitmap, mapping) < 0) {
+			closed = 0;
+			break;
+		}
 
 		if (ent->selected) {
 			store_selected(ent, commit);
@@ -499,7 +527,9 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
 	stop_progress(&writer.progress);
 
-	compute_xor_offsets();
+	if (closed)
+		compute_xor_offsets();
+	return closed;
 }
 
 /**
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 78f2b3ff79..988ed3a30d 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -86,7 +86,7 @@ struct ewah_bitmap *bitmap_for_commit(struct bitmap_index *bitmap_git,
 				      struct commit *commit);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
 		unsigned int indexed_commits_nr, int max_bitmaps);
-void bitmap_writer_build(struct packing_data *to_pack);
+int bitmap_writer_build(struct packing_data *to_pack);
 void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
  2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
  2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-09 18:10 ` [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default Taylor Blau
                   ` (18 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new bitmap, the bitmap writer code attempts to read the
existing bitmap (if one is present). This is done in order to quickly
permute the bits of any bitmaps for commits which appear in the existing
bitmap, and were also selected for the new bitmap.

But since this code was added in 341fa34887 (pack-bitmap-write: use
existing bitmaps, 2020-12-08), the resources associated with opening an
existing bitmap were never released.

It's fine to ignore this, but it's bad hygiene. It will also cause a
problem for the multi-pack-index builtin, which will be responsible not
only for writing bitmaps, but also for expiring any old multi-pack
bitmaps.

If an existing bitmap was reused here, it will also be expired. That
will cause a problem on platforms which require file resources to be
closed before unlinking them, like Windows. Avoid this by ensuring we
close reused bitmaps with free_bitmap_index() before removing them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap-write.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index e829c46649..f90e100e3e 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -520,6 +520,7 @@ int bitmap_writer_build(struct packing_data *to_pack)
 	clear_prio_queue(&queue);
 	clear_prio_queue(&tree_queue);
 	bitmap_builder_clear(&bb);
+	free_bitmap_index(old_bitmap);
 	free(mapping);
 
 	trace2_region_leave("pack-bitmap-write", "building_bitmaps_total",
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (2 preceding siblings ...)
  2021-04-09 18:10 ` [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps Taylor Blau
@ 2021-04-09 18:10 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 05/22] Documentation: describe MIDX-based bitmaps Taylor Blau
                   ` (17 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:10 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Even though the 'TECH_DOCS' variable was introduced all the way back in
5e00439f0a (Documentation: build html for all files in technical and
howto, 2012-10-23), the 'bitmap-format' document was never added to that
list when it was created.

Prepare for changes to this file by including it in the list of
technical documentation that 'make doc' will build by default.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 874a01d7a8..6d60c8c165 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -83,6 +83,7 @@ SP_ARTICLES += $(API_DOCS)
 TECH_DOCS += MyFirstContribution
 TECH_DOCS += MyFirstObjectWalk
 TECH_DOCS += SubmittingPatches
+TECH_DOCS += technical/bitmap-format
 TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 05/22] Documentation: describe MIDX-based bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (3 preceding siblings ...)
  2021-04-09 18:10 ` [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 06/22] midx: make a number of functions non-static Taylor Blau
                   ` (16 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Update the technical documentation to describe the multi-pack bitmap
format. This patch merely introduces the new format, and describes its
high-level ideas. Git does not yet know how to read nor write these
multi-pack variants, and so the subsequent patches will:

  - Introduce code to interpret multi-pack bitmaps, according to this
    document.

  - Then, introduce code to write multi-pack bitmaps from the 'git
    multi-pack-index write' sub-command.

Finally, the implementation will gain tests in subsequent patches (as
opposed to inline with the patch teaching Git how to write multi-pack
bitmaps) to avoid a cyclic dependency.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/bitmap-format.txt    | 72 ++++++++++++++++----
 Documentation/technical/multi-pack-index.txt | 10 +--
 2 files changed, 61 insertions(+), 21 deletions(-)

diff --git a/Documentation/technical/bitmap-format.txt b/Documentation/technical/bitmap-format.txt
index f8c18a0f7a..25221c7ec8 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -1,6 +1,45 @@
 GIT bitmap v1 format
 ====================
 
+== Pack and multi-pack bitmaps
+
+Bitmaps store reachability information about the set of objects in a packfile,
+or a multi-pack index (MIDX). The former is defined obviously, and the latter is
+defined as the union of objects in packs contained in the MIDX.
+
+A bitmap may belong to either one pack, or the repository's multi-pack index (if
+it exists). A repository may have at most one bitmap.
+
+An object is uniquely described by its bit position within a bitmap:
+
+	- If the bitmap belongs to a packfile, the __n__th bit corresponds to
+	the __n__th object in pack order. For a function `offset` which maps
+	objects to their byte offset within a pack, pack order is defined as
+	follows:
+
+		o1 <= o2 <==> offset(o1) <= offset(o2)
+
+	- If the bitmap belongs to a MIDX, the __n__th bit corresponds to the
+	__n__th object in MIDX order. With an additional function `pack` which
+	maps objects to the pack they were selected from by the MIDX, MIDX order
+	is defined as follows:
+
+		o1 <= o2 <==> pack(o1) <= pack(o2) /\ offset(o1) <= offset(o2)
+
+	The ordering between packs is done lexicographically by the pack name,
+	with the exception of the preferred pack, which sorts ahead of all other
+	packs.
+
+The on-disk representation (described below) of a bitmap is the same regardless
+of whether or not that bitmap belongs to a packfile or a MIDX. The only
+difference is the interpretation of the bits, which is described above.
+
+Certain bitmap extensions are supported (see: Appendix B). No extensions are
+required for bitmaps corresponding to packfiles. For bitmaps that correspond to
+MIDXs, both the bit-cache and rev-cache extensions are required.
+
+== On-disk format
+
 	- A header appears at the beginning:
 
 		4-byte signature: {'B', 'I', 'T', 'M'}
@@ -14,17 +53,19 @@ GIT bitmap v1 format
 			The following flags are supported:
 
 			- BITMAP_OPT_FULL_DAG (0x1) REQUIRED
-			This flag must always be present. It implies that the bitmap
-			index has been generated for a packfile with full closure
-			(i.e. where every single object in the packfile can find
-			 its parent links inside the same packfile). This is a
-			requirement for the bitmap index format, also present in JGit,
-			that greatly reduces the complexity of the implementation.
+			This flag must always be present. It implies that the
+			bitmap index has been generated for a packfile or
+			multi-pack index (MIDX) with full closure (i.e. where
+			every single object in the packfile/MIDX can find its
+			parent links inside the same packfile/MIDX). This is a
+			requirement for the bitmap index format, also present in
+			JGit, that greatly reduces the complexity of the
+			implementation.
 
 			- BITMAP_OPT_HASH_CACHE (0x4)
 			If present, the end of the bitmap file contains
 			`N` 32-bit name-hash values, one per object in the
-			pack. The format and meaning of the name-hash is
+			pack/MIDX. The format and meaning of the name-hash is
 			described below.
 
 		4-byte entry count (network byte order)
@@ -33,7 +74,8 @@ GIT bitmap v1 format
 
 		20-byte checksum
 
-			The SHA1 checksum of the pack this bitmap index belongs to.
+			The SHA1 checksum of the pack/MIDX this bitmap index
+			belongs to.
 
 	- 4 EWAH bitmaps that act as type indexes
 
@@ -50,7 +92,7 @@ GIT bitmap v1 format
 			- Tags
 
 		In each bitmap, the `n`th bit is set to true if the `n`th object
-		in the packfile is of that type.
+		in the packfile or multi-pack index is of that type.
 
 		The obvious consequence is that the OR of all 4 bitmaps will result
 		in a full set (all bits set), and the AND of all 4 bitmaps will
@@ -62,8 +104,9 @@ GIT bitmap v1 format
 		Each entry contains the following:
 
 		- 4-byte object position (network byte order)
-			The position **in the index for the packfile** where the
-			bitmap for this commit is found.
+			The position **in the index for the packfile or
+			multi-pack index** where the bitmap for this commit is
+			found.
 
 		- 1-byte XOR-offset
 			The xor offset used to compress this bitmap. For an entry
@@ -146,10 +189,11 @@ Name-hash cache
 ---------------
 
 If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
-a cache of 32-bit values, one per object in the pack. The value at
+a cache of 32-bit values, one per object in the pack/MIDX. The value at
 position `i` is the hash of the pathname at which the `i`th object
-(counting in index order) in the pack can be found.  This can be fed
-into the delta heuristics to compare objects with similar pathnames.
+(counting in index or multi-pack index order) in the pack/MIDX can be found.
+This can be fed into the delta heuristics to compare objects with similar
+pathnames.
 
 The hash algorithm used is:
 
diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index fb688976c4..1a73c3ee20 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -71,14 +71,10 @@ Future Work
   still reducing the number of binary searches required for object
   lookups.
 
-- The reachability bitmap is currently paired directly with a single
-  packfile, using the pack-order as the object order to hopefully
-  compress the bitmaps well using run-length encoding. This could be
-  extended to pair a reachability bitmap with a multi-pack-index. If
-  the multi-pack-index is extended to store a "stable object order"
+- If the multi-pack-index is extended to store a "stable object order"
   (a function Order(hash) = integer that is constant for a given hash,
-  even as the multi-pack-index is updated) then a reachability bitmap
-  could point to a multi-pack-index and be updated independently.
+  even as the multi-pack-index is updated) then MIDX bitmaps could be
+  updated independently of the MIDX.
 
 - Packfiles can be marked as "special" using empty files that share
   the initial name but replace ".pack" with ".keep" or ".promisor".
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 06/22] midx: make a number of functions non-static
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (4 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 05/22] Documentation: describe MIDX-based bitmaps Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
                   ` (15 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These functions will be called from outside of midx.c in a subsequent
patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 4 ++--
 midx.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index 9e86583172..5249802326 100644
--- a/midx.c
+++ b/midx.c
@@ -48,12 +48,12 @@ static uint8_t oid_version(void)
 	}
 }
 
-static const unsigned char *get_midx_checksum(struct multi_pack_index *m)
+const unsigned char *get_midx_checksum(struct multi_pack_index *m)
 {
 	return m->data + m->data_len - the_hash_algo->rawsz;
 }
 
-static char *get_midx_filename(const char *object_dir)
+char *get_midx_filename(const char *object_dir)
 {
 	return xstrfmt("%s/pack/multi-pack-index", object_dir);
 }
diff --git a/midx.h b/midx.h
index 8684cf0fef..1172df1a71 100644
--- a/midx.h
+++ b/midx.h
@@ -42,6 +42,8 @@ struct multi_pack_index {
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
 
+const unsigned char *get_midx_checksum(struct multi_pack_index *m);
+char *get_midx_filename(const char *object_dir);
 char *get_midx_rev_filename(struct multi_pack_index *m);
 
 struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local);
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (5 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 06/22] midx: make a number of functions non-static Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing Taylor Blau
                   ` (14 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
clean up any auxiliary files (currently just the MIDX's `.rev` file, but
soon to include a `.bitmap`, too) corresponding to the MIDX it's
replacing.

This step should happen after the new MIDX is written into place, since
doing so beforehand means that the old MIDX could be read without its
corresponding .rev file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/midx.c b/midx.c
index 5249802326..a24c36968d 100644
--- a/midx.c
+++ b/midx.c
@@ -1076,10 +1076,11 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
-	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".rev", midx_hash);
+
 cleanup:
 	for (i = 0; i < ctx.nr; i++) {
 		if (ctx.info[i].p) {
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (6 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
                   ` (13 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

When writing a new multi-pack index, write_midx_internal() attempts to
load any existing one to fill in some pieces of information. But it uses
load_multi_pack_index(), which ignores the configuration
"core.multiPackIndex", which indicates whether or not Git is allowed to
read an existing multi-pack-index.

Replace this with a routine that does respect that setting, to avoid
reading multi-pack-index files when told not to.

This avoids a problem that would arise in subsequent patches due to the
combination of 'git repack' reopening the object store in-process and
the multi-pack index code not checking whether a pack already exists in
the object store when calling add_pack_to_midx().

This would ultimately lead to a cycle being created along the
'packed_git' struct's '->next' pointer. That is obviously bad, but it
has hard-to-debug downstream effects like saying a bitmap can't be
loaded for a pack because one already exists (for the same pack).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/midx.c b/midx.c
index a24c36968d..567cdf0fcf 100644
--- a/midx.c
+++ b/midx.c
@@ -908,8 +908,18 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 
 	if (m)
 		ctx.m = m;
-	else
-		ctx.m = load_multi_pack_index(object_dir, 1);
+	else {
+		struct multi_pack_index *cur;
+
+		prepare_multi_pack_index_one(the_repository, object_dir, 1);
+
+		ctx.m = NULL;
+		for (cur = the_repository->objects->multi_pack_index; cur;
+		     cur = cur->next) {
+			if (!strcmp(object_dir, cur->object_dir))
+				ctx.m = cur;
+		}
+	}
 
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (7 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
                   ` (12 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to return how many objects are
contained in a bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 37 +++++++++++++++++++++----------------
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index d45e91db1e..a6c616aa3e 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -136,6 +136,11 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 	return b;
 }
 
+static uint32_t bitmap_num_objects(struct bitmap_index *index)
+{
+	return index->pack->num_objects;
+}
+
 static int load_bitmap_header(struct bitmap_index *index)
 {
 	struct bitmap_disk_header *header = (void *)index->map;
@@ -154,7 +159,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	/* Parse known bitmap format options */
 	{
 		uint32_t flags = ntohs(header->options);
-		size_t cache_size = st_mult(index->pack->num_objects, sizeof(uint32_t));
+		size_t cache_size = st_mult(bitmap_num_objects(index), sizeof(uint32_t));
 		unsigned char *index_end = index->map + index->map_size - the_hash_algo->rawsz;
 
 		if ((flags & BITMAP_OPT_FULL_DAG) == 0)
@@ -399,7 +404,7 @@ static inline int bitmap_position_extended(struct bitmap_index *bitmap_git,
 
 	if (pos < kh_end(positions)) {
 		int bitmap_pos = kh_value(positions, pos);
-		return bitmap_pos + bitmap_git->pack->num_objects;
+		return bitmap_pos + bitmap_num_objects(bitmap_git);
 	}
 
 	return -1;
@@ -451,7 +456,7 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git,
 		bitmap_pos = kh_value(eindex->positions, hash_pos);
 	}
 
-	return bitmap_pos + bitmap_git->pack->num_objects;
+	return bitmap_pos + bitmap_num_objects(bitmap_git);
 }
 
 struct bitmap_show_data {
@@ -647,7 +652,7 @@ static void show_extended_objects(struct bitmap_index *bitmap_git,
 	for (i = 0; i < eindex->count; ++i) {
 		struct object *obj;
 
-		if (!bitmap_get(objects, bitmap_git->pack->num_objects + i))
+		if (!bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		obj = eindex->objects[i];
@@ -808,7 +813,7 @@ static void filter_bitmap_exclude_type(struct bitmap_index *bitmap_git,
 	 * individually.
 	 */
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == type &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos))
@@ -835,7 +840,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 
 	oi.sizep = &size;
 
-	if (pos < pack->num_objects) {
+	if (pos < bitmap_num_objects(bitmap_git)) {
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
@@ -845,7 +850,7 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		}
 	} else {
 		struct eindex *eindex = &bitmap_git->ext_index;
-		struct object *obj = eindex->objects[pos - pack->num_objects];
+		struct object *obj = eindex->objects[pos - bitmap_num_objects(bitmap_git)];
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
 			die(_("unable to get size of %s"), oid_to_hex(&obj->oid));
 	}
@@ -887,7 +892,7 @@ static void filter_bitmap_blob_limit(struct bitmap_index *bitmap_git,
 	}
 
 	for (i = 0; i < eindex->count; i++) {
-		uint32_t pos = i + bitmap_git->pack->num_objects;
+		uint32_t pos = i + bitmap_num_objects(bitmap_git);
 		if (eindex->objects[i]->type == OBJ_BLOB &&
 		    bitmap_get(to_filter, pos) &&
 		    !bitmap_get(tips, pos) &&
@@ -1075,8 +1080,8 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	enum object_type type;
 	unsigned long size;
 
-	if (pos >= bitmap_git->pack->num_objects)
-		return; /* not actually in the pack */
+	if (pos >= bitmap_num_objects(bitmap_git))
+		return; /* not actually in the pack or MIDX */
 
 	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
 	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
@@ -1142,6 +1147,7 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	struct pack_window *w_curs = NULL;
 	size_t i = 0;
 	uint32_t offset;
+	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
 
 	assert(result);
 
@@ -1149,8 +1155,8 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 		i++;
 
 	/* Don't mark objects not in the packfile */
-	if (i > bitmap_git->pack->num_objects / BITS_IN_EWORD)
-		i = bitmap_git->pack->num_objects / BITS_IN_EWORD;
+	if (i > objects_nr / BITS_IN_EWORD)
+		i = objects_nr / BITS_IN_EWORD;
 
 	reuse = bitmap_word_alloc(i);
 	memset(reuse->words, 0xFF, i * sizeof(eword_t));
@@ -1234,7 +1240,7 @@ static uint32_t count_object_type(struct bitmap_index *bitmap_git,
 
 	for (i = 0; i < eindex->count; ++i) {
 		if (eindex->objects[i]->type == type &&
-			bitmap_get(objects, bitmap_git->pack->num_objects + i))
+			bitmap_get(objects, bitmap_num_objects(bitmap_git) + i))
 			count++;
 	}
 
@@ -1455,7 +1461,7 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
-	num_objects = bitmap_git->pack->num_objects;
+	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
 	for (i = 0; i < num_objects; ++i) {
@@ -1538,7 +1544,6 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
 	struct eindex *eindex = &bitmap_git->ext_index;
 	off_t total = 0;
 	struct object_info oi = OBJECT_INFO_INIT;
@@ -1550,7 +1555,7 @@ static off_t get_disk_usage_for_extended(struct bitmap_index *bitmap_git)
 	for (i = 0; i < eindex->count; i++) {
 		struct object *obj = eindex->objects[i];
 
-		if (!bitmap_get(result, pack->num_objects + i))
+		if (!bitmap_get(result, bitmap_num_objects(bitmap_git) + i))
 			continue;
 
 		if (oid_object_info_extended(the_repository, &obj->oid, &oi, 0) < 0)
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (8 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A subsequent patch to support reading MIDX bitmaps will be less noisy
after extracting a generic function to fetch the nth OID contained in
the bitmap.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index a6c616aa3e..97ee2d331d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -223,6 +223,13 @@ static inline uint8_t read_u8(const unsigned char *buffer, size_t *pos)
 
 #define MAX_XOR_OFFSET 160
 
+static void nth_bitmap_object_oid(struct bitmap_index *index,
+				  struct object_id *oid,
+				  uint32_t n)
+{
+	nth_packed_object_id(oid, index->pack, n);
+}
+
 static int load_bitmap_entries_v1(struct bitmap_index *index)
 {
 	uint32_t i;
@@ -242,9 +249,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 		xor_offset = read_u8(index->map, &index->map_pos);
 		flags = read_u8(index->map, &index->map_pos);
 
-		if (nth_packed_object_id(&oid, index->pack, commit_idx_pos) < 0)
-			return error("corrupt ewah bitmap: commit index %u out of range",
-				     (unsigned)commit_idx_pos);
+		nth_bitmap_object_oid(index, &oid, commit_idx_pos);
 
 		bitmap = read_bitmap_1(index);
 		if (!bitmap)
@@ -844,8 +849,8 @@ static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 		off_t ofs = pack_pos_to_offset(pack, pos);
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
-			nth_packed_object_id(&oid, pack,
-					     pack_pos_to_index(pack, pos));
+			nth_bitmap_object_oid(bitmap_git, &oid,
+					      pack_pos_to_index(pack, pos));
 			die(_("unable to get size of %s"), oid_to_hex(&oid));
 		}
 	} else {
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (9 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

In a recent commit, pack-objects learned support for the
'pack.preferBitmapTips' configuration. This patch prepares the
multi-pack bitmap code to respect this configuration, too.

Since the multi-pack bitmap code already does a traversal of all
references (in order to discover the set of reachable commits in the
multi-pack index), it is more efficient to check whether or not each
reference is a suffix of any value of 'pack.preferBitmapTips' rather
than do an additional traversal.

Implement a function 'bitmap_is_preferred_refname()' which does just
that. The caller will be added in a subsequent patch.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c | 16 ++++++++++++++++
 pack-bitmap.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index 97ee2d331d..be52570b0f 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1594,3 +1594,19 @@ const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
 }
+
+int bitmap_is_preferred_refname(struct repository *r, const char *refname)
+{
+	const struct string_list *preferred_tips = bitmap_preferred_tips(r);
+	struct string_list_item *item;
+
+	if (!preferred_tips)
+		return 0;
+
+	for_each_string_list_item(item, preferred_tips) {
+		if (starts_with(refname, item->string))
+			return 1;
+	}
+
+	return 0;
+}
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 988ed3a30d..0bf75ff2a7 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -93,5 +93,6 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint16_t options);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
+int bitmap_is_preferred_refname(struct repository *r, const char *refname);
 
 #endif
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 12/22] pack-bitmap: read multi-pack bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (10 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-16  2:39   ` Jonathan Tan
  2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This prepares the code in pack-bitmap to interpret the new multi-pack
bitmaps described in Documentation/technical/bitmap-format.txt, which
mostly involves converting bit positions to accommodate looking them up
in a MIDX.

Note that there are currently no writers who write multi-pack bitmaps,
and that this will be implemented in the subsequent commit.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/pack-objects.c |  12 +-
 pack-bitmap-write.c    |   2 +-
 pack-bitmap.c          | 349 +++++++++++++++++++++++++++++++++++++----
 pack-bitmap.h          |   5 +
 packfile.c             |   2 +-
 5 files changed, 338 insertions(+), 32 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5205dde2e1..a4e4e4ebcc 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -984,7 +984,17 @@ static void write_reused_pack(struct hashfile *f)
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
-			write_reused_pack_one(pos + offset, f, &w_curs);
+			if (bitmap_is_midx(bitmap_git)) {
+				off_t pack_offs = bitmap_pack_offset(bitmap_git,
+								     pos + offset);
+				uint32_t pos;
+
+				if (offset_to_pack_pos(reuse_packfile, pack_offs, &pos) < 0)
+					die(_("write_reused_pack: could not locate %"PRIdMAX),
+					    (intmax_t)pack_offs);
+				write_reused_pack_one(pos, f, &w_curs);
+			} else
+				write_reused_pack_one(pos + offset, f, &w_curs);
 			display_progress(progress_state, ++written);
 		}
 	}
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index f90e100e3e..020c1774c8 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,7 @@ void bitmap_writer_show_progress(int show)
 }
 
 /**
- * Build the initial type index for the packfile
+ * Build the initial type index for the packfile or multi-pack-index
  */
 void bitmap_writer_build_type_index(struct packing_data *to_pack,
 				    struct pack_idx_entry **index,
diff --git a/pack-bitmap.c b/pack-bitmap.c
index be52570b0f..e41fce9675 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -13,6 +13,7 @@
 #include "repository.h"
 #include "object-store.h"
 #include "list-objects-filter-options.h"
+#include "midx.h"
 #include "config.h"
 
 /*
@@ -35,8 +36,15 @@ struct stored_bitmap {
  * the active bitmap index is the largest one.
  */
 struct bitmap_index {
-	/* Packfile to which this bitmap index belongs to */
+	/*
+	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
+	 * to.
+	 *
+	 * Exactly one of these must be non-NULL; this specifies the object
+	 * order used to interpret this bitmap.
+	 */
 	struct packed_git *pack;
+	struct multi_pack_index *midx;
 
 	/*
 	 * Mark the first `reuse_objects` in the packfile as reused:
@@ -71,6 +79,8 @@ struct bitmap_index {
 	/* If not NULL, this is a name-hash cache pointing into map. */
 	uint32_t *hashes;
 
+	const unsigned char *checksum;
+
 	/*
 	 * Extended index.
 	 *
@@ -138,6 +148,8 @@ static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 
 static uint32_t bitmap_num_objects(struct bitmap_index *index)
 {
+	if (index->midx)
+		return index->midx->num_objects;
 	return index->pack->num_objects;
 }
 
@@ -175,6 +187,7 @@ static int load_bitmap_header(struct bitmap_index *index)
 	}
 
 	index->entry_count = ntohl(header->entry_count);
+	index->checksum = header->checksum;
 	index->map_pos += header_size;
 	return 0;
 }
@@ -227,7 +240,10 @@ static void nth_bitmap_object_oid(struct bitmap_index *index,
 				  struct object_id *oid,
 				  uint32_t n)
 {
-	nth_packed_object_id(oid, index->pack, n);
+	if (index->midx)
+		nth_midxed_object_oid(oid, index->midx, n);
+	else
+		nth_packed_object_id(oid, index->pack, n);
 }
 
 static int load_bitmap_entries_v1(struct bitmap_index *index)
@@ -272,7 +288,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index)
 	return 0;
 }
 
-static char *pack_bitmap_filename(struct packed_git *p)
+char *midx_bitmap_filename(struct multi_pack_index *midx)
+{
+	return xstrfmt("%s-%s.bitmap",
+		       get_midx_filename(midx->object_dir),
+		       hash_to_hex(get_midx_checksum(midx)));
+}
+
+char *pack_bitmap_filename(struct packed_git *p)
 {
 	size_t len;
 
@@ -281,6 +304,54 @@ static char *pack_bitmap_filename(struct packed_git *p)
 	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
 }
 
+static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
+			      struct multi_pack_index *midx)
+{
+	struct stat st;
+	char *idx_name = midx_bitmap_filename(midx);
+	int fd = git_open(idx_name);
+
+	free(idx_name);
+
+	if (fd < 0)
+		return -1;
+
+	if (fstat(fd, &st)) {
+		close(fd);
+		return -1;
+	}
+
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
+		return -1;
+	}
+
+	bitmap_git->midx = midx;
+	bitmap_git->map_size = xsize_t(st.st_size);
+	bitmap_git->map_pos = 0;
+	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ,
+				MAP_PRIVATE, fd, 0);
+	close(fd);
+
+	if (load_bitmap_header(bitmap_git) < 0)
+		goto cleanup;
+
+	if (!hasheq(get_midx_checksum(bitmap_git->midx), bitmap_git->checksum))
+		goto cleanup;
+
+	if (load_midx_revindex(bitmap_git->midx) < 0) {
+		warning(_("multi-pack bitmap is missing required reverse index"));
+		goto cleanup;
+	}
+	return 0;
+
+cleanup:
+	munmap(bitmap_git->map, bitmap_git->map_size);
+	bitmap_git->map_size = 0;
+	bitmap_git->map = NULL;
+	return -1;
+}
+
 static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git *packfile)
 {
 	int fd;
@@ -302,12 +373,18 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		return -1;
 	}
 
-	if (bitmap_git->pack) {
+	if (bitmap_git->pack || bitmap_git->midx) {
+		/* ignore extra bitmap file; we can only handle one */
 		warning("ignoring extra bitmap file: %s", packfile->pack_name);
 		close(fd);
 		return -1;
 	}
 
+	if (!is_pack_valid(packfile)) {
+		close(fd);
+		return -1;
+	}
+
 	bitmap_git->pack = packfile;
 	bitmap_git->map_size = xsize_t(st.st_size);
 	bitmap_git->map = xmmap(NULL, bitmap_git->map_size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -324,13 +401,36 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 	return 0;
 }
 
-static int load_pack_bitmap(struct bitmap_index *bitmap_git)
+static int load_reverse_index(struct bitmap_index *bitmap_git)
+{
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t i;
+		int ret;
+
+		ret = load_midx_revindex(bitmap_git->midx);
+		if (ret)
+			return ret;
+
+		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
+			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
+				die(_("load_reverse_index: could not open pack"));
+			ret = load_pack_revindex(bitmap_git->midx->packs[i]);
+			if (ret)
+				return ret;
+		}
+		return 0;
+	}
+	return load_pack_revindex(bitmap_git->pack);
+}
+
+static int load_bitmap(struct bitmap_index *bitmap_git)
 {
 	assert(bitmap_git->map);
 
 	bitmap_git->bitmaps = kh_init_oid_map();
 	bitmap_git->ext_index.positions = kh_init_oid_pos();
-	if (load_pack_revindex(bitmap_git->pack))
+
+	if (load_reverse_index(bitmap_git))
 		goto failed;
 
 	if (!(bitmap_git->commits = read_bitmap_1(bitmap_git)) ||
@@ -374,11 +474,35 @@ static int open_pack_bitmap(struct repository *r,
 	return ret;
 }
 
+static int open_midx_bitmap(struct repository *r,
+			    struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *midx;
+
+	assert(!bitmap_git->map);
+
+	for (midx = get_multi_pack_index(r); midx; midx = midx->next) {
+		if (!open_midx_bitmap_1(bitmap_git, midx))
+			return 0;
+	}
+	return -1;
+}
+
+static int open_bitmap(struct repository *r,
+		       struct bitmap_index *bitmap_git)
+{
+	assert(!bitmap_git->map);
+
+	if (!open_midx_bitmap(r, bitmap_git))
+		return 0;
+	return open_pack_bitmap(r, bitmap_git);
+}
+
 struct bitmap_index *prepare_bitmap_git(struct repository *r)
 {
 	struct bitmap_index *bitmap_git = xcalloc(1, sizeof(*bitmap_git));
 
-	if (!open_pack_bitmap(r, bitmap_git) && !load_pack_bitmap(bitmap_git))
+	if (!open_bitmap(r, bitmap_git) && !load_bitmap(bitmap_git))
 		return bitmap_git;
 
 	free_bitmap_index(bitmap_git);
@@ -428,10 +552,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
 	return pos;
 }
 
+static int bitmap_position_midx(struct bitmap_index *bitmap_git,
+				const struct object_id *oid)
+{
+	uint32_t want, got;
+	if (!bsearch_midx(oid, bitmap_git->midx, &want))
+		return -1;
+
+	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
+		return -1;
+	return got;
+}
+
 static int bitmap_position(struct bitmap_index *bitmap_git,
 			   const struct object_id *oid)
 {
-	int pos = bitmap_position_packfile(bitmap_git, oid);
+	int pos;
+	if (bitmap_is_midx(bitmap_git))
+		pos = bitmap_position_midx(bitmap_git, oid);
+	else
+		pos = bitmap_position_packfile(bitmap_git, oid);
 	return (pos >= 0) ? pos : bitmap_position_extended(bitmap_git, oid);
 }
 
@@ -721,6 +861,7 @@ static void show_objects_for_type(
 			continue;
 
 		for (offset = 0; offset < BITS_IN_EWORD; ++offset) {
+			struct packed_git *pack;
 			struct object_id oid;
 			uint32_t hash = 0, index_pos;
 			off_t ofs;
@@ -730,14 +871,28 @@ static void show_objects_for_type(
 
 			offset += ewah_bit_ctz64(word >> offset);
 
-			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
-			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
-			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
+			if (bitmap_is_midx(bitmap_git)) {
+				struct multi_pack_index *m = bitmap_git->midx;
+				uint32_t pack_id;
+
+				index_pos = pack_pos_to_midx(m, pos + offset);
+				ofs = nth_midxed_offset(m, index_pos);
+				nth_midxed_object_oid(&oid, m, index_pos);
+
+				pack_id = nth_midxed_pack_int_id(m, index_pos);
+				pack = bitmap_git->midx->packs[pack_id];
+			} else {
+				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
+				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
+				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
+
+				pack = bitmap_git->pack;
+			}
 
 			if (bitmap_git->hashes)
 				hash = get_be32(bitmap_git->hashes + index_pos);
 
-			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
+			show_reach(&oid, object_type, 0, hash, pack, ofs);
 		}
 	}
 }
@@ -749,8 +904,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
 		struct object *object = roots->item;
 		roots = roots->next;
 
-		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
-			return 1;
+		if (bitmap_is_midx(bitmap_git)) {
+			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
+				return 1;
+		} else {
+			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
+				return 1;
+		}
 	}
 
 	return 0;
@@ -839,14 +999,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
 static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
 				     uint32_t pos)
 {
-	struct packed_git *pack = bitmap_git->pack;
 	unsigned long size;
 	struct object_info oi = OBJECT_INFO_INIT;
 
 	oi.sizep = &size;
 
 	if (pos < bitmap_num_objects(bitmap_git)) {
-		off_t ofs = pack_pos_to_offset(pack, pos);
+		struct packed_git *pack;
+		off_t ofs;
+
+		if (bitmap_is_midx(bitmap_git)) {
+			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+			pack = bitmap_git->midx->packs[pack_id];
+			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
+		} else {
+			pack = bitmap_git->pack;
+			ofs = pack_pos_to_offset(pack, pos);
+		}
+
 		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
 			struct object_id oid;
 			nth_bitmap_object_oid(bitmap_git, &oid,
@@ -990,7 +1162,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	/* try to open a bitmapped pack, but don't parse it yet
 	 * because we may not need to use it */
 	CALLOC_ARRAY(bitmap_git, 1);
-	if (open_pack_bitmap(revs->repo, bitmap_git) < 0)
+	if (open_bitmap(revs->repo, bitmap_git) < 0)
 		goto cleanup;
 
 	for (i = 0; i < revs->pending.nr; ++i) {
@@ -1034,7 +1206,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs,
 	 * from disk. this is the point of no return; after this the rev_list
 	 * becomes invalidated and we must perform the revwalk through bitmaps
 	 */
-	if (load_pack_bitmap(bitmap_git) < 0)
+	if (load_bitmap(bitmap_git) < 0)
 		goto cleanup;
 
 	object_array_clear(&revs->pending);
@@ -1081,15 +1253,29 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 			      struct bitmap *reuse,
 			      struct pack_window **w_curs)
 {
-	off_t offset, header;
+	struct packed_git *pack;
+	off_t offset, delta_obj_offset;
 	enum object_type type;
 	unsigned long size;
 
 	if (pos >= bitmap_num_objects(bitmap_git))
 		return; /* not actually in the pack or MIDX */
 
-	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
-	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
+	if (bitmap_is_midx(bitmap_git)) {
+		uint32_t pack_id, midx_pos;
+
+		midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+		pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+
+		pack = bitmap_git->midx->packs[pack_id];
+		offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+	} else {
+		pack = bitmap_git->pack;
+		offset = pack_pos_to_offset(bitmap_git->pack, pos);
+	}
+
+	delta_obj_offset = offset;
+	type = unpack_object_header(pack, w_curs, &offset, &size);
 	if (type < 0)
 		return; /* broken packfile, punt */
 
@@ -1105,11 +1291,11 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * and the normal slow path will complain about it in
 		 * more detail.
 		 */
-		base_offset = get_delta_base(bitmap_git->pack, w_curs,
-					     &offset, type, header);
+		base_offset = get_delta_base(pack, w_curs, &offset, type,
+					     delta_obj_offset);
 		if (!base_offset)
 			return;
-		if (offset_to_pack_pos(bitmap_git->pack, base_offset, &base_pos) < 0)
+		if (offset_to_pack_pos(pack, base_offset, &base_pos) < 0)
 			return;
 
 		/*
@@ -1120,6 +1306,16 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 		 * packs we write fresh, and OFS_DELTA is the default). But
 		 * let's double check to make sure the pack wasn't written with
 		 * odd parameters.
+		 *
+		 * Note that the base does not need to be repositioned, i.e.,
+		 * the MIDX is guaranteed to have selected the copy of "base"
+		 * from the same pack, since this function is only ever called
+		 * on the preferred pack (and all duplicate objects are resolved
+		 * in favor of the preferred pack).
+		 *
+		 * This means that we can reuse base_pos when looking up the bit
+		 * in the reuse bitmap, too, since bits corresponding to the
+		 * preferred pack precede all bits from other packs.
 		 */
 		if (base_pos >= pos)
 			return;
@@ -1142,6 +1338,14 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
 	bitmap_set(reuse, pos);
 }
 
+static uint32_t midx_preferred_pack(struct bitmap_index *bitmap_git)
+{
+	struct multi_pack_index *m = bitmap_git->midx;
+	if (!m)
+		BUG("midx_preferred_pack: requires non-empty MIDX");
+	return nth_midxed_pack_int_id(m, pack_pos_to_midx(bitmap_git->midx, 0));
+}
+
 int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				       struct packed_git **packfile_out,
 				       uint32_t *entries,
@@ -1153,13 +1357,29 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	size_t i = 0;
 	uint32_t offset;
 	uint32_t objects_nr = bitmap_num_objects(bitmap_git);
+	uint32_t preferred_pack = 0;
 
 	assert(result);
 
+	load_reverse_index(bitmap_git);
+
+	if (bitmap_is_midx(bitmap_git)) {
+		preferred_pack = midx_preferred_pack(bitmap_git);
+		objects_nr = bitmap_git->midx->packs[preferred_pack]->num_objects;
+	} else
+		objects_nr = bitmap_git->pack->num_objects;
+
 	while (i < result->word_alloc && result->words[i] == (eword_t)~0)
 		i++;
 
-	/* Don't mark objects not in the packfile */
+	/*
+	 * Don't mark objects not in the packfile or preferred pack. This bitmap
+	 * marks objects eligible for reuse, but the pack-reuse code only
+	 * understands how to reuse a single pack. Since the preferred pack is
+	 * guaranteed to have all bases for its deltas (in a multi-pack bitmap),
+	 * we use it instead of another pack. In single-pack bitmaps, the choice
+	 * is made for us.
+	 */
 	if (i > objects_nr / BITS_IN_EWORD)
 		i = objects_nr / BITS_IN_EWORD;
 
@@ -1175,6 +1395,14 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 				break;
 
 			offset += ewah_bit_ctz64(word >> offset);
+			if (bitmap_is_midx(bitmap_git)) {
+				/*
+				 * Can't reuse from a non-preferred pack (see
+				 * above).
+				 */
+				if (pos + offset >= objects_nr)
+					continue;
+			}
 			try_partial_reuse(bitmap_git, pos + offset, reuse, &w_curs);
 		}
 	}
@@ -1192,7 +1420,9 @@ int reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 	 * need to be handled separately.
 	 */
 	bitmap_and_not(result, reuse);
-	*packfile_out = bitmap_git->pack;
+	*packfile_out = bitmap_git->pack ?
+		bitmap_git->pack :
+		bitmap_git->midx->packs[preferred_pack];
 	*reuse_out = reuse;
 	return 0;
 }
@@ -1466,6 +1696,12 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 	uint32_t i, num_objects;
 	uint32_t *reposition;
 
+	if (!bitmap_is_midx(bitmap_git))
+		load_reverse_index(bitmap_git);
+	else if (load_midx_revindex(bitmap_git->midx) < 0)
+		BUG("rebuild_existing_bitmaps: missing required rev-cache "
+		    "extension");
+
 	num_objects = bitmap_num_objects(bitmap_git);
 	CALLOC_ARRAY(reposition, num_objects);
 
@@ -1473,8 +1709,13 @@ uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git,
 		struct object_id oid;
 		struct object_entry *oe;
 
-		nth_packed_object_id(&oid, bitmap_git->pack,
-				     pack_pos_to_index(bitmap_git->pack, i));
+		if (bitmap_is_midx(bitmap_git))
+			nth_midxed_object_oid(&oid,
+					      bitmap_git->midx,
+					      pack_pos_to_midx(bitmap_git->midx, i));
+		else
+			nth_packed_object_id(&oid, bitmap_git->pack,
+					     pack_pos_to_index(bitmap_git->pack, i));
 		oe = packlist_find(mapping, &oid);
 
 		if (oe)
@@ -1500,6 +1741,19 @@ void free_bitmap_index(struct bitmap_index *b)
 	free(b->ext_index.hashes);
 	bitmap_free(b->result);
 	bitmap_free(b->haves);
+	if (bitmap_is_midx(b)) {
+		/*
+		 * Multi-pack bitmaps need to have resources associated with
+		 * their on-disk reverse indexes unmapped so that stale .rev and
+		 * .bitmap files can be removed.
+		 *
+		 * Unlike pack-based bitmaps, multi-pack bitmaps can be read and
+		 * written in the same 'git multi-pack-index write --bitmap'
+		 * process. Close resources so they can be removed safely on
+		 * platforms like Windows.
+		 */
+		close_midx_revindex(b->midx);
+	}
 	free(b);
 }
 
@@ -1514,7 +1768,7 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 				     enum object_type object_type)
 {
 	struct bitmap *result = bitmap_git->result;
-	struct packed_git *pack = bitmap_git->pack;
+	struct packed_git *pack;
 	off_t total = 0;
 	struct ewah_iterator it;
 	eword_t filter;
@@ -1538,6 +1792,29 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
 
 			offset += ewah_bit_ctz64(word >> offset);
 			pos = base + offset;
+
+			if (bitmap_is_midx(bitmap_git)) {
+				uint32_t pack_pos;
+				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
+				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
+				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
+
+				pack = bitmap_git->midx->packs[pack_id];
+
+				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
+					struct object_id oid;
+					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
+
+					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
+					    oid_to_hex(&oid),
+					    pack_id,
+					    (uintmax_t)offset);
+				}
+
+				pos = pack_pos;
+			} else
+				pack = bitmap_git->pack;
+
 			total += pack_pos_to_offset(pack, pos + 1) -
 				 pack_pos_to_offset(pack, pos);
 		}
@@ -1590,6 +1867,20 @@ off_t get_disk_usage_from_bitmap(struct bitmap_index *bitmap_git,
 	return total;
 }
 
+int bitmap_is_midx(struct bitmap_index *bitmap_git)
+{
+	return !!bitmap_git->midx;
+}
+
+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos)
+{
+	if (bitmap_is_midx(bitmap_git))
+		return nth_midxed_offset(bitmap_git->midx,
+					 pack_pos_to_midx(bitmap_git->midx, pos));
+	return nth_packed_object_offset(bitmap_git->pack,
+					pack_pos_to_index(bitmap_git->pack, pos));
+}
+
 const struct string_list *bitmap_preferred_tips(struct repository *r)
 {
 	return repo_config_get_value_multi(r, "pack.preferbitmaptips");
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 0bf75ff2a7..0dc6f7a7e4 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -91,6 +91,11 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 			  uint32_t index_nr,
 			  const char *filename,
 			  uint16_t options);
+char *midx_bitmap_filename(struct multi_pack_index *midx);
+char *pack_bitmap_filename(struct packed_git *p);
+
+int bitmap_is_midx(struct bitmap_index *bitmap_git);
+off_t bitmap_pack_offset(struct bitmap_index *bitmap_git, uint32_t pos);
 
 const struct string_list *bitmap_preferred_tips(struct repository *r);
 int bitmap_is_preferred_refname(struct repository *r, const char *refname);
diff --git a/packfile.c b/packfile.c
index 8668345d93..c444e365a3 100644
--- a/packfile.c
+++ b/packfile.c
@@ -863,7 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
 	if (!strcmp(file_name, "multi-pack-index"))
 		return;
 	if (starts_with(file_name, "multi-pack-index") &&
-	    ends_with(file_name, ".rev"))
+	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
 		return;
 	if (ends_with(file_name, ".idx") ||
 	    ends_with(file_name, ".rev") ||
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (11 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-05-04  5:02   ` Jonathan Tan
  2021-04-09 18:11 ` [PATCH 14/22] t5310: move some tests to lib-bitmap.sh Taylor Blau
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Write multi-pack bitmaps in the format described by
Documentation/technical/bitmap-format.txt, inferring their presence with
the absence of '--bitmap'.

To write a multi-pack bitmap, this patch attempts to reuse as much of
the existing machinery from pack-objects as possible. Specifically, the
MIDX code prepares a packing_data struct that pretends as if a single
packfile has been generated containing all of the objects contained
within the MIDX.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/git-multi-pack-index.txt |  12 +-
 builtin/multi-pack-index.c             |   2 +
 midx.c                                 | 195 ++++++++++++++++++++++++-
 midx.h                                 |   1 +
 4 files changed, 202 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-multi-pack-index.txt b/Documentation/git-multi-pack-index.txt
index ffd601bc17..ada14deb2c 100644
--- a/Documentation/git-multi-pack-index.txt
+++ b/Documentation/git-multi-pack-index.txt
@@ -10,7 +10,7 @@ SYNOPSIS
 --------
 [verse]
 'git multi-pack-index' [--object-dir=<dir>] [--[no-]progress]
-	[--preferred-pack=<pack>] <subcommand>
+	[--preferred-pack=<pack>] [--[no-]bitmap] <subcommand>
 
 DESCRIPTION
 -----------
@@ -40,6 +40,9 @@ write::
 		multiple packs contain the same object. If not given,
 		ties are broken in favor of the pack with the lowest
 		mtime.
+
+	--[no-]bitmap::
+		Control whether or not a multi-pack bitmap is written.
 --
 
 verify::
@@ -81,6 +84,13 @@ EXAMPLES
 $ git multi-pack-index write
 -----------------------------------------------
 
+* Write a MIDX file for the packfiles in the current .git folder with a
+corresponding bitmap.
++
+-------------------------------------------------------------
+$ git multi-pack-index write --preferred-pack <pack> --bitmap
+-------------------------------------------------------------
+
 * Write a MIDX file for the packfiles in an alternate object store.
 +
 -----------------------------------------------
diff --git a/builtin/multi-pack-index.c b/builtin/multi-pack-index.c
index 5d3ea445fd..bf6fa982e3 100644
--- a/builtin/multi-pack-index.c
+++ b/builtin/multi-pack-index.c
@@ -68,6 +68,8 @@ static int cmd_multi_pack_index_write(int argc, const char **argv)
 		OPT_STRING(0, "preferred-pack", &opts.preferred_pack,
 			   N_("preferred-pack"),
 			   N_("pack for reuse when computing a multi-pack bitmap")),
+		OPT_BIT(0, "bitmap", &opts.flags, N_("write multi-pack bitmap"),
+			MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX),
 		OPT_END(),
 	};
 
diff --git a/midx.c b/midx.c
index 567cdf0fcf..32d7d184c0 100644
--- a/midx.c
+++ b/midx.c
@@ -13,6 +13,10 @@
 #include "repository.h"
 #include "chunk-format.h"
 #include "pack.h"
+#include "pack-bitmap.h"
+#include "refs.h"
+#include "revision.h"
+#include "list-objects.h"
 
 #define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
 #define MIDX_VERSION 1
@@ -885,6 +889,145 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 static void clear_midx_files_ext(struct repository *r, const char *ext,
 				 unsigned char *keep_hash);
 
+static void prepare_midx_packing_data(struct packing_data *pdata,
+				      struct write_midx_context *ctx)
+{
+	uint32_t i;
+
+	memset(pdata, 0, sizeof(struct packing_data));
+	prepare_packing_data(the_repository, pdata);
+
+	for (i = 0; i < ctx->entries_nr; i++) {
+		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
+		struct object_entry *to = packlist_alloc(pdata, &from->oid);
+
+		oe_set_in_pack(pdata, to,
+			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
+	}
+}
+
+static int add_ref_to_pending(const char *refname,
+			      const struct object_id *oid,
+			      int flag, void *cb_data)
+{
+	struct rev_info *revs = (struct rev_info*)cb_data;
+	struct object *object;
+
+	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
+		warning("symbolic ref is dangling: %s", refname);
+		return 0;
+	}
+
+	object = parse_object_or_die(oid, refname);
+	if (object->type != OBJ_COMMIT)
+		return 0;
+
+	add_pending_object(revs, object, "");
+	if (bitmap_is_preferred_refname(revs->repo, refname))
+		object->flags |= NEEDS_BITMAP;
+	return 0;
+}
+
+struct bitmap_commit_cb {
+	struct commit **commits;
+	size_t commits_nr, commits_alloc;
+
+	struct write_midx_context *ctx;
+};
+
+static const struct object_id *bitmap_oid_access(size_t index,
+						 const void *_entries)
+{
+	const struct pack_midx_entry *entries = _entries;
+	return &entries[index].oid;
+}
+
+static void bitmap_show_commit(struct commit *commit, void *_data)
+{
+	struct bitmap_commit_cb *data = _data;
+	if (oid_pos(&commit->object.oid, data->ctx->entries,
+		    data->ctx->entries_nr,
+		    bitmap_oid_access) > -1) {
+		ALLOC_GROW(data->commits, data->commits_nr + 1,
+			   data->commits_alloc);
+		data->commits[data->commits_nr++] = commit;
+	}
+}
+
+static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
+						    struct write_midx_context *ctx)
+{
+	struct rev_info revs;
+	struct bitmap_commit_cb cb;
+
+	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
+	cb.ctx = ctx;
+
+	repo_init_revisions(the_repository, &revs, NULL);
+	for_each_ref(add_ref_to_pending, &revs);
+
+	fetch_if_missing = 0;
+	revs.exclude_promisor_objects = 1;
+
+	if (prepare_revision_walk(&revs))
+		die(_("revision walk setup failed"));
+
+	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
+	if (indexed_commits_nr_p)
+		*indexed_commits_nr_p = cb.commits_nr;
+
+	return cb.commits;
+}
+
+static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
+			     struct write_midx_context *ctx,
+			     unsigned flags)
+{
+	struct packing_data pdata;
+	struct pack_idx_entry **index;
+	struct commit **commits = NULL;
+	uint32_t i, commits_nr;
+	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
+	int ret;
+
+	prepare_midx_packing_data(&pdata, ctx);
+
+	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
+
+	/*
+	 * Build the MIDX-order index based on pdata.objects (which is already
+	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
+	 * this order).
+	 */
+	ALLOC_ARRAY(index, pdata.nr_objects);
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
+
+	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
+	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
+
+	/*
+	 * bitmap_writer_select_commits expects objects in lex order, but
+	 * pack_order gives us exactly that. use it directly instead of
+	 * re-sorting the array
+	 */
+	for (i = 0; i < pdata.nr_objects; i++)
+		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
+
+	bitmap_writer_select_commits(commits, commits_nr, -1);
+	ret = bitmap_writer_build(&pdata);
+	if (!ret)
+		goto cleanup;
+
+	bitmap_writer_set_checksum(midx_hash);
+	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
+
+cleanup:
+	free(index);
+	free(bitmap_name);
+	return ret;
+}
+
 static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
 			       struct string_list *packs_to_drop,
 			       const char *preferred_pack_name,
@@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		for (i = 0; i < ctx.m->num_packs; i++) {
 			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
 
+			if (prepare_midx_pack(the_repository, ctx.m, i)) {
+				error(_("could not load pack %s"),
+				      ctx.m->pack_names[i]);
+				result = 1;
+				goto cleanup;
+			}
+
 			ctx.info[ctx.nr].orig_pack_int_id = i;
 			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
-			ctx.info[ctx.nr].p = NULL;
+			ctx.info[ctx.nr].p = ctx.m->packs[i];
 			ctx.info[ctx.nr].expired = 0;
 			ctx.nr++;
 		}
@@ -947,8 +1097,26 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
-	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
-		goto cleanup;
+	if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) {
+		struct bitmap_index *bitmap_git;
+		int bitmap_exists;
+		int want_bitmap = flags & MIDX_WRITE_BITMAP;
+
+		bitmap_git = prepare_bitmap_git(the_repository);
+		bitmap_exists = bitmap_git && bitmap_is_midx(bitmap_git);
+		free_bitmap_index(bitmap_git);
+
+		if (bitmap_exists || !want_bitmap) {
+			/*
+			 * The correct MIDX already exists, and so does a
+			 * corresponding bitmap (or one wasn't requested).
+			 */
+			if (!want_bitmap)
+				clear_midx_files_ext(the_repository, ".bitmap",
+						     NULL);
+			goto cleanup;
+		}
+	}
 
 	ctx.preferred_pack_idx = -1;
 	if (preferred_pack_name) {
@@ -1048,9 +1216,6 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	hold_lock_file_for_update(&lk, midx_name, LOCK_DIE_ON_ERROR);
 	f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
 
-	if (ctx.m)
-		close_midx(ctx.m);
-
 	if (ctx.nr - dropped_packs == 0) {
 		error(_("no pack files to index."));
 		result = 1;
@@ -1081,14 +1246,17 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
 		ctx.pack_order = midx_pack_order(&ctx);
 
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name, midx_hash, &ctx);
+	if (flags & MIDX_WRITE_BITMAP)
+		write_midx_bitmap(midx_name, midx_hash, &ctx, flags);
 
 	commit_lock_file(&lk);
 
+	clear_midx_files_ext(the_repository, ".bitmap", midx_hash);
 	clear_midx_files_ext(the_repository, ".rev", midx_hash);
 
 cleanup:
@@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 		if (ctx.info[i].p) {
 			close_pack(ctx.info[i].p);
 			free(ctx.info[i].p);
+			if (ctx.m) {
+				/*
+				 * Destroy a stale reference to the pack in
+				 * 'ctx.m'.
+				 */
+				uint32_t orig = ctx.info[i].orig_pack_int_id;
+				if (orig < ctx.m->num_packs)
+					ctx.m->packs[orig] = NULL;
+			}
 		}
 		free(ctx.info[i].pack_name);
 	}
@@ -1105,6 +1282,9 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
 	free(ctx.pack_perm);
 	free(ctx.pack_order);
 	free(midx_name);
+	if (ctx.m)
+		close_midx(ctx.m);
+
 	return result;
 }
 
@@ -1166,6 +1346,7 @@ void clear_midx_file(struct repository *r)
 	if (remove_path(midx))
 		die(_("failed to clear multi-pack-index at %s"), midx);
 
+	clear_midx_files_ext(r, ".bitmap", NULL);
 	clear_midx_files_ext(r, ".rev", NULL);
 
 	free(midx);
diff --git a/midx.h b/midx.h
index 1172df1a71..350f4d0a7b 100644
--- a/midx.h
+++ b/midx.h
@@ -41,6 +41,7 @@ struct multi_pack_index {
 
 #define MIDX_PROGRESS     (1 << 0)
 #define MIDX_WRITE_REV_INDEX (1 << 1)
+#define MIDX_WRITE_BITMAP (1 << 2)
 
 const unsigned char *get_midx_checksum(struct multi_pack_index *m);
 char *get_midx_filename(const char *object_dir);
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 14/22] t5310: move some tests to lib-bitmap.sh
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (12 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:11 ` [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

We'll soon be adding a test script that will cover many of the same
bitmap concepts as t5310, but for MIDX bitmaps. Let's pull out as many
of the applicable tests as we can so we don't have to rewrite them.

There should be no functional change to t5310; we still run the same
operations in the same order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh         | 212 ++++++++++++++++++++++++++++++++++++++++
 t/t5310-pack-bitmaps.sh | 204 +-------------------------------------
 2 files changed, 216 insertions(+), 200 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index fe3f98be24..c655a9bf35 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,3 +1,6 @@
+# Helpers for scripts testing bitamp functionality; see t5310 for
+# example usage.
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -24,3 +27,212 @@ test_bitmap_traversal () {
 	test_cmp "$1.normalized" "$2.normalized" &&
 	rm -f "$1.normalized" "$2.normalized"
 }
+
+# To ensure the logic for "maximal commits" is exercised, make
+# the repository a bit more complicated.
+#
+#    other                         second
+#      *                             *
+# (99 commits)                  (99 commits)
+#      *                             *
+#      |\                           /|
+#      | * octo-other  octo-second * |
+#      |/|\_________  ____________/|\|
+#      | \          \/  __________/  |
+#      |  | ________/\ /             |
+#      *  |/          * merge-right  *
+#      | _|__________/ \____________ |
+#      |/ |                         \|
+# (l1) *  * merge-left               * (r1)
+#      | / \________________________ |
+#      |/                           \|
+# (l2) *                             * (r2)
+#       \___________________________ |
+#                                   \|
+#                                    * (base)
+#
+# We only push bits down the first-parent history, which
+# makes some of these commits unimportant!
+#
+# The important part for the maximal commit algorithm is how
+# the bitmasks are extended. Assuming starting bit positions
+# for second (bit 0) and other (bit 1), the bitmasks at the
+# end should be:
+#
+#      second: 1       (maximal, selected)
+#       other: 01      (maximal, selected)
+#      (base): 11 (maximal)
+#
+# This complicated history was important for a previous
+# version of the walk that guarantees never walking a
+# commit multiple times. That goal might be important
+# again, so preserve this complicated case. For now, this
+# test will guarantee that the bitmaps are computed
+# correctly, even with the repeat calculations.
+setup_bitmap_history() {
+	test_expect_success 'setup repo with moderate-sized history' '
+		test_commit_bulk --id=file 10 &&
+		git branch -M second &&
+		git checkout -b other HEAD~5 &&
+		test_commit_bulk --id=side 10 &&
+
+		# add complicated history setup, including merges and
+		# ambiguous merge-bases
+
+		git checkout -b merge-left other~2 &&
+		git merge second~2 -m "merge-left" &&
+
+		git checkout -b merge-right second~1 &&
+		git merge other~1 -m "merge-right" &&
+
+		git checkout -b octo-second second &&
+		git merge merge-left merge-right -m "octopus-second" &&
+
+		git checkout -b octo-other other &&
+		git merge merge-left merge-right -m "octopus-other" &&
+
+		git checkout other &&
+		git merge octo-other -m "pull octopus" &&
+
+		git checkout second &&
+		git merge octo-second -m "pull octopus" &&
+
+		# Remove these branches so they are not selected
+		# as bitmap tips
+		git branch -D merge-left &&
+		git branch -D merge-right &&
+		git branch -D octo-other &&
+		git branch -D octo-second &&
+
+		# add padding to make these merges less interesting
+		# and avoid having them selected for bitmaps
+		test_commit_bulk --id=file 100 &&
+		git checkout other &&
+		test_commit_bulk --id=side 100 &&
+		git checkout second &&
+
+		bitmaptip=$(git rev-parse second) &&
+		blob=$(echo tagged-blob | git hash-object -w --stdin) &&
+		git tag tagged-blob $blob
+	'
+}
+
+rev_list_tests_head () {
+	test_expect_success "counting commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch >expect &&
+		git rev-list --use-bitmap-index --count $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
+		git rev-list --count $branch~5..$branch >expect &&
+		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limit ($state, $branch)" '
+		git rev-list --count -n 1 $branch >expect &&
+		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting non-linear history ($state, $branch)" '
+		git rev-list --count other...second >expect &&
+		git rev-list --use-bitmap-index --count other...second >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting commits with limiting ($state, $branch)" '
+		git rev-list --count $branch -- 1.t >expect &&
+		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "counting objects via bitmap ($state, $branch)" '
+		git rev-list --count --objects $branch >expect &&
+		git rev-list --use-bitmap-index --count --objects $branch >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success "enumerate commits ($state, $branch)" '
+		git rev-list --use-bitmap-index $branch >actual &&
+		git rev-list $branch >expect &&
+		test_bitmap_traversal --no-confirm-bitmaps expect actual
+	'
+
+	test_expect_success "enumerate --objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch >actual &&
+		git rev-list --objects $branch >expect &&
+		test_bitmap_traversal expect actual
+	'
+
+	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
+		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
+		grep $blob actual
+	'
+}
+
+rev_list_tests () {
+	state=$1
+
+	for branch in "second" "other"
+	do
+		rev_list_tests_head
+	done
+}
+
+basic_bitmap_tests () {
+	tip="$1"
+	test_expect_success 'rev-list --test-bitmap verifies bitmaps' "
+		git rev-list --test-bitmap "${tip:-HEAD}"
+	"
+
+	rev_list_tests 'full bitmap'
+
+	test_expect_success 'clone from bitmapped repository' '
+		rm -fr clone.git &&
+		git clone --no-local --bare . clone.git &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'partial clone from bitmapped repository' '
+		test_config uploadpack.allowfilter true &&
+		rm -fr partial-clone.git &&
+		git clone --no-local --bare --filter=blob:none . partial-clone.git &&
+		(
+			cd partial-clone.git &&
+			pack=$(echo objects/pack/*.pack) &&
+			git verify-pack -v "$pack" >have &&
+			awk "/blob/ { print \$1 }" <have >blobs &&
+			# we expect this single blob because of the direct ref
+			git rev-parse refs/tags/tagged-blob >expect &&
+			test_cmp expect blobs
+		)
+	'
+
+	test_expect_success 'setup further non-bitmapped commits' '
+		test_commit_bulk --id=further 10
+	'
+
+	rev_list_tests 'partial bitmap'
+
+	test_expect_success 'fetch (partial bitmap)' '
+		git --git-dir=clone.git fetch origin second:second &&
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+}
+
+# have_delta <obj> <expected_base>
+#
+# Note that because this relies on cat-file, it might find _any_ copy of an
+# object in the repository. The caller is responsible for making sure
+# there's only one (e.g., via "repack -ad", or having just fetched a copy).
+have_delta () {
+	echo $2 >expect &&
+	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
+	test_cmp expect actual
+}
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index f53efc8229..4318f84d53 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -25,93 +25,10 @@ has_any () {
 	grep -Ff "$1" "$2"
 }
 
-# To ensure the logic for "maximal commits" is exercised, make
-# the repository a bit more complicated.
-#
-#    other                         second
-#      *                             *
-# (99 commits)                  (99 commits)
-#      *                             *
-#      |\                           /|
-#      | * octo-other  octo-second * |
-#      |/|\_________  ____________/|\|
-#      | \          \/  __________/  |
-#      |  | ________/\ /             |
-#      *  |/          * merge-right  *
-#      | _|__________/ \____________ |
-#      |/ |                         \|
-# (l1) *  * merge-left               * (r1)
-#      | / \________________________ |
-#      |/                           \|
-# (l2) *                             * (r2)
-#       \___________________________ |
-#                                   \|
-#                                    * (base)
-#
-# We only push bits down the first-parent history, which
-# makes some of these commits unimportant!
-#
-# The important part for the maximal commit algorithm is how
-# the bitmasks are extended. Assuming starting bit positions
-# for second (bit 0) and other (bit 1), the bitmasks at the
-# end should be:
-#
-#      second: 1       (maximal, selected)
-#       other: 01      (maximal, selected)
-#      (base): 11 (maximal)
-#
-# This complicated history was important for a previous
-# version of the walk that guarantees never walking a
-# commit multiple times. That goal might be important
-# again, so preserve this complicated case. For now, this
-# test will guarantee that the bitmaps are computed
-# correctly, even with the repeat calculations.
+setup_bitmap_history
 
-test_expect_success 'setup repo with moderate-sized history' '
-	test_commit_bulk --id=file 10 &&
-	git branch -M second &&
-	git checkout -b other HEAD~5 &&
-	test_commit_bulk --id=side 10 &&
-
-	# add complicated history setup, including merges and
-	# ambiguous merge-bases
-
-	git checkout -b merge-left other~2 &&
-	git merge second~2 -m "merge-left" &&
-
-	git checkout -b merge-right second~1 &&
-	git merge other~1 -m "merge-right" &&
-
-	git checkout -b octo-second second &&
-	git merge merge-left merge-right -m "octopus-second" &&
-
-	git checkout -b octo-other other &&
-	git merge merge-left merge-right -m "octopus-other" &&
-
-	git checkout other &&
-	git merge octo-other -m "pull octopus" &&
-
-	git checkout second &&
-	git merge octo-second -m "pull octopus" &&
-
-	# Remove these branches so they are not selected
-	# as bitmap tips
-	git branch -D merge-left &&
-	git branch -D merge-right &&
-	git branch -D octo-other &&
-	git branch -D octo-second &&
-
-	# add padding to make these merges less interesting
-	# and avoid having them selected for bitmaps
-	test_commit_bulk --id=file 100 &&
-	git checkout other &&
-	test_commit_bulk --id=side 100 &&
-	git checkout second &&
-
-	bitmaptip=$(git rev-parse second) &&
-	blob=$(echo tagged-blob | git hash-object -w --stdin) &&
-	git tag tagged-blob $blob &&
-	git config repack.writebitmaps true
+test_expect_success 'setup writing bitmaps during repack' '
+	git config repack.writeBitmaps true
 '
 
 test_expect_success 'full repack creates bitmaps' '
@@ -123,109 +40,7 @@ test_expect_success 'full repack creates bitmaps' '
 	grep "\"key\":\"num_maximal_commits\",\"value\":\"107\"" trace
 '
 
-test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
-	git rev-list --test-bitmap HEAD
-'
-
-rev_list_tests_head () {
-	test_expect_success "counting commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch >expect &&
-		git rev-list --use-bitmap-index --count $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting partial commits via bitmap ($state, $branch)" '
-		git rev-list --count $branch~5..$branch >expect &&
-		git rev-list --use-bitmap-index --count $branch~5..$branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limit ($state, $branch)" '
-		git rev-list --count -n 1 $branch >expect &&
-		git rev-list --use-bitmap-index --count -n 1 $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting non-linear history ($state, $branch)" '
-		git rev-list --count other...second >expect &&
-		git rev-list --use-bitmap-index --count other...second >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting commits with limiting ($state, $branch)" '
-		git rev-list --count $branch -- 1.t >expect &&
-		git rev-list --use-bitmap-index --count $branch -- 1.t >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "counting objects via bitmap ($state, $branch)" '
-		git rev-list --count --objects $branch >expect &&
-		git rev-list --use-bitmap-index --count --objects $branch >actual &&
-		test_cmp expect actual
-	'
-
-	test_expect_success "enumerate commits ($state, $branch)" '
-		git rev-list --use-bitmap-index $branch >actual &&
-		git rev-list $branch >expect &&
-		test_bitmap_traversal --no-confirm-bitmaps expect actual
-	'
-
-	test_expect_success "enumerate --objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch >actual &&
-		git rev-list --objects $branch >expect &&
-		test_bitmap_traversal expect actual
-	'
-
-	test_expect_success "bitmap --objects handles non-commit objects ($state, $branch)" '
-		git rev-list --objects --use-bitmap-index $branch tagged-blob >actual &&
-		grep $blob actual
-	'
-}
-
-rev_list_tests () {
-	state=$1
-
-	for branch in "second" "other"
-	do
-		rev_list_tests_head
-	done
-}
-
-rev_list_tests 'full bitmap'
-
-test_expect_success 'clone from bitmapped repository' '
-	git clone --no-local --bare . clone.git &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
-
-test_expect_success 'partial clone from bitmapped repository' '
-	test_config uploadpack.allowfilter true &&
-	git clone --no-local --bare --filter=blob:none . partial-clone.git &&
-	(
-		cd partial-clone.git &&
-		pack=$(echo objects/pack/*.pack) &&
-		git verify-pack -v "$pack" >have &&
-		awk "/blob/ { print \$1 }" <have >blobs &&
-		# we expect this single blob because of the direct ref
-		git rev-parse refs/tags/tagged-blob >expect &&
-		test_cmp expect blobs
-	)
-'
-
-test_expect_success 'setup further non-bitmapped commits' '
-	test_commit_bulk --id=further 10
-'
-
-rev_list_tests 'partial bitmap'
-
-test_expect_success 'fetch (partial bitmap)' '
-	git --git-dir=clone.git fetch origin second:second &&
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+basic_bitmap_tests
 
 test_expect_success 'incremental repack fails when bitmaps are requested' '
 	test_commit more-1 &&
@@ -461,17 +276,6 @@ test_expect_success 'truncated bitmap fails gracefully (cache)' '
 	test_i18ngrep corrupted.bitmap.index stderr
 '
 
-# have_delta <obj> <expected_base>
-#
-# Note that because this relies on cat-file, it might find _any_ copy of an
-# object in the repository. The caller is responsible for making sure
-# there's only one (e.g., via "repack -ad", or having just fetched a copy).
-have_delta () {
-	echo $2 >expect &&
-	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
-	test_cmp expect actual
-}
-
 # Create a state of history with these properties:
 #
 #  - refs that allow a client to fetch some new history, while sharing some old
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (13 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 14/22] t5310: move some tests to lib-bitmap.sh Taylor Blau
@ 2021-04-09 18:11 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
                   ` (6 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:11 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Subsequent tests will want to check for the existence of a multi-pack
bitmap which matches the multi-pack-index stored in the pack directory.

The multi-pack bitmap includes the hex checksum of the MIDX it
corresponds to in its filename (for example,
'$packdir/multi-pack-index-<checksum>.bitmap'). As a result, some tests
want a way to learn what '<checksum>' is.

This helper addresses that need by printing the checksum of the
repository's multi-pack-index.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/helper/test-read-midx.c | 16 +++++++++++++++-
 t/lib-bitmap.sh           |  4 ++++
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/t/helper/test-read-midx.c b/t/helper/test-read-midx.c
index 7c2eb11a8e..cb0d27049a 100644
--- a/t/helper/test-read-midx.c
+++ b/t/helper/test-read-midx.c
@@ -60,12 +60,26 @@ static int read_midx_file(const char *object_dir, int show_objects)
 	return 0;
 }
 
+static int read_midx_checksum(const char *object_dir)
+{
+	struct multi_pack_index *m;
+
+	setup_git_directory();
+	m = load_multi_pack_index(object_dir, 1);
+	if (!m)
+		return 1;
+	printf("%s\n", hash_to_hex(get_midx_checksum(m)));
+	return 0;
+}
+
 int cmd__read_midx(int argc, const char **argv)
 {
 	if (!(argc == 2 || argc == 3))
-		usage("read-midx [--show-objects] <object-dir>");
+		usage("read-midx [--show-objects|--checksum] <object-dir>");
 
 	if (!strcmp(argv[1], "--show-objects"))
 		return read_midx_file(argv[2], 1);
+	else if (!strcmp(argv[1], "--checksum"))
+		return read_midx_checksum(argv[2]);
 	return read_midx_file(argv[1], 0);
 }
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index c655a9bf35..a5ac8b41a7 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -236,3 +236,7 @@ have_delta () {
 	echo $1 | git cat-file --batch-check="%(deltabase)" >actual &&
 	test_cmp expect actual
 }
+
+midx_checksum () {
+	test-tool read-midx --checksum "${1:-.git/objects}"
+}
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 16/22] t5326: test multi-pack bitmap behavior
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (14 preceding siblings ...)
  2021-04-09 18:11 ` [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-05-04 17:51   ` Jonathan Tan
  2021-04-09 18:12 ` [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This patch introduces a new test, t5326, which tests the basic
functionality of multi-pack bitmaps.

Some trivial behavior is tested, such as:

  - Whether bitmaps can be generated with more than one pack.
  - Whether clones can be served with all objects in the bitmap.
  - Whether follow-up fetches can be served with some objects outside of
    the server's bitmap

These use lib-bitmap's tests (which in turn were pulled from t5310), and
we cover cases where the MIDX represents both a single pack and multiple
packs.

In addition, some non-trivial and MIDX-specific behavior is tested, too,
including:

  - Whether multi-pack bitmaps behave correctly with respect to the
    pack-reuse machinery when the base for some object is selected from
    a different pack than the delta.
  - Whether multi-pack bitmaps correctly respect the
    pack.preferBitmapTips configuration.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 278 ++++++++++++++++++++++++++++++++++
 1 file changed, 278 insertions(+)
 create mode 100755 t/t5326-multi-pack-bitmaps.sh

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..51c4f9ad78
--- /dev/null
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -0,0 +1,278 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality'
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+setup_bitmap_history
+
+test_expect_success 'enable core.multiPackIndex' '
+	git config core.multiPackIndex true
+'
+
+test_expect_success 'create single-pack midx with bitmaps' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success 'create new additional packs' '
+	for i in $(test_seq 1 16)
+	do
+		test_commit "$i" &&
+		git repack -d
+	done &&
+
+	git checkout -b other2 HEAD~8 &&
+	for i in $(test_seq 1 8)
+	do
+		test_commit "side-$i" &&
+		git repack -d
+	done &&
+	git checkout second
+'
+
+test_expect_success 'create multi-pack midx with bitmaps' '
+	git multi-pack-index write --bitmap &&
+
+	ls $objdir/pack/pack-*.pack >packs &&
+	test_line_count = 26 packs &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests
+
+test_expect_success '--no-bitmap is respected when bitmaps exist' '
+	git multi-pack-index write --bitmap &&
+
+	test_commit respect--no-bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+	git multi-pack-index write --no-bitmap &&
+
+	test_path_is_file $midx &&
+	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+'
+
+test_expect_success 'setup midx with base from later pack' '
+	# Write a and b so that "a" is a delta on top of base "b", since Git
+	# prefers to delete contents out of a base rather than add to a shorter
+	# object.
+	test_seq 1 128 >a &&
+	test_seq 1 130 >b &&
+
+	git add a b &&
+	git commit -m "initial commit" &&
+
+	a=$(git rev-parse HEAD:a) &&
+	b=$(git rev-parse HEAD:b) &&
+
+	# In the first pack, "a" is stored as a delta to "b".
+	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$a
+	$b
+	EOF
+	) &&
+
+	# In the second pack, "a" is missing, and "b" is not a delta nor base to
+	# any other object.
+	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+	$b
+	$(git rev-parse HEAD)
+	$(git rev-parse HEAD^{tree})
+	EOF
+	) &&
+
+	git prune-packed &&
+	# Use the second pack as the preferred source, so that "b" occurs
+	# earlier in the MIDX object order, rendering "a" unusable for pack
+	# reuse.
+	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+	have_delta $a $b &&
+	test $(midx_pack_source $a) != $(midx_pack_source $b)
+'
+
+rev_list_tests 'full bitmap with backwards delta'
+
+test_expect_success 'clone with bitmaps enabled' '
+	git clone --no-local --bare . clone-reverse-delta.git &&
+	test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+	git rev-parse HEAD >expect &&
+	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+	test_cmp expect actual
+'
+
+bitmap_reuse_tests() {
+	from=$1
+	to=$2
+
+	test_expect_success "setup pack reuse tests ($from -> $to)" '
+		rm -fr repo &&
+		git init repo &&
+		(
+			cd repo &&
+			test_commit_bulk 16 &&
+			git tag old-tip &&
+
+			git config core.multiPackIndex true &&
+			if test "MIDX" = "$from"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "build bitmap from existing ($from -> $to)" '
+		(
+			cd repo &&
+			test_commit_bulk --id=further 16 &&
+			git tag new-tip &&
+
+			if test "MIDX" = "$to"
+			then
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
+				git multi-pack-index write --bitmap
+			else
+				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
+			fi
+		)
+	'
+
+	test_expect_success "verify resulting bitmaps ($from -> $to)" '
+		(
+			cd repo &&
+			git for-each-ref &&
+			git rev-list --test-bitmap refs/tags/old-tip &&
+			git rev-list --test-bitmap refs/tags/new-tip
+		)
+	'
+}
+
+bitmap_reuse_tests 'pack' 'MIDX'
+bitmap_reuse_tests 'MIDX' 'pack'
+bitmap_reuse_tests 'MIDX' 'MIDX'
+
+test_expect_success 'missing object closure fails gracefully' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit loose &&
+		test_commit packed &&
+
+		# Do not pass "--revs"; we want a pack without the "loose"
+		# commit.
+		git pack-objects $objdir/pack/pack <<-EOF &&
+		$(git rev-parse packed)
+		EOF
+
+		git multi-pack-index write --bitmap 2>err &&
+		grep "doesn.t have full closure" err &&
+		test_path_is_file $midx &&
+		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap
+	)
+'
+
+test_expect_success 'setup partial bitmaps' '
+	test_commit packed &&
+	git repack &&
+	test_commit loose &&
+	git multi-pack-index write --bitmap 2>err &&
+	test_path_is_file $midx &&
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+'
+
+basic_bitmap_tests HEAD~
+
+test_expect_success 'removing a MIDX clears stale bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+		test_commit base &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
+		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
+		rm $midx &&
+
+		# Then write a new MIDX.
+		test_commit new &&
+		git repack &&
+		git multi-pack-index write --bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $stale_bitmap
+	)
+'
+
+test_expect_success 'pack.preferBitmapTips' '
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit_bulk --message="%s" 103 &&
+
+		git log --format="%H" >commits.raw &&
+		sort <commits.raw >commits &&
+
+		git log --format="create refs/tags/%s %H" HEAD >refs &&
+		git update-ref --stdin <refs &&
+
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >before &&
+		test_line_count = 1 before &&
+
+		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
+			<before | git update-ref --stdin &&
+
+		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
+		rm -fr $midx-$(midx_checksum $objdir).rev &&
+		rm -fr $midx &&
+
+		git -c pack.preferBitmapTips=refs/tags/include \
+			multi-pack-index write --bitmap &&
+		test-tool bitmap list-commits | sort >bitmaps &&
+		comm -13 bitmaps commits >after &&
+
+		! test_cmp before after
+	)
+'
+
+test_done
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (15 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
                   ` (4 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

From: Jeff King <peff@peff.net>

Generating a MIDX bitmap confuses many of the tests in t5310, which
expect to control whether and how bitmaps are written. Since the
relevant MIDX-bitmap tests here are covered already in t5326, let's just
disable the flag for the whole t5310 script.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5310-pack-bitmaps.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index 4318f84d53..673baa5c3c 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -8,6 +8,10 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 . "$TEST_DIRECTORY"/lib-bundle.sh
 . "$TEST_DIRECTORY"/lib-bitmap.sh
 
+# t5310 deals only with single-pack bitmaps, so don't write MIDX bitmaps in
+# their place.
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
 objpath () {
 	echo ".git/objects/$(echo "$1" | sed -e 's|\(..\)|\1/|')"
 }
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (16 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 19/22] t7700: update to work with MIDX bitmap test knob Taylor Blau
                   ` (3 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

This test is specifically about generating a midx still respecting a
pack-based bitmap file. Generating a MIDX bitmap would confuse the test.
Let's override the 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' variable to
make sure we don't do so.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5319-multi-pack-index.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 5641d158df..69f1c815aa 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -474,7 +474,8 @@ test_expect_success 'repack preserves multi-pack-index when creating packs' '
 compare_results_with_midx "after repack"
 
 test_expect_success 'multi-pack-index and pack-bitmap' '
-	git -c repack.writeBitmaps=true repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=true repack -ad &&
 	git multi-pack-index write &&
 	git rev-list --test-bitmap HEAD
 '
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 19/22] t7700: update to work with MIDX bitmap test knob
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (17 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
                   ` (2 subsequent siblings)
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A number of these tests are focused only on pack-based bitmaps and need
to be updated to disable 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' where
necessary.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t7700-repack.sh | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 25b235c063..98eda3bfeb 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -63,13 +63,14 @@ test_expect_success 'objects in packs marked .keep are not repacked' '
 
 test_expect_success 'writing bitmaps via command-line can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git repack -Adbl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 git repack -Adbl &&
 	test_has_duplicate_object true
 '
 
 test_expect_success 'writing bitmaps via config can duplicate .keep objects' '
 	# build on $oid, $packid, and .keep state from previous
-	git -c repack.writebitmaps=true repack -Adl &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writebitmaps=true repack -Adl &&
 	test_has_duplicate_object true
 '
 
@@ -189,7 +190,9 @@ test_expect_success 'repack --keep-pack' '
 
 test_expect_success 'bitmaps are created by default in bare repos' '
 	git clone --bare .git bare.git &&
-	git -C bare.git repack -ad &&
+	rm -f bare.git/objects/pack/*.bitmap &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap) &&
 	test_path_is_file "$bitmap"
 '
@@ -200,7 +203,8 @@ test_expect_success 'incremental repack does not complain' '
 '
 
 test_expect_success 'bitmaps can be disabled on bare repos' '
-	git -c repack.writeBitmaps=false -C bare.git repack -ad &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -c repack.writeBitmaps=false -C bare.git repack -ad &&
 	bitmap=$(ls bare.git/objects/pack/*.bitmap || :) &&
 	test -z "$bitmap"
 '
@@ -211,7 +215,8 @@ test_expect_success 'no bitmaps created if .keep files present' '
 	keep=${pack%.pack}.keep &&
 	test_when_finished "rm -f \"\$keep\"" &&
 	>"$keep" &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack/ -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
@@ -222,7 +227,8 @@ test_expect_success 'auto-bitmaps do not complain if unavailable' '
 	blob=$(test-tool genrandom big $((1024*1024)) |
 	       git -C bare.git hash-object -w --stdin) &&
 	git -C bare.git update-ref refs/tags/big $blob &&
-	git -C bare.git repack -ad 2>stderr &&
+	GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0 \
+		git -C bare.git repack -ad 2>stderr &&
 	test_must_be_empty stderr &&
 	find bare.git/objects/pack -type f -name "*.bitmap" >actual &&
 	test_must_be_empty actual
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP'
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (18 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 19/22] t7700: update to work with MIDX bitmap test knob Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 21/22] p5310: extract full and partial bitmap tests Taylor Blau
  2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

Introduce a new 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' environment
variable to also write a multi-pack bitmap when
'GIT_TEST_MULTI_PACK_INDEX' is set.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c          | 13 ++++++++++---
 ci/run-build-and-tests.sh |  1 +
 midx.h                    |  2 ++
 t/README                  |  4 ++++
 4 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 2847fdfbab..3cb843fb59 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -515,7 +515,10 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		if (!(pack_everything & ALL_INTO_ONE) ||
 		    !is_bare_repository())
 			write_bitmaps = 0;
-	}
+	} else if (write_bitmaps &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
+		   git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+		write_bitmaps = 0;
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0;
 
@@ -720,8 +723,12 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 		update_server_info(0);
 	remove_temporary_files();
 
-	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0))
-		write_midx_file(get_object_directory(), NULL, 0);
+	if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0)) {
+		unsigned flags = 0;
+		if (git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0))
+			flags |= MIDX_WRITE_BITMAP | MIDX_WRITE_REV_INDEX;
+		write_midx_file(get_object_directory(), NULL, flags);
+	}
 
 	string_list_clear(&names, 0);
 	string_list_clear(&rollback, 0);
diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index a66b5e8c75..7c55a5033e 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -22,6 +22,7 @@ linux-gcc)
 	export GIT_TEST_COMMIT_GRAPH=1
 	export GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=1
 	export GIT_TEST_MULTI_PACK_INDEX=1
+	export GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=1
 	export GIT_TEST_ADD_I_USE_BUILTIN=1
 	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master
 	export GIT_TEST_WRITE_REV_INDEX=1
diff --git a/midx.h b/midx.h
index 350f4d0a7b..aa3da557bb 100644
--- a/midx.h
+++ b/midx.h
@@ -8,6 +8,8 @@ struct pack_entry;
 struct repository;
 
 #define GIT_TEST_MULTI_PACK_INDEX "GIT_TEST_MULTI_PACK_INDEX"
+#define GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP \
+	"GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP"
 
 struct multi_pack_index {
 	struct multi_pack_index *next;
diff --git a/t/README b/t/README
index fd9375b146..956731da44 100644
--- a/t/README
+++ b/t/README
@@ -420,6 +420,10 @@ GIT_TEST_MULTI_PACK_INDEX=<boolean>, when true, forces the multi-pack-
 index to be written after every 'git repack' command, and overrides the
 'core.multiPackIndex' setting to true.
 
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=<boolean>, when true, sets the
+'--bitmap' option on all invocations of 'git multi-pack-index write',
+and ignores pack-objects' '--write-bitmap-index'.
+
 GIT_TEST_SIDEBAND_ALL=<boolean>, when true, overrides the
 'uploadpack.allowSidebandAll' setting to true, and when false, forces
 fetch-pack to not request sideband-all (even if the server advertises
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 21/22] p5310: extract full and partial bitmap tests
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (19 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
  21 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

A new p5326 introduced by the next patch will want these same tests,
interjecting its own setup in between. Move them out so that both perf
tests can reuse them.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/lib-bitmap.sh         | 69 ++++++++++++++++++++++++++++++++++++
 t/perf/p5310-pack-bitmaps.sh | 65 ++-------------------------------
 2 files changed, 72 insertions(+), 62 deletions(-)
 create mode 100644 t/perf/lib-bitmap.sh

diff --git a/t/perf/lib-bitmap.sh b/t/perf/lib-bitmap.sh
new file mode 100644
index 0000000000..63d3bc7cec
--- /dev/null
+++ b/t/perf/lib-bitmap.sh
@@ -0,0 +1,69 @@
+# Helper functions for testing bitmap performance; see p5310.
+
+test_full_bitmap () {
+	test_perf 'simulated clone' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'simulated fetch' '
+		have=$(git rev-list HEAD~100 -1) &&
+		{
+			echo HEAD &&
+			echo ^$have
+		} | git pack-objects --revs --stdout >/dev/null
+	'
+
+	test_perf 'pack to file (bitmap)' '
+		git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list (commits)' '
+		git rev-list --all --use-bitmap-index >/dev/null
+	'
+
+	test_perf 'rev-list (objects)' '
+		git rev-list --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with tag negated via --not --all (objects)' '
+		git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list with negative tag (objects)' '
+		git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:none' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:none >/dev/null
+	'
+
+	test_perf 'rev-list count with blob:limit=1k' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=blob:limit=1k >/dev/null
+	'
+
+	test_perf 'rev-list count with tree:0' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+
+	test_perf 'simulated partial clone' '
+		git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
+	'
+}
+
+test_partial_bitmap () {
+	test_perf 'clone (partial bitmap)' '
+		git pack-objects --stdout --all </dev/null >/dev/null
+	'
+
+	test_perf 'pack to file (partial bitmap)' '
+		git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
+	'
+
+	test_perf 'rev-list with tree filter (partial bitmap)' '
+		git rev-list --use-bitmap-index --count --objects --all \
+			--filter=tree:0 >/dev/null
+	'
+}
diff --git a/t/perf/p5310-pack-bitmaps.sh b/t/perf/p5310-pack-bitmaps.sh
index 452be01056..7ad4f237bc 100755
--- a/t/perf/p5310-pack-bitmaps.sh
+++ b/t/perf/p5310-pack-bitmaps.sh
@@ -2,6 +2,7 @@
 
 test_description='Tests pack performance using bitmaps'
 . ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
 
 test_perf_large_repo
 
@@ -25,56 +26,7 @@ test_perf 'repack to disk' '
 	git repack -ad
 '
 
-test_perf 'simulated clone' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'simulated fetch' '
-	have=$(git rev-list HEAD~100 -1) &&
-	{
-		echo HEAD &&
-		echo ^$have
-	} | git pack-objects --revs --stdout >/dev/null
-'
-
-test_perf 'pack to file (bitmap)' '
-	git pack-objects --use-bitmap-index --all pack1b </dev/null >/dev/null
-'
-
-test_perf 'rev-list (commits)' '
-	git rev-list --all --use-bitmap-index >/dev/null
-'
-
-test_perf 'rev-list (objects)' '
-	git rev-list --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with tag negated via --not --all (objects)' '
-	git rev-list perf-tag --not --all --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list with negative tag (objects)' '
-	git rev-list HEAD --not perf-tag --use-bitmap-index --objects >/dev/null
-'
-
-test_perf 'rev-list count with blob:none' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:none >/dev/null
-'
-
-test_perf 'rev-list count with blob:limit=1k' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=blob:limit=1k >/dev/null
-'
-
-test_perf 'rev-list count with tree:0' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
-
-test_perf 'simulated partial clone' '
-	git pack-objects --stdout --all --filter=blob:none </dev/null >/dev/null
-'
+test_full_bitmap
 
 test_expect_success 'create partial bitmap state' '
 	# pick a commit to represent the repo tip in the past
@@ -97,17 +49,6 @@ test_expect_success 'create partial bitmap state' '
 	git update-ref HEAD $orig_tip
 '
 
-test_perf 'clone (partial bitmap)' '
-	git pack-objects --stdout --all </dev/null >/dev/null
-'
-
-test_perf 'pack to file (partial bitmap)' '
-	git pack-objects --use-bitmap-index --all pack2b </dev/null >/dev/null
-'
-
-test_perf 'rev-list with tree filter (partial bitmap)' '
-	git rev-list --use-bitmap-index --count --objects --all \
-		--filter=tree:0 >/dev/null
-'
+test_partial_bitmap
 
 test_done
-- 
2.31.1.163.ga65ce7f831


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 22/22] p5326: perf tests for MIDX bitmaps
  2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
                   ` (20 preceding siblings ...)
  2021-04-09 18:12 ` [PATCH 21/22] p5310: extract full and partial bitmap tests Taylor Blau
@ 2021-04-09 18:12 ` Taylor Blau
  2021-05-04 18:00   ` Jonathan Tan
  21 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-04-09 18:12 UTC (permalink / raw)
  To: git; +Cc: peff, dstolee, gitster, jonathantanmy

These new performance tests demonstrate effectively the same behavior as
p5310, but use a multi-pack bitmap instead of a single-pack one.

Notably, p5326 does not create a MIDX bitmap with multiple packs. This
is so we can measure a direct comparison between it and p5310. Any
difference between the two is measuring just the overhead of using MIDX
bitmaps.

Here are the results of p5310 and p5326 together, measured at the same
time and on the same machine (using a Xenon W-2255 CPU):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5310.2: repack to disk                                96.78(93.39+11.33)
    5310.3: simulated clone                               9.98(9.79+0.19)
    5310.4: simulated fetch                               1.75(4.26+0.19)
    5310.5: pack to file (bitmap)                         28.20(27.87+8.70)
    5310.6: rev-list (commits)                            0.41(0.36+0.05)
    5310.7: rev-list (objects)                            1.61(1.54+0.07)
    5310.8: rev-list count with blob:none                 0.25(0.21+0.04)
    5310.9: rev-list count with blob:limit=1k             2.65(2.54+0.10)
    5310.10: rev-list count with tree:0                   0.23(0.19+0.04)
    5310.11: simulated partial clone                      4.34(4.21+0.12)
    5310.13: clone (partial bitmap)                       11.05(12.21+0.48)
    5310.14: pack to file (partial bitmap)                31.25(34.22+3.70)
    5310.15: rev-list with tree filter (partial bitmap)   0.26(0.22+0.04)

versus the same tests (this time using a multi-pack index):

    Test                                                  HEAD
    ------------------------------------------------------------------------
    5326.2: setup multi-pack index                        78.99(75.29+11.58)
    5326.3: simulated clone                               11.78(11.56+0.22)
    5326.4: simulated fetch                               1.70(4.49+0.13)
    5326.5: pack to file (bitmap)                         28.02(27.72+8.76)
    5326.6: rev-list (commits)                            0.42(0.36+0.06)
    5326.7: rev-list (objects)                            1.65(1.58+0.06)
    5326.8: rev-list count with blob:none                 0.26(0.21+0.05)
    5326.9: rev-list count with blob:limit=1k             2.97(2.86+0.10)
    5326.10: rev-list count with tree:0                   0.25(0.20+0.04)
    5326.11: simulated partial clone                      5.65(5.49+0.16)
    5326.13: clone (partial bitmap)                       12.22(13.43+0.38)
    5326.14: pack to file (partial bitmap)                30.05(31.57+7.25)
    5326.15: rev-list with tree filter (partial bitmap)   0.24(0.20+0.04)

There is slight overhead in "simulated clone", "simulated partial
clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
due to using the MIDX's reverse index to map between bit positions and
MIDX positions.

This can be reproduced by running "git repack -adb" along with "git
multi-pack-index write --bitmap" in a large-ish repository. Then run:

    $ perf record -o pack.perf git -c core.multiPackIndex=false \
      pack-objects --all --stdout >/dev/null </dev/null
    $ perf record -o midx.perf git -c core.multiPackIndex=true \
      pack-objects --all --stdout >/dev/null </dev/null

and compare the two with "perf diff -c delta -o 1 pack.perf midx.perf".
The most notable results are below (the next largest positive delta is
+0.14%):

    # Event 'cycles'
    #
    # Baseline    Delta  Shared Object       Symbol
    # ........  .......  ..................  ..........................
    #
                 +5.86%  git                 [.] nth_midxed_offset
                 +5.24%  git                 [.] nth_midxed_pack_int_id
         3.45%   +0.97%  git                 [.] offset_to_pack_pos
         3.30%   +0.57%  git                 [.] pack_pos_to_offset
                 +0.30%  git                 [.] pack_pos_to_midx

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/perf/p5326-multi-pack-bitmaps.sh | 43 ++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100755 t/perf/p5326-multi-pack-bitmaps.sh

diff --git a/t/perf/p5326-multi-pack-bitmaps.sh b/t/perf/p5326-multi-pack-bitmaps.sh
new file mode 100755
index 0000000000..5845109ac7
--- /dev/null
+++ b/t/perf/p5326-multi-pack-bitmaps.sh
@@ -0,0 +1,43 @@
+#!/bin/sh
+
+test_description='Tests performance using midx bitmaps'
+. ./perf-lib.sh
+. "${TEST_DIRECTORY}/perf/lib-bitmap.sh"
+
+test_perf_large_repo
+
+test_expect_success 'enable multi-pack index' '
+	git config core.multiPackIndex true
+'
+
+test_perf 'setup multi-pack index' '
+	git repack -ad &&
+	git multi-pack-index write --bitmap
+'
+
+test_full_bitmap
+
+test_expect_success 'create partial bitmap state' '
+	# pick a commit to represent the repo tip in the past
+	cutoff=$(git rev-list HEAD~100 -1) &&
+	orig_tip=$(git rev-parse HEAD) &&
+
+	# now pretend we have just one tip
+	rm -rf .git/logs .git/refs/* .git/packed-refs &&
+	git update-ref HEAD $cutoff &&
+
+	# and then repack, which will leave us with a nice
+	# big bitmap pack of the "old" history, and all of
+	# the new history will be loose, as if it had been pushed
+	# up incrementally and exploded via unpack-objects
+	git repack -Ad &&
+	git multi-pack-index write --bitmap &&
+
+	# and now restore our original tip, as if the pushes
+	# had happened
+	git update-ref HEAD $orig_tip
+'
+
+test_partial_bitmap
+
+test_done
-- 
2.31.1.163.ga65ce7f831

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/22] pack-bitmap: read multi-pack bitmaps
  2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
@ 2021-04-16  2:39   ` Jonathan Tan
  2021-04-16  3:13     ` Taylor Blau
  0 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2021-04-16  2:39 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

I'll review until this patch for now. Hopefully I'll get to the rest
soon.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 5205dde2e1..a4e4e4ebcc 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -984,7 +984,17 @@ static void write_reused_pack(struct hashfile *f)
>  				break;
>  
>  			offset += ewah_bit_ctz64(word >> offset);
> -			write_reused_pack_one(pos + offset, f, &w_curs);
> +			if (bitmap_is_midx(bitmap_git)) {
> +				off_t pack_offs = bitmap_pack_offset(bitmap_git,
> +								     pos + offset);
> +				uint32_t pos;
> +
> +				if (offset_to_pack_pos(reuse_packfile, pack_offs, &pos) < 0)
> +					die(_("write_reused_pack: could not locate %"PRIdMAX),
> +					    (intmax_t)pack_offs);
> +				write_reused_pack_one(pos, f, &w_curs);
> +			} else
> +				write_reused_pack_one(pos + offset, f, &w_curs);
>  			display_progress(progress_state, ++written);
>  		}
>  	}

When bitmaps are used, pos + offset is the pseudo-pack (a virtual
concatenation of all packfiles in the MIDX) position (as in, first
object is 0, second object is 1, and so on), not a position in
a single packfile. From it, we obtain a pack offset, and from it, we
obtain a position in the reused packfile (reuse_packfile). In this way,
the code is equivalent to the non-MIDX case. Looks good.

(There is no need to select a packfile here in the case of MIDX because,
as the code later shows, we always reuse only one packfile - assigned to
reuse_packfile.)

> @@ -35,8 +36,15 @@ struct stored_bitmap {
>   * the active bitmap index is the largest one.
>   */
>  struct bitmap_index {
> -	/* Packfile to which this bitmap index belongs to */
> +	/*
> +	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
> +	 * to.
> +	 *
> +	 * Exactly one of these must be non-NULL; this specifies the object
> +	 * order used to interpret this bitmap.
> +	 */
>  	struct packed_git *pack;
> +	struct multi_pack_index *midx;

Makes sense.

> @@ -71,6 +79,8 @@ struct bitmap_index {
>  	/* If not NULL, this is a name-hash cache pointing into map. */
>  	uint32_t *hashes;
>  
> +	const unsigned char *checksum;
> +
>  	/*
>  	 * Extended index.
>  	 *

I see later that this checksum is used, OK. Maybe comment that this
points into map (just like "hashes", as quoted above).

> @@ -281,6 +304,54 @@ static char *pack_bitmap_filename(struct packed_git *p)
>  	return xstrfmt("%.*s.bitmap", (int)len, p->pack_name);
>  }
>  
> +static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
> +			      struct multi_pack_index *midx)
> +{
> +	struct stat st;
> +	char *idx_name = midx_bitmap_filename(midx);
> +	int fd = git_open(idx_name);
> +
> +	free(idx_name);
> +
> +	if (fd < 0)
> +		return -1;
> +
> +	if (fstat(fd, &st)) {
> +		close(fd);
> +		return -1;
> +	}
> +
> +	if (bitmap_git->pack || bitmap_git->midx) {
> +		/* ignore extra bitmap file; we can only handle one */
> +		return -1;

Here, fd is not closed? Maybe better to have multiple cleanup stages
(one when the mmap has been built, and one when not).

> @@ -302,12 +373,18 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
>  		return -1;
>  	}
>  
> -	if (bitmap_git->pack) {
> +	if (bitmap_git->pack || bitmap_git->midx) {
> +		/* ignore extra bitmap file; we can only handle one */
>  		warning("ignoring extra bitmap file: %s", packfile->pack_name);
>  		close(fd);
>  		return -1;
>  	}
>  
> +	if (!is_pack_valid(packfile)) {
> +		close(fd);
> +		return -1;
> +	}

Why is this needed now (and presumably, not before)?

> -static int load_pack_bitmap(struct bitmap_index *bitmap_git)
> +static int load_reverse_index(struct bitmap_index *bitmap_git)
> +{
> +	if (bitmap_is_midx(bitmap_git)) {
> +		uint32_t i;
> +		int ret;
> +
> +		ret = load_midx_revindex(bitmap_git->midx);
> +		if (ret)
> +			return ret;
> +
> +		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
> +			if (prepare_midx_pack(the_repository, bitmap_git->midx, i))
> +				die(_("load_reverse_index: could not open pack"));
> +			ret = load_pack_revindex(bitmap_git->midx->packs[i]);

I was thinking about why we still need per-pack revindex, but I think
the answer is that we still need to convert pack offsets (roughly
speaking, 0 to size of packfile in bytes) to pack positions (0 to number
of objects) (and one such conversion is in the quoted section of
builtin/pack-objects.c above), and MIDX does not provide this. OK, makes
sense.

> +			if (ret)
> +				return ret;
> +		}
> +		return 0;
> +	}
> +	return load_pack_revindex(bitmap_git->pack);
> +}

[snip]

> @@ -428,10 +552,26 @@ static inline int bitmap_position_packfile(struct bitmap_index *bitmap_git,
>  	return pos;
>  }
>  
> +static int bitmap_position_midx(struct bitmap_index *bitmap_git,
> +				const struct object_id *oid)
> +{
> +	uint32_t want, got;
> +	if (!bsearch_midx(oid, bitmap_git->midx, &want))
> +		return -1;
> +
> +	if (midx_to_pack_pos(bitmap_git->midx, want, &got) < 0)
> +		return -1;
> +	return got;
> +}

bsearch_midx() gives us the position in the MIDX (e.g. if we had an
object with the name 00...00, "want" will be 0, and if we had an
object with the name ff...ff, "want" will be the number of objects
minus 1). midx_to_pack_pos() converts that into the position in the
pseudo-pack, which is what we want. OK.

> @@ -730,14 +871,28 @@ static void show_objects_for_type(
>  
>  			offset += ewah_bit_ctz64(word >> offset);
>  
> -			index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
> -			ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
> -			nth_packed_object_id(&oid, bitmap_git->pack, index_pos);
> +			if (bitmap_is_midx(bitmap_git)) {
> +				struct multi_pack_index *m = bitmap_git->midx;
> +				uint32_t pack_id;
> +
> +				index_pos = pack_pos_to_midx(m, pos + offset);
> +				ofs = nth_midxed_offset(m, index_pos);
> +				nth_midxed_object_oid(&oid, m, index_pos);
> +
> +				pack_id = nth_midxed_pack_int_id(m, index_pos);
> +				pack = bitmap_git->midx->packs[pack_id];

This is similar to the builtin/pack-objects.c case right at the start of
this patch. (bitmap_pack_offset(), used in builtin/pack-objects.c, is
pack_pos_to_midx() and nth_midx_offset() chained.) OK.

> +			} else {
> +				index_pos = pack_pos_to_index(bitmap_git->pack, pos + offset);
> +				ofs = pack_pos_to_offset(bitmap_git->pack, pos + offset);
> +				nth_bitmap_object_oid(bitmap_git, &oid, index_pos);
> +
> +				pack = bitmap_git->pack;
> +			}
>  
>  			if (bitmap_git->hashes)
>  				hash = get_be32(bitmap_git->hashes + index_pos);
>  
> -			show_reach(&oid, object_type, 0, hash, bitmap_git->pack, ofs);
> +			show_reach(&oid, object_type, 0, hash, pack, ofs);
>  		}
>  	}
>  }
> @@ -749,8 +904,13 @@ static int in_bitmapped_pack(struct bitmap_index *bitmap_git,
>  		struct object *object = roots->item;
>  		roots = roots->next;
>  
> -		if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
> -			return 1;
> +		if (bitmap_is_midx(bitmap_git)) {
> +			if (bsearch_midx(&object->oid, bitmap_git->midx, NULL))
> +				return 1;
> +		} else {
> +			if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0)
> +				return 1;
> +		}
>  	}
>  
>  	return 0;

OK - we don't actually care about the position, just that it exists,
which is why we can pass NULL as the last argument to bsearch_midx().

> @@ -839,14 +999,26 @@ static void filter_bitmap_blob_none(struct bitmap_index *bitmap_git,
>  static unsigned long get_size_by_pos(struct bitmap_index *bitmap_git,
>  				     uint32_t pos)
>  {
> -	struct packed_git *pack = bitmap_git->pack;
>  	unsigned long size;
>  	struct object_info oi = OBJECT_INFO_INIT;
>  
>  	oi.sizep = &size;
>  
>  	if (pos < bitmap_num_objects(bitmap_git)) {
> -		off_t ofs = pack_pos_to_offset(pack, pos);
> +		struct packed_git *pack;
> +		off_t ofs;
> +
> +		if (bitmap_is_midx(bitmap_git)) {
> +			uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> +			uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +
> +			pack = bitmap_git->midx->packs[pack_id];
> +			ofs = nth_midxed_offset(bitmap_git->midx, midx_pos);
> +		} else {
> +			pack = bitmap_git->pack;
> +			ofs = pack_pos_to_offset(pack, pos);
> +		}
> +
>  		if (packed_object_info(the_repository, pack, ofs, &oi) < 0) {
>  			struct object_id oid;
>  			nth_bitmap_object_oid(bitmap_git, &oid,

Makes sense - "pos" is the position in the pseudo-pack. From it we get
the MIDX position, and then we can get the pack ID and pack offset as
usual.

> @@ -1081,15 +1253,29 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
>  			      struct bitmap *reuse,
>  			      struct pack_window **w_curs)
>  {
> -	off_t offset, header;
> +	struct packed_git *pack;
> +	off_t offset, delta_obj_offset;
>  	enum object_type type;
>  	unsigned long size;
>  
>  	if (pos >= bitmap_num_objects(bitmap_git))
>  		return; /* not actually in the pack or MIDX */
>  
> -	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
> -	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
> +	if (bitmap_is_midx(bitmap_git)) {
> +		uint32_t pack_id, midx_pos;
> +
> +		midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> +		pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +
> +		pack = bitmap_git->midx->packs[pack_id];
> +		offset = nth_midxed_offset(bitmap_git->midx, midx_pos);

Would it be useful to assert somewhere here that "pack" is the preferred
pack?

Going further, is it reasonable to say that positions 0..n in the
preferred pack (where n is the number of objects in the preferred pack)
match positions 0..n in the pseudo-pack exactly? If yes, maybe we can
simplify things by explaining that we can operate in the MIDX case
exactly (or as similarly as possible) like we operate on a single
packfile because of this, instead of always needing to consider if a
delta base could appear in the MIDX as belonging to another packfile.

> @@ -1538,6 +1792,29 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
>  
>  			offset += ewah_bit_ctz64(word >> offset);
>  			pos = base + offset;
> +
> +			if (bitmap_is_midx(bitmap_git)) {
> +				uint32_t pack_pos;
> +				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> +				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
> +
> +				pack = bitmap_git->midx->packs[pack_id];
> +
> +				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
> +					struct object_id oid;
> +					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
> +
> +					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
> +					    oid_to_hex(&oid),
> +					    pack_id,
> +					    (uintmax_t)offset);
> +				}
> +
> +				pos = pack_pos;
> +			} else
> +				pack = bitmap_git->pack;
> +
>  			total += pack_pos_to_offset(pack, pos + 1) -
>  				 pack_pos_to_offset(pack, pos);
>  		}

"pos" is assigned to twice in the MIDX case (with different semantics).
I think it's better to do it like in the rest of the patch - use "base +
offset" as the argument to pack_pos_to_midx, and then you wouldn't need
to assign to "pos" twice.

> diff --git a/packfile.c b/packfile.c
> index 8668345d93..c444e365a3 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -863,7 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
>  	if (!strcmp(file_name, "multi-pack-index"))
>  		return;
>  	if (starts_with(file_name, "multi-pack-index") &&
> -	    ends_with(file_name, ".rev"))
> +	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
>  		return;
>  	if (ends_with(file_name, ".idx") ||
>  	    ends_with(file_name, ".rev") ||

I guess this will come into play when we start writing MIDX bitmaps?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps
  2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
@ 2021-04-16  2:46   ` Jonathan Tan
  0 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2021-04-16  2:46 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> @@ -125,15 +125,20 @@ static inline void push_bitmapped_commit(struct commit *commit)
>  	writer.selected_nr++;
>  }
>  
> -static uint32_t find_object_pos(const struct object_id *oid)
> +static uint32_t find_object_pos(const struct object_id *oid, int *found)

find_object_pos() is only called by fill_bitmap_tree() and
fill_bitmap_commit(). fill_bitmap_tree() is only called by itself and
fill_bitmap_commit(). fill_bitmap_commit() is only called by
bitmap_writer_build(). And bitmap_writer_build() is only called by
write_pack_file(), which has been changed to die when
bitmap_writer_build() fails. So looks like everything is accounted for.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 12/22] pack-bitmap: read multi-pack bitmaps
  2021-04-16  2:39   ` Jonathan Tan
@ 2021-04-16  3:13     ` Taylor Blau
  0 siblings, 0 replies; 32+ messages in thread
From: Taylor Blau @ 2021-04-16  3:13 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: me, git, peff, dstolee, gitster

On Thu, Apr 15, 2021 at 07:39:25PM -0700, Jonathan Tan wrote:
> I'll review until this patch for now. Hopefully I'll get to the rest
> soon.

Thanks in advance. I always find that you leave insightful comments, so
I appreciate you taking the time to review my patches.

> > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> > index 5205dde2e1..a4e4e4ebcc 100644
> > --- a/builtin/pack-objects.c
> > +++ b/builtin/pack-objects.c
> > @@ -984,7 +984,17 @@ static void write_reused_pack(struct hashfile *f)
> >  				break;
> >
> >  			offset += ewah_bit_ctz64(word >> offset);
> > -			write_reused_pack_one(pos + offset, f, &w_curs);
> > +			if (bitmap_is_midx(bitmap_git)) {
> > +				off_t pack_offs = bitmap_pack_offset(bitmap_git,
> > +								     pos + offset);
> > +				uint32_t pos;
> > +
> > +				if (offset_to_pack_pos(reuse_packfile, pack_offs, &pos) < 0)
> > +					die(_("write_reused_pack: could not locate %"PRIdMAX),
> > +					    (intmax_t)pack_offs);
> > +				write_reused_pack_one(pos, f, &w_curs);
> > +			} else
> > +				write_reused_pack_one(pos + offset, f, &w_curs);
> >  			display_progress(progress_state, ++written);
> >  		}
> >  	}
>
> When bitmaps are used, pos + offset is the pseudo-pack (a virtual
> concatenation of all packfiles in the MIDX) position (as in, first
> object is 0, second object is 1, and so on), not a position in
> a single packfile. From it, we obtain a pack offset, and from it, we
> obtain a position in the reused packfile (reuse_packfile). In this way,
> the code is equivalent to the non-MIDX case. Looks good.
>
> (There is no need to select a packfile here in the case of MIDX because,
> as the code later shows, we always reuse only one packfile - assigned to
> reuse_packfile.)

You're exactly right here on both points.

It's worth noting that the "reuse" you're describing here is only about
reusing sections of the original packfile byte-for-byte (with the
exception of fixing the offsets in any OFS_DELTAs). That's not to be
confused with delta reuse, which is entirely different.

I think that both Peff and I are dubious that the pack-reuse stuff is
kicking in all that much, since there are some heuristics in place about
when it is allowed to take over and when it isn't, but that's a topic
for another thread.

> > @@ -35,8 +36,15 @@ struct stored_bitmap {
> >   * the active bitmap index is the largest one.
> >   */
> >  struct bitmap_index {
> > -	/* Packfile to which this bitmap index belongs to */
> > +	/*
> > +	 * The pack or multi-pack index (MIDX) that this bitmap index belongs
> > +	 * to.
> > +	 *
> > +	 * Exactly one of these must be non-NULL; this specifies the object
> > +	 * order used to interpret this bitmap.
> > +	 */
> >  	struct packed_git *pack;
> > +	struct multi_pack_index *midx;
>
> Makes sense.
>
> > @@ -71,6 +79,8 @@ struct bitmap_index {
> >  	/* If not NULL, this is a name-hash cache pointing into map. */
> >  	uint32_t *hashes;
> >
> > +	const unsigned char *checksum;
> > +
> >  	/*
> >  	 * Extended index.
> >  	 *
>
> I see later that this checksum is used, OK. Maybe comment that this
> points into map (just like "hashes", as quoted above).

Yep, quite fair.

> > +	if (bitmap_git->pack || bitmap_git->midx) {
> > +		/* ignore extra bitmap file; we can only handle one */
> > +		return -1;
>
> Here, fd is not closed? Maybe better to have multiple cleanup stages
> (one when the mmap has been built, and one when not).

Good eyes. That's an oversight, and we should be closing fd there, too.
It looks like we're also missing a warning(), although I am skeptical
that the warning would ever kick in. The pack-based version of this
function is run in a loop over all packs, but the loop doesn't terminate
once a pack bitmap is opened, since we make sure that no *other* packs
have bitmaps, too.

But we don't do the same for multi-pack bitmaps, i.e., once we find a
MIDX that has a bitmap, we terminate immediately. It may be worth
scanning through the list of all MIDXs to make sure that only one has a
bitmap, but to be honest I could go either way on that point, too, since
any MIDX bitmap is worth loading. But the warning doesn't hurt, so I'll
add that, too.

> > +	if (!is_pack_valid(packfile)) {
> > +		close(fd);
> > +		return -1;
> > +	}
>
> Why is this needed now (and presumably, not before)?

It does appear as a stray hunk, and I'm sure that it probably could be
extracted into its own patch. I can't recall anything about this
particular patch that makes it necessary, but maybe Peff remembers
something I don't.

> > @@ -1081,15 +1253,29 @@ static void try_partial_reuse(struct bitmap_index *bitmap_git,
> >  			      struct bitmap *reuse,
> >  			      struct pack_window **w_curs)
> >  {
> > -	off_t offset, header;
> > +	struct packed_git *pack;
> > +	off_t offset, delta_obj_offset;
> >  	enum object_type type;
> >  	unsigned long size;
> >
> >  	if (pos >= bitmap_num_objects(bitmap_git))
> >  		return; /* not actually in the pack or MIDX */
> >
> > -	offset = header = pack_pos_to_offset(bitmap_git->pack, pos);
> > -	type = unpack_object_header(bitmap_git->pack, w_curs, &offset, &size);
> > +	if (bitmap_is_midx(bitmap_git)) {
> > +		uint32_t pack_id, midx_pos;
> > +
> > +		midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> > +		pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> > +
> > +		pack = bitmap_git->midx->packs[pack_id];
> > +		offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
>
> Would it be useful to assert somewhere here that "pack" is the preferred
> pack?

An assertion like that may hurt this function's cache performance, since
the way we determine the preferred pack is by looking at which pack is
the donor for the 0th object in the MIDX's .rev file. And this function
is rather hot, since it is invoked once per-bit. So it may cause us to
hit more page faults than we currently do.

That all said, the assertion may not be helping much since we only call
this method on objects from a single pack (the bitmapped pack in the
single-pack case, or the preferred pack in the MIDX case). There's a
comment in reuse_partial_packfile_from_bitmap() to this effect, which
may or may not be good enough ;).

> Going further, is it reasonable to say that positions 0..n in the
> preferred pack (where n is the number of objects in the preferred pack)
> match positions 0..n in the pseudo-pack exactly? If yes, maybe we can
> simplify things by explaining that we can operate in the MIDX case
> exactly (or as similarly as possible) like we operate on a single
> packfile because of this, instead of always needing to consider if a
> delta base could appear in the MIDX as belonging to another packfile.

You're right, and there are two things going on here which allow us to
make that assumption:

  - The preferred pack sorts ahead of all other packs in the MIDX when
    assembling the pseudo-pack order, so bits 0..n (where 'n' is the
    number of objects in the preferred pack) of the pseudo pack are
    designated to the preferred pack.

  - When duplicates of objects exist, the MIDX *always* breaks ties in
    favor of the preferred pack, so it's never the case that a delta'd
    object from the preferred pack will find its base in another pack
    (if it asked the MIDX to locate a copy of the base object).

So we can safely remove the conditional on bitmap_is_midx() in the first
part of this function for exactly the reasons above, which is good. That
probably merits moving the comment beginning with "Note that the base
does not need to be repositioned ..." earlier in this function, to make
clear that we really can treat bits from the preferred pack as if they
don't have anything to do with the MIDX at all.

So long as we determine the preferred pack ahead of time (and not once
per-call), I think that it would be a win.

> > @@ -1538,6 +1792,29 @@ static off_t get_disk_usage_for_type(struct bitmap_index *bitmap_git,
> >
> >  			offset += ewah_bit_ctz64(word >> offset);
> >  			pos = base + offset;
> > +
> > +			if (bitmap_is_midx(bitmap_git)) {
> > +				uint32_t pack_pos;
> > +				uint32_t midx_pos = pack_pos_to_midx(bitmap_git->midx, pos);
> > +				uint32_t pack_id = nth_midxed_pack_int_id(bitmap_git->midx, midx_pos);
> > +				off_t offset = nth_midxed_offset(bitmap_git->midx, midx_pos);
> > +
> > +				pack = bitmap_git->midx->packs[pack_id];
> > +
> > +				if (offset_to_pack_pos(pack, offset, &pack_pos) < 0) {
> > +					struct object_id oid;
> > +					nth_midxed_object_oid(&oid, bitmap_git->midx, midx_pos);
> > +
> > +					die(_("could not find %s in pack #%"PRIu32" at offset %"PRIuMAX),
> > +					    oid_to_hex(&oid),
> > +					    pack_id,
> > +					    (uintmax_t)offset);
> > +				}
> > +
> > +				pos = pack_pos;
> > +			} else
> > +				pack = bitmap_git->pack;
> > +
> >  			total += pack_pos_to_offset(pack, pos + 1) -
> >  				 pack_pos_to_offset(pack, pos);
> >  		}
>
> "pos" is assigned to twice in the MIDX case (with different semantics).
> I think it's better to do it like in the rest of the patch - use "base +
> offset" as the argument to pack_pos_to_midx, and then you wouldn't need
> to assign to "pos" twice.

Good idea, thanks. Skimming again over the patch, this is the only place
that I could find where I double-assign pos like this.

> > diff --git a/packfile.c b/packfile.c
> > index 8668345d93..c444e365a3 100644
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -863,7 +863,7 @@ static void prepare_pack(const char *full_name, size_t full_name_len,
> >  	if (!strcmp(file_name, "multi-pack-index"))
> >  		return;
> >  	if (starts_with(file_name, "multi-pack-index") &&
> > -	    ends_with(file_name, ".rev"))
> > +	    (ends_with(file_name, ".bitmap") || ends_with(file_name, ".rev")))
> >  		return;
> >  	if (ends_with(file_name, ".idx") ||
> >  	    ends_with(file_name, ".rev") ||
>
> I guess this will come into play when we start writing MIDX bitmaps?

Yep, that's right. Since this patch is about making sure we can handle
the MIDX bitmap as described in
Documentation/technical/bitmap-format.txt, this is part of that.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
@ 2021-05-04  5:02   ` Jonathan Tan
  2021-05-06 20:18     ` Taylor Blau
  0 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2021-05-04  5:02 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> Write multi-pack bitmaps in the format described by
> Documentation/technical/bitmap-format.txt, inferring their presence with
> the absence of '--bitmap'.
> 
> To write a multi-pack bitmap, this patch attempts to reuse as much of
> the existing machinery from pack-objects as possible. Specifically, the
> MIDX code prepares a packing_data struct that pretends as if a single
> packfile has been generated containing all of the objects contained
> within the MIDX.

Sounds good, and makes sense. Conceptually, the MIDX bitmap is the same
as a regular packfile bitmap, just that the order of objects in the
bitmap is defined differently.

> +static void prepare_midx_packing_data(struct packing_data *pdata,
> +				      struct write_midx_context *ctx)
> +{
> +	uint32_t i;
> +
> +	memset(pdata, 0, sizeof(struct packing_data));
> +	prepare_packing_data(the_repository, pdata);
> +
> +	for (i = 0; i < ctx->entries_nr; i++) {
> +		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
> +		struct object_entry *to = packlist_alloc(pdata, &from->oid);
> +
> +		oe_set_in_pack(pdata, to,
> +			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
> +	}
> +}

It is surprising to see this right at the top. Scrolling down, I guess
that there is more information needed than just the packing_data struct.

> +static int add_ref_to_pending(const char *refname,
> +			      const struct object_id *oid,
> +			      int flag, void *cb_data)
> +{
> +	struct rev_info *revs = (struct rev_info*)cb_data;
> +	struct object *object;
> +
> +	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
> +		warning("symbolic ref is dangling: %s", refname);
> +		return 0;
> +	}
> +
> +	object = parse_object_or_die(oid, refname);
> +	if (object->type != OBJ_COMMIT)
> +		return 0;
> +
> +	add_pending_object(revs, object, "");
> +	if (bitmap_is_preferred_refname(revs->repo, refname))
> +		object->flags |= NEEDS_BITMAP;
> +	return 0;
> +}

Makes sense. We need to flag certain commits as NEEDS_BITMAP because
bitmaps are not made for all commits but only certain ones.

> +struct bitmap_commit_cb {
> +	struct commit **commits;
> +	size_t commits_nr, commits_alloc;
> +
> +	struct write_midx_context *ctx;
> +};
> +
> +static const struct object_id *bitmap_oid_access(size_t index,
> +						 const void *_entries)
> +{
> +	const struct pack_midx_entry *entries = _entries;
> +	return &entries[index].oid;
> +}
> +
> +static void bitmap_show_commit(struct commit *commit, void *_data)
> +{
> +	struct bitmap_commit_cb *data = _data;
> +	if (oid_pos(&commit->object.oid, data->ctx->entries,
> +		    data->ctx->entries_nr,
> +		    bitmap_oid_access) > -1) {
> +		ALLOC_GROW(data->commits, data->commits_nr + 1,
> +			   data->commits_alloc);
> +		data->commits[data->commits_nr++] = commit;
> +	}
> +}
> +
> +static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
> +						    struct write_midx_context *ctx)
> +{
> +	struct rev_info revs;
> +	struct bitmap_commit_cb cb;
> +
> +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
> +	cb.ctx = ctx;
> +
> +	repo_init_revisions(the_repository, &revs, NULL);
> +	for_each_ref(add_ref_to_pending, &revs);
> +
> +	fetch_if_missing = 0;
> +	revs.exclude_promisor_objects = 1;

I think that the MIDX bitmap requires all objects be present? If yes, we
should omit these 2 lines.

> +
> +	if (prepare_revision_walk(&revs))
> +		die(_("revision walk setup failed"));
> +
> +	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
> +	if (indexed_commits_nr_p)
> +		*indexed_commits_nr_p = cb.commits_nr;
> +
> +	return cb.commits;
> +}

Hmm...I might be missing something obvious, but this function and its
callbacks seem to be written like this in order to put the returned
commits in a certain order. But later on in write_midx_bitmap(), the
return value of this function is passed to
bitmap_writer_select_commits(), which resorts the list anyway?

> +static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
> +			     struct write_midx_context *ctx,
> +			     unsigned flags)
> +{
> +	struct packing_data pdata;
> +	struct pack_idx_entry **index;
> +	struct commit **commits = NULL;
> +	uint32_t i, commits_nr;
> +	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
> +	int ret;
> +
> +	prepare_midx_packing_data(&pdata, ctx);
> +
> +	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
> +
> +	/*
> +	 * Build the MIDX-order index based on pdata.objects (which is already
> +	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
> +	 * this order).
> +	 */
> +	ALLOC_ARRAY(index, pdata.nr_objects);
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
> +
> +	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
> +	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
> +
> +	/*
> +	 * bitmap_writer_select_commits expects objects in lex order, but
> +	 * pack_order gives us exactly that. use it directly instead of
> +	 * re-sorting the array
> +	 */
> +	for (i = 0; i < pdata.nr_objects; i++)
> +		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
> +
> +	bitmap_writer_select_commits(commits, commits_nr, -1);

The comment above says bitmap_writer_select_commits() expects objects in
lex order, but (1) you're putting "index" in lex order, not "commits",
and (2) the first thing in bitmap_writer_select_commits() is a QSORT.
Did you mean another function?

> +	ret = bitmap_writer_build(&pdata);
> +	if (!ret)
> +		goto cleanup;
> +
> +	bitmap_writer_set_checksum(midx_hash);
> +	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);

So bitmap_writer_build_type_index() and bitmap_writer_finish() are
called with 2 different orders of commits. Is this expected? If yes,
maybe this is worth a comment.

> +
> +cleanup:
> +	free(index);
> +	free(bitmap_name);
> +	return ret;
> +}

[snip]

> @@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		for (i = 0; i < ctx.m->num_packs; i++) {
>  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
>  
> +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> +				error(_("could not load pack %s"),
> +				      ctx.m->pack_names[i]);
> +				result = 1;
> +				goto cleanup;
> +			}
> +
>  			ctx.info[ctx.nr].orig_pack_int_id = i;
>  			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
> -			ctx.info[ctx.nr].p = NULL;
> +			ctx.info[ctx.nr].p = ctx.m->packs[i];
>  			ctx.info[ctx.nr].expired = 0;
>  			ctx.nr++;
>  		}

Why is this needed now and not before? From what I see in this function,
nothing seems to happen to this .p pack except that they are closed
later.

> @@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
>  		if (ctx.info[i].p) {
>  			close_pack(ctx.info[i].p);
>  			free(ctx.info[i].p);
> +			if (ctx.m) {
> +				/*
> +				 * Destroy a stale reference to the pack in
> +				 * 'ctx.m'.
> +				 */
> +				uint32_t orig = ctx.info[i].orig_pack_int_id;
> +				if (orig < ctx.m->num_packs)
> +					ctx.m->packs[orig] = NULL;
> +			}
>  		}
>  		free(ctx.info[i].pack_name);
>  	}

Is this hunk needed? "ctx" is a local variable and will not outlast this
function.

I'll review the rest tomorrow. It seems like I've gotten over the most
difficult patches.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 16/22] t5326: test multi-pack bitmap behavior
  2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
@ 2021-05-04 17:51   ` Jonathan Tan
  0 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2021-05-04 17:51 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> +test_expect_success 'clone with bitmaps enabled' '
> +	git clone --no-local --bare . clone-reverse-delta.git &&
> +	test_when_finished "rm -fr clone-reverse-delta.git" &&
> +
> +	git rev-parse HEAD >expect &&
> +	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
> +	test_cmp expect actual
> +'

What is this test testing? That bitmaps are used? (I'm not sure how to
verify that though - we seem to have tracing for bitmap writing but not
for reading, for example.)

> +bitmap_reuse_tests() {
> +	from=$1
> +	to=$2
> +
> +	test_expect_success "setup pack reuse tests ($from -> $to)" '
> +		rm -fr repo &&
> +		git init repo &&
> +		(
> +			cd repo &&
> +			test_commit_bulk 16 &&
> +			git tag old-tip &&
> +
> +			git config core.multiPackIndex true &&
> +			if test "MIDX" = "$from"
> +			then
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ad &&
> +				git multi-pack-index write --bitmap
> +			else
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
> +			fi
> +		)
> +	'
> +
> +	test_expect_success "build bitmap from existing ($from -> $to)" '
> +		(
> +			cd repo &&
> +			test_commit_bulk --id=further 16 &&
> +			git tag new-tip &&
> +
> +			if test "MIDX" = "$to"
> +			then
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -d &&
> +				git multi-pack-index write --bitmap
> +			else
> +				GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb
> +			fi
> +		)
> +	'
> +
> +	test_expect_success "verify resulting bitmaps ($from -> $to)" '
> +		(
> +			cd repo &&
> +			git for-each-ref &&
> +			git rev-list --test-bitmap refs/tags/old-tip &&
> +			git rev-list --test-bitmap refs/tags/new-tip
> +		)
> +	'
> +}
> +
> +bitmap_reuse_tests 'pack' 'MIDX'
> +bitmap_reuse_tests 'MIDX' 'pack'
> +bitmap_reuse_tests 'MIDX' 'MIDX'

Is it possible to verify that the bitmaps have truly been reused (and
not, say, created from scratch)? (E.g. is there any nature of the
bitmap created - for example, the order of commits?)

> +test_expect_success 'pack.preferBitmapTips' '
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +
> +		test_commit_bulk --message="%s" 103 &&
> +
> +		git log --format="%H" >commits.raw &&
> +		sort <commits.raw >commits &&
> +
> +		git log --format="create refs/tags/%s %H" HEAD >refs &&
> +		git update-ref --stdin <refs &&
> +
> +		git multi-pack-index write --bitmap &&
> +		test_path_is_file $midx &&
> +		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> +
> +		test-tool bitmap list-commits | sort >bitmaps &&
> +		comm -13 bitmaps commits >before &&
> +		test_line_count = 1 before &&
> +
> +		perl -ne "printf(\"create refs/tags/include/%d \", $.); print" \
> +			<before | git update-ref --stdin &&
> +
> +		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
> +		rm -fr $midx-$(midx_checksum $objdir).rev &&
> +		rm -fr $midx &&
> +
> +		git -c pack.preferBitmapTips=refs/tags/include \
> +			multi-pack-index write --bitmap &&
> +		test-tool bitmap list-commits | sort >bitmaps &&
> +		comm -13 bitmaps commits >after &&
> +
> +		! test_cmp before after
> +	)
> +'

Could we have a more precise comparison of "before" and "after" (besides
the fact that they're different)?

Besides that, all the patches up to this one look good (including patch
14, verified with "--color-moved
--color-moved-ws=allow-indentation-change").

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 22/22] p5326: perf tests for MIDX bitmaps
  2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
@ 2021-05-04 18:00   ` Jonathan Tan
  2021-05-05  0:55     ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jonathan Tan @ 2021-05-04 18:00 UTC (permalink / raw)
  To: me; +Cc: git, peff, dstolee, gitster, jonathantanmy

> There is slight overhead in "simulated clone", "simulated partial
> clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
> due to using the MIDX's reverse index to map between bit positions and
> MIDX positions.

Thanks - it's great to see that accessing a MIDX bitmap doesn't add much
overhead (as compared to accessing a single-pack bitmap of the same
size).

All the remaining patches up to and including this one look good.
Overall, I did have some comments here and there, but I am happy with
the overall design.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 22/22] p5326: perf tests for MIDX bitmaps
  2021-05-04 18:00   ` Jonathan Tan
@ 2021-05-05  0:55     ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2021-05-05  0:55 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: me, git, peff, dstolee

Jonathan Tan <jonathantanmy@google.com> writes:

>> There is slight overhead in "simulated clone", "simulated partial
>> clone", and "clone (partial bitmap)". Unsurprisingly, that overhead is
>> due to using the MIDX's reverse index to map between bit positions and
>> MIDX positions.
>
> Thanks - it's great to see that accessing a MIDX bitmap doesn't add much
> overhead (as compared to accessing a single-pack bitmap of the same
> size).
>
> All the remaining patches up to and including this one look good.
> Overall, I did have some comments here and there, but I am happy with
> the overall design.

Thanks for a review.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-05-04  5:02   ` Jonathan Tan
@ 2021-05-06 20:18     ` Taylor Blau
  2021-05-06 22:00       ` Jonathan Tan
  0 siblings, 1 reply; 32+ messages in thread
From: Taylor Blau @ 2021-05-06 20:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, peff, dstolee, gitster

On Mon, May 03, 2021 at 10:02:30PM -0700, Jonathan Tan wrote:
> > +static void prepare_midx_packing_data(struct packing_data *pdata,
> > +				      struct write_midx_context *ctx)
> > +{
> > +	uint32_t i;
> > +
> > +	memset(pdata, 0, sizeof(struct packing_data));
> > +	prepare_packing_data(the_repository, pdata);
> > +
> > +	for (i = 0; i < ctx->entries_nr; i++) {
> > +		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
> > +		struct object_entry *to = packlist_alloc(pdata, &from->oid);
> > +
> > +		oe_set_in_pack(pdata, to,
> > +			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
> > +	}
> > +}
>
> It is surprising to see this right at the top. Scrolling down, I guess
> that there is more information needed than just the packing_data struct.

Hmm, which part is surprising to you? This function is setting up the
packing_data structure that I mentioned in the commit message, which
happens in two steps. First, we allocate and call
prepare_packing_data(). And then we call packlist_alloc() for each
object in the MIDX, setting up some information about each object
(like its OID and which physical pack it came from).

But if any of this is unclear, let me know which part and I'd be happy
to add a clarifying comment.

> > +static int add_ref_to_pending(const char *refname,
> > +			      const struct object_id *oid,
> > +			      int flag, void *cb_data)
> > +{
> > +	struct rev_info *revs = (struct rev_info*)cb_data;
> > +	struct object *object;
> > +
> > +	if ((flag & REF_ISSYMREF) && (flag & REF_ISBROKEN)) {
> > +		warning("symbolic ref is dangling: %s", refname);
> > +		return 0;
> > +	}
> > +
> > +	object = parse_object_or_die(oid, refname);
> > +	if (object->type != OBJ_COMMIT)
> > +		return 0;
> > +
> > +	add_pending_object(revs, object, "");
> > +	if (bitmap_is_preferred_refname(revs->repo, refname))
> > +		object->flags |= NEEDS_BITMAP;
> > +	return 0;
> > +}
>
> Makes sense. We need to flag certain commits as NEEDS_BITMAP because
> bitmaps are not made for all commits but only certain ones.

Right, and the NEEDS_BITMAP is a bit of a misnomer. It's true meaning is
more like BITMAPPING_THIS_WOULD_BE_A_GOOD_IDEA, since it roughly
translates to "bitmap this commit before any others in its window". More
details are in bitmap_writer_select_commits(), but in all honesty I find
the implementation there somewhat confusing.

> > +static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
> > +						    struct write_midx_context *ctx)
> > +{
> > +	struct rev_info revs;
> > +	struct bitmap_commit_cb cb;
> > +
> > +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
> > +	cb.ctx = ctx;
> > +
> > +	repo_init_revisions(the_repository, &revs, NULL);
> > +	for_each_ref(add_ref_to_pending, &revs);
> > +
> > +	fetch_if_missing = 0;
> > +	revs.exclude_promisor_objects = 1;
>
> I think that the MIDX bitmap requires all objects be present? If yes, we
> should omit these 2 lines.

It does require that all objects are present, but if we fetched any
promisor objects at this stage it would be too late. That's because by
the time we're in this function, all of the packs that are to be
included in the MIDX should already exist on disk.

Skipping promisor objects here is intentional, since it only excludes
them from the list of reachable commits that we want to select from when
computing the selection of MIDX'd commits to receive bitmaps.

But, if one of those promisor objects is reachable from another object
that is included in the bitmap, then we will complain later on that we
couldn't find a reachability closure (and fail appropriately).

That said, I'm not sure any of that is obvious from reading this code,
so I'll add a comment to that effect around these lines.

> > +
> > +	if (prepare_revision_walk(&revs))
> > +		die(_("revision walk setup failed"));
> > +
> > +	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
> > +	if (indexed_commits_nr_p)
> > +		*indexed_commits_nr_p = cb.commits_nr;
> > +
> > +	return cb.commits;
> > +}
>
> Hmm...I might be missing something obvious, but this function and its
> callbacks seem to be written like this in order to put the returned
> commits in a certain order. But later on in write_midx_bitmap(), the
> return value of this function is passed to
> bitmap_writer_select_commits(), which resorts the list anyway?

It isn't intentional, but rather just to build up the list in topo
order. In fact, the order we build it up in isn't quite the same as how
the pack bitmap code generates it (it is in true topo order, at least on
GitHub's servers, as a side effect of using delta islands).

The fact that we resort according to date_compare makes me wonder why
changing that seemed to make such a difference for us. The whole
selection code is a mystery to me.

But no, the order shouldn't matter since we QSORT it later. Any code
here that looks like it's putting it in a certain order has much more to
do with convenience than anything else.

>
> > +static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash,
> > +			     struct write_midx_context *ctx,
> > +			     unsigned flags)
> > +{
> > +	struct packing_data pdata;
> > +	struct pack_idx_entry **index;
> > +	struct commit **commits = NULL;
> > +	uint32_t i, commits_nr;
> > +	char *bitmap_name = xstrfmt("%s-%s.bitmap", midx_name, hash_to_hex(midx_hash));
> > +	int ret;
> > +
> > +	prepare_midx_packing_data(&pdata, ctx);
> > +
> > +	commits = find_commits_for_midx_bitmap(&commits_nr, ctx);
> > +
> > +	/*
> > +	 * Build the MIDX-order index based on pdata.objects (which is already
> > +	 * in MIDX order; c.f., 'midx_pack_order_cmp()' for the definition of
> > +	 * this order).
> > +	 */
> > +	ALLOC_ARRAY(index, pdata.nr_objects);
> > +	for (i = 0; i < pdata.nr_objects; i++)
> > +		index[i] = (struct pack_idx_entry *)&pdata.objects[i];
> > +
> > +	bitmap_writer_show_progress(flags & MIDX_PROGRESS);
> > +	bitmap_writer_build_type_index(&pdata, index, pdata.nr_objects);
> > +
> > +	/*
> > +	 * bitmap_writer_select_commits expects objects in lex order, but
> > +	 * pack_order gives us exactly that. use it directly instead of
> > +	 * re-sorting the array
> > +	 */
> > +	for (i = 0; i < pdata.nr_objects; i++)
> > +		index[ctx->pack_order[i]] = (struct pack_idx_entry *)&pdata.objects[i];
> > +
> > +	bitmap_writer_select_commits(commits, commits_nr, -1);
>
> The comment above says bitmap_writer_select_commits() expects objects in
> lex order, but (1) you're putting "index" in lex order, not "commits",
> and (2) the first thing in bitmap_writer_select_commits() is a QSORT.
> Did you mean another function?

Ack, I definitely meant bitmap_writer_build(). Thanks for catching.

> > +	ret = bitmap_writer_build(&pdata);
> > +	if (!ret)
> > +		goto cleanup;
> > +
> > +	bitmap_writer_set_checksum(midx_hash);
> > +	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
>
> So bitmap_writer_build_type_index() and bitmap_writer_finish() are
> called with 2 different orders of commits. Is this expected? If yes,
> maybe this is worth a comment.

Confusingly so, but yes, these two do expect different orders. You can
see the same re-sorting going on much more subtly in
pack-write.c:write_idx_file(), which is called by
builtin/pack-objects.c:finish_tmp_packfile(), which happens between
bitmap_writer_build_type_index() and bitmap_writer_finish().

Definitely worth adding a comment.

> > @@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  		for (i = 0; i < ctx.m->num_packs; i++) {
> >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> >
> > +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > +				error(_("could not load pack %s"),
> > +				      ctx.m->pack_names[i]);
> > +				result = 1;
> > +				goto cleanup;
> > +			}
> > +
> >  			ctx.info[ctx.nr].orig_pack_int_id = i;
> >  			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
> > -			ctx.info[ctx.nr].p = NULL;
> > +			ctx.info[ctx.nr].p = ctx.m->packs[i];
> >  			ctx.info[ctx.nr].expired = 0;
> >  			ctx.nr++;
> >  		}
>
> Why is this needed now and not before? From what I see in this function,
> nothing seems to happen to this .p pack except that they are closed
> later.

These are used by prepare_midx_packing_data().

> > @@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> >  		if (ctx.info[i].p) {
> >  			close_pack(ctx.info[i].p);
> >  			free(ctx.info[i].p);
> > +			if (ctx.m) {
> > +				/*
> > +				 * Destroy a stale reference to the pack in
> > +				 * 'ctx.m'.
> > +				 */
> > +				uint32_t orig = ctx.info[i].orig_pack_int_id;
> > +				if (orig < ctx.m->num_packs)
> > +					ctx.m->packs[orig] = NULL;
> > +			}
> >  		}
> >  		free(ctx.info[i].pack_name);
> >  	}
>
> Is this hunk needed? "ctx" is a local variable and will not outlast this
> function.

I can't remember exactly why I added this. I'll play around with it and
either remove it or add a comment why it's necessary before the next
reroll.

> I'll review the rest tomorrow. It seems like I've gotten over the most
> difficult patches.

Thanks, and sorry that this took me a few days to get back to. I
appreciate your review immensely.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 13/22] pack-bitmap: write multi-pack bitmaps
  2021-05-06 20:18     ` Taylor Blau
@ 2021-05-06 22:00       ` Jonathan Tan
  0 siblings, 0 replies; 32+ messages in thread
From: Jonathan Tan @ 2021-05-06 22:00 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, peff, dstolee, gitster

> On Mon, May 03, 2021 at 10:02:30PM -0700, Jonathan Tan wrote:
> > > +static void prepare_midx_packing_data(struct packing_data *pdata,
> > > +				      struct write_midx_context *ctx)
> > > +{
> > > +	uint32_t i;
> > > +
> > > +	memset(pdata, 0, sizeof(struct packing_data));
> > > +	prepare_packing_data(the_repository, pdata);
> > > +
> > > +	for (i = 0; i < ctx->entries_nr; i++) {
> > > +		struct pack_midx_entry *from = &ctx->entries[ctx->pack_order[i]];
> > > +		struct object_entry *to = packlist_alloc(pdata, &from->oid);
> > > +
> > > +		oe_set_in_pack(pdata, to,
> > > +			       ctx->info[ctx->pack_perm[from->pack_int_id]].p);
> > > +	}
> > > +}
> >
> > It is surprising to see this right at the top. Scrolling down, I guess
> > that there is more information needed than just the packing_data struct.
> 
> Hmm, which part is surprising to you? This function is setting up the
> packing_data structure that I mentioned in the commit message, which
> happens in two steps. First, we allocate and call
> prepare_packing_data(). And then we call packlist_alloc() for each
> object in the MIDX, setting up some information about each object
> (like its OID and which physical pack it came from).
> 
> But if any of this is unclear, let me know which part and I'd be happy
> to add a clarifying comment.

Ah, I think I was unclear. I was thinking that the commit message led me
to believe that all information needed for creating a bitmap lies in the
packing_data struct, so I would have expected several helper functions
followed by a function that actually writes the packing_data struct.
Maybe the commit message could be rewritten to avoid that confusion, but
it's probably not a big deal.

> > > +static struct commit **find_commits_for_midx_bitmap(uint32_t *indexed_commits_nr_p,
> > > +						    struct write_midx_context *ctx)
> > > +{
> > > +	struct rev_info revs;
> > > +	struct bitmap_commit_cb cb;
> > > +
> > > +	memset(&cb, 0, sizeof(struct bitmap_commit_cb));
> > > +	cb.ctx = ctx;
> > > +
> > > +	repo_init_revisions(the_repository, &revs, NULL);
> > > +	for_each_ref(add_ref_to_pending, &revs);
> > > +
> > > +	fetch_if_missing = 0;
> > > +	revs.exclude_promisor_objects = 1;
> >
> > I think that the MIDX bitmap requires all objects be present? If yes, we
> > should omit these 2 lines.
> 
> It does require that all objects are present, but if we fetched any
> promisor objects at this stage it would be too late. That's because by
> the time we're in this function, all of the packs that are to be
> included in the MIDX should already exist on disk.
> 
> Skipping promisor objects here is intentional, since it only excludes
> them from the list of reachable commits that we want to select from when
> computing the selection of MIDX'd commits to receive bitmaps.
> 
> But, if one of those promisor objects is reachable from another object
> that is included in the bitmap, then we will complain later on that we
> couldn't find a reachability closure (and fail appropriately).
> 
> That said, I'm not sure any of that is obvious from reading this code,
> so I'll add a comment to that effect around these lines.

So you're saying that if we have missing promisor commits as in the
following graph:

   A
  / \
 B   C
 |   |
 .   .
 .   .
 .   .

where B is missing but promised, and only C is NEEDS_BITMAP, then the
MIDX bitmap write will still work? (So the rev walk is intended to walk
through A and C but not B, and because we are only building bitmaps for
C and potentially its ancestors, we only need the objects in C's
transitive closure.) Even if this is true, "exclude_promisor_objects" is
the wrong option here, because it will exclude all commits that came
from a promisor remote (regardless of whether it is present locally).
(That's how "promisor object" is defined in partial-clone.txt.) What we
need would be an option that permits missing links.

And even if we go with that option that permits missing links, it still
remains that we have very little support for missing promisor commits in
Git right now.

It might be better to just assume that MIDX will only be used for full
clones. If you want, you can add a NEEDSWORK explaining the above case.

> > > +
> > > +	if (prepare_revision_walk(&revs))
> > > +		die(_("revision walk setup failed"));
> > > +
> > > +	traverse_commit_list(&revs, bitmap_show_commit, NULL, &cb);
> > > +	if (indexed_commits_nr_p)
> > > +		*indexed_commits_nr_p = cb.commits_nr;
> > > +
> > > +	return cb.commits;
> > > +}
> >
> > Hmm...I might be missing something obvious, but this function and its
> > callbacks seem to be written like this in order to put the returned
> > commits in a certain order. But later on in write_midx_bitmap(), the
> > return value of this function is passed to
> > bitmap_writer_select_commits(), which resorts the list anyway?
> 
> It isn't intentional, but rather just to build up the list in topo
> order. In fact, the order we build it up in isn't quite the same as how
> the pack bitmap code generates it (it is in true topo order, at least on
> GitHub's servers, as a side effect of using delta islands).
> 
> The fact that we resort according to date_compare makes me wonder why
> changing that seemed to make such a difference for us. The whole
> selection code is a mystery to me.
> 
> But no, the order shouldn't matter since we QSORT it later. Any code
> here that looks like it's putting it in a certain order has much more to
> do with convenience than anything else.

If the order doesn't matter, why don't you just copy one-by-one from
data->ctx->entries into data->commits? (Unless data->ctx->entries has
extra commits?)

> > > +	ret = bitmap_writer_build(&pdata);
> > > +	if (!ret)
> > > +		goto cleanup;
> > > +
> > > +	bitmap_writer_set_checksum(midx_hash);
> > > +	bitmap_writer_finish(index, pdata.nr_objects, bitmap_name, 0);
> >
> > So bitmap_writer_build_type_index() and bitmap_writer_finish() are
> > called with 2 different orders of commits. Is this expected? If yes,
> > maybe this is worth a comment.
> 
> Confusingly so, but yes, these two do expect different orders. You can
> see the same re-sorting going on much more subtly in
> pack-write.c:write_idx_file(), which is called by
> builtin/pack-objects.c:finish_tmp_packfile(), which happens between
> bitmap_writer_build_type_index() and bitmap_writer_finish().
> 
> Definitely worth adding a comment.

Ah, I see. Thanks for your explanation.

> > > @@ -930,9 +1073,16 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> > >  		for (i = 0; i < ctx.m->num_packs; i++) {
> > >  			ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
> > >
> > > +			if (prepare_midx_pack(the_repository, ctx.m, i)) {
> > > +				error(_("could not load pack %s"),
> > > +				      ctx.m->pack_names[i]);
> > > +				result = 1;
> > > +				goto cleanup;
> > > +			}
> > > +
> > >  			ctx.info[ctx.nr].orig_pack_int_id = i;
> > >  			ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
> > > -			ctx.info[ctx.nr].p = NULL;
> > > +			ctx.info[ctx.nr].p = ctx.m->packs[i];
> > >  			ctx.info[ctx.nr].expired = 0;
> > >  			ctx.nr++;
> > >  		}
> >
> > Why is this needed now and not before? From what I see in this function,
> > nothing seems to happen to this .p pack except that they are closed
> > later.
> 
> These are used by prepare_midx_packing_data().

Ah, thanks.

> > > @@ -1096,6 +1264,15 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
> > >  		if (ctx.info[i].p) {
> > >  			close_pack(ctx.info[i].p);
> > >  			free(ctx.info[i].p);
> > > +			if (ctx.m) {
> > > +				/*
> > > +				 * Destroy a stale reference to the pack in
> > > +				 * 'ctx.m'.
> > > +				 */
> > > +				uint32_t orig = ctx.info[i].orig_pack_int_id;
> > > +				if (orig < ctx.m->num_packs)
> > > +					ctx.m->packs[orig] = NULL;
> > > +			}
> > >  		}
> > >  		free(ctx.info[i].pack_name);
> > >  	}
> >
> > Is this hunk needed? "ctx" is a local variable and will not outlast this
> > function.
> 
> I can't remember exactly why I added this. I'll play around with it and
> either remove it or add a comment why it's necessary before the next
> reroll.

OK.

> 
> > I'll review the rest tomorrow. It seems like I've gotten over the most
> > difficult patches.
> 
> Thanks, and sorry that this took me a few days to get back to. I
> appreciate your review immensely.

No worries, and thanks for these patches.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2021-05-06 22:00 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-09 18:10 [PATCH 00/22] multi-pack reachability bitmaps Taylor Blau
2021-04-09 18:10 ` [PATCH 01/22] pack-bitmap.c: harden 'test_bitmap_walk()' to check type bitmaps Taylor Blau
2021-04-09 18:10 ` [PATCH 02/22] pack-bitmap-write.c: gracefully fail to write non-closed bitmaps Taylor Blau
2021-04-16  2:46   ` Jonathan Tan
2021-04-09 18:10 ` [PATCH 03/22] pack-bitmap-write.c: free existing bitmaps Taylor Blau
2021-04-09 18:10 ` [PATCH 04/22] Documentation: build 'technical/bitmap-format' by default Taylor Blau
2021-04-09 18:11 ` [PATCH 05/22] Documentation: describe MIDX-based bitmaps Taylor Blau
2021-04-09 18:11 ` [PATCH 06/22] midx: make a number of functions non-static Taylor Blau
2021-04-09 18:11 ` [PATCH 07/22] midx: clear auxiliary .rev after replacing the MIDX Taylor Blau
2021-04-09 18:11 ` [PATCH 08/22] midx: respect 'core.multiPackIndex' when writing Taylor Blau
2021-04-09 18:11 ` [PATCH 09/22] pack-bitmap.c: introduce 'bitmap_num_objects()' Taylor Blau
2021-04-09 18:11 ` [PATCH 10/22] pack-bitmap.c: introduce 'nth_bitmap_object_oid()' Taylor Blau
2021-04-09 18:11 ` [PATCH 11/22] pack-bitmap.c: introduce 'bitmap_is_preferred_refname()' Taylor Blau
2021-04-09 18:11 ` [PATCH 12/22] pack-bitmap: read multi-pack bitmaps Taylor Blau
2021-04-16  2:39   ` Jonathan Tan
2021-04-16  3:13     ` Taylor Blau
2021-04-09 18:11 ` [PATCH 13/22] pack-bitmap: write " Taylor Blau
2021-05-04  5:02   ` Jonathan Tan
2021-05-06 20:18     ` Taylor Blau
2021-05-06 22:00       ` Jonathan Tan
2021-04-09 18:11 ` [PATCH 14/22] t5310: move some tests to lib-bitmap.sh Taylor Blau
2021-04-09 18:11 ` [PATCH 15/22] t/helper/test-read-midx.c: add --checksum mode Taylor Blau
2021-04-09 18:12 ` [PATCH 16/22] t5326: test multi-pack bitmap behavior Taylor Blau
2021-05-04 17:51   ` Jonathan Tan
2021-04-09 18:12 ` [PATCH 17/22] t5310: disable GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP Taylor Blau
2021-04-09 18:12 ` [PATCH 18/22] t5319: don't write MIDX bitmaps in t5319 Taylor Blau
2021-04-09 18:12 ` [PATCH 19/22] t7700: update to work with MIDX bitmap test knob Taylor Blau
2021-04-09 18:12 ` [PATCH 20/22] midx: respect 'GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP' Taylor Blau
2021-04-09 18:12 ` [PATCH 21/22] p5310: extract full and partial bitmap tests Taylor Blau
2021-04-09 18:12 ` [PATCH 22/22] p5326: perf tests for MIDX bitmaps Taylor Blau
2021-05-04 18:00   ` Jonathan Tan
2021-05-05  0:55     ` Junio C Hamano

git@vger.kernel.org list mirror (unofficial, one of many)

This inbox may be cloned and mirrored by anyone:

	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 git git/ https://public-inbox.org/git \
		git@vger.kernel.org
	public-inbox-index git

Example config snippet for mirrors.
Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://7fh6tueqddpjyxjmgtdiueylzoqt6pt7hec3pukyptlmohoowvhde4yd.onion/inbox.comp.version-control.git
	nntp://ie5yzdi7fg72h7s4sdcztq5evakq23rdt33mfyfcddc5u3ndnw24ogqd.onion/inbox.comp.version-control.git
	nntp://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/inbox.comp.version-control.git
	nntp://news.gmane.io/gmane.comp.version-control.git
 note: .onion URLs require Tor: https://www.torproject.org/

code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git