git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
@ 2021-12-08 19:26 Taylor Blau
  2021-12-08 19:26 ` [PATCH 1/2] t5326: demonstrate bitmap corruption after permutation Taylor Blau
                   ` (4 more replies)
  0 siblings, 5 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-08 19:26 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

This series demonstrates and fixes a problem (which has existed since the
inception of MIDX bitmaps) which made it possible to reliably introduce bitmap
corruption into a repository.

The recipe (which is explained in detail in the first patch) roughly looks like
creating a MIDX with some packs. Then re-generating the MIDX and its bitmap but
under a different object order (e.g., by changing the `--preferred-pack`).

Because we used to link the new .rev file into place, and because the object
order is not encoded into the MIDX itself, we would inadvertently generate a
.rev file with the same name as the existing one that contains different
contents, and then fail to move it into place. That causes us to read the
.bitmap under the old object order, producing incorrect results.

This series applies a minimal fix (in the second patch), which is to include the
pack ordering in the MIDX itself. Some small thoughts here:

  - We could alternatively have included the entire .rev file's contents in the
    MIDX. But this ends up being kind of awkward (and is also discussed in the
    second patch).

  - We could cache the identity of the preferred pack (which we currently
    discover by looking up the pack for the object in the 0th position of the
    MIDX's pseudo-pack order) by just reading the first value of the new
    optional chunk.

I'm certainly interested in pursuing the latter, but in a different series,
since I'd like to keep this fix as minimal as possible.

Taylor Blau (2):
  t5326: demonstrate bitmap corruption after permutation
  midx.c: make changing the preferred pack safe

 Documentation/technical/multi-pack-index.txt |  1 +
 Documentation/technical/pack-format.txt      |  4 +++
 midx.c                                       | 25 +++++++++++++---
 t/t5326-multi-pack-bitmaps.sh                | 31 ++++++++++++++++++++
 4 files changed, 57 insertions(+), 4 deletions(-)

-- 
2.34.1.25.gb3157a20e6

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 1/2] t5326: demonstrate bitmap corruption after permutation
  2021-12-08 19:26 [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Taylor Blau
@ 2021-12-08 19:26 ` Taylor Blau
  2021-12-08 19:26 ` [PATCH 2/2] midx.c: make changing the preferred pack safe Taylor Blau
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-08 19:26 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

This patch demonstrates a cause of bitmap corruption that can occur when
the contents of the multi-pack index does not change, but the underlying
object order does.

In this example, we have a MIDX containing two packs, each with a
distinct set of objects (pack A corresponds to the tree, blob, and
commit from the first patch, and pack B corresponds to the second
patch).

First, a MIDX is written where the 'A' pack is preferred. As expected,
the bitmaps generated there are in-tact. But then, we generate an
identical MIDX with a different object order: this time preferring pack
'B'.

Due to a bug which will be explained and fixed in the following commit,
the MIDX is updated, but the .rev file is not, causing the .bitmap file
to be read incorrectly. Specifically, the .bitmap file will contain
correct data, but the auxiliary object order in the .rev file is stale,
causing readers to get confused by reading the new bitmaps using the old
object order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index e187f90f29..0ca2868b0b 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,4 +395,35 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
+test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+
+		git rev-list --objects --no-object-names HEAD^ >A.objects &&
+		git rev-list --objects --no-object-names HEAD^.. >B.objects &&
+
+		A=$(git pack-objects $objdir/pack/pack <A.objects) &&
+		B=$(git pack-objects $objdir/pack/pack <B.objects) &&
+
+		cat >indexes <<-EOF &&
+		pack-$A.idx
+		pack-$B.idx
+		EOF
+
+		git multi-pack-index write --bitmap --stdin-packs \
+			--preferred-pack=pack-$A.pack <indexes &&
+		git rev-list --test-bitmap A &&
+
+		git multi-pack-index write --bitmap --stdin-packs \
+			--preferred-pack=pack-$B.pack <indexes &&
+		git rev-list --test-bitmap A
+	)
+'
+
 test_done
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 2/2] midx.c: make changing the preferred pack safe
  2021-12-08 19:26 [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Taylor Blau
  2021-12-08 19:26 ` [PATCH 1/2] t5326: demonstrate bitmap corruption after permutation Taylor Blau
@ 2021-12-08 19:26 ` Taylor Blau
  2021-12-08 19:30 ` [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Derrick Stolee
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-08 19:26 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

The previous patch demonstrates a bug where a MIDX's auxiliary object
order can become out of sync with a MIDX bitmap.

This is because of two confounding factors:

  - First, the object order is stored in a file which is named according
    to the multi-pack index's checksum, and the MIDX does not store the
    object order. This means that the object order can change without
    altering the checksum.

  - But the .rev file is moved into place with finalize_object_file(),
    which link(2)'s the file into place instead of renaming it. For us,
    that means that a modified .rev file will not be moved into place if
    MIDX's checksum was unchanged.

The fix here is two-fold. First, we need to stop linking the file into
place and instead rename it. It's likely we were using
finalize_object_file() instead of a pure rename() because the former
also adjusts shared permissions. But that is unnecessary, because we
already do so in write_rev_file_order(), so rename alone is safe.

But we also need to make the MIDX's checksum change in some way when the
preferred pack changes without altering the set of packs stored in a
MIDX to prevent a race where the new .rev file is moved into place
before the MIDX is updated. Here, you'd get the opposite effect: reading
old bitmaps with the new object order.

But this race bites us even here: suppose that we didn't change the MIDX
checksum, but only renamed the auxiliary object order into place instead
of hardlinking it. Then when we go to generate the new bitmap, we'll
load the old MIDX bitmap, along with the MIDX that it references. That's
fine, since the new MIDX isn't moved into place until after the new
bitmap is generated. But the new object order *has* been moved into
place. So we'll read the old bitmaps in the new order when generating
the new bitmap file, meaning that without this secondary change, bitmap
generation itself would become a victim of the race described here.

This can all be prevented by forcing the MIDX's checksum to change when
the object order changes. We could include the entire object order in
the MIDX, but doing so is somewhat awkward. (For example, the code that
writes a .rev file expects to know the checksum of the associated pack
or MIDX, but writing that data into the MIDX itself makes that a
circular dependency).

Instead, make the pack order used during bitmap generation part of the
MIDX itself. That means that the new test in t5326 will cause the MIDX's
checksum to update, preventing the stale read problem.

There is no need to store the complete pack order here, since the only
way a fixed set of packs can result in a change in the object ordering
is by changing the preferred pack. That's because packs are sorted
according to their pack name only, so it would be impossible to induce a
different order by, say, touching a pack's mtime.

But storing the complete pack order gives us some flexibility in the
future, and it only costs 4 bytes per pack to do. In the future, we
could also optimize .rev reads looking for the identity of the preferred
pack by consulting this list if it exists.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/multi-pack-index.txt |  1 +
 Documentation/technical/pack-format.txt      |  4 ++++
 midx.c                                       | 25 ++++++++++++++++----
 t/t5326-multi-pack-bitmaps.sh                |  2 +-
 4 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index 86f40f2490..077c893ae3 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -24,6 +24,7 @@ and their offsets into multiple packfiles. It contains:
   - An offset within the jth packfile for the object.
 - If large offsets are required, we use another list of large
   offsets similar to version 2 pack-indexes.
+- An optional list of pack IDs in bitmap-sorted order.
 
 Thus, we can provide O(log N) lookup time for any number
 of packfiles.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8d2f42f29e..edd77fb437 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -376,6 +376,10 @@ CHUNK DATA:
 	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
 	    8-byte offsets into large packfiles.
 
+	[Optional] Bitmap pack order (ID: {'P', 'O', 'R', 'D'})
+	    A list of 4-byte pack identifiers in sorted order according to
+	    their relative bitmap positions.
+
 TRAILER:
 
 	Index checksum of the above contents.
diff --git a/midx.c b/midx.c
index 837b46b2af..abcdb92dd1 100644
--- a/midx.c
+++ b/midx.c
@@ -33,6 +33,7 @@
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
+#define MIDX_CHUNKID_PACKORDER 0x504f5244 /* "PORD" */
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
@@ -833,6 +834,18 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int write_midx_pack_order(struct hashfile *f,
+				 void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++)
+		hashwrite_be32(f, ctx->pack_order[i]);
+
+	return 0;
+}
+
 struct midx_pack_order_data {
 	uint32_t nr;
 	uint32_t pack;
@@ -891,7 +904,7 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
 					midx_hash, WRITE_REV);
 
-	if (finalize_object_file(tmp_file, buf.buf))
+	if (rename(tmp_file, buf.buf))
 		die(_("cannot store reverse index file"));
 
 	strbuf_release(&buf);
@@ -1403,15 +1416,19 @@ static int write_midx_internal(const char *object_dir,
 			(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
 			write_midx_large_offsets);
 
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+		ctx.pack_order = midx_pack_order(&ctx);
+		add_chunk(cf, MIDX_CHUNKID_PACKORDER,
+			  ctx.entries_nr * sizeof(uint32_t),
+			  write_midx_pack_order);
+	}
+
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
-		ctx.pack_order = midx_pack_order(&ctx);
-
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 0ca2868b0b..353282310d 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,7 +395,7 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
-test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
-- 
2.34.1.25.gb3157a20e6

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
  2021-12-08 19:26 [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Taylor Blau
  2021-12-08 19:26 ` [PATCH 1/2] t5326: demonstrate bitmap corruption after permutation Taylor Blau
  2021-12-08 19:26 ` [PATCH 2/2] midx.c: make changing the preferred pack safe Taylor Blau
@ 2021-12-08 19:30 ` Derrick Stolee
  2021-12-08 19:55   ` Jeff King
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
  4 siblings, 1 reply; 59+ messages in thread
From: Derrick Stolee @ 2021-12-08 19:30 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: gitster, peff

On 12/8/2021 2:26 PM, Taylor Blau wrote:
> This series demonstrates and fixes a problem (which has existed since the
> inception of MIDX bitmaps) which made it possible to reliably introduce bitmap
> corruption into a repository.
> 
> The recipe (which is explained in detail in the first patch) roughly looks like
> creating a MIDX with some packs. Then re-generating the MIDX and its bitmap but
> under a different object order (e.g., by changing the `--preferred-pack`).
> 
> Because we used to link the new .rev file into place, and because the object
> order is not encoded into the MIDX itself, we would inadvertently generate a
> .rev file with the same name as the existing one that contains different
> contents, and then fail to move it into place. That causes us to read the
> .bitmap under the old object order, producing incorrect results.
> 
> This series applies a minimal fix (in the second patch), which is to include the
> pack ordering in the MIDX itself. Some small thoughts here:
> 
>   - We could alternatively have included the entire .rev file's contents in the
>     MIDX. But this ends up being kind of awkward (and is also discussed in the
>     second patch).
> 
>   - We could cache the identity of the preferred pack (which we currently
>     discover by looking up the pack for the object in the 0th position of the
>     MIDX's pseudo-pack order) by just reading the first value of the new
>     optional chunk.
> 
> I'm certainly interested in pursuing the latter, but in a different series,
> since I'd like to keep this fix as minimal as possible.
> 
> Taylor Blau (2):
>   t5326: demonstrate bitmap corruption after permutation
>   midx.c: make changing the preferred pack safe

Just chiming in to say that I reviewed an earlier version of this series
and the version in this submission looks good to me.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
  2021-12-08 19:30 ` [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Derrick Stolee
@ 2021-12-08 19:55   ` Jeff King
  2021-12-10 18:36     ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Jeff King @ 2021-12-08 19:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, gitster

On Wed, Dec 08, 2021 at 02:30:17PM -0500, Derrick Stolee wrote:

> > Taylor Blau (2):
> >   t5326: demonstrate bitmap corruption after permutation
> >   midx.c: make changing the preferred pack safe
> 
> Just chiming in to say that I reviewed an earlier version of this series
> and the version in this submission looks good to me.

Ditto. ;)

-Peff

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
  2021-12-08 19:55   ` Jeff King
@ 2021-12-10 18:36     ` Taylor Blau
  2021-12-10 22:31       ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2021-12-10 18:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, git, gitster

On Wed, Dec 08, 2021 at 02:55:13PM -0500, Jeff King wrote:
> On Wed, Dec 08, 2021 at 02:30:17PM -0500, Derrick Stolee wrote:
>
> > > Taylor Blau (2):
> > >   t5326: demonstrate bitmap corruption after permutation
> > >   midx.c: make changing the preferred pack safe
> >
> > Just chiming in to say that I reviewed an earlier version of this series
> > and the version in this submission looks good to me.
>
> Ditto. ;)

All three of us missed that this PORD chunk actually contains the
psuedo-pack position for every object in the MIDX. That is OK, but it's
definitely adding more than 4 bytes per pack to the MIDX (in practice,
it's adding 4 bytes per object).

I'm semi-OK with this direction, since it's tantamount to storing the
.rev file's contents in the MIDX itself. And even though we're not
reading from it, it is doing the thing we need it to which is causing
the MIDX to change its checksum when the object order changes.

But I'm curious what both of your thoughts are before moving forward.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
  2021-12-10 18:36     ` Taylor Blau
@ 2021-12-10 22:31       ` Taylor Blau
  2021-12-11  1:39         ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2021-12-10 22:31 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, git, gitster

On Fri, Dec 10, 2021 at 01:36:02PM -0500, Taylor Blau wrote:
> On Wed, Dec 08, 2021 at 02:55:13PM -0500, Jeff King wrote:
> > On Wed, Dec 08, 2021 at 02:30:17PM -0500, Derrick Stolee wrote:
> >
> > > > Taylor Blau (2):
> > > >   t5326: demonstrate bitmap corruption after permutation
> > > >   midx.c: make changing the preferred pack safe
> > >
> > > Just chiming in to say that I reviewed an earlier version of this series
> > > and the version in this submission looks good to me.
> >
> > Ditto. ;)
>
> All three of us missed that this PORD chunk actually contains the
> psuedo-pack position for every object in the MIDX. That is OK, but it's
> definitely adding more than 4 bytes per pack to the MIDX (in practice,
> it's adding 4 bytes per object).
>
> I'm semi-OK with this direction, since it's tantamount to storing the
> .rev file's contents in the MIDX itself. And even though we're not
> reading from it, it is doing the thing we need it to which is causing
> the MIDX to change its checksum when the object order changes.
>
> But I'm curious what both of your thoughts are before moving forward.

To just add a little bit more detail before I mostly close my computer
for the weekend:

The key part of this bug is that the MIDX checksum could remain
unchanged even when the object order used to write the MIDX bitmap does,
and that's what the first patch here is demonstrating.

Having PORD contain the same data as the .rev file still accomplishes
our original goal of preventing this bug, because it forces the checksum
to change when the object order does. But it's definitely more invasive
than I had imagined.

I had originally imagined that storing the preferred pack's identity
alone would be enough to solve this bug. But that isn't quite so,
because we break ties among duplicate objects first by prefered-ness,
then by their pack's mtime. So that could change too, and it would cause
us to break in the same way.

At the bare minimum you need an ordering of all of the packs in the
MIDX (like I had originally imagined here). At most, we could do
something like what is unintentionally written here, which would allow
us to get rid of MIDX .rev files entirely. I think doing the former is
simpler, and I am not sure if there are practical advantages to the
latter.

But I'm definitely curious as to what others think would be a good
direction to pursue.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
  2021-12-10 22:31       ` Taylor Blau
@ 2021-12-11  1:39         ` Taylor Blau
  2021-12-13 14:00           ` Derrick Stolee
  0 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2021-12-11  1:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, git, gitster

On Fri, Dec 10, 2021 at 05:31:27PM -0500, Taylor Blau wrote:
> I had originally imagined that storing the preferred pack's identity
> alone would be enough to solve this bug. But that isn't quite so,
> because we break ties among duplicate objects first by prefered-ness,
> then by their pack's mtime. So that could change too, and it would cause
> us to break in the same way.
>
> At the bare minimum you need an ordering of all of the packs in the
> MIDX (like I had originally imagined here). At most, we could do
> something like what is unintentionally written here, which would allow
> us to get rid of MIDX .rev files entirely. I think doing the former is
> simpler, and I am not sure if there are practical advantages to the
> latter.

Thinking on it more, I don't think this "at minimum you would need..."
is quite right either. It would suffice to know the identity of the
preferred pack, and the mtimes of all of the other packs, since that
alone is enough to reconstruct the object order.

That is pretty appealing, too, because knowing the order of packs would
require some major surgery (the order of packs isn't really something
the MIDX code thinks about, it's inferred from the way it sorts
objects).

So here I'd imagine we'd have two chunks:

  - `PREF`: a 4-byte network order integer identifier of the preferred
    pack
  - `MTME`: an array of 4-byte network order integers containing the
    mtimes of each pack in lexical order of the pack checksums.

Unless I'm missing something, I'm pretty sure that these two would be
sufficient to uniquely identify the object order (and, importantly, to
cause the MIDX's checksum to change when the object order does).

And these are vastly easier to write down than the ordering of the packs
themselves, while also being a much more incremental step forward. So I
think I'm more comfortable with this direction than anything discussed
so far.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
  2021-12-11  1:39         ` Taylor Blau
@ 2021-12-13 14:00           ` Derrick Stolee
  2021-12-13 14:31             ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Derrick Stolee @ 2021-12-13 14:00 UTC (permalink / raw)
  To: Taylor Blau, Jeff King; +Cc: git, gitster

On 12/10/2021 8:39 PM, Taylor Blau wrote:
> On Fri, Dec 10, 2021 at 05:31:27PM -0500, Taylor Blau wrote:
>> I had originally imagined that storing the preferred pack's identity
>> alone would be enough to solve this bug. But that isn't quite so,
>> because we break ties among duplicate objects first by prefered-ness,
>> then by their pack's mtime. So that could change too, and it would cause
>> us to break in the same way.
>>
>> At the bare minimum you need an ordering of all of the packs in the
>> MIDX (like I had originally imagined here). At most, we could do
>> something like what is unintentionally written here, which would allow
>> us to get rid of MIDX .rev files entirely. I think doing the former is
>> simpler, and I am not sure if there are practical advantages to the
>> latter.
> 
> Thinking on it more, I don't think this "at minimum you would need..."
> is quite right either. It would suffice to know the identity of the
> preferred pack, and the mtimes of all of the other packs, since that
> alone is enough to reconstruct the object order.
> 
> That is pretty appealing, too, because knowing the order of packs would
> require some major surgery (the order of packs isn't really something
> the MIDX code thinks about, it's inferred from the way it sorts
> objects).

I think the root cause is that the object order can change when the
preferred pack changes with the same set of pack-files. Suppose we
added more complicated ways of deduplicating objects across the packs?
Then whatever we include here based on preferred packs and mtimes
would need to be updated to match.

However, if we store the contents of the .rev file in the MIDX itself,
then we don't need that extra layer of indirection.

I'm leaning towards keeping the contents of the PORD chunk as-is, but
renaming it to something like OORD (for object order). Then, we can
carefully transition from using the .rev file to reading this chunk.
We will want to continue looking for the .rev file when this chunk does
not exist.

Thanks,
-Stolee


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order
  2021-12-13 14:00           ` Derrick Stolee
@ 2021-12-13 14:31             ` Taylor Blau
  0 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-13 14:31 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, Jeff King, git, gitster

On Mon, Dec 13, 2021 at 09:00:55AM -0500, Derrick Stolee wrote:
> I think the root cause is that the object order can change when the
> preferred pack changes with the same set of pack-files. Suppose we
> added more complicated ways of deduplicating objects across the packs?
> Then whatever we include here based on preferred packs and mtimes
> would need to be updated to match.
>
> However, if we store the contents of the .rev file in the MIDX itself,
> then we don't need that extra layer of indirection.

Yeah, towards the end of the weekend I started to lean towards putting
the whole object order inside of the MIDX.

You're definitely right that the root of the problem is the object order
is stored separately from the MIDX itself. And we probably could get
pretty creative and figure out what the minimal amount of information
needed to fingerprint an object order is. But the amount of time I spent
thinking about each proposal did not leave me with the obvious sense
that there still weren't ways around that could change the object order
without changing the MIDX's checksum.

> I'm leaning towards keeping the contents of the PORD chunk as-is, but
> renaming it to something like OORD (for object order). Then, we can
> carefully transition from using the .rev file to reading this chunk.
> We will want to continue looking for the .rev file when this chunk does
> not exist.

Agreed. What do you think about a series that just writes the chunk but
does not read it? That would solve the bug in the immediate-term. Then
we can go back and transition from the .rev file to the new chunk.

Reading the revindex chunk over the .rev file is pretty easy, as
demonstrated by the patch below.

What I'm less certain about is how we should write tests to make sure
the old strategy works, too. Do we want to run the MIDX bitmap tests
twice in a separate mode each (one that writes .rev files and one that
writes chuns in the MIDX)? Something else? Thoughts welcome.

--- 8< ---

diff --git a/pack-revindex.c b/pack-revindex.c
index 70d0fbafcb..d59ae73198 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -301,6 +301,20 @@ int load_midx_revindex(struct multi_pack_index *m)
 	if (m->revindex_data)
 		return 0;

+	if (m->chunk_revindex) {
+		/*
+		 * If the MIDX `m` has a `RIDX` chunk, then use its contents for
+		 * the reverse index instead of trying to load a separate `.rev`
+		 * file.
+		 *
+		 * Note that we do *not* set `m->revindex_map` here, since we do
+		 * not want to accidentally call munmap() in the middle of the
+		 * MIDX.
+		 */
+		m->revindex_data = (const uint32_t *)m->chunk_revindex;
+		return 0;
+	}
+
 	get_midx_rev_filename(&revindex_name, m);

 	ret = load_revindex_from_disk(revindex_name.buf,

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-08 19:26 [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Taylor Blau
                   ` (2 preceding siblings ...)
  2021-12-08 19:30 ` [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Derrick Stolee
@ 2021-12-14  1:55 ` Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 1/8] t5326: demonstrate bitmap corruption after permutation Taylor Blau
                     ` (9 more replies)
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
  4 siblings, 10 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by
which they can become corrupt when permuting their pack order.

The problem is described in detail in the second patch. But this version goes
further and stops writing the MIDX .rev file altogether, instead reading the
same data out of a new optional 'RIDX' chunk (falling back to the .rev file when
necessary).

Applying just the first two patches is sufficient to fix this problem, but the
remaining six go further to deprecate the MIDX .rev file altogether. I'm not
crazy about the testing strategy, though I think it was the best that I came up
with.

The idea is to essentially copy the core of t5326 into a new t5327. The former
uses the RIDX chunk, and the latter uses .rev files. I would have like to just
run the same tests in t5326 twice, but they munge the state of their test
repository enough to make it sufficiently awkward to do in practice.

So I'm definitely open to suggestions there, but otherwise this series should go
a long ways towards fixing my design mistake of having the MIDX .rev file be
separate from the MIDX itself.

Taylor Blau (8):
  t5326: demonstrate bitmap corruption after permutation
  midx.c: make changing the preferred pack safe
  pack-revindex.c: instrument loading on-disk reverse index
  t5326: drop unnecessary setup
  t5326: extract `test_rev_exists`
  t5326: move tests to t/lib-bitmap.sh
  t/lib-bitmap.sh: parameterize tests over reverse index source
  midx: read `RIDX` chunk when present

 Documentation/technical/multi-pack-index.txt |   1 +
 Documentation/technical/pack-format.txt      |  13 +-
 midx.c                                       |  31 +++-
 midx.h                                       |   1 +
 pack-revindex.c                              |  20 ++
 t/lib-bitmap.sh                              | 186 +++++++++++++++++++
 t/t5326-multi-pack-bitmaps.sh                | 144 +-------------
 t/t5327-multi-pack-bitmaps-rev.sh            |  23 +++
 t/t7700-repack.sh                            |   4 -
 9 files changed, 271 insertions(+), 152 deletions(-)
 create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

Range-diff against v1:
1:  ea91cebb6b = 1:  dfbac4bc60 t5326: demonstrate bitmap corruption after permutation
2:  fa42d816f4 ! 2:  4ea52e66dd midx.c: make changing the preferred pack safe
    @@ Commit message
         or MIDX, but writing that data into the MIDX itself makes that a
         circular dependency).
     
    -    Instead, make the pack order used during bitmap generation part of the
    +    Instead, make the object order used during bitmap generation part of the
         MIDX itself. That means that the new test in t5326 will cause the MIDX's
         checksum to update, preventing the stale read problem.
     
    -    There is no need to store the complete pack order here, since the only
    -    way a fixed set of packs can result in a change in the object ordering
    -    is by changing the preferred pack. That's because packs are sorted
    -    according to their pack name only, so it would be impossible to induce a
    -    different order by, say, touching a pack's mtime.
    -
    -    But storing the complete pack order gives us some flexibility in the
    -    future, and it only costs 4 bytes per pack to do. In the future, we
    -    could also optimize .rev reads looking for the identity of the preferred
    -    pack by consulting this list if it exists.
    +    In theory, it is possible to store a "fingerprint" of the full object
    +    order here, so long as that fingerprint changes at least as often as the
    +    full object order does. Some possibilities here include storing the
    +    identity of the preferred pack, along with the mtimes of the
    +    non-preferred packs in a consistent order. But storing a limited part of
    +    the information makes it difficult to reason about whether or not there
    +    are gaps between the two that would cause us to get bitten by this bug
    +    again.
     
         Signed-off-by: Taylor Blau <me@ttaylorr.com>
     
    @@ Documentation/technical/multi-pack-index.txt: and their offsets into multiple pa
        - An offset within the jth packfile for the object.
      - If large offsets are required, we use another list of large
        offsets similar to version 2 pack-indexes.
    -+- An optional list of pack IDs in bitmap-sorted order.
    ++- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
      
      Thus, we can provide O(log N) lookup time for any number
      of packfiles.
    @@ Documentation/technical/pack-format.txt: CHUNK DATA:
      	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
      	    8-byte offsets into large packfiles.
      
    -+	[Optional] Bitmap pack order (ID: {'P', 'O', 'R', 'D'})
    -+	    A list of 4-byte pack identifiers in sorted order according to
    -+	    their relative bitmap positions.
    ++	[Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
    ++	    A list of MIDX positions (one per object in the MIDX, num_objects in
    ++	    total, each a 4-byte unsigned integer in network byte order), sorted
    ++	    according to their relative bitmap/pseudo-pack positions.
     +
      TRAILER:
      
      	Index checksum of the above contents.
    +@@ Documentation/technical/pack-format.txt: In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
    + objects in packs stored by the MIDX, laid out in pack order, and the
    + packs arranged in MIDX order (with the preferred pack coming first).
    + 
    +-Finally, note that the MIDX's reverse index is not stored as a chunk in
    +-the multi-pack-index itself. This is done because the reverse index
    +-includes the checksum of the pack or MIDX to which it belongs, which
    +-makes it impossible to write in the MIDX. To avoid races when rewriting
    +-the MIDX, a MIDX reverse index includes the MIDX's checksum in its
    +-filename (e.g., `multi-pack-index-xyz.rev`).
    ++The MIDX's reverse index is stored in the optional 'RIDX' chunk within
    ++the MIDX itself.
     
      ## midx.c ##
     @@
      #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
      #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
      #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
    -+#define MIDX_CHUNKID_PACKORDER 0x504f5244 /* "PORD" */
    ++#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
      #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
      #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
      #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
    @@ midx.c: static int write_midx_large_offsets(struct hashfile *f,
      	return 0;
      }
      
    -+static int write_midx_pack_order(struct hashfile *f,
    -+				 void *data)
    ++static int write_midx_revindex(struct hashfile *f,
    ++			       void *data)
     +{
     +	struct write_midx_context *ctx = data;
     +	uint32_t i;
    @@ midx.c: static int write_midx_internal(const char *object_dir,
      
     +	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
     +		ctx.pack_order = midx_pack_order(&ctx);
    -+		add_chunk(cf, MIDX_CHUNKID_PACKORDER,
    ++		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
     +			  ctx.entries_nr * sizeof(uint32_t),
    -+			  write_midx_pack_order);
    ++			  write_midx_revindex);
     +	}
     +
      	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
-:  ---------- > 3:  b630fea149 pack-revindex.c: instrument loading on-disk reverse index
-:  ---------- > 4:  f430b6f2e9 t5326: drop unnecessary setup
-:  ---------- > 5:  73faab9f42 t5326: extract `test_rev_exists`
-:  ---------- > 6:  bf42b116e1 t5326: move tests to t/lib-bitmap.sh
-:  ---------- > 7:  fa91631024 t/lib-bitmap.sh: parameterize tests over reverse index source
-:  ---------- > 8:  993bfa8dd8 midx: read `RIDX` chunk when present
-- 
2.34.1.25.gb3157a20e6

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 1/8] t5326: demonstrate bitmap corruption after permutation
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 2/8] midx.c: make changing the preferred pack safe Taylor Blau
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

This patch demonstrates a cause of bitmap corruption that can occur when
the contents of the multi-pack index does not change, but the underlying
object order does.

In this example, we have a MIDX containing two packs, each with a
distinct set of objects (pack A corresponds to the tree, blob, and
commit from the first patch, and pack B corresponds to the second
patch).

First, a MIDX is written where the 'A' pack is preferred. As expected,
the bitmaps generated there are in-tact. But then, we generate an
identical MIDX with a different object order: this time preferring pack
'B'.

Due to a bug which will be explained and fixed in the following commit,
the MIDX is updated, but the .rev file is not, causing the .bitmap file
to be read incorrectly. Specifically, the .bitmap file will contain
correct data, but the auxiliary object order in the .rev file is stale,
causing readers to get confused by reading the new bitmaps using the old
object order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index e187f90f29..0ca2868b0b 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,4 +395,35 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
+test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+
+		git rev-list --objects --no-object-names HEAD^ >A.objects &&
+		git rev-list --objects --no-object-names HEAD^.. >B.objects &&
+
+		A=$(git pack-objects $objdir/pack/pack <A.objects) &&
+		B=$(git pack-objects $objdir/pack/pack <B.objects) &&
+
+		cat >indexes <<-EOF &&
+		pack-$A.idx
+		pack-$B.idx
+		EOF
+
+		git multi-pack-index write --bitmap --stdin-packs \
+			--preferred-pack=pack-$A.pack <indexes &&
+		git rev-list --test-bitmap A &&
+
+		git multi-pack-index write --bitmap --stdin-packs \
+			--preferred-pack=pack-$B.pack <indexes &&
+		git rev-list --test-bitmap A
+	)
+'
+
 test_done
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 2/8] midx.c: make changing the preferred pack safe
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 1/8] t5326: demonstrate bitmap corruption after permutation Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 3/8] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

The previous patch demonstrates a bug where a MIDX's auxiliary object
order can become out of sync with a MIDX bitmap.

This is because of two confounding factors:

  - First, the object order is stored in a file which is named according
    to the multi-pack index's checksum, and the MIDX does not store the
    object order. This means that the object order can change without
    altering the checksum.

  - But the .rev file is moved into place with finalize_object_file(),
    which link(2)'s the file into place instead of renaming it. For us,
    that means that a modified .rev file will not be moved into place if
    MIDX's checksum was unchanged.

The fix here is two-fold. First, we need to stop linking the file into
place and instead rename it. It's likely we were using
finalize_object_file() instead of a pure rename() because the former
also adjusts shared permissions. But that is unnecessary, because we
already do so in write_rev_file_order(), so rename alone is safe.

But we also need to make the MIDX's checksum change in some way when the
preferred pack changes without altering the set of packs stored in a
MIDX to prevent a race where the new .rev file is moved into place
before the MIDX is updated. Here, you'd get the opposite effect: reading
old bitmaps with the new object order.

But this race bites us even here: suppose that we didn't change the MIDX
checksum, but only renamed the auxiliary object order into place instead
of hardlinking it. Then when we go to generate the new bitmap, we'll
load the old MIDX bitmap, along with the MIDX that it references. That's
fine, since the new MIDX isn't moved into place until after the new
bitmap is generated. But the new object order *has* been moved into
place. So we'll read the old bitmaps in the new order when generating
the new bitmap file, meaning that without this secondary change, bitmap
generation itself would become a victim of the race described here.

This can all be prevented by forcing the MIDX's checksum to change when
the object order changes. We could include the entire object order in
the MIDX, but doing so is somewhat awkward. (For example, the code that
writes a .rev file expects to know the checksum of the associated pack
or MIDX, but writing that data into the MIDX itself makes that a
circular dependency).

Instead, make the object order used during bitmap generation part of the
MIDX itself. That means that the new test in t5326 will cause the MIDX's
checksum to update, preventing the stale read problem.

In theory, it is possible to store a "fingerprint" of the full object
order here, so long as that fingerprint changes at least as often as the
full object order does. Some possibilities here include storing the
identity of the preferred pack, along with the mtimes of the
non-preferred packs in a consistent order. But storing a limited part of
the information makes it difficult to reason about whether or not there
are gaps between the two that would cause us to get bitten by this bug
again.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/multi-pack-index.txt |  1 +
 Documentation/technical/pack-format.txt      | 13 +++++-----
 midx.c                                       | 25 ++++++++++++++++----
 t/t5326-multi-pack-bitmaps.sh                |  2 +-
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index 86f40f2490..6e1fb092c6 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -24,6 +24,7 @@ and their offsets into multiple packfiles. It contains:
   - An offset within the jth packfile for the object.
 - If large offsets are required, we use another list of large
   offsets similar to version 2 pack-indexes.
+- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
 
 Thus, we can provide O(log N) lookup time for any number
 of packfiles.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8d2f42f29e..6d3efb7d16 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -376,6 +376,11 @@ CHUNK DATA:
 	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
 	    8-byte offsets into large packfiles.
 
+	[Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
+	    A list of MIDX positions (one per object in the MIDX, num_objects in
+	    total, each a 4-byte unsigned integer in network byte order), sorted
+	    according to their relative bitmap/pseudo-pack positions.
+
 TRAILER:
 
 	Index checksum of the above contents.
@@ -456,9 +461,5 @@ In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
 objects in packs stored by the MIDX, laid out in pack order, and the
 packs arranged in MIDX order (with the preferred pack coming first).
 
-Finally, note that the MIDX's reverse index is not stored as a chunk in
-the multi-pack-index itself. This is done because the reverse index
-includes the checksum of the pack or MIDX to which it belongs, which
-makes it impossible to write in the MIDX. To avoid races when rewriting
-the MIDX, a MIDX reverse index includes the MIDX's checksum in its
-filename (e.g., `multi-pack-index-xyz.rev`).
+The MIDX's reverse index is stored in the optional 'RIDX' chunk within
+the MIDX itself.
diff --git a/midx.c b/midx.c
index 837b46b2af..d3179e9c02 100644
--- a/midx.c
+++ b/midx.c
@@ -33,6 +33,7 @@
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
+#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
@@ -833,6 +834,18 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int write_midx_revindex(struct hashfile *f,
+			       void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++)
+		hashwrite_be32(f, ctx->pack_order[i]);
+
+	return 0;
+}
+
 struct midx_pack_order_data {
 	uint32_t nr;
 	uint32_t pack;
@@ -891,7 +904,7 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
 					midx_hash, WRITE_REV);
 
-	if (finalize_object_file(tmp_file, buf.buf))
+	if (rename(tmp_file, buf.buf))
 		die(_("cannot store reverse index file"));
 
 	strbuf_release(&buf);
@@ -1403,15 +1416,19 @@ static int write_midx_internal(const char *object_dir,
 			(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
 			write_midx_large_offsets);
 
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+		ctx.pack_order = midx_pack_order(&ctx);
+		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
+			  ctx.entries_nr * sizeof(uint32_t),
+			  write_midx_revindex);
+	}
+
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
-		ctx.pack_order = midx_pack_order(&ctx);
-
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 0ca2868b0b..353282310d 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,7 +395,7 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
-test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 3/8] pack-revindex.c: instrument loading on-disk reverse index
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 1/8] t5326: demonstrate bitmap corruption after permutation Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 2/8] midx.c: make changing the preferred pack safe Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 4/8] t5326: drop unnecessary setup Taylor Blau
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

In a subsequent commit, we'll use the MIDX's new 'RIDX' chunk as a
source for the reverse index's data. But it will be useful for tests to
be able to determine whether the reverse index was loaded from the
separate .rev file, or from a chunk within the MIDX.

To instrument this, add a trace2 event which the tests can look for in
order to determine the reverse index's source.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-revindex.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pack-revindex.c b/pack-revindex.c
index 70d0fbafcb..bd15ebad03 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -301,6 +301,9 @@ int load_midx_revindex(struct multi_pack_index *m)
 	if (m->revindex_data)
 		return 0;
 
+	trace2_data_string("load_midx_revindex", the_repository,
+			   "source", "rev");
+
 	get_midx_rev_filename(&revindex_name, m);
 
 	ret = load_revindex_from_disk(revindex_name.buf,
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 4/8] t5326: drop unnecessary setup
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
                     ` (2 preceding siblings ...)
  2021-12-14  1:55   ` [PATCH v2 3/8] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 5/8] t5326: extract `test_rev_exists` Taylor Blau
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

The core.multiPackIndex config became true by default back in 18e449f86b
(midx: enable core.multiPackIndex by default, 2020-09-25), so it is no
longer necessary to enable it explicitly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 353282310d..1a9581af30 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -19,10 +19,6 @@ midx_pack_source () {
 
 setup_bitmap_history
 
-test_expect_success 'enable core.multiPackIndex' '
-	git config core.multiPackIndex true
-'
-
 test_expect_success 'create single-pack midx with bitmaps' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap &&
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 5/8] t5326: extract `test_rev_exists`
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
                     ` (3 preceding siblings ...)
  2021-12-14  1:55   ` [PATCH v2 4/8] t5326: drop unnecessary setup Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-20 18:33     ` Derrick Stolee
  2021-12-14  1:55   ` [PATCH v2 6/8] t5326: move tests to t/lib-bitmap.sh Taylor Blau
                     ` (4 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

To determine which source of data is used for the MIDX's reverse index
cache, introduce a helper which forces loading the reverse index, and
then looks for the special trace2 event introduced in a previous commit.

For now, this helper just looks for when the legacy MIDX .rev file was
loaded, but in a subsequent commit will become parameterized over the
the reverse index's source.

This function replaces checking for the existence of the .rev file. We
could write a similar helper to ensure that the .rev file is cleaned up
after repacking, but it will make subsequent tests more difficult to
write, and provides marginal value since we already check that the MIDX
.bitmap file is removed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 38 +++++++++++++++++++++--------------
 1 file changed, 23 insertions(+), 15 deletions(-)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 1a9581af30..85a91c2675 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -17,16 +17,30 @@ midx_pack_source () {
 	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
 }
 
+test_rev_exists () {
+	commit="$1"
+
+	test_expect_success 'reverse index exists' '
+		GIT_TRACE2_EVENT_NESTING=10 \
+		GIT_TRACE2_EVENT=$(pwd)/event.trace \
+			git rev-list --test-bitmap "$commit" &&
+
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
+	'
+}
+
 setup_bitmap_history
 
 test_expect_success 'create single-pack midx with bitmaps' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap &&
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 '
 
+test_rev_exists HEAD
+
 basic_bitmap_tests
 
 test_expect_success 'create new additional packs' '
@@ -52,10 +66,11 @@ test_expect_success 'create multi-pack midx with bitmaps' '
 	test_line_count = 25 packs &&
 
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 '
 
+test_rev_exists HEAD
+
 basic_bitmap_tests
 
 test_expect_success '--no-bitmap is respected when bitmaps exist' '
@@ -66,7 +81,6 @@ test_expect_success '--no-bitmap is respected when bitmaps exist' '
 
 	test_path_is_file $midx &&
 	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev &&
 
 	git multi-pack-index write --no-bitmap &&
 
@@ -206,10 +220,11 @@ test_expect_success 'setup partial bitmaps' '
 	test_commit loose &&
 	git multi-pack-index write --bitmap 2>err &&
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 '
 
+test_rev_exists HEAD~
+
 basic_bitmap_tests HEAD~
 
 test_expect_success 'removing a MIDX clears stale bitmaps' '
@@ -224,7 +239,6 @@ test_expect_success 'removing a MIDX clears stale bitmaps' '
 
 		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
 		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
-		stale_rev=$midx-$(midx_checksum $objdir).rev &&
 		rm $midx &&
 
 		# Then write a new MIDX.
@@ -234,9 +248,7 @@ test_expect_success 'removing a MIDX clears stale bitmaps' '
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
-		test_path_is_missing $stale_bitmap &&
-		test_path_is_missing $stale_rev
+		test_path_is_missing $stale_bitmap
 	)
 '
 
@@ -257,7 +269,6 @@ test_expect_success 'pack.preferBitmapTips' '
 		git multi-pack-index write --bitmap &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
 
 		test-tool bitmap list-commits | sort >bitmaps &&
 		comm -13 bitmaps commits >before &&
@@ -267,7 +278,6 @@ test_expect_success 'pack.preferBitmapTips' '
 			<before | git update-ref --stdin &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		git -c pack.preferBitmapTips=refs/tags/include \
@@ -305,7 +315,6 @@ test_expect_success 'writing a bitmap with --refs-snapshot' '
 		grep "$(git rev-parse two)" bitmaps &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		# Then again, but with a refs snapshot which only sees
@@ -350,7 +359,6 @@ test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
 		) >snapshot &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 6/8] t5326: move tests to t/lib-bitmap.sh
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
                     ` (4 preceding siblings ...)
  2021-12-14  1:55   ` [PATCH v2 5/8] t5326: extract `test_rev_exists` Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 7/8] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

In t5326, we have a handful of tests that we would like to run twice:
once using the MIDX's new `RIDX` chunk as the source of the
reverse-index cache, and once using the separate `.rev` file.

But because these tests mutate the state of the underlying repository,
and then make assumptions about those mutations occurring in a certain
sequence, simply running the tests twice in the same repository is
awkward.

Instead, extract the core of interesting tests into t/lib-bitmap.sh to
prepare for them to be run twice, each in a separate test script. This
means that they can each operate on a separate repository, removing any
concerns about mutating state.

For now, this patch is a strict cut-and-paste of some tests from t5326.
The tests which did not move are not interesting with respect to the
source of their reverse index data.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh               | 178 ++++++++++++++++++++++++++++++++++
 t/t5326-multi-pack-bitmaps.sh | 174 +--------------------------------
 2 files changed, 180 insertions(+), 172 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 21d0392dda..771c41c2ea 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,6 +1,9 @@
 # Helpers for scripts testing bitmap functionality; see t5310 for
 # example usage.
 
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -264,3 +267,178 @@ have_delta () {
 midx_checksum () {
 	test-tool read-midx --checksum "$1"
 }
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+test_rev_exists () {
+	commit="$1"
+
+	test_expect_success 'reverse index exists' '
+		GIT_TRACE2_EVENT_NESTING=10 \
+		GIT_TRACE2_EVENT=$(pwd)/event.trace \
+			git rev-list --test-bitmap "$commit" &&
+
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
+	'
+}
+
+midx_bitmap_core () {
+	setup_bitmap_history
+
+	test_expect_success 'create single-pack midx with bitmaps' '
+		git repack -ad &&
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+	'
+
+	test_rev_exists HEAD
+
+	basic_bitmap_tests
+
+	test_expect_success 'create new additional packs' '
+		for i in $(test_seq 1 16)
+		do
+			test_commit "$i" &&
+			git repack -d || return 1
+		done &&
+
+		git checkout -b other2 HEAD~8 &&
+		for i in $(test_seq 1 8)
+		do
+			test_commit "side-$i" &&
+			git repack -d || return 1
+		done &&
+		git checkout second
+	'
+
+	test_expect_success 'create multi-pack midx with bitmaps' '
+		git multi-pack-index write --bitmap &&
+
+		ls $objdir/pack/pack-*.pack >packs &&
+		test_line_count = 25 packs &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+	'
+
+	test_rev_exists HEAD
+
+	basic_bitmap_tests
+
+	test_expect_success '--no-bitmap is respected when bitmaps exist' '
+		git multi-pack-index write --bitmap &&
+
+		test_commit respect--no-bitmap &&
+		git repack -d &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		git multi-pack-index write --no-bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $midx-$(midx_checksum $objdir).rev
+	'
+
+	test_expect_success 'setup midx with base from later pack' '
+		# Write a and b so that "a" is a delta on top of base "b", since Git
+		# prefers to delete contents out of a base rather than add to a shorter
+		# object.
+		test_seq 1 128 >a &&
+		test_seq 1 130 >b &&
+
+		git add a b &&
+		git commit -m "initial commit" &&
+
+		a=$(git rev-parse HEAD:a) &&
+		b=$(git rev-parse HEAD:b) &&
+
+		# In the first pack, "a" is stored as a delta to "b".
+		p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+		$a
+		$b
+		EOF
+		) &&
+
+		# In the second pack, "a" is missing, and "b" is not a delta nor base to
+		# any other object.
+		p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+		$b
+		$(git rev-parse HEAD)
+		$(git rev-parse HEAD^{tree})
+		EOF
+		) &&
+
+		git prune-packed &&
+		# Use the second pack as the preferred source, so that "b" occurs
+		# earlier in the MIDX object order, rendering "a" unusable for pack
+		# reuse.
+		git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+		have_delta $a $b &&
+		test $(midx_pack_source $a) != $(midx_pack_source $b)
+	'
+
+	rev_list_tests 'full bitmap with backwards delta'
+
+	test_expect_success 'clone with bitmaps enabled' '
+		git clone --no-local --bare . clone-reverse-delta.git &&
+		test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+
+			test_commit A &&
+			test_commit B &&
+
+			git rev-list --objects --no-object-names HEAD^ >A.objects &&
+			git rev-list --objects --no-object-names HEAD^.. >B.objects &&
+
+			A=$(git pack-objects $objdir/pack/pack <A.objects) &&
+			B=$(git pack-objects $objdir/pack/pack <B.objects) &&
+
+			cat >indexes <<-EOF &&
+			pack-$A.idx
+			pack-$B.idx
+			EOF
+
+			git multi-pack-index write --bitmap --stdin-packs \
+				--preferred-pack=pack-$A.pack <indexes &&
+			git rev-list --test-bitmap A &&
+
+			git multi-pack-index write --bitmap --stdin-packs \
+				--preferred-pack=pack-$B.pack <indexes &&
+			git rev-list --test-bitmap A
+		)
+	'
+}
+
+midx_bitmap_partial_tests () {
+	test_expect_success 'setup partial bitmaps' '
+		test_commit packed &&
+		git repack &&
+		test_commit loose &&
+		git multi-pack-index write --bitmap 2>err &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+	'
+
+	test_rev_exists HEAD~
+
+	basic_bitmap_tests HEAD~
+}
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 85a91c2675..100ac90d15 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -9,135 +9,7 @@ test_description='exercise basic multi-pack bitmap functionality'
 GIT_TEST_MULTI_PACK_INDEX=0
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
-objdir=.git/objects
-midx=$objdir/pack/multi-pack-index
-
-# midx_pack_source <obj>
-midx_pack_source () {
-	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
-}
-
-test_rev_exists () {
-	commit="$1"
-
-	test_expect_success 'reverse index exists' '
-		GIT_TRACE2_EVENT_NESTING=10 \
-		GIT_TRACE2_EVENT=$(pwd)/event.trace \
-			git rev-list --test-bitmap "$commit" &&
-
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
-		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
-	'
-}
-
-setup_bitmap_history
-
-test_expect_success 'create single-pack midx with bitmaps' '
-	git repack -ad &&
-	git multi-pack-index write --bitmap &&
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
-'
-
-test_rev_exists HEAD
-
-basic_bitmap_tests
-
-test_expect_success 'create new additional packs' '
-	for i in $(test_seq 1 16)
-	do
-		test_commit "$i" &&
-		git repack -d || return 1
-	done &&
-
-	git checkout -b other2 HEAD~8 &&
-	for i in $(test_seq 1 8)
-	do
-		test_commit "side-$i" &&
-		git repack -d || return 1
-	done &&
-	git checkout second
-'
-
-test_expect_success 'create multi-pack midx with bitmaps' '
-	git multi-pack-index write --bitmap &&
-
-	ls $objdir/pack/pack-*.pack >packs &&
-	test_line_count = 25 packs &&
-
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
-'
-
-test_rev_exists HEAD
-
-basic_bitmap_tests
-
-test_expect_success '--no-bitmap is respected when bitmaps exist' '
-	git multi-pack-index write --bitmap &&
-
-	test_commit respect--no-bitmap &&
-	git repack -d &&
-
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-
-	git multi-pack-index write --no-bitmap &&
-
-	test_path_is_file $midx &&
-	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_missing $midx-$(midx_checksum $objdir).rev
-'
-
-test_expect_success 'setup midx with base from later pack' '
-	# Write a and b so that "a" is a delta on top of base "b", since Git
-	# prefers to delete contents out of a base rather than add to a shorter
-	# object.
-	test_seq 1 128 >a &&
-	test_seq 1 130 >b &&
-
-	git add a b &&
-	git commit -m "initial commit" &&
-
-	a=$(git rev-parse HEAD:a) &&
-	b=$(git rev-parse HEAD:b) &&
-
-	# In the first pack, "a" is stored as a delta to "b".
-	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
-	$a
-	$b
-	EOF
-	) &&
-
-	# In the second pack, "a" is missing, and "b" is not a delta nor base to
-	# any other object.
-	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
-	$b
-	$(git rev-parse HEAD)
-	$(git rev-parse HEAD^{tree})
-	EOF
-	) &&
-
-	git prune-packed &&
-	# Use the second pack as the preferred source, so that "b" occurs
-	# earlier in the MIDX object order, rendering "a" unusable for pack
-	# reuse.
-	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
-
-	have_delta $a $b &&
-	test $(midx_pack_source $a) != $(midx_pack_source $b)
-'
-
-rev_list_tests 'full bitmap with backwards delta'
-
-test_expect_success 'clone with bitmaps enabled' '
-	git clone --no-local --bare . clone-reverse-delta.git &&
-	test_when_finished "rm -fr clone-reverse-delta.git" &&
-
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+midx_bitmap_core
 
 bitmap_reuse_tests() {
 	from=$1
@@ -214,18 +86,7 @@ test_expect_success 'missing object closure fails gracefully' '
 	)
 '
 
-test_expect_success 'setup partial bitmaps' '
-	test_commit packed &&
-	git repack &&
-	test_commit loose &&
-	git multi-pack-index write --bitmap 2>err &&
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
-'
-
-test_rev_exists HEAD~
-
-basic_bitmap_tests HEAD~
+midx_bitmap_partial_tests
 
 test_expect_success 'removing a MIDX clears stale bitmaps' '
 	rm -fr repo &&
@@ -399,35 +260,4 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
-test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		test_commit A &&
-		test_commit B &&
-
-		git rev-list --objects --no-object-names HEAD^ >A.objects &&
-		git rev-list --objects --no-object-names HEAD^.. >B.objects &&
-
-		A=$(git pack-objects $objdir/pack/pack <A.objects) &&
-		B=$(git pack-objects $objdir/pack/pack <B.objects) &&
-
-		cat >indexes <<-EOF &&
-		pack-$A.idx
-		pack-$B.idx
-		EOF
-
-		git multi-pack-index write --bitmap --stdin-packs \
-			--preferred-pack=pack-$A.pack <indexes &&
-		git rev-list --test-bitmap A &&
-
-		git multi-pack-index write --bitmap --stdin-packs \
-			--preferred-pack=pack-$B.pack <indexes &&
-		git rev-list --test-bitmap A
-	)
-'
-
 test_done
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 7/8] t/lib-bitmap.sh: parameterize tests over reverse index source
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
                     ` (5 preceding siblings ...)
  2021-12-14  1:55   ` [PATCH v2 6/8] t5326: move tests to t/lib-bitmap.sh Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-14  1:55   ` [PATCH v2 8/8] midx: read `RIDX` chunk when present Taylor Blau
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

To prepare for reading the reverse index data out of the MIDX itself,
teach the `test_rev_exists` function to take an expected "source" for
the reverse index data.

When given "rev", it asserts that the MIDX's `.rev` file exists, and is
loaded when verifying the integrity of its bitmaps. Otherwise, it
ensures that trace2 reports the source of the reverse index data as the
same string which was given to test_rev_exists().

The following patch will implement reading the reverse index data from
the MIDX itself.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 771c41c2ea..0a35daf939 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -275,18 +275,24 @@ midx_pack_source () {
 
 test_rev_exists () {
 	commit="$1"
+	kind="$2"
 
 	test_expect_success 'reverse index exists' '
 		GIT_TRACE2_EVENT_NESTING=10 \
 		GIT_TRACE2_EVENT=$(pwd)/event.trace \
 			git rev-list --test-bitmap "$commit" &&
 
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
-		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
+		if test "rev" = "$kind"
+		then
+			test_path_is_file $midx-$(midx_checksum $objdir).rev
+		fi &&
+		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"$kind\"" event.trace
 	'
 }
 
 midx_bitmap_core () {
+	rev_kind="${1:-rev}"
+
 	setup_bitmap_history
 
 	test_expect_success 'create single-pack midx with bitmaps' '
@@ -296,7 +302,7 @@ midx_bitmap_core () {
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
 
-	test_rev_exists HEAD
+	test_rev_exists HEAD "$rev_kind"
 
 	basic_bitmap_tests
 
@@ -326,7 +332,7 @@ midx_bitmap_core () {
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
 
-	test_rev_exists HEAD
+	test_rev_exists HEAD "$rev_kind"
 
 	basic_bitmap_tests
 
@@ -429,6 +435,8 @@ midx_bitmap_core () {
 }
 
 midx_bitmap_partial_tests () {
+	rev_kind="${1:-rev}"
+
 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
 		git repack &&
@@ -438,7 +446,7 @@ midx_bitmap_partial_tests () {
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
 
-	test_rev_exists HEAD~
+	test_rev_exists HEAD~ "$rev_kind"
 
 	basic_bitmap_tests HEAD~
 }
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v2 8/8] midx: read `RIDX` chunk when present
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
                     ` (6 preceding siblings ...)
  2021-12-14  1:55   ` [PATCH v2 7/8] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
@ 2021-12-14  1:55   ` Taylor Blau
  2021-12-20 18:42     ` Derrick Stolee
  2021-12-15 19:46   ` [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order Junio C Hamano
  2021-12-15 22:58   ` Junio C Hamano
  9 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2021-12-14  1:55 UTC (permalink / raw)
  To: git; +Cc: gitster, peff, stolee

When a MIDX contains the new `RIDX` chunk, ensure that the reverse index
is read from it instead of the on-disk .rev file. Since we need to
encode the object order in the MIDX itself for correctness reasons,
there is no point in storing the same data again outside of the MIDX.

So, this patch stops writing separate .rev files, and reads it out of
the MIDX itself. This is possible to do with relatively little new code,
since the format of the RIDX chunk is identical to the data in the .rev
file. In other words, we can implement this by pointing the
`revindex_data` field at the reverse index chunk of the MIDX instead of
the .rev file without any other changes.

Note that we have two knobs that are adjusted for the new tests:
GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls
whether the MIDX .rev is written at all, and the latter controls whether
we read the MIDX's RIDX chunk.

Both are necessary to ensure that the test added at the beginning of
this series continues to work. This is because we always need to write
the RIDX chunk in the MIDX in order to change its checksum, but we want
to make sure reading the existing .rev file still works (since the RIDX
chunk takes precedence by default).

Arguably this isn't a very interesting mode to test, because the
precedence rules mean that we'll always read the RIDX chunk over the
.rev file. But it makes it impossible for a user to induce corruption in
their repository by adjusting the test knobs (since if we had an
either/or knob they could stop writing the RIDX chunk, allowing them to
tweak the MIDX's object order without changing its checksum).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                            |  6 +++++-
 midx.h                            |  1 +
 pack-revindex.c                   | 17 +++++++++++++++++
 t/lib-bitmap.sh                   |  4 ++--
 t/t5326-multi-pack-bitmaps.sh     |  5 +++++
 t/t5327-multi-pack-bitmaps-rev.sh | 23 +++++++++++++++++++++++
 t/t7700-repack.sh                 |  4 ----
 7 files changed, 53 insertions(+), 7 deletions(-)
 create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

diff --git a/midx.c b/midx.c
index d3179e9c02..9aba13b5b1 100644
--- a/midx.c
+++ b/midx.c
@@ -162,6 +162,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
 
+	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
+		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex);
+
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
 	CALLOC_ARRAY(m->pack_names, m->num_packs);
@@ -1429,7 +1432,8 @@ static int write_midx_internal(const char *object_dir,
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & MIDX_WRITE_REV_INDEX &&
+	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
 		if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
diff --git a/midx.h b/midx.h
index b7d79a515c..22e8e53288 100644
--- a/midx.h
+++ b/midx.h
@@ -36,6 +36,7 @@ struct multi_pack_index {
 	const unsigned char *chunk_oid_lookup;
 	const unsigned char *chunk_object_offsets;
 	const unsigned char *chunk_large_offsets;
+	const unsigned char *chunk_revindex;
 
 	const char **pack_names;
 	struct packed_git **packs;
diff --git a/pack-revindex.c b/pack-revindex.c
index bd15ebad03..08dc160167 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -298,9 +298,26 @@ int load_midx_revindex(struct multi_pack_index *m)
 {
 	struct strbuf revindex_name = STRBUF_INIT;
 	int ret;
+
 	if (m->revindex_data)
 		return 0;
 
+	if (m->chunk_revindex) {
+		/*
+		 * If the MIDX `m` has a `RIDX` chunk, then use its contents for
+		 * the reverse index instead of trying to load a separate `.rev`
+		 * file.
+		 *
+		 * Note that we do *not* set `m->revindex_map` here, since we do
+		 * not want to accidentally call munmap() in the middle of the
+		 * MIDX.
+		 */
+		trace2_data_string("load_midx_revindex", the_repository,
+				   "source", "midx");
+		m->revindex_data = (const uint32_t *)m->chunk_revindex;
+		return 0;
+	}
+
 	trace2_data_string("load_midx_revindex", the_repository,
 			   "source", "rev");
 
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 0a35daf939..f5eaa9cf68 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -291,7 +291,7 @@ test_rev_exists () {
 }
 
 midx_bitmap_core () {
-	rev_kind="${1:-rev}"
+	rev_kind="${1:-midx}"
 
 	setup_bitmap_history
 
@@ -435,7 +435,7 @@ midx_bitmap_core () {
 }
 
 midx_bitmap_partial_tests () {
-	rev_kind="${1:-rev}"
+	rev_kind="${1:-midx}"
 
 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 100ac90d15..8c92acb0ce 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -9,6 +9,11 @@ test_description='exercise basic multi-pack bitmap functionality'
 GIT_TEST_MULTI_PACK_INDEX=0
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
+GIT_TEST_MIDX_WRITE_REV=0
+GIT_TEST_MIDX_READ_RIDX=1
+export GIT_TEST_MIDX_WRITE_REV
+export GIT_TEST_MIDX_READ_RIDX
+
 midx_bitmap_core
 
 bitmap_reuse_tests() {
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
new file mode 100755
index 0000000000..d30ba632c8
--- /dev/null
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality (.rev files)'
+
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+# Unlike t5326, this test exercise multi-pack bitmap functionality where the
+# object order is stored in a separate .rev file.
+GIT_TEST_MIDX_WRITE_REV=1
+GIT_TEST_MIDX_READ_RIDX=0
+export GIT_TEST_MIDX_WRITE_REV
+export GIT_TEST_MIDX_READ_RIDX
+
+midx_bitmap_core rev
+midx_bitmap_partial_tests rev
+
+test_done
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 0260ad6f0e..3ee56cefd3 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -311,16 +311,13 @@ test_expect_success 'cleans up MIDX when appropriate' '
 		checksum=$(midx_checksum $objdir) &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$checksum.bitmap &&
-		test_path_is_file $midx-$checksum.rev &&
 
 		test_commit repack-3 &&
 		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-$checksum.bitmap &&
-		test_path_is_missing $midx-$checksum.rev &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
 
 		test_commit repack-4 &&
 		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&
@@ -353,7 +350,6 @@ test_expect_success '--write-midx with preferred bitmap tips' '
 		test_line_count = 1 before &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		# instead of constructing the snapshot ourselves (c.f., the test
-- 
2.34.1.25.gb3157a20e6

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
                     ` (7 preceding siblings ...)
  2021-12-14  1:55   ` [PATCH v2 8/8] midx: read `RIDX` chunk when present Taylor Blau
@ 2021-12-15 19:46   ` Junio C Hamano
  2021-12-15 21:37     ` Taylor Blau
  2021-12-20 18:51     ` Derrick Stolee
  2021-12-15 22:58   ` Junio C Hamano
  9 siblings, 2 replies; 59+ messages in thread
From: Junio C Hamano @ 2021-12-15 19:46 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, stolee

Taylor Blau <me@ttaylorr.com> writes:

> Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by
> which they can become corrupt when permuting their pack order.

This seems to depend on tb/cruft-packs that is not yet in 'next', so
I'll redo this topic branch by forking it at 'master', merging the
other topic in, and then queuing these 8 patches.

> So I'm definitely open to suggestions there, but otherwise this series should go
> a long ways towards fixing my design mistake of having the MIDX .rev file be
> separate from the MIDX itself.

Yeah, a single file with different chunks is a good way to ensure
atomicity of update.

A note to reviewers.

We need to make sure that not just we can still read .rev in
existing repositories (and convert it to the new chunk) correctly,
but also decide what to do to older versions of Git once the
repository is touched by this new version.  Would they be upset to
see no .rev files or is it just the performance thing (and it is
more correct to recompute the reverse index on the fly)?  Should the
new chunk be made mandatory to cause them notice that they should
not muck with the repository, or is it optional?  Things like that.

Thanks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-15 19:46   ` [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order Junio C Hamano
@ 2021-12-15 21:37     ` Taylor Blau
  2021-12-15 22:17       ` Junio C Hamano
  2021-12-20 18:51     ` Derrick Stolee
  1 sibling, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2021-12-15 21:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, stolee

On Wed, Dec 15, 2021 at 11:46:16AM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by
> > which they can become corrupt when permuting their pack order.
>
> This seems to depend on tb/cruft-packs that is not yet in 'next', so
> I'll redo this topic branch by forking it at 'master', merging the
> other topic in, and then queuing these 8 patches.

Hmm. They shouldn't depend on that topic. My local copy depends on
abe6bb3905 (The first batch to start the current cycle, 2021-11-29), but
I'm happy to rebase things if basing on abe6bb3905 was a mistake.

> > So I'm definitely open to suggestions there, but otherwise this series should go
> > a long ways towards fixing my design mistake of having the MIDX .rev file be
> > separate from the MIDX itself.
>
> Yeah, a single file with different chunks is a good way to ensure
> atomicity of update.
>
> A note to reviewers.
>
> We need to make sure that not just we can still read .rev in
> existing repositories (and convert it to the new chunk) correctly,
> but also decide what to do to older versions of Git once the
> repository is touched by this new version.  Would they be upset to
> see no .rev files or is it just the performance thing (and it is
> more correct to recompute the reverse index on the fly)?  Should the
> new chunk be made mandatory to cause them notice that they should
> not muck with the repository, or is it optional?  Things like that.

An old client that is looking in a repository where the reverse index is
stored as a chunk inside of the MIDX (and does not appear as a separate
file on disk) will not be able to read the MIDX's bitmap. We could
compute the reverse index on the fly (as we did for a long time with
single packs) but it is slow, and older clients obviously would not know
how to do it.

But the failure is graceful(ish): we'll get a warning that says we
couldn't find a reverse index, and then carry on pretending that the
MIDX did not have a bitmap.

    ~/s/git [nand] (master) $ git.compile multi-pack-index write --bitmap
    Selecting bitmap commits: 73008, done.
    Building bitmaps: 100% (319/319), done.
    ~/s/git [nand] (master) $ rm -f .git/objects/pack/multi-pack-index-*.rev
    ~/s/git [nand] (master) $ git.compile rev-list --count --objects --use-bitmap-index HEAD
    warning: multi-pack bitmap is missing required reverse index
    warning: ignoring extra bitmap file:
    .git/objects/pack/pack-00465554dc3f3d96ac89a0ecc73cb5f5abbb35a5.pack
    warning: multi-pack bitmap is missing required reverse index
    warning: ignoring extra bitmap file:
    .git/objects/pack/pack-00465554dc3f3d96ac89a0ecc73cb5f5abbb35a5.pack
    307913

As far as whether or not the chunk is necessary, it only *needs* to be
present in the MIDX if there is a corresponding MIDX bitmap. We could
generate it always, but this series keeps in the tradition of the .rev
file and only writes it when we are also writing a bitmap.

> Thanks.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-15 21:37     ` Taylor Blau
@ 2021-12-15 22:17       ` Junio C Hamano
  2021-12-15 22:55         ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2021-12-15 22:17 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, stolee

Taylor Blau <me@ttaylorr.com> writes:

> On Wed, Dec 15, 2021 at 11:46:16AM -0800, Junio C Hamano wrote:
>> Taylor Blau <me@ttaylorr.com> writes:
>>
>> > Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by
>> > which they can become corrupt when permuting their pack order.
>>
>> This seems to depend on tb/cruft-packs that is not yet in 'next', so
>> I'll redo this topic branch by forking it at 'master', merging the
>> other topic in, and then queuing these 8 patches.
>
> Hmm. They shouldn't depend on that topic. My local copy depends on
> abe6bb3905 (The first batch to start the current cycle, 2021-11-29), but
> I'm happy to rebase things if basing on abe6bb3905 was a mistake.

Not so fast.  Let me double check.  My attempt to apply on the tip
of 'master' (as of yesterday) did not work and because the only
topic in 'seen' that touched midx.c was that other topic, I tried to
merge it with 'master' before applying the patches on top and it
worked.  If the other topic has no relation to this topic, it would
work as well (but then it does not explain why patches from the list
did not apply for me).

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-15 22:17       ` Junio C Hamano
@ 2021-12-15 22:55         ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2021-12-15 22:55 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, stolee

Junio C Hamano <gitster@pobox.com> writes:

> Not so fast.  Let me double check.  My attempt to apply on the tip
> of 'master' (as of yesterday) did not work and because the only
> topic in 'seen' that touched midx.c was that other topic, I tried to
> merge it with 'master' before applying the patches on top and it
> worked.  If the other topic has no relation to this topic, it would
> work as well (but then it does not explain why patches from the list
> did not apply for me).

Interesting.  The posted patches do apply cleanly to the tip of
'master' as of the first batch abe6bb39 (The first batch to start
the current cycle, 2021-11-29), but since then a few topics have
touched Documentation/technical/multi-pack-index.txt and made the
patch application fail to the tip of 'master' as of the second batch
e773545c (The second batch, 2021-12-10).

Even more interestingly, after merging the topic tb/cruft-packs,
which does not even touch the multi-pack-index.txt file, the patches
seemed to apply OK with "-3".  Perhaps I should have tried "am -3"
when I applied them to the 'master' with the second batch of topics.
I dunno.

Thanks for sanity checking.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
                     ` (8 preceding siblings ...)
  2021-12-15 19:46   ` [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order Junio C Hamano
@ 2021-12-15 22:58   ` Junio C Hamano
  2021-12-15 23:01     ` Taylor Blau
  9 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2021-12-15 22:58 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, stolee

Taylor Blau <me@ttaylorr.com> writes:

>  Documentation/technical/multi-pack-index.txt |   1 +
>  Documentation/technical/pack-format.txt      |  13 +-
>  midx.c                                       |  31 +++-
>  midx.h                                       |   1 +
>  pack-revindex.c                              |  20 ++
>  t/lib-bitmap.sh                              | 186 +++++++++++++++++++
>  t/t5326-multi-pack-bitmaps.sh                | 144 +-------------
>  t/t5327-multi-pack-bitmaps-rev.sh            |  23 +++
>  t/t7700-repack.sh                            |   4 -
>  9 files changed, 271 insertions(+), 152 deletions(-)
>  create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

As 5327 is taken by the tb/cruft-packs topic, I'll move it to 5328.

Thanks.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-15 22:58   ` Junio C Hamano
@ 2021-12-15 23:01     ` Taylor Blau
  0 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2021-12-15 23:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, git, peff, stolee

On Wed, Dec 15, 2021 at 02:58:07PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> >  Documentation/technical/multi-pack-index.txt |   1 +
> >  Documentation/technical/pack-format.txt      |  13 +-
> >  midx.c                                       |  31 +++-
> >  midx.h                                       |   1 +
> >  pack-revindex.c                              |  20 ++
> >  t/lib-bitmap.sh                              | 186 +++++++++++++++++++
> >  t/t5326-multi-pack-bitmaps.sh                | 144 +-------------
> >  t/t5327-multi-pack-bitmaps-rev.sh            |  23 +++
> >  t/t7700-repack.sh                            |   4 -
> >  9 files changed, 271 insertions(+), 152 deletions(-)
> >  create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh
>
> As 5327 is taken by the tb/cruft-packs topic, I'll move it to 5328.

Ah, thanks. It should be a straightforward move. Alternatively feel free
to eject the cruft packs topic and I can update that series to use
t5328 instead.

Sounds like you were able to get the patches to apply, though, so I'll
refrain from sending anymore until there is a compelling reason to do
so.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 5/8] t5326: extract `test_rev_exists`
  2021-12-14  1:55   ` [PATCH v2 5/8] t5326: extract `test_rev_exists` Taylor Blau
@ 2021-12-20 18:33     ` Derrick Stolee
  2022-01-04 15:33       ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Derrick Stolee @ 2021-12-20 18:33 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: gitster, peff

On 12/13/2021 8:55 PM, Taylor Blau wrote:
> To determine which source of data is used for the MIDX's reverse index
> cache, introduce a helper which forces loading the reverse index, and
> then looks for the special trace2 event introduced in a previous commit.
> 
> For now, this helper just looks for when the legacy MIDX .rev file was
> loaded, but in a subsequent commit will become parameterized over the
> the reverse index's source.
> 
> This function replaces checking for the existence of the .rev file. We
> could write a similar helper to ensure that the .rev file is cleaned up
> after repacking, but it will make subsequent tests more difficult to
> write, and provides marginal value since we already check that the MIDX
> .bitmap file is removed.

...

> +test_rev_exists () {
> +	commit="$1"
> +
> +	test_expect_success 'reverse index exists' '
> +		GIT_TRACE2_EVENT_NESTING=10 \

Very recently, b8de3d6 (test-lib.sh: set GIT_TRACE2_EVENT_NESTING,
2021-11-29) made it to master and sets this to 100 across the test
suite, so you don't need this line.

> +		GIT_TRACE2_EVENT=$(pwd)/event.trace \
> +			git rev-list --test-bitmap "$commit" &&

This use of $commit has me worried. Do the tests become too flaky
to changes in how we choose commits for the bitmaps? Does that
require callers to be too aware of the refstate when creating the
bitmaps?

Perhaps just `git rev-list --use-bitmap-index [--objects] HEAD`
would suffice to generate the trace event?

> +		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
> +		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
> +	'
> +}
> +
>  setup_bitmap_history
>  
>  test_expect_success 'create single-pack midx with bitmaps' '
>  	git repack -ad &&
>  	git multi-pack-index write --bitmap &&
>  	test_path_is_file $midx &&
> -	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> -	test_path_is_file $midx-$(midx_checksum $objdir).rev
> +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
>  '
>  
> +test_rev_exists HEAD
> +

Perhaps this helper would be more appropriate as a helper method
within a test, rather than creating a test of its own? I think
it looks better to include it next to the setup lines, something
like

test_expect_success 'create single-pack midx with bitmaps' '
	git repack -ad &&
	git multi-pack-index write --bitmap &&
	test_path_is_file $midx &&
	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
	test_rev_exists HEAD
'

This would lead to a failure within 'create single-pack midx
with bitmaps' which is easier to find in the file.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] midx: read `RIDX` chunk when present
  2021-12-14  1:55   ` [PATCH v2 8/8] midx: read `RIDX` chunk when present Taylor Blau
@ 2021-12-20 18:42     ` Derrick Stolee
  2022-01-04 15:21       ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Derrick Stolee @ 2021-12-20 18:42 UTC (permalink / raw)
  To: Taylor Blau, git; +Cc: gitster, peff

On 12/13/2021 8:55 PM, Taylor Blau wrote:
...
> Note that we have two knobs that are adjusted for the new tests:
> GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls
> whether the MIDX .rev is written at all, and the latter controls whether
> we read the MIDX's RIDX chunk.

...

> +	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
> +		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex);
> +

Skip reading if GIT_TEST_MIDX_READ_RIDX=0 (do read if true or unset). Good.

> -	if (flags & MIDX_WRITE_REV_INDEX)
> +	if (flags & MIDX_WRITE_REV_INDEX &&
> +	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
>  		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);

Only write if GIT_TEST_MIDX_WRITE_REV=1 (skip if false or unset). Good.

> +	if (m->chunk_revindex) {
> +		/*
> +		 * If the MIDX `m` has a `RIDX` chunk, then use its contents for
> +		 * the reverse index instead of trying to load a separate `.rev`
> +		 * file.
> +		 *
> +		 * Note that we do *not* set `m->revindex_map` here, since we do
> +		 * not want to accidentally call munmap() in the middle of the
> +		 * MIDX.
> +		 */
> +		trace2_data_string("load_midx_revindex", the_repository,
> +				   "source", "midx");
> +		m->revindex_data = (const uint32_t *)m->chunk_revindex;
> +		return 0;
> +	}
> +
>  	trace2_data_string("load_midx_revindex", the_repository,
>  			   "source", "rev");

A natural way to do this.

>  
> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> index 0a35daf939..f5eaa9cf68 100644
> --- a/t/lib-bitmap.sh
> +++ b/t/lib-bitmap.sh
> @@ -291,7 +291,7 @@ test_rev_exists () {
>  }
>  
>  midx_bitmap_core () {
> -	rev_kind="${1:-rev}"
> +	rev_kind="${1:-midx}"
>  
>  	setup_bitmap_history
>  
> @@ -435,7 +435,7 @@ midx_bitmap_core () {
>  }
>  
>  midx_bitmap_partial_tests () {
> -	rev_kind="${1:-rev}"
> +	rev_kind="${1:-midx}"

And I'm glad we can just update the default here.

> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 100ac90d15..8c92acb0ce 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -9,6 +9,11 @@ test_description='exercise basic multi-pack bitmap functionality'
>  GIT_TEST_MULTI_PACK_INDEX=0
>  GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
>  
> +GIT_TEST_MIDX_WRITE_REV=0
> +GIT_TEST_MIDX_READ_RIDX=1
> +export GIT_TEST_MIDX_WRITE_REV
> +export GIT_TEST_MIDX_READ_RIDX

Technically, we could unset these variables, right? ("...=")

> diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
...
> +# We'll be writing our own midx and bitmaps, so avoid getting confused by the
> +# automatic ones.
> +GIT_TEST_MULTI_PACK_INDEX=0
> +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> +
> +# Unlike t5326, this test exercise multi-pack bitmap functionality where the
> +# object order is stored in a separate .rev file.
> +GIT_TEST_MIDX_WRITE_REV=1
> +GIT_TEST_MIDX_READ_RIDX=0
> +export GIT_TEST_MIDX_WRITE_REV
> +export GIT_TEST_MIDX_READ_RIDX
> +
> +midx_bitmap_core rev
> +midx_bitmap_partial_tests rev
> +
> +test_done

Nice that now we get the payoff of the test refactor. This new script exists
only to verify that we can still read the old .rev files, which is important
for compatibility.

Forgive me for "speaking out loud" throughout this patch. I don't see any
need for changes but it is the crux of making this series work.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-15 19:46   ` [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order Junio C Hamano
  2021-12-15 21:37     ` Taylor Blau
@ 2021-12-20 18:51     ` Derrick Stolee
  2021-12-20 19:52       ` Taylor Blau
  1 sibling, 1 reply; 59+ messages in thread
From: Derrick Stolee @ 2021-12-20 18:51 UTC (permalink / raw)
  To: Junio C Hamano, Taylor Blau; +Cc: git, peff

On 12/15/2021 2:46 PM, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> 
>> Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by
>> which they can become corrupt when permuting their pack order.
...
>> So I'm definitely open to suggestions there, but otherwise this series should go
>> a long ways towards fixing my design mistake of having the MIDX .rev file be
>> separate from the MIDX itself.
> 
> Yeah, a single file with different chunks is a good way to ensure
> atomicity of update.

I just reviewed this series for the first time. Sorry for being so
late getting to it.

I had a few minor recommendations but they are mostly nitpicks and
don't deserve holding up the series if there are no other major
comments.

> A note to reviewers.
> 
> We need to make sure that not just we can still read .rev in
> existing repositories (and convert it to the new chunk) correctly,
> but also decide what to do to older versions of Git once the
> repository is touched by this new version.  Would they be upset to
> see no .rev files or is it just the performance thing (and it is
> more correct to recompute the reverse index on the fly)?  Should the
> new chunk be made mandatory to cause them notice that they should
> not muck with the repository, or is it optional?  Things like that.

1. Can we still read a .rev?

The new test script specifically verifies that existing repositories
will continue to read their .rev upon upgrade. Their .rev files will
be replaced with the chunk during the next write.

2. What if they downgrade after the RIDX chunk is in place?

The .rev file is missing and the repo has a performance issue because
they can't use bitmaps. Correctness is not a problem. Anyone using
.rev files for server use (where bitmaps are most useful) is hopefully
already careful about downgrading Git versions.

3. Should the chunk be made mandatory?

Unfortunately, the chunk format did not follow the index format's
example of making lowercase chunk IDs required. Instead, the chunks
that are necessary for v1 are necessary forever and all other chunks
are deemed optional. Changing this would require something more
drastic like updating the version number or giving some grace period
where released versions start treating lowercase chunk IDs as required
before creating a new "required" chunk.

This does mean that if there is a version incompatibility, the RIDX
chunk will just be ignored by the older version of Git.

In terms of making this a safe format upgrade, I think Taylor has
achieved that.

The only thing I can think is that server operators might want to
deploy this version with GIT_TEST_MIDX_WRITE_REV=1 for a while, so
any need to downgrade would not suffer a performance penalty for a
missing .rev file. If that is a planned way to safely deploy this
change, then it might be worth adding a test that we safely delete
a .rev file after writing both a .rev file and a RIDX chunk. (The
RIDX chunk will be preferred, so maybe the previous .rev file hits
some logic that would skip its deletion?)

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-20 18:51     ` Derrick Stolee
@ 2021-12-20 19:52       ` Taylor Blau
  2021-12-20 20:09         ` Derrick Stolee
  0 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2021-12-20 19:52 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, git, peff

On Mon, Dec 20, 2021 at 01:51:22PM -0500, Derrick Stolee wrote:
> On 12/15/2021 2:46 PM, Junio C Hamano wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> >> Here is a reroll of my series which fixes a serious problem with MIDX bitmaps by
> >> which they can become corrupt when permuting their pack order.
> ...
> >> So I'm definitely open to suggestions there, but otherwise this series should go
> >> a long ways towards fixing my design mistake of having the MIDX .rev file be
> >> separate from the MIDX itself.
> >
> > Yeah, a single file with different chunks is a good way to ensure
> > atomicity of update.
>
> I just reviewed this series for the first time. Sorry for being so
> late getting to it.

Thanks for your review. I think that the upgrade plan is sane, but I can
comment a little bit more about that below.

In the meantime, some thoughts on all of the combinations of the new
GIT_TEST_ variables:

  - GIT_TEST_MIDX_READ_RIDX=0 GIT_TEST_MIDX_WRITE_REV=0: this means that
    we won't use bitmaps at all, since we won't have a .rev file to
    read, and we will pretend that the RIDX chunk does not exist.

  - GIT_TEST_MIDX_READ_RIDX=0 GIT_TEST_MIDX_WRITE_REV=1: this is the
    status-quo of how things work today.

  - GIT_TEST_MIDX_READ_RIDX=1 GIT_TEST_MIDX_WRITE_REV=0: this is the
    status-quo of of how things will work after this patch series.

  - GIT_TEST_MIDX_READ_RIDX=1 GIT_TEST_MIDX_WRITE_REV=1: this is useful
    for testing that the RIDX chunk is preferred over reading the .rev
    file.

So all but the (0, 0) combination make sense. Perhaps we should ban that
combination entirely, since nobody would ever set it for a good reason.
But I think that none of these combinations would allow us to propagate
the corruption, since we will always *write* the new RIDX chunk, which
causes the MIDX's checksum to change when we changes its object order.

I kind of hate these runtime checks that are only useful for testing. If
anybody has better ideas of how we should go about this that still
provides comprehensive coverage of .rev files, then I'm all ears.

(The nuclear option might be to just stop supporting .rev files for
MIDX's altogether, since this is such a new feature. But that seems like
a pretty drastic step, and definitely isn't backwards compatible. So I'd
be wary of going in that direction).

> 1. Can we still read a .rev?
>
> The new test script specifically verifies that existing repositories
> will continue to read their .rev upon upgrade. Their .rev files will
> be replaced with the chunk during the next write.

Yes, exactly.

> 2. What if they downgrade after the RIDX chunk is in place?
>
> The .rev file is missing and the repo has a performance issue because
> they can't use bitmaps. Correctness is not a problem. Anyone using
> .rev files for server use (where bitmaps are most useful) is hopefully
> already careful about downgrading Git versions.

Yes, and we gracefully degrade here (I think that I demonstrated this in
my previous response), so in this scenario the worst an operator would
encounter is a performance regression.

> 3. Should the chunk be made mandatory?
>
> Unfortunately, the chunk format did not follow the index format's
> example of making lowercase chunk IDs required. Instead, the chunks
> that are necessary for v1 are necessary forever and all other chunks
> are deemed optional. Changing this would require something more
> drastic like updating the version number or giving some grace period
> where released versions start treating lowercase chunk IDs as required
> before creating a new "required" chunk.
>
> This does mean that if there is a version incompatibility, the RIDX
> chunk will just be ignored by the older version of Git.
>
> In terms of making this a safe format upgrade, I think Taylor has
> achieved that.

Thanks. And yeah, the chunk should (and is) mandatory when writing a
MIDX bitmap. But if we ran `git multi-pack-index write` without
`--bitmap`, then we would be free to not write the RIDX chunk (and
indeed that is what we do).

> The only thing I can think is that server operators might want to
> deploy this version with GIT_TEST_MIDX_WRITE_REV=1 for a while, so
> any need to downgrade would not suffer a performance penalty for a
> missing .rev file. If that is a planned way to safely deploy this
> change, then it might be worth adding a test that we safely delete
> a .rev file after writing both a .rev file and a RIDX chunk. (The
> RIDX chunk will be preferred, so maybe the previous .rev file hits
> some logic that would skip its deletion?)

That logic (that we delete auxiliary files--including the .rev file--not
matching the checksum of the MIDX we just wrote) is unchanged. So I
think we should be good there since we have existing coverage.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order
  2021-12-20 19:52       ` Taylor Blau
@ 2021-12-20 20:09         ` Derrick Stolee
  0 siblings, 0 replies; 59+ messages in thread
From: Derrick Stolee @ 2021-12-20 20:09 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, git, peff

On 12/20/2021 2:52 PM, Taylor Blau wrote:
> On Mon, Dec 20, 2021 at 01:51:22PM -0500, Derrick Stolee wrote:

>> 3. Should the chunk be made mandatory?
>>
>> Unfortunately, the chunk format did not follow the index format's
>> example of making lowercase chunk IDs required. Instead, the chunks
>> that are necessary for v1 are necessary forever and all other chunks
>> are deemed optional. Changing this would require something more
>> drastic like updating the version number or giving some grace period
>> where released versions start treating lowercase chunk IDs as required
>> before creating a new "required" chunk.
>>
>> This does mean that if there is a version incompatibility, the RIDX
>> chunk will just be ignored by the older version of Git.
>>
>> In terms of making this a safe format upgrade, I think Taylor has
>> achieved that.
> 
> Thanks. And yeah, the chunk should (and is) mandatory when writing a
> MIDX bitmap. But if we ran `git multi-pack-index write` without
> `--bitmap`, then we would be free to not write the RIDX chunk (and
> indeed that is what we do).

I just want to be careful of the language here, with respect to how
chunks are listed as required or optional in
Documentation/technical/pack-format.txt. "Required" means every MIDX
needs one or is invalid. "Optional" means that all versions of Git
should ignore the chunk if it does not recognize the ID.

So here, the new RIDX chunk will be ignored by older versions and
will be written only when we care about bitmaps.

(Nothing I say here is in conflict with what you said, but I
anticipate confusion with your use of the word "mandatory".)

>> The only thing I can think is that server operators might want to
>> deploy this version with GIT_TEST_MIDX_WRITE_REV=1 for a while, so
>> any need to downgrade would not suffer a performance penalty for a
>> missing .rev file. If that is a planned way to safely deploy this
>> change, then it might be worth adding a test that we safely delete
>> a .rev file after writing both a .rev file and a RIDX chunk. (The
>> RIDX chunk will be preferred, so maybe the previous .rev file hits
>> some logic that would skip its deletion?)
> 
> That logic (that we delete auxiliary files--including the .rev file--not
> matching the checksum of the MIDX we just wrote) is unchanged. So I
> think we should be good there since we have existing coverage.

Sounds good.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 8/8] midx: read `RIDX` chunk when present
  2021-12-20 18:42     ` Derrick Stolee
@ 2022-01-04 15:21       ` Taylor Blau
  0 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 15:21 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, gitster, peff

On Mon, Dec 20, 2021 at 01:42:24PM -0500, Derrick Stolee wrote:
> > diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> > index 100ac90d15..8c92acb0ce 100755
> > --- a/t/t5326-multi-pack-bitmaps.sh
> > +++ b/t/t5326-multi-pack-bitmaps.sh
> > @@ -9,6 +9,11 @@ test_description='exercise basic multi-pack bitmap functionality'
> >  GIT_TEST_MULTI_PACK_INDEX=0
> >  GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> >
> > +GIT_TEST_MIDX_WRITE_REV=0
> > +GIT_TEST_MIDX_READ_RIDX=1
> > +export GIT_TEST_MIDX_WRITE_REV
> > +export GIT_TEST_MIDX_READ_RIDX
>
> Technically, we could unset these variables, right? ("...=")

We absolutely could, and I can't think of any reason not to (other than
these tests would be fragile with respect to the default values of these
special "GIT_TEST_" environment variables, but these are unlikely to
change ever).

> > diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
> ...
> > +# We'll be writing our own midx and bitmaps, so avoid getting confused by the
> > +# automatic ones.
> > +GIT_TEST_MULTI_PACK_INDEX=0
> > +GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
> > +
> > +# Unlike t5326, this test exercise multi-pack bitmap functionality where the
> > +# object order is stored in a separate .rev file.
> > +GIT_TEST_MIDX_WRITE_REV=1
> > +GIT_TEST_MIDX_READ_RIDX=0
> > +export GIT_TEST_MIDX_WRITE_REV
> > +export GIT_TEST_MIDX_READ_RIDX
> > +
> > +midx_bitmap_core rev
> > +midx_bitmap_partial_tests rev
> > +
> > +test_done
>
> Nice that now we get the payoff of the test refactor. This new script exists
> only to verify that we can still read the old .rev files, which is important
> for compatibility.

Exactly. Though I certainly wavered on testing the legacy .rev behavior
as much as I did. But that had much more to do with how awkward the
tests came out. That behavior is important enough IMHO that it's worth
the awkwardness.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v2 5/8] t5326: extract `test_rev_exists`
  2021-12-20 18:33     ` Derrick Stolee
@ 2022-01-04 15:33       ` Taylor Blau
  0 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 15:33 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, gitster, peff

On Mon, Dec 20, 2021 at 01:33:32PM -0500, Derrick Stolee wrote:
> On 12/13/2021 8:55 PM, Taylor Blau wrote:
> > To determine which source of data is used for the MIDX's reverse index
> > cache, introduce a helper which forces loading the reverse index, and
> > then looks for the special trace2 event introduced in a previous commit.
> >
> > For now, this helper just looks for when the legacy MIDX .rev file was
> > loaded, but in a subsequent commit will become parameterized over the
> > the reverse index's source.
> >
> > This function replaces checking for the existence of the .rev file. We
> > could write a similar helper to ensure that the .rev file is cleaned up
> > after repacking, but it will make subsequent tests more difficult to
> > write, and provides marginal value since we already check that the MIDX
> > .bitmap file is removed.
>
> ...
>
> > +test_rev_exists () {
> > +	commit="$1"
> > +
> > +	test_expect_success 'reverse index exists' '
> > +		GIT_TRACE2_EVENT_NESTING=10 \
>
> Very recently, b8de3d6 (test-lib.sh: set GIT_TRACE2_EVENT_NESTING,
> 2021-11-29) made it to master and sets this to 100 across the test
> suite, so you don't need this line.

Thanks; I had written it with this in place in case this topic was going
to reach master before the topic containing b8de3d6. But it didn't, and
you're right that setting GIT_TRACE2_EVENT_NESTING explicitly here is no
longer necessary.

> > +		GIT_TRACE2_EVENT=$(pwd)/event.trace \
> > +			git rev-list --test-bitmap "$commit" &&
>
> This use of $commit has me worried. Do the tests become too flaky
> to changes in how we choose commits for the bitmaps? Does that
> require callers to be too aware of the refstate when creating the
> bitmaps?
>
> Perhaps just `git rev-list --use-bitmap-index [--objects] HEAD`
> would suffice to generate the trace event?

It's necessary for the group of tests which exercise partial bitmap
coverage (see the "setup partial bitmaps" test and below for more). In
those tests, we don't have bitmap coverage of HEAD, only HEAD~, so we
need to be able to say things like:

    test_rev_exists HEAD~

since just asking for the tip isn't guaranteed to work always.

We could make the argument optional (i.e., `git rev-list --test-bitmap
"${1:-HEAD}"`), but I generally find that it makes the callers more
difficult to read rather than easier.

On the "--test-bitmap" vs "--objects" thing, I don't think it matters
either way. They are both doing basically the same traversal, but
"--test-bitmap" does some additional integrity checks on top.

> >  test_expect_success 'create single-pack midx with bitmaps' '
> >  	git repack -ad &&
> >  	git multi-pack-index write --bitmap &&
> >  	test_path_is_file $midx &&
> > -	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
> > -	test_path_is_file $midx-$(midx_checksum $objdir).rev
> > +	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
> >  '
> >
> > +test_rev_exists HEAD
> > +
>
> Perhaps this helper would be more appropriate as a helper method
> within a test, rather than creating a test of its own? I think
> it looks better to include it next to the setup lines, something
> like

Eh, sure. I won't plan on picking this up, but I'm happy to if you feel
strongly about it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 0/9] midx: prevent bitmap corruption when permuting pack order
  2021-12-08 19:26 [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Taylor Blau
                   ` (3 preceding siblings ...)
  2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
@ 2022-01-04 18:15 ` Taylor Blau
  2022-01-04 18:15   ` [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation Taylor Blau
                     ` (8 more replies)
  4 siblings, 9 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

Here is a small reroll of my series which fixes a serious problem with MIDX
bitmaps by which they can become corrupt when permuting their pack order.

This reroll cleans up some of the tests by removing a couple of places where we
explicitly set GIT_TRACE2_EVENT_NESTING in the environment, which is no longer
needed after 62329d336f (Merge branch 'ds/trace2-regions-in-tests', 2021-12-15).

It also adds one new patch on top, which ensure that we gracefully fall back
when MIDX bitmaps cannot be read in certain cases.

It is prepared on the tip of master (which is dcc0cd074f at the time of
writing).

Taylor Blau (9):
  t5326: demonstrate bitmap corruption after permutation
  midx.c: make changing the preferred pack safe
  pack-revindex.c: instrument loading on-disk reverse index
  t5326: drop unnecessary setup
  t5326: extract `test_rev_exists`
  t5326: move tests to t/lib-bitmap.sh
  t/lib-bitmap.sh: parameterize tests over reverse index source
  midx: read `RIDX` chunk when present
  pack-bitmap.c: gracefully fallback after opening pack/MIDX

 Documentation/technical/multi-pack-index.txt |   1 +
 Documentation/technical/pack-format.txt      |  13 +-
 midx.c                                       |  31 +++-
 midx.h                                       |   1 +
 pack-bitmap.c                                |   4 +
 pack-revindex.c                              |  20 ++
 t/lib-bitmap.sh                              | 185 +++++++++++++++++++
 t/t5310-pack-bitmaps.sh                      |  28 +++
 t/t5326-multi-pack-bitmaps.sh                | 164 +++-------------
 t/t5327-multi-pack-bitmaps-rev.sh            |  23 +++
 t/t7700-repack.sh                            |   4 -
 11 files changed, 322 insertions(+), 152 deletions(-)
 create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

Range-diff against v2:
 1:  dfbac4bc60 =  1:  babce7d29a t5326: demonstrate bitmap corruption after permutation
 2:  4ea52e66dd !  2:  7d20c13f8b midx.c: make changing the preferred pack safe
    @@ Commit message
     
      ## Documentation/technical/multi-pack-index.txt ##
     @@ Documentation/technical/multi-pack-index.txt: and their offsets into multiple packfiles. It contains:
    -   - An offset within the jth packfile for the object.
    - - If large offsets are required, we use another list of large
    + ** An offset within the jth packfile for the object.
    + * If large offsets are required, we use another list of large
        offsets similar to version 2 pack-indexes.
     +- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
      
 3:  b630fea149 =  3:  3279e2eb9b pack-revindex.c: instrument loading on-disk reverse index
 4:  f430b6f2e9 =  4:  5818621ea8 t5326: drop unnecessary setup
 5:  73faab9f42 !  5:  33502d6a17 t5326: extract `test_rev_exists`
    @@ t/t5326-multi-pack-bitmaps.sh: midx_pack_source () {
     +	commit="$1"
     +
     +	test_expect_success 'reverse index exists' '
    -+		GIT_TRACE2_EVENT_NESTING=10 \
     +		GIT_TRACE2_EVENT=$(pwd)/event.trace \
     +			git rev-list --test-bitmap "$commit" &&
     +
 6:  bf42b116e1 !  6:  76e23cae0f t5326: move tests to t/lib-bitmap.sh
    @@ t/lib-bitmap.sh: have_delta () {
     +	commit="$1"
     +
     +	test_expect_success 'reverse index exists' '
    -+		GIT_TRACE2_EVENT_NESTING=10 \
     +		GIT_TRACE2_EVENT=$(pwd)/event.trace \
     +			git rev-list --test-bitmap "$commit" &&
     +
    @@ t/t5326-multi-pack-bitmaps.sh: test_description='exercise basic multi-pack bitma
     -	commit="$1"
     -
     -	test_expect_success 'reverse index exists' '
    --		GIT_TRACE2_EVENT_NESTING=10 \
     -		GIT_TRACE2_EVENT=$(pwd)/event.trace \
     -			git rev-list --test-bitmap "$commit" &&
     -
 7:  fa91631024 !  7:  7ce3dc60f9 t/lib-bitmap.sh: parameterize tests over reverse index source
    @@ t/lib-bitmap.sh: midx_pack_source () {
     +	kind="$2"
      
      	test_expect_success 'reverse index exists' '
    - 		GIT_TRACE2_EVENT_NESTING=10 \
      		GIT_TRACE2_EVENT=$(pwd)/event.trace \
      			git rev-list --test-bitmap "$commit" &&
      
 8:  993bfa8dd8 !  8:  55aa69de12 midx: read `RIDX` chunk when present
    @@ t/t5326-multi-pack-bitmaps.sh: test_description='exercise basic multi-pack bitma
      GIT_TEST_MULTI_PACK_INDEX=0
      GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
      
    -+GIT_TEST_MIDX_WRITE_REV=0
    -+GIT_TEST_MIDX_READ_RIDX=1
    -+export GIT_TEST_MIDX_WRITE_REV
    -+export GIT_TEST_MIDX_READ_RIDX
    ++# This test exercise multi-pack bitmap functionality where the object order is
    ++# stored and read from a special chunk within the MIDX, so use the default
    ++# behavior here.
    ++sane_unset GIT_TEST_MIDX_WRITE_REV
    ++sane_unset GIT_TEST_MIDX_READ_RIDX
     +
      midx_bitmap_core
      
 -:  ---------- >  9:  9707d5ea44 pack-bitmap.c: gracefully fallback after opening pack/MIDX
-- 
2.34.1.25.gb3157a20e6

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
@ 2022-01-04 18:15   ` Taylor Blau
  2022-01-20 17:55     ` Jonathan Tan
  2022-01-04 18:15   ` [PATCH v3 2/9] midx.c: make changing the preferred pack safe Taylor Blau
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

This patch demonstrates a cause of bitmap corruption that can occur when
the contents of the multi-pack index does not change, but the underlying
object order does.

In this example, we have a MIDX containing two packs, each with a
distinct set of objects (pack A corresponds to the tree, blob, and
commit from the first patch, and pack B corresponds to the second
patch).

First, a MIDX is written where the 'A' pack is preferred. As expected,
the bitmaps generated there are in-tact. But then, we generate an
identical MIDX with a different object order: this time preferring pack
'B'.

Due to a bug which will be explained and fixed in the following commit,
the MIDX is updated, but the .rev file is not, causing the .bitmap file
to be read incorrectly. Specifically, the .bitmap file will contain
correct data, but the auxiliary object order in the .rev file is stale,
causing readers to get confused by reading the new bitmaps using the old
object order.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index e187f90f29..0ca2868b0b 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,4 +395,35 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
+test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit A &&
+		test_commit B &&
+
+		git rev-list --objects --no-object-names HEAD^ >A.objects &&
+		git rev-list --objects --no-object-names HEAD^.. >B.objects &&
+
+		A=$(git pack-objects $objdir/pack/pack <A.objects) &&
+		B=$(git pack-objects $objdir/pack/pack <B.objects) &&
+
+		cat >indexes <<-EOF &&
+		pack-$A.idx
+		pack-$B.idx
+		EOF
+
+		git multi-pack-index write --bitmap --stdin-packs \
+			--preferred-pack=pack-$A.pack <indexes &&
+		git rev-list --test-bitmap A &&
+
+		git multi-pack-index write --bitmap --stdin-packs \
+			--preferred-pack=pack-$B.pack <indexes &&
+		git rev-list --test-bitmap A
+	)
+'
+
 test_done
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 2/9] midx.c: make changing the preferred pack safe
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
  2022-01-04 18:15   ` [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation Taylor Blau
@ 2022-01-04 18:15   ` Taylor Blau
  2022-01-14 21:35     ` Junio C Hamano
  2022-01-20 18:08     ` Jonathan Tan
  2022-01-04 18:15   ` [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
                     ` (6 subsequent siblings)
  8 siblings, 2 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

The previous patch demonstrates a bug where a MIDX's auxiliary object
order can become out of sync with a MIDX bitmap.

This is because of two confounding factors:

  - First, the object order is stored in a file which is named according
    to the multi-pack index's checksum, and the MIDX does not store the
    object order. This means that the object order can change without
    altering the checksum.

  - But the .rev file is moved into place with finalize_object_file(),
    which link(2)'s the file into place instead of renaming it. For us,
    that means that a modified .rev file will not be moved into place if
    MIDX's checksum was unchanged.

The fix here is two-fold. First, we need to stop linking the file into
place and instead rename it. It's likely we were using
finalize_object_file() instead of a pure rename() because the former
also adjusts shared permissions. But that is unnecessary, because we
already do so in write_rev_file_order(), so rename alone is safe.

But we also need to make the MIDX's checksum change in some way when the
preferred pack changes without altering the set of packs stored in a
MIDX to prevent a race where the new .rev file is moved into place
before the MIDX is updated. Here, you'd get the opposite effect: reading
old bitmaps with the new object order.

But this race bites us even here: suppose that we didn't change the MIDX
checksum, but only renamed the auxiliary object order into place instead
of hardlinking it. Then when we go to generate the new bitmap, we'll
load the old MIDX bitmap, along with the MIDX that it references. That's
fine, since the new MIDX isn't moved into place until after the new
bitmap is generated. But the new object order *has* been moved into
place. So we'll read the old bitmaps in the new order when generating
the new bitmap file, meaning that without this secondary change, bitmap
generation itself would become a victim of the race described here.

This can all be prevented by forcing the MIDX's checksum to change when
the object order changes. We could include the entire object order in
the MIDX, but doing so is somewhat awkward. (For example, the code that
writes a .rev file expects to know the checksum of the associated pack
or MIDX, but writing that data into the MIDX itself makes that a
circular dependency).

Instead, make the object order used during bitmap generation part of the
MIDX itself. That means that the new test in t5326 will cause the MIDX's
checksum to update, preventing the stale read problem.

In theory, it is possible to store a "fingerprint" of the full object
order here, so long as that fingerprint changes at least as often as the
full object order does. Some possibilities here include storing the
identity of the preferred pack, along with the mtimes of the
non-preferred packs in a consistent order. But storing a limited part of
the information makes it difficult to reason about whether or not there
are gaps between the two that would cause us to get bitten by this bug
again.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 Documentation/technical/multi-pack-index.txt |  1 +
 Documentation/technical/pack-format.txt      | 13 +++++-----
 midx.c                                       | 25 ++++++++++++++++----
 t/t5326-multi-pack-bitmaps.sh                |  2 +-
 4 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/Documentation/technical/multi-pack-index.txt b/Documentation/technical/multi-pack-index.txt
index b39c69da8c..f2221d2b44 100644
--- a/Documentation/technical/multi-pack-index.txt
+++ b/Documentation/technical/multi-pack-index.txt
@@ -24,6 +24,7 @@ and their offsets into multiple packfiles. It contains:
 ** An offset within the jth packfile for the object.
 * If large offsets are required, we use another list of large
   offsets similar to version 2 pack-indexes.
+- An optional list of objects in pseudo-pack order (used with MIDX bitmaps).
 
 Thus, we can provide O(log N) lookup time for any number
 of packfiles.
diff --git a/Documentation/technical/pack-format.txt b/Documentation/technical/pack-format.txt
index 8d2f42f29e..6d3efb7d16 100644
--- a/Documentation/technical/pack-format.txt
+++ b/Documentation/technical/pack-format.txt
@@ -376,6 +376,11 @@ CHUNK DATA:
 	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
 	    8-byte offsets into large packfiles.
 
+	[Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
+	    A list of MIDX positions (one per object in the MIDX, num_objects in
+	    total, each a 4-byte unsigned integer in network byte order), sorted
+	    according to their relative bitmap/pseudo-pack positions.
+
 TRAILER:
 
 	Index checksum of the above contents.
@@ -456,9 +461,5 @@ In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
 objects in packs stored by the MIDX, laid out in pack order, and the
 packs arranged in MIDX order (with the preferred pack coming first).
 
-Finally, note that the MIDX's reverse index is not stored as a chunk in
-the multi-pack-index itself. This is done because the reverse index
-includes the checksum of the pack or MIDX to which it belongs, which
-makes it impossible to write in the MIDX. To avoid races when rewriting
-the MIDX, a MIDX reverse index includes the MIDX's checksum in its
-filename (e.g., `multi-pack-index-xyz.rev`).
+The MIDX's reverse index is stored in the optional 'RIDX' chunk within
+the MIDX itself.
diff --git a/midx.c b/midx.c
index 837b46b2af..d3179e9c02 100644
--- a/midx.c
+++ b/midx.c
@@ -33,6 +33,7 @@
 #define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
 #define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
 #define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
+#define MIDX_CHUNKID_REVINDEX 0x52494458 /* "RIDX" */
 #define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
 #define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
 #define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
@@ -833,6 +834,18 @@ static int write_midx_large_offsets(struct hashfile *f,
 	return 0;
 }
 
+static int write_midx_revindex(struct hashfile *f,
+			       void *data)
+{
+	struct write_midx_context *ctx = data;
+	uint32_t i;
+
+	for (i = 0; i < ctx->entries_nr; i++)
+		hashwrite_be32(f, ctx->pack_order[i]);
+
+	return 0;
+}
+
 struct midx_pack_order_data {
 	uint32_t nr;
 	uint32_t pack;
@@ -891,7 +904,7 @@ static void write_midx_reverse_index(char *midx_name, unsigned char *midx_hash,
 	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
 					midx_hash, WRITE_REV);
 
-	if (finalize_object_file(tmp_file, buf.buf))
+	if (rename(tmp_file, buf.buf))
 		die(_("cannot store reverse index file"));
 
 	strbuf_release(&buf);
@@ -1403,15 +1416,19 @@ static int write_midx_internal(const char *object_dir,
 			(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
 			write_midx_large_offsets);
 
+	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP)) {
+		ctx.pack_order = midx_pack_order(&ctx);
+		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
+			  ctx.entries_nr * sizeof(uint32_t),
+			  write_midx_revindex);
+	}
+
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
 	write_chunkfile(cf, &ctx);
 
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))
-		ctx.pack_order = midx_pack_order(&ctx);
-
 	if (flags & MIDX_WRITE_REV_INDEX)
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 0ca2868b0b..353282310d 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -395,7 +395,7 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
-test_expect_failure 'changing the preferred pack does not corrupt bitmaps' '
+test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
 	rm -fr repo &&
 	git init repo &&
 	test_when_finished "rm -fr repo" &&
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
  2022-01-04 18:15   ` [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation Taylor Blau
  2022-01-04 18:15   ` [PATCH v3 2/9] midx.c: make changing the preferred pack safe Taylor Blau
@ 2022-01-04 18:15   ` Taylor Blau
  2022-01-20 18:15     ` Jonathan Tan
  2022-01-04 18:15   ` [PATCH v3 4/9] t5326: drop unnecessary setup Taylor Blau
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

In a subsequent commit, we'll use the MIDX's new 'RIDX' chunk as a
source for the reverse index's data. But it will be useful for tests to
be able to determine whether the reverse index was loaded from the
separate .rev file, or from a chunk within the MIDX.

To instrument this, add a trace2 event which the tests can look for in
order to determine the reverse index's source.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-revindex.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/pack-revindex.c b/pack-revindex.c
index 70d0fbafcb..bd15ebad03 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -301,6 +301,9 @@ int load_midx_revindex(struct multi_pack_index *m)
 	if (m->revindex_data)
 		return 0;
 
+	trace2_data_string("load_midx_revindex", the_repository,
+			   "source", "rev");
+
 	get_midx_rev_filename(&revindex_name, m);
 
 	ret = load_revindex_from_disk(revindex_name.buf,
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 4/9] t5326: drop unnecessary setup
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
                     ` (2 preceding siblings ...)
  2022-01-04 18:15   ` [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
@ 2022-01-04 18:15   ` Taylor Blau
  2022-01-04 18:15   ` [PATCH v3 5/9] t5326: extract `test_rev_exists` Taylor Blau
                     ` (4 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

The core.multiPackIndex config became true by default back in 18e449f86b
(midx: enable core.multiPackIndex by default, 2020-09-25), so it is no
longer necessary to enable it explicitly.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 353282310d..1a9581af30 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -19,10 +19,6 @@ midx_pack_source () {
 
 setup_bitmap_history
 
-test_expect_success 'enable core.multiPackIndex' '
-	git config core.multiPackIndex true
-'
-
 test_expect_success 'create single-pack midx with bitmaps' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap &&
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 5/9] t5326: extract `test_rev_exists`
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
                     ` (3 preceding siblings ...)
  2022-01-04 18:15   ` [PATCH v3 4/9] t5326: drop unnecessary setup Taylor Blau
@ 2022-01-04 18:15   ` Taylor Blau
  2022-01-04 18:15   ` [PATCH v3 6/9] t5326: move tests to t/lib-bitmap.sh Taylor Blau
                     ` (3 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

To determine which source of data is used for the MIDX's reverse index
cache, introduce a helper which forces loading the reverse index, and
then looks for the special trace2 event introduced in a previous commit.

For now, this helper just looks for when the legacy MIDX .rev file was
loaded, but in a subsequent commit will become parameterized over the
the reverse index's source.

This function replaces checking for the existence of the .rev file. We
could write a similar helper to ensure that the .rev file is cleaned up
after repacking, but it will make subsequent tests more difficult to
write, and provides marginal value since we already check that the MIDX
.bitmap file is removed.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5326-multi-pack-bitmaps.sh | 37 +++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 15 deletions(-)

diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 1a9581af30..999740f8c7 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -17,16 +17,29 @@ midx_pack_source () {
 	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
 }
 
+test_rev_exists () {
+	commit="$1"
+
+	test_expect_success 'reverse index exists' '
+		GIT_TRACE2_EVENT=$(pwd)/event.trace \
+			git rev-list --test-bitmap "$commit" &&
+
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
+	'
+}
+
 setup_bitmap_history
 
 test_expect_success 'create single-pack midx with bitmaps' '
 	git repack -ad &&
 	git multi-pack-index write --bitmap &&
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 '
 
+test_rev_exists HEAD
+
 basic_bitmap_tests
 
 test_expect_success 'create new additional packs' '
@@ -52,10 +65,11 @@ test_expect_success 'create multi-pack midx with bitmaps' '
 	test_line_count = 25 packs &&
 
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 '
 
+test_rev_exists HEAD
+
 basic_bitmap_tests
 
 test_expect_success '--no-bitmap is respected when bitmaps exist' '
@@ -66,7 +80,6 @@ test_expect_success '--no-bitmap is respected when bitmaps exist' '
 
 	test_path_is_file $midx &&
 	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev &&
 
 	git multi-pack-index write --no-bitmap &&
 
@@ -206,10 +219,11 @@ test_expect_success 'setup partial bitmaps' '
 	test_commit loose &&
 	git multi-pack-index write --bitmap 2>err &&
 	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_file $midx-$(midx_checksum $objdir).rev
+	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 '
 
+test_rev_exists HEAD~
+
 basic_bitmap_tests HEAD~
 
 test_expect_success 'removing a MIDX clears stale bitmaps' '
@@ -224,7 +238,6 @@ test_expect_success 'removing a MIDX clears stale bitmaps' '
 
 		# Write a MIDX and bitmap; remove the MIDX but leave the bitmap.
 		stale_bitmap=$midx-$(midx_checksum $objdir).bitmap &&
-		stale_rev=$midx-$(midx_checksum $objdir).rev &&
 		rm $midx &&
 
 		# Then write a new MIDX.
@@ -234,9 +247,7 @@ test_expect_success 'removing a MIDX clears stale bitmaps' '
 
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
-		test_path_is_missing $stale_bitmap &&
-		test_path_is_missing $stale_rev
+		test_path_is_missing $stale_bitmap
 	)
 '
 
@@ -257,7 +268,6 @@ test_expect_success 'pack.preferBitmapTips' '
 		git multi-pack-index write --bitmap &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
 
 		test-tool bitmap list-commits | sort >bitmaps &&
 		comm -13 bitmaps commits >before &&
@@ -267,7 +277,6 @@ test_expect_success 'pack.preferBitmapTips' '
 			<before | git update-ref --stdin &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		git -c pack.preferBitmapTips=refs/tags/include \
@@ -305,7 +314,6 @@ test_expect_success 'writing a bitmap with --refs-snapshot' '
 		grep "$(git rev-parse two)" bitmaps &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		# Then again, but with a refs snapshot which only sees
@@ -350,7 +358,6 @@ test_expect_success 'write a bitmap with --refs-snapshot (preferred tips)' '
 		) >snapshot &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		git multi-pack-index write --bitmap --refs-snapshot=snapshot &&
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 6/9] t5326: move tests to t/lib-bitmap.sh
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
                     ` (4 preceding siblings ...)
  2022-01-04 18:15   ` [PATCH v3 5/9] t5326: extract `test_rev_exists` Taylor Blau
@ 2022-01-04 18:15   ` Taylor Blau
  2022-01-04 18:15   ` [PATCH v3 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

In t5326, we have a handful of tests that we would like to run twice:
once using the MIDX's new `RIDX` chunk as the source of the
reverse-index cache, and once using the separate `.rev` file.

But because these tests mutate the state of the underlying repository,
and then make assumptions about those mutations occurring in a certain
sequence, simply running the tests twice in the same repository is
awkward.

Instead, extract the core of interesting tests into t/lib-bitmap.sh to
prepare for them to be run twice, each in a separate test script. This
means that they can each operate on a separate repository, removing any
concerns about mutating state.

For now, this patch is a strict cut-and-paste of some tests from t5326.
The tests which did not move are not interesting with respect to the
source of their reverse index data.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh               | 177 ++++++++++++++++++++++++++++++++++
 t/t5326-multi-pack-bitmaps.sh | 173 +--------------------------------
 2 files changed, 179 insertions(+), 171 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 21d0392dda..48a8730a13 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -1,6 +1,9 @@
 # Helpers for scripts testing bitmap functionality; see t5310 for
 # example usage.
 
+objdir=.git/objects
+midx=$objdir/pack/multi-pack-index
+
 # Compare a file containing rev-list bitmap traversal output to its non-bitmap
 # counterpart. You can't just use test_cmp for this, because the two produce
 # subtly different output:
@@ -264,3 +267,177 @@ have_delta () {
 midx_checksum () {
 	test-tool read-midx --checksum "$1"
 }
+
+# midx_pack_source <obj>
+midx_pack_source () {
+	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
+}
+
+test_rev_exists () {
+	commit="$1"
+
+	test_expect_success 'reverse index exists' '
+		GIT_TRACE2_EVENT=$(pwd)/event.trace \
+			git rev-list --test-bitmap "$commit" &&
+
+		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
+		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
+	'
+}
+
+midx_bitmap_core () {
+	setup_bitmap_history
+
+	test_expect_success 'create single-pack midx with bitmaps' '
+		git repack -ad &&
+		git multi-pack-index write --bitmap &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+	'
+
+	test_rev_exists HEAD
+
+	basic_bitmap_tests
+
+	test_expect_success 'create new additional packs' '
+		for i in $(test_seq 1 16)
+		do
+			test_commit "$i" &&
+			git repack -d || return 1
+		done &&
+
+		git checkout -b other2 HEAD~8 &&
+		for i in $(test_seq 1 8)
+		do
+			test_commit "side-$i" &&
+			git repack -d || return 1
+		done &&
+		git checkout second
+	'
+
+	test_expect_success 'create multi-pack midx with bitmaps' '
+		git multi-pack-index write --bitmap &&
+
+		ls $objdir/pack/pack-*.pack >packs &&
+		test_line_count = 25 packs &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+	'
+
+	test_rev_exists HEAD
+
+	basic_bitmap_tests
+
+	test_expect_success '--no-bitmap is respected when bitmaps exist' '
+		git multi-pack-index write --bitmap &&
+
+		test_commit respect--no-bitmap &&
+		git repack -d &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+
+		git multi-pack-index write --no-bitmap &&
+
+		test_path_is_file $midx &&
+		test_path_is_missing $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $midx-$(midx_checksum $objdir).rev
+	'
+
+	test_expect_success 'setup midx with base from later pack' '
+		# Write a and b so that "a" is a delta on top of base "b", since Git
+		# prefers to delete contents out of a base rather than add to a shorter
+		# object.
+		test_seq 1 128 >a &&
+		test_seq 1 130 >b &&
+
+		git add a b &&
+		git commit -m "initial commit" &&
+
+		a=$(git rev-parse HEAD:a) &&
+		b=$(git rev-parse HEAD:b) &&
+
+		# In the first pack, "a" is stored as a delta to "b".
+		p1=$(git pack-objects .git/objects/pack/pack <<-EOF
+		$a
+		$b
+		EOF
+		) &&
+
+		# In the second pack, "a" is missing, and "b" is not a delta nor base to
+		# any other object.
+		p2=$(git pack-objects .git/objects/pack/pack <<-EOF
+		$b
+		$(git rev-parse HEAD)
+		$(git rev-parse HEAD^{tree})
+		EOF
+		) &&
+
+		git prune-packed &&
+		# Use the second pack as the preferred source, so that "b" occurs
+		# earlier in the MIDX object order, rendering "a" unusable for pack
+		# reuse.
+		git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
+
+		have_delta $a $b &&
+		test $(midx_pack_source $a) != $(midx_pack_source $b)
+	'
+
+	rev_list_tests 'full bitmap with backwards delta'
+
+	test_expect_success 'clone with bitmaps enabled' '
+		git clone --no-local --bare . clone-reverse-delta.git &&
+		test_when_finished "rm -fr clone-reverse-delta.git" &&
+
+		git rev-parse HEAD >expect &&
+		git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
+		test_cmp expect actual
+	'
+
+	test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
+		rm -fr repo &&
+		git init repo &&
+		test_when_finished "rm -fr repo" &&
+		(
+			cd repo &&
+
+			test_commit A &&
+			test_commit B &&
+
+			git rev-list --objects --no-object-names HEAD^ >A.objects &&
+			git rev-list --objects --no-object-names HEAD^.. >B.objects &&
+
+			A=$(git pack-objects $objdir/pack/pack <A.objects) &&
+			B=$(git pack-objects $objdir/pack/pack <B.objects) &&
+
+			cat >indexes <<-EOF &&
+			pack-$A.idx
+			pack-$B.idx
+			EOF
+
+			git multi-pack-index write --bitmap --stdin-packs \
+				--preferred-pack=pack-$A.pack <indexes &&
+			git rev-list --test-bitmap A &&
+
+			git multi-pack-index write --bitmap --stdin-packs \
+				--preferred-pack=pack-$B.pack <indexes &&
+			git rev-list --test-bitmap A
+		)
+	'
+}
+
+midx_bitmap_partial_tests () {
+	test_expect_success 'setup partial bitmaps' '
+		test_commit packed &&
+		git repack &&
+		test_commit loose &&
+		git multi-pack-index write --bitmap 2>err &&
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
+	'
+
+	test_rev_exists HEAD~
+
+	basic_bitmap_tests HEAD~
+}
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 999740f8c7..100ac90d15 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -9,134 +9,7 @@ test_description='exercise basic multi-pack bitmap functionality'
 GIT_TEST_MULTI_PACK_INDEX=0
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
-objdir=.git/objects
-midx=$objdir/pack/multi-pack-index
-
-# midx_pack_source <obj>
-midx_pack_source () {
-	test-tool read-midx --show-objects .git/objects | grep "^$1 " | cut -f2
-}
-
-test_rev_exists () {
-	commit="$1"
-
-	test_expect_success 'reverse index exists' '
-		GIT_TRACE2_EVENT=$(pwd)/event.trace \
-			git rev-list --test-bitmap "$commit" &&
-
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
-		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
-	'
-}
-
-setup_bitmap_history
-
-test_expect_success 'create single-pack midx with bitmaps' '
-	git repack -ad &&
-	git multi-pack-index write --bitmap &&
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
-'
-
-test_rev_exists HEAD
-
-basic_bitmap_tests
-
-test_expect_success 'create new additional packs' '
-	for i in $(test_seq 1 16)
-	do
-		test_commit "$i" &&
-		git repack -d || return 1
-	done &&
-
-	git checkout -b other2 HEAD~8 &&
-	for i in $(test_seq 1 8)
-	do
-		test_commit "side-$i" &&
-		git repack -d || return 1
-	done &&
-	git checkout second
-'
-
-test_expect_success 'create multi-pack midx with bitmaps' '
-	git multi-pack-index write --bitmap &&
-
-	ls $objdir/pack/pack-*.pack >packs &&
-	test_line_count = 25 packs &&
-
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
-'
-
-test_rev_exists HEAD
-
-basic_bitmap_tests
-
-test_expect_success '--no-bitmap is respected when bitmaps exist' '
-	git multi-pack-index write --bitmap &&
-
-	test_commit respect--no-bitmap &&
-	git repack -d &&
-
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-
-	git multi-pack-index write --no-bitmap &&
-
-	test_path_is_file $midx &&
-	test_path_is_missing $midx-$(midx_checksum $objdir).bitmap &&
-	test_path_is_missing $midx-$(midx_checksum $objdir).rev
-'
-
-test_expect_success 'setup midx with base from later pack' '
-	# Write a and b so that "a" is a delta on top of base "b", since Git
-	# prefers to delete contents out of a base rather than add to a shorter
-	# object.
-	test_seq 1 128 >a &&
-	test_seq 1 130 >b &&
-
-	git add a b &&
-	git commit -m "initial commit" &&
-
-	a=$(git rev-parse HEAD:a) &&
-	b=$(git rev-parse HEAD:b) &&
-
-	# In the first pack, "a" is stored as a delta to "b".
-	p1=$(git pack-objects .git/objects/pack/pack <<-EOF
-	$a
-	$b
-	EOF
-	) &&
-
-	# In the second pack, "a" is missing, and "b" is not a delta nor base to
-	# any other object.
-	p2=$(git pack-objects .git/objects/pack/pack <<-EOF
-	$b
-	$(git rev-parse HEAD)
-	$(git rev-parse HEAD^{tree})
-	EOF
-	) &&
-
-	git prune-packed &&
-	# Use the second pack as the preferred source, so that "b" occurs
-	# earlier in the MIDX object order, rendering "a" unusable for pack
-	# reuse.
-	git multi-pack-index write --bitmap --preferred-pack=pack-$p2.idx &&
-
-	have_delta $a $b &&
-	test $(midx_pack_source $a) != $(midx_pack_source $b)
-'
-
-rev_list_tests 'full bitmap with backwards delta'
-
-test_expect_success 'clone with bitmaps enabled' '
-	git clone --no-local --bare . clone-reverse-delta.git &&
-	test_when_finished "rm -fr clone-reverse-delta.git" &&
-
-	git rev-parse HEAD >expect &&
-	git --git-dir=clone-reverse-delta.git rev-parse HEAD >actual &&
-	test_cmp expect actual
-'
+midx_bitmap_core
 
 bitmap_reuse_tests() {
 	from=$1
@@ -213,18 +86,7 @@ test_expect_success 'missing object closure fails gracefully' '
 	)
 '
 
-test_expect_success 'setup partial bitmaps' '
-	test_commit packed &&
-	git repack &&
-	test_commit loose &&
-	git multi-pack-index write --bitmap 2>err &&
-	test_path_is_file $midx &&
-	test_path_is_file $midx-$(midx_checksum $objdir).bitmap
-'
-
-test_rev_exists HEAD~
-
-basic_bitmap_tests HEAD~
+midx_bitmap_partial_tests
 
 test_expect_success 'removing a MIDX clears stale bitmaps' '
 	rm -fr repo &&
@@ -398,35 +260,4 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
-test_expect_success 'changing the preferred pack does not corrupt bitmaps' '
-	rm -fr repo &&
-	git init repo &&
-	test_when_finished "rm -fr repo" &&
-	(
-		cd repo &&
-
-		test_commit A &&
-		test_commit B &&
-
-		git rev-list --objects --no-object-names HEAD^ >A.objects &&
-		git rev-list --objects --no-object-names HEAD^.. >B.objects &&
-
-		A=$(git pack-objects $objdir/pack/pack <A.objects) &&
-		B=$(git pack-objects $objdir/pack/pack <B.objects) &&
-
-		cat >indexes <<-EOF &&
-		pack-$A.idx
-		pack-$B.idx
-		EOF
-
-		git multi-pack-index write --bitmap --stdin-packs \
-			--preferred-pack=pack-$A.pack <indexes &&
-		git rev-list --test-bitmap A &&
-
-		git multi-pack-index write --bitmap --stdin-packs \
-			--preferred-pack=pack-$B.pack <indexes &&
-		git rev-list --test-bitmap A
-	)
-'
-
 test_done
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
                     ` (5 preceding siblings ...)
  2022-01-04 18:15   ` [PATCH v3 6/9] t5326: move tests to t/lib-bitmap.sh Taylor Blau
@ 2022-01-04 18:15   ` Taylor Blau
  2022-01-24 19:15     ` Jonathan Tan
  2022-01-04 18:16   ` [PATCH v3 8/9] midx: read `RIDX` chunk when present Taylor Blau
  2022-01-04 18:16   ` [PATCH v3 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX Taylor Blau
  8 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:15 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

To prepare for reading the reverse index data out of the MIDX itself,
teach the `test_rev_exists` function to take an expected "source" for
the reverse index data.

When given "rev", it asserts that the MIDX's `.rev` file exists, and is
loaded when verifying the integrity of its bitmaps. Otherwise, it
ensures that trace2 reports the source of the reverse index data as the
same string which was given to test_rev_exists().

The following patch will implement reading the reverse index data from
the MIDX itself.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/lib-bitmap.sh | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 48a8730a13..77b5f46a03 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -275,17 +275,23 @@ midx_pack_source () {
 
 test_rev_exists () {
 	commit="$1"
+	kind="$2"
 
 	test_expect_success 'reverse index exists' '
 		GIT_TRACE2_EVENT=$(pwd)/event.trace \
 			git rev-list --test-bitmap "$commit" &&
 
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
-		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"rev\"" event.trace
+		if test "rev" = "$kind"
+		then
+			test_path_is_file $midx-$(midx_checksum $objdir).rev
+		fi &&
+		grep "\"category\":\"load_midx_revindex\",\"key\":\"source\",\"value\":\"$kind\"" event.trace
 	'
 }
 
 midx_bitmap_core () {
+	rev_kind="${1:-rev}"
+
 	setup_bitmap_history
 
 	test_expect_success 'create single-pack midx with bitmaps' '
@@ -295,7 +301,7 @@ midx_bitmap_core () {
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
 
-	test_rev_exists HEAD
+	test_rev_exists HEAD "$rev_kind"
 
 	basic_bitmap_tests
 
@@ -325,7 +331,7 @@ midx_bitmap_core () {
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
 
-	test_rev_exists HEAD
+	test_rev_exists HEAD "$rev_kind"
 
 	basic_bitmap_tests
 
@@ -428,6 +434,8 @@ midx_bitmap_core () {
 }
 
 midx_bitmap_partial_tests () {
+	rev_kind="${1:-rev}"
+
 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
 		git repack &&
@@ -437,7 +445,7 @@ midx_bitmap_partial_tests () {
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap
 	'
 
-	test_rev_exists HEAD~
+	test_rev_exists HEAD~ "$rev_kind"
 
 	basic_bitmap_tests HEAD~
 }
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 8/9] midx: read `RIDX` chunk when present
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
                     ` (6 preceding siblings ...)
  2022-01-04 18:15   ` [PATCH v3 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
@ 2022-01-04 18:16   ` Taylor Blau
  2022-01-24 19:27     ` Jonathan Tan
  2022-01-04 18:16   ` [PATCH v3 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX Taylor Blau
  8 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:16 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

When a MIDX contains the new `RIDX` chunk, ensure that the reverse index
is read from it instead of the on-disk .rev file. Since we need to
encode the object order in the MIDX itself for correctness reasons,
there is no point in storing the same data again outside of the MIDX.

So, this patch stops writing separate .rev files, and reads it out of
the MIDX itself. This is possible to do with relatively little new code,
since the format of the RIDX chunk is identical to the data in the .rev
file. In other words, we can implement this by pointing the
`revindex_data` field at the reverse index chunk of the MIDX instead of
the .rev file without any other changes.

Note that we have two knobs that are adjusted for the new tests:
GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls
whether the MIDX .rev is written at all, and the latter controls whether
we read the MIDX's RIDX chunk.

Both are necessary to ensure that the test added at the beginning of
this series continues to work. This is because we always need to write
the RIDX chunk in the MIDX in order to change its checksum, but we want
to make sure reading the existing .rev file still works (since the RIDX
chunk takes precedence by default).

Arguably this isn't a very interesting mode to test, because the
precedence rules mean that we'll always read the RIDX chunk over the
.rev file. But it makes it impossible for a user to induce corruption in
their repository by adjusting the test knobs (since if we had an
either/or knob they could stop writing the RIDX chunk, allowing them to
tweak the MIDX's object order without changing its checksum).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 midx.c                            |  6 +++++-
 midx.h                            |  1 +
 pack-revindex.c                   | 17 +++++++++++++++++
 t/lib-bitmap.sh                   |  4 ++--
 t/t5326-multi-pack-bitmaps.sh     |  6 ++++++
 t/t5327-multi-pack-bitmaps-rev.sh | 23 +++++++++++++++++++++++
 t/t7700-repack.sh                 |  4 ----
 7 files changed, 54 insertions(+), 7 deletions(-)
 create mode 100755 t/t5327-multi-pack-bitmaps-rev.sh

diff --git a/midx.c b/midx.c
index d3179e9c02..9aba13b5b1 100644
--- a/midx.c
+++ b/midx.c
@@ -162,6 +162,9 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
 
 	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
 
+	if (git_env_bool("GIT_TEST_MIDX_READ_RIDX", 1))
+		pair_chunk(cf, MIDX_CHUNKID_REVINDEX, &m->chunk_revindex);
+
 	m->num_objects = ntohl(m->chunk_oid_fanout[255]);
 
 	CALLOC_ARRAY(m->pack_names, m->num_packs);
@@ -1429,7 +1432,8 @@ static int write_midx_internal(const char *object_dir,
 	finalize_hashfile(f, midx_hash, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
 	free_chunkfile(cf);
 
-	if (flags & MIDX_WRITE_REV_INDEX)
+	if (flags & MIDX_WRITE_REV_INDEX &&
+	    git_env_bool("GIT_TEST_MIDX_WRITE_REV", 0))
 		write_midx_reverse_index(midx_name.buf, midx_hash, &ctx);
 	if (flags & MIDX_WRITE_BITMAP) {
 		if (write_midx_bitmap(midx_name.buf, midx_hash, &ctx,
diff --git a/midx.h b/midx.h
index b7d79a515c..22e8e53288 100644
--- a/midx.h
+++ b/midx.h
@@ -36,6 +36,7 @@ struct multi_pack_index {
 	const unsigned char *chunk_oid_lookup;
 	const unsigned char *chunk_object_offsets;
 	const unsigned char *chunk_large_offsets;
+	const unsigned char *chunk_revindex;
 
 	const char **pack_names;
 	struct packed_git **packs;
diff --git a/pack-revindex.c b/pack-revindex.c
index bd15ebad03..08dc160167 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -298,9 +298,26 @@ int load_midx_revindex(struct multi_pack_index *m)
 {
 	struct strbuf revindex_name = STRBUF_INIT;
 	int ret;
+
 	if (m->revindex_data)
 		return 0;
 
+	if (m->chunk_revindex) {
+		/*
+		 * If the MIDX `m` has a `RIDX` chunk, then use its contents for
+		 * the reverse index instead of trying to load a separate `.rev`
+		 * file.
+		 *
+		 * Note that we do *not* set `m->revindex_map` here, since we do
+		 * not want to accidentally call munmap() in the middle of the
+		 * MIDX.
+		 */
+		trace2_data_string("load_midx_revindex", the_repository,
+				   "source", "midx");
+		m->revindex_data = (const uint32_t *)m->chunk_revindex;
+		return 0;
+	}
+
 	trace2_data_string("load_midx_revindex", the_repository,
 			   "source", "rev");
 
diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
index 77b5f46a03..365d990ce3 100644
--- a/t/lib-bitmap.sh
+++ b/t/lib-bitmap.sh
@@ -290,7 +290,7 @@ test_rev_exists () {
 }
 
 midx_bitmap_core () {
-	rev_kind="${1:-rev}"
+	rev_kind="${1:-midx}"
 
 	setup_bitmap_history
 
@@ -434,7 +434,7 @@ midx_bitmap_core () {
 }
 
 midx_bitmap_partial_tests () {
-	rev_kind="${1:-rev}"
+	rev_kind="${1:-midx}"
 
 	test_expect_success 'setup partial bitmaps' '
 		test_commit packed &&
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 100ac90d15..c0924074c4 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -9,6 +9,12 @@ test_description='exercise basic multi-pack bitmap functionality'
 GIT_TEST_MULTI_PACK_INDEX=0
 GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
 
+# This test exercise multi-pack bitmap functionality where the object order is
+# stored and read from a special chunk within the MIDX, so use the default
+# behavior here.
+sane_unset GIT_TEST_MIDX_WRITE_REV
+sane_unset GIT_TEST_MIDX_READ_RIDX
+
 midx_bitmap_core
 
 bitmap_reuse_tests() {
diff --git a/t/t5327-multi-pack-bitmaps-rev.sh b/t/t5327-multi-pack-bitmaps-rev.sh
new file mode 100755
index 0000000000..d30ba632c8
--- /dev/null
+++ b/t/t5327-multi-pack-bitmaps-rev.sh
@@ -0,0 +1,23 @@
+#!/bin/sh
+
+test_description='exercise basic multi-pack bitmap functionality (.rev files)'
+
+. ./test-lib.sh
+. "${TEST_DIRECTORY}/lib-bitmap.sh"
+
+# We'll be writing our own midx and bitmaps, so avoid getting confused by the
+# automatic ones.
+GIT_TEST_MULTI_PACK_INDEX=0
+GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP=0
+
+# Unlike t5326, this test exercise multi-pack bitmap functionality where the
+# object order is stored in a separate .rev file.
+GIT_TEST_MIDX_WRITE_REV=1
+GIT_TEST_MIDX_READ_RIDX=0
+export GIT_TEST_MIDX_WRITE_REV
+export GIT_TEST_MIDX_READ_RIDX
+
+midx_bitmap_core rev
+midx_bitmap_partial_tests rev
+
+test_done
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 4693f8dc2b..02a6633a16 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -311,16 +311,13 @@ test_expect_success 'cleans up MIDX when appropriate' '
 		checksum=$(midx_checksum $objdir) &&
 		test_path_is_file $midx &&
 		test_path_is_file $midx-$checksum.bitmap &&
-		test_path_is_file $midx-$checksum.rev &&
 
 		test_commit repack-3 &&
 		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb --write-midx &&
 
 		test_path_is_file $midx &&
 		test_path_is_missing $midx-$checksum.bitmap &&
-		test_path_is_missing $midx-$checksum.rev &&
 		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
-		test_path_is_file $midx-$(midx_checksum $objdir).rev &&
 
 		test_commit repack-4 &&
 		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adb &&
@@ -353,7 +350,6 @@ test_expect_success '--write-midx with preferred bitmap tips' '
 		test_line_count = 1 before &&
 
 		rm -fr $midx-$(midx_checksum $objdir).bitmap &&
-		rm -fr $midx-$(midx_checksum $objdir).rev &&
 		rm -fr $midx &&
 
 		# instead of constructing the snapshot ourselves (c.f., the test
-- 
2.34.1.25.gb3157a20e6


^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH v3 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX
  2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
                     ` (7 preceding siblings ...)
  2022-01-04 18:16   ` [PATCH v3 8/9] midx: read `RIDX` chunk when present Taylor Blau
@ 2022-01-04 18:16   ` Taylor Blau
  2022-01-24 19:29     ` Jonathan Tan
  8 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2022-01-04 18:16 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee

When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or
open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs.
By design, these functions are supposed to be called over every pack and
MIDX, since only one of them should have a valid bitmap.

Ordinarily we return '0' from these two functions in order to indicate
that we successfully loaded a bitmap To signal that we couldn't load a
bitmap corresponding to the MIDX/pack (either because one doesn't exist,
or because there was an error with loading it), we can return '-1'. In
either case, the callers each enumerate all MIDXs/packs to ensure that
at most one bitmap per-kind is present.

But when we fail to load a bitmap that does exist (for example, loading
a MIDX bitmap without finding a corresponding reverse index), we'll
return -1 but leave the 'midx' field non-NULL. So when we fallback to
loading a pack bitmap, we'll complain that the bitmap we're trying to
populate already is "opened", even though it isn't.

Rectify this by setting the '->pack' and '->midx' field back to NULL as
appropriate. Two tests are added: one to ensure that the MIDX-to-pack
bitmap fallback works, and another to ensure we still complain when
there are multiple pack bitmaps in a repository.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 pack-bitmap.c                 |  4 ++++
 t/t5310-pack-bitmaps.sh       | 28 ++++++++++++++++++++++++++++
 t/t5326-multi-pack-bitmaps.sh | 19 +++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index f772d3cb7f..9c666cdb8b 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -358,7 +358,9 @@ static int open_midx_bitmap_1(struct bitmap_index *bitmap_git,
 cleanup:
 	munmap(bitmap_git->map, bitmap_git->map_size);
 	bitmap_git->map_size = 0;
+	bitmap_git->map_pos = 0;
 	bitmap_git->map = NULL;
+	bitmap_git->midx = NULL;
 	return -1;
 }
 
@@ -405,6 +407,8 @@ static int open_pack_bitmap_1(struct bitmap_index *bitmap_git, struct packed_git
 		munmap(bitmap_git->map, bitmap_git->map_size);
 		bitmap_git->map = NULL;
 		bitmap_git->map_size = 0;
+		bitmap_git->map_pos = 0;
+		bitmap_git->pack = NULL;
 		return -1;
 	}
 
diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
index d05ab716f6..f775fc1ce6 100755
--- a/t/t5310-pack-bitmaps.sh
+++ b/t/t5310-pack-bitmaps.sh
@@ -397,4 +397,32 @@ test_expect_success 'pack.preferBitmapTips' '
 	)
 '
 
+test_expect_success 'complains about multiple pack bitmaps' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+
+		git repack -adb &&
+		bitmap="$(ls .git/objects/pack/pack-*.bitmap)" &&
+		mv "$bitmap" "$bitmap.bak" &&
+
+		test_commit other &&
+		git repack -ab &&
+
+		mv "$bitmap.bak" "$bitmap" &&
+
+		find .git/objects/pack -type f -name "*.pack" >packs &&
+		find .git/objects/pack -type f -name "*.bitmap" >bitmaps &&
+		test_line_count = 2 packs &&
+		test_line_count = 2 bitmaps &&
+
+		git rev-list --use-bitmap-index HEAD 2>err &&
+		grep "ignoring extra bitmap file" err
+	)
+'
+
 test_done
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index c0924074c4..3c1ecc7e25 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -266,4 +266,23 @@ test_expect_success 'hash-cache values are propagated from pack bitmaps' '
 	)
 '
 
+test_expect_success 'graceful fallback when missing reverse index' '
+	rm -fr repo &&
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		test_commit base &&
+
+		# write a pack and MIDX bitmap containing base
+		git repack -adb &&
+		git multi-pack-index write --bitmap &&
+
+		GIT_TEST_MIDX_READ_RIDX=0 \
+			git rev-list --use-bitmap-index HEAD 2>err &&
+		! grep "ignoring extra bitmap file" err
+	)
+'
+
 test_done
-- 
2.34.1.25.gb3157a20e6

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/9] midx.c: make changing the preferred pack safe
  2022-01-04 18:15   ` [PATCH v3 2/9] midx.c: make changing the preferred pack safe Taylor Blau
@ 2022-01-14 21:35     ` Junio C Hamano
  2022-01-14 21:43       ` Junio C Hamano
  2022-01-20 18:08     ` Jonathan Tan
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2022-01-14 21:35 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, stolee

Taylor Blau <me@ttaylorr.com> writes:

> ... It's likely we were using
> finalize_object_file() instead of a pure rename() because the former
> also adjusts shared permissions.

I thought the primary reason why we use finalize was because we
ignore EEXIST (and the assumption is that the files with the same
contents get the same name computed from their contents).

>  	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
>  					midx_hash, WRITE_REV);
>  
> -	if (finalize_object_file(tmp_file, buf.buf))
> +	if (rename(tmp_file, buf.buf))
>  		die(_("cannot store reverse index file"));

Doesn't your new code die with it if buf.buf names an existing file?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/9] midx.c: make changing the preferred pack safe
  2022-01-14 21:35     ` Junio C Hamano
@ 2022-01-14 21:43       ` Junio C Hamano
  2022-01-15  0:59         ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2022-01-14 21:43 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, stolee

Junio C Hamano <gitster@pobox.com> writes:

> Taylor Blau <me@ttaylorr.com> writes:
>
>> ... It's likely we were using
>> finalize_object_file() instead of a pure rename() because the former
>> also adjusts shared permissions.
>
> I thought the primary reason why we use finalize was because we
> ignore EEXIST (and the assumption is that the files with the same
> contents get the same name computed from their contents).
>
>>  	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
>>  					midx_hash, WRITE_REV);
>>  
>> -	if (finalize_object_file(tmp_file, buf.buf))
>> +	if (rename(tmp_file, buf.buf))
>>  		die(_("cannot store reverse index file"));
>
> Doesn't your new code die with it if buf.buf names an existing file?

Ah, scratch that.  rename() discards the old one atomically, so as
long as tmp_file and buf.buf are in the same directory (which I
think it is in this case), we wouldn't be affected by the bug that
is worked around with "Coda hack" in finalize_object_file(), either.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/9] midx.c: make changing the preferred pack safe
  2022-01-14 21:43       ` Junio C Hamano
@ 2022-01-15  0:59         ` Taylor Blau
  2022-01-15  6:27           ` Junio C Hamano
  0 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2022-01-15  0:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, stolee

On Fri, Jan 14, 2022 at 01:43:55PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
>
> > Taylor Blau <me@ttaylorr.com> writes:
> >
> >> ... It's likely we were using
> >> finalize_object_file() instead of a pure rename() because the former
> >> also adjusts shared permissions.
> >
> > I thought the primary reason why we use finalize was because we
> > ignore EEXIST (and the assumption is that the files with the same
> > contents get the same name computed from their contents).
> >
> >>  	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
> >>  					midx_hash, WRITE_REV);
> >>
> >> -	if (finalize_object_file(tmp_file, buf.buf))
> >> +	if (rename(tmp_file, buf.buf))
> >>  		die(_("cannot store reverse index file"));
> >
> > Doesn't your new code die with it if buf.buf names an existing file?
>
> Ah, scratch that.  rename() discards the old one atomically, so as
> long as tmp_file and buf.buf are in the same directory (which I
> think it is in this case), we wouldn't be affected by the bug that
> is worked around with "Coda hack" in finalize_object_file(), either.

Exactly. In this case, we really did want to overwrite an existing .rev
file with the same name. That's because prior to this patch, we didn't
store the object order in the MIDX itself. That made it possible for us
to change the object order, but leave the MIDX checksum alone.

Then we'd keep the old .rev (with the old order, but the same name) in
place, if we used link(2) in order to try (and fail) to overwrite the
existing .rev with the new one.

Indeed, the temporary file created by write_rev_file_order() is written
in the pack directory, which is where the .rev file and MIDX will
ultimately live.

This code is going to be hidden behind a test knob in a few patches
anyway, but verifying all of this is good nonetheless (especially if you
just want to apply these first two patches into the v2.35 tree).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/9] midx.c: make changing the preferred pack safe
  2022-01-15  0:59         ` Taylor Blau
@ 2022-01-15  6:27           ` Junio C Hamano
  0 siblings, 0 replies; 59+ messages in thread
From: Junio C Hamano @ 2022-01-15  6:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, stolee

Taylor Blau <me@ttaylorr.com> writes:

> On Fri, Jan 14, 2022 at 01:43:55PM -0800, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>> > Taylor Blau <me@ttaylorr.com> writes:
>> >
>> >> ... It's likely we were using
>> >> finalize_object_file() instead of a pure rename() because the former
>> >> also adjusts shared permissions.
>> >
>> > I thought the primary reason why we use finalize was because we
>> > ignore EEXIST (and the assumption is that the files with the same
>> > contents get the same name computed from their contents).
>> >
>> >>  	tmp_file = write_rev_file_order(NULL, ctx->pack_order, ctx->entries_nr,
>> >>  					midx_hash, WRITE_REV);
>> >>
>> >> -	if (finalize_object_file(tmp_file, buf.buf))
>> >> +	if (rename(tmp_file, buf.buf))
>> >>  		die(_("cannot store reverse index file"));
>> >
>> > Doesn't your new code die with it if buf.buf names an existing file?
>>
>> Ah, scratch that.  rename() discards the old one atomically, so as
>> long as tmp_file and buf.buf are in the same directory (which I
>> think it is in this case), we wouldn't be affected by the bug that
>> is worked around with "Coda hack" in finalize_object_file(), either.
>
> Exactly. In this case, we really did want to overwrite an existing .rev
> file with the same name. That's because prior to this patch, we didn't
> store the object order in the MIDX itself. That made it possible for us
> to change the object order, but leave the MIDX checksum alone.

The other change in this step is addition of a new chunk type that
records the revindex data.  With that, you are guaranteeing that a
new file with the same checksum as an old one must have the
byte-for-byte identical contents, so at that point, it is OK to use
finalize_object_file() and not use rename() to keep the old file, no?

If that is the case, we may rather want to use f-o-f consistently
for stuff inside .git/ directory whose filename is tied with its
contents.  I do not think we want to chase a bug that comes from
difference between link-then-unlink vs rename, which affects loose
object files in one way and midx file in another way.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation
  2022-01-04 18:15   ` [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation Taylor Blau
@ 2022-01-20 17:55     ` Jonathan Tan
  2022-01-20 22:11       ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Jonathan Tan @ 2022-01-20 17:55 UTC (permalink / raw)
  To: me; +Cc: git, gitster, stolee, Jonathan Tan

Taylor Blau <me@ttaylorr.com> writes:
> This patch demonstrates a cause of bitmap corruption that can occur when
> the contents of the multi-pack index does not change, but the underlying
> object order does.
> 
> In this example, we have a MIDX containing two packs, each with a
> distinct set of objects (pack A corresponds to the tree, blob, and
> commit from the first patch, and pack B corresponds to the second
> patch).
> 
> First, a MIDX is written where the 'A' pack is preferred. As expected,
> the bitmaps generated there are in-tact. But then, we generate an
> identical MIDX with a different object order: this time preferring pack
> 'B'.
> 
> Due to a bug which will be explained and fixed in the following commit,
> the MIDX is updated, but the .rev file is not, causing the .bitmap file
> to be read incorrectly. Specifically, the .bitmap file will contain
> correct data, but the auxiliary object order in the .rev file is stale,
> causing readers to get confused by reading the new bitmaps using the old
> object order.

Thanks - overall, this looks like a bug that needs to be fixed.

For the benefit of other reviewers, here's my summary of the problem:
the .midx, .rev, and .bitmap files are almost always generated together,
and it is possible for two different invocations of Git to generate the
same .midx but a different .rev and .bitmap. For example, when
generating a .midx+.rev+.bitmap for 2 disjoint packfiles, the 1st time
with one packfile as preferred and the 2nd time with the other packfile
as preferred. In .midx, packfiles are always ordered by lexicographical
order of their names, and the preferred status only matters when an
object is in multiple packfiles (which never happens in this case, since
the packfiles are disjoint). But the preferred status affects .rev and
.bitmap, because they use a concept called "pseudo-pack order" (see
pack-format.txt for more details) in which the preferred pack comes
first.

As an effort to ensure that Git reads coherent .midx, .rev, and .bitmap
files, both the .rev and .bitmap files are keyed on the checksum of the
.midx file. But the issue here is that a .rev and a .bitmap could both
refer to the same .midx checksum when the .rev and .bitmap files are not
coherent with respect to each other (e.g. when a Git process has written
the .rev, but not the .bitmap yet - but this would appear perfectly
ordinary to another concurrently running Git process, since the .midx
checksum in the .rev and .bitmap files match).

This problem is exacerbated by the fact that the .rev has its .midx
checksum in its filename, whereas the .bitmap has its .midx checksum in
its file contents. When generating .midx+.rev+.bitmap, it would write
the .bitmap but not the .rev, since a .rev of the same filename already
exists.

The solution is to embed the .rev in the .midx. This means that the
checksum stored in .bitmap takes into account the contents of what would
have been in .rev, solving the coherency issue. (There are other
solutions like storing the name of the preferred pack in .midx, but I
think that putting the contents of .rev in the .midx is best.)

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/9] midx.c: make changing the preferred pack safe
  2022-01-04 18:15   ` [PATCH v3 2/9] midx.c: make changing the preferred pack safe Taylor Blau
  2022-01-14 21:35     ` Junio C Hamano
@ 2022-01-20 18:08     ` Jonathan Tan
  2022-01-20 22:13       ` Taylor Blau
  1 sibling, 1 reply; 59+ messages in thread
From: Jonathan Tan @ 2022-01-20 18:08 UTC (permalink / raw)
  To: me; +Cc: git, gitster, stolee, Jonathan Tan

Taylor Blau <me@ttaylorr.com> writes:
> The previous patch demonstrates a bug where a MIDX's auxiliary object
> order can become out of sync with a MIDX bitmap.
> 
> This is because of two confounding factors:
> 
>   - First, the object order is stored in a file which is named according
>     to the multi-pack index's checksum, and the MIDX does not store the
>     object order. This means that the object order can change without
>     altering the checksum.
> 
>   - But the .rev file is moved into place with finalize_object_file(),
>     which link(2)'s the file into place instead of renaming it. For us,
>     that means that a modified .rev file will not be moved into place if
>     MIDX's checksum was unchanged.
> 
> The fix here is two-fold. First, we need to stop linking the file into
> place and instead rename it. It's likely we were using
> finalize_object_file() instead of a pure rename() because the former
> also adjusts shared permissions. But that is unnecessary, because we
> already do so in write_rev_file_order(), so rename alone is safe.
> 
> But we also need to make the MIDX's checksum change in some way when the
> preferred pack changes without altering the set of packs stored in a
> MIDX to prevent a race where the new .rev file is moved into place
> before the MIDX is updated. Here, you'd get the opposite effect: reading
> old bitmaps with the new object order.

I think the main issue is the first confounding factor you listed above:
even if we didn't have the other confounding factor, that issue alone is
enough to motivate the entire patch set. Likewise, as Junio said [1], I
don't think we need to switch to rename() if we make the checksum
different, so the fix is one-fold, not two-fold. For what it's worth, I
switched back to finalize_object_file() and ran the tests, and they all
pass.

So I would simplify the commit message to just talk about the checksum
issue. (This is definitely not a blocker for merging, though - others
might find the additional context helpful.)

The code up to this patch (apart from the rename()) looks good.

[1] https://lore.kernel.org/git/xmqqtue54iop.fsf@gitster.g/

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index
  2022-01-04 18:15   ` [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
@ 2022-01-20 18:15     ` Jonathan Tan
  2022-01-20 22:18       ` Taylor Blau
  0 siblings, 1 reply; 59+ messages in thread
From: Jonathan Tan @ 2022-01-20 18:15 UTC (permalink / raw)
  To: me; +Cc: git, gitster, stolee, Jonathan Tan

Taylor Blau <me@ttaylorr.com> writes:
> In a subsequent commit, we'll use the MIDX's new 'RIDX' chunk as a
> source for the reverse index's data. But it will be useful for tests to
> be able to determine whether the reverse index was loaded from the
> separate .rev file, or from a chunk within the MIDX.

As for this patch and all subsequent patches, we (at $DAYJOB) discussed
during our Review Club the idea of not supporting .rev files altogether.
Firstly, because of this bug, we cannot fully trust .rev files anymore.
And even if we decided to trust it (or to trust it after some
verification step that I can't think of), that file would only be useful
for a short time until a regularly scheduled maintenance step
regenerates the bitmaps anyway.

That would also simplify the code slightly and eliminate the need to test a
variety of cases - so we probably wouldn't need the t/lib-bitmap.sh file
introduced in patch 6. I'll hold off on reviewing the rest of the
patches for now, since they may substantially change if we decide to
stop supporting .rev files.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation
  2022-01-20 17:55     ` Jonathan Tan
@ 2022-01-20 22:11       ` Taylor Blau
  2022-01-20 22:41         ` Junio C Hamano
  2022-01-24 17:40         ` Jonathan Tan
  0 siblings, 2 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-20 22:11 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, stolee

On Thu, Jan 20, 2022 at 09:55:41AM -0800, Jonathan Tan wrote:
> As an effort to ensure that Git reads coherent .midx, .rev, and .bitmap
> files, both the .rev and .bitmap files are keyed on the checksum of the
> .midx file. But the issue here is that a .rev and a .bitmap could both
> refer to the same .midx checksum when the .rev and .bitmap files are not
> coherent with respect to each other (e.g. when a Git process has written
> the .rev, but not the .bitmap yet - but this would appear perfectly
> ordinary to another concurrently running Git process, since the .midx
> checksum in the .rev and .bitmap files match).

Kind of, and it's possible that we're saying the same thing here using
different words.

But seeing an out-of-sync .bitmap and .rev is really a symptom of the
underlying problem, which is that the MIDX checksum could (prior to
these patches) be unchanged even when the object order it represents
_does_ change (e.g., in the case of swapping the preferred pack around
like the test here demonstrates).

> This problem is exacerbated by the fact that the .rev has its .midx
> checksum in its filename, whereas the .bitmap has its .midx checksum in
> its file contents. When generating .midx+.rev+.bitmap, it would write
> the .bitmap but not the .rev, since a .rev of the same filename already
> exists.

This isn't quite right: both have the MIDX's checksum in their filename.
For example after running `git repack --write-midx --write-bitmap-index`
on a random repository, I get these MIDX-related files:

    $ find .git/objects/pack -type f -name 'multi-pack-index*'
    .git/objects/pack/multi-pack-index-fd71600b4ceb4caf23a538c7829b9284f2b66d73.rev
    .git/objects/pack/multi-pack-index-fd71600b4ceb4caf23a538c7829b9284f2b66d73.bitmap
    .git/objects/pack/multi-pack-index

Before these patches, it was possible for the MIDX's object order to
change but for its checksum to remain the same. The problem here is that
we rename(2)'d the .bitmap into place, but only link(2)'d the .rev into
place.

So one of the two was updated, and the other was left behind. That does
make them incoherent with respect to each other; but I find it more
useful to think of it as the .rev being out-of-sync with the MIDX.

> The solution is to embed the .rev in the .midx. This means that the
> checksum stored in .bitmap takes into account the contents of what would
> have been in .rev, solving the coherency issue. (There are other
> solutions like storing the name of the preferred pack in .midx, but I
> think that putting the contents of .rev in the .midx is best.)

Right. The problem being that it's possible to change the MIDX's
object order without changing its checksum. Since the .rev file's
contents encodes the pseudo-pack order, embedding it into the MIDX is
sufficient to guarantee that the MIDX's checksum changes whenever the
object order does.

Apologies if that all exactly matched up with your understanding, and I
was just telling you stuff that you already knew. But I figure that this
bug is subtle enough that a little bit of hair-splitting doesn't hurt
;).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 2/9] midx.c: make changing the preferred pack safe
  2022-01-20 18:08     ` Jonathan Tan
@ 2022-01-20 22:13       ` Taylor Blau
  0 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-20 22:13 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, stolee

On Thu, Jan 20, 2022 at 10:08:43AM -0800, Jonathan Tan wrote:
> I think the main issue is the first confounding factor you listed above:
> even if we didn't have the other confounding factor, that issue alone is
> enough to motivate the entire patch set. Likewise, as Junio said [1], I
> don't think we need to switch to rename() if we make the checksum
> different, so the fix is one-fold, not two-fold. For what it's worth, I
> switched back to finalize_object_file() and ran the tests, and they all
> pass.

Yeah, I agree with both of you and would rather use
finalize_object_file() here and be consistent with other callers that
modify $GIT_DIR/objects.

It is safe to do, since the .rev file's name will necessarily be
different if the MIDX's object order changes.

> [1] https://lore.kernel.org/git/xmqqtue54iop.fsf@gitster.g/

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index
  2022-01-20 18:15     ` Jonathan Tan
@ 2022-01-20 22:18       ` Taylor Blau
  2022-01-24 17:53         ` Jonathan Tan
  0 siblings, 1 reply; 59+ messages in thread
From: Taylor Blau @ 2022-01-20 22:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: git, gitster, stolee

On Thu, Jan 20, 2022 at 10:15:45AM -0800, Jonathan Tan wrote:
> Taylor Blau <me@ttaylorr.com> writes:
> > In a subsequent commit, we'll use the MIDX's new 'RIDX' chunk as a
> > source for the reverse index's data. But it will be useful for tests to
> > be able to determine whether the reverse index was loaded from the
> > separate .rev file, or from a chunk within the MIDX.
>
> As for this patch and all subsequent patches, we (at $DAYJOB) discussed
> during our Review Club the idea of not supporting .rev files altogether.
> Firstly, because of this bug, we cannot fully trust .rev files anymore.
> And even if we decided to trust it (or to trust it after some
> verification step that I can't think of), that file would only be useful
> for a short time until a regularly scheduled maintenance step
> regenerates the bitmaps anyway.

I assume that you're talking about MIDX .rev files here, not the usual
packfile ones which are only related in the sense that they follow the
same file format, but are otherwise distinct.

We could drop support for the MIDX .rev files, but I do not think it is
a very good idea. There are tagged versions of Git which have .rev files
out in the wild, so suddenly dropping support for them would mean that
existing repositories would no longer be able to read their multi-pack
bitmaps after upgrading to a version of Git which doesn't include
support for separate .rev files.

So there are some backwards compatibility concerns there, and I think
that that alone means we can't do it.

In order to maintain good test coverage of the soon-to-be deprecated
external MIDX .rev files, this patch series adds some additional testing
knobs to make sure that we're thoroughly exercising both cases.

It's somewhat unfortunate that we are stuck with on-disk .rev files for
MIDXs, since it opened the door for this bug to manifest itself. But I
am not comfortable getting rid of them at this point.

In any case, the .rev file's contents are moving to the MIDX, and by the
end of this series we do not write an external .rev file when creating a
multi-pack bitmap, so you are more than free to not trust any existing
.rev files anymore.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation
  2022-01-20 22:11       ` Taylor Blau
@ 2022-01-20 22:41         ` Junio C Hamano
  2022-01-20 22:46           ` Taylor Blau
  2022-01-24 17:40         ` Jonathan Tan
  1 sibling, 1 reply; 59+ messages in thread
From: Junio C Hamano @ 2022-01-20 22:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jonathan Tan, git, stolee

Taylor Blau <me@ttaylorr.com> writes:

> Right. The problem being that it's possible to change the MIDX's
> object order without changing its checksum. Since the .rev file's
> contents encodes the pseudo-pack order, embedding it into the MIDX is
> sufficient to guarantee that the MIDX's checksum changes whenever the
> object order does.

To slightly rephrase,

    The problem being that before the change to embed reverse table in
    MIDX, it was possible to change the object order without changing
    the checksum, but with that change alone, if a rebuilt MIDX has the
    same checksum, it is guaranteed that it records the same objects in
    the same order.

Hence, shuffling between rename(2) and link(2), which was the code
change that is only necessary to force updating the existing MIDX
with newly recreated one because two MIDX with the same checksum
could record objects in different orders, is no longer necessary.

Am I on the same page as you two?

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation
  2022-01-20 22:41         ` Junio C Hamano
@ 2022-01-20 22:46           ` Taylor Blau
  0 siblings, 0 replies; 59+ messages in thread
From: Taylor Blau @ 2022-01-20 22:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Taylor Blau, Jonathan Tan, git, stolee

On Thu, Jan 20, 2022 at 02:41:47PM -0800, Junio C Hamano wrote:
> Am I on the same page as you two?

Yes, exactly.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation
  2022-01-20 22:11       ` Taylor Blau
  2022-01-20 22:41         ` Junio C Hamano
@ 2022-01-24 17:40         ` Jonathan Tan
  1 sibling, 0 replies; 59+ messages in thread
From: Jonathan Tan @ 2022-01-24 17:40 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, gitster, stolee

Taylor Blau <me@ttaylorr.com> writes:
> On Thu, Jan 20, 2022 at 09:55:41AM -0800, Jonathan Tan wrote:
> > As an effort to ensure that Git reads coherent .midx, .rev, and .bitmap
> > files, both the .rev and .bitmap files are keyed on the checksum of the
> > .midx file. But the issue here is that a .rev and a .bitmap could both
> > refer to the same .midx checksum when the .rev and .bitmap files are not
> > coherent with respect to each other (e.g. when a Git process has written
> > the .rev, but not the .bitmap yet - but this would appear perfectly
> > ordinary to another concurrently running Git process, since the .midx
> > checksum in the .rev and .bitmap files match).
> 
> Kind of, and it's possible that we're saying the same thing here using
> different words.
> 
> But seeing an out-of-sync .bitmap and .rev is really a symptom of the
> underlying problem, which is that the MIDX checksum could (prior to
> these patches) be unchanged even when the object order it represents
> _does_ change (e.g., in the case of swapping the preferred pack around
> like the test here demonstrates).

Yeah - I think the way you're describing it is that the concept of
preferred packfile is inherent to the .midx, whereas I just think of it
as an input that helps the algorithm choose between the many .midx
possible given a set of packfiles. Maybe these are just two different
ways of looking at the same thing.

> > This problem is exacerbated by the fact that the .rev has its .midx
> > checksum in its filename, whereas the .bitmap has its .midx checksum in
> > its file contents. When generating .midx+.rev+.bitmap, it would write
> > the .bitmap but not the .rev, since a .rev of the same filename already
> > exists.
> 
> This isn't quite right: both have the MIDX's checksum in their filename.
> For example after running `git repack --write-midx --write-bitmap-index`
> on a random repository, I get these MIDX-related files:
> 
>     $ find .git/objects/pack -type f -name 'multi-pack-index*'
>     .git/objects/pack/multi-pack-index-fd71600b4ceb4caf23a538c7829b9284f2b66d73.rev
>     .git/objects/pack/multi-pack-index-fd71600b4ceb4caf23a538c7829b9284f2b66d73.bitmap
>     .git/objects/pack/multi-pack-index

Ah, thanks for the clarification. I saw that bitmaps were overridden
correctly and saw BITMAP_OPT_HASH_CACHE in the bitmap-format
documentation, and made an incorrect assumption.

> Apologies if that all exactly matched up with your understanding, and I
> was just telling you stuff that you already knew. But I figure that this
> bug is subtle enough that a little bit of hair-splitting doesn't hurt
> ;).

Thanks for all the clarification, especially about the bitmap.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index
  2022-01-20 22:18       ` Taylor Blau
@ 2022-01-24 17:53         ` Jonathan Tan
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Tan @ 2022-01-24 17:53 UTC (permalink / raw)
  To: me; +Cc: jonathantanmy, git, gitster, stolee

Taylor Blau <me@ttaylorr.com> writes:
> On Thu, Jan 20, 2022 at 10:15:45AM -0800, Jonathan Tan wrote:
> > Taylor Blau <me@ttaylorr.com> writes:
> > > In a subsequent commit, we'll use the MIDX's new 'RIDX' chunk as a
> > > source for the reverse index's data. But it will be useful for tests to
> > > be able to determine whether the reverse index was loaded from the
> > > separate .rev file, or from a chunk within the MIDX.
> >
> > As for this patch and all subsequent patches, we (at $DAYJOB) discussed
> > during our Review Club the idea of not supporting .rev files altogether.
> > Firstly, because of this bug, we cannot fully trust .rev files anymore.
> > And even if we decided to trust it (or to trust it after some
> > verification step that I can't think of), that file would only be useful
> > for a short time until a regularly scheduled maintenance step
> > regenerates the bitmaps anyway.
> 
> I assume that you're talking about MIDX .rev files here, not the usual
> packfile ones which are only related in the sense that they follow the
> same file format, but are otherwise distinct.

Ah, yes - sorry for the lack of clarity.

> We could drop support for the MIDX .rev files, but I do not think it is
> a very good idea. There are tagged versions of Git which have .rev files
> out in the wild, so suddenly dropping support for them would mean that
> existing repositories would no longer be able to read their multi-pack
> bitmaps after upgrading to a version of Git which doesn't include
> support for separate .rev files.
> 
> So there are some backwards compatibility concerns there, and I think
> that that alone means we can't do it.
> 
> In order to maintain good test coverage of the soon-to-be deprecated
> external MIDX .rev files, this patch series adds some additional testing
> knobs to make sure that we're thoroughly exercising both cases.
> 
> It's somewhat unfortunate that we are stuck with on-disk .rev files for
> MIDXs, since it opened the door for this bug to manifest itself. But I
> am not comfortable getting rid of them at this point.

Thanks for this explanation of your viewpoint. I still think that it's
fine to drop support for reading something that a tagged and released
version of Git has created, since it is a potentially wrong cache (and
does not contain any new information), but what I can do is to review
the rest of this patch set assuming that we want to keep support for the
on-disk MIDX .rev files, so I'll do that. This way, if we decide to keep
support for these files, I would be happy to see this patch set merged.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source
  2022-01-04 18:15   ` [PATCH v3 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
@ 2022-01-24 19:15     ` Jonathan Tan
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Tan @ 2022-01-24 19:15 UTC (permalink / raw)
  To: me; +Cc: git, gitster, stolee, Jonathan Tan

Taylor Blau <me@ttaylorr.com> writes:
> To prepare for reading the reverse index data out of the MIDX itself,
> teach the `test_rev_exists` function to take an expected "source" for
> the reverse index data.

Thanks - up to here looks good. Thanks especially for patch 6, which was easy
to verify using "--color-moved --color-moved-ws=allow-indentation-change".

> diff --git a/t/lib-bitmap.sh b/t/lib-bitmap.sh
> index 48a8730a13..77b5f46a03 100644
> --- a/t/lib-bitmap.sh
> +++ b/t/lib-bitmap.sh
> @@ -275,17 +275,23 @@ midx_pack_source () {
>  
>  test_rev_exists () {
>  	commit="$1"
> +	kind="$2"
>  
>  	test_expect_success 'reverse index exists' '

To make it easier to understand test failures, we should probably
include "kind" in the name of the test case.

In a separate commit after this one, we should do it for the other
blocks.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 8/9] midx: read `RIDX` chunk when present
  2022-01-04 18:16   ` [PATCH v3 8/9] midx: read `RIDX` chunk when present Taylor Blau
@ 2022-01-24 19:27     ` Jonathan Tan
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Tan @ 2022-01-24 19:27 UTC (permalink / raw)
  To: me; +Cc: git, gitster, stolee, Jonathan Tan

Taylor Blau <me@ttaylorr.com> writes:
> When a MIDX contains the new `RIDX` chunk, ensure that the reverse index
> is read from it instead of the on-disk .rev file. Since we need to
> encode the object order in the MIDX itself for correctness reasons,
> there is no point in storing the same data again outside of the MIDX.
> 
> So, this patch stops writing separate .rev files, and reads it out of
> the MIDX itself. This is possible to do with relatively little new code,
> since the format of the RIDX chunk is identical to the data in the .rev
> file. In other words, we can implement this by pointing the
> `revindex_data` field at the reverse index chunk of the MIDX instead of
> the .rev file without any other changes.

Ah, that's great.

> Note that we have two knobs that are adjusted for the new tests:
> GIT_TEST_MIDX_WRITE_REV and GIT_TEST_MIDX_READ_RIDX. The former controls
> whether the MIDX .rev is written at all, and the latter controls whether
> we read the MIDX's RIDX chunk.
> 
> Both are necessary to ensure that the test added at the beginning of
> this series continues to work. This is because we always need to write
> the RIDX chunk in the MIDX in order to change its checksum, but we want
> to make sure reading the existing .rev file still works (since the RIDX
> chunk takes precedence by default).

Could we disable the beginning-of-this-series test when the MIDX .rev is
not written instead? Then, we can test what the user would actually
experience (no reverse index in .midx, reverse index in .rev) instead of
simulating it by skipping over the reverse index section in .midx.

If we can't do that, then I would agree that the approach in this patch
seems like the best approach.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH v3 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX
  2022-01-04 18:16   ` [PATCH v3 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX Taylor Blau
@ 2022-01-24 19:29     ` Jonathan Tan
  0 siblings, 0 replies; 59+ messages in thread
From: Jonathan Tan @ 2022-01-24 19:29 UTC (permalink / raw)
  To: me; +Cc: git, gitster, stolee, Jonathan Tan

Taylor Blau <me@ttaylorr.com> writes:
> When opening a MIDX/pack-bitmap, we call open_midx_bitmap_1() or
> open_pack_bitmap_1() respectively in a loop over the set of MIDXs/packs.
> By design, these functions are supposed to be called over every pack and
> MIDX, since only one of them should have a valid bitmap.

Thanks for this series. Apart from my minor comments on patches 7 and 8,
this series looks good.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2022-01-24 20:18 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-08 19:26 [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Taylor Blau
2021-12-08 19:26 ` [PATCH 1/2] t5326: demonstrate bitmap corruption after permutation Taylor Blau
2021-12-08 19:26 ` [PATCH 2/2] midx.c: make changing the preferred pack safe Taylor Blau
2021-12-08 19:30 ` [PATCH 0/2] midx: prevent bitmap corruption when permuting pack order Derrick Stolee
2021-12-08 19:55   ` Jeff King
2021-12-10 18:36     ` Taylor Blau
2021-12-10 22:31       ` Taylor Blau
2021-12-11  1:39         ` Taylor Blau
2021-12-13 14:00           ` Derrick Stolee
2021-12-13 14:31             ` Taylor Blau
2021-12-14  1:55 ` [PATCH v2 0/8] " Taylor Blau
2021-12-14  1:55   ` [PATCH v2 1/8] t5326: demonstrate bitmap corruption after permutation Taylor Blau
2021-12-14  1:55   ` [PATCH v2 2/8] midx.c: make changing the preferred pack safe Taylor Blau
2021-12-14  1:55   ` [PATCH v2 3/8] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
2021-12-14  1:55   ` [PATCH v2 4/8] t5326: drop unnecessary setup Taylor Blau
2021-12-14  1:55   ` [PATCH v2 5/8] t5326: extract `test_rev_exists` Taylor Blau
2021-12-20 18:33     ` Derrick Stolee
2022-01-04 15:33       ` Taylor Blau
2021-12-14  1:55   ` [PATCH v2 6/8] t5326: move tests to t/lib-bitmap.sh Taylor Blau
2021-12-14  1:55   ` [PATCH v2 7/8] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
2021-12-14  1:55   ` [PATCH v2 8/8] midx: read `RIDX` chunk when present Taylor Blau
2021-12-20 18:42     ` Derrick Stolee
2022-01-04 15:21       ` Taylor Blau
2021-12-15 19:46   ` [PATCH v2 0/8] midx: prevent bitmap corruption when permuting pack order Junio C Hamano
2021-12-15 21:37     ` Taylor Blau
2021-12-15 22:17       ` Junio C Hamano
2021-12-15 22:55         ` Junio C Hamano
2021-12-20 18:51     ` Derrick Stolee
2021-12-20 19:52       ` Taylor Blau
2021-12-20 20:09         ` Derrick Stolee
2021-12-15 22:58   ` Junio C Hamano
2021-12-15 23:01     ` Taylor Blau
2022-01-04 18:15 ` [PATCH v3 0/9] " Taylor Blau
2022-01-04 18:15   ` [PATCH v3 1/9] t5326: demonstrate bitmap corruption after permutation Taylor Blau
2022-01-20 17:55     ` Jonathan Tan
2022-01-20 22:11       ` Taylor Blau
2022-01-20 22:41         ` Junio C Hamano
2022-01-20 22:46           ` Taylor Blau
2022-01-24 17:40         ` Jonathan Tan
2022-01-04 18:15   ` [PATCH v3 2/9] midx.c: make changing the preferred pack safe Taylor Blau
2022-01-14 21:35     ` Junio C Hamano
2022-01-14 21:43       ` Junio C Hamano
2022-01-15  0:59         ` Taylor Blau
2022-01-15  6:27           ` Junio C Hamano
2022-01-20 18:08     ` Jonathan Tan
2022-01-20 22:13       ` Taylor Blau
2022-01-04 18:15   ` [PATCH v3 3/9] pack-revindex.c: instrument loading on-disk reverse index Taylor Blau
2022-01-20 18:15     ` Jonathan Tan
2022-01-20 22:18       ` Taylor Blau
2022-01-24 17:53         ` Jonathan Tan
2022-01-04 18:15   ` [PATCH v3 4/9] t5326: drop unnecessary setup Taylor Blau
2022-01-04 18:15   ` [PATCH v3 5/9] t5326: extract `test_rev_exists` Taylor Blau
2022-01-04 18:15   ` [PATCH v3 6/9] t5326: move tests to t/lib-bitmap.sh Taylor Blau
2022-01-04 18:15   ` [PATCH v3 7/9] t/lib-bitmap.sh: parameterize tests over reverse index source Taylor Blau
2022-01-24 19:15     ` Jonathan Tan
2022-01-04 18:16   ` [PATCH v3 8/9] midx: read `RIDX` chunk when present Taylor Blau
2022-01-24 19:27     ` Jonathan Tan
2022-01-04 18:16   ` [PATCH v3 9/9] pack-bitmap.c: gracefully fallback after opening pack/MIDX Taylor Blau
2022-01-24 19:29     ` Jonathan Tan

Code repositories for project(s) associated with this inbox:

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).