git@vger.kernel.org mailing list mirror (one of many)
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Taylor Blau <me@ttaylorr.com>
Subject: [PATCH] pack-bitmap: gracefully handle missing BTMP chunks
Date: Tue, 9 Apr 2024 07:59:25 +0200
Message-ID: <5933a302b581670183a6f3c881f62e96f61ff192.1712642313.git.ps@pks.im>


In 0fea6b73f1 (Merge branch 'tb/multi-pack-verbatim-reuse', 2024-01-12)
we introduced multi-pack verbatim reuse of objects. That series added a
new BTMP chunk, which encodes information about bitmapped objects in the
multi-pack index. Starting with dab60934e3 (pack-bitmap: pass
`bitmapped_pack` struct to pack-reuse functions, 2023-12-14) we use this
information to figure out which objects we can reuse from each of the
packfiles.

One thing that we glossed over, though, is backwards compatibility with
repositories that do not yet have BTMP chunks in their multi-pack index.
In that case, `nth_bitmapped_pack()` returns an error, which results in
an error message followed by a warning. Both are visible to users who
fetch from such a repository:

```
$ git fetch
...
remote: error: MIDX does not contain the BTMP chunk
remote: warning: unable to load pack: 'pack-f6bb7bd71d345ea9fe604b60cab9ba9ece54ffbe.idx', disabling pack-reuse
remote: Enumerating objects: 40, done.
remote: Counting objects: 100% (40/40), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 40 (delta 5), reused 0 (delta 0), pack-reused 0 (from 0)
...
```

While the fetch succeeds, the user is left wondering what they did wrong.
Furthermore, as both the warning and the reuse stats show, pack-reuse is
completely disabled in such repositories.

Interestingly, this issue can be triggered even when
`pack.allowPackReuse=single` is set, which is the default value. One
might expect that in this case we fall back to the old logic, which uses
the preferred packfile without consulting BTMP chunks at all. Instead,
we either fail with the above error if the chunks are missing, or we use
the first pack in the multi-pack index. The former case disables
pack-reuse altogether, whereas the latter may result in reusing objects
from a suboptimal packfile.

Fix this issue by partially reverting the logic to what we had before
this patch series landed. Namely, when there are no BTMP chunks or when
`pack.allowPackReuse=single` is set, we use the preferred pack instead of
consulting the BTMP chunks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c                        |  7 ++++---
 pack-bitmap.c                 | 36 ++++++++++++++++++-----------------
 t/t5326-multi-pack-bitmaps.sh | 22 +++++++++++++++++++++
 3 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/midx.c b/midx.c
index 41521e019c..6903e9dfd2 100644
--- a/midx.c
+++ b/midx.c
@@ -1661,9 +1661,10 @@ static int write_midx_internal(const char *object_dir,
 		add_chunk(cf, MIDX_CHUNKID_REVINDEX,
 			  st_mult(ctx.entries_nr, sizeof(uint32_t)),
 			  write_midx_revindex);
-		add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
-			  bitmapped_packs_concat_len,
-			  write_midx_bitmapped_packs);
+		if (git_env_bool("GIT_TEST_MIDX_WRITE_BTMP", 1))
+			add_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
+				  bitmapped_packs_concat_len,
+				  write_midx_bitmapped_packs);
 	}
 
 	write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 2baeabacee..f286805724 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -2049,7 +2049,25 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 
 	load_reverse_index(r, bitmap_git);
 
-	if (bitmap_is_midx(bitmap_git)) {
+	if (bitmap_is_midx(bitmap_git) &&
+	    (!multi_pack_reuse || !bitmap_git->midx->chunk_bitmapped_packs)) {
+		uint32_t preferred_pack_pos;
+		struct packed_git *pack;
+
+		if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
+			warning(_("unable to compute preferred pack, disabling pack-reuse"));
+			return;
+		}
+
+		pack = bitmap_git->midx->packs[preferred_pack_pos];
+
+		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
+		packs[packs_nr].p = pack;
+		packs[packs_nr].bitmap_nr = pack->num_objects;
+		packs[packs_nr].bitmap_pos = 0;
+
+		objects_nr = packs[packs_nr++].bitmap_nr;
+	} else if (bitmap_is_midx(bitmap_git)) {
 		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
 			struct bitmapped_pack pack;
 			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
@@ -2062,26 +2080,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
 			if (!pack.bitmap_nr)
 				continue;
 
-			if (!multi_pack_reuse && pack.bitmap_pos) {
-				/*
-				 * If we're only reusing a single pack, skip
-				 * over any packs which are not positioned at
-				 * the beginning of the MIDX bitmap.
-				 *
-				 * This is consistent with the existing
-				 * single-pack reuse behavior, which only reuses
-				 * parts of the MIDX's preferred pack.
-				 */
-				continue;
-			}
-
 			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
 			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
 
 			objects_nr += pack.p->num_objects;
-
-			if (!multi_pack_reuse)
-				break;
 		}
 
 		QSORT(packs, packs_nr, bitmapped_pack_cmp);
diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
index 70d1b58709..ee3843b239 100755
--- a/t/t5326-multi-pack-bitmaps.sh
+++ b/t/t5326-multi-pack-bitmaps.sh
@@ -513,4 +513,26 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
 	)
 '
 
+for allow_pack_reuse in single multi
+do
+	test_expect_success "MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
+		test_when_finished "rm -rf midx-without-btmp" &&
+		git init midx-without-btmp &&
+		(
+			cd midx-without-btmp &&
+			test_commit initial &&
+
+			# Write a multi-pack index that does have a bitmap, but
+			# no BTMP chunk. Such MIDX files would not be generated
+			# by modern Git anymore, but they were generated by
+			# older Git versions.
+			GIT_TEST_MIDX_WRITE_BTMP=false \
+				git repack -Adbl --write-bitmap-index --write-midx &&
+			git -c pack.allowPackReuse=$allow_pack_reuse \
+				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&
+			test_must_be_empty err
+		)
+	'
+done
+
 test_done
-- 
2.44.GIT

Thread overview: 13+ messages
2024-04-09  5:59 Patrick Steinhardt [this message]
2024-04-10 15:02 ` [PATCH] pack-bitmap: gracefully handle missing BTMP chunks Taylor Blau
2024-04-15  6:34   ` Patrick Steinhardt
2024-04-15 22:42     ` Taylor Blau
2024-04-15  6:41 ` [PATCH v2] " Patrick Steinhardt
2024-04-15  8:51   ` Patrick Steinhardt
2024-04-15 17:41     ` Junio C Hamano
2024-04-15 22:51       ` Taylor Blau
2024-04-15 23:46         ` Junio C Hamano
2024-04-15 22:51   ` Taylor Blau
2024-04-16  4:47     ` Patrick Steinhardt
2024-04-16  5:12       ` Jeff King
2024-04-16  5:14         ` Patrick Steinhardt