From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2] pack-bitmap: gracefully handle missing BTMP chunks
Date: Mon, 15 Apr 2024 18:51:16 -0400
Message-ID: <Zh2vZB/60QlLYjUZ@nand.local>
In-Reply-To: <a8251f8278ba9a3b41a8e299cb4918a62df6d1c7.1713163238.git.ps@pks.im>

On Mon, Apr 15, 2024 at 08:41:25AM +0200, Patrick Steinhardt wrote:
> diff --git a/midx.c b/midx.c
> index ae3b49166c..6f07de3688 100644
> --- a/midx.c
> +++ b/midx.c
> @@ -170,9 +170,10 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
>
>  	pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets,
>  		   &m->chunk_large_offsets_len);
> -	pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> -		   (const unsigned char **)&m->chunk_bitmapped_packs,
> -		   &m->chunk_bitmapped_packs_len);
> +	if (git_env_bool("GIT_TEST_MIDX_READ_BTMP", 1))
> +		pair_chunk(cf, MIDX_CHUNKID_BITMAPPEDPACKS,
> +			   (const unsigned char **)&m->chunk_bitmapped_packs,
> +			   &m->chunk_bitmapped_packs_len);

OK, so we're switching to a new GIT_TEST_-variable here, which controls
whether or not we read the BTMP chunk. That makes sense, and is much
appreciated :-).

> diff --git a/pack-bitmap.c b/pack-bitmap.c
> index 2baeabacee..35c5ef9d3c 100644
> --- a/pack-bitmap.c
> +++ b/pack-bitmap.c
> @@ -2049,7 +2049,10 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>
>  	load_reverse_index(r, bitmap_git);
>
> -	if (bitmap_is_midx(bitmap_git)) {
> +	if (!bitmap_is_midx(bitmap_git) || !bitmap_git->midx->chunk_bitmapped_packs)
> +		multi_pack_reuse = 0;
> +

Either we don't have a MIDX, or we do, but it doesn't have a BTMP chunk.
In either case, we should disable multi-pack reuse and fall back to a
single pack: either the pack corresponding to a classic pack-bitmap, or
the preferred pack when using a MIDX bitmap written prior to the BTMP
chunk.

Looking good.
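
To spell out that condition for anyone following along, here is a toy
model of it. The struct and function names below are made up for
illustration and are not Git's real types; only the NULL checks mirror
the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real structures; only the fields needed here. */
struct sketch_midx {
	const void *chunk_bitmapped_packs; /* NULL when BTMP is absent or not read */
};

struct sketch_bitmap {
	const struct sketch_midx *midx; /* NULL for a classic single-pack bitmap */
};

/*
 * Returns the effective multi-pack reuse setting: 0 whenever there is
 * no MIDX or the MIDX has no BTMP chunk, otherwise whatever the caller
 * requested.
 */
static int effective_multi_pack_reuse(const struct sketch_bitmap *b, int requested)
{
	assert(b != NULL);
	if (!b->midx || !b->midx->chunk_bitmapped_packs)
		return 0;
	return requested;
}
```

In this model a caller asking for multi-pack reuse only gets it when
both the MIDX and its BTMP chunk are present.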

> +	if (multi_pack_reuse) {
>  		for (i = 0; i < bitmap_git->midx->num_packs; i++) {
>  			struct bitmapped_pack pack;
>  			if (nth_bitmapped_pack(r, bitmap_git->midx, &pack, i) < 0) {
> @@ -2062,34 +2065,32 @@ void reuse_partial_packfile_from_bitmap(struct bitmap_index *bitmap_git,
>  			if (!pack.bitmap_nr)
>  				continue;
>
> -			if (!multi_pack_reuse && pack.bitmap_pos) {
> -				/*
> -				 * If we're only reusing a single pack, skip
> -				 * over any packs which are not positioned at
> -				 * the beginning of the MIDX bitmap.
> -				 *
> -				 * This is consistent with the existing
> -				 * single-pack reuse behavior, which only reuses
> -				 * parts of the MIDX's preferred pack.
> -				 */
> -				continue;
> -			}

Yep, this hunk can go. In the pre-image it lived inside the outer
if-statement conditioned on 'bitmap_is_midx()', so the loop had to cope
with the single-pack case as well. That case is now dealt with
separately, so by the time we enter this loop we know ahead of time that
we're doing multi-pack reuse (and can do so).

> -
>  			ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
>  			memcpy(&packs[packs_nr++], &pack, sizeof(pack));
>
>  			objects_nr += pack.p->num_objects;
> -
> -			if (!multi_pack_reuse)
> -				break;
>  		}
>
>  		QSORT(packs, packs_nr, bitmapped_pack_cmp);
>  	} else {
> +		struct packed_git *pack;
> +
> +		if (bitmap_is_midx(bitmap_git)) {
> +			uint32_t preferred_pack_pos;
> +
> +			if (midx_preferred_pack(bitmap_git->midx, &preferred_pack_pos) < 0) {
> +				warning(_("unable to compute preferred pack, disabling pack-reuse"));
> +				return;
> +			}
> +
> +			pack = bitmap_git->midx->packs[preferred_pack_pos];
> +		} else {
> +			pack = bitmap_git->pack;
> +		}
> +

Looking good. Here we're doing single-pack reuse (either from the pack
corresponding to the bitmap or from the MIDX's preferred pack). Either
way we set the 'pack' variable to point at the appropriate pack, and then
add that pack to the list of reusable packs below. Good.

>  		ALLOC_GROW(packs, packs_nr + 1, packs_alloc);
> -
> -		packs[packs_nr].p = bitmap_git->pack;
> -		packs[packs_nr].bitmap_nr = bitmap_git->pack->num_objects;
> +		packs[packs_nr].p = pack;
> +		packs[packs_nr].bitmap_nr = pack->num_objects;
>  		packs[packs_nr].bitmap_pos = 0;
>
>  		objects_nr = packs[packs_nr++].bitmap_nr;

Makes sense.

> diff --git a/t/t5326-multi-pack-bitmaps.sh b/t/t5326-multi-pack-bitmaps.sh
> index 70d1b58709..5d7d321840 100755
> --- a/t/t5326-multi-pack-bitmaps.sh
> +++ b/t/t5326-multi-pack-bitmaps.sh
> @@ -513,4 +513,21 @@ test_expect_success 'corrupt MIDX with bitmap causes fallback' '
>  	)
>  '
>
> +for allow_pack_reuse in single multi
> +do
> +	test_expect_success "reading MIDX without BTMP chunk does not complain with $allow_pack_reuse pack reuse" '
> +		test_when_finished "rm -rf midx-without-btmp" &&
> +		git init midx-without-btmp &&
> +		(
> +			cd midx-without-btmp &&
> +			test_commit initial &&
> +
> +			git repack -Adbl --write-bitmap-index --write-midx &&

`-b` is redundant with `--write-bitmap-index`.

> +			GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$allow_pack_reuse \
> +				pack-objects --all --use-bitmap-index --stdout </dev/null >/dev/null 2>err &&

A small note here, but redirecting stdin from /dev/null is unnecessary
with `--all`.

> +			test_must_be_empty err
> +		)
> +	'
> +done
> +

This test looks like it's exercising the right thing, but I'm not sure
why it was split into two separate tests. Perhaps to allow the two to
fail separately?

Either way, the repository initialization, test_commit, and repacking
could probably be done once in a single test, to avoid re-running them
for each value of $allow_pack_reuse.

I would probably have written:

    git init midx-without-btmp &&
    (
        cd midx-without-btmp &&

        test_commit base &&
        git repack -adb --write-midx &&

        for c in single multi
        do
            GIT_TEST_MIDX_READ_BTMP=false git -c pack.allowPackReuse=$c pack-objects \
              --all --use-bitmap-index --stdout >/dev/null 2>err &&
            test_must_be_empty err || exit 1
        done
    )

TBH, I would like to see this test cleaned up before merging this one
down. But otherwise this patch is looking good.

Thanks,
Taylor