git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, peff@peff.net, dstolee@microsoft.com
Subject: Re: [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates
Date: Thu, 13 Apr 2023 11:31:43 +0200	[thread overview]
Message-ID: <ZDfL_6G-ySm94Vla@ncase> (raw)
In-Reply-To: <ZDb6gZycwTdsaB6o@nand.local>

[-- Attachment #1: Type: text/plain, Size: 5395 bytes --]

On Wed, Apr 12, 2023 at 02:37:53PM -0400, Taylor Blau wrote:
> On Wed, Apr 12, 2023 at 12:22:35PM +0200, Patrick Steinhardt wrote:
> > When doing a geometric repack with multi-pack-indices, then we ask
> > git-multi-pack-index(1) to use the largest packfile as the preferred
> > pack. It can happen though that the largest packfile is not part of the
> > main object database, but instead part of an alternate object database.
> > The result is that git-multi-pack-index(1) will not be able to find the
> > preferred pack and print a warning. It then falls back to use the first
> > packfile that the multi-pack-index shall reference.
> >
> > Fix this bug by only considering packfiles as preferred pack that are
> > local. This is the right thing to do given that a multi-pack-index
> > should never reference packfiles borrowed from an alternate.
> >
> > While at it, rename the function `get_largest_active_packfile()` to
> > `get_preferred_pack()` to better document its intent.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> 
> > @@ -464,7 +466,16 @@ static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry
> >  	}
> >  	if (geometry->split == geometry->pack_nr)
> >  		return NULL;
> > -	return geometry->pack[geometry->pack_nr - 1];
> > +
> > +	for (i = geometry->pack_nr; i > 0; i--)
> > +		/*
> > +		 * A pack that is not local would never be included in a
> > +		 * multi-pack index. We thus skip over any non-local packs.
> > +		 */
> > +		if (geometry->pack[i - 1]->pack_local)
> > +			return geometry->pack[i - 1];
> > +
> > +	return NULL;
> >  }
> 
> Looking good, we want to go through this list in reverse order, since
> the packs are ordered largest to smallest. I think that you could
> slightly rewrite the loop condition to avoid having to always access
> `geometry->pack` at `i-1` instead of `i`, like so:
> 
> --- 8< ---
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9d36dc8b84..ba468ac44e 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -467,13 +467,15 @@ static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
>  	if (geometry->split == geometry->pack_nr)
>  		return NULL;
> 
> -	for (i = geometry->pack_nr; i > 0; i--)
> +	for (i = geometry->pack_nr - 1; i >= 0; i--) {
>  		/*
>  		 * A pack that is not local would never be included in a
>  		 * multi-pack index. We thus skip over any non-local packs.
>  		 */
> -		if (geometry->pack[i - 1]->pack_local)
> -			return geometry->pack[i - 1];
> +		struct packed_git *p = geometry->pack[i];
> +		if (p->pack_local)
> +			return p;
> +	}
> 
>  	return NULL;
>  }
> --- >8 ---

There's a gotcha: `i` is an `unit32_t`, so `i >= 0` would always be true
and thus we wrap around and would try to access the pack array at an
out-of-bound index.

> but I'm not sure that the loop condition is quite right to begin with.
> We don't want to iterate all the way down to the beginning of the pack
> list, since some of those packs may be below the "split" line, IOW their
> contents are going to be rolled up and the packs destroyed.
> 
> So I think the right loop condition would be:
> 
>     for (i = geometry->pack_nr - 1; i >= geometry->split; i--)
> 
> which I think makes the "if (geometry->split == geometry->pack_nr)"
> condition above redundant with this loop, so you could probably drop
> that.
> 
> I would definitely appreciate a second set of eye here. The pack *at*
> the split line is considered frozen (IOW, we create a new pack
> consisting of the packs in range [0, geometry->split), and leave the
> packs in range [geometry->split, geometry->nr) alone).
> 
> So it should be fine to enter that loop when "i == geometry->split",
> because it's OK to return that as a potential preferred pack.

That makes sense indeed. Will amend.

[snip]
> > +	test $(wc -l <packs) = 1 &&
> > +
> > +	# We should also see a multi-pack-index. This multi-pack-index should
> > +	# never refer to any packfiles in the alternate object database.
> > +	# Consequentially, it should be valid even with the alternate ODB
> > +	# deleted.
> > +	rm -r shared &&
> > +	git -C member multi-pack-index verify
> 
> To test this even more directly, I think that you could ensure that the
> set of packs (which should just be the single entry found in "packs"
> above) matches what we expect. That should look something like:
> 
>     test-tool read-midx member/.git/objects >packs.midx &&
>     grep "^pack-.*\.idx$" packs.midx | sort >actual &&
>     xargs -n 1 basename <packs | sort >expect
>     test_cmp expect actual
> 
> Note that the read-midx test helper outputs packs with the "*.idx"
> suffix, so you'd probably want to alter your find invocation a few lines
> above accordingly, or rewrite it before writing into "actual".
> 
> Alternatively, since we expect there to be just a single pack here, I
> think we could just as easily write something like:
> 
>     test-tool read-midx member/.git/objects >packs.midx &&
>     grep "^pack-.*\.idx$" packs.midx >actual &&
>     test_line_count = 1 actual &&
>     grep $(cat actual) packs
> 
> though I slightly prefer the former since it is less brittle to changing
> this test. Either way, though.

Thanks, I've adopted that approach with a slight modification.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2023-04-13  9:31 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
2023-04-04 18:55 ` Taylor Blau
2023-04-04 19:00   ` Taylor Blau
2023-04-05  7:08   ` Patrick Steinhardt
2023-04-10 15:06     ` Derrick Stolee
2023-04-10 23:49       ` Taylor Blau
2023-04-11 17:13         ` Patrick Steinhardt
2023-04-11 21:13           ` Taylor Blau
2023-04-12  9:37             ` Patrick Steinhardt
2023-04-11 17:06       ` Patrick Steinhardt
2023-04-11 17:26         ` Patrick Steinhardt
2023-04-11 21:14           ` Taylor Blau
2023-04-10 23:29     ` Taylor Blau
2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-12 17:56     ` Taylor Blau
2023-04-13  9:28       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-12 18:37     ` Taylor Blau
2023-04-13  9:31       ` Patrick Steinhardt [this message]
2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-12 20:39     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-12 21:33     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-12 21:52     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-12 23:56     ` Junio C Hamano
2023-04-13  5:11       ` Junio C Hamano
2023-04-13  6:41         ` Patrick Steinhardt
2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
2023-04-12 22:01     ` Taylor Blau
2023-04-13  9:54       ` Patrick Steinhardt
2023-04-13 10:14         ` Patrick Steinhardt
2023-04-12 22:02   ` [PATCH v2 0/8] repack: fix geometric repacking with gitalternates Taylor Blau
2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-13 13:49     ` Derrick Stolee
2023-04-13 11:16   ` [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-13 13:59     ` Derrick Stolee
2023-04-13 14:13       ` Patrick Steinhardt
2023-04-13 15:40       ` Junio C Hamano
2023-04-13 11:16   ` [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-13 14:54   ` [PATCH v3 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14  2:03   ` Junio C Hamano
2023-04-14  5:42     ` Patrick Steinhardt
2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14 17:29     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZDfL_6G-ySm94Vla@ncase \
    --to=ps@pks.im \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).