git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Patrick Steinhardt <ps@pks.im>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, peff@peff.net, dstolee@microsoft.com
Subject: Re: [PATCH] repack: fix geometric repacking with gitalternates
Date: Wed, 5 Apr 2023 09:08:19 +0200	[thread overview]
Message-ID: <ZC0eY8q6ushpfkrQ@ncase> (raw)
In-Reply-To: <ZCxytq1esQWvjIz/@nand.local>

[-- Attachment #1: Type: text/plain, Size: 6313 bytes --]

On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> On Tue, Apr 04, 2023 at 01:08:33PM +0200, Patrick Steinhardt wrote:
> > Both issues have the same underlying root cause, which is that geometric
> > repacks don't honor whether packfiles are local or not. As a result,
> > they will try to include packs part of the alternate object directory
> > and then at a later point fail to locate them as they do not exist in
> > the object directory of the repository we're about to repack.
> 
> Interesting. This fix does make sense, but I wonder if it is doing the
> right thing.
> 
> When we have an alternated repository and do 'git repack -ad' in the
> "member" repository, we'll end up creating a new pack containing all
> objects in that repository, including ones from the alternate.
> 
> For example, if you do something like:
> 
>     rm -fr shared member
> 
>     git init shared
>     git -C shared commit --allow-empty -m "base" && git -C shared repack -d
> 
>     git clone --shared shared member
>     git -C member repack -ad
> 
>     for dir in shared member
>     do
>       echo "==> $dir"
>       find $dir/.git/objects -type f
>     done
> 
> Then you'll end up with the output:
> 
>     ==> shared
>     shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
>     shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
>     shared/.git/objects/info/packs
>     ==> member
>     member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
>     member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
>     member/.git/objects/info/alternates
>     member/.git/objects/info/packs
> 
> In other words, we end up creating the pack necessary in the member
> repository necessary to complete the repack. Since we're using `-a`
> here, that means creating an identical pack as we have in the shared
> repository.
> 
> If we instead did something like:
> 
>     git -C member repack -adl # <- `-l` is new here
> 
> , our output changes to instead contain the following (empty) pack
> directory in the member repository:
> 
>     ==> member
>     member/.git/objects/info/alternates
>     member/.git/objects/info/packs

Yup, that's fair.

> > Skip over packfiles that aren't local. This will cause geometric repacks
> > to never include packfiles of its alternates.
> 
> ...So I wonder if this is the right thing to do in all cases. IOW,
> should we be creating packs in the "member" repository which may be
> based off of packs from the shared repository when `-l` is not
> specified?
> 
> I think this gets tricky. For one, the geometric repack code creates at
> most one new pack. So if some of the packs that were above the split
> line (IOW, the ones that don't need to be squashed together) were in the
> shared repository, I'm not sure how you'd write a MIDX that covers both
> without using the MIDX-over-alternates feature. I have no idea how that
> works with MIDX bitmaps (IIUC, the MIDX/alternates feature is very
> niche).

I agree, things would become tricky in case geometric repacks are
expected to work across alternates.

> I think we reasonably could do something like ignoring non-local packs
> in `init_pack_geometry()` only when `-l` is given. That still runs into
> problems when trying to write a MIDX or MIDX bitmaps, so we should
> likely declare the combination "-l --write-midx --write-bitmap-index" as
> unsupported. For backwards compatibility, I think it would make sense to
> have "--no-local" be the default when `--geometric` is given (which is
> already the case, since po_args is zero-initialized).

Okay, I agree that it's not all that sensible to allow writing bitmaps
in a geometric repack that spans across multiple repositories. These
bitmaps would immediately break once the shared repository performs a
repack that removes a subset of packfiles that the bitmap depends on,
which would make it non-obvious for how to even do repacks in such a
shared repository at all.

But I'm not yet sure whether I understand why `-l --write-midx` should
be prohibited like you summarized in the follow-up mail:

On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> TL;DR: I think that this is a (much) more complicated problem than
> originally anticipated, but the easiest thing to do is to assume that
> git repack --geometric=<d> means git repack --geometric=<d> --no-local
> unless otherwise specified, and declare --geometric=<d> --local
> unsupported when used in conjunction with --write-midx or
> --write-bitmap-index.

The newly written MIDX would of course only span over the local
packfiles, but is that even a problem? Ideally, Git would know to handle
multiple MIDX files, and in that case it would make sense both for the
shared and for the member repository to have one.

> I suspect in practice that nobody cares about what happens when you run
> "git repack --geometric=<d> --local", but in case they do, I think the
> above is probably the most reasonable behavior that I can quickly come
> up with.

Well, I do as we use alternates to implement fork networks at GitLab and
we're in the process of adopting geometric repacking. So what I want to
have is that I can perform geometric repacks both in the shared and in
the member repository that includes only the local packfiles.

And yes, I agree that the above is the most reasonable behaviour, with
the exception of disallowing MIDXs when doing a local geometric repack.
But that raises the question: what do we do about the currently-broken
behaviour when executing `git repack --geometric=<d> --no-local` in a
alternated repository?

I'd personally be fine to start honoring the `po_args.local` flag so
that we skip over any non-local packfiles there while ignoring the
larger problem of non-local geometric repacks breaking in certain
scenarios. It would at least improve the status quo as users now have a
way out in case they ever happen to hit that error. And it allows us to
use geometric repacks in alternated repositories. But are we okay with
punting on the larger issue for the time being?

Thanks for your feedback and this interesting discussion!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  parent reply	other threads:[~2023-04-05  7:08 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
2023-04-04 18:55 ` Taylor Blau
2023-04-04 19:00   ` Taylor Blau
2023-04-05  7:08   ` Patrick Steinhardt [this message]
2023-04-10 15:06     ` Derrick Stolee
2023-04-10 23:49       ` Taylor Blau
2023-04-11 17:13         ` Patrick Steinhardt
2023-04-11 21:13           ` Taylor Blau
2023-04-12  9:37             ` Patrick Steinhardt
2023-04-11 17:06       ` Patrick Steinhardt
2023-04-11 17:26         ` Patrick Steinhardt
2023-04-11 21:14           ` Taylor Blau
2023-04-10 23:29     ` Taylor Blau
2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-12 17:56     ` Taylor Blau
2023-04-13  9:28       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-12 18:37     ` Taylor Blau
2023-04-13  9:31       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-12 20:39     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-12 21:33     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-12 21:52     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-12 23:56     ` Junio C Hamano
2023-04-13  5:11       ` Junio C Hamano
2023-04-13  6:41         ` Patrick Steinhardt
2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
2023-04-12 22:01     ` Taylor Blau
2023-04-13  9:54       ` Patrick Steinhardt
2023-04-13 10:14         ` Patrick Steinhardt
2023-04-12 22:02   ` [PATCH v2 0/8] repack: fix geometric repacking with gitalternates Taylor Blau
2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-13 13:49     ` Derrick Stolee
2023-04-13 11:16   ` [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-13 13:59     ` Derrick Stolee
2023-04-13 14:13       ` Patrick Steinhardt
2023-04-13 15:40       ` Junio C Hamano
2023-04-13 11:16   ` [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-13 14:54   ` [PATCH v3 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14  2:03   ` Junio C Hamano
2023-04-14  5:42     ` Patrick Steinhardt
2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14 17:29     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZC0eY8q6ushpfkrQ@ncase \
    --to=ps@pks.im \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).