git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Patrick Steinhardt <ps@pks.im>
Cc: git@vger.kernel.org, peff@peff.net, dstolee@microsoft.com
Subject: Re: [PATCH] repack: fix geometric repacking with gitalternates
Date: Tue, 4 Apr 2023 14:55:50 -0400	[thread overview]
Message-ID: <ZCxytq1esQWvjIz/@nand.local> (raw)
In-Reply-To: <a07ed50feeec4bfc3e9736bf493b9876896bcdd2.1680606445.git.ps@pks.im>

On Tue, Apr 04, 2023 at 01:08:33PM +0200, Patrick Steinhardt wrote:
> Both issues have the same underlying root cause, which is that geometric
> repacks don't honor whether packfiles are local or not. As a result,
> they will try to include packs part of the alternate object directory
> and then at a later point fail to locate them as they do not exist in
> the object directory of the repository we're about to repack.

Interesting. This fix does make sense, but I wonder if it is doing the
right thing.

When we have an alternated repository and do 'git repack -ad' in the
"member" repository, we'll end up creating a new pack containing all
objects in that repository, including ones from the alternate.

For example, if you do something like:

    rm -fr shared member

    git init shared
    git -C shared commit --allow-empty -m "base" && git -C shared repack -d

    git clone --shared shared member
    git -C member repack -ad

    for dir in shared member
    do
      echo "==> $dir"
      find $dir/.git/objects -type f
    done

Then you'll end up with the output:

    ==> shared
    shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
    shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
    shared/.git/objects/info/packs
    ==> member
    member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
    member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
    member/.git/objects/info/alternates
    member/.git/objects/info/packs

In other words, we end up creating the pack necessary in the member
repository necessary to complete the repack. Since we're using `-a`
here, that means creating an identical pack as we have in the shared
repository.

If we instead did something like:

    git -C member repack -adl # <- `-l` is new here

, our output changes to instead contain the following (empty) pack
directory in the member repository:

    ==> member
    member/.git/objects/info/alternates
    member/.git/objects/info/packs

> Skip over packfiles that aren't local. This will cause geometric repacks
> to never include packfiles of its alternates.

...So I wonder if this is the right thing to do in all cases. IOW,
should we be creating packs in the "member" repository which may be
based off of packs from the shared repository when `-l` is not
specified?

I think this gets tricky. For one, the geometric repack code creates at
most one new pack. So if some of the packs that were above the split
line (IOW, the ones that don't need to be squashed together) were in the
shared repository, I'm not sure how you'd write a MIDX that covers both
without using the MIDX-over-alternates feature. I have no idea how that
works with MIDX bitmaps (IIUC, the MIDX/alternates feature is very
niche).

I think we reasonably could do something like ignoring non-local packs
in `init_pack_geometry()` only when `-l` is given. That still runs into
problems when trying to write a MIDX or MIDX bitmaps, so we should
likely declare the combination "-l --write-midx --write-bitmap-index" as
unsupported. For backwards compatibility, I think it would make sense to
have "--no-local" be the default when `--geometric` is given (which is
already the case, since po_args is zero-initialized).

I suspect in practice that nobody cares about what happens when you run
"git repack --geometric=<d> --local", but in case they do, I think the
above is probably the most reasonable behavior that I can quickly come
up with.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/repack.c            |  6 ++++
>  t/t7703-repack-geometric.sh | 59 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 87f73c8923..c6d12fa4bd 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -333,6 +333,12 @@ static void init_pack_geometry(struct pack_geometry **geometry_p,
>  	geometry = *geometry_p;
>
>  	for (p = get_all_packs(the_repository); p; p = p->next) {
> +		/*
> +		 * We don't want to repack packfiles which are not part of the
> +		 * primary object database.
> +		 */
> +		if (!p->pack_local)
> +			continue;

So this would change to something like `if (po_args.local &&
!p->pack_local)`.

> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 8821fbd2dd..9f8bc663e4 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -281,4 +281,63 @@ test_expect_success '--geometric with pack.packSizeLimit' '
>  	)
>  '
>
> +packed_objects() {
> +	git verify-pack -v "$@" >tmp-object-list &&
> +	sed -n -e "s/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\|tag\).*/\1/p" <tmp-object-list &&
> +	rm -f tmp-object-list
> +}

Just a nitpick, but I've found the output from index-pack to be a little
easier to work with here. I think that you could replace this function
with the following and have it work:

--- 8< ---
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 9f8bc663e4..e3ac5001bd 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -282,9 +282,8 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 '

 packed_objects() {
-	git verify-pack -v "$@" >tmp-object-list &&
-	sed -n -e "s/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\|tag\).*/\1/p" <tmp-object-list &&
-	rm -f tmp-object-list
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list
 }

 test_expect_success '--geometric with no local objects creates no pack' '
--- >8 ---

I removed the "&& rm -f tmp-object-list", since we overwrite it anyway
whenever we call packed_objects().

> +test_expect_success '--geometric with no local objects creates no pack' '
> +	git init shared &&
> +	test_when_finished "rm -fr shared" &&
> +	test_commit -C shared "shared" &&
> +	git -C shared repack -Ad &&
> +
> +	# Set up the member repository so that it does not have
> +	# any objects on its own.
> +	git clone --shared shared member &&
> +	test_when_finished "rm -fr member" &&
> +
> +	git -C member repack --geometric 2 --write-midx &&
> +	find member/.git/objects/pack -type f >actual &&
> +	test_must_be_empty actual

Another small nitpick, but I think these last two lines could be
replaced with

    test_dir_is_empty member/.git/objects/pack

or even

    test_dir_is_empty member/$packdir

> +test_expect_success '--geometric does not include shared packfiles' '
> +	git init shared &&
> +	test_when_finished "rm -fr shared" &&
> +	test_commit -C shared "shared" &&
> +	git -C shared repack -Ad &&
> +
> +	git clone --shared shared member &&
> +	test_when_finished "rm -fr member" &&
> +	git -C member commit --allow-empty --message "not-shared" &&

Any reason to make this commit empty? Could we replace this with

    test_commit -C member not-shared

?

> +test_expect_success '--geometric with same packfile in shared repository' '
> +	git init shared &&
> +	test_when_finished "rm -fr shared" &&
> +	test_commit -C shared "shared" &&
> +	git -C shared repack -Ad &&
> +
> +	# Prepare the member repository so that it got the exact same packfile
> +	# as the shared repository and set up gitalternates.
> +	cp -r shared member &&
> +	test_when_finished "rm -fr member" &&
> +	test-tool path-utils real_path shared/.git/objects >member/.git/objects/info/alternates &&
> +	find shared/.git/objects -type f >expect &&

This works, though it may be easier to write something like:

    git clone -s shared member &&
    test_when_finished "rm -fr member" &&
    # ensure that "shared" and "member" have an identical set of packs
    git -C member repack -a &&
    find shared/.git/objects -type f >expect &&

> +	# After repacking, contents of the member repository should not have
> +	# changed.
> +	git -C member repack --geometric 2 --write-midx 2>error &&

I think that "2>error" is a stray change left over from debugging, and
could probably be removed.

> +	find shared/.git/objects -type f >actual &&

This and above could be replaced with shared/$packdir. Whenever I'm
asserting the contents of a directory are unchanged, I typically like to
call the two files under comparison "before" and "after", but "expect"
and "actual" are fine here, too.

Thanks,
Taylor

  reply	other threads:[~2023-04-04 18:56 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
2023-04-04 18:55 ` Taylor Blau [this message]
2023-04-04 19:00   ` Taylor Blau
2023-04-05  7:08   ` Patrick Steinhardt
2023-04-10 15:06     ` Derrick Stolee
2023-04-10 23:49       ` Taylor Blau
2023-04-11 17:13         ` Patrick Steinhardt
2023-04-11 21:13           ` Taylor Blau
2023-04-12  9:37             ` Patrick Steinhardt
2023-04-11 17:06       ` Patrick Steinhardt
2023-04-11 17:26         ` Patrick Steinhardt
2023-04-11 21:14           ` Taylor Blau
2023-04-10 23:29     ` Taylor Blau
2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-12 17:56     ` Taylor Blau
2023-04-13  9:28       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-12 18:37     ` Taylor Blau
2023-04-13  9:31       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-12 20:39     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-12 21:33     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-12 21:52     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-12 23:56     ` Junio C Hamano
2023-04-13  5:11       ` Junio C Hamano
2023-04-13  6:41         ` Patrick Steinhardt
2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
2023-04-12 22:01     ` Taylor Blau
2023-04-13  9:54       ` Patrick Steinhardt
2023-04-13 10:14         ` Patrick Steinhardt
2023-04-12 22:02   ` [PATCH v2 0/8] repack: fix geometric repacking with gitalternates Taylor Blau
2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-13 13:49     ` Derrick Stolee
2023-04-13 11:16   ` [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-13 13:59     ` Derrick Stolee
2023-04-13 14:13       ` Patrick Steinhardt
2023-04-13 15:40       ` Junio C Hamano
2023-04-13 11:16   ` [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-13 14:54   ` [PATCH v3 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14  2:03   ` Junio C Hamano
2023-04-14  5:42     ` Patrick Steinhardt
2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14 17:29     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZCxytq1esQWvjIz/@nand.local \
    --to=me@ttaylorr.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).