git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Derrick Stolee <derrickstolee@github.com>
Cc: Patrick Steinhardt <ps@pks.im>,
	git@vger.kernel.org, peff@peff.net, dstolee@microsoft.com
Subject: Re: [PATCH] repack: fix geometric repacking with gitalternates
Date: Mon, 10 Apr 2023 19:49:01 -0400	[thread overview]
Message-ID: <ZDSgbYa+j/5c5t8j@nand.local> (raw)
In-Reply-To: <4c7b95e1-9d3c-e253-98ca-ac6c201babb3@github.com>

On Mon, Apr 10, 2023 at 11:06:49AM -0400, Derrick Stolee wrote:
> Your original message includes error messages like "could not find pack"
> and "unknown preferred pack" which makes me think the _real_ problem is
> that we are not respecting the full path name of the pack-file and are
> somehow localizing packs to the local object dir.

"could not find pack" comes from pack-objects's --stdin-packs mode when
the caller specifies a pack that doesn't exist in the repository.

I'm not sure what's going on here, since read_packs_list_from_stdin()
searches over the result of calling get_all_packs(), which does include
packs in alternate object stores. And indeed, pack-objects can recognize
such packs:

--- 8< ---
#!/bin/sh

rm -fr shared member

git init shared
git -C shared commit --allow-empty -m "base" && git -C shared repack -d

git clone --shared shared member

basename "$(find shared/.git/objects/pack -type f -name '*.pack')" >input

git -C member pack-objects .git/objects/pack/pack --stdin-packs <input

for repo in shared member
do
	echo "==> $repo"
	find $repo/.git/objects/pack -type f
done
--- >8 ---

^^ the above results in generating the identical pack in the "member"
repository as expected:

==> shared
shared/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.idx
shared/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.pack
==> member
member/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.idx
member/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.pack

> The basic reason is that write_midx_included_packs() takes all of the
> pack-files from the geometry, but does not strip out the pack-files
> that are not in the same object directory.

I agree here; the pack-geometry code should be stripping out non-local
packs when writing a MIDX regardless of whether the caller passed
--local or not.

> Perhaps this method could include a step to create a new, "local"
> geometry containing only the packs within the local object dir. We can
> then skip the --preferred-pack option and bitmap if this is different
> than the original geometry (perhaps with a warning message to help
> users who did this accidentally).

It would be nice to not have to juggle multiple pack geometry structs,
since the logic of what gets repacked, what gets thrown away, and what
gets kept is already fairly complicated (at least to me) and pretty
fragile.

I think if I were sketching this out, I'd start out by doing something
like:

--- 8< ---
diff --git a/builtin/repack.c b/builtin/repack.c
index df4d8e0f0b..eab5f58444 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -558,6 +558,10 @@ static void midx_included_packs(struct string_list *include,
 		for (i = geometry->split; i < geometry->pack_nr; i++) {
 			struct packed_git *p = geometry->pack[i];

+			/* MIDXs cannot refer to non-local packs */
+			if (!p->pack_local)
+				continue;
+
 			strbuf_addstr(&buf, pack_basename(p));
 			strbuf_strip_suffix(&buf, ".pack");
 			strbuf_addstr(&buf, ".idx");
--- >8 ---

...and I actually think that might do it, since:

	- existing_nonkept_packs is populated by calling readdir() on the local
		repository's $GIT_DIR/objects/pack directory, so it will never contain
		any non-local packs.

	- existing_kept_packs is also OK for the same reason

	- names (which tracks the packs that we just wrote) will never contain a
		non-local pack, since we never write packs outside of our local pack
		directory

So that would cause you to write a MIDX containing only local packs (as
desired) regardless of whether or not the caller passed --[no]-local or
not.

> > I'd personally be fine to start honoring the `po_args.local` flag so
> > that we skip over any non-local packfiles there while ignoring the
> > larger problem of non-local geometric repacks breaking in certain
> > scenarios. It would at least improve the status quo as users now have a
> > way out in case they ever happen to hit that error. And it allows us to
> > use geometric repacks in alternated repositories. But are we okay with
> > punting on the larger issue for the time being?
>
> I think the real bug is isolated in write_midx_included_packs() in how
> it may specify packs that the multi-pack-index cannot track. It should
> be worth the time exploring if there is an easy fix there, and then the
> po_args.local version can be used as a backup/performance tweak on top
> of that.

Yeah, seeing "unknown preferred pack" makes me suspicious that we tried
to write a MIDX that contains a pack outside of our repository. I tried
to reproduce that case with the following script but couldn't do it (the
pack that the MIDX does track is the local one, as expected):

--- 8< ---
#!/bin/sh

rm -fr shared member

git init shared
git -C shared commit --allow-empty -m "base" && git -C shared repack -d

git clone --shared shared member

git -C member commit --allow-empty -m "base" && git -C member repack -d

{
	basename "$(find shared/.git/objects/pack -type f -name '*.idx')"
	basename "$(find member/.git/objects/pack -type f -name '*.idx')"
} |
git.compile -C member multi-pack-index write --stdin-packs

for repo in shared member
do
	echo "==> $repo"
	find $repo/.git/objects/pack -type f
done

( cd member && ~/src/git/t/helper/test-tool read-midx --show-objects .git/objects )
--- >8 ---

But the MIDX code that is responsible for collecting the packs specified
over --stdin-packs enumerates the possible packs by calling
`for_each_file_in_pack_dir()` with our local `object_dir`, so I don't
think it's possible to have a MIDX that contains a non-local pack.

IOW, I agree with you that the bug is in write_midx_included_packs(),
and I think the fix that I outlined would do the trick.

Thanks,
Taylor

  reply	other threads:[~2023-04-10 23:49 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
2023-04-04 18:55 ` Taylor Blau
2023-04-04 19:00   ` Taylor Blau
2023-04-05  7:08   ` Patrick Steinhardt
2023-04-10 15:06     ` Derrick Stolee
2023-04-10 23:49       ` Taylor Blau [this message]
2023-04-11 17:13         ` Patrick Steinhardt
2023-04-11 21:13           ` Taylor Blau
2023-04-12  9:37             ` Patrick Steinhardt
2023-04-11 17:06       ` Patrick Steinhardt
2023-04-11 17:26         ` Patrick Steinhardt
2023-04-11 21:14           ` Taylor Blau
2023-04-10 23:29     ` Taylor Blau
2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-12 17:56     ` Taylor Blau
2023-04-13  9:28       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-12 18:37     ` Taylor Blau
2023-04-13  9:31       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-12 20:39     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-12 21:33     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-12 21:52     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-12 23:56     ` Junio C Hamano
2023-04-13  5:11       ` Junio C Hamano
2023-04-13  6:41         ` Patrick Steinhardt
2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
2023-04-12 22:01     ` Taylor Blau
2023-04-13  9:54       ` Patrick Steinhardt
2023-04-13 10:14         ` Patrick Steinhardt
2023-04-12 22:02   ` [PATCH v2 0/8] repack: fix geometric repacking with gitalternates Taylor Blau
2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-13 13:49     ` Derrick Stolee
2023-04-13 11:16   ` [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-13 13:59     ` Derrick Stolee
2023-04-13 14:13       ` Patrick Steinhardt
2023-04-13 15:40       ` Junio C Hamano
2023-04-13 11:16   ` [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-13 14:54   ` [PATCH v3 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14  2:03   ` Junio C Hamano
2023-04-14  5:42     ` Patrick Steinhardt
2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14 17:29     ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZDSgbYa+j/5c5t8j@nand.local \
    --to=me@ttaylorr.com \
    --cc=derrickstolee@github.com \
    --cc=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=peff@peff.net \
    --cc=ps@pks.im \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).