git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] repack: fix geometric repacking with gitalternates
@ 2023-04-04 11:08 Patrick Steinhardt
  2023-04-04 18:55 ` Taylor Blau
                   ` (3 more replies)
  0 siblings, 4 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-04 11:08 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 4757 bytes --]

Performing geometric repacking with repositories that have alternate
object directories set up is causing errors in different scenarios.

- Repacking fails when the repository linked to the object
  directory does not have any objects on its own.

  ```
  $ git init shared
  $ git -C shared commit --allow-empty --message something
  $ git clone --shared shared member
  $ git -C member repack --geometric=2 --write-midx
  Nothing new to pack.
  warning: unknown preferred pack: 'pack-3e1a94a8dc9bb4defb0d98ce2ebf325312d76362.pack'
  error: multi-pack-index died of signal 7
  ```

- Repacking fails when the repository linked to the object
  directory has the exact same packfile as the linked-to object
  directory.

  ```
  $ git init shared
  $ git -C shared commit --allow-empty --message something
  $ git -C shared repack -Ad
  $ cp -r shared member
  $ realpath shared/.git/objects >member/.git/objects/info/alternates
  $ git -C member repack --geometric 2 --write-midx
  fatal: could not find pack 'pack-d404037a861afe456e07a1aefb3655150f1299f0.pack'
  ```

Both issues have the same underlying root cause, which is that geometric
repacks don't honor whether packfiles are local or not. As a result,
they will try to include packs part of the alternate object directory
and then at a later point fail to locate them as they do not exist in
the object directory of the repository we're about to repack.

Skip over packfiles that aren't local. This will cause geometric repacks
to never include packfiles of its alternates.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            |  6 ++++
 t/t7703-repack-geometric.sh | 59 +++++++++++++++++++++++++++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 87f73c8923..c6d12fa4bd 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -333,6 +333,12 @@ static void init_pack_geometry(struct pack_geometry **geometry_p,
 	geometry = *geometry_p;
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
+		/*
+		 * We don't want to repack packfiles which are not part of the
+		 * primary object database.
+		 */
+		if (!p->pack_local)
+			continue;
 		if (!pack_kept_objects) {
 			/*
 			 * Any pack that has its pack_keep bit set will appear
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 8821fbd2dd..9f8bc663e4 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -281,4 +281,63 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 	)
 '
 
+packed_objects() {
+	git verify-pack -v "$@" >tmp-object-list &&
+	sed -n -e "s/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\|tag\).*/\1/p" <tmp-object-list &&
+	rm -f tmp-object-list
+}
+
+test_expect_success '--geometric with no local objects creates no pack' '
+	git init shared &&
+	test_when_finished "rm -fr shared" &&
+	test_commit -C shared "shared" &&
+	git -C shared repack -Ad &&
+
+	# Set up the member repository so that it does not have
+	# any objects on its own.
+	git clone --shared shared member &&
+	test_when_finished "rm -fr member" &&
+
+	git -C member repack --geometric 2 --write-midx &&
+	find member/.git/objects/pack -type f >actual &&
+	test_must_be_empty actual
+'
+
+test_expect_success '--geometric does not include shared packfiles' '
+	git init shared &&
+	test_when_finished "rm -fr shared" &&
+	test_commit -C shared "shared" &&
+	git -C shared repack -Ad &&
+
+	git clone --shared shared member &&
+	test_when_finished "rm -fr member" &&
+	git -C member commit --allow-empty --message "not-shared" &&
+	git -C member repack --geometric 2 --write-midx &&
+
+	# We expect the created packfile to only contain the new commit.
+	packed_objects member/.git/objects/pack/pack-*.idx >actual &&
+	git -C member rev-parse HEAD >expect &&
+	test_cmp expect actual
+'
+
+test_expect_success '--geometric with same packfile in shared repository' '
+	git init shared &&
+	test_when_finished "rm -fr shared" &&
+	test_commit -C shared "shared" &&
+	git -C shared repack -Ad &&
+
+	# Prepare the member repository so that it got the exact same packfile
+	# as the shared repository and set up gitalternates.
+	cp -r shared member &&
+	test_when_finished "rm -fr member" &&
+	test-tool path-utils real_path shared/.git/objects >member/.git/objects/info/alternates &&
+	find shared/.git/objects -type f >expect &&
+
+	# After repacking, contents of the member repository should not have
+	# changed.
+	git -C member repack --geometric 2 --write-midx 2>error &&
+	find shared/.git/objects -type f >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
@ 2023-04-04 18:55 ` Taylor Blau
  2023-04-04 19:00   ` Taylor Blau
  2023-04-05  7:08   ` Patrick Steinhardt
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-04 18:55 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Tue, Apr 04, 2023 at 01:08:33PM +0200, Patrick Steinhardt wrote:
> Both issues have the same underlying root cause, which is that geometric
> repacks don't honor whether packfiles are local or not. As a result,
> they will try to include packs part of the alternate object directory
> and then at a later point fail to locate them as they do not exist in
> the object directory of the repository we're about to repack.

Interesting. This fix does make sense, but I wonder if it is doing the
right thing.

When we have an alternated repository and do 'git repack -ad' in the
"member" repository, we'll end up creating a new pack containing all
objects in that repository, including ones from the alternate.

For example, if you do something like:

    rm -fr shared member

    git init shared
    git -C shared commit --allow-empty -m "base" && git -C shared repack -d

    git clone --shared shared member
    git -C member repack -ad

    for dir in shared member
    do
      echo "==> $dir"
      find $dir/.git/objects -type f
    done

Then you'll end up with the output:

    ==> shared
    shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
    shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
    shared/.git/objects/info/packs
    ==> member
    member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
    member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
    member/.git/objects/info/alternates
    member/.git/objects/info/packs

In other words, we end up creating the pack necessary in the member
repository necessary to complete the repack. Since we're using `-a`
here, that means creating an identical pack as we have in the shared
repository.

If we instead did something like:

    git -C member repack -adl # <- `-l` is new here

, our output changes to instead contain the following (empty) pack
directory in the member repository:

    ==> member
    member/.git/objects/info/alternates
    member/.git/objects/info/packs

> Skip over packfiles that aren't local. This will cause geometric repacks
> to never include packfiles of its alternates.

...So I wonder if this is the right thing to do in all cases. IOW,
should we be creating packs in the "member" repository which may be
based off of packs from the shared repository when `-l` is not
specified?

I think this gets tricky. For one, the geometric repack code creates at
most one new pack. So if some of the packs that were above the split
line (IOW, the ones that don't need to be squashed together) were in the
shared repository, I'm not sure how you'd write a MIDX that covers both
without using the MIDX-over-alternates feature. I have no idea how that
works with MIDX bitmaps (IIUC, the MIDX/alternates feature is very
niche).

I think we reasonably could do something like ignoring non-local packs
in `init_pack_geometry()` only when `-l` is given. That still runs into
problems when trying to write a MIDX or MIDX bitmaps, so we should
likely declare the combination "-l --write-midx --write-bitmap-index" as
unsupported. For backwards compatibility, I think it would make sense to
have "--no-local" be the default when `--geometric` is given (which is
already the case, since po_args is zero-initialized).

I suspect in practice that nobody cares about what happens when you run
"git repack --geometric=<d> --local", but in case they do, I think the
above is probably the most reasonable behavior that I can quickly come
up with.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/repack.c            |  6 ++++
>  t/t7703-repack-geometric.sh | 59 +++++++++++++++++++++++++++++++++++++
>  2 files changed, 65 insertions(+)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 87f73c8923..c6d12fa4bd 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -333,6 +333,12 @@ static void init_pack_geometry(struct pack_geometry **geometry_p,
>  	geometry = *geometry_p;
>
>  	for (p = get_all_packs(the_repository); p; p = p->next) {
> +		/*
> +		 * We don't want to repack packfiles which are not part of the
> +		 * primary object database.
> +		 */
> +		if (!p->pack_local)
> +			continue;

So this would change to something like `if (po_args.local &&
!p->pack_local)`.

> diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
> index 8821fbd2dd..9f8bc663e4 100755
> --- a/t/t7703-repack-geometric.sh
> +++ b/t/t7703-repack-geometric.sh
> @@ -281,4 +281,63 @@ test_expect_success '--geometric with pack.packSizeLimit' '
>  	)
>  '
>
> +packed_objects() {
> +	git verify-pack -v "$@" >tmp-object-list &&
> +	sed -n -e "s/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\|tag\).*/\1/p" <tmp-object-list &&
> +	rm -f tmp-object-list
> +}

Just a nitpick, but I've found the output from index-pack to be a little
easier to work with here. I think that you could replace this function
with the following and have it work:

--- 8< ---
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 9f8bc663e4..e3ac5001bd 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -282,9 +282,8 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 '

 packed_objects() {
-	git verify-pack -v "$@" >tmp-object-list &&
-	sed -n -e "s/^\([0-9a-f][0-9a-f]*\).*\(commit\|tree\|blob\|tag\).*/\1/p" <tmp-object-list &&
-	rm -f tmp-object-list
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list
 }

 test_expect_success '--geometric with no local objects creates no pack' '
--- >8 ---

I removed the "&& rm -f tmp-object-list", since we overwrite it anyway
whenever we call packed_objects().

> +test_expect_success '--geometric with no local objects creates no pack' '
> +	git init shared &&
> +	test_when_finished "rm -fr shared" &&
> +	test_commit -C shared "shared" &&
> +	git -C shared repack -Ad &&
> +
> +	# Set up the member repository so that it does not have
> +	# any objects on its own.
> +	git clone --shared shared member &&
> +	test_when_finished "rm -fr member" &&
> +
> +	git -C member repack --geometric 2 --write-midx &&
> +	find member/.git/objects/pack -type f >actual &&
> +	test_must_be_empty actual

Another small nitpick, but I think these last two lines could be
replaced with

    test_dir_is_empty member/.git/objects/pack

or even

    test_dir_is_empty member/$packdir

> +test_expect_success '--geometric does not include shared packfiles' '
> +	git init shared &&
> +	test_when_finished "rm -fr shared" &&
> +	test_commit -C shared "shared" &&
> +	git -C shared repack -Ad &&
> +
> +	git clone --shared shared member &&
> +	test_when_finished "rm -fr member" &&
> +	git -C member commit --allow-empty --message "not-shared" &&

Any reason to make this commit empty? Could we replace this with

    test_commit -C member not-shared

?

> +test_expect_success '--geometric with same packfile in shared repository' '
> +	git init shared &&
> +	test_when_finished "rm -fr shared" &&
> +	test_commit -C shared "shared" &&
> +	git -C shared repack -Ad &&
> +
> +	# Prepare the member repository so that it got the exact same packfile
> +	# as the shared repository and set up gitalternates.
> +	cp -r shared member &&
> +	test_when_finished "rm -fr member" &&
> +	test-tool path-utils real_path shared/.git/objects >member/.git/objects/info/alternates &&
> +	find shared/.git/objects -type f >expect &&

This works, though it may be easier to write something like:

    git clone -s shared member &&
    test_when_finished "rm -fr member" &&
    # ensure that "shared" and "member" have an identical set of packs
    git -C member repack -a &&
    find shared/.git/objects -type f >expect &&

> +	# After repacking, contents of the member repository should not have
> +	# changed.
> +	git -C member repack --geometric 2 --write-midx 2>error &&

I think that "2>error" is a stray change left over from debugging, and
could probably be removed.

> +	find shared/.git/objects -type f >actual &&

This and above could be replaced with shared/$packdir. Whenever I'm
asserting the contents of a directory are unchanged, I typically like to
call the two files under comparison "before" and "after", but "expect"
and "actual" are fine here, too.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-04 18:55 ` Taylor Blau
@ 2023-04-04 19:00   ` Taylor Blau
  2023-04-05  7:08   ` Patrick Steinhardt
  1 sibling, 0 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-04 19:00 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> I think we reasonably could do something like ignoring non-local packs
> in `init_pack_geometry()` only when `-l` is given. That still runs into
> problems when trying to write a MIDX or MIDX bitmaps, so we should
> likely declare the combination "-l --write-midx --write-bitmap-index" as
> unsupported. For backwards compatibility, I think it would make sense to
> have "--no-local" be the default when `--geometric` is given (which is
> already the case, since po_args is zero-initialized).

When summarizing this message to colleagues at GitHub, I came up with
this TL;DR, which I figured may be useful to share:

  TL;DR: I think that this is a (much) more complicated problem than
  originally anticipated, but the easiest thing to do is to assume that
  git repack --geometric=<d> means git repack --geometric=<d> --no-local
  unless otherwise specified, and declare --geometric=<d> --local
  unsupported when used in conjunction with --write-midx or
  --write-bitmap-index.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-04 18:55 ` Taylor Blau
  2023-04-04 19:00   ` Taylor Blau
@ 2023-04-05  7:08   ` Patrick Steinhardt
  2023-04-10 15:06     ` Derrick Stolee
  2023-04-10 23:29     ` Taylor Blau
  1 sibling, 2 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-05  7:08 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 6313 bytes --]

On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> On Tue, Apr 04, 2023 at 01:08:33PM +0200, Patrick Steinhardt wrote:
> > Both issues have the same underlying root cause, which is that geometric
> > repacks don't honor whether packfiles are local or not. As a result,
> > they will try to include packs part of the alternate object directory
> > and then at a later point fail to locate them as they do not exist in
> > the object directory of the repository we're about to repack.
> 
> Interesting. This fix does make sense, but I wonder if it is doing the
> right thing.
> 
> When we have an alternated repository and do 'git repack -ad' in the
> "member" repository, we'll end up creating a new pack containing all
> objects in that repository, including ones from the alternate.
> 
> For example, if you do something like:
> 
>     rm -fr shared member
> 
>     git init shared
>     git -C shared commit --allow-empty -m "base" && git -C shared repack -d
> 
>     git clone --shared shared member
>     git -C member repack -ad
> 
>     for dir in shared member
>     do
>       echo "==> $dir"
>       find $dir/.git/objects -type f
>     done
> 
> Then you'll end up with the output:
> 
>     ==> shared
>     shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
>     shared/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
>     shared/.git/objects/info/packs
>     ==> member
>     member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.idx
>     member/.git/objects/pack/pack-a2f0c663b287c3eeb0207397f8cafb9ae49f8277.pack
>     member/.git/objects/info/alternates
>     member/.git/objects/info/packs
> 
> In other words, we end up creating the pack necessary in the member
> repository necessary to complete the repack. Since we're using `-a`
> here, that means creating an identical pack as we have in the shared
> repository.
> 
> If we instead did something like:
> 
>     git -C member repack -adl # <- `-l` is new here
> 
> , our output changes to instead contain the following (empty) pack
> directory in the member repository:
> 
>     ==> member
>     member/.git/objects/info/alternates
>     member/.git/objects/info/packs

Yup, that's fair.

> > Skip over packfiles that aren't local. This will cause geometric repacks
> > to never include packfiles of its alternates.
> 
> ...So I wonder if this is the right thing to do in all cases. IOW,
> should we be creating packs in the "member" repository which may be
> based off of packs from the shared repository when `-l` is not
> specified?
> 
> I think this gets tricky. For one, the geometric repack code creates at
> most one new pack. So if some of the packs that were above the split
> line (IOW, the ones that don't need to be squashed together) were in the
> shared repository, I'm not sure how you'd write a MIDX that covers both
> without using the MIDX-over-alternates feature. I have no idea how that
> works with MIDX bitmaps (IIUC, the MIDX/alternates feature is very
> niche).

I agree, things would become tricky in case geometric repacks are
expected to work across alternates.

> I think we reasonably could do something like ignoring non-local packs
> in `init_pack_geometry()` only when `-l` is given. That still runs into
> problems when trying to write a MIDX or MIDX bitmaps, so we should
> likely declare the combination "-l --write-midx --write-bitmap-index" as
> unsupported. For backwards compatibility, I think it would make sense to
> have "--no-local" be the default when `--geometric` is given (which is
> already the case, since po_args is zero-initialized).

Okay, I agree that it's not all that sensible to allow writing bitmaps
in a geometric repack that spans across multiple repositories. These
bitmaps would immediately break once the shared repository performs a
repack that removes a subset of packfiles that the bitmap depends on,
which would make it non-obvious for how to even do repacks in such a
shared repository at all.

But I'm not yet sure whether I understand why `-l --write-midx` should
be prohibited like you summarized in the follow-up mail:

On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> TL;DR: I think that this is a (much) more complicated problem than
> originally anticipated, but the easiest thing to do is to assume that
> git repack --geometric=<d> means git repack --geometric=<d> --no-local
> unless otherwise specified, and declare --geometric=<d> --local
> unsupported when used in conjunction with --write-midx or
> --write-bitmap-index.

The newly written MIDX would of course only span over the local
packfiles, but is that even a problem? Ideally, Git would know to handle
multiple MIDX files, and in that case it would make sense both for the
shared and for the member repository to have one.

> I suspect in practice that nobody cares about what happens when you run
> "git repack --geometric=<d> --local", but in case they do, I think the
> above is probably the most reasonable behavior that I can quickly come
> up with.

Well, I do as we use alternates to implement fork networks at GitLab and
we're in the process of adopting geometric repacking. So what I want to
have is that I can perform geometric repacks both in the shared and in
the member repository that includes only the local packfiles.

And yes, I agree that the above is the most reasonable behaviour, with
the exception of disallowing MIDXs when doing a local geometric repack.
But that raises the question: what do we do about the currently-broken
behaviour when executing `git repack --geometric=<d> --no-local` in a
alternated repository?

I'd personally be fine to start honoring the `po_args.local` flag so
that we skip over any non-local packfiles there while ignoring the
larger problem of non-local geometric repacks breaking in certain
scenarios. It would at least improve the status quo as users now have a
way out in case they ever happen to hit that error. And it allows us to
use geometric repacks in alternated repositories. But are we okay with
punting on the larger issue for the time being?

Thanks for your feedback and this interesting discussion!

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-05  7:08   ` Patrick Steinhardt
@ 2023-04-10 15:06     ` Derrick Stolee
  2023-04-10 23:49       ` Taylor Blau
  2023-04-11 17:06       ` Patrick Steinhardt
  2023-04-10 23:29     ` Taylor Blau
  1 sibling, 2 replies; 67+ messages in thread
From: Derrick Stolee @ 2023-04-10 15:06 UTC (permalink / raw)
  To: Patrick Steinhardt, Taylor Blau; +Cc: git, peff, dstolee

On 4/5/2023 3:08 AM, Patrick Steinhardt wrote:
> On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
>> On Tue, Apr 04, 2023 at 01:08:33PM +0200, Patrick Steinhardt wrote:
>>> Both issues have the same underlying root cause, which is that geometric
>>> repacks don't honor whether packfiles are local or not. As a result,
>>> they will try to include packs part of the alternate object directory
>>> and then at a later point fail to locate them as they do not exist in
>>> the object directory of the repository we're about to repack.
>>
>> Interesting. This fix does make sense, but I wonder if it is doing the
>> right thing.
>>
>> When we have an alternated repository and do 'git repack -ad' in the
>> "member" repository, we'll end up creating a new pack containing all
>> objects in that repository, including ones from the alternate.
...
>> I think we reasonably could do something like ignoring non-local packs
>> in `init_pack_geometry()` only when `-l` is given. That still runs into
>> problems when trying to write a MIDX or MIDX bitmaps, so we should
>> likely declare the combination "-l --write-midx --write-bitmap-index" as
>> unsupported. For backwards compatibility, I think it would make sense to
>> have "--no-local" be the default when `--geometric` is given (which is
>> already the case, since po_args is zero-initialized).
> 
> Okay, I agree that it's not all that sensible to allow writing bitmaps
> in a geometric repack that spans across multiple repositories. These
> bitmaps would immediately break once the shared repository performs a
> repack that removes a subset of packfiles that the bitmap depends on,
> which would make it non-obvious for how to even do repacks in such a
> shared repository at all.

We have mechanisms for avoiding writing bitmaps for packs that are not
closed under reachability. We should have some protection against writing
them for multi-pack-indexes that are not closed under reachability, if
only as a check during bitmap_writer_build().

> But I'm not yet sure whether I understand why `-l --write-midx` should
> be prohibited like you summarized in the follow-up mail:
> 
> On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
>> TL;DR: I think that this is a (much) more complicated problem than
>> originally anticipated, but the easiest thing to do is to assume that
>> git repack --geometric=<d> means git repack --geometric=<d> --no-local
>> unless otherwise specified, and declare --geometric=<d> --local
>> unsupported when used in conjunction with --write-midx or
>> --write-bitmap-index.
> 
> The newly written MIDX would of course only span over the local
> packfiles, but is that even a problem? Ideally, Git would know to handle
> multiple MIDX files, and in that case it would make sense both for the
> shared and for the member repository to have one.

Yes, each odb is allowed a multi-pack-index, and they chain the same way
pack-files do. I agree that this restriction isn't necessary.

>> I suspect in practice that nobody cares about what happens when you run
>> "git repack --geometric=<d> --local", but in case they do, I think the
>> above is probably the most reasonable behavior that I can quickly come
>> up with.
> 
> Well, I do as we use alternates to implement fork networks at GitLab and
> we're in the process of adopting geometric repacking. So what I want to
> have is that I can perform geometric repacks both in the shared and in
> the member repository that includes only the local packfiles.
> 
> And yes, I agree that the above is the most reasonable behaviour, with
> the exception of disallowing MIDXs when doing a local geometric repack.

I think the recommended

	if (po_args.local && !p->local)
		continue;

approach is a nice _performance improvement_ for the --local case, since
it avoids adding a list of objects to be packed (and then thrown away,
because those objects exist in an alternate). Of course, we are currently
blocked on that part working because of the non-local packs being a
problem.

> But that raises the question: what do we do about the currently-broken
> behaviour when executing `git repack --geometric=<d> --no-local` in a
> alternated repository?

Much like "git repack -a --no-local" in an alternated repository, I
don't think this is something good for users to do, but I agree that it
not working is a problem we should fix.

Your original message includes error messages like "could not find pack"
and "unknown preferred pack" which makes me think the _real_ problem is
that we are not respecting the full path name of the pack-file and are
somehow localizing packs to the local object dir.

The basic reason is that write_midx_included_packs() takes all of the
pack-files from the geometry, but does not strip out the pack-files
that are not in the same object directory.

Perhaps this method could include a step to create a new, "local"
geometry containing only the packs within the local object dir. We can
then skip the --preferred-pack option and bitmap if this is different
than the original geometry (perhaps with a warning message to help
users who did this accidentally).

> I'd personally be fine to start honoring the `po_args.local` flag so
> that we skip over any non-local packfiles there while ignoring the
> larger problem of non-local geometric repacks breaking in certain
> scenarios. It would at least improve the status quo as users now have a
> way out in case they ever happen to hit that error. And it allows us to
> use geometric repacks in alternated repositories. But are we okay with
> punting on the larger issue for the time being?

I think the real bug is isolated in write_midx_included_packs() in how
it may specify packs that the multi-pack-index cannot track. It should
be worth the time exploring if there is an easy fix there, and then the
po_args.local version can be used as a backup/performance tweak on top
of that.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-05  7:08   ` Patrick Steinhardt
  2023-04-10 15:06     ` Derrick Stolee
@ 2023-04-10 23:29     ` Taylor Blau
  1 sibling, 0 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-10 23:29 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 05, 2023 at 09:08:19AM +0200, Patrick Steinhardt wrote:
> But I'm not yet sure whether I understand why `-l --write-midx` should
> be prohibited like you summarized in the follow-up mail:
>
> On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> > TL;DR: I think that this is a (much) more complicated problem than
> > originally anticipated, but the easiest thing to do is to assume that
> > git repack --geometric=<d> means git repack --geometric=<d> --no-local
> > unless otherwise specified, and declare --geometric=<d> --local
> > unsupported when used in conjunction with --write-midx or
> > --write-bitmap-index.
>
> The newly written MIDX would of course only span over the local
> packfiles, but is that even a problem? Ideally, Git would know to handle
> multiple MIDX files, and in that case it would make sense both for the
> shared and for the member repository to have one.

Right, I don't think that it's a problem. But I don't know how well the
MIDX-over-alternates thing works in practice. I know that the feature
exists, but I don't think it is as battled-tested as non-alternated
MIDXs.

So I think you'd have to ban MIDX bitmaps when the MIDX spans over
multiple repositories through an alternates file, but otherwise it
should be OK.

> But that raises the question: what do we do about the currently-broken
> behaviour when executing `git repack --geometric=<d> --no-local` in a
> alternated repository?
>
> I'd personally be fine to start honoring the `po_args.local` flag so
> that we skip over any non-local packfiles there while ignoring the
> larger problem of non-local geometric repacks breaking in certain
> scenarios. It would at least improve the status quo as users now have a
> way out in case they ever happen to hit that error. And it allows us to
> use geometric repacks in alternated repositories. But are we okay with
> punting on the larger issue for the time being?

I wonder if the larger issue could be simplified into (a) honoring
`po_args.local` when doing the geometric rollup, and (b) dropping the
non-local packs when writing a MIDX.

> Thanks for your feedback and this interesting discussion!

Ditto ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-10 15:06     ` Derrick Stolee
@ 2023-04-10 23:49       ` Taylor Blau
  2023-04-11 17:13         ` Patrick Steinhardt
  2023-04-11 17:06       ` Patrick Steinhardt
  1 sibling, 1 reply; 67+ messages in thread
From: Taylor Blau @ 2023-04-10 23:49 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Patrick Steinhardt, git, peff, dstolee

On Mon, Apr 10, 2023 at 11:06:49AM -0400, Derrick Stolee wrote:
> Your original message includes error messages like "could not find pack"
> and "unknown preferred pack" which makes me think the _real_ problem is
> that we are not respecting the full path name of the pack-file and are
> somehow localizing packs to the local object dir.

"could not find pack" comes from pack-objects's --stdin-packs mode when
the caller specifies a pack that doesn't exist in the repository.

I'm not sure what's going on here, since read_packs_list_from_stdin()
searches over the result of calling get_all_packs(), which does include
packs in alternate object stores. And indeed, pack-objects can recognize
such packs:

--- 8< ---
#!/bin/sh

rm -fr shared member

git init shared
git -C shared commit --allow-empty -m "base" && git -C shared repack -d

git clone --shared shared member

basename "$(find shared/.git/objects/pack -type f -name '*.pack')" >input

git -C member pack-objects .git/objects/pack/pack --stdin-packs <input

for repo in shared member
do
	echo "==> $repo"
	find $repo/.git/objects/pack -type f
done
--- >8 ---

^^ the above results in generating the identical pack in the "member"
repository as expected:

==> shared
shared/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.idx
shared/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.pack
==> member
member/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.idx
member/.git/objects/pack/pack-d51c724345703411d57134d018e9643924900d39.pack

> The basic reason is that write_midx_included_packs() takes all of the
> pack-files from the geometry, but does not strip out the pack-files
> that are not in the same object directory.

I agree here; the pack-geometry code should be stripping out non-local
packs when writing a MIDX regardless of whether the caller passed
--local or not.

> Perhaps this method could include a step to create a new, "local"
> geometry containing only the packs within the local object dir. We can
> then skip the --preferred-pack option and bitmap if this is different
> than the original geometry (perhaps with a warning message to help
> users who did this accidentally).

It would be nice to not have to juggle multiple pack geometry structs,
since the logic of what gets repacked, what gets thrown away, and what
gets kept is already fairly complicated (at least to me) and pretty
fragile.

I think if I were sketching this out, I'd start out by doing something
like:

--- 8< ---
diff --git a/builtin/repack.c b/builtin/repack.c
index df4d8e0f0b..eab5f58444 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -558,6 +558,10 @@ static void midx_included_packs(struct string_list *include,
 		for (i = geometry->split; i < geometry->pack_nr; i++) {
 			struct packed_git *p = geometry->pack[i];

+			/* MIDXs cannot refer to non-local packs */
+			if (!p->pack_local)
+				continue;
+
 			strbuf_addstr(&buf, pack_basename(p));
 			strbuf_strip_suffix(&buf, ".pack");
 			strbuf_addstr(&buf, ".idx");
--- >8 ---

...and I actually think that might do it, since:

	- existing_nonkept_packs is populated by calling readdir() on the local
		repository's $GIT_DIR/objects/pack directory, so it will never contain
		any non-local packs.

	- existing_kept_packs is also OK for the same reason

	- names (which tracks the packs that we just wrote) will never contain a
		non-local pack, since we never write packs outside of our local pack
		directory

So that would cause you to write a MIDX containing only local packs (as
desired) regardless of whether or not the caller passed --[no]-local or
not.

> > I'd personally be fine to start honoring the `po_args.local` flag so
> > that we skip over any non-local packfiles there while ignoring the
> > larger problem of non-local geometric repacks breaking in certain
> > scenarios. It would at least improve the status quo as users now have a
> > way out in case they ever happen to hit that error. And it allows us to
> > use geometric repacks in alternated repositories. But are we okay with
> > punting on the larger issue for the time being?
>
> I think the real bug is isolated in write_midx_included_packs() in how
> it may specify packs that the multi-pack-index cannot track. It should
> be worth the time exploring if there is an easy fix there, and then the
> po_args.local version can be used as a backup/performance tweak on top
> of that.

Yeah, seeing "unknown preferred pack" makes me suspicious that we tried
to write a MIDX that contains a pack outside of our repository. I tried
to reproduce that case with the following script but couldn't do it (the
pack that the MIDX does track is the local one, as expected):

--- 8< ---
#!/bin/sh

rm -fr shared member

git init shared
git -C shared commit --allow-empty -m "base" && git -C shared repack -d

git clone --shared shared member

git -C member commit --allow-empty -m "base" && git -C member repack -d

{
	basename "$(find shared/.git/objects/pack -type f -name '*.idx')"
	basename "$(find member/.git/objects/pack -type f -name '*.idx')"
} |
git.compile -C member multi-pack-index write --stdin-packs

for repo in shared member
do
	echo "==> $repo"
	find $repo/.git/objects/pack -type f
done

( cd member && ~/src/git/t/helper/test-tool read-midx --show-objects .git/objects )
--- >8 ---

But the MIDX code that is responsible for collecting the packs specified
over --stdin-packs enumerates the possible packs by calling
`for_each_file_in_pack_dir()` with our local `object_dir`, so I don't
think it's possible to have a MIDX that contains a non-local pack.

IOW, I agree with you that the bug is in write_midx_included_packs(),
and I think the fix that I outlined would do the trick.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-10 15:06     ` Derrick Stolee
  2023-04-10 23:49       ` Taylor Blau
@ 2023-04-11 17:06       ` Patrick Steinhardt
  2023-04-11 17:26         ` Patrick Steinhardt
  1 sibling, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-11 17:06 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 1975 bytes --]

On Mon, Apr 10, 2023 at 11:06:49AM -0400, Derrick Stolee wrote:
> On 4/5/2023 3:08 AM, Patrick Steinhardt wrote:
> > On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> >> On Tue, Apr 04, 2023 at 01:08:33PM +0200, Patrick Steinhardt wrote:
[snip]
> > I'd personally be fine to start honoring the `po_args.local` flag so
> > that we skip over any non-local packfiles there while ignoring the
> > larger problem of non-local geometric repacks breaking in certain
> > scenarios. It would at least improve the status quo as users now have a
> > way out in case they ever happen to hit that error. And it allows us to
> > use geometric repacks in alternated repositories. But are we okay with
> > punting on the larger issue for the time being?
> 
> I think the real bug is isolated in write_midx_included_packs() in how
> it may specify packs that the multi-pack-index cannot track. It should
> be worth the time exploring if there is an easy fix there, and then the
> po_args.local version can be used as a backup/performance tweak on top
> of that.

The problems are quite multi-faceted, but taken on their own most of the
problems actually seem quite easy to fix. I've got a local patch series
that addresses almost all of the pain points I have found until now:

- A segfault in git-multi-pack-index(1) when passed no packfiles and a
  preferred packfile that it cannot find.

- The issue that git-repack(1) asks git-multi-pack-index(1) to write an
  MIDX with packs that it cannot actually track because they are not
  local.

- An issue with git-pack-objects(1) that keeps it from packing objects
  that are non-local due to the way `--stdin-packs` is only referring to
  the packfile basenames.

- The fact that we don't honor `-l` when doing geometric repacking.

I'm still busy polishing it and finding testcases for each of these to
demonstrate the actual bugs, but will send out the series later this
week.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-10 23:49       ` Taylor Blau
@ 2023-04-11 17:13         ` Patrick Steinhardt
  2023-04-11 21:13           ` Taylor Blau
  0 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-11 17:13 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2893 bytes --]

On Mon, Apr 10, 2023 at 07:49:01PM -0400, Taylor Blau wrote:
> On Mon, Apr 10, 2023 at 11:06:49AM -0400, Derrick Stolee wrote:
[snip]
> > Perhaps this method could include a step to create a new, "local"
> > geometry containing only the packs within the local object dir. We can
> > then skip the --preferred-pack option and bitmap if this is different
> > than the original geometry (perhaps with a warning message to help
> > users who did this accidentally).
> 
> It would be nice to not have to juggle multiple pack geometry structs,
> since the logic of what gets repacked, what gets thrown away, and what
> gets kept is already fairly complicated (at least to me) and pretty
> fragile.
> 
> I think if I were sketching this out, I'd start out by doing something
> like:
> 
> --- 8< ---
> diff --git a/builtin/repack.c b/builtin/repack.c
> index df4d8e0f0b..eab5f58444 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -558,6 +558,10 @@ static void midx_included_packs(struct string_list *include,
>  		for (i = geometry->split; i < geometry->pack_nr; i++) {
>  			struct packed_git *p = geometry->pack[i];
> 
> +			/* MIDXs cannot refer to non-local packs */
> +			if (!p->pack_local)
> +				continue;
> +
>  			strbuf_addstr(&buf, pack_basename(p));
>  			strbuf_strip_suffix(&buf, ".pack");
>  			strbuf_addstr(&buf, ".idx");
> --- >8 ---
> 
> ...and I actually think that might do it, since:
> 
> 	- existing_nonkept_packs is populated by calling readdir() on the local
> 		repository's $GIT_DIR/objects/pack directory, so it will never contain
> 		any non-local packs.
> 
> 	- existing_kept_packs is also OK for the same reason
> 
> 	- names (which tracks the packs that we just wrote) will never contain a
> 		non-local pack, since we never write packs outside of our local pack
> 		directory
> 
> So that would cause you to write a MIDX containing only local packs (as
> desired) regardless of whether or not the caller passed --[no]-local or
> not.

I think this doesn't actually do anything. The reason is that the
`--stdin-packs` option in git-multi-pack-index(1) does not actually
treat the provided packs as packs that are to be included in the
multi-pack-index. Instead, it uses it as a filter for `get_all_packs()`.
And as `get_all_packs()` returns packfiles paths prefixed with the
alternate directory's name, but we pass in basenames only, we would
already be filtering out any non-local packfiles.

So ultimately, we would only ever write an MIDX containing only local
packs already. It rather feels like this is only by chance though, so I
think it is good to include your patch regardless of whether it actually
does something or not. Better be explicit here, also as documentation to
the reader.

I'll include it as part of my patch series that I'm to send out later
this week.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-11 17:06       ` Patrick Steinhardt
@ 2023-04-11 17:26         ` Patrick Steinhardt
  2023-04-11 21:14           ` Taylor Blau
  0 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-11 17:26 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Taylor Blau, git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2078 bytes --]

On Tue, Apr 11, 2023 at 07:06:48PM +0200, Patrick Steinhardt wrote:
> On Mon, Apr 10, 2023 at 11:06:49AM -0400, Derrick Stolee wrote:
> > On 4/5/2023 3:08 AM, Patrick Steinhardt wrote:
> > > On Tue, Apr 04, 2023 at 02:55:50PM -0400, Taylor Blau wrote:
> > >> On Tue, Apr 04, 2023 at 01:08:33PM +0200, Patrick Steinhardt wrote:
> [snip]
> > > I'd personally be fine to start honoring the `po_args.local` flag so
> > > that we skip over any non-local packfiles there while ignoring the
> > > larger problem of non-local geometric repacks breaking in certain
> > > scenarios. It would at least improve the status quo as users now have a
> > > way out in case they ever happen to hit that error. And it allows us to
> > > use geometric repacks in alternated repositories. But are we okay with
> > > punting on the larger issue for the time being?
> > 
> > I think the real bug is isolated in write_midx_included_packs() in how
> > it may specify packs that the multi-pack-index cannot track. It should
> > be worth the time exploring if there is an easy fix there, and then the
> > po_args.local version can be used as a backup/performance tweak on top
> > of that.
> 
> The problems are quite multi-faceted, but taken on their own most of the
> problems actually seem quite easy to fix. I've got a local patch series
> that addresses almost all of the pain points I have found until now:
> 
> - A segfault in git-multi-pack-index(1) when passed no packfiles and a
>   preferred packfile that it cannot find.
> 
> - The issue that git-repack(1) asks git-multi-pack-index(1) to write an
>   MIDX with packs that it cannot actually track because they are not
>   local.
> 
> - An issue with git-pack-objects(1) that keeps it from packing objects
>   that are non-local due to the way `--stdin-packs` is only referring to
>   the packfile basenames.
> 
> - The fact that we don't honor `-l` when doing geometric repacking.

Ah, one more thing I forgot: add a safety mechanism that disables
writing bitmaps when we see non-local packs.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-11 17:13         ` Patrick Steinhardt
@ 2023-04-11 21:13           ` Taylor Blau
  2023-04-12  9:37             ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Taylor Blau @ 2023-04-11 21:13 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Derrick Stolee, git, peff, dstolee

On Tue, Apr 11, 2023 at 07:13:46PM +0200, Patrick Steinhardt wrote:
> So ultimately, we would only ever write an MIDX containing only local
> packs already. It rather feels like this is only by chance though, so I
> think it is good to include your patch regardless of whether it actually
> does something or not. Better be explicit here, also as documentation to
> the reader.

Doh, this is definitely right. I even wrote that code a while ago; shows
you how good my memory is ;-).

FWIW, I agree that even though it doesn't do anything in this instance,
it is a good safeguard to have in place, so I think including it in your
series is the right thing to do.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-11 17:26         ` Patrick Steinhardt
@ 2023-04-11 21:14           ` Taylor Blau
  0 siblings, 0 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-11 21:14 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: Derrick Stolee, git, peff, dstolee

On Tue, Apr 11, 2023 at 07:26:04PM +0200, Patrick Steinhardt wrote:
> > The problems are quite multi-faceted, but taken on their own most of the
> > problems actually seem quite easy to fix. I've got a local patch series
> > that addresses almost all of the pain points I have found until now:
> >
> > - A segfault in git-multi-pack-index(1) when passed no packfiles and a
> >   preferred packfile that it cannot find.
> >
> > - The issue that git-repack(1) asks git-multi-pack-index(1) to write an
> >   MIDX with packs that it cannot actually track because they are not
> >   local.
> >
> > - An issue with git-pack-objects(1) that keeps it from packing objects
> >   that are non-local due to the way `--stdin-packs` is only referring to
> >   the packfile basenames.
> >
> > - The fact that we don't honor `-l` when doing geometric repacking.
>
> Ah, one more thing I forgot: add a safety mechanism that disables
> writing bitmaps when we see non-local packs.

Thank you very much for working on this.

For what it's worth, I think that this whole thing is pretty cool.
Having a couple of different forges use this feature in subtly different
ways is proving to be an extremely effective way to shake out subtle
bugs in the implementation.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] repack: fix geometric repacking with gitalternates
  2023-04-11 21:13           ` Taylor Blau
@ 2023-04-12  9:37             ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12  9:37 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Derrick Stolee, git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 1107 bytes --]

On Tue, Apr 11, 2023 at 05:13:12PM -0400, Taylor Blau wrote:
> On Tue, Apr 11, 2023 at 07:13:46PM +0200, Patrick Steinhardt wrote:
> > So ultimately, we would only ever write an MIDX containing only local
> > packs already. It rather feels like this is only by chance though, so I
> > think it is good to include your patch regardless of whether it actually
> > does something or not. Better be explicit here, also as documentation to
> > the reader.
> 
> Doh, this is definitely right. I even wrote that code a while ago; shows
> you how good my memory is ;-).
> 
> FWIW, I agree that even though it doesn't do anything in this instance,
> it is a good safeguard to have in place, so I think including it in your
> series is the right thing to do.

I was wrong, it actually does fix a problem, just not the one we thought
it would fix. When all packfiles are non-local, git-multi-pack-index(1)
would end up returning an error because it cannot find any of them. So
by skipping over these non-local packfiles early on we avoid spawning
it altogether and thus avoid the error.

Partick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2 0/8] repack: fix geometric repacking with gitalternates
  2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
  2023-04-04 18:55 ` Taylor Blau
@ 2023-04-12 10:22 ` Patrick Steinhardt
  2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
                     ` (8 more replies)
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
  3 siblings, 9 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 1308 bytes --]

Hi,

this is the second version of my patch to fix geometric repacking with
repositories that are connected to an alternate object database.
Following the discussion with Taylor and Derrick this series goes into
much more detail than the first version and fixes a bunch of issues all
related to geometric repacking.

Patrick

Patrick Steinhardt (8):
  midx: fix segfault with no packs and invalid preferred pack
  repack: fix trying to use preferred pack in alternates
  repack: fix generating multi-pack-index with only non-local packs
  pack-objects: fix error when packing same pack twice
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  repack: honor `-l` when calculating pack geometry
  repack: disable writing bitmaps when doing a local geometric repack

 builtin/pack-objects.c        |  10 +-
 builtin/repack.c              |  65 +++++++++++--
 midx.c                        |   6 ++
 t/t5319-multi-pack-index.sh   |  11 +++
 t/t5331-pack-objects-stdin.sh | 103 +++++++++++++++++++++
 t/t7703-repack-geometric.sh   | 166 ++++++++++++++++++++++++++++++++++
 6 files changed, 349 insertions(+), 12 deletions(-)
 create mode 100755 t/t5331-pack-objects-stdin.sh

-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
@ 2023-04-12 10:22   ` Patrick Steinhardt
  2023-04-12 17:56     ` Taylor Blau
  2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
                     ` (7 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2299 bytes --]

When asked to write a multi-pack-index the user can specify a preferred
pack that is used as a tie breaker when multiple packs contain the same
objects. When this packfile cannot be found, we just pick the first pack
that is getting tracked by the newly written multi-pack-index as a
fallback.

Picking the fallback can fail in the case where we're asked to write a
multi-pack-index with no packfiles at all: picking the fallback value
will cause a segfault as we blindly index into the array of packfiles,
which would be empty.

Fix this bug by exiting early in case we have determined that the MIDX
wouldn't have any packfiles to index. While the request itself does not
make much sense anyway, it is still preferable to exit gracefully than
to abort.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c                      |  6 ++++++
 t/t5319-multi-pack-index.sh | 11 +++++++++++
 2 files changed, 17 insertions(+)

diff --git a/midx.c b/midx.c
index 9af3e5de88..22ea7ffb75 100644
--- a/midx.c
+++ b/midx.c
@@ -1307,6 +1307,12 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);
 
+	if (!ctx.nr) {
+		error(_("no pack files to index."));
+		result = 1;
+		goto cleanup;
+	}
+
 	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
 	    !(packs_to_include || packs_to_drop)) {
 		struct bitmap_index *bitmap_git;
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 499d5d4c78..be7f3c1e1f 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -183,6 +183,17 @@ test_expect_success 'write midx with --stdin-packs' '
 
 compare_results_with_midx "mixed mode (one pack + extra)"
 
+test_expect_success 'write with no objects and preferred pack' '
+	test_when_finished "rm -rf empty" &&
+	git init empty &&
+	test_must_fail git -C empty multi-pack-index write \
+		--stdin-packs --preferred-pack=does-not-exist </dev/null 2>err &&
+	cat >expect <<-EOF &&
+	error: no pack files to index.
+	EOF
+	test_cmp expect err
+'
+
 test_expect_success 'write progress off for redirected stderr' '
 	git multi-pack-index --object-dir=$objdir write 2>err &&
 	test_line_count = 0 err
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
  2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
@ 2023-04-12 10:22   ` Patrick Steinhardt
  2023-04-12 18:37     ` Taylor Blau
  2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
                     ` (6 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 4465 bytes --]

When doing a geometric repack with multi-pack-indices, then we ask
git-multi-pack-index(1) to use the largest packfile as the preferred
pack. It can happen though that the largest packfile is not part of the
main object database, but instead part of an alternate object database.
The result is that git-multi-pack-index(1) will not be able to find the
preferred pack and print a warning. It then falls back to use the first
packfile that the multi-pack-index shall reference.

Fix this bug by only considering packfiles as preferred pack that are
local. This is the right thing to do given that a multi-pack-index
should never reference packfiles borrowed from an alternate.

While at it, rename the function `get_largest_active_packfile()` to
`get_preferred_pack()` to better document its intent.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 21 ++++++++++++++++-----
 t/t7703-repack-geometric.sh | 31 +++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index df4d8e0f0b..9d36dc8b84 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -448,8 +448,10 @@ static void split_pack_geometry(struct pack_geometry *geometry, int factor)
 	geometry->split = split;
 }
 
-static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry)
+static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
 {
+	uint32_t i;
+
 	if (!geometry) {
 		/*
 		 * No geometry means either an all-into-one repack (in which
@@ -464,7 +466,16 @@ static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry
 	}
 	if (geometry->split == geometry->pack_nr)
 		return NULL;
-	return geometry->pack[geometry->pack_nr - 1];
+
+	for (i = geometry->pack_nr; i > 0; i--)
+		/*
+		 * A pack that is not local would never be included in a
+		 * multi-pack index. We thus skip over any non-local packs.
+		 */
+		if (geometry->pack[i - 1]->pack_local)
+			return geometry->pack[i - 1];
+
+	return NULL;
 }
 
 static void clear_pack_geometry(struct pack_geometry *geometry)
@@ -591,7 +602,7 @@ static int write_midx_included_packs(struct string_list *include,
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
-	struct packed_git *largest = get_largest_active_pack(geometry);
+	struct packed_git *preferred = get_preferred_pack(geometry);
 	FILE *in;
 	int ret;
 
@@ -612,9 +623,9 @@ static int write_midx_included_packs(struct string_list *include,
 	if (write_bitmaps)
 		strvec_push(&cmd.args, "--bitmap");
 
-	if (largest)
+	if (preferred)
 		strvec_pushf(&cmd.args, "--preferred-pack=%s",
-			     pack_basename(largest));
+			     pack_basename(preferred));
 
 	if (refs_snapshot)
 		strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot);
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 8821fbd2dd..92a1aaa754 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -281,4 +281,35 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 	)
 '
 
+test_expect_success '--geometric --write-midx with packfiles in main and alternate ODB' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a shared repository that will serve as the alternate object
+	# database for the member linked to it. It has got some objects on its
+	# own that are packed into a single packfile.
+	git init shared &&
+	test_commit -C shared common-object &&
+	git -C shared repack -ad &&
+
+	# We create member so that its alternates file points to the shared
+	# repository. We then create a commit in it so that git-repack(1) has
+	# something to repack.
+	# of the shared object database.
+	git clone --shared shared member &&
+	test_commit -C member unique-object &&
+	git -C member repack --geometric=2 --write-midx 2>err &&
+	test_must_be_empty err &&
+
+	# We should see that a new packfile was generated.
+	find shared/.git/objects/pack -type f -name "*.pack" | sort >packs &&
+	test $(wc -l <packs) = 1 &&
+
+	# We should also see a multi-pack-index. This multi-pack-index should
+	# never refer to any packfiles in the alternate object database.
+	# Consequentially, it should be valid even with the alternate ODB
+	# deleted.
+	rm -r shared &&
+	git -C member multi-pack-index verify
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
  2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
  2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
@ 2023-04-12 10:22   ` Patrick Steinhardt
  2023-04-12 20:39     ` Taylor Blau
  2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
                     ` (5 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 3144 bytes --]

When writing the multi-pack-index with geometric repacking we will add
all packfiles to the index that are part of the geometric sequence. This
can potentially also include packfiles borrowed from an alternate object
directory. But given that a multi-pack-index can only ever include packs
that are part of the main object database this does not make much sense
whatsoever.

In the edge case where all packfiles are contained in the alternate
object database and the local repository has none itself this bug can
cause us to invoke git-multi-pack-index(1) with only non-local packfiles
that it ultimately cannot find. This causes it to return an error and
thus causes the geometric repack to fail.

Fix the code to skip non-local packfiles.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 11 +++++++++++
 t/t7703-repack-geometric.sh | 24 ++++++++++++++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 9d36dc8b84..80d4e813c8 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -569,6 +569,17 @@ static void midx_included_packs(struct string_list *include,
 		for (i = geometry->split; i < geometry->pack_nr; i++) {
 			struct packed_git *p = geometry->pack[i];
 
+			/*
+			 * The multi-pack index never refers to packfiles part
+			 * of an alternate object database, so we skip these.
+			 * While git-multi-pack-index(1) would silently ignore
+			 * them anyway, this allows us to skip executing the
+			 * command completely when we have only non-local
+			 * packfiles.
+			 */
+			if (!p->pack_local)
+				continue;
+
 			strbuf_addstr(&buf, pack_basename(p));
 			strbuf_strip_suffix(&buf, ".pack");
 			strbuf_addstr(&buf, ".idx");
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 92a1aaa754..9e95350cbf 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -312,4 +312,28 @@ test_expect_success '--geometric --write-midx with packfiles in main and alterna
 	git -C member multi-pack-index verify
 '
 
+test_expect_success '--geometric --with-midx with no local objects' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a repository with a single packfile that acts as alternate
+	# object database.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+
+	# Create a second repository linked to the first one and perform a
+	# geometric repack on it.
+	git clone --shared shared member &&
+	git -C member repack --geometric 2 --write-midx 2>err &&
+	test_must_be_empty err &&
+
+	# Assert that we wrote neither a new packfile nor a multi-pack-index.
+	# We should not have a packfile because the single packfile in the
+	# alternate object database does not invalidate the geometric sequence.
+	# And we should not have a multi-pack-index because these only index
+	# local packfiles, and there are none.
+	find member/.git/objects/pack -type f >actual &&
+	test_must_be_empty actual
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 4/8] pack-objects: fix error when packing same pack twice
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
@ 2023-04-12 10:22   ` Patrick Steinhardt
  2023-04-12 21:33     ` Taylor Blau
  2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
                     ` (4 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 4042 bytes --]

When passed the same packfile twice via `--stdin-packs` we return an
error that the packfile supposedly was not found. This is because when
reading packs into the list of included or excluded packfiles, we will
happily re-add packfiles even if they are part of the lists already. And
while the list can now contain duplicates, we will only set the `util`
pointer of the first list entry to the `packed_git` structure. We notice
that at a later point when checking that all list entries have their
`util` pointer set and die with an error.

While this is kind of a nonsensical request, this scenario can be hit
when doing geometric repacks. When a repository is connected to an
alternate object directory and both have the exact same packfile then
both would get added to the geometric sequence. And when we then decide
to perform the repack, we will invoke git-pack-objects(1) with the same
packfile twice.

Fix this bug by removing any duplicates from both the included and
excluded packs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/pack-objects.c        |  2 ++
 t/t5331-pack-objects-stdin.sh | 29 +++++++++++++++++++++++++++++
 t/t7703-repack-geometric.sh   | 25 +++++++++++++++++++++++++
 3 files changed, 56 insertions(+)
 create mode 100755 t/t5331-pack-objects-stdin.sh

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 77d88f85b0..fdf3f440be 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3359,7 +3359,9 @@ static void read_packs_list_from_stdin(void)
 	}
 
 	string_list_sort(&include_packs);
+	string_list_remove_duplicates(&include_packs, 0);
 	string_list_sort(&exclude_packs);
+	string_list_remove_duplicates(&exclude_packs, 0);
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
 		const char *pack_name = pack_basename(p);
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
new file mode 100755
index 0000000000..ab34cfc729
--- /dev/null
+++ b/t/t5331-pack-objects-stdin.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='pack-objects --stdin'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'pack-objects --stdin with duplicate packfile' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		test_commit "commit" &&
+		git repack -ad &&
+
+		(
+			basename .git/objects/pack/pack-*.pack &&
+			basename .git/objects/pack/pack-*.pack
+		) >packfiles &&
+
+		git pack-objects --stdin-packs generated-pack <packfiles &&
+		test_cmp generated-pack-*.pack .git/objects/pack/pack-*.pack
+	)
+'
+
+test_done
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 9e95350cbf..0a2f2bd46c 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -336,4 +336,29 @@ test_expect_success '--geometric --with-midx with no local objects' '
 	test_must_be_empty actual
 '
 
+test_expect_success '--geometric with same pack in main and alternate ODB' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a repository with a single packfile that acts as alternate
+	# object database.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+
+	# We create the member repository as an exact copy so that it has the
+	# same packfile.
+	cp -r shared member &&
+	test-tool path-utils real_path shared/.git/objects >member/.git/objects/info/alternates &&
+	find shared/.git/objects -type f >expected-files &&
+
+	# Verify that we can repack objects as expected without observing any
+	# error. Having the same packfile in both ODBs used to cause an error
+	# in git-pack-objects(1).
+	git -C member repack --geometric 2 2>err &&
+	test_must_be_empty err &&
+	# Nothing should have changed.
+	find shared/.git/objects -type f >actual-files &&
+	test_cmp expected-files actual-files
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
@ 2023-04-12 10:22   ` Patrick Steinhardt
  2023-04-12 21:52     ` Taylor Blau
  2023-04-12 10:22   ` [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
                     ` (3 subsequent siblings)
  8 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2566 bytes --]

When passing the same packfile both as included and excluded via the
`--stdin-packs` option, then we will return an error because the
excluded packfile cannot be found. This is because we will only set the
`util` pointer for the included packfile list if it was found, so that
we later die when we notice that it's in fact not set for the excluded
packfile list.

Fix this bug by always setting the `util` pointer for both the included
and excluded list entries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/pack-objects.c        |  8 +++-----
 t/t5331-pack-objects-stdin.sh | 26 ++++++++++++++++++++++++++
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index fdf3f440be..522eb4dd31 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3366,11 +3366,9 @@ static void read_packs_list_from_stdin(void)
 	for (p = get_all_packs(the_repository); p; p = p->next) {
 		const char *pack_name = pack_basename(p);
 
-		item = string_list_lookup(&include_packs, pack_name);
-		if (!item)
-			item = string_list_lookup(&exclude_packs, pack_name);
-
-		if (item)
+		if ((item = string_list_lookup(&include_packs, pack_name)))
+			item->util = p;
+		if ((item = string_list_lookup(&exclude_packs, pack_name)))
 			item->util = p;
 	}
 
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index ab34cfc729..f389a78b38 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -7,6 +7,12 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
+packed_objects() {
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list &&
+	rm tmp-object-list
+ }
+
 test_expect_success 'pack-objects --stdin with duplicate packfile' '
 	test_when_finished "rm -fr repo" &&
 
@@ -26,4 +32,24 @@ test_expect_success 'pack-objects --stdin with duplicate packfile' '
 	)
 '
 
+test_expect_success 'pack-objects --stdin with same packfile excluded and included' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		test_commit "commit" &&
+		git repack -ad &&
+
+		(
+			basename .git/objects/pack/pack-*.pack &&
+			printf "^%s\n" "$(basename .git/objects/pack/pack-*.pack)"
+		) >packfiles &&
+
+		git pack-objects --stdin-packs generated-pack <packfiles &&
+		packed_objects generated-pack-*.idx >packed-objects &&
+		test_must_be_empty packed-objects
+	)
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
@ 2023-04-12 10:22   ` Patrick Steinhardt
  2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
                     ` (2 subsequent siblings)
  8 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2590 bytes --]

We don't have any tests that verify that git-pack-objects(1) works with
`--stdin-packs` when combined with alternate object directories. Add
some to make sure that the basic functionality works as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5331-pack-objects-stdin.sh | 48 +++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index f389a78b38..8bd1d26caf 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -52,4 +52,52 @@ test_expect_success 'pack-objects --stdin with same packfile excluded and includ
 	)
 '
 
+test_expect_success 'pack-objects --stdin with packfiles from alternate object database' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Set up a shared repository with a single packfile.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+	basename shared/.git/objects/pack/pack-*.pack >packfile &&
+
+	# Set up a repository that is connected to the shared repository. This
+	# repository has no objects on its own, but we still expect to be able
+	# to pack objects from its alternate.
+	git clone --shared shared member &&
+	git -C member pack-objects --stdin-packs generated-pack <packfile &&
+	test_cmp shared/.git/objects/pack/pack-*.pack member/generated-pack-*.pack
+'
+
+test_expect_success 'pack-objects --stdin with packfiles from main and alternate object database' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Set up a shared repository with a single packfile.
+	git init shared &&
+	test_commit -C shared "shared-commit" &&
+	git -C shared repack -ad &&
+
+	# Set up a repository that is connected to the shared repository. This
+	# repository has a second packfile so that we can verify that it is
+	# possible to write packs that include packfiles from different object
+	# databases.
+	git clone --shared shared member &&
+	test_commit -C member "local-commit" &&
+	git -C member repack -dl &&
+
+	(
+		basename shared/.git/objects/pack/pack-*.pack  &&
+		basename member/.git/objects/pack/pack-*.pack
+	) >packfiles &&
+
+	(
+		packed_objects shared/.git/objects/pack/pack-*.idx &&
+		packed_objects member/.git/objects/pack/pack-*.idx
+	) | sort >expected-objects &&
+
+	git -C member pack-objects --stdin-packs generated-pack <packfiles &&
+	packed_objects member/generated-pack-*.idx | sort >actual-objects &&
+	test_cmp expected-objects actual-objects
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2023-04-12 10:22   ` [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
@ 2023-04-12 10:22   ` Patrick Steinhardt
  2023-04-12 23:56     ` Junio C Hamano
  2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
  2023-04-12 22:02   ` [PATCH v2 0/8] repack: fix geometric repacking with gitalternates Taylor Blau
  8 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:22 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 5712 bytes --]

When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.

What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.

Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 13 ++++++--
 t/t7703-repack-geometric.sh | 59 +++++++++++++++++++++++++++++++++++++
 2 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 80d4e813c8..f57869f14a 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -325,7 +325,8 @@ static int geometry_cmp(const void *va, const void *vb)
 }
 
 static void init_pack_geometry(struct pack_geometry **geometry_p,
-			       struct string_list *existing_kept_packs)
+			       struct string_list *existing_kept_packs,
+			       const struct pack_objects_args *args)
 {
 	struct packed_git *p;
 	struct pack_geometry *geometry;
@@ -335,6 +336,14 @@ static void init_pack_geometry(struct pack_geometry **geometry_p,
 	geometry = *geometry_p;
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
+		if (args->local && !p->pack_local)
+			/*
+			 * When asked to only repack local packfiles we skip
+			 * over any packfiles that are borrowed from alternate
+			 * object directories.
+			 */
+			continue;
+
 		if (!pack_kept_objects) {
 			/*
 			 * Any pack that has its pack_keep bit set will appear
@@ -897,7 +906,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (geometric_factor) {
 		if (pack_everything)
 			die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
-		init_pack_geometry(&geometry, &existing_kept_packs);
+		init_pack_geometry(&geometry, &existing_kept_packs, &po_args);
 		split_pack_geometry(geometry, geometric_factor);
 	}
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 0a2f2bd46c..96c8d4cdfa 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -10,6 +10,12 @@ objdir=.git/objects
 packdir=$objdir/pack
 midx=$objdir/pack/multi-pack-index
 
+packed_objects() {
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list &&
+	rm tmp-object-list
+ }
+
 test_expect_success '--geometric with no packs' '
 	git init geometric &&
 	test_when_finished "rm -fr geometric" &&
@@ -361,4 +367,57 @@ test_expect_success '--geometric with same pack in main and alternate ODB' '
 	test_cmp expected-files actual-files
 '
 
+test_expect_success '--geometric -l with non-intact geometric sequence across ODBs' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# Verify that our assumptions actually hold: both generated packfiles
+	# should have three objects and should be non-equal.
+	packed_objects shared/.git/objects/pack/pack-*.idx >packed-objects &&
+	test_line_count = 3 packed-objects &&
+	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&
+	test_line_count = 3 packed-objects &&
+	test "$(basename member/.git/objects/pack/pack-*.pack)" != "$(basename shared/.git/objects/pack/pack-*.pack)" &&
+
+	# Perform the geometric repack. With `-l`, we should only see the local
+	# packfile and thus arrive at the conclusion that the geometric
+	# sequence is intact. We thus expect no changes.
+	#
+	# Note that we are using stat(1) to verify idempotence to also verify
+	# that the mtime did not change. This is done in order to detect the
+	# case where we do repack objects, but the resulting packfile is the
+	# same.
+	stat member/.git/objects/pack/* >expected-member-packs &&
+	git -C member repack --geometric=2 -l -d &&
+	stat member/.git/objects/pack/* >actual-member-packs &&
+	test_cmp expected-member-packs actual-member-packs &&
+
+	(
+		packed_objects shared/.git/objects/pack/pack-*.idx &&
+		packed_objects member/.git/objects/pack/pack-*.idx
+	) | sort >expected-objects &&
+
+	# On the other hand, when doing a non-local geometric repack we should
+	# see both packfiles and thus repack them. We expect that the shared
+	# object database was not changed.
+	stat shared/.git/objects/pack/* >expected-shared-packs &&
+	git -C member repack --geometric=2 -d &&
+	stat shared/.git/objects/pack/* >actual-shared-packs &&
+	test_cmp expected-shared-packs actual-shared-packs &&
+
+	# Furthermore, we expect that the member repository now has a single
+	# packfile that contains the combined shared and non-shared objects.
+	ls member/.git/objects/pack/pack-*.idx >actual &&
+	test_line_count = 1 actual &&
+	packed_objects member/.git/objects/pack/pack-*.idx >actual-objects &&
+	test_line_count = 6 actual-objects &&
+	sort <actual-objects >actual-objects.sorted &&
+	test_cmp expected-objects actual-objects.sorted
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
@ 2023-04-12 10:23   ` Patrick Steinhardt
  2023-04-12 22:01     ` Taylor Blau
  2023-04-12 22:02   ` [PATCH v2 0/8] repack: fix geometric repacking with gitalternates Taylor Blau
  8 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-12 10:23 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 4553 bytes --]

In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.

This is not always the case though. When writing multi-pack-indices in a
repository that is connected to an alternate object directory we may end
up writing a multi-pack-index that only has partial coverage of objects.
The end result is that writing the bitmap will fail:

    $ git multi-pack-index write --stdin-packs --bitmap <packfiles
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    error: could not write multi-pack bitmap

Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage. But we ain't really got enough
information there, and seeing that it is a low-level plumbing command it
does not feel like the right place to fix this.

We can easily fix it at a higher level in git-repack(1) though. When all
of the following conditions are true:

    - We are asked to write a multi-pack index with bitmaps.

    - We are asked to only include local objects via `-l`.

    - We are connected to an alternate repository that has packfiles.

Then we will override the user's ask and disable writing bitmaps with a
warning. This is similar to what we do in git-pack-objects(1), where we
also disable writing bitmaps in case we omit an object from the pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 20 ++++++++++++++++++++
 t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index f57869f14a..07d92fdf87 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -881,6 +881,26 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (pack_kept_objects < 0)
 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
 
+	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
+		struct packed_git *p;
+
+		for (p = get_all_packs(the_repository); p; p = p->next) {
+			if (p->pack_local)
+				continue;
+
+			/*
+			 * When asked to do a local repack, but we have
+			 * packfiles that are inherited from an alternate, then
+			 * we cannot guarantee that the multi-pack-index would
+			 * have full coverage of all objects. We thus disable
+			 * writing bitmaps in that case.
+			 */
+			warning(_("disabling bitmap writing, as some objects are not being packed"));
+			write_bitmaps = 0;
+			break;
+		}
+	}
+
 	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
 		die(_(incremental_bitmap_conflict_error));
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 96c8d4cdfa..0aaec9f853 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -420,4 +420,31 @@ test_expect_success '--geometric -l with non-intact geometric sequence across OD
 	test_cmp expected-objects actual-objects.sorted
 '
 
+test_expect_success '--geometric -l disables writing bitmaps with non-local packfiles' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# When performing a geometric repack with `-l` while connecting to an
+	# alternate object database that has a packfile we do not have full
+	# coverage of objects. As a result, we expect that writing the bitmap
+	# will be disabled.
+	git -C member repack -l --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	cat >expect <<-EOF &&
+	warning: disabling bitmap writing, as some objects are not being packed
+	EOF
+	test_cmp expect err &&
+	test ! -f member/.git/objects/pack/multi-pack-index-*.bitmap &&
+
+	# On the other hand, when we repack without `-l`, we should see that
+	# the bitmap gets created.
+	git -C member repack --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	test_must_be_empty err &&
+	test -f member/.git/objects/pack/multi-pack-index-*.bitmap
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack
  2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
@ 2023-04-12 17:56     ` Taylor Blau
  2023-04-13  9:28       ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Taylor Blau @ 2023-04-12 17:56 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 12, 2023 at 12:22:31PM +0200, Patrick Steinhardt wrote:
> Fix this bug by exiting early in case we have determined that the MIDX
> wouldn't have any packfiles to index. While the request itself does not
> make much sense anyway, it is still preferable to exit gracefully than
> to abort.

Interesting. This reminded me quite a bit of eb57277ba3 (midx: prevent
writing a .bitmap without any objects, 2022-02-09) which tackled a
similar problem of trying to write a MIDX bitmap without any objects.

We may want to consider moving that conditional further up, since this
makes the conditional added in eb57277ba3 dead code AFAICT. Here's a
patch on top of this one that I think would do the trick.

It has the added benefit of sticking a:

    warning: unknown preferred pack: 'does-not-exist'

in the output before dying, which might be nice (though I doubt anybody
will ever see it ;-)). The main difference is that we unset the bitmap
related bits from `flags`, which avoids us trying to compute a preferred
pack in the first place.

For it to work, though, we need to make sure that ctx.preferred_pack_idx
is set to -1, and not zero-initialized, since we'll segfault otherwise
when trying to read into an empty array.

--- 8< ---
diff --git a/midx.c b/midx.c
index 22ea7ffb75..dc4821eab8 100644
--- a/midx.c
+++ b/midx.c
@@ -1263,6 +1263,7 @@ static int write_midx_internal(const char *object_dir,
 	ctx.nr = 0;
 	ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
 	ctx.info = NULL;
+	ctx.preferred_pack_idx = -1;
 	ALLOC_ARRAY(ctx.info, ctx.alloc);

 	if (ctx.m) {
@@ -1307,10 +1308,10 @@ static int write_midx_internal(const char *object_dir,
 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
 	stop_progress(&ctx.progress);

-	if (!ctx.nr) {
-		error(_("no pack files to index."));
-		result = 1;
-		goto cleanup;
+	if (!ctx.entries_nr) {
+		if (flags & MIDX_WRITE_BITMAP)
+			warning(_("refusing to write multi-pack .bitmap without any objects"));
+		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
 	}

 	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
@@ -1488,12 +1489,6 @@ static int write_midx_internal(const char *object_dir,
 		goto cleanup;
 	}

-	if (!ctx.entries_nr) {
-		if (flags & MIDX_WRITE_BITMAP)
-			warning(_("refusing to write multi-pack .bitmap without any objects"));
-		flags &= ~(MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP);
-	}
-
 	cf = init_chunkfile(f);

 	add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index be7f3c1e1f..8a17361272 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -188,10 +188,9 @@ test_expect_success 'write with no objects and preferred pack' '
 	git init empty &&
 	test_must_fail git -C empty multi-pack-index write \
 		--stdin-packs --preferred-pack=does-not-exist </dev/null 2>err &&
-	cat >expect <<-EOF &&
-	error: no pack files to index.
-	EOF
-	test_cmp expect err
+		cat err &&
+	grep "warning: unknown preferred pack: .does-not-exist." err &&
+	grep "error: no pack files to index." err
 '

 test_expect_success 'write progress off for redirected stderr' '
--- >8 ---

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates
  2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
@ 2023-04-12 18:37     ` Taylor Blau
  2023-04-13  9:31       ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Taylor Blau @ 2023-04-12 18:37 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 12, 2023 at 12:22:35PM +0200, Patrick Steinhardt wrote:
> When doing a geometric repack with multi-pack-indices, then we ask
> git-multi-pack-index(1) to use the largest packfile as the preferred
> pack. It can happen though that the largest packfile is not part of the
> main object database, but instead part of an alternate object database.
> The result is that git-multi-pack-index(1) will not be able to find the
> preferred pack and print a warning. It then falls back to use the first
> packfile that the multi-pack-index shall reference.
>
> Fix this bug by only considering packfiles as preferred pack that are
> local. This is the right thing to do given that a multi-pack-index
> should never reference packfiles borrowed from an alternate.
>
> While at it, rename the function `get_largest_active_packfile()` to
> `get_preferred_pack()` to better document its intent.
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>

> @@ -464,7 +466,16 @@ static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry
>  	}
>  	if (geometry->split == geometry->pack_nr)
>  		return NULL;
> -	return geometry->pack[geometry->pack_nr - 1];
> +
> +	for (i = geometry->pack_nr; i > 0; i--)
> +		/*
> +		 * A pack that is not local would never be included in a
> +		 * multi-pack index. We thus skip over any non-local packs.
> +		 */
> +		if (geometry->pack[i - 1]->pack_local)
> +			return geometry->pack[i - 1];
> +
> +	return NULL;
>  }

Looking good, we want to go through this list in reverse order, since
the packs are ordered largest to smallest. I think that you could
slightly rewrite the loop condition to avoid having to always access
`geometry->pack` at `i-1` instead of `i`, like so:

--- 8< ---
diff --git a/builtin/repack.c b/builtin/repack.c
index 9d36dc8b84..ba468ac44e 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -467,13 +467,15 @@ static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
 	if (geometry->split == geometry->pack_nr)
 		return NULL;

-	for (i = geometry->pack_nr; i > 0; i--)
+	for (i = geometry->pack_nr - 1; i >= 0; i--) {
 		/*
 		 * A pack that is not local would never be included in a
 		 * multi-pack index. We thus skip over any non-local packs.
 		 */
-		if (geometry->pack[i - 1]->pack_local)
-			return geometry->pack[i - 1];
+		struct packed_git *p = geometry->pack[i];
+		if (p->pack_local)
+			return p;
+	}

 	return NULL;
 }
--- >8 ---

but I'm not sure that the loop condition is quite right to begin with.
We don't want to iterate all the way down to the beginning of the pack
list, since some of those packs may be below the "split" line, IOW their
contents are going to be rolled up and the packs destroyed.

So I think the right loop condition would be:

    for (i = geometry->pack_nr - 1; i >= geometry->split; i--)

which I think makes the "if (geometry->split == geometry->pack_nr)"
condition above redundant with this loop, so you could probably drop
that.

I would definitely appreciate a second set of eye here. The pack *at*
the split line is considered frozen (IOW, we create a new pack
consisting of the packs in range [0, geometry->split), and leave the
packs in range [geometry->split, geometry->nr) alone).

So it should be fine to enter that loop when "i == geometry->split",
because it's OK to return that as a potential preferred pack.

> +test_expect_success '--geometric --write-midx with packfiles in main and alternate ODB' '
> +	test_when_finished "rm -fr shared member" &&

Looks familiar ;-).

> +	# Create a shared repository that will serve as the alternate object
> +	# database for the member linked to it. It has got some objects on its
> +	# own that are packed into a single packfile.
> +	git init shared &&
> +	test_commit -C shared common-object &&
> +	git -C shared repack -ad &&
> +
> +	# We create member so that its alternates file points to the shared
> +	# repository. We then create a commit in it so that git-repack(1) has
> +	# something to repack.
> +	# of the shared object database.
> +	git clone --shared shared member &&
> +	test_commit -C member unique-object &&
> +	git -C member repack --geometric=2 --write-midx 2>err &&
> +	test_must_be_empty err &&
> +
> +	# We should see that a new packfile was generated.
> +	find shared/.git/objects/pack -type f -name "*.pack" | sort >packs &&

Do we need a sort here? I don't think we rely on the order anywhere, and
it looks like we only care that there is a single entry.

If that's the case, I think we could replace the wc invocation here
with:

    test_line_count = 1 packs

> +	test $(wc -l <packs) = 1 &&
> +
> +	# We should also see a multi-pack-index. This multi-pack-index should
> +	# never refer to any packfiles in the alternate object database.
> +	# Consequentially, it should be valid even with the alternate ODB
> +	# deleted.
> +	rm -r shared &&
> +	git -C member multi-pack-index verify

To test this even more directly, I think that you could ensure that the
set of packs (which should just be the single entry found in "packs"
above) matches what we expect. That should look something like:

    test-tool read-midx member/.git/objects >packs.midx &&
    grep "^pack-.*\.idx$" packs.midx | sort >actual &&
    xargs -n 1 basename <packs | sort >expect
    test_cmp expect actual

Note that the read-midx test helper outputs packs with the "*.idx"
suffix, so you'd probably want to alter your find invocation a few lines
above accordingly, or rewrite it before writing into "actual".

Alternatively, since we expect there to be just a single pack here, I
think we could just as easily write something like:

    test-tool read-midx member/.git/objects >packs.midx &&
    grep "^pack-.*\.idx$" packs.midx >actual &&
    test_line_count = 1 actual &&
    grep $(cat actual) packs

though I slightly prefer the former since it is less brittle to changing
this test. Either way, though.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs
  2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
@ 2023-04-12 20:39     ` Taylor Blau
  0 siblings, 0 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-12 20:39 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 12, 2023 at 12:22:39PM +0200, Patrick Steinhardt wrote:
> When writing the multi-pack-index with geometric repacking we will add
> all packfiles to the index that are part of the geometric sequence. This
> can potentially also include packfiles borrowed from an alternate object
> directory. But given that a multi-pack-index can only ever include packs
> that are part of the main object database this does not make much sense
> whatsoever.
>
> In the edge case where all packfiles are contained in the alternate
> object database and the local repository has none itself this bug can
> cause us to invoke git-multi-pack-index(1) with only non-local packfiles
> that it ultimately cannot find. This causes it to return an error and
> thus causes the geometric repack to fail.
>
> Fix the code to skip non-local packfiles.
>
> Co-authored-by: Taylor Blau <me@ttaylorr.com>

Thanks for listing me as a co-author. Doing so probably requires my
S-o-b, which you are more than free to forge here.

> +	# Assert that we wrote neither a new packfile nor a multi-pack-index.
> +	# We should not have a packfile because the single packfile in the
> +	# alternate object database does not invalidate the geometric sequence.
> +	# And we should not have a multi-pack-index because these only index
> +	# local packfiles, and there are none.
> +	find member/.git/objects/pack -type f >actual &&
> +	test_must_be_empty actual

This test looks good, though one small opportunity for cleanup might be
to replace this with a single:

    test_dir_is_empty member/$packdir

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 4/8] pack-objects: fix error when packing same pack twice
  2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
@ 2023-04-12 21:33     ` Taylor Blau
  0 siblings, 0 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-12 21:33 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 12, 2023 at 12:22:44PM +0200, Patrick Steinhardt wrote:
> When passed the same packfile twice via `--stdin-packs` we return an
> error that the packfile supposedly was not found. This is because when
> reading packs into the list of included or excluded packfiles, we will
> happily re-add packfiles even if they are part of the lists already. And
> while the list can now contain duplicates, we will only set the `util`
> pointer of the first list entry to the `packed_git` structure. We notice
> that at a later point when checking that all list entries have their
> `util` pointer set and die with an error.

Well explained.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/pack-objects.c        |  2 ++
>  t/t5331-pack-objects-stdin.sh | 29 +++++++++++++++++++++++++++++
>  t/t7703-repack-geometric.sh   | 25 +++++++++++++++++++++++++
>  3 files changed, 56 insertions(+)
>  create mode 100755 t/t5331-pack-objects-stdin.sh
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 77d88f85b0..fdf3f440be 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3359,7 +3359,9 @@ static void read_packs_list_from_stdin(void)
>  	}
>
>  	string_list_sort(&include_packs);
> +	string_list_remove_duplicates(&include_packs, 0);
>  	string_list_sort(&exclude_packs);
> +	string_list_remove_duplicates(&exclude_packs, 0);

I for some reason thought that string_list_remove_duplicates() sorted
its first argument as a side-effect, but double checking on it, that
isn't the case. So I agree that calling string_list_remove_duplicates()
is the right thing to do here.

> diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
> new file mode 100755
> index 0000000000..ab34cfc729
> --- /dev/null
> +++ b/t/t5331-pack-objects-stdin.sh

Cool, I am glad that we are moving some of these tests out of t5300. We
could probably go a little bit further, since there are a handful of
purely `--stdin-packs`-related tests towards the bottom of that script
that could or should also be moved out.

We could also always do it later, but I wouldn't be sad to see an extra
commit before this one that introduces t5331 and moves the last six or
so `--stdin-packs` tests out of t5300. If nothing else, it would be
nice to be able to run the tests for just that feature without having to
skip a bunch of other unrelated tests.

Luckily, those tests don't depend on any of the earlier ones or setup
tests, so they should be able to move with a pure cut and paste.

> @@ -0,0 +1,29 @@
> +#!/bin/sh
> +
> +test_description='pack-objects --stdin'
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +
> +TEST_PASSES_SANITIZE_LEAK=true
> +. ./test-lib.sh
> +
> +test_expect_success 'pack-objects --stdin with duplicate packfile' '
> +	test_when_finished "rm -fr repo" &&
> +
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit "commit" &&
> +		git repack -ad &&
> +
> +		(
> +			basename .git/objects/pack/pack-*.pack &&
> +			basename .git/objects/pack/pack-*.pack
> +		) >packfiles &&
> +
> +		git pack-objects --stdin-packs generated-pack <packfiles &&
> +		test_cmp generated-pack-*.pack .git/objects/pack/pack-*.pack

I think that in practice that test_cmp'ing these two together will be
just fine, but if you wanted to be extra careful, you could grab their
object lists and compare those two together. That would allow you to
continue passing this test without needing to worry about if you
generated the same logical content with slightly different on-disk
content.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded
  2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
@ 2023-04-12 21:52     ` Taylor Blau
  0 siblings, 0 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-12 21:52 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 12, 2023 at 12:22:48PM +0200, Patrick Steinhardt wrote:
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index fdf3f440be..522eb4dd31 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -3366,11 +3366,9 @@ static void read_packs_list_from_stdin(void)
>  	for (p = get_all_packs(the_repository); p; p = p->next) {
>  		const char *pack_name = pack_basename(p);
>
> -		item = string_list_lookup(&include_packs, pack_name);
> -		if (!item)
> -			item = string_list_lookup(&exclude_packs, pack_name);
> -
> -		if (item)
> +		if ((item = string_list_lookup(&include_packs, pack_name)))
> +			item->util = p;
> +		if ((item = string_list_lookup(&exclude_packs, pack_name)))
>  			item->util = p;

Oof. I was hoping that we could avoid having to look through both
lists. But that relies on us having disjoint sets of packs in the
include and exclude lists.

We probably *could* just ban this combination outright, but that would
also involve some work to try and detect that case. So I think that
doing this (and ensuring that the resulting pack is empty, that is that
the exclude set takes precedence here) is the right thing to do.

This (and the elided test below) look great to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack
  2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
@ 2023-04-12 22:01     ` Taylor Blau
  2023-04-13  9:54       ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Taylor Blau @ 2023-04-12 22:01 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 12, 2023 at 12:23:01PM +0200, Patrick Steinhardt wrote:
> Now there are two different ways to fix this. The first one would be to
> amend git-multi-pack-index(1) to disable writing bitmaps when we notice
> that we don't have full object coverage. But we ain't really got enough
> information there, and seeing that it is a low-level plumbing command it
> does not feel like the right place to fix this.

I might actually advocate that we either fix this in both places, or fix
it at the lower level only. I think that we would still be able to
trigger this problem by invoking `git multi-pack-index write
--bitmap --stdin-packs` directly.

> ---
>  builtin/repack.c            | 20 ++++++++++++++++++++
>  t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
>  2 files changed, 47 insertions(+)
>
> diff --git a/builtin/repack.c b/builtin/repack.c
> index f57869f14a..07d92fdf87 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -881,6 +881,26 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  	if (pack_kept_objects < 0)
>  		pack_kept_objects = write_bitmaps > 0 && !write_midx;
>
> +	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
> +		struct packed_git *p;
> +
> +		for (p = get_all_packs(the_repository); p; p = p->next) {
> +			if (p->pack_local)
> +				continue;
> +
> +			/*
> +			 * When asked to do a local repack, but we have
> +			 * packfiles that are inherited from an alternate, then
> +			 * we cannot guarantee that the multi-pack-index would
> +			 * have full coverage of all objects. We thus disable
> +			 * writing bitmaps in that case.
> +			 */
> +			warning(_("disabling bitmap writing, as some objects are not being packed"));
> +			write_bitmaps = 0;
> +			break;
> +		}
> +	}
> +

In terms of the higher-level fix here, though, I think that you could
reasonably assume that the alternate repository has at least one pack,
and that the combination of "write_midx && write_bitmaps &&
po.args_local" and "has any alternate(s)" is banned (or, at least, emits
the above warning and disables writing bitmaps).

But certainly ensuring that there are indeed packs in at least one of
the alternate(s) doesn't hurt either, so I don't mind this approach at
all.

One thing that I don't quite follow with this logic is why we need to
have geometric_factor set. You could (somewhat unreasonably) write a
MIDX containing a single pack (git repack -[A|a] --write-midx
--write-bitmap-index), or a MIDX containing just the new pack along with
all of the existing (local) packs, (git repack --write-midx
--write-bitmap-index).

So I think we'd want to drop the geometric_factor from the above
conditional. (And in the future, I think we typically refer to whether
or not the "geometry" pointer is NULL or not to indicate whether or not
we are doing a geometric repack, though the diff context doesn't give me
enough to know whether we have even attempted to set up that instance
yet, so this is fine, too).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 0/8] repack: fix geometric repacking with gitalternates
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
@ 2023-04-12 22:02   ` Taylor Blau
  8 siblings, 0 replies; 67+ messages in thread
From: Taylor Blau @ 2023-04-12 22:02 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, peff, dstolee

On Wed, Apr 12, 2023 at 12:22:27PM +0200, Patrick Steinhardt wrote:
> Patrick Steinhardt (8):
>   midx: fix segfault with no packs and invalid preferred pack
>   repack: fix trying to use preferred pack in alternates
>   repack: fix generating multi-pack-index with only non-local packs
>   pack-objects: fix error when packing same pack twice
>   pack-objects: fix error when same packfile is included and excluded
>   pack-objects: extend test coverage of `--stdin-packs` with alternates
>   repack: honor `-l` when calculating pack geometry
>   repack: disable writing bitmaps when doing a local geometric repack

This is looking like it's on a good track. I gave this a fairly thorough
review, and left a handful of comments ranging from "you could do it
this way, or not", to "we should probably adjust this before merging".

Let me know if you have any questions, and I'll be more than happy to
review a v2 ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry
  2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
@ 2023-04-12 23:56     ` Junio C Hamano
  2023-04-13  5:11       ` Junio C Hamano
  0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2023-04-12 23:56 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, peff, dstolee

Patrick Steinhardt <ps@pks.im> writes:

> +	# Note that we are using stat(1) to verify idempotence to also verify
> +	# that the mtime did not change. This is done in order to detect the
> +	# case where we do repack objects, but the resulting packfile is the
> +	# same.
> +	stat member/.git/objects/pack/* >expected-member-packs &&
> +	git -C member repack --geometric=2 -l -d &&
> +	stat member/.git/objects/pack/* >actual-member-packs &&

These tests that use stat(1) are very likely to be non-portable.

Worse yet, even when stat(1) is available, when running

    $ cd t && sh t7703-repack-geometric.sh --root=/dev/shm/testpen -i -v

in my environment (based on Debian), the lists fail to match at
subseconds.

--- expected-member-packs       2023-04-12 23:48:52.817035290 +0000
+++ actual-member-packs 2023-04-12 23:48:52.833036370 +0000
@@ -2,7 +2,7 @@
   Size: 1156           Blocks: 8          IO Block: 4096   regular file
 Device: 0,25   Inode: 2330515     Links: 1
 Access: (0400/-r--------)  Uid: (1001/   gitster)   Gid: (1001/gitster)
-Access: 2023-04-12 23:48:52.801034210 +0000
+Access: 2023-04-12 23:48:52.829036100 +0000
 Modify: 2023-04-12 23:48:52.773032320 +0000
 Change: 2023-04-12 23:48:52.773032320 +0000
  Birth: 2023-04-12 23:48:52.773032320 +0000
@@ -10,7 +10,7 @@
   Size: 276            Blocks: 8          IO Block: 4096   regular file
 Device: 0,25   Inode: 2330514     Links: 1
 Access: (0400/-r--------)  Uid: (1001/   gitster)   Gid: (1001/gitster)
-Access: 2023-04-12 23:48:52.781032860 +0000
+Access: 2023-04-12 23:48:52.829036100 +0000
 Modify: 2023-04-12 23:48:52.773032320 +0000
 Change: 2023-04-12 23:48:52.773032320 +0000
  Birth: 2023-04-12 23:48:52.769032050 +0000


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry
  2023-04-12 23:56     ` Junio C Hamano
@ 2023-04-13  5:11       ` Junio C Hamano
  2023-04-13  6:41         ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2023-04-13  5:11 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, peff, dstolee

Junio C Hamano <gitster@pobox.com> writes:

> Worse yet, even when stat(1) is available, when running
>
>     $ cd t && sh t7703-repack-geometric.sh --root=/dev/shm/testpen -i -v
>
> in my environment (based on Debian), the lists fail to match at
> subseconds.
> ...

A CI run of 'seen' with this topic fails on macOS:

  https://github.com/git/git/actions/runs/4683726291/jobs/8299125722#step:4:2090

Almost the same 'seen' without this topic is CI clean:

  https://github.com/git/git/actions/runs/4683470969


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry
  2023-04-13  5:11       ` Junio C Hamano
@ 2023-04-13  6:41         ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13  6:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 714 bytes --]

On Wed, Apr 12, 2023 at 10:11:06PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Worse yet, even when stat(1) is available, when running
> >
> >     $ cd t && sh t7703-repack-geometric.sh --root=/dev/shm/testpen -i -v
> >
> > in my environment (based on Debian), the lists fail to match at
> > subseconds.
> > ...
> 
> A CI run of 'seen' with this topic fails on macOS:
> 
>   https://github.com/git/git/actions/runs/4683726291/jobs/8299125722#step:4:2090
> 
> Almost the same 'seen' without this topic is CI clean:
> 
>   https://github.com/git/git/actions/runs/4683470969
> 

Thanks, I'll come up with an alternative in v3 of this patch series.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack
  2023-04-12 17:56     ` Taylor Blau
@ 2023-04-13  9:28       ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13  9:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2525 bytes --]

On Wed, Apr 12, 2023 at 01:56:58PM -0400, Taylor Blau wrote:
> On Wed, Apr 12, 2023 at 12:22:31PM +0200, Patrick Steinhardt wrote:
> > Fix this bug by exiting early in case we have determined that the MIDX
> > wouldn't have any packfiles to index. While the request itself does not
> > make much sense anyway, it is still preferable to exit gracefully than
> > to abort.
> 
> Interesting. This reminded me quite a bit of eb57277ba3 (midx: prevent
> writing a .bitmap without any objects, 2022-02-09) which tackled a
> similar problem of trying to write a MIDX bitmap without any objects.
> 
> We may want to consider moving that conditional further up, since this
> makes the conditional added in eb57277ba3 dead code AFAICT. Here's a
> patch on top of this one that I think would do the trick.
> 
> It has the added benefit of sticking a:
> 
>     warning: unknown preferred pack: 'does-not-exist'
> 
> in the output before dying, which might be nice (though I doubt anybody
> will ever see it ;-)). The main difference is that we unset the bitmap
> related bits from `flags`, which avoids us trying to compute a preferred
> pack in the first place.
> 
> For it to work, though, we need to make sure that ctx.preferred_pack_idx
> is set to -1, and not zero-initialized, since we'll segfault otherwise
> when trying to read into an empty array.

Indeed, that is a good point. I think we can simplify your patch even
further in that case:

diff --git a/midx.c b/midx.c
index 47989f7ea7..67eb617591 100644
--- a/midx.c
+++ b/midx.c
@@ -1328,17 +1328,17 @@ static int write_midx_internal(const char *object_dir,
        }

        if (preferred_pack_name) {
-               int found = 0;
+               ctx.preferred_pack_idx = -1;
+
                for (i = 0; i < ctx.nr; i++) {
                        if (!cmp_idx_or_pack_name(preferred_pack_name,
                                                  ctx.info[i].pack_name)) {
                                ctx.preferred_pack_idx = i;
-                               found = 1;
                                break;
                        }
                }

-               if (!found)
+               if (ctx.preferred_pack_idx == -1)
                        warning(_("unknown preferred pack: '%s'"),
                                preferred_pack_name);
        } else if (ctx.nr &&

The other cases already set `preferred_pack_idx = -1`, so this is really
all we need to do to fix the segfault.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates
  2023-04-12 18:37     ` Taylor Blau
@ 2023-04-13  9:31       ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13  9:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 5395 bytes --]

On Wed, Apr 12, 2023 at 02:37:53PM -0400, Taylor Blau wrote:
> On Wed, Apr 12, 2023 at 12:22:35PM +0200, Patrick Steinhardt wrote:
> > When doing a geometric repack with multi-pack-indices, then we ask
> > git-multi-pack-index(1) to use the largest packfile as the preferred
> > pack. It can happen though that the largest packfile is not part of the
> > main object database, but instead part of an alternate object database.
> > The result is that git-multi-pack-index(1) will not be able to find the
> > preferred pack and print a warning. It then falls back to use the first
> > packfile that the multi-pack-index shall reference.
> >
> > Fix this bug by only considering packfiles as preferred pack that are
> > local. This is the right thing to do given that a multi-pack-index
> > should never reference packfiles borrowed from an alternate.
> >
> > While at it, rename the function `get_largest_active_packfile()` to
> > `get_preferred_pack()` to better document its intent.
> >
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> 
> > @@ -464,7 +466,16 @@ static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry
> >  	}
> >  	if (geometry->split == geometry->pack_nr)
> >  		return NULL;
> > -	return geometry->pack[geometry->pack_nr - 1];
> > +
> > +	for (i = geometry->pack_nr; i > 0; i--)
> > +		/*
> > +		 * A pack that is not local would never be included in a
> > +		 * multi-pack index. We thus skip over any non-local packs.
> > +		 */
> > +		if (geometry->pack[i - 1]->pack_local)
> > +			return geometry->pack[i - 1];
> > +
> > +	return NULL;
> >  }
> 
> Looking good, we want to go through this list in reverse order, since
> the packs are ordered largest to smallest. I think that you could
> slightly rewrite the loop condition to avoid having to always access
> `geometry->pack` at `i-1` instead of `i`, like so:
> 
> --- 8< ---
> diff --git a/builtin/repack.c b/builtin/repack.c
> index 9d36dc8b84..ba468ac44e 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -467,13 +467,15 @@ static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
>  	if (geometry->split == geometry->pack_nr)
>  		return NULL;
> 
> -	for (i = geometry->pack_nr; i > 0; i--)
> +	for (i = geometry->pack_nr - 1; i >= 0; i--) {
>  		/*
>  		 * A pack that is not local would never be included in a
>  		 * multi-pack index. We thus skip over any non-local packs.
>  		 */
> -		if (geometry->pack[i - 1]->pack_local)
> -			return geometry->pack[i - 1];
> +		struct packed_git *p = geometry->pack[i];
> +		if (p->pack_local)
> +			return p;
> +	}
> 
>  	return NULL;
>  }
> --- >8 ---

There's a gotcha: `i` is an `unit32_t`, so `i >= 0` would always be true
and thus we wrap around and would try to access the pack array at an
out-of-bound index.

> but I'm not sure that the loop condition is quite right to begin with.
> We don't want to iterate all the way down to the beginning of the pack
> list, since some of those packs may be below the "split" line, IOW their
> contents are going to be rolled up and the packs destroyed.
> 
> So I think the right loop condition would be:
> 
>     for (i = geometry->pack_nr - 1; i >= geometry->split; i--)
> 
> which I think makes the "if (geometry->split == geometry->pack_nr)"
> condition above redundant with this loop, so you could probably drop
> that.
> 
> I would definitely appreciate a second set of eye here. The pack *at*
> the split line is considered frozen (IOW, we create a new pack
> consisting of the packs in range [0, geometry->split), and leave the
> packs in range [geometry->split, geometry->nr) alone).
> 
> So it should be fine to enter that loop when "i == geometry->split",
> because it's OK to return that as a potential preferred pack.

That makes sense indeed. Will amend.

[snip]
> > +	test $(wc -l <packs) = 1 &&
> > +
> > +	# We should also see a multi-pack-index. This multi-pack-index should
> > +	# never refer to any packfiles in the alternate object database.
> > +	# Consequentially, it should be valid even with the alternate ODB
> > +	# deleted.
> > +	rm -r shared &&
> > +	git -C member multi-pack-index verify
> 
> To test this even more directly, I think that you could ensure that the
> set of packs (which should just be the single entry found in "packs"
> above) matches what we expect. That should look something like:
> 
>     test-tool read-midx member/.git/objects >packs.midx &&
>     grep "^pack-.*\.idx$" packs.midx | sort >actual &&
>     xargs -n 1 basename <packs | sort >expect
>     test_cmp expect actual
> 
> Note that the read-midx test helper outputs packs with the "*.idx"
> suffix, so you'd probably want to alter your find invocation a few lines
> above accordingly, or rewrite it before writing into "actual".
> 
> Alternatively, since we expect there to be just a single pack here, I
> think we could just as easily write something like:
> 
>     test-tool read-midx member/.git/objects >packs.midx &&
>     grep "^pack-.*\.idx$" packs.midx >actual &&
>     test_line_count = 1 actual &&
>     grep $(cat actual) packs
> 
> though I slightly prefer the former since it is less brittle to changing
> this test. Either way, though.

Thanks, I've adopted that approach with a slight modification.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack
  2023-04-12 22:01     ` Taylor Blau
@ 2023-04-13  9:54       ` Patrick Steinhardt
  2023-04-13 10:14         ` Patrick Steinhardt
  0 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13  9:54 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 4306 bytes --]

On Wed, Apr 12, 2023 at 06:01:06PM -0400, Taylor Blau wrote:
> On Wed, Apr 12, 2023 at 12:23:01PM +0200, Patrick Steinhardt wrote:
> > Now there are two different ways to fix this. The first one would be to
> > amend git-multi-pack-index(1) to disable writing bitmaps when we notice
> > that we don't have full object coverage. But we ain't really got enough
> > information there, and seeing that it is a low-level plumbing command it
> > does not feel like the right place to fix this.
> 
> I might actually advocate that we either fix this in both places, or fix
> it at the lower level only. I think that we would still be able to
> trigger this problem by invoking `git multi-pack-index write
> --bitmap --stdin-packs` directly.

The problem I see with implementing the fix is that we're just not in a
good position to judge whether we have full coverage of objects or not.
All we see is a set of packfiles, and those packfiles _could_ have full
coverage, but they may just as well not have full coverage. And whether
they do is not easy to figure out in git-multi-pack-index(1).

So in order to fix this we'd likely have to use heuristics, like whether
or not there are alternates or alternate packfiles. But unconditionally
disabling bitmaps when there are feels overly restrictive to me as it
would break perfectly-valid usecases.

I'm thus still not convinced we should implement it at the lowest level
possible. While it would be nice to deduplicate the logic around this,
I wouldn't want to close doors we don't necessarily have to.

> > ---
> >  builtin/repack.c            | 20 ++++++++++++++++++++
> >  t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
> >  2 files changed, 47 insertions(+)
> >
> > diff --git a/builtin/repack.c b/builtin/repack.c
> > index f57869f14a..07d92fdf87 100644
> > --- a/builtin/repack.c
> > +++ b/builtin/repack.c
> > @@ -881,6 +881,26 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  	if (pack_kept_objects < 0)
> >  		pack_kept_objects = write_bitmaps > 0 && !write_midx;
> >
> > +	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
> > +		struct packed_git *p;
> > +
> > +		for (p = get_all_packs(the_repository); p; p = p->next) {
> > +			if (p->pack_local)
> > +				continue;
> > +
> > +			/*
> > +			 * When asked to do a local repack, but we have
> > +			 * packfiles that are inherited from an alternate, then
> > +			 * we cannot guarantee that the multi-pack-index would
> > +			 * have full coverage of all objects. We thus disable
> > +			 * writing bitmaps in that case.
> > +			 */
> > +			warning(_("disabling bitmap writing, as some objects are not being packed"));
> > +			write_bitmaps = 0;
> > +			break;
> > +		}
> > +	}
> > +
> 
> In terms of the higher-level fix here, though, I think that you could
> reasonably assume that the alternate repository has at least one pack,
> and that the combination of "write_midx && write_bitmaps &&
> po.args_local" and "has any alternate(s)" is banned (or, at least, emits
> the above warning and disables writing bitmaps).
> 
> But certainly ensuring that there are indeed packs in at least one of
> the alternate(s) doesn't hurt either, so I don't mind this approach at
> all.

It's an edge case for sure. I don't quite mind which way we go either.
For now I'll just keep the current way of doing things, but am happy to
change it.

> One thing that I don't quite follow with this logic is why we need to
> have geometric_factor set. You could (somewhat unreasonably) write a
> MIDX containing a single pack (git repack -[A|a] --write-midx
> --write-bitmap-index), or a MIDX containing just the new pack along with
> all of the existing (local) packs, (git repack --write-midx
> --write-bitmap-index).
> 
> So I think we'd want to drop the geometric_factor from the above
> conditional. (And in the future, I think we typically refer to whether
> or not the "geometry" pointer is NULL or not to indicate whether or not
> we are doing a geometric repack, though the diff context doesn't give me
> enough to know whether we have even attempted to set up that instance
> yet, so this is fine, too).

Mh. Yeah, I think you're right. I'll change it.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack
  2023-04-13  9:54       ` Patrick Steinhardt
@ 2023-04-13 10:14         ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 10:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2540 bytes --]

On Thu, Apr 13, 2023 at 11:54:55AM +0200, Patrick Steinhardt wrote:
> On Wed, Apr 12, 2023 at 06:01:06PM -0400, Taylor Blau wrote:
> > On Wed, Apr 12, 2023 at 12:23:01PM +0200, Patrick Steinhardt wrote:
[snip]
> > One thing that I don't quite follow with this logic is why we need to
> > have geometric_factor set. You could (somewhat unreasonably) write a
> > MIDX containing a single pack (git repack -[A|a] --write-midx
> > --write-bitmap-index), or a MIDX containing just the new pack along with
> > all of the existing (local) packs, (git repack --write-midx
> > --write-bitmap-index).
> > 
> > So I think we'd want to drop the geometric_factor from the above
> > conditional. (And in the future, I think we typically refer to whether
> > or not the "geometry" pointer is NULL or not to indicate whether or not
> > we are doing a geometric repack, though the diff context doesn't give me
> > enough to know whether we have even attempted to set up that instance
> > yet, so this is fine, too).
> 
> Mh. Yeah, I think you're right. I'll change it.

Actually, do we even have to care about the `write_midx` case? Right now
we have:

```
if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
    die(_(incremental_bitmap_conflict_error));
```

The intent is to die when not repacking all objects into a single pack
(potentially with a cruft pack), but to allow this when writing a
multi-pack-index because they can have a bitmap that spans across
multiple packs.

Now, whether or not we write a multi-pack-index, as soon as the user
passes `-l` we cannot guarantee that we have all objects available
locally either in a single packfile nor in multiple packfiles when the
repository is connected to an alternate object directory. So in the
spirit of the preexisting check, couldn't we do the following:

```
if (write_bitmaps && po_args.local && has_alternates(repo))
    die(_("Repacking local objects is incompatible with bitmap indexes.");
```

So in words, we die when the user asks us to write a bitmap index
for a repack that is supposed to only include local objects when there
are objects that could have been inherited from an alternate object
directory.

I'm not sure whether we are okay with retroactively tightening checks
though. I'd argue it's likely fine, because it wouldn't have worked
before this check either. And I'd rather fail with explicit reasons that
are user-actionable rather than implicitly somewhere deep down in the
callstack.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 00/10] repack: fix geometric repacking with gitalternates
  2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
  2023-04-04 18:55 ` Taylor Blau
  2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
@ 2023-04-13 11:16 ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
                     ` (11 more replies)
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
  3 siblings, 12 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 22925 bytes --]

Hi,

this is the third version of my patch series to fix geometric repacking
with repositories that are connected to an alternate object database.

Changes compared to v2:

    - I've simplified patch 1 to reset the preferred pack index instead
      of moving around some of the checks. This also causes us to print
      the warning for missing preferred packfiles again.

    - I've fixed the logic in patch 2 to find the preferred packfile to
      not return a packfile that would be rolled up in a geometric
      repack.

    - I've added an additional patch to split out preexisting tests for
      `--stdin-packs` into their own test file.

    - I've changed the unportable test added for geometric repacking
      with `-l` that used stat(1) to instead use our `test-tool chmtime`
      helper.

    - I've changed the logic that disables writing bitmaps in git-repack
      to cover more cases. It now always kicks in when doing a repack
      with `-l` that asks for bitmaps when connected to an alternate
      object directory.

    - In general, there's a bunch of small improvements left and right
      for the tests I'm adding.

Thanks for all the feedback so far.

Patrick

Patrick Steinhardt (10):
  midx: fix segfault with no packs and invalid preferred pack
  repack: fix trying to use preferred pack in alternates
  repack: fix generating multi-pack-index with only non-local packs
  pack-objects: split out `--stdin-packs` tests into separate file
  pack-objects: fix error when packing same pack twice
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  t/helper: allow chmtime to print verbosely without modifying mtime
  repack: honor `-l` when calculating pack geometry
  repack: disable writing bitmaps when doing a local repack

 builtin/pack-objects.c        |  10 +-
 builtin/repack.c              |  62 ++++++++-
 midx.c                        |   6 +-
 object-file.c                 |   6 +
 object-store.h                |   1 +
 t/helper/test-chmtime.c       |   2 +-
 t/t5300-pack-object.sh        | 135 -------------------
 t/t5319-multi-pack-index.sh   |  12 ++
 t/t5331-pack-objects-stdin.sh | 240 ++++++++++++++++++++++++++++++++++
 t/t7700-repack.sh             |  17 +++
 t/t7703-repack-geometric.sh   | 165 +++++++++++++++++++++++
 11 files changed, 505 insertions(+), 151 deletions(-)
 create mode 100755 t/t5331-pack-objects-stdin.sh

Range-diff against v2:
 1:  8c384aabde !  1:  dd8145bead midx: fix segfault with no packs and invalid preferred pack
    @@ Commit message
         will cause a segfault as we blindly index into the array of packfiles,
         which would be empty.
     
    -    Fix this bug by exiting early in case we have determined that the MIDX
    -    wouldn't have any packfiles to index. While the request itself does not
    -    make much sense anyway, it is still preferable to exit gracefully than
    -    to abort.
    +    Fix this bug by resetting the preferred packfile index to `-1` before
    +    searching for the preferred pack. This fixes the segfault as we already
    +    check for whether the index is `> - 1`. If it is not, we simply don't
    +    pick a preferred packfile at all.
     
    +    Helped-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## midx.c ##
     @@ midx.c: static int write_midx_internal(const char *object_dir,
    - 	for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
    - 	stop_progress(&ctx.progress);
    + 	}
      
    -+	if (!ctx.nr) {
    -+		error(_("no pack files to index."));
    -+		result = 1;
    -+		goto cleanup;
    -+	}
    + 	if (preferred_pack_name) {
    +-		int found = 0;
    ++		ctx.preferred_pack_idx = -1;
     +
    - 	if ((ctx.m && ctx.nr == ctx.m->num_packs) &&
    - 	    !(packs_to_include || packs_to_drop)) {
    - 		struct bitmap_index *bitmap_git;
    + 		for (i = 0; i < ctx.nr; i++) {
    + 			if (!cmp_idx_or_pack_name(preferred_pack_name,
    + 						  ctx.info[i].pack_name)) {
    + 				ctx.preferred_pack_idx = i;
    +-				found = 1;
    + 				break;
    + 			}
    + 		}
    + 
    +-		if (!found)
    ++		if (ctx.preferred_pack_idx == -1)
    + 			warning(_("unknown preferred pack: '%s'"),
    + 				preferred_pack_name);
    + 	} else if (ctx.nr &&
     
      ## t/t5319-multi-pack-index.sh ##
     @@ t/t5319-multi-pack-index.sh: test_expect_success 'write midx with --stdin-packs' '
    @@ t/t5319-multi-pack-index.sh: test_expect_success 'write midx with --stdin-packs'
     +	test_must_fail git -C empty multi-pack-index write \
     +		--stdin-packs --preferred-pack=does-not-exist </dev/null 2>err &&
     +	cat >expect <<-EOF &&
    ++	warning: unknown preferred pack: ${SQ}does-not-exist${SQ}
     +	error: no pack files to index.
     +	EOF
     +	test_cmp expect err
 2:  5a6c2ead87 !  2:  c5db1bc587 repack: fix trying to use preferred pack in alternates
    @@ Commit message
         While at it, rename the function `get_largest_active_packfile()` to
         `get_preferred_pack()` to better document its intent.
     
    +    Helped-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## builtin/repack.c ##
    @@ builtin/repack.c: static struct packed_git *get_largest_active_pack(struct pack_
      		return NULL;
     -	return geometry->pack[geometry->pack_nr - 1];
     +
    -+	for (i = geometry->pack_nr; i > 0; i--)
    ++	/*
    ++	 * The preferred pack is the largest pack above the split line. In
    ++	 * other words, it is the largest pack that does not get rolled up in
    ++	 * the geometric repack.
    ++	 */
    ++	for (i = geometry->pack_nr; i > geometry->split; i--)
     +		/*
     +		 * A pack that is not local would never be included in a
     +		 * multi-pack index. We thus skip over any non-local packs.
    @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric with pack.packSize
     +	test_must_be_empty err &&
     +
     +	# We should see that a new packfile was generated.
    -+	find shared/.git/objects/pack -type f -name "*.pack" | sort >packs &&
    -+	test $(wc -l <packs) = 1 &&
    ++	find shared/.git/objects/pack -type f -name "*.pack" >packs &&
    ++	test_line_count = 1 packs &&
     +
     +	# We should also see a multi-pack-index. This multi-pack-index should
     +	# never refer to any packfiles in the alternate object database.
    -+	# Consequentially, it should be valid even with the alternate ODB
    -+	# deleted.
    -+	rm -r shared &&
    -+	git -C member multi-pack-index verify
    ++	test -f member/.git/objects/pack/multi-pack-index &&
    ++	test-tool read-midx member/.git/objects >packs.midx &&
    ++	grep "^pack-.*\.idx$" packs.midx | sort >actual &&
    ++	basename member/.git/objects/pack/pack-*.idx >expect &&
    ++	test_cmp expect actual
     +'
     +
      test_done
 3:  f705cd6c49 !  3:  8c3193268f repack: fix generating multi-pack-index with only non-local packs
    @@ Commit message
         Fix the code to skip non-local packfiles.
     
         Co-authored-by: Taylor Blau <me@ttaylorr.com>
    +    Signed-off-by: Taylor Blau <me@ttaylorr.com>
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## builtin/repack.c ##
    @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,
     
      ## t/t7703-repack-geometric.sh ##
     @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric --write-midx with packfiles in main and alterna
    - 	git -C member multi-pack-index verify
    + 	test_cmp expect actual
      '
      
     +test_expect_success '--geometric --with-midx with no local objects' '
    @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric --write-midx with
     +	# alternate object database does not invalidate the geometric sequence.
     +	# And we should not have a multi-pack-index because these only index
     +	# local packfiles, and there are none.
    -+	find member/.git/objects/pack -type f >actual &&
    -+	test_must_be_empty actual
    ++	test_dir_is_empty member/$packdir
     +'
     +
      test_done
 -:  ---------- >  4:  8d47d753dc pack-objects: split out `--stdin-packs` tests into separate file
 4:  ff28829589 !  5:  ac528514e7 pack-objects: fix error when packing same pack twice
    @@ builtin/pack-objects.c: static void read_packs_list_from_stdin(void)
      	for (p = get_all_packs(the_repository); p; p = p->next) {
      		const char *pack_name = pack_basename(p);
     
    - ## t/t5331-pack-objects-stdin.sh (new) ##
    -@@
    -+#!/bin/sh
    -+
    -+test_description='pack-objects --stdin'
    -+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    -+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    -+
    -+TEST_PASSES_SANITIZE_LEAK=true
    -+. ./test-lib.sh
    + ## t/t5331-pack-objects-stdin.sh ##
    +@@ t/t5331-pack-objects-stdin.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    + TEST_PASSES_SANITIZE_LEAK=true
    + . ./test-lib.sh
    + 
    ++packed_objects() {
    ++	git show-index <"$1" >tmp-object-list &&
    ++	cut -d' ' -f2 tmp-object-list | sort &&
    ++	rm tmp-object-list
    ++ }
     +
    + test_expect_success 'setup for --stdin-packs tests' '
    + 	git init stdin-packs &&
    + 	(
    +@@ t/t5331-pack-objects-stdin.sh: test_expect_success '--stdin-packs with broken links' '
    + 	)
    + '
    + 
     +test_expect_success 'pack-objects --stdin with duplicate packfile' '
     +	test_when_finished "rm -fr repo" &&
     +
    @@ t/t5331-pack-objects-stdin.sh (new)
     +		test_commit "commit" &&
     +		git repack -ad &&
     +
    -+		(
    ++		{
     +			basename .git/objects/pack/pack-*.pack &&
     +			basename .git/objects/pack/pack-*.pack
    -+		) >packfiles &&
    ++		} >packfiles &&
     +
     +		git pack-objects --stdin-packs generated-pack <packfiles &&
    -+		test_cmp generated-pack-*.pack .git/objects/pack/pack-*.pack
    ++		packed_objects .git/objects/pack/pack-*.idx >expect &&
    ++		packed_objects generated-pack-*.idx >actual &&
    ++		test_cmp expect actual
     +	)
     +'
     +
    -+test_done
    + test_done
     
      ## t/t7703-repack-geometric.sh ##
     @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric --with-midx with no local objects' '
    - 	test_must_be_empty actual
    + 	test_dir_is_empty member/$packdir
      '
      
     +test_expect_success '--geometric with same pack in main and alternate ODB' '
 5:  8011603d6d !  6:  afd7c7864d pack-objects: fix error when same packfile is included and excluded
    @@ builtin/pack-objects.c: static void read_packs_list_from_stdin(void)
      
     
      ## t/t5331-pack-objects-stdin.sh ##
    -@@ t/t5331-pack-objects-stdin.sh: export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    - TEST_PASSES_SANITIZE_LEAK=true
    - . ./test-lib.sh
    - 
    -+packed_objects() {
    -+	git show-index <"$1" >tmp-object-list &&
    -+	cut -d' ' -f2 tmp-object-list &&
    -+	rm tmp-object-list
    -+ }
    -+
    - test_expect_success 'pack-objects --stdin with duplicate packfile' '
    - 	test_when_finished "rm -fr repo" &&
    - 
     @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with duplicate packfile' '
      	)
      '
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with du
     +		test_commit "commit" &&
     +		git repack -ad &&
     +
    -+		(
    ++		{
     +			basename .git/objects/pack/pack-*.pack &&
     +			printf "^%s\n" "$(basename .git/objects/pack/pack-*.pack)"
    -+		) >packfiles &&
    ++		} >packfiles &&
     +
     +		git pack-objects --stdin-packs generated-pack <packfiles &&
     +		packed_objects generated-pack-*.idx >packed-objects &&
 6:  3e958bf2a4 !  7:  cd135439ae pack-objects: extend test coverage of `--stdin-packs` with alternates
    @@ t/t5331-pack-objects-stdin.sh: test_expect_success 'pack-objects --stdin with sa
     +	test_commit -C member "local-commit" &&
     +	git -C member repack -dl &&
     +
    -+	(
    -+		basename shared/.git/objects/pack/pack-*.pack  &&
    ++	{
    ++		basename shared/.git/objects/pack/pack-*.pack &&
     +		basename member/.git/objects/pack/pack-*.pack
    -+	) >packfiles &&
    ++	} >packfiles &&
     +
    -+	(
    ++	{
     +		packed_objects shared/.git/objects/pack/pack-*.idx &&
     +		packed_objects member/.git/objects/pack/pack-*.idx
    -+	) | sort >expected-objects &&
    ++	} | sort >expected-objects &&
     +
     +	git -C member pack-objects --stdin-packs generated-pack <packfiles &&
    -+	packed_objects member/generated-pack-*.idx | sort >actual-objects &&
    ++	packed_objects member/generated-pack-*.idx >actual-objects &&
     +	test_cmp expected-objects actual-objects
     +'
     +
 -:  ---------- >  8:  18bac90289 t/helper: allow chmtime to print verbosely without modifying mtime
 7:  f014872ed4 !  9:  285695deaf repack: honor `-l` when calculating pack geometry
    @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric with same pack in
     +	# packfile and thus arrive at the conclusion that the geometric
     +	# sequence is intact. We thus expect no changes.
     +	#
    -+	# Note that we are using stat(1) to verify idempotence to also verify
    -+	# that the mtime did not change. This is done in order to detect the
    -+	# case where we do repack objects, but the resulting packfile is the
    -+	# same.
    -+	stat member/.git/objects/pack/* >expected-member-packs &&
    ++	# Note that we are tweaking mtimes of the packfiles so that we can
    ++	# verify they did not change. This is done in order to detect the case
    ++	# where we do repack objects, but the resulting packfile is the same.
    ++	test-tool chmtime --verbose =0 member/.git/objects/pack/* >expected-member-packs &&
     +	git -C member repack --geometric=2 -l -d &&
    -+	stat member/.git/objects/pack/* >actual-member-packs &&
    ++	test-tool chmtime --verbose member/.git/objects/pack/* >actual-member-packs &&
     +	test_cmp expected-member-packs actual-member-packs &&
     +
    -+	(
    ++	{
     +		packed_objects shared/.git/objects/pack/pack-*.idx &&
     +		packed_objects member/.git/objects/pack/pack-*.idx
    -+	) | sort >expected-objects &&
    ++	} | sort >expected-objects &&
     +
     +	# On the other hand, when doing a non-local geometric repack we should
     +	# see both packfiles and thus repack them. We expect that the shared
     +	# object database was not changed.
    -+	stat shared/.git/objects/pack/* >expected-shared-packs &&
    ++	test-tool chmtime --verbose =0 shared/.git/objects/pack/* >expected-shared-packs &&
     +	git -C member repack --geometric=2 -d &&
    -+	stat shared/.git/objects/pack/* >actual-shared-packs &&
    ++	test-tool chmtime --verbose shared/.git/objects/pack/* >actual-shared-packs &&
     +	test_cmp expected-shared-packs actual-shared-packs &&
     +
     +	# Furthermore, we expect that the member repository now has a single
 8:  b10ee7fc3d ! 10:  bb0ee0eb3c repack: disable writing bitmaps when doing a local geometric repack
    @@ Metadata
     Author: Patrick Steinhardt <ps@pks.im>
     
      ## Commit message ##
    -    repack: disable writing bitmaps when doing a local geometric repack
    +    repack: disable writing bitmaps when doing a local repack
     
         In order to write a bitmap, we need to have full coverage of all objects
         that are about to be packed. In the traditional non-multi-pack-index
    @@ Commit message
         with writing bitmaps when we have multiple packfiles as long as the
         multi-pack-index covers all objects.
     
    -    This is not always the case though. When writing multi-pack-indices in a
    -    repository that is connected to an alternate object directory we may end
    -    up writing a multi-pack-index that only has partial coverage of objects.
    -    The end result is that writing the bitmap will fail:
    +    This is not always the case though. When asked to perform a repack of
    +    local objects, only, then we cannot guarantee to have full coverage of
    +    all objects regardless of whether we do a full repack or a repack with a
    +    multi-pack-index. The end result is that writing the bitmap will fail in
    +    both worlds:
     
             $ git multi-pack-index write --stdin-packs --bitmap <packfiles
             warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    @@ Commit message
     
         Now there are two different ways to fix this. The first one would be to
         amend git-multi-pack-index(1) to disable writing bitmaps when we notice
    -    that we don't have full object coverage. But we ain't really got enough
    -    information there, and seeing that it is a low-level plumbing command it
    -    does not feel like the right place to fix this.
    +    that we don't have full object coverage.
     
    -    We can easily fix it at a higher level in git-repack(1) though. When all
    -    of the following conditions are true:
    +        - We don't have enough information in git-multi-pack-index(1) in
    +          order to tell whether the local repository _should_ have full
    +          coverage. Because even when connected to an alternate object
    +          directory, it may be the case that we still have all objects
    +          around in the main object database.
     
    -        - We are asked to write a multi-pack index with bitmaps.
    +        - git-multi-pack-index(1) is quite a low-level tool. Automatically
    +          disabling functionality that it was asked to provide does not feel
    +          like the right thing to do.
     
    -        - We are asked to only include local objects via `-l`.
    -
    -        - We are connected to an alternate repository that has packfiles.
    -
    -    Then we will override the user's ask and disable writing bitmaps with a
    -    warning. This is similar to what we do in git-pack-objects(1), where we
    -    also disable writing bitmaps in case we omit an object from the pack.
    +    We can easily fix it at a higher level in git-repack(1) though. When
    +    asked to only include local objects via `-l` and when connected to an
    +    alternate object directory then we will override the user's ask and
    +    disable writing bitmaps with a warning. This is similar to what we do in
    +    git-pack-objects(1), where we also disable writing bitmaps in case we
    +    omit an object from the pack.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## builtin/repack.c ##
     @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix)
    - 	if (pack_kept_objects < 0)
    - 		pack_kept_objects = write_bitmaps > 0 && !write_midx;
    - 
    -+	if (write_midx && write_bitmaps && geometric_factor && po_args.local) {
    -+		struct packed_git *p;
    -+
    -+		for (p = get_all_packs(the_repository); p; p = p->next) {
    -+			if (p->pack_local)
    -+				continue;
    -+
    -+			/*
    -+			 * When asked to do a local repack, but we have
    -+			 * packfiles that are inherited from an alternate, then
    -+			 * we cannot guarantee that the multi-pack-index would
    -+			 * have full coverage of all objects. We thus disable
    -+			 * writing bitmaps in that case.
    -+			 */
    -+			warning(_("disabling bitmap writing, as some objects are not being packed"));
    -+			write_bitmaps = 0;
    -+			break;
    -+		}
    -+	}
    -+
      	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
      		die(_(incremental_bitmap_conflict_error));
      
    ++	if (write_bitmaps && po_args.local && has_alt_odb(the_repository)) {
    ++		/*
    ++		 * When asked to do a local repack, but we have
    ++		 * packfiles that are inherited from an alternate, then
    ++		 * we cannot guarantee that the multi-pack-index would
    ++		 * have full coverage of all objects. We thus disable
    ++		 * writing bitmaps in that case.
    ++		 */
    ++		warning(_("disabling bitmap writing, as some objects are not being packed"));
    ++		write_bitmaps = 0;
    ++	}
    ++
    + 	if (write_midx && write_bitmaps) {
    + 		struct strbuf path = STRBUF_INIT;
    + 
    +
    + ## object-file.c ##
    +@@ object-file.c: void prepare_alt_odb(struct repository *r)
    + 	r->objects->loaded_alternates = 1;
    + }
    + 
    ++int has_alt_odb(struct repository *r)
    ++{
    ++	prepare_alt_odb(r);
    ++	return !!r->objects->odb->next;
    ++}
    ++
    + /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
    + static int freshen_file(const char *fn)
    + {
    +
    + ## object-store.h ##
    +@@ object-store.h: KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
    + 	struct object_directory *, 1, fspathhash, fspatheq)
    + 
    + void prepare_alt_odb(struct repository *r);
    ++int has_alt_odb(struct repository *r);
    + char *compute_alternate_path(const char *path, struct strbuf *err);
    + struct object_directory *find_odb(struct repository *r, const char *obj_dir);
    + typedef int alt_odb_fn(struct object_directory *, void *);
    +
    + ## t/t7700-repack.sh ##
    +@@ t/t7700-repack.sh: test_expect_success SYMLINKS '--local keeps packs when alternate is objectdir '
    + 	test_cmp expect actual
    + '
    + 
    ++test_expect_success '--local disables writing bitmaps when connected to alternate ODB' '
    ++	test_when_finished "rm -rf shared member" &&
    ++
    ++	git init shared &&
    ++	git clone --shared shared member &&
    ++	(
    ++		cd member &&
    ++		test_commit "object" &&
    ++		git repack -Adl --write-bitmap-index 2>err &&
    ++		cat >expect <<-EOF &&
    ++		warning: disabling bitmap writing, as some objects are not being packed
    ++		EOF
    ++		test_cmp expect err &&
    ++		test ! -f .git/objects/pack-*.bitmap
    ++	)
    ++'
    ++
    + test_expect_success 'packed obs in alt ODB are repacked even when local repo is packless' '
    + 	mkdir alt_objects/pack &&
    + 	mv .git/objects/pack/* alt_objects/pack &&
     
      ## t/t7703-repack-geometric.sh ##
     @@ t/t7703-repack-geometric.sh: test_expect_success '--geometric -l with non-intact geometric sequence across OD
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 13:49     ` Derrick Stolee
  2023-04-13 11:16   ` [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
                     ` (10 subsequent siblings)
  11 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2557 bytes --]

When asked to write a multi-pack-index the user can specify a preferred
pack that is used as a tie breaker when multiple packs contain the same
objects. When this packfile cannot be found, we just pick the first pack
that is getting tracked by the newly written multi-pack-index as a
fallback.

Picking the fallback can fail in the case where we're asked to write a
multi-pack-index with no packfiles at all: picking the fallback value
will cause a segfault as we blindly index into the array of packfiles,
which would be empty.

Fix this bug by resetting the preferred packfile index to `-1` before
searching for the preferred pack. This fixes the segfault as we already
check for whether the index is `> - 1`. If it is not, we simply don't
pick a preferred packfile at all.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c                      |  6 +++---
 t/t5319-multi-pack-index.sh | 12 ++++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/midx.c b/midx.c
index 47989f7ea7..67eb617591 100644
--- a/midx.c
+++ b/midx.c
@@ -1328,17 +1328,17 @@ static int write_midx_internal(const char *object_dir,
 	}
 
 	if (preferred_pack_name) {
-		int found = 0;
+		ctx.preferred_pack_idx = -1;
+
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
-				found = 1;
 				break;
 			}
 		}
 
-		if (!found)
+		if (ctx.preferred_pack_idx == -1)
 			warning(_("unknown preferred pack: '%s'"),
 				preferred_pack_name);
 	} else if (ctx.nr &&
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 499d5d4c78..0883c7c6bd 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -183,6 +183,18 @@ test_expect_success 'write midx with --stdin-packs' '
 
 compare_results_with_midx "mixed mode (one pack + extra)"
 
+test_expect_success 'write with no objects and preferred pack' '
+	test_when_finished "rm -rf empty" &&
+	git init empty &&
+	test_must_fail git -C empty multi-pack-index write \
+		--stdin-packs --preferred-pack=does-not-exist </dev/null 2>err &&
+	cat >expect <<-EOF &&
+	warning: unknown preferred pack: ${SQ}does-not-exist${SQ}
+	error: no pack files to index.
+	EOF
+	test_cmp expect err
+'
+
 test_expect_success 'write progress off for redirected stderr' '
 	git multi-pack-index --object-dir=$objdir write 2>err &&
 	test_line_count = 0 err
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 4815 bytes --]

When doing a geometric repack with multi-pack-indices, then we ask
git-multi-pack-index(1) to use the largest packfile as the preferred
pack. It can happen though that the largest packfile is not part of the
main object database, but instead part of an alternate object database.
The result is that git-multi-pack-index(1) will not be able to find the
preferred pack and print a warning. It then falls back to use the first
packfile that the multi-pack-index shall reference.

Fix this bug by only considering packfiles as preferred pack that are
local. This is the right thing to do given that a multi-pack-index
should never reference packfiles borrowed from an alternate.

While at it, rename the function `get_largest_active_packfile()` to
`get_preferred_pack()` to better document its intent.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 26 +++++++++++++++++++++-----
 t/t7703-repack-geometric.sh | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 87f73c8923..687419776d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -446,8 +446,10 @@ static void split_pack_geometry(struct pack_geometry *geometry, int factor)
 	geometry->split = split;
 }
 
-static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry)
+static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
 {
+	uint32_t i;
+
 	if (!geometry) {
 		/*
 		 * No geometry means either an all-into-one repack (in which
@@ -462,7 +464,21 @@ static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry
 	}
 	if (geometry->split == geometry->pack_nr)
 		return NULL;
-	return geometry->pack[geometry->pack_nr - 1];
+
+	/*
+	 * The preferred pack is the largest pack above the split line. In
+	 * other words, it is the largest pack that does not get rolled up in
+	 * the geometric repack.
+	 */
+	for (i = geometry->pack_nr; i > geometry->split; i--)
+		/*
+		 * A pack that is not local would never be included in a
+		 * multi-pack index. We thus skip over any non-local packs.
+		 */
+		if (geometry->pack[i - 1]->pack_local)
+			return geometry->pack[i - 1];
+
+	return NULL;
 }
 
 static void clear_pack_geometry(struct pack_geometry *geometry)
@@ -589,7 +605,7 @@ static int write_midx_included_packs(struct string_list *include,
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
-	struct packed_git *largest = get_largest_active_pack(geometry);
+	struct packed_git *preferred = get_preferred_pack(geometry);
 	FILE *in;
 	int ret;
 
@@ -610,9 +626,9 @@ static int write_midx_included_packs(struct string_list *include,
 	if (write_bitmaps)
 		strvec_push(&cmd.args, "--bitmap");
 
-	if (largest)
+	if (preferred)
 		strvec_pushf(&cmd.args, "--preferred-pack=%s",
-			     pack_basename(largest));
+			     pack_basename(preferred));
 
 	if (refs_snapshot)
 		strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot);
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 8821fbd2dd..d6db362541 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -281,4 +281,36 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 	)
 '
 
+test_expect_success '--geometric --write-midx with packfiles in main and alternate ODB' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a shared repository that will serve as the alternate object
+	# database for the member linked to it. It has got some objects on its
+	# own that are packed into a single packfile.
+	git init shared &&
+	test_commit -C shared common-object &&
+	git -C shared repack -ad &&
+
+	# We create member so that its alternates file points to the shared
+	# repository. We then create a commit in it so that git-repack(1) has
+	# something to repack.
+	# of the shared object database.
+	git clone --shared shared member &&
+	test_commit -C member unique-object &&
+	git -C member repack --geometric=2 --write-midx 2>err &&
+	test_must_be_empty err &&
+
+	# We should see that a new packfile was generated.
+	find shared/.git/objects/pack -type f -name "*.pack" >packs &&
+	test_line_count = 1 packs &&
+
+	# We should also see a multi-pack-index. This multi-pack-index should
+	# never refer to any packfiles in the alternate object database.
+	test -f member/.git/objects/pack/multi-pack-index &&
+	test-tool read-midx member/.git/objects >packs.midx &&
+	grep "^pack-.*\.idx$" packs.midx | sort >actual &&
+	basename member/.git/objects/pack/pack-*.idx >expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 3130 bytes --]

When writing the multi-pack-index with geometric repacking we will add
all packfiles to the index that are part of the geometric sequence. This
can potentially also include packfiles borrowed from an alternate object
directory. But given that a multi-pack-index can only ever include packs
that are part of the main object database this does not make much sense
whatsoever.

In the edge case where all packfiles are contained in the alternate
object database and the local repository has none itself this bug can
cause us to invoke git-multi-pack-index(1) with only non-local packfiles
that it ultimately cannot find. This causes it to return an error and
thus causes the geometric repack to fail.

Fix the code to skip non-local packfiles.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 11 +++++++++++
 t/t7703-repack-geometric.sh | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 687419776d..80fc860613 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -572,6 +572,17 @@ static void midx_included_packs(struct string_list *include,
 		for (i = geometry->split; i < geometry->pack_nr; i++) {
 			struct packed_git *p = geometry->pack[i];
 
+			/*
+			 * The multi-pack index never refers to packfiles part
+			 * of an alternate object database, so we skip these.
+			 * While git-multi-pack-index(1) would silently ignore
+			 * them anyway, this allows us to skip executing the
+			 * command completely when we have only non-local
+			 * packfiles.
+			 */
+			if (!p->pack_local)
+				continue;
+
 			strbuf_addstr(&buf, pack_basename(p));
 			strbuf_strip_suffix(&buf, ".pack");
 			strbuf_addstr(&buf, ".idx");
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index d6db362541..7027676977 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -313,4 +313,27 @@ test_expect_success '--geometric --write-midx with packfiles in main and alterna
 	test_cmp expect actual
 '
 
+test_expect_success '--geometric --with-midx with no local objects' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a repository with a single packfile that acts as alternate
+	# object database.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+
+	# Create a second repository linked to the first one and perform a
+	# geometric repack on it.
+	git clone --shared shared member &&
+	git -C member repack --geometric 2 --write-midx 2>err &&
+	test_must_be_empty err &&
+
+	# Assert that we wrote neither a new packfile nor a multi-pack-index.
+	# We should not have a packfile because the single packfile in the
+	# alternate object database does not invalidate the geometric sequence.
+	# And we should not have a multi-pack-index because these only index
+	# local packfiles, and there are none.
+	test_dir_is_empty member/$packdir
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 8776 bytes --]

The test suite for git-pack-objects(1) is quite huge, and we're about to
add more tests that relate to the `--stdin-packs` option. Split out all
tests related to this option into a standalone file so that it becomes
easier to test the feature in isolation.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5300-pack-object.sh        | 135 -------------------------------
 t/t5331-pack-objects-stdin.sh | 145 ++++++++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+), 135 deletions(-)
 create mode 100755 t/t5331-pack-objects-stdin.sh

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index f8a0f309e2..d2ce236d61 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -589,141 +589,6 @@ test_expect_success 'prefetch objects' '
 	test_line_count = 1 donelines
 '
 
-test_expect_success 'setup for --stdin-packs tests' '
-	git init stdin-packs &&
-	(
-		cd stdin-packs &&
-
-		test_commit A &&
-		test_commit B &&
-		test_commit C &&
-
-		for id in A B C
-		do
-			git pack-objects .git/objects/pack/pack-$id \
-				--incremental --revs <<-EOF || exit 1
-			refs/tags/$id
-			EOF
-		done &&
-
-		ls -la .git/objects/pack
-	)
-'
-
-test_expect_success '--stdin-packs with excluded packs' '
-	(
-		cd stdin-packs &&
-
-		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
-		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
-		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
-
-		git pack-objects test --stdin-packs <<-EOF &&
-		$PACK_A
-		^$PACK_B
-		$PACK_C
-		EOF
-
-		(
-			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-C-*.idx)
-		) >expect.raw &&
-		git show-index <$(ls test-*.idx) >actual.raw &&
-
-		cut -d" " -f2 <expect.raw | sort >expect &&
-		cut -d" " -f2 <actual.raw | sort >actual &&
-		test_cmp expect actual
-	)
-'
-
-test_expect_success '--stdin-packs is incompatible with --filter' '
-	(
-		cd stdin-packs &&
-		test_must_fail git pack-objects --stdin-packs --stdout \
-			--filter=blob:none </dev/null 2>err &&
-		test_i18ngrep "cannot use --filter with --stdin-packs" err
-	)
-'
-
-test_expect_success '--stdin-packs is incompatible with --revs' '
-	(
-		cd stdin-packs &&
-		test_must_fail git pack-objects --stdin-packs --revs out \
-			</dev/null 2>err &&
-		test_i18ngrep "cannot use internal rev list with --stdin-packs" err
-	)
-'
-
-test_expect_success '--stdin-packs with loose objects' '
-	(
-		cd stdin-packs &&
-
-		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
-		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
-		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
-
-		test_commit D && # loose
-
-		git pack-objects test2 --stdin-packs --unpacked <<-EOF &&
-		$PACK_A
-		^$PACK_B
-		$PACK_C
-		EOF
-
-		(
-			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
-			git rev-list --objects --no-object-names \
-				refs/tags/C..refs/tags/D
-
-		) >expect.raw &&
-		ls -la . &&
-		git show-index <$(ls test2-*.idx) >actual.raw &&
-
-		cut -d" " -f2 <expect.raw | sort >expect &&
-		cut -d" " -f2 <actual.raw | sort >actual &&
-		test_cmp expect actual
-	)
-'
-
-test_expect_success '--stdin-packs with broken links' '
-	(
-		cd stdin-packs &&
-
-		# make an unreachable object with a bogus parent
-		git cat-file -p HEAD >commit &&
-		sed "s/$(git rev-parse HEAD^)/$(test_oid zero)/" <commit |
-		git hash-object -w -t commit --stdin >in &&
-
-		git pack-objects .git/objects/pack/pack-D <in &&
-
-		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
-		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
-		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
-		PACK_D="$(basename .git/objects/pack/pack-D-*.pack)" &&
-
-		git pack-objects test3 --stdin-packs --unpacked <<-EOF &&
-		$PACK_A
-		^$PACK_B
-		$PACK_C
-		$PACK_D
-		EOF
-
-		(
-			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-D-*.idx) &&
-			git rev-list --objects --no-object-names \
-				refs/tags/C..refs/tags/D
-		) >expect.raw &&
-		git show-index <$(ls test3-*.idx) >actual.raw &&
-
-		cut -d" " -f2 <expect.raw | sort >expect &&
-		cut -d" " -f2 <actual.raw | sort >actual &&
-		test_cmp expect actual
-	)
-'
-
 test_expect_success 'negative window clamps to 0' '
 	git pack-objects --progress --window=-1 neg-window <obj-list 2>stderr &&
 	check_deltas stderr = 0
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
new file mode 100755
index 0000000000..d5eece5899
--- /dev/null
+++ b/t/t5331-pack-objects-stdin.sh
@@ -0,0 +1,145 @@
+#!/bin/sh
+
+test_description='pack-objects --stdin'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'setup for --stdin-packs tests' '
+	git init stdin-packs &&
+	(
+		cd stdin-packs &&
+
+		test_commit A &&
+		test_commit B &&
+		test_commit C &&
+
+		for id in A B C
+		do
+			git pack-objects .git/objects/pack/pack-$id \
+				--incremental --revs <<-EOF || exit 1
+			refs/tags/$id
+			EOF
+		done &&
+
+		ls -la .git/objects/pack
+	)
+'
+
+test_expect_success '--stdin-packs with excluded packs' '
+	(
+		cd stdin-packs &&
+
+		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
+		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
+		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
+
+		git pack-objects test --stdin-packs <<-EOF &&
+		$PACK_A
+		^$PACK_B
+		$PACK_C
+		EOF
+
+		(
+			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-C-*.idx)
+		) >expect.raw &&
+		git show-index <$(ls test-*.idx) >actual.raw &&
+
+		cut -d" " -f2 <expect.raw | sort >expect &&
+		cut -d" " -f2 <actual.raw | sort >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--stdin-packs is incompatible with --filter' '
+	(
+		cd stdin-packs &&
+		test_must_fail git pack-objects --stdin-packs --stdout \
+			--filter=blob:none </dev/null 2>err &&
+		test_i18ngrep "cannot use --filter with --stdin-packs" err
+	)
+'
+
+test_expect_success '--stdin-packs is incompatible with --revs' '
+	(
+		cd stdin-packs &&
+		test_must_fail git pack-objects --stdin-packs --revs out \
+			</dev/null 2>err &&
+		test_i18ngrep "cannot use internal rev list with --stdin-packs" err
+	)
+'
+
+test_expect_success '--stdin-packs with loose objects' '
+	(
+		cd stdin-packs &&
+
+		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
+		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
+		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
+
+		test_commit D && # loose
+
+		git pack-objects test2 --stdin-packs --unpacked <<-EOF &&
+		$PACK_A
+		^$PACK_B
+		$PACK_C
+		EOF
+
+		(
+			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
+			git rev-list --objects --no-object-names \
+				refs/tags/C..refs/tags/D
+
+		) >expect.raw &&
+		ls -la . &&
+		git show-index <$(ls test2-*.idx) >actual.raw &&
+
+		cut -d" " -f2 <expect.raw | sort >expect &&
+		cut -d" " -f2 <actual.raw | sort >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--stdin-packs with broken links' '
+	(
+		cd stdin-packs &&
+
+		# make an unreachable object with a bogus parent
+		git cat-file -p HEAD >commit &&
+		sed "s/$(git rev-parse HEAD^)/$(test_oid zero)/" <commit |
+		git hash-object -w -t commit --stdin >in &&
+
+		git pack-objects .git/objects/pack/pack-D <in &&
+
+		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
+		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
+		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
+		PACK_D="$(basename .git/objects/pack/pack-D-*.pack)" &&
+
+		git pack-objects test3 --stdin-packs --unpacked <<-EOF &&
+		$PACK_A
+		^$PACK_B
+		$PACK_C
+		$PACK_D
+		EOF
+
+		(
+			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-D-*.idx) &&
+			git rev-list --objects --no-object-names \
+				refs/tags/C..refs/tags/D
+		) >expect.raw &&
+		git show-index <$(ls test3-*.idx) >actual.raw &&
+
+		cut -d" " -f2 <expect.raw | sort >expect &&
+		cut -d" " -f2 <actual.raw | sort >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 05/10] pack-objects: fix error when packing same pack twice
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 4286 bytes --]

When passed the same packfile twice via `--stdin-packs` we return an
error that the packfile supposedly was not found. This is because when
reading packs into the list of included or excluded packfiles, we will
happily re-add packfiles even if they are part of the lists already. And
while the list can now contain duplicates, we will only set the `util`
pointer of the first list entry to the `packed_git` structure. We notice
that at a later point when checking that all list entries have their
`util` pointer set and die with an error.

While this is kind of a nonsensical request, this scenario can be hit
when doing geometric repacks. When a repository is connected to an
alternate object directory and both have the exact same packfile then
both would get added to the geometric sequence. And when we then decide
to perform the repack, we will invoke git-pack-objects(1) with the same
packfile twice.

Fix this bug by removing any duplicates from both the included and
excluded packs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/pack-objects.c        |  2 ++
 t/t5331-pack-objects-stdin.sh | 27 +++++++++++++++++++++++++++
 t/t7703-repack-geometric.sh   | 25 +++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7e9e20172a..7d0e864c35 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3348,7 +3348,9 @@ static void read_packs_list_from_stdin(void)
 	}
 
 	string_list_sort(&include_packs);
+	string_list_remove_duplicates(&include_packs, 0);
 	string_list_sort(&exclude_packs);
+	string_list_remove_duplicates(&exclude_packs, 0);
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
 		const char *pack_name = pack_basename(p);
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index d5eece5899..def7640c91 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -7,6 +7,12 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
+packed_objects() {
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list | sort &&
+	rm tmp-object-list
+ }
+
 test_expect_success 'setup for --stdin-packs tests' '
 	git init stdin-packs &&
 	(
@@ -142,4 +148,25 @@ test_expect_success '--stdin-packs with broken links' '
 	)
 '
 
+test_expect_success 'pack-objects --stdin with duplicate packfile' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		test_commit "commit" &&
+		git repack -ad &&
+
+		{
+			basename .git/objects/pack/pack-*.pack &&
+			basename .git/objects/pack/pack-*.pack
+		} >packfiles &&
+
+		git pack-objects --stdin-packs generated-pack <packfiles &&
+		packed_objects .git/objects/pack/pack-*.idx >expect &&
+		packed_objects generated-pack-*.idx >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 7027676977..d0823f2eb2 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -336,4 +336,29 @@ test_expect_success '--geometric --with-midx with no local objects' '
 	test_dir_is_empty member/$packdir
 '
 
+test_expect_success '--geometric with same pack in main and alternate ODB' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a repository with a single packfile that acts as alternate
+	# object database.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+
+	# We create the member repository as an exact copy so that it has the
+	# same packfile.
+	cp -r shared member &&
+	test-tool path-utils real_path shared/.git/objects >member/.git/objects/info/alternates &&
+	find shared/.git/objects -type f >expected-files &&
+
+	# Verify that we can repack objects as expected without observing any
+	# error. Having the same packfile in both ODBs used to cause an error
+	# in git-pack-objects(1).
+	git -C member repack --geometric 2 2>err &&
+	test_must_be_empty err &&
+	# Nothing should have changed.
+	find shared/.git/objects -type f >actual-files &&
+	test_cmp expected-files actual-files
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2202 bytes --]

When passing the same packfile both as included and excluded via the
`--stdin-packs` option, then we will return an error because the
excluded packfile cannot be found. This is because we will only set the
`util` pointer for the included packfile list if it was found, so that
we later die when we notice that it's in fact not set for the excluded
packfile list.

Fix this bug by always setting the `util` pointer for both the included
and excluded list entries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/pack-objects.c        |  8 +++-----
 t/t5331-pack-objects-stdin.sh | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7d0e864c35..c5147d392f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3355,11 +3355,9 @@ static void read_packs_list_from_stdin(void)
 	for (p = get_all_packs(the_repository); p; p = p->next) {
 		const char *pack_name = pack_basename(p);
 
-		item = string_list_lookup(&include_packs, pack_name);
-		if (!item)
-			item = string_list_lookup(&exclude_packs, pack_name);
-
-		if (item)
+		if ((item = string_list_lookup(&include_packs, pack_name)))
+			item->util = p;
+		if ((item = string_list_lookup(&exclude_packs, pack_name)))
 			item->util = p;
 	}
 
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index def7640c91..1e23cc2e61 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -169,4 +169,24 @@ test_expect_success 'pack-objects --stdin with duplicate packfile' '
 	)
 '
 
+test_expect_success 'pack-objects --stdin with same packfile excluded and included' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		test_commit "commit" &&
+		git repack -ad &&
+
+		{
+			basename .git/objects/pack/pack-*.pack &&
+			printf "^%s\n" "$(basename .git/objects/pack/pack-*.pack)"
+		} >packfiles &&
+
+		git pack-objects --stdin-packs generated-pack <packfiles &&
+		packed_objects generated-pack-*.idx >packed-objects &&
+		test_must_be_empty packed-objects
+	)
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2584 bytes --]

We don't have any tests that verify that git-pack-objects(1) works with
`--stdin-packs` when combined with alternate object directories. Add
some to make sure that the basic functionality works as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5331-pack-objects-stdin.sh | 48 +++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 1e23cc2e61..45e24fa94a 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -189,4 +189,52 @@ test_expect_success 'pack-objects --stdin with same packfile excluded and includ
 	)
 '
 
+test_expect_success 'pack-objects --stdin with packfiles from alternate object database' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Set up a shared repository with a single packfile.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+	basename shared/.git/objects/pack/pack-*.pack >packfile &&
+
+	# Set up a repository that is connected to the shared repository. This
+	# repository has no objects on its own, but we still expect to be able
+	# to pack objects from its alternate.
+	git clone --shared shared member &&
+	git -C member pack-objects --stdin-packs generated-pack <packfile &&
+	test_cmp shared/.git/objects/pack/pack-*.pack member/generated-pack-*.pack
+'
+
+test_expect_success 'pack-objects --stdin with packfiles from main and alternate object database' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Set up a shared repository with a single packfile.
+	git init shared &&
+	test_commit -C shared "shared-commit" &&
+	git -C shared repack -ad &&
+
+	# Set up a repository that is connected to the shared repository. This
+	# repository has a second packfile so that we can verify that it is
+	# possible to write packs that include packfiles from different object
+	# databases.
+	git clone --shared shared member &&
+	test_commit -C member "local-commit" &&
+	git -C member repack -dl &&
+
+	{
+		basename shared/.git/objects/pack/pack-*.pack &&
+		basename member/.git/objects/pack/pack-*.pack
+	} >packfiles &&
+
+	{
+		packed_objects shared/.git/objects/pack/pack-*.idx &&
+		packed_objects member/.git/objects/pack/pack-*.idx
+	} | sort >expected-objects &&
+
+	git -C member pack-objects --stdin-packs generated-pack <packfiles &&
+	packed_objects member/generated-pack-*.idx >actual-objects &&
+	test_cmp expected-objects actual-objects
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 1059 bytes --]

The `test-tool chmtime` helper allows us to both read and modify the
modification time of files. But while it is possible to only read the
mtimes of a file via `--get`, it is not possible to read the mtimes
and report them together with their respective file paths via the
`--verbose` flag without also modifying the mtime at the same time.

Fix this so that it is possible to call `test-tool chmtime --verbose
<files>...` without modifying any mtimes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/helper/test-chmtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/helper/test-chmtime.c b/t/helper/test-chmtime.c
index dc28890a18..0e5538833a 100644
--- a/t/helper/test-chmtime.c
+++ b/t/helper/test-chmtime.c
@@ -94,7 +94,7 @@ int cmd__chmtime(int argc, const char **argv)
 	if (timespec_arg(argv[i], &set_time, &set_eq)) {
 		++i;
 	} else {
-		if (get == 0) {
+		if (get == 0 && verbose == 0) {
 			fprintf(stderr, "Not a base-10 integer: %s\n", argv[i] + 1);
 			goto usage;
 		}
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 13:59     ` Derrick Stolee
  2023-04-13 11:16   ` [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
                     ` (2 subsequent siblings)
  11 siblings, 1 reply; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 5800 bytes --]

When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.

What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.

Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 13 +++++++--
 t/t7703-repack-geometric.sh | 58 +++++++++++++++++++++++++++++++++++++
 2 files changed, 69 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 80fc860613..5768aeb1f3 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -323,7 +323,8 @@ static int geometry_cmp(const void *va, const void *vb)
 }
 
 static void init_pack_geometry(struct pack_geometry **geometry_p,
-			       struct string_list *existing_kept_packs)
+			       struct string_list *existing_kept_packs,
+			       const struct pack_objects_args *args)
 {
 	struct packed_git *p;
 	struct pack_geometry *geometry;
@@ -333,6 +334,14 @@ static void init_pack_geometry(struct pack_geometry **geometry_p,
 	geometry = *geometry_p;
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
+		if (args->local && !p->pack_local)
+			/*
+			 * When asked to only repack local packfiles we skip
+			 * over any packfiles that are borrowed from alternate
+			 * object directories.
+			 */
+			continue;
+
 		if (!pack_kept_objects) {
 			/*
 			 * Any pack that has its pack_keep bit set will appear
@@ -900,7 +909,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (geometric_factor) {
 		if (pack_everything)
 			die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
-		init_pack_geometry(&geometry, &existing_kept_packs);
+		init_pack_geometry(&geometry, &existing_kept_packs, &po_args);
 		split_pack_geometry(geometry, geometric_factor);
 	}
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index d0823f2eb2..c440956ad5 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -10,6 +10,12 @@ objdir=.git/objects
 packdir=$objdir/pack
 midx=$objdir/pack/multi-pack-index
 
+packed_objects() {
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list &&
+	rm tmp-object-list
+ }
+
 test_expect_success '--geometric with no packs' '
 	git init geometric &&
 	test_when_finished "rm -fr geometric" &&
@@ -361,4 +367,56 @@ test_expect_success '--geometric with same pack in main and alternate ODB' '
 	test_cmp expected-files actual-files
 '
 
+test_expect_success '--geometric -l with non-intact geometric sequence across ODBs' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# Verify that our assumptions actually hold: both generated packfiles
+	# should have three objects and should be non-equal.
+	packed_objects shared/.git/objects/pack/pack-*.idx >packed-objects &&
+	test_line_count = 3 packed-objects &&
+	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&
+	test_line_count = 3 packed-objects &&
+	test "$(basename member/.git/objects/pack/pack-*.pack)" != "$(basename shared/.git/objects/pack/pack-*.pack)" &&
+
+	# Perform the geometric repack. With `-l`, we should only see the local
+	# packfile and thus arrive at the conclusion that the geometric
+	# sequence is intact. We thus expect no changes.
+	#
+	# Note that we are tweaking mtimes of the packfiles so that we can
+	# verify they did not change. This is done in order to detect the case
+	# where we do repack objects, but the resulting packfile is the same.
+	test-tool chmtime --verbose =0 member/.git/objects/pack/* >expected-member-packs &&
+	git -C member repack --geometric=2 -l -d &&
+	test-tool chmtime --verbose member/.git/objects/pack/* >actual-member-packs &&
+	test_cmp expected-member-packs actual-member-packs &&
+
+	{
+		packed_objects shared/.git/objects/pack/pack-*.idx &&
+		packed_objects member/.git/objects/pack/pack-*.idx
+	} | sort >expected-objects &&
+
+	# On the other hand, when doing a non-local geometric repack we should
+	# see both packfiles and thus repack them. We expect that the shared
+	# object database was not changed.
+	test-tool chmtime --verbose =0 shared/.git/objects/pack/* >expected-shared-packs &&
+	git -C member repack --geometric=2 -d &&
+	test-tool chmtime --verbose shared/.git/objects/pack/* >actual-shared-packs &&
+	test_cmp expected-shared-packs actual-shared-packs &&
+
+	# Furthermore, we expect that the member repository now has a single
+	# packfile that contains the combined shared and non-shared objects.
+	ls member/.git/objects/pack/pack-*.idx >actual &&
+	test_line_count = 1 actual &&
+	packed_objects member/.git/objects/pack/pack-*.idx >actual-objects &&
+	test_line_count = 6 actual-objects &&
+	sort <actual-objects >actual-objects.sorted &&
+	test_cmp expected-objects actual-objects.sorted
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
@ 2023-04-13 11:16   ` Patrick Steinhardt
  2023-04-13 14:54   ` [PATCH v3 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
  2023-04-14  2:03   ` Junio C Hamano
  11 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 11:16 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 6694 bytes --]

In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.

This is not always the case though. When asked to perform a repack of
local objects, only, then we cannot guarantee to have full coverage of
all objects regardless of whether we do a full repack or a repack with a
multi-pack-index. The end result is that writing the bitmap will fail in
both worlds:

    $ git multi-pack-index write --stdin-packs --bitmap <packfiles
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    error: could not write multi-pack bitmap

Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage.

    - We don't have enough information in git-multi-pack-index(1) in
      order to tell whether the local repository _should_ have full
      coverage. Because even when connected to an alternate object
      directory, it may be the case that we still have all objects
      around in the main object database.

    - git-multi-pack-index(1) is quite a low-level tool. Automatically
      disabling functionality that it was asked to provide does not feel
      like the right thing to do.

We can easily fix it at a higher level in git-repack(1) though. When
asked to only include local objects via `-l` and when connected to an
alternate object directory then we will override the user's ask and
disable writing bitmaps with a warning. This is similar to what we do in
git-pack-objects(1), where we also disable writing bitmaps in case we
omit an object from the pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 12 ++++++++++++
 object-file.c               |  6 ++++++
 object-store.h              |  1 +
 t/t7700-repack.sh           | 17 +++++++++++++++++
 t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
 5 files changed, 63 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 5768aeb1f3..e2abcea2ee 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -887,6 +887,18 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
 		die(_(incremental_bitmap_conflict_error));
 
+	if (write_bitmaps && po_args.local && has_alt_odb(the_repository)) {
+		/*
+		 * When asked to do a local repack, but we have
+		 * packfiles that are inherited from an alternate, then
+		 * we cannot guarantee that the multi-pack-index would
+		 * have full coverage of all objects. We thus disable
+		 * writing bitmaps in that case.
+		 */
+		warning(_("disabling bitmap writing, as some objects are not being packed"));
+		write_bitmaps = 0;
+	}
+
 	if (write_midx && write_bitmaps) {
 		struct strbuf path = STRBUF_INIT;
 
diff --git a/object-file.c b/object-file.c
index 8fab8dbe80..3ad11c683b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -946,6 +946,12 @@ void prepare_alt_odb(struct repository *r)
 	r->objects->loaded_alternates = 1;
 }
 
+int has_alt_odb(struct repository *r)
+{
+	prepare_alt_odb(r);
+	return !!r->objects->odb->next;
+}
+
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
 static int freshen_file(const char *fn)
 {
diff --git a/object-store.h b/object-store.h
index 82201ec3e7..574f16140f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -56,6 +56,7 @@ KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
 	struct object_directory *, 1, fspathhash, fspatheq)
 
 void prepare_alt_odb(struct repository *r);
+int has_alt_odb(struct repository *r);
 char *compute_alternate_path(const char *path, struct strbuf *err);
 struct object_directory *find_odb(struct repository *r, const char *obj_dir);
 typedef int alt_odb_fn(struct object_directory *, void *);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 4aabe98139..93c5763e7d 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -106,6 +106,23 @@ test_expect_success SYMLINKS '--local keeps packs when alternate is objectdir '
 	test_cmp expect actual
 '
 
+test_expect_success '--local disables writing bitmaps when connected to alternate ODB' '
+	test_when_finished "rm -rf shared member" &&
+
+	git init shared &&
+	git clone --shared shared member &&
+	(
+		cd member &&
+		test_commit "object" &&
+		git repack -Adl --write-bitmap-index 2>err &&
+		cat >expect <<-EOF &&
+		warning: disabling bitmap writing, as some objects are not being packed
+		EOF
+		test_cmp expect err &&
+		test ! -f .git/objects/pack-*.bitmap
+	)
+'
+
 test_expect_success 'packed obs in alt ODB are repacked even when local repo is packless' '
 	mkdir alt_objects/pack &&
 	mv .git/objects/pack/* alt_objects/pack &&
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index c440956ad5..33d7977fca 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -419,4 +419,31 @@ test_expect_success '--geometric -l with non-intact geometric sequence across OD
 	test_cmp expected-objects actual-objects.sorted
 '
 
+test_expect_success '--geometric -l disables writing bitmaps with non-local packfiles' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# When performing a geometric repack with `-l` while connecting to an
+	# alternate object database that has a packfile we do not have full
+	# coverage of objects. As a result, we expect that writing the bitmap
+	# will be disabled.
+	git -C member repack -l --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	cat >expect <<-EOF &&
+	warning: disabling bitmap writing, as some objects are not being packed
+	EOF
+	test_cmp expect err &&
+	test ! -f member/.git/objects/pack/multi-pack-index-*.bitmap &&
+
+	# On the other hand, when we repack without `-l`, we should see that
+	# the bitmap gets created.
+	git -C member repack --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	test_must_be_empty err &&
+	test -f member/.git/objects/pack/multi-pack-index-*.bitmap
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack
  2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
@ 2023-04-13 13:49     ` Derrick Stolee
  0 siblings, 0 replies; 67+ messages in thread
From: Derrick Stolee @ 2023-04-13 13:49 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Taylor Blau, peff, dstolee

On 4/13/2023 7:16 AM, Patrick Steinhardt wrote:

> Fix this bug by resetting the preferred packfile index to `-1` before
> searching for the preferred pack. This fixes the segfault as we already
> check for whether the index is `> - 1`. If it is not, we simply don't
> pick a preferred packfile at all.

>  	if (preferred_pack_name) {
> -		int found = 0;
> +		ctx.preferred_pack_idx = -1;
> +

This patch looks good, but I did need to think about it for a bit,
so here are my thoughts (feel free to ignore):

I briefly considered recommending that we set this as the default in
an initializer macro, something like

  #define WRITE_MIDX_CONTEXT_INIT { .preferred_pack_idx = -1 }

but this struct is internal to the file and only constructed once
(at the start of write_midx_internal).

Further, outside the context of this patch we have this code:

	} else if (ctx.nr &&
		   (flags & (MIDX_WRITE_REV_INDEX | MIDX_WRITE_BITMAP))) {
		struct packed_git *oldest = ctx.info[ctx.preferred_pack_idx].p;
		ctx.preferred_pack_idx = 0;

I don't see a way that ctx.preferred_pack_idx can be anything but zero
here, but it's also not going to give a segfault because of the 'ctx.nr'
condition.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry
  2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
@ 2023-04-13 13:59     ` Derrick Stolee
  2023-04-13 14:13       ` Patrick Steinhardt
  2023-04-13 15:40       ` Junio C Hamano
  0 siblings, 2 replies; 67+ messages in thread
From: Derrick Stolee @ 2023-04-13 13:59 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Taylor Blau, peff, dstolee

On 4/13/2023 7:16 AM, Patrick Steinhardt wrote:
> When the user passes `-l` to git-repack(1), then they essentially ask us
> to only repack objects part of the local object database while ignoring
> any packfiles part of an alternate object database. And we in fact honor
> this bit when doing a geometric repack as the resulting packfile will
> only ever contain local objects.

> +	# Verify that our assumptions actually hold: both generated packfiles
> +	# should have three objects and should be non-equal.
> +	packed_objects shared/.git/objects/pack/pack-*.idx >packed-objects &&
> +	test_line_count = 3 packed-objects &&
> +	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&

Typo: s/packed-objetcs/packed-objects/

> +	test_line_count = 3 packed-objects &&
> +	test "$(basename member/.git/objects/pack/pack-*.pack)" != "$(basename shared/.git/objects/pack/pack-*.pack)" &&

nit: could we do this where we store the output of the previous two
commands into different files and then use "! test_cmp"?

	packed_objects shared/.git/objects/pack/pack-*.idx >shared-objects &&
	packed_objects member/.git/objects/pack/pack-*.idx >member-objects &&
	test_line_count = 3 shared-objects &&
	test_line_count = 3 member-objects &&
	! test_cmp shared-objects member-objects &&

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry
  2023-04-13 13:59     ` Derrick Stolee
@ 2023-04-13 14:13       ` Patrick Steinhardt
  2023-04-13 15:40       ` Junio C Hamano
  1 sibling, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-13 14:13 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 1491 bytes --]

On Thu, Apr 13, 2023 at 09:59:02AM -0400, Derrick Stolee wrote:
> On 4/13/2023 7:16 AM, Patrick Steinhardt wrote:
> > When the user passes `-l` to git-repack(1), then they essentially ask us
> > to only repack objects part of the local object database while ignoring
> > any packfiles part of an alternate object database. And we in fact honor
> > this bit when doing a geometric repack as the resulting packfile will
> > only ever contain local objects.
> 
> > +	# Verify that our assumptions actually hold: both generated packfiles
> > +	# should have three objects and should be non-equal.
> > +	packed_objects shared/.git/objects/pack/pack-*.idx >packed-objects &&
> > +	test_line_count = 3 packed-objects &&
> > +	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&
> 
> Typo: s/packed-objetcs/packed-objects/
> 
> > +	test_line_count = 3 packed-objects &&
> > +	test "$(basename member/.git/objects/pack/pack-*.pack)" != "$(basename shared/.git/objects/pack/pack-*.pack)" &&
> 
> nit: could we do this where we store the output of the previous two
> commands into different files and then use "! test_cmp"?
> 
> 	packed_objects shared/.git/objects/pack/pack-*.idx >shared-objects &&
> 	packed_objects member/.git/objects/pack/pack-*.idx >member-objects &&
> 	test_line_count = 3 shared-objects &&
> 	test_line_count = 3 member-objects &&
> 	! test_cmp shared-objects member-objects &&

Yeah, this is definitely easier to read.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 00/10] repack: fix geometric repacking with gitalternates
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2023-04-13 11:16   ` [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
@ 2023-04-13 14:54   ` Derrick Stolee
  2023-04-14  2:03   ` Junio C Hamano
  11 siblings, 0 replies; 67+ messages in thread
From: Derrick Stolee @ 2023-04-13 14:54 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Taylor Blau, peff, dstolee

On 4/13/2023 7:16 AM, Patrick Steinhardt wrote:
> Hi,
> 
> this is the third version of my patch series to fix geometric repacking
> with repositories that are connected to an alternate object database.
> 
> Changes compared to v2:
> 
>     - I've simplified patch 1 to reset the preferred pack index instead
>       of moving around some of the checks. This also causes us to print
>       the warning for missing preferred packfiles again.
> 
>     - I've fixed the logic in patch 2 to find the preferred packfile to
>       not return a packfile that would be rolled up in a geometric
>       repack.
> 
>     - I've added an additional patch to split out preexisting tests for
>       `--stdin-packs` into their own test file.
> 
>     - I've changed the unportable test added for geometric repacking
>       with `-l` that used stat(1) to instead use our `test-tool chmtime`
>       helper.
> 
>     - I've changed the logic that disables writing bitmaps in git-repack
>       to cover more cases. It now always kicks in when doing a repack
>       with `-l` that asks for bitmaps when connected to an alternate
>       object directory.
> 
>     - In general, there's a bunch of small improvements left and right
>       for the tests I'm adding.

I didn't get feedback on your v2 in time, but this v3 is very close.

There's just the one test change in patch 9 that I think deserves a
re-roll, but otherwise this looks good.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry
  2023-04-13 13:59     ` Derrick Stolee
  2023-04-13 14:13       ` Patrick Steinhardt
@ 2023-04-13 15:40       ` Junio C Hamano
  1 sibling, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2023-04-13 15:40 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Patrick Steinhardt, git, Taylor Blau, peff, dstolee

Derrick Stolee <derrickstolee@github.com> writes:

>> +	test_line_count = 3 packed-objects &&
>> +	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&
>
> Typo: s/packed-objetcs/packed-objects/
>
>> +	test_line_count = 3 packed-objects &&

That's a good one.  Because the result file created by the previous
step happens to also have three lines, the typo does not cause the
test to fail, but it is not testing what it designed to test.

Here is "git diff @{1}" after my local "rebase -i" ("diff" of the
result is easier to read than "range-diff" in this case, as I did
not touch any log message).

Thanks for a very good set of eyes.

diff --git c/t/t5331-pack-objects-stdin.sh w/t/t5331-pack-objects-stdin.sh
index 45e24fa94a..acab31667a 100755
--- c/t/t5331-pack-objects-stdin.sh
+++ w/t/t5331-pack-objects-stdin.sh
@@ -7,7 +7,7 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
-packed_objects() {
+packed_objects () {
 	git show-index <"$1" >tmp-object-list &&
 	cut -d' ' -f2 tmp-object-list | sort &&
 	rm tmp-object-list
diff --git c/t/t7703-repack-geometric.sh w/t/t7703-repack-geometric.sh
index 33d7977fca..57796f3437 100755
--- c/t/t7703-repack-geometric.sh
+++ w/t/t7703-repack-geometric.sh
@@ -10,7 +10,7 @@ objdir=.git/objects
 packdir=$objdir/pack
 midx=$objdir/pack/multi-pack-index
 
-packed_objects() {
+packed_objects () {
 	git show-index <"$1" >tmp-object-list &&
 	cut -d' ' -f2 tmp-object-list &&
 	rm tmp-object-list
@@ -380,7 +380,7 @@ test_expect_success '--geometric -l with non-intact geometric sequence across OD
 	# should have three objects and should be non-equal.
 	packed_objects shared/.git/objects/pack/pack-*.idx >packed-objects &&
 	test_line_count = 3 packed-objects &&
-	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&
+	packed_objects member/.git/objects/pack/pack-*.idx >packed-objects &&
 	test_line_count = 3 packed-objects &&
 	test "$(basename member/.git/objects/pack/pack-*.pack)" != "$(basename shared/.git/objects/pack/pack-*.pack)" &&
 

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 00/10] repack: fix geometric repacking with gitalternates
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
                     ` (10 preceding siblings ...)
  2023-04-13 14:54   ` [PATCH v3 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
@ 2023-04-14  2:03   ` Junio C Hamano
  2023-04-14  5:42     ` Patrick Steinhardt
  11 siblings, 1 reply; 67+ messages in thread
From: Junio C Hamano @ 2023-04-14  2:03 UTC (permalink / raw)
  To: Patrick Steinhardt; +Cc: git, Taylor Blau, peff, dstolee

Patrick Steinhardt <ps@pks.im> writes:

> Changes compared to v2:
>
>     - I've simplified patch 1 to reset the preferred pack index instead
>       of moving around some of the checks. This also causes us to print
>       the warning for missing preferred packfiles again.
>
>     - I've fixed the logic in patch 2 to find the preferred packfile to
>       not return a packfile that would be rolled up in a geometric
>       repack.
>
>     - I've added an additional patch to split out preexisting tests for
>       `--stdin-packs` into their own test file.
>
>     - I've changed the unportable test added for geometric repacking
>       with `-l` that used stat(1) to instead use our `test-tool chmtime`
>       helper.
>
>     - I've changed the logic that disables writing bitmaps in git-repack
>       to cover more cases. It now always kicks in when doing a repack
>       with `-l` that asks for bitmaps when connected to an alternate
>       object directory.
>
>     - In general, there's a bunch of small improvements left and right
>       for the tests I'm adding.
>
> Thanks for all the feedback so far.

Loss of "stat" is very much appreciated, as the ones added
in the previous round were the only hits from

    $ git grep '^[    ]*stat ' t/ ;# leading SP/HT plus "stat "

and I suspect among platforms that are not Windows, not Linux, and
not macOS, there may be ones that lack the tool.

With a few fix-ups to the test script I sent to the thread for the
review of [09/10] squashed in, this topic however seems to cause the
linux-TEST-vars job fail when queued at the tip of 'seen' (bbfd3bf)
[*1*].  When 'seen' is rewound by one merge to exclude this topic,
the CI runs OK (c35f3cd)[*2*].  I dug only far enough to identify
which topic among 40+ ones was causing the CI breakage.  Perhaps the
code is operating correctly but the expectation in the test needs
updating?  I dunno.

Thanks.


[References]
*1* https://github.com/git/git/actions/runs/4694670520/jobs/8323047888#step:5:313
*2* https://github.com/git/git/actions/runs/4694155668


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v3 00/10] repack: fix geometric repacking with gitalternates
  2023-04-14  2:03   ` Junio C Hamano
@ 2023-04-14  5:42     ` Patrick Steinhardt
  0 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  5:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Taylor Blau, peff, dstolee

[-- Attachment #1: Type: text/plain, Size: 2069 bytes --]

On Thu, Apr 13, 2023 at 07:03:48PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> With a few fix-ups to the test script I sent to the thread for the
> review of [09/10] squashed in, this topic however seems to cause the
> linux-TEST-vars job fail when queued at the tip of 'seen' (bbfd3bf)
> [*1*].  When 'seen' is rewound by one merge to exclude this topic,
> the CI runs OK (c35f3cd)[*2*].  I dug only far enough to identify
> which topic among 40+ ones was causing the CI breakage.  Perhaps the
> code is operating correctly but the expectation in the test needs
> updating?  I dunno.

The issue is that linux-TEST-vars sets a bunch of environment variables
that change the way git-repack(1) operates. Important in this context
are GIT_TEST_MULTI_PACK_INDEX and GIT_TEST_MULTI_PACK_INDEX_GENERATE_BITMAP
as they change the assumptions that we're doing in our test. What is
interesting is that these cause us to write a multi-pack-index and in
fact a bitmap, but the warning that we're testing for is not printed.
And that's because of the following code snippet:

```
if (write_bitmaps < 0) {
        if (!write_midx &&
            (!(pack_everything & ALL_INTO_ONE) || !is_bare_repository()))
                write_bitmaps = 0;
} else if (write_bitmaps &&
           git_env_bool(GIT_TEST_MULTI_PACK_INDEX, 0) &&
           git_env_bool(GIT_TEST_MULTI_PACK_INDEX_WRITE_BITMAP, 0)) {
        write_bitmaps = 0;
}
```

So if either of those environment variables is set we'll disable
generation of bitmaps in git-repack(1) in favor of enabling them in
git-multi-pack-index(1). And consequentially we don't see the expected
error message.

Anyway. I think what we should do here is to simply override the
mechanism with GIT_TEST_MULTI_PACK_INDEX=0. Other tests in this file do
the same already, and it does make sense as we explicitly want to test
the non-multi-pack-index scenario. And given that we already added a
test for the with-multi-pack-index scenario in the same commit we don't
really loose any test coverage there.

Patrick

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 67+ messages in thread

* [PATCH v4 00/10] repack: fix geometric repacking with gitalternates
  2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
                   ` (2 preceding siblings ...)
  2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
@ 2023-04-14  6:01 ` Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
                     ` (10 more replies)
  3 siblings, 11 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 7189 bytes --]

Hi,

this is the fourth version of my patch series to fix geometric repacking
with repositories connected to an alternate object database.

This version only addresses some issues with the tests, the actual logic
remains untouched:

    - The test added in t7700-repack.sh that verifies that `--local`
      causes us to disable generation of the bitmap index was failing in
      the linux-TEST-vars CI job. This was because it sets
      GIT_TEST_MULTI_PACK_INDEX=1, which causes us to disable the bitmap
      logic in git-repack(1). I've fixed this failure by explicitly
      overriding the environment variable like other tests in the same
      file do.

    - I've converted path checks to use `test_path_is_missing` and
      `test_path_is_file` instead of `test ! -f` and `test -f`.

    - I've fixed a typo in t7703-repack-geometric.sh and shifted code
      around a bit to make the test more readable, following Derrick's
      suggestion.

Thanks!

Patrick

Patrick Steinhardt (10):
  midx: fix segfault with no packs and invalid preferred pack
  repack: fix trying to use preferred pack in alternates
  repack: fix generating multi-pack-index with only non-local packs
  pack-objects: split out `--stdin-packs` tests into separate file
  pack-objects: fix error when packing same pack twice
  pack-objects: fix error when same packfile is included and excluded
  pack-objects: extend test coverage of `--stdin-packs` with alternates
  t/helper: allow chmtime to print verbosely without modifying mtime
  repack: honor `-l` when calculating pack geometry
  repack: disable writing bitmaps when doing a local repack

 builtin/pack-objects.c        |  10 +-
 builtin/repack.c              |  62 ++++++++-
 midx.c                        |   6 +-
 object-file.c                 |   6 +
 object-store.h                |   1 +
 t/helper/test-chmtime.c       |   2 +-
 t/t5300-pack-object.sh        | 135 -------------------
 t/t5319-multi-pack-index.sh   |  12 ++
 t/t5331-pack-objects-stdin.sh | 240 ++++++++++++++++++++++++++++++++++
 t/t7700-repack.sh             |  17 +++
 t/t7703-repack-geometric.sh   | 164 +++++++++++++++++++++++
 11 files changed, 504 insertions(+), 151 deletions(-)
 create mode 100755 t/t5331-pack-objects-stdin.sh

Diff against v3:
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 45e24fa94a..acab31667a 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -7,7 +7,7 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
-packed_objects() {
+packed_objects () {
 	git show-index <"$1" >tmp-object-list &&
 	cut -d' ' -f2 tmp-object-list | sort &&
 	rm tmp-object-list
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 93c5763e7d..faa739eeb9 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -114,12 +114,12 @@ test_expect_success '--local disables writing bitmaps when connected to alternat
 	(
 		cd member &&
 		test_commit "object" &&
-		git repack -Adl --write-bitmap-index 2>err &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
 		cat >expect <<-EOF &&
 		warning: disabling bitmap writing, as some objects are not being packed
 		EOF
 		test_cmp expect err &&
-		test ! -f .git/objects/pack-*.bitmap
+		test_path_is_missing .git/objects/pack-*.bitmap
 	)
 '
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 33d7977fca..00f28fb558 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -10,9 +10,9 @@ objdir=.git/objects
 packdir=$objdir/pack
 midx=$objdir/pack/multi-pack-index
 
-packed_objects() {
+packed_objects () {
 	git show-index <"$1" >tmp-object-list &&
-	cut -d' ' -f2 tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list | sort &&
 	rm tmp-object-list
  }
 
@@ -312,7 +312,7 @@ test_expect_success '--geometric --write-midx with packfiles in main and alterna
 
 	# We should also see a multi-pack-index. This multi-pack-index should
 	# never refer to any packfiles in the alternate object database.
-	test -f member/.git/objects/pack/multi-pack-index &&
+	test_path_is_file member/.git/objects/pack/multi-pack-index &&
 	test-tool read-midx member/.git/objects >packs.midx &&
 	grep "^pack-.*\.idx$" packs.midx | sort >actual &&
 	basename member/.git/objects/pack/pack-*.idx >expect &&
@@ -378,11 +378,11 @@ test_expect_success '--geometric -l with non-intact geometric sequence across OD
 
 	# Verify that our assumptions actually hold: both generated packfiles
 	# should have three objects and should be non-equal.
-	packed_objects shared/.git/objects/pack/pack-*.idx >packed-objects &&
-	test_line_count = 3 packed-objects &&
-	packed_objects member/.git/objects/pack/pack-*.idx >packed-objetcs &&
-	test_line_count = 3 packed-objects &&
-	test "$(basename member/.git/objects/pack/pack-*.pack)" != "$(basename shared/.git/objects/pack/pack-*.pack)" &&
+	packed_objects shared/.git/objects/pack/pack-*.idx >shared-objects &&
+	packed_objects member/.git/objects/pack/pack-*.idx >member-objects &&
+	test_line_count = 3 shared-objects &&
+	test_line_count = 3 member-objects &&
+	! test_cmp shared-objects member-objects &&
 
 	# Perform the geometric repack. With `-l`, we should only see the local
 	# packfile and thus arrive at the conclusion that the geometric
@@ -415,8 +415,7 @@ test_expect_success '--geometric -l with non-intact geometric sequence across OD
 	test_line_count = 1 actual &&
 	packed_objects member/.git/objects/pack/pack-*.idx >actual-objects &&
 	test_line_count = 6 actual-objects &&
-	sort <actual-objects >actual-objects.sorted &&
-	test_cmp expected-objects actual-objects.sorted
+	test_cmp expected-objects actual-objects
 '
 
 test_expect_success '--geometric -l disables writing bitmaps with non-local packfiles' '
@@ -428,7 +427,7 @@ test_expect_success '--geometric -l disables writing bitmaps with non-local pack
 	git clone --shared shared member &&
 	test_commit_bulk -C member --start=2 1 &&
 
-	# When performing a geometric repack with `-l` while connecting to an
+	# When performing a geometric repack with `-l` while connected to an
 	# alternate object database that has a packfile we do not have full
 	# coverage of objects. As a result, we expect that writing the bitmap
 	# will be disabled.
@@ -437,13 +436,13 @@ test_expect_success '--geometric -l disables writing bitmaps with non-local pack
 	warning: disabling bitmap writing, as some objects are not being packed
 	EOF
 	test_cmp expect err &&
-	test ! -f member/.git/objects/pack/multi-pack-index-*.bitmap &&
+	test_path_is_missing member/.git/objects/pack/multi-pack-index-*.bitmap &&
 
 	# On the other hand, when we repack without `-l`, we should see that
 	# the bitmap gets created.
 	git -C member repack --geometric=2 --write-midx --write-bitmap-index 2>err &&
 	test_must_be_empty err &&
-	test -f member/.git/objects/pack/multi-pack-index-*.bitmap
+	test_path_is_file member/.git/objects/pack/multi-pack-index-*.bitmap
 '
 
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
@ 2023-04-14  6:01   ` Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2557 bytes --]

When asked to write a multi-pack-index the user can specify a preferred
pack that is used as a tie breaker when multiple packs contain the same
objects. When this packfile cannot be found, we just pick the first pack
that is getting tracked by the newly written multi-pack-index as a
fallback.

Picking the fallback can fail in the case where we're asked to write a
multi-pack-index with no packfiles at all: picking the fallback value
will cause a segfault as we blindly index into the array of packfiles,
which would be empty.

Fix this bug by resetting the preferred packfile index to `-1` before
searching for the preferred pack. This fixes the segfault as we already
check for whether the index is `> - 1`. If it is not, we simply don't
pick a preferred packfile at all.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 midx.c                      |  6 +++---
 t/t5319-multi-pack-index.sh | 12 ++++++++++++
 2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/midx.c b/midx.c
index 47989f7ea7..67eb617591 100644
--- a/midx.c
+++ b/midx.c
@@ -1328,17 +1328,17 @@ static int write_midx_internal(const char *object_dir,
 	}
 
 	if (preferred_pack_name) {
-		int found = 0;
+		ctx.preferred_pack_idx = -1;
+
 		for (i = 0; i < ctx.nr; i++) {
 			if (!cmp_idx_or_pack_name(preferred_pack_name,
 						  ctx.info[i].pack_name)) {
 				ctx.preferred_pack_idx = i;
-				found = 1;
 				break;
 			}
 		}
 
-		if (!found)
+		if (ctx.preferred_pack_idx == -1)
 			warning(_("unknown preferred pack: '%s'"),
 				preferred_pack_name);
 	} else if (ctx.nr &&
diff --git a/t/t5319-multi-pack-index.sh b/t/t5319-multi-pack-index.sh
index 499d5d4c78..0883c7c6bd 100755
--- a/t/t5319-multi-pack-index.sh
+++ b/t/t5319-multi-pack-index.sh
@@ -183,6 +183,18 @@ test_expect_success 'write midx with --stdin-packs' '
 
 compare_results_with_midx "mixed mode (one pack + extra)"
 
+test_expect_success 'write with no objects and preferred pack' '
+	test_when_finished "rm -rf empty" &&
+	git init empty &&
+	test_must_fail git -C empty multi-pack-index write \
+		--stdin-packs --preferred-pack=does-not-exist </dev/null 2>err &&
+	cat >expect <<-EOF &&
+	warning: unknown preferred pack: ${SQ}does-not-exist${SQ}
+	error: no pack files to index.
+	EOF
+	test_cmp expect err
+'
+
 test_expect_success 'write progress off for redirected stderr' '
 	git multi-pack-index --object-dir=$objdir write 2>err &&
 	test_line_count = 0 err
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
@ 2023-04-14  6:01   ` Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4825 bytes --]

When doing a geometric repack with multi-pack-indices, then we ask
git-multi-pack-index(1) to use the largest packfile as the preferred
pack. It can happen though that the largest packfile is not part of the
main object database, but instead part of an alternate object database.
The result is that git-multi-pack-index(1) will not be able to find the
preferred pack and print a warning. It then falls back to use the first
packfile that the multi-pack-index shall reference.

Fix this bug by only considering packfiles as preferred pack that are
local. This is the right thing to do given that a multi-pack-index
should never reference packfiles borrowed from an alternate.

While at it, rename the function `get_largest_active_packfile()` to
`get_preferred_pack()` to better document its intent.

Helped-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 26 +++++++++++++++++++++-----
 t/t7703-repack-geometric.sh | 32 ++++++++++++++++++++++++++++++++
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 87f73c8923..687419776d 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -446,8 +446,10 @@ static void split_pack_geometry(struct pack_geometry *geometry, int factor)
 	geometry->split = split;
 }
 
-static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry)
+static struct packed_git *get_preferred_pack(struct pack_geometry *geometry)
 {
+	uint32_t i;
+
 	if (!geometry) {
 		/*
 		 * No geometry means either an all-into-one repack (in which
@@ -462,7 +464,21 @@ static struct packed_git *get_largest_active_pack(struct pack_geometry *geometry
 	}
 	if (geometry->split == geometry->pack_nr)
 		return NULL;
-	return geometry->pack[geometry->pack_nr - 1];
+
+	/*
+	 * The preferred pack is the largest pack above the split line. In
+	 * other words, it is the largest pack that does not get rolled up in
+	 * the geometric repack.
+	 */
+	for (i = geometry->pack_nr; i > geometry->split; i--)
+		/*
+		 * A pack that is not local would never be included in a
+		 * multi-pack index. We thus skip over any non-local packs.
+		 */
+		if (geometry->pack[i - 1]->pack_local)
+			return geometry->pack[i - 1];
+
+	return NULL;
 }
 
 static void clear_pack_geometry(struct pack_geometry *geometry)
@@ -589,7 +605,7 @@ static int write_midx_included_packs(struct string_list *include,
 {
 	struct child_process cmd = CHILD_PROCESS_INIT;
 	struct string_list_item *item;
-	struct packed_git *largest = get_largest_active_pack(geometry);
+	struct packed_git *preferred = get_preferred_pack(geometry);
 	FILE *in;
 	int ret;
 
@@ -610,9 +626,9 @@ static int write_midx_included_packs(struct string_list *include,
 	if (write_bitmaps)
 		strvec_push(&cmd.args, "--bitmap");
 
-	if (largest)
+	if (preferred)
 		strvec_pushf(&cmd.args, "--preferred-pack=%s",
-			     pack_basename(largest));
+			     pack_basename(preferred));
 
 	if (refs_snapshot)
 		strvec_pushf(&cmd.args, "--refs-snapshot=%s", refs_snapshot);
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 8821fbd2dd..4abc7d4c55 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -281,4 +281,36 @@ test_expect_success '--geometric with pack.packSizeLimit' '
 	)
 '
 
+test_expect_success '--geometric --write-midx with packfiles in main and alternate ODB' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a shared repository that will serve as the alternate object
+	# database for the member linked to it. It has got some objects on its
+	# own that are packed into a single packfile.
+	git init shared &&
+	test_commit -C shared common-object &&
+	git -C shared repack -ad &&
+
+	# We create member so that its alternates file points to the shared
+	# repository. We then create a commit in it so that git-repack(1) has
+	# something to repack.
+	# of the shared object database.
+	git clone --shared shared member &&
+	test_commit -C member unique-object &&
+	git -C member repack --geometric=2 --write-midx 2>err &&
+	test_must_be_empty err &&
+
+	# We should see that a new packfile was generated.
+	find shared/.git/objects/pack -type f -name "*.pack" >packs &&
+	test_line_count = 1 packs &&
+
+	# We should also see a multi-pack-index. This multi-pack-index should
+	# never refer to any packfiles in the alternate object database.
+	test_path_is_file member/.git/objects/pack/multi-pack-index &&
+	test-tool read-midx member/.git/objects >packs.midx &&
+	grep "^pack-.*\.idx$" packs.midx | sort >actual &&
+	basename member/.git/objects/pack/pack-*.idx >expect &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
@ 2023-04-14  6:01   ` Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
                     ` (7 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 3130 bytes --]

When writing the multi-pack-index with geometric repacking we will add
all packfiles to the index that are part of the geometric sequence. This
can potentially also include packfiles borrowed from an alternate object
directory. But given that a multi-pack-index can only ever include packs
that are part of the main object database this does not make much sense
whatsoever.

In the edge case where all packfiles are contained in the alternate
object database and the local repository has none itself this bug can
cause us to invoke git-multi-pack-index(1) with only non-local packfiles
that it ultimately cannot find. This causes it to return an error and
thus causes the geometric repack to fail.

Fix the code to skip non-local packfiles.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 11 +++++++++++
 t/t7703-repack-geometric.sh | 23 +++++++++++++++++++++++
 2 files changed, 34 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 687419776d..80fc860613 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -572,6 +572,17 @@ static void midx_included_packs(struct string_list *include,
 		for (i = geometry->split; i < geometry->pack_nr; i++) {
 			struct packed_git *p = geometry->pack[i];
 
+			/*
+			 * The multi-pack index never refers to packfiles part
+			 * of an alternate object database, so we skip these.
+			 * While git-multi-pack-index(1) would silently ignore
+			 * them anyway, this allows us to skip executing the
+			 * command completely when we have only non-local
+			 * packfiles.
+			 */
+			if (!p->pack_local)
+				continue;
+
 			strbuf_addstr(&buf, pack_basename(p));
 			strbuf_strip_suffix(&buf, ".pack");
 			strbuf_addstr(&buf, ".idx");
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 4abc7d4c55..9dd002437f 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -313,4 +313,27 @@ test_expect_success '--geometric --write-midx with packfiles in main and alterna
 	test_cmp expect actual
 '
 
+test_expect_success '--geometric --with-midx with no local objects' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a repository with a single packfile that acts as alternate
+	# object database.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+
+	# Create a second repository linked to the first one and perform a
+	# geometric repack on it.
+	git clone --shared shared member &&
+	git -C member repack --geometric 2 --write-midx 2>err &&
+	test_must_be_empty err &&
+
+	# Assert that we wrote neither a new packfile nor a multi-pack-index.
+	# We should not have a packfile because the single packfile in the
+	# alternate object database does not invalidate the geometric sequence.
+	# And we should not have a multi-pack-index because these only index
+	# local packfiles, and there are none.
+	test_dir_is_empty member/$packdir
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (2 preceding siblings ...)
  2023-04-14  6:01   ` [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
@ 2023-04-14  6:01   ` Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
                     ` (6 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 8776 bytes --]

The test suite for git-pack-objects(1) is quite huge, and we're about to
add more tests that relate to the `--stdin-packs` option. Split out all
tests related to this option into a standalone file so that it becomes
easier to test the feature in isolation.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5300-pack-object.sh        | 135 -------------------------------
 t/t5331-pack-objects-stdin.sh | 145 ++++++++++++++++++++++++++++++++++
 2 files changed, 145 insertions(+), 135 deletions(-)
 create mode 100755 t/t5331-pack-objects-stdin.sh

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index f8a0f309e2..d2ce236d61 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -589,141 +589,6 @@ test_expect_success 'prefetch objects' '
 	test_line_count = 1 donelines
 '
 
-test_expect_success 'setup for --stdin-packs tests' '
-	git init stdin-packs &&
-	(
-		cd stdin-packs &&
-
-		test_commit A &&
-		test_commit B &&
-		test_commit C &&
-
-		for id in A B C
-		do
-			git pack-objects .git/objects/pack/pack-$id \
-				--incremental --revs <<-EOF || exit 1
-			refs/tags/$id
-			EOF
-		done &&
-
-		ls -la .git/objects/pack
-	)
-'
-
-test_expect_success '--stdin-packs with excluded packs' '
-	(
-		cd stdin-packs &&
-
-		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
-		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
-		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
-
-		git pack-objects test --stdin-packs <<-EOF &&
-		$PACK_A
-		^$PACK_B
-		$PACK_C
-		EOF
-
-		(
-			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-C-*.idx)
-		) >expect.raw &&
-		git show-index <$(ls test-*.idx) >actual.raw &&
-
-		cut -d" " -f2 <expect.raw | sort >expect &&
-		cut -d" " -f2 <actual.raw | sort >actual &&
-		test_cmp expect actual
-	)
-'
-
-test_expect_success '--stdin-packs is incompatible with --filter' '
-	(
-		cd stdin-packs &&
-		test_must_fail git pack-objects --stdin-packs --stdout \
-			--filter=blob:none </dev/null 2>err &&
-		test_i18ngrep "cannot use --filter with --stdin-packs" err
-	)
-'
-
-test_expect_success '--stdin-packs is incompatible with --revs' '
-	(
-		cd stdin-packs &&
-		test_must_fail git pack-objects --stdin-packs --revs out \
-			</dev/null 2>err &&
-		test_i18ngrep "cannot use internal rev list with --stdin-packs" err
-	)
-'
-
-test_expect_success '--stdin-packs with loose objects' '
-	(
-		cd stdin-packs &&
-
-		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
-		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
-		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
-
-		test_commit D && # loose
-
-		git pack-objects test2 --stdin-packs --unpacked <<-EOF &&
-		$PACK_A
-		^$PACK_B
-		$PACK_C
-		EOF
-
-		(
-			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
-			git rev-list --objects --no-object-names \
-				refs/tags/C..refs/tags/D
-
-		) >expect.raw &&
-		ls -la . &&
-		git show-index <$(ls test2-*.idx) >actual.raw &&
-
-		cut -d" " -f2 <expect.raw | sort >expect &&
-		cut -d" " -f2 <actual.raw | sort >actual &&
-		test_cmp expect actual
-	)
-'
-
-test_expect_success '--stdin-packs with broken links' '
-	(
-		cd stdin-packs &&
-
-		# make an unreachable object with a bogus parent
-		git cat-file -p HEAD >commit &&
-		sed "s/$(git rev-parse HEAD^)/$(test_oid zero)/" <commit |
-		git hash-object -w -t commit --stdin >in &&
-
-		git pack-objects .git/objects/pack/pack-D <in &&
-
-		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
-		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
-		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
-		PACK_D="$(basename .git/objects/pack/pack-D-*.pack)" &&
-
-		git pack-objects test3 --stdin-packs --unpacked <<-EOF &&
-		$PACK_A
-		^$PACK_B
-		$PACK_C
-		$PACK_D
-		EOF
-
-		(
-			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
-			git show-index <$(ls .git/objects/pack/pack-D-*.idx) &&
-			git rev-list --objects --no-object-names \
-				refs/tags/C..refs/tags/D
-		) >expect.raw &&
-		git show-index <$(ls test3-*.idx) >actual.raw &&
-
-		cut -d" " -f2 <expect.raw | sort >expect &&
-		cut -d" " -f2 <actual.raw | sort >actual &&
-		test_cmp expect actual
-	)
-'
-
 test_expect_success 'negative window clamps to 0' '
 	git pack-objects --progress --window=-1 neg-window <obj-list 2>stderr &&
 	check_deltas stderr = 0
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
new file mode 100755
index 0000000000..d5eece5899
--- /dev/null
+++ b/t/t5331-pack-objects-stdin.sh
@@ -0,0 +1,145 @@
+#!/bin/sh
+
+test_description='pack-objects --stdin'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+
+TEST_PASSES_SANITIZE_LEAK=true
+. ./test-lib.sh
+
+test_expect_success 'setup for --stdin-packs tests' '
+	git init stdin-packs &&
+	(
+		cd stdin-packs &&
+
+		test_commit A &&
+		test_commit B &&
+		test_commit C &&
+
+		for id in A B C
+		do
+			git pack-objects .git/objects/pack/pack-$id \
+				--incremental --revs <<-EOF || exit 1
+			refs/tags/$id
+			EOF
+		done &&
+
+		ls -la .git/objects/pack
+	)
+'
+
+test_expect_success '--stdin-packs with excluded packs' '
+	(
+		cd stdin-packs &&
+
+		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
+		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
+		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
+
+		git pack-objects test --stdin-packs <<-EOF &&
+		$PACK_A
+		^$PACK_B
+		$PACK_C
+		EOF
+
+		(
+			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-C-*.idx)
+		) >expect.raw &&
+		git show-index <$(ls test-*.idx) >actual.raw &&
+
+		cut -d" " -f2 <expect.raw | sort >expect &&
+		cut -d" " -f2 <actual.raw | sort >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--stdin-packs is incompatible with --filter' '
+	(
+		cd stdin-packs &&
+		test_must_fail git pack-objects --stdin-packs --stdout \
+			--filter=blob:none </dev/null 2>err &&
+		test_i18ngrep "cannot use --filter with --stdin-packs" err
+	)
+'
+
+test_expect_success '--stdin-packs is incompatible with --revs' '
+	(
+		cd stdin-packs &&
+		test_must_fail git pack-objects --stdin-packs --revs out \
+			</dev/null 2>err &&
+		test_i18ngrep "cannot use internal rev list with --stdin-packs" err
+	)
+'
+
+test_expect_success '--stdin-packs with loose objects' '
+	(
+		cd stdin-packs &&
+
+		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
+		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
+		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
+
+		test_commit D && # loose
+
+		git pack-objects test2 --stdin-packs --unpacked <<-EOF &&
+		$PACK_A
+		^$PACK_B
+		$PACK_C
+		EOF
+
+		(
+			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
+			git rev-list --objects --no-object-names \
+				refs/tags/C..refs/tags/D
+
+		) >expect.raw &&
+		ls -la . &&
+		git show-index <$(ls test2-*.idx) >actual.raw &&
+
+		cut -d" " -f2 <expect.raw | sort >expect &&
+		cut -d" " -f2 <actual.raw | sort >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_expect_success '--stdin-packs with broken links' '
+	(
+		cd stdin-packs &&
+
+		# make an unreachable object with a bogus parent
+		git cat-file -p HEAD >commit &&
+		sed "s/$(git rev-parse HEAD^)/$(test_oid zero)/" <commit |
+		git hash-object -w -t commit --stdin >in &&
+
+		git pack-objects .git/objects/pack/pack-D <in &&
+
+		PACK_A="$(basename .git/objects/pack/pack-A-*.pack)" &&
+		PACK_B="$(basename .git/objects/pack/pack-B-*.pack)" &&
+		PACK_C="$(basename .git/objects/pack/pack-C-*.pack)" &&
+		PACK_D="$(basename .git/objects/pack/pack-D-*.pack)" &&
+
+		git pack-objects test3 --stdin-packs --unpacked <<-EOF &&
+		$PACK_A
+		^$PACK_B
+		$PACK_C
+		$PACK_D
+		EOF
+
+		(
+			git show-index <$(ls .git/objects/pack/pack-A-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-C-*.idx) &&
+			git show-index <$(ls .git/objects/pack/pack-D-*.idx) &&
+			git rev-list --objects --no-object-names \
+				refs/tags/C..refs/tags/D
+		) >expect.raw &&
+		git show-index <$(ls test3-*.idx) >actual.raw &&
+
+		cut -d" " -f2 <expect.raw | sort >expect &&
+		cut -d" " -f2 <actual.raw | sort >actual &&
+		test_cmp expect actual
+	)
+'
+
+test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 05/10] pack-objects: fix error when packing same pack twice
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (3 preceding siblings ...)
  2023-04-14  6:01   ` [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
@ 2023-04-14  6:01   ` Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
                     ` (5 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 4287 bytes --]

When passed the same packfile twice via `--stdin-packs` we return an
error that the packfile supposedly was not found. This is because when
reading packs into the list of included or excluded packfiles, we will
happily re-add packfiles even if they are part of the lists already. And
while the list can now contain duplicates, we will only set the `util`
pointer of the first list entry to the `packed_git` structure. We notice
that at a later point when checking that all list entries have their
`util` pointer set and die with an error.

While this is kind of a nonsensical request, this scenario can be hit
when doing geometric repacks. When a repository is connected to an
alternate object directory and both have the exact same packfile then
both would get added to the geometric sequence. And when we then decide
to perform the repack, we will invoke git-pack-objects(1) with the same
packfile twice.

Fix this bug by removing any duplicates from both the included and
excluded packs.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/pack-objects.c        |  2 ++
 t/t5331-pack-objects-stdin.sh | 27 +++++++++++++++++++++++++++
 t/t7703-repack-geometric.sh   | 25 +++++++++++++++++++++++++
 3 files changed, 54 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7e9e20172a..7d0e864c35 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3348,7 +3348,9 @@ static void read_packs_list_from_stdin(void)
 	}
 
 	string_list_sort(&include_packs);
+	string_list_remove_duplicates(&include_packs, 0);
 	string_list_sort(&exclude_packs);
+	string_list_remove_duplicates(&exclude_packs, 0);
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
 		const char *pack_name = pack_basename(p);
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index d5eece5899..71c8a4a635 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -7,6 +7,12 @@ export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
+packed_objects () {
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list | sort &&
+	rm tmp-object-list
+ }
+
 test_expect_success 'setup for --stdin-packs tests' '
 	git init stdin-packs &&
 	(
@@ -142,4 +148,25 @@ test_expect_success '--stdin-packs with broken links' '
 	)
 '
 
+test_expect_success 'pack-objects --stdin with duplicate packfile' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		test_commit "commit" &&
+		git repack -ad &&
+
+		{
+			basename .git/objects/pack/pack-*.pack &&
+			basename .git/objects/pack/pack-*.pack
+		} >packfiles &&
+
+		git pack-objects --stdin-packs generated-pack <packfiles &&
+		packed_objects .git/objects/pack/pack-*.idx >expect &&
+		packed_objects generated-pack-*.idx >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index 9dd002437f..a358dfb7bd 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -336,4 +336,29 @@ test_expect_success '--geometric --with-midx with no local objects' '
 	test_dir_is_empty member/$packdir
 '
 
+test_expect_success '--geometric with same pack in main and alternate ODB' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Create a repository with a single packfile that acts as alternate
+	# object database.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+
+	# We create the member repository as an exact copy so that it has the
+	# same packfile.
+	cp -r shared member &&
+	test-tool path-utils real_path shared/.git/objects >member/.git/objects/info/alternates &&
+	find shared/.git/objects -type f >expected-files &&
+
+	# Verify that we can repack objects as expected without observing any
+	# error. Having the same packfile in both ODBs used to cause an error
+	# in git-pack-objects(1).
+	git -C member repack --geometric 2 2>err &&
+	test_must_be_empty err &&
+	# Nothing should have changed.
+	find shared/.git/objects -type f >actual-files &&
+	test_cmp expected-files actual-files
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (4 preceding siblings ...)
  2023-04-14  6:01   ` [PATCH v4 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
@ 2023-04-14  6:01   ` Patrick Steinhardt
  2023-04-14  6:01   ` [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
                     ` (4 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2202 bytes --]

When passing the same packfile both as included and excluded via the
`--stdin-packs` option, then we will return an error because the
excluded packfile cannot be found. This is because we will only set the
`util` pointer for the included packfile list if it was found, so that
we later die when we notice that it's in fact not set for the excluded
packfile list.

Fix this bug by always setting the `util` pointer for both the included
and excluded list entries.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/pack-objects.c        |  8 +++-----
 t/t5331-pack-objects-stdin.sh | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7d0e864c35..c5147d392f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3355,11 +3355,9 @@ static void read_packs_list_from_stdin(void)
 	for (p = get_all_packs(the_repository); p; p = p->next) {
 		const char *pack_name = pack_basename(p);
 
-		item = string_list_lookup(&include_packs, pack_name);
-		if (!item)
-			item = string_list_lookup(&exclude_packs, pack_name);
-
-		if (item)
+		if ((item = string_list_lookup(&include_packs, pack_name)))
+			item->util = p;
+		if ((item = string_list_lookup(&exclude_packs, pack_name)))
 			item->util = p;
 	}
 
diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 71c8a4a635..3ef736ec05 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -169,4 +169,24 @@ test_expect_success 'pack-objects --stdin with duplicate packfile' '
 	)
 '
 
+test_expect_success 'pack-objects --stdin with same packfile excluded and included' '
+	test_when_finished "rm -fr repo" &&
+
+	git init repo &&
+	(
+		cd repo &&
+		test_commit "commit" &&
+		git repack -ad &&
+
+		{
+			basename .git/objects/pack/pack-*.pack &&
+			printf "^%s\n" "$(basename .git/objects/pack/pack-*.pack)"
+		} >packfiles &&
+
+		git pack-objects --stdin-packs generated-pack <packfiles &&
+		packed_objects generated-pack-*.idx >packed-objects &&
+		test_must_be_empty packed-objects
+	)
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (5 preceding siblings ...)
  2023-04-14  6:01   ` [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
@ 2023-04-14  6:01   ` Patrick Steinhardt
  2023-04-14  6:02   ` [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
                     ` (3 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:01 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 2584 bytes --]

We don't have any tests that verify that git-pack-objects(1) works with
`--stdin-packs` when combined with alternate object directories. Add
some to make sure that the basic functionality works as expected.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/t5331-pack-objects-stdin.sh | 48 +++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t5331-pack-objects-stdin.sh b/t/t5331-pack-objects-stdin.sh
index 3ef736ec05..acab31667a 100755
--- a/t/t5331-pack-objects-stdin.sh
+++ b/t/t5331-pack-objects-stdin.sh
@@ -189,4 +189,52 @@ test_expect_success 'pack-objects --stdin with same packfile excluded and includ
 	)
 '
 
+test_expect_success 'pack-objects --stdin with packfiles from alternate object database' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Set up a shared repository with a single packfile.
+	git init shared &&
+	test_commit -C shared "shared-objects" &&
+	git -C shared repack -ad &&
+	basename shared/.git/objects/pack/pack-*.pack >packfile &&
+
+	# Set up a repository that is connected to the shared repository. This
+	# repository has no objects on its own, but we still expect to be able
+	# to pack objects from its alternate.
+	git clone --shared shared member &&
+	git -C member pack-objects --stdin-packs generated-pack <packfile &&
+	test_cmp shared/.git/objects/pack/pack-*.pack member/generated-pack-*.pack
+'
+
+test_expect_success 'pack-objects --stdin with packfiles from main and alternate object database' '
+	test_when_finished "rm -fr shared member" &&
+
+	# Set up a shared repository with a single packfile.
+	git init shared &&
+	test_commit -C shared "shared-commit" &&
+	git -C shared repack -ad &&
+
+	# Set up a repository that is connected to the shared repository. This
+	# repository has a second packfile so that we can verify that it is
+	# possible to write packs that include packfiles from different object
+	# databases.
+	git clone --shared shared member &&
+	test_commit -C member "local-commit" &&
+	git -C member repack -dl &&
+
+	{
+		basename shared/.git/objects/pack/pack-*.pack &&
+		basename member/.git/objects/pack/pack-*.pack
+	} >packfiles &&
+
+	{
+		packed_objects shared/.git/objects/pack/pack-*.idx &&
+		packed_objects member/.git/objects/pack/pack-*.idx
+	} | sort >expected-objects &&
+
+	git -C member pack-objects --stdin-packs generated-pack <packfiles &&
+	packed_objects member/generated-pack-*.idx >actual-objects &&
+	test_cmp expected-objects actual-objects
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (6 preceding siblings ...)
  2023-04-14  6:01   ` [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
@ 2023-04-14  6:02   ` Patrick Steinhardt
  2023-04-14  6:02   ` [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
                     ` (2 subsequent siblings)
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 1059 bytes --]

The `test-tool chmtime` helper allows us to both read and modify the
modification time of files. But while it is possible to only read the
mtimes of a file via `--get`, it is not possible to read the mtimes
and report them together with their respective file paths via the
`--verbose` flag without also modifying the mtime at the same time.

Fix this so that it is possible to call `test-tool chmtime --verbose
<files>...` without modifying any mtimes.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 t/helper/test-chmtime.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/helper/test-chmtime.c b/t/helper/test-chmtime.c
index dc28890a18..0e5538833a 100644
--- a/t/helper/test-chmtime.c
+++ b/t/helper/test-chmtime.c
@@ -94,7 +94,7 @@ int cmd__chmtime(int argc, const char **argv)
 	if (timespec_arg(argv[i], &set_time, &set_eq)) {
 		++i;
 	} else {
-		if (get == 0) {
+		if (get == 0 && verbose == 0) {
 			fprintf(stderr, "Not a base-10 integer: %s\n", argv[i] + 1);
 			goto usage;
 		}
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (7 preceding siblings ...)
  2023-04-14  6:02   ` [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
@ 2023-04-14  6:02   ` Patrick Steinhardt
  2023-04-14  6:02   ` [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
  2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 5682 bytes --]

When the user passes `-l` to git-repack(1), then they essentially ask us
to only repack objects part of the local object database while ignoring
any packfiles part of an alternate object database. And we in fact honor
this bit when doing a geometric repack as the resulting packfile will
only ever contain local objects.

What we're missing though is that we don't take locality of packfiles
into account when computing whether the geometric sequence is intact or
not. So even though we would only ever roll up local packfiles anyway,
we could end up trying to repack because of non-local packfiles. This
does not make much sense, and in the worst case it can cause us to try
and do the geometric repack over and over again because we're never able
to restore the geometric sequence.

Fix this bug by honoring whether the user has passed `-l`. If so, we
skip adding any non-local packfiles to the pack geometry.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 13 +++++++--
 t/t7703-repack-geometric.sh | 57 +++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 80fc860613..5768aeb1f3 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -323,7 +323,8 @@ static int geometry_cmp(const void *va, const void *vb)
 }
 
 static void init_pack_geometry(struct pack_geometry **geometry_p,
-			       struct string_list *existing_kept_packs)
+			       struct string_list *existing_kept_packs,
+			       const struct pack_objects_args *args)
 {
 	struct packed_git *p;
 	struct pack_geometry *geometry;
@@ -333,6 +334,14 @@ static void init_pack_geometry(struct pack_geometry **geometry_p,
 	geometry = *geometry_p;
 
 	for (p = get_all_packs(the_repository); p; p = p->next) {
+		if (args->local && !p->pack_local)
+			/*
+			 * When asked to only repack local packfiles we skip
+			 * over any packfiles that are borrowed from alternate
+			 * object directories.
+			 */
+			continue;
+
 		if (!pack_kept_objects) {
 			/*
 			 * Any pack that has its pack_keep bit set will appear
@@ -900,7 +909,7 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (geometric_factor) {
 		if (pack_everything)
 			die(_("options '%s' and '%s' cannot be used together"), "--geometric", "-A/-a");
-		init_pack_geometry(&geometry, &existing_kept_packs);
+		init_pack_geometry(&geometry, &existing_kept_packs, &po_args);
 		split_pack_geometry(geometry, geometric_factor);
 	}
 
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index a358dfb7bd..be8fe78448 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -10,6 +10,12 @@ objdir=.git/objects
 packdir=$objdir/pack
 midx=$objdir/pack/multi-pack-index
 
+packed_objects () {
+	git show-index <"$1" >tmp-object-list &&
+	cut -d' ' -f2 tmp-object-list | sort &&
+	rm tmp-object-list
+ }
+
 test_expect_success '--geometric with no packs' '
 	git init geometric &&
 	test_when_finished "rm -fr geometric" &&
@@ -361,4 +367,55 @@ test_expect_success '--geometric with same pack in main and alternate ODB' '
 	test_cmp expected-files actual-files
 '
 
+test_expect_success '--geometric -l with non-intact geometric sequence across ODBs' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# Verify that our assumptions actually hold: both generated packfiles
+	# should have three objects and should be non-equal.
+	packed_objects shared/.git/objects/pack/pack-*.idx >shared-objects &&
+	packed_objects member/.git/objects/pack/pack-*.idx >member-objects &&
+	test_line_count = 3 shared-objects &&
+	test_line_count = 3 member-objects &&
+	! test_cmp shared-objects member-objects &&
+
+	# Perform the geometric repack. With `-l`, we should only see the local
+	# packfile and thus arrive at the conclusion that the geometric
+	# sequence is intact. We thus expect no changes.
+	#
+	# Note that we are tweaking mtimes of the packfiles so that we can
+	# verify they did not change. This is done in order to detect the case
+	# where we do repack objects, but the resulting packfile is the same.
+	test-tool chmtime --verbose =0 member/.git/objects/pack/* >expected-member-packs &&
+	git -C member repack --geometric=2 -l -d &&
+	test-tool chmtime --verbose member/.git/objects/pack/* >actual-member-packs &&
+	test_cmp expected-member-packs actual-member-packs &&
+
+	{
+		packed_objects shared/.git/objects/pack/pack-*.idx &&
+		packed_objects member/.git/objects/pack/pack-*.idx
+	} | sort >expected-objects &&
+
+	# On the other hand, when doing a non-local geometric repack we should
+	# see both packfiles and thus repack them. We expect that the shared
+	# object database was not changed.
+	test-tool chmtime --verbose =0 shared/.git/objects/pack/* >expected-shared-packs &&
+	git -C member repack --geometric=2 -d &&
+	test-tool chmtime --verbose shared/.git/objects/pack/* >actual-shared-packs &&
+	test_cmp expected-shared-packs actual-shared-packs &&
+
+	# Furthermore, we expect that the member repository now has a single
+	# packfile that contains the combined shared and non-shared objects.
+	ls member/.git/objects/pack/pack-*.idx >actual &&
+	test_line_count = 1 actual &&
+	packed_objects member/.git/objects/pack/pack-*.idx >actual-objects &&
+	test_line_count = 6 actual-objects &&
+	test_cmp expected-objects actual-objects
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (8 preceding siblings ...)
  2023-04-14  6:02   ` [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
@ 2023-04-14  6:02   ` Patrick Steinhardt
  2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
  10 siblings, 0 replies; 67+ messages in thread
From: Patrick Steinhardt @ 2023-04-14  6:02 UTC (permalink / raw)
  To: git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

[-- Attachment #1: Type: text/plain, Size: 6746 bytes --]

In order to write a bitmap, we need to have full coverage of all objects
that are about to be packed. In the traditional non-multi-pack-index
world this meant we need to do a full repack of all objects into a
single packfile. But in the new multi-pack-index world we can get away
with writing bitmaps when we have multiple packfiles as long as the
multi-pack-index covers all objects.

This is not always the case though. When asked to perform a repack of
local objects, only, then we cannot guarantee to have full coverage of
all objects regardless of whether we do a full repack or a repack with a
multi-pack-index. The end result is that writing the bitmap will fail in
both worlds:

    $ git multi-pack-index write --stdin-packs --bitmap <packfiles
    warning: Failed to write bitmap index. Packfile doesn't have full closure (object 1529341d78cf45377407369acb0f4ff2b5cdae42 is missing)
    error: could not write multi-pack bitmap

Now there are two different ways to fix this. The first one would be to
amend git-multi-pack-index(1) to disable writing bitmaps when we notice
that we don't have full object coverage.

    - We don't have enough information in git-multi-pack-index(1) in
      order to tell whether the local repository _should_ have full
      coverage. Because even when connected to an alternate object
      directory, it may be the case that we still have all objects
      around in the main object database.

    - git-multi-pack-index(1) is quite a low-level tool. Automatically
      disabling functionality that it was asked to provide does not feel
      like the right thing to do.

We can easily fix it at a higher level in git-repack(1) though. When
asked to only include local objects via `-l` and when connected to an
alternate object directory then we will override the user's ask and
disable writing bitmaps with a warning. This is similar to what we do in
git-pack-objects(1), where we also disable writing bitmaps in case we
omit an object from the pack.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/repack.c            | 12 ++++++++++++
 object-file.c               |  6 ++++++
 object-store.h              |  1 +
 t/t7700-repack.sh           | 17 +++++++++++++++++
 t/t7703-repack-geometric.sh | 27 +++++++++++++++++++++++++++
 5 files changed, 63 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index 5768aeb1f3..e2abcea2ee 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -887,6 +887,18 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 	if (write_bitmaps && !(pack_everything & ALL_INTO_ONE) && !write_midx)
 		die(_(incremental_bitmap_conflict_error));
 
+	if (write_bitmaps && po_args.local && has_alt_odb(the_repository)) {
+		/*
+		 * When asked to do a local repack, but we have
+		 * packfiles that are inherited from an alternate, then
+		 * we cannot guarantee that the multi-pack-index would
+		 * have full coverage of all objects. We thus disable
+		 * writing bitmaps in that case.
+		 */
+		warning(_("disabling bitmap writing, as some objects are not being packed"));
+		write_bitmaps = 0;
+	}
+
 	if (write_midx && write_bitmaps) {
 		struct strbuf path = STRBUF_INIT;
 
diff --git a/object-file.c b/object-file.c
index 8fab8dbe80..3ad11c683b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -946,6 +946,12 @@ void prepare_alt_odb(struct repository *r)
 	r->objects->loaded_alternates = 1;
 }
 
+int has_alt_odb(struct repository *r)
+{
+	prepare_alt_odb(r);
+	return !!r->objects->odb->next;
+}
+
 /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
 static int freshen_file(const char *fn)
 {
diff --git a/object-store.h b/object-store.h
index 82201ec3e7..574f16140f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -56,6 +56,7 @@ KHASH_INIT(odb_path_map, const char * /* key: odb_path */,
 	struct object_directory *, 1, fspathhash, fspatheq)
 
 void prepare_alt_odb(struct repository *r);
+int has_alt_odb(struct repository *r);
 char *compute_alternate_path(const char *path, struct strbuf *err);
 struct object_directory *find_odb(struct repository *r, const char *obj_dir);
 typedef int alt_odb_fn(struct object_directory *, void *);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 4aabe98139..faa739eeb9 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -106,6 +106,23 @@ test_expect_success SYMLINKS '--local keeps packs when alternate is objectdir '
 	test_cmp expect actual
 '
 
+test_expect_success '--local disables writing bitmaps when connected to alternate ODB' '
+	test_when_finished "rm -rf shared member" &&
+
+	git init shared &&
+	git clone --shared shared member &&
+	(
+		cd member &&
+		test_commit "object" &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Adl --write-bitmap-index 2>err &&
+		cat >expect <<-EOF &&
+		warning: disabling bitmap writing, as some objects are not being packed
+		EOF
+		test_cmp expect err &&
+		test_path_is_missing .git/objects/pack-*.bitmap
+	)
+'
+
 test_expect_success 'packed obs in alt ODB are repacked even when local repo is packless' '
 	mkdir alt_objects/pack &&
 	mv .git/objects/pack/* alt_objects/pack &&
diff --git a/t/t7703-repack-geometric.sh b/t/t7703-repack-geometric.sh
index be8fe78448..00f28fb558 100755
--- a/t/t7703-repack-geometric.sh
+++ b/t/t7703-repack-geometric.sh
@@ -418,4 +418,31 @@ test_expect_success '--geometric -l with non-intact geometric sequence across OD
 	test_cmp expected-objects actual-objects
 '
 
+test_expect_success '--geometric -l disables writing bitmaps with non-local packfiles' '
+	test_when_finished "rm -fr shared member" &&
+
+	git init shared &&
+	test_commit_bulk -C shared --start=1 1 &&
+
+	git clone --shared shared member &&
+	test_commit_bulk -C member --start=2 1 &&
+
+	# When performing a geometric repack with `-l` while connected to an
+	# alternate object database that has a packfile we do not have full
+	# coverage of objects. As a result, we expect that writing the bitmap
+	# will be disabled.
+	git -C member repack -l --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	cat >expect <<-EOF &&
+	warning: disabling bitmap writing, as some objects are not being packed
+	EOF
+	test_cmp expect err &&
+	test_path_is_missing member/.git/objects/pack/multi-pack-index-*.bitmap &&
+
+	# On the other hand, when we repack without `-l`, we should see that
+	# the bitmap gets created.
+	git -C member repack --geometric=2 --write-midx --write-bitmap-index 2>err &&
+	test_must_be_empty err &&
+	test_path_is_file member/.git/objects/pack/multi-pack-index-*.bitmap
+'
+
 test_done
-- 
2.40.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply related	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 00/10] repack: fix geometric repacking with gitalternates
  2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
                     ` (9 preceding siblings ...)
  2023-04-14  6:02   ` [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
@ 2023-04-14 13:23   ` Derrick Stolee
  2023-04-14 17:29     ` Junio C Hamano
  10 siblings, 1 reply; 67+ messages in thread
From: Derrick Stolee @ 2023-04-14 13:23 UTC (permalink / raw)
  To: Patrick Steinhardt, git; +Cc: Taylor Blau, peff, dstolee, Junio C Hamano

On 4/14/2023 2:01 AM, Patrick Steinhardt wrote:
> Hi,
> 
> this is the fourth version of my patch series to fix geometric repacking
> with repositories connected to an alternate object database.
> 
> This version only addresses some issues with the tests, the actual logic
> remains untouched:
> 
>     - The test added in t7700-repack.sh that verifies that `--local`
>       causes us to disable generation of the bitmap index was failing in
>       the linux-TEST-vars CI job. This was because it sets
>       GIT_TEST_MULTI_PACK_INDEX=1, which causes us to disable the bitmap
>       logic in git-repack(1). I've fixed this failure by explicitly
>       overriding the environment variable like other tests in the same
>       file do.
> 
>     - I've converted path checks to use `test_path_is_missing` and
>       `test_path_is_file` instead of `test ! -f` and `test -f`.
> 
>     - I've fixed a typo in t7703-repack-geometric.sh and shifted code
>       around a bit to make the test more readable, following Derrick's
>       suggestion.

Thanks for these updates. I'm happy with v4.

-Stolee

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH v4 00/10] repack: fix geometric repacking with gitalternates
  2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
@ 2023-04-14 17:29     ` Junio C Hamano
  0 siblings, 0 replies; 67+ messages in thread
From: Junio C Hamano @ 2023-04-14 17:29 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Patrick Steinhardt, git, Taylor Blau, peff, dstolee

Derrick Stolee <derrickstolee@github.com> writes:

> On 4/14/2023 2:01 AM, Patrick Steinhardt wrote:
>> Hi,
>> 
>> this is the fourth version of my patch series to fix geometric repacking
>> with repositories connected to an alternate object database.
>> 
>> This version only addresses some issues with the tests, the actual logic
>> remains untouched:
>> 
>>     - The test added in t7700-repack.sh that verifies that `--local`
>>       causes us to disable generation of the bitmap index was failing in
>>       the linux-TEST-vars CI job. This was because it sets
>>       GIT_TEST_MULTI_PACK_INDEX=1, which causes us to disable the bitmap
>>       logic in git-repack(1). I've fixed this failure by explicitly
>>       overriding the environment variable like other tests in the same
>>       file do.
>> 
>>     - I've converted path checks to use `test_path_is_missing` and
>>       `test_path_is_file` instead of `test ! -f` and `test -f`.
>> 
>>     - I've fixed a typo in t7703-repack-geometric.sh and shifted code
>>       around a bit to make the test more readable, following Derrick's
>>       suggestion.
>
> Thanks for these updates. I'm happy with v4.

Yup, the changes since the previous round all look definite
improvement, and GIT_TEST_MULTI_PACK_INDEX=1 workaround looks fine,
too.

Thanks, all.  Will queue.  Hopefully we can merge it down to 'next'
soonish.


^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2023-04-14 17:32 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-04 11:08 [PATCH] repack: fix geometric repacking with gitalternates Patrick Steinhardt
2023-04-04 18:55 ` Taylor Blau
2023-04-04 19:00   ` Taylor Blau
2023-04-05  7:08   ` Patrick Steinhardt
2023-04-10 15:06     ` Derrick Stolee
2023-04-10 23:49       ` Taylor Blau
2023-04-11 17:13         ` Patrick Steinhardt
2023-04-11 21:13           ` Taylor Blau
2023-04-12  9:37             ` Patrick Steinhardt
2023-04-11 17:06       ` Patrick Steinhardt
2023-04-11 17:26         ` Patrick Steinhardt
2023-04-11 21:14           ` Taylor Blau
2023-04-10 23:29     ` Taylor Blau
2023-04-12 10:22 ` [PATCH v2 0/8] " Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 1/8] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-12 17:56     ` Taylor Blau
2023-04-13  9:28       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 2/8] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-12 18:37     ` Taylor Blau
2023-04-13  9:31       ` Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 3/8] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-12 20:39     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 4/8] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-12 21:33     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 5/8] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-12 21:52     ` Taylor Blau
2023-04-12 10:22   ` [PATCH v2 6/8] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-12 10:22   ` [PATCH v2 7/8] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-12 23:56     ` Junio C Hamano
2023-04-13  5:11       ` Junio C Hamano
2023-04-13  6:41         ` Patrick Steinhardt
2023-04-12 10:23   ` [PATCH v2 8/8] repack: disable writing bitmaps when doing a local geometric repack Patrick Steinhardt
2023-04-12 22:01     ` Taylor Blau
2023-04-13  9:54       ` Patrick Steinhardt
2023-04-13 10:14         ` Patrick Steinhardt
2023-04-12 22:02   ` [PATCH v2 0/8] repack: fix geometric repacking with gitalternates Taylor Blau
2023-04-13 11:16 ` [PATCH v3 00/10] " Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-13 13:49     ` Derrick Stolee
2023-04-13 11:16   ` [PATCH v3 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-13 11:16   ` [PATCH v3 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-13 13:59     ` Derrick Stolee
2023-04-13 14:13       ` Patrick Steinhardt
2023-04-13 15:40       ` Junio C Hamano
2023-04-13 11:16   ` [PATCH v3 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-13 14:54   ` [PATCH v3 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14  2:03   ` Junio C Hamano
2023-04-14  5:42     ` Patrick Steinhardt
2023-04-14  6:01 ` [PATCH v4 " Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 01/10] midx: fix segfault with no packs and invalid preferred pack Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 02/10] repack: fix trying to use preferred pack in alternates Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 03/10] repack: fix generating multi-pack-index with only non-local packs Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 04/10] pack-objects: split out `--stdin-packs` tests into separate file Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 05/10] pack-objects: fix error when packing same pack twice Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 06/10] pack-objects: fix error when same packfile is included and excluded Patrick Steinhardt
2023-04-14  6:01   ` [PATCH v4 07/10] pack-objects: extend test coverage of `--stdin-packs` with alternates Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 08/10] t/helper: allow chmtime to print verbosely without modifying mtime Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 09/10] repack: honor `-l` when calculating pack geometry Patrick Steinhardt
2023-04-14  6:02   ` [PATCH v4 10/10] repack: disable writing bitmaps when doing a local repack Patrick Steinhardt
2023-04-14 13:23   ` [PATCH v4 00/10] repack: fix geometric repacking with gitalternates Derrick Stolee
2023-04-14 17:29     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).