git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
@ 2022-10-12 19:05 Taylor Blau
  2022-10-17 18:02 ` Jeff King
  2022-10-18  2:45 ` [PATCH v2] " Taylor Blau
  0 siblings, 2 replies; 10+ messages in thread
From: Taylor Blau @ 2022-10-12 19:05 UTC (permalink / raw)
  To: git; +Cc: derrickstolee, vdye, peff, gitster

When we write a MIDX bitmap after repacking, it is possible that the
repository would be left in a state with both pack- and multi-pack
reachability bitmaps.

This can occur, for instance, if a pack that was kept (either by having
a .keep file, or during a geometric repack in which it is not rolled up)
has a bitmap file, and the repack wrote a multi-pack index and bitmap.

When loading a reachability bitmap for the repository, the multi-pack
one is always preferred, so the pack-based one is redundant. Let's
remove it unconditionally, even if '-d' isn't passed, since there is no
practical reason to keep both around. The patch below does just that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 builtin/repack.c  | 40 +++++++++++++++++++++++++++++++++++++---
 t/t7700-repack.sh | 21 +++++++++++++++++++++
 2 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a5bacc7797..59001e8020 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -661,6 +661,35 @@ static int write_midx_included_packs(struct string_list *include,
 	return finish_command(&cmd);
 }
 
+static void remove_redundant_bitmaps(struct string_list *include,
+				     const char *packdir)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct string_list_item *item;
+	size_t packdir_len;
+
+	strbuf_addstr(&path, packdir);
+	strbuf_addch(&path, '/');
+	packdir_len = path.len;
+
+	/*
+	 * Remove any pack bitmaps corresponding to packs which are now
+	 * included in the MIDX.
+	 */
+	for_each_string_list_item(item, include) {
+		strbuf_addstr(&path, item->string);
+		strbuf_strip_suffix(&path, ".idx");
+		strbuf_addstr(&path, ".bitmap");
+
+		if (unlink(path.buf) && errno != ENOENT)
+			die_errno(_("could not remove stale bitmap: %s"),
+				  path.buf);
+
+		strbuf_setlen(&path, packdir_len);
+	}
+	strbuf_release(&path);
+}
+
 static int write_cruft_pack(const struct pack_objects_args *args,
 			    const char *pack_prefix,
 			    struct string_list *names,
@@ -1059,10 +1088,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
 						show_progress, write_bitmaps > 0);
 
+		if (ret) {
+			string_list_clear(&include, 0);
+			return ret;
+		}
+
+		if (write_bitmaps)
+			remove_redundant_bitmaps(&include, packdir);
+
 		string_list_clear(&include, 0);
-
-		if (ret)
-			return ret;
 	}
 
 	reprepare_packed_git(the_repository);
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index ca45c4cd2c..2d0e9448dd 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -426,6 +426,27 @@ test_expect_success '--write-midx -b packs non-kept objects' '
 	)
 '
 
+test_expect_success '--write-midx removes stale pack-based bitmaps' '
+       rm -fr repo &&
+       git init repo &&
+       test_when_finished "rm -fr repo" &&
+       (
+		cd repo &&
+		test_commit base &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ab &&
+
+		pack_bitmap=$(ls $objdir/pack/pack-*.bitmap) &&
+		test_path_is_file "$pack_bitmap" &&
+
+		test_commit tip &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -bm &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $pack_bitmap
+       )
+'
+
 test_expect_success TTY '--quiet disables progress' '
 	test_terminal env GIT_PROGRESS_DELAY=0 \
 		git -C midx repack -ad --quiet --write-midx 2>stderr &&
-- 
2.38.0.16.g393fd4c6db

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-12 19:05 [PATCH] builtin/repack.c: remove redundant pack-based bitmaps Taylor Blau
@ 2022-10-17 18:02 ` Jeff King
  2022-10-17 18:50   ` Ævar Arnfjörð Bjarmason
                     ` (2 more replies)
  2022-10-18  2:45 ` [PATCH v2] " Taylor Blau
  1 sibling, 3 replies; 10+ messages in thread
From: Jeff King @ 2022-10-17 18:02 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, derrickstolee, vdye, gitster

On Wed, Oct 12, 2022 at 03:05:33PM -0400, Taylor Blau wrote:

> When we write a MIDX bitmap after repacking, it is possible that the
> repository would be left in a state with both pack- and multi-pack
> reachability bitmaps.
> 
> This can occur, for instance, if a pack that was kept (either by having
> a .keep file, or during a geometric repack in which it is not rolled up)
> has a bitmap file, and the repack wrote a multi-pack index and bitmap.
> 
> When loading a reachability bitmap for the repository, the multi-pack
> one is always preferred, so the pack-based one is redundant. Let's
> remove it unconditionally, even if '-d' isn't passed, since there is no
> practical reason to keep both around. The patch below does just that.

Yeah, this is certainly a reasonable thing to be doing. I wonder if you
want to share the story of why the original midx-bitmap series did not
include this patch. It is (IMHO, at least) an interesting debugging
tale.

>  builtin/repack.c  | 40 +++++++++++++++++++++++++++++++++++++---
>  t/t7700-repack.sh | 21 +++++++++++++++++++++
>  2 files changed, 58 insertions(+), 3 deletions(-)

The patch looks good. Two very minor comments, though I'd be happy
enough to see it merged as-is:

> +static void remove_redundant_bitmaps(struct string_list *include,
> +				     const char *packdir)
> +{
> +	struct strbuf path = STRBUF_INIT;
> +	struct string_list_item *item;
> +	size_t packdir_len;
> +
> +	strbuf_addstr(&path, packdir);
> +	strbuf_addch(&path, '/');
> +	packdir_len = path.len;
> +
> +	/*
> +	 * Remove any pack bitmaps corresponding to packs which are now
> +	 * included in the MIDX.
> +	 */
> +	for_each_string_list_item(item, include) {
> +		strbuf_addstr(&path, item->string);
> +		strbuf_strip_suffix(&path, ".idx");
> +		strbuf_addstr(&path, ".bitmap");
> +
> +		if (unlink(path.buf) && errno != ENOENT)
> +			die_errno(_("could not remove stale bitmap: %s"),
> +				  path.buf);

We could downgrade this to a warning, since there is no downside to
retaining those files (aside from wasted space). In
remove_redundant_pack(), we call into unlink_pack_path(), which just
ignores unlink errors (though arguably it should at least warn).

> @@ -1059,10 +1088,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
>  						show_progress, write_bitmaps > 0);
>  
> +		if (ret) {
> +			string_list_clear(&include, 0);
> +			return ret;
> +		}
> +
> +		if (write_bitmaps)
> +			remove_redundant_bitmaps(&include, packdir);
> +
>  		string_list_clear(&include, 0);
> -
> -		if (ret)
> -			return ret;
>  	}

You could avoid having to repeat the string-list cleanup here by
structuring it like:

  if (!ret && write_bitmaps)
	remove_redundant_bitmaps(&include, packdir);

  /* as before, clear string list and possibly return ret */

Since it's only one line, it's not that big a deal, but it simplifies
the flow.

It's correct either way, of course. One thing I did have to do while
reviewing this was look at this hunk in place. The context omits that
this is in the "if (write_midx)" conditional, which is of course very
important. ;)

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-17 18:02 ` Jeff King
@ 2022-10-17 18:50   ` Ævar Arnfjörð Bjarmason
  2022-10-17 19:27     ` Jeff King
  2022-10-18  2:38   ` Taylor Blau
  2022-10-18  2:43   ` Taylor Blau
  2 siblings, 1 reply; 10+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-10-17 18:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, derrickstolee, vdye, gitster


On Mon, Oct 17 2022, Jeff King wrote:

> On Wed, Oct 12, 2022 at 03:05:33PM -0400, Taylor Blau wrote:
> [...]
>> [...]
>> +	for_each_string_list_item(item, include) {
>> +		strbuf_addstr(&path, item->string);
>> +		strbuf_strip_suffix(&path, ".idx");
>> +		strbuf_addstr(&path, ".bitmap");
>> +
>> +		if (unlink(path.buf) && errno != ENOENT)
>> +			die_errno(_("could not remove stale bitmap: %s"),
>> +				  path.buf);
>
> We could downgrade this to a warning, since there is no downside to
> retaining those files (aside from wasted space). In
> remove_redundant_pack(), we call into unlink_pack_path(), which just
> ignores unlink errors (though arguably it should at least warn).

An ENOENT would indicate a race with a concurrent process.

That in itself seems like a wart of our lock management. I.e. we invoked
"multi-pack-index" earlier and took a lock to create the midx.

I don't know how exactly that interact with the pack bitmaps, but should
we ideally have taken locks on the relevant associated files instead
during the larger operation?

But anyway, that's a much larger topic, so we could sweep it under the
rug here.

But at that point should we be ignoring other unlink() errors?  Maybe we
should additionally be ignoring EBUSY.

But the rest all seem from a quick scan of unlink(2) to be things we'd
like to just die on, e.g. EIO.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-17 18:50   ` Ævar Arnfjörð Bjarmason
@ 2022-10-17 19:27     ` Jeff King
  2022-10-17 21:04       ` Junio C Hamano
  0 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2022-10-17 19:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, git, derrickstolee, vdye, gitster

On Mon, Oct 17, 2022 at 08:50:07PM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Oct 17 2022, Jeff King wrote:
> 
> > On Wed, Oct 12, 2022 at 03:05:33PM -0400, Taylor Blau wrote:
> > [...]
> >> [...]
> >> +	for_each_string_list_item(item, include) {
> >> +		strbuf_addstr(&path, item->string);
> >> +		strbuf_strip_suffix(&path, ".idx");
> >> +		strbuf_addstr(&path, ".bitmap");
> >> +
> >> +		if (unlink(path.buf) && errno != ENOENT)
> >> +			die_errno(_("could not remove stale bitmap: %s"),
> >> +				  path.buf);
> >
> > We could downgrade this to a warning, since there is no downside to
> > retaining those files (aside from wasted space). In
> > remove_redundant_pack(), we call into unlink_pack_path(), which just
> > ignores unlink errors (though arguably it should at least warn).
> 
> An ENOENT would indicate a race with a concurrent process.

In the code above, I don't think it is. We included some packs in the
midx, and we know that their bitmaps (if any) are now redundant. So we
remove them by calling unlink() speculatively, rather than asking "do we
have this file? If so, unlink".

Likewise in remove_redundant_pack(). We are not just removing the
packfiles, but speculatively removing their .rev, .bitmap, .promisor,
etc. If they don't exist, that's perfectly normal. Arguably we could
notice and complain if the .pack or .idx went away in the interim, since
those would indicate a race.

So I think the code above is right to ignore ENOENT. The one in
remove_redundant_pack() does so, too, but it also ignores everything
else.

> But at that point should we be ignoring other unlink() errors?  Maybe we
> should additionally be ignoring EBUSY.

For most errors besides ENOENT, I think a warning is the right thing. It
is OK that we failed to remove the file (it does not make the overall
operation a failure). But it may be helpful for the user to let them
know that the cruft was left (e.g., for EPERM or EIO). And that applies
here and in remove_redundant_pack().

Really, both here and remove_redundant_pack() should probably be using
unlink_or_warn(). Though if you did want to notice the race on deleting
the actual .pack file, it would need to be hand-coded.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-17 19:27     ` Jeff King
@ 2022-10-17 21:04       ` Junio C Hamano
  2022-10-17 21:40         ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Junio C Hamano @ 2022-10-17 21:04 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Taylor Blau, git,
	derrickstolee, vdye

Jeff King <peff@peff.net> writes:

> Really, both here and remove_redundant_pack() should probably be using
> unlink_or_warn(). Though if you did want to notice the race on deleting
> the actual .pack file, it would need to be hand-coded.

Yeah, unlink_or_warn() would probably be the right way to go.

Other than that, the patch looks good to you folks?

Thanks.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-17 21:04       ` Junio C Hamano
@ 2022-10-17 21:40         ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2022-10-17 21:40 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Taylor Blau, git,
	derrickstolee, vdye

On Mon, Oct 17, 2022 at 02:04:13PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Really, both here and remove_redundant_pack() should probably be using
> > unlink_or_warn(). Though if you did want to notice the race on deleting
> > the actual .pack file, it would need to be hand-coded.
> 
> Yeah, unlink_or_warn() would probably be the right way to go.
> 
> Other than that, the patch looks good to you folks?

Yeah. I suggested a minor reordering of the one change, but the
substance of the patch looks good to me (and I am OK either way on the
little suggestions).

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-17 18:02 ` Jeff King
  2022-10-17 18:50   ` Ævar Arnfjörð Bjarmason
@ 2022-10-18  2:38   ` Taylor Blau
  2022-10-18  2:43   ` Taylor Blau
  2 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2022-10-18  2:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, derrickstolee, vdye, gitster

On Mon, Oct 17, 2022 at 02:02:53PM -0400, Jeff King wrote:
> On Wed, Oct 12, 2022 at 03:05:33PM -0400, Taylor Blau wrote:
>
> > When we write a MIDX bitmap after repacking, it is possible that the
> > repository would be left in a state with both pack- and multi-pack
> > reachability bitmaps.
> >
> > This can occur, for instance, if a pack that was kept (either by having
> > a .keep file, or during a geometric repack in which it is not rolled up)
> > has a bitmap file, and the repack wrote a multi-pack index and bitmap.
> >
> > When loading a reachability bitmap for the repository, the multi-pack
> > one is always preferred, so the pack-based one is redundant. Let's
> > remove it unconditionally, even if '-d' isn't passed, since there is no
> > practical reason to keep both around. The patch below does just that.
>
> Yeah, this is certainly a reasonable thing to be doing. I wonder if you
> want to share the story of why the original midx-bitmap series did not
> include this patch. It is (IMHO, at least) an interesting debugging
> tale.

Sure, though note that this is a mystery that plague me on-and-off for a
year or so before finally figuring it out ;-).

When we initially deployed MIDX bitmaps at GitHub, I held back this
patch as a recovery mechanism (ie., if something was horribly broken
with MIDX bitmaps, we could quickly deploy a patch to stop reading them,
knowing that the existing pack bitmaps were still around).

After a while building up confidence in the MIDX bitmaps implementation,
I wrote something that resembles the patch here and deployed it. No big
deal, right?

Wrong. The "total time spent running `git repack`" metrics started
ticking up. But the metrics ticked up even in hosts and sites that had
not yet received the patch. Huh?

This led to many of the trace2 regions in midx.c that I recently[1, 2]
sent. But after deploying that patch and reverting it over and over
again, nothing in the trace2 metrics ended up yielding a satisfying
answer.

I spent a few days trying to debug the issue but didn't make any
progress. I confirmed that, yes, we were indeed reading MIDX bitmaps and
not relying on the pack-based bitmaps for some weird reason. And after I
confirmed that, I set the bug aside to work on other things, and mostly
forgot about it.

But in an unrelated incident, a colleague of mine noted a bug in a
GitHub-internal program called `git-get-unpacked-size`. The output of
this program is the number of bytes in non-bitmapped packs, and it's
used as a heuristic to help us pick repository networks to repack
roughly in order of "most non-repacked bytes".

The bug in `get-unpacked-size` was that we were including the size of
the cruft pack, meaning that certain networks would get scheduled much
more often. But in fixing that bug, I noticed that packs in the MIDX
would also get counted against our unpacked size.

So I wrote the minimal fix to fix the cruft pack bug, and noted the MIDX
issue for another day. When we did come back to it (I say "we", since
Victoria Dye implemented the actual fix), I suggested that we ignore any
packs that appear in a MIDX from the unpacked size, since we only write
a MIDX after repacking a repository.

When we did so, I suspected that it might inadvertently solve the original mystery. It did, and here's why.

Individual maintenance runs did not get slower when removing redundant
pack bitmaps. But because removing the .bitmap file causes that pack to
be counted towards a repository's total unpacked size, it biases
scheduling towards larger repositories (which have bigger packs, and
therefore a more inflated overall size). So we weren't slowing anything
down, just doing a slightly more expensive maintenance (on bigger
repositories) slightly more often. This explained not only the uptick,
but why the effect was so resistant to measuring.

It also explained why we saw the same effect when removing pack bitmaps
with MIDX bitmaps deployed to a single site only: whenever `git
get-unpacked-size` ran on a replica that *did* have the new code, the
unpacked-size count would get inflated. That explains why we saw
maintenance times tick up outside of our testing site.

So, that's the story :-).

[1]: https://lore.kernel.org/git/dc50527d99680fb0ff1f3240531105badaa221dc.1665612094.git.me@ttaylorr.com/
[2]: https://lore.kernel.org/git/dd5ceb1ec3c01e8a2af55130718fff0b5eaf2de0.1665612094.git.me@ttaylorr.com/

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-17 18:02 ` Jeff King
  2022-10-17 18:50   ` Ævar Arnfjörð Bjarmason
  2022-10-18  2:38   ` Taylor Blau
@ 2022-10-18  2:43   ` Taylor Blau
  2 siblings, 0 replies; 10+ messages in thread
From: Taylor Blau @ 2022-10-18  2:43 UTC (permalink / raw)
  To: Jeff King; +Cc: git, derrickstolee, vdye, gitster

On Mon, Oct 17, 2022 at 02:02:53PM -0400, Jeff King wrote:
> > +		if (unlink(path.buf) && errno != ENOENT)
> > +			die_errno(_("could not remove stale bitmap: %s"),
> > +				  path.buf);
>
> We could downgrade this to a warning, since there is no downside to
> retaining those files (aside from wasted space). In
> remove_redundant_pack(), we call into unlink_pack_path(), which just
> ignores unlink errors (though arguably it should at least warn).

I think that downgrading this to `warning_errno()` is appropriate. I'll
make that change locally and send a new version.

I think a good separate topic is teaching `remove_redundant_pack()` to
emit warnings for non-ENOENT errors, too. But I'll leave that for
another day :-).

> > @@ -1059,10 +1088,15 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
> >  						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
> >  						show_progress, write_bitmaps > 0);
> >
> > +		if (ret) {
> > +			string_list_clear(&include, 0);
> > +			return ret;
> > +		}
> > +
> > +		if (write_bitmaps)
> > +			remove_redundant_bitmaps(&include, packdir);
> > +
> >  		string_list_clear(&include, 0);
> > -
> > -		if (ret)
> > -			return ret;
> >  	}
>
> You could avoid having to repeat the string-list cleanup here by
> structuring it like:
>
>   if (!ret && write_bitmaps)
> 	remove_redundant_bitmaps(&include, packdir);
>
>   /* as before, clear string list and possibly return ret */
>
> Since it's only one line, it's not that big a deal, but it simplifies
> the flow.
>
> It's correct either way, of course. One thing I did have to do while
> reviewing this was look at this hunk in place. The context omits that
> this is in the "if (write_midx)" conditional, which is of course very
> important. ;)

Great suggestion, thanks. I'll apply it locally and send a reroll now.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v2] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-12 19:05 [PATCH] builtin/repack.c: remove redundant pack-based bitmaps Taylor Blau
  2022-10-17 18:02 ` Jeff King
@ 2022-10-18  2:45 ` Taylor Blau
  2022-10-18  6:29   ` Jeff King
  1 sibling, 1 reply; 10+ messages in thread
From: Taylor Blau @ 2022-10-18  2:45 UTC (permalink / raw)
  To: git; +Cc: vdye, derrickstolee, peff, avarab

When we write a MIDX bitmap after repacking, it is possible that the
repository would be left in a state with both pack- and multi-pack
reachability bitmaps.

This can occur, for instance, if a pack that was kept (either by having
a .keep file, or during a geometric repack in which it is not rolled up)
has a bitmap file, and the repack wrote a multi-pack index and bitmap.

When loading a reachability bitmap for the repository, the multi-pack
one is always preferred, so the pack-based one is redundant. Let's
remove it unconditionally, even if '-d' isn't passed, since there is no
practical reason to keep both around. The patch below does just that.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
A small reroll to address a pair of comments from Peff.

Range-diff against v1:
1:  393fd4c6db ! 1:  1e0ef7ee7f builtin/repack.c: remove redundant pack-based bitmaps
    @@ builtin/repack.c: static int write_midx_included_packs(struct string_list *inclu
     +		strbuf_addstr(&path, ".bitmap");
     +
     +		if (unlink(path.buf) && errno != ENOENT)
    -+			die_errno(_("could not remove stale bitmap: %s"),
    -+				  path.buf);
    ++			warning_errno(_("could not remove stale bitmap: %s"),
    ++				      path.buf);
     +
     +		strbuf_setlen(&path, packdir_len);
     +	}
    @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix
      						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
      						show_progress, write_bitmaps > 0);

    -+		if (ret) {
    -+			string_list_clear(&include, 0);
    -+			return ret;
    -+		}
    -+
    -+		if (write_bitmaps)
    ++		if (!ret && write_bitmaps)
     +			remove_redundant_bitmaps(&include, packdir);
     +
      		string_list_clear(&include, 0);
    --
    --		if (ret)
    --			return ret;
    - 	}

    - 	reprepare_packed_git(the_repository);
    + 		if (ret)

      ## t/t7700-repack.sh ##
     @@ t/t7700-repack.sh: test_expect_success '--write-midx -b packs non-kept objects' '

 builtin/repack.c  | 32 ++++++++++++++++++++++++++++++++
 t/t7700-repack.sh | 21 +++++++++++++++++++++
 2 files changed, 53 insertions(+)

diff --git a/builtin/repack.c b/builtin/repack.c
index a5bacc7797..c2d2e52bd4 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -661,6 +661,35 @@ static int write_midx_included_packs(struct string_list *include,
 	return finish_command(&cmd);
 }

+static void remove_redundant_bitmaps(struct string_list *include,
+				     const char *packdir)
+{
+	struct strbuf path = STRBUF_INIT;
+	struct string_list_item *item;
+	size_t packdir_len;
+
+	strbuf_addstr(&path, packdir);
+	strbuf_addch(&path, '/');
+	packdir_len = path.len;
+
+	/*
+	 * Remove any pack bitmaps corresponding to packs which are now
+	 * included in the MIDX.
+	 */
+	for_each_string_list_item(item, include) {
+		strbuf_addstr(&path, item->string);
+		strbuf_strip_suffix(&path, ".idx");
+		strbuf_addstr(&path, ".bitmap");
+
+		if (unlink(path.buf) && errno != ENOENT)
+			warning_errno(_("could not remove stale bitmap: %s"),
+				      path.buf);
+
+		strbuf_setlen(&path, packdir_len);
+	}
+	strbuf_release(&path);
+}
+
 static int write_cruft_pack(const struct pack_objects_args *args,
 			    const char *pack_prefix,
 			    struct string_list *names,
@@ -1059,6 +1088,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
 						refs_snapshot ? get_tempfile_path(refs_snapshot) : NULL,
 						show_progress, write_bitmaps > 0);

+		if (!ret && write_bitmaps)
+			remove_redundant_bitmaps(&include, packdir);
+
 		string_list_clear(&include, 0);

 		if (ret)
diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index ca45c4cd2c..2d0e9448dd 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -426,6 +426,27 @@ test_expect_success '--write-midx -b packs non-kept objects' '
 	)
 '

+test_expect_success '--write-midx removes stale pack-based bitmaps' '
+       rm -fr repo &&
+       git init repo &&
+       test_when_finished "rm -fr repo" &&
+       (
+		cd repo &&
+		test_commit base &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -Ab &&
+
+		pack_bitmap=$(ls $objdir/pack/pack-*.bitmap) &&
+		test_path_is_file "$pack_bitmap" &&
+
+		test_commit tip &&
+		GIT_TEST_MULTI_PACK_INDEX=0 git repack -bm &&
+
+		test_path_is_file $midx &&
+		test_path_is_file $midx-$(midx_checksum $objdir).bitmap &&
+		test_path_is_missing $pack_bitmap
+       )
+'
+
 test_expect_success TTY '--quiet disables progress' '
 	test_terminal env GIT_PROGRESS_DELAY=0 \
 		git -C midx repack -ad --quiet --write-midx 2>stderr &&
--
2.38.0.16.g393fd4c6db

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v2] builtin/repack.c: remove redundant pack-based bitmaps
  2022-10-18  2:45 ` [PATCH v2] " Taylor Blau
@ 2022-10-18  6:29   ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2022-10-18  6:29 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, vdye, derrickstolee, avarab

On Mon, Oct 17, 2022 at 10:45:12PM -0400, Taylor Blau wrote:

> When we write a MIDX bitmap after repacking, it is possible that the
> repository would be left in a state with both pack- and multi-pack
> reachability bitmaps.
> 
> This can occur, for instance, if a pack that was kept (either by having
> a .keep file, or during a geometric repack in which it is not rolled up)
> has a bitmap file, and the repack wrote a multi-pack index and bitmap.
> 
> When loading a reachability bitmap for the repository, the multi-pack
> one is always preferred, so the pack-based one is redundant. Let's
> remove it unconditionally, even if '-d' isn't passed, since there is no
> practical reason to keep both around. The patch below does just that.

Thanks, this version looks great to me.

> 1:  393fd4c6db ! 1:  1e0ef7ee7f builtin/repack.c: remove redundant pack-based bitmaps
>     @@ builtin/repack.c: static int write_midx_included_packs(struct string_list *inclu
>      +		strbuf_addstr(&path, ".bitmap");
>      +
>      +		if (unlink(path.buf) && errno != ENOENT)
>     -+			die_errno(_("could not remove stale bitmap: %s"),
>     -+				  path.buf);
>     ++			warning_errno(_("could not remove stale bitmap: %s"),
>     ++				      path.buf);

As noted elsewhere in the thread, this could be unlink_or_warn(), but
doing it manually lets us produce a more specific message, which is
nice.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2022-10-18  6:29 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-10-12 19:05 [PATCH] builtin/repack.c: remove redundant pack-based bitmaps Taylor Blau
2022-10-17 18:02 ` Jeff King
2022-10-17 18:50   ` Ævar Arnfjörð Bjarmason
2022-10-17 19:27     ` Jeff King
2022-10-17 21:04       ` Junio C Hamano
2022-10-17 21:40         ` Jeff King
2022-10-18  2:38   ` Taylor Blau
2022-10-18  2:43   ` Taylor Blau
2022-10-18  2:45 ` [PATCH v2] " Taylor Blau
2022-10-18  6:29   ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).