git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: git@vger.kernel.org, vdye@github.com, derrickstolee@github.com
Subject: Re: [PATCH] repack: don't remove .keep packs with `--pack-kept-objects`
Date: Tue, 18 Oct 2022 02:54:56 -0400	[thread overview]
Message-ID: <Y05NwIfY/906b2Vd@coredump.intra.peff.net> (raw)
In-Reply-To: <6a012cd625c1d197ede91c85299cbfb37adf356b.1666059872.git.me@ttaylorr.com>

On Mon, Oct 17, 2022 at 10:26:06PM -0400, Taylor Blau wrote:

> But when `--pack-kept-objects` is given, things can go awry. Namely,
> when a kept pack is included in the list of packs tracked by the
> `pack_geometry` struct *and* part of the pack roll-up, we will delete
> the `.keep` pack when we shouldn't.

Oops. Your explanation makes sense, and this is an easy case to miss.

After reading the message but before seeing the patch, I'd guess we just
need to add "if (!this_pack_is_kept)" to the removal loop.

And indeed, it's close to that:

> diff --git a/builtin/repack.c b/builtin/repack.c
> index a5bacc7797..f71909696d 100644
> --- a/builtin/repack.c
> +++ b/builtin/repack.c
> @@ -1089,6 +1089,11 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
>  				strbuf_addstr(&buf, pack_basename(p));
>  				strbuf_strip_suffix(&buf, ".pack");
> 
> +				if ((p->pack_keep) ||
> +				    (string_list_has_string(&existing_kept_packs,
> +							    buf.buf)))
> +					continue;
> +
>  				remove_redundant_pack(packdir, buf.buf);
>  			}
>  			strbuf_release(&buf);

but what is the difference between p->pack_keep, and being in the
existing_kept_packs list? From the similar logic in
init_pack_geometry():

         /*
          * Any pack that has its pack_keep bit set will appear
          * in existing_kept_packs below, but this saves us from
          * doing a more expensive check.
          */
         if (p->pack_keep)
                 continue;

         /*
          * The pack may be kept via the --keep-pack option;
          * check 'existing_kept_packs' to determine whether to
          * ignore it.
          */
         strbuf_reset(&buf);
         strbuf_addstr(&buf, pack_basename(p));
         strbuf_strip_suffix(&buf, ".pack");

         if (string_list_has_string(existing_kept_packs, buf.buf))
                  continue;

it looks like one is a superset of the other, but checking p->pack_keep
is just slightly cheaper, so we do it first. And checking just
p->pack_keep isn't sufficient; the list has extra packs from the
--keep-pack command-line. So your new code checking both is correct.

But one thing to note. In that existing logic, the two checks are split
apart, since in the fast path (i.e., when p->pack_keep is set) we can
skip the extra work to form the pack string. We could do that here, but
I doubt it matters much:

  1. It's really not that expensive. It's a little extra string
     manipulation and a binary-search lookup for each pack.

  2. Normally most packs aren't kept, so we'd end up in the "slow" path
     most of the time anyway.

So I suspect that existing code is premature optimization, and you
really could just look in existing_kept_packs (both there and in this
patch).

But I don't mind this patch copying the existing logic in the meantime.
It's exactly this kind of "while we are in the area" cleanup where you
end up surprised that the p->pack_keep check really was covering some
subtle corner case. ;)

So the patch looks good to me as-is (and sorry for the verbose review;
we've just had enough tricky corner cases in this repack code that I
wanted to make sure I understood all the implications).

> This had mostly been a mystery that didn't occur at high enough volume
> to justify looking into. But it went away entirely after merging in
> v2.36.x, which contains e4d0c11c04.
> 
> Some investigating with Victoria today led to the patch above, which is
> still relevant since e4d0c11c04 papers over an existing bug.

Thank you both for digging into this. Because of the scale, GitHub has a
unique opportunity to trigger these kind of weird corner cases and races
that would otherwise go unreported and just cause occasional
head-scratching among people running sane-sized Git servers.

-Peff

  reply	other threads:[~2022-10-18  6:55 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-18  2:26 [PATCH] repack: don't remove .keep packs with `--pack-kept-objects` Taylor Blau
2022-10-18  6:54 ` Jeff King [this message]
2022-10-21  3:14   ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y05NwIfY/906b2Vd@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=derrickstolee@github.com \
    --cc=git@vger.kernel.org \
    --cc=me@ttaylorr.com \
    --cc=vdye@github.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).