git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Jeff King <peff@peff.net>, ZheNing Hu <adlternative@gmail.com>,
	Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	johncai86@gmail.com
Subject: Re: Question: How to execute git-gc correctly on the git server
Date: Wed, 14 Dec 2022 15:11:57 -0500
Message-ID: <Y5ouDcvjCaRlCGJf@nand.local>
In-Reply-To: <221208.86o7se6ou1.gmgdl@evledraar.gmail.com>

On Thu, Dec 08, 2022 at 01:35:04PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> The "cruft pack" facility does many different things, and my
> >> understanding of it is that GitHub's not using it only as an end-run
> >> around potential corruption issues, but that some not yet in tree
> >> patches on top of it allow more aggressive "gc" without the fear of
> >> corruption.
> >
> > I don't think cruft packs themselves help against corruption that much.
> > For many years, GitHub used "repack -k" to just never expire objects.
> > What cruft packs help with is:
> >
> >   1. They keep cruft objects out of the main pack, which reduces the
> >      costs of lookups and bitmaps for the main pack.

Peff isn't wrong here, but there is a big caveat: this is only true
when using a single-pack bitmap. Single-pack bitmaps are guaranteed to
have reachability closure over their objects, but writing a MIDX bitmap
after generating the MIDX does not afford us the same guarantees.

So if you have a cruft pack which contains some unreachable object X,
which is made reachable by some other object that *is* reachable from
some reference, *and that* object is included in one of the MIDX's
packs, then we won't have reachability closure unless we also bitmap the
cruft pack, too.

So even though it helps a lot with bitmapping in the single-pack case,
in practice it doesn't make a significant difference with multi-pack
bitmaps.
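
To make the closure problem concrete, here is a toy sketch in Python. It
is not how Git computes bitmaps; the object names ("C", "T", "X"), the
`edges` map, and the helper `missing_for_closure` are all made up for
illustration. It only shows why leaving the cruft pack out of the
bitmapped set can break reachability closure:

```python
def missing_for_closure(bitmapped, edges):
    """Return objects referenced from `bitmapped` that are not in it.

    An empty result means the set is closed under reachability.
    """
    missing = set()
    for obj in bitmapped:
        for child in edges.get(obj, ()):
            if child not in bitmapped:
                missing.add(child)
    return missing


# Hypothetical object graph: commit C -> tree T -> blob X.
edges = {"C": ["T"], "T": ["X"], "X": []}
midx_packs = {"C", "T"}   # objects in the packs covered by the MIDX bitmap
cruft_pack = {"X"}        # X was unreachable when the cruft pack was written

# Without the cruft pack, X is referenced but not covered.
print(sorted(missing_for_closure(midx_packs, edges)))               # ['X']
# Including the cruft pack restores closure.
print(sorted(missing_for_closure(midx_packs | cruft_pack, edges)))  # []
```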

> >   2. When you _do_ choose to expire, you can do so without worrying
> >      about accidentally exploding all of those old objects into loose
> >      ones (which is not wrong from a correctness point of view, but can
> >      have some amazingly bad performance characteristics).
> >
> > I think the bits you're thinking of on top are in v2.39. The "repack
> > --expire-to" option lets you write objects that _would_ be deleted into
> > a cruft pack, which can serve as a backup (but managing that is out of
> > scope for repack itself, so you have to roll your own strategy there).
>
> Yes, that's what I was referring to.

Yes, we use the `--expire-to` option when doing a pruning GC to move the
expired objects out of the repo to some "../backup.git" location. The
out-of-tree tooling that Ævar is speculating about basically runs
`cat-file --batch` in the backup repo, feeding it the list of missing
objects, and then writes those objects (back) into the GC'd repository.
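
A rough sketch of that recovery step in Python, for illustration only.
This is not the actual out-of-tree tooling; the function names
`parse_batch` and `restore_objects` and the repository paths are
assumptions. It relies on the documented `cat-file --batch` output
format (`<oid> <type> <size>`, a newline, the contents, a newline) and
on `git hash-object -w -t <type> --stdin` to write each object back:

```python
import io
import subprocess


def parse_batch(stream):
    """Yield (oid, type, data) tuples from `git cat-file --batch` output."""
    while True:
        header = stream.readline()
        if not header:
            return
        fields = header.rstrip(b"\n").split(b" ")
        if len(fields) == 2 and fields[1] == b"missing":
            continue  # the backup repo does not have this object either
        oid, obj_type, size = fields[0].decode(), fields[1].decode(), int(fields[2])
        data = stream.read(size)
        stream.read(1)  # consume the newline that follows the contents
        yield oid, obj_type, data


def restore_objects(backup_repo, target_repo, oids):
    """Copy the given objects from backup_repo into target_repo as loose objects.

    Simplification: the whole batch output is held in memory.
    """
    request = "".join(oid + "\n" for oid in oids).encode()
    out = subprocess.run(
        ["git", "-C", backup_repo, "cat-file", "--batch"],
        input=request, stdout=subprocess.PIPE, check=True,
    ).stdout
    for _oid, obj_type, data in parse_batch(io.BytesIO(out)):
        subprocess.run(
            ["git", "-C", target_repo, "hash-object", "-w", "-t", obj_type, "--stdin"],
            input=data, stdout=subprocess.DEVNULL, check=True,
        )
```

Something like `restore_objects("../backup.git", ".", missing_oids)`
would be the call, where `missing_oids` is whatever list of object IDs
the corrupt repository reports as missing.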

> I think I had feedback on that series saying that if held correctly this
> would also nicely solve that long-time race. Maybe I'm just
> misremembering, but I (mis?)recalled that Taylor indicated that it was
> being used like that at GitHub.

It (the above) doesn't solve the race, but it does make it easier to
recover from a corrupt repository when we lose that race.

Thanks,
Taylor
