From: Taylor Blau <me@ttaylorr.com>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Jeff King <peff@peff.net>, ZheNing Hu <adlternative@gmail.com>,
Git List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>,
Christian Couder <christian.couder@gmail.com>,
johncai86@gmail.com
Subject: Re: Question: How to execute git-gc correctly on the git server
Date: Wed, 14 Dec 2022 15:11:57 -0500
Message-ID: <Y5ouDcvjCaRlCGJf@nand.local>
In-Reply-To: <221208.86o7se6ou1.gmgdl@evledraar.gmail.com>

On Thu, Dec 08, 2022 at 01:35:04PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> The "cruft pack" facility does many different things, and my
> >> understanding of it is that GitHub's not using it only as an end-run
> >> around potential corruption issues, but that some not yet in tree
> >> patches on top of it allow more aggressive "gc" without the fear of
> >> corruption.
> >
> > I don't think cruft packs themselves help against corruption that much.
> > For many years, GitHub used "repack -k" to just never expire objects.
> > What cruft packs help with is:
> >
> > 1. They keep cruft objects out of the main pack, which reduces the
> > costs of lookups and bitmaps for the main pack.

Peff isn't wrong here, but there is a big caveat: this is only true when
using a single-pack bitmap. Single-pack bitmaps are guaranteed to have
reachability closure over their objects, but writing a MIDX bitmap after
generating the MIDX does not afford us the same guarantee.

So if you have a cruft pack which contains some unreachable object X,
which is made reachable by some other object that *is* reachable from
some reference, *and that* object is included in one of the MIDX's
packs, then we won't have reachability closure unless we also bitmap the
cruft pack.

So even though cruft packs help a lot with bitmapping in the single-pack
case, in practice they don't make a significant difference with
multi-pack bitmaps.
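
To make the closure condition above concrete, here is a toy sketch. It is
not how Git checks this internally; the object names ("C", "T", "X"), the
`edges` mapping, and the function name are all hypothetical and exist only
for illustration. It reports which referenced objects fall outside the set
of objects being bitmapped:

```python
def missing_for_closure(covered, edges):
    """Return objects referenced from `covered` that are not in `covered`.

    `covered` is the set of object names the bitmap would span, and
    `edges` maps an object name to the names it references directly.
    An empty result means `covered` is closed under reachability.
    """
    missing = set()
    for oid in covered:
        for ref in edges.get(oid, ()):
            if ref not in covered:
                missing.add(ref)
    return missing


# Hypothetical layout: commit C and tree T live in the MIDX's packs,
# while blob X lives only in the cruft pack.
edges = {"C": ["T"], "T": ["X"]}
midx_objects = {"C", "T"}
cruft_objects = {"X"}

print(missing_for_closure(midx_objects, edges))
print(missing_for_closure(midx_objects | cruft_objects, edges))
```

In this model the first call reports X as missing, and the second call
reports nothing once the cruft pack's objects are included in the set.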
> > 2. When you _do_ choose to expire, you can do so without worrying
> > about accidentally exploding all of those old objects into loose
> > ones (which is not wrong from a correctness point of view, but can
> > have some amazingly bad performance characteristics).
> >
> > I think the bits you're thinking of on top are in v2.39. The "repack
> > --expire-to" option lets you write objects that _would_ be deleted into
> > a cruft pack, which can serve as a backup (but managing that is out of
> > scope for repack itself, so you have to roll your own strategy there).
>
> Yes, that's what I was referring to.

Yes, we use the `--expire-to` option when doing a pruning GC to move the
expired objects out of the repo to some "../backup.git" location. The
out-of-tree tooling that Ævar is speculating about basically runs
`cat-file --batch` in the backup repo, feeding it the list of missing
objects, and then writes those objects (back) into the GC'd repository.
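
For illustration only, the recovery step described above could be
sketched roughly as follows. This is not the actual out-of-tree tooling;
the function names and repository paths are hypothetical. It assumes the
documented `git cat-file --batch` output format (`<oid> <type> <size>`,
then the contents, then a newline, or `<name> missing`) and writes each
object back with `git hash-object -w -t <type> --stdin`, which stores it
as a loose object:

```python
import subprocess


def parse_batch_output(data):
    """Split `git cat-file --batch` output into (oid, type, content).

    Objects the backup does not have come back as (name, None, None).
    """
    entries = []
    pos = 0
    while pos < len(data):
        nl = data.index(b"\n", pos)
        header = data[pos:nl].decode().split(" ")
        pos = nl + 1
        if header[-1] == "missing":
            entries.append((header[0], None, None))
            continue
        oid, otype, size = header[0], header[1], int(header[2])
        entries.append((oid, otype, data[pos:pos + size]))
        pos += size + 1  # skip the newline that follows the contents
    return entries


def restore_objects(backup_repo, target_repo, oids):
    """Copy the listed objects from backup_repo into target_repo."""
    listing = "".join(oid + "\n" for oid in oids).encode()
    batch = subprocess.run(
        ["git", "-C", backup_repo, "cat-file", "--batch"],
        input=listing, stdout=subprocess.PIPE, check=True,
    ).stdout
    restored = []
    for oid, otype, content in parse_batch_output(batch):
        if otype is None:
            continue  # not in the backup either; nothing to restore
        written = subprocess.run(
            ["git", "-C", target_repo, "hash-object", "-w",
             "-t", otype, "--stdin"],
            input=content, stdout=subprocess.PIPE, check=True,
        ).stdout.decode().strip()
        if written != oid:
            raise RuntimeError("hash mismatch restoring " + oid)
        restored.append(oid)
    return restored
```

A caller would pass the object IDs reported as missing (for example, ones
collected from `git fsck` output) along with the two repository paths.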
> I think I had feedback on that series saying that if held correctly this
> would also nicely solve that long-time race. Maybe I'm just
> misremembering, but I (mis?)recalled that Taylor indicated that it was
> being used like that at GitHub.

It (the above) doesn't solve the race, but it does make it easier to
recover from a corrupt repository when we lose that race.

Thanks,
Taylor