git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: ZheNing Hu <adlternative@gmail.com>,
	Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	johncai86@gmail.com, Taylor Blau <me@ttaylorr.com>
Subject: Re: Question: How to execute git-gc correctly on the git server
Date: Thu, 8 Dec 2022 01:59:37 -0500	[thread overview]
Message-ID: <Y5GLWe4Kdaz+T5P8@coredump.intra.peff.net> (raw)
In-Reply-To: <221208.86a63y9309.gmgdl@evledraar.gmail.com>

On Thu, Dec 08, 2022 at 12:57:45AM +0100, Ævar Arnfjörð Bjarmason wrote:

> Is this for a very large hosting site that's anywhere near GitHub,
> GitLab's etc. scale?
> 
> A "git gc" on a "live" repo is always racy in theory, but the odds that
> you'll run into data corrupting trouble tends to approach zero as you
> increase the gc.pruneExpire setting, with the default 2 weeks being more
> than enough for even the most paranoid user.

I'm a bit less optimistic than "tends to approach zero" here. The
objects themselves might be older, but they may become referenced or
unreferenced in an immediate racy way. E.g., intending to move branch Z
to branch A, a client asks to add A and remove Z. Since there is no
atomic view of the ref namespace, a simultaneous gc might see Z gone,
but A not yet existing (and of course it could also be _two_ clients,
etc).

> The "cruft pack" facility does many different things, and my
> understanding of it is that GitHub's not using it only as an end-run
> around potential corruption issues, but that some not yet in tree
> patches on top of it allow more aggressive "gc" without the fear of
> corruption.

I don't think cruft packs themselves help against corruption that much.
For many years, GitHub used "repack -k" to just never expire objects.
What cruft packs help with is:

  1. They keep cruft objects out of the main pack, which reduces the
     costs of lookups and bitmaps for the main pack.

  2. When you _do_ choose to expire, you can do so without worrying
     about accidentally exploding all of those old objects into loose
     ones (which is not wrong from a correctness point of view, but can
     have some amazingly bad performance characteristics).

I think the bits you're thinking of on top are in v2.39. The "repack
--expire-to" option lets you write objects that _would_ be deleted into
a cruft pack, which can serve as a backup (but managing that is out of
scope for repack itself, so you have to roll your own strategy there).

-Peff

  parent reply	other threads:[~2022-12-08  6:59 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-12-07 15:58 Question: How to execute git-gc correctly on the git server ZheNing Hu
2022-12-07 23:57 ` Ævar Arnfjörð Bjarmason
2022-12-08  1:16   ` Michal Suchánek
2022-12-08  7:01     ` Jeff King
2022-12-09  0:49       ` Michal Suchánek
2022-12-09  1:37         ` Jeff King
2022-12-09  7:26           ` ZheNing Hu
2022-12-09 13:48             ` Ævar Arnfjörð Bjarmason
2022-12-11 16:01               ` ZheNing Hu
2022-12-11 16:27                 ` Michal Suchánek
2022-12-09  7:15     ` ZheNing Hu
2022-12-08  6:59   ` Jeff King [this message]
2022-12-08 12:35     ` Ævar Arnfjörð Bjarmason
2022-12-14 20:11       ` Taylor Blau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y5GLWe4Kdaz+T5P8@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=adlternative@gmail.com \
    --cc=avarab@gmail.com \
    --cc=christian.couder@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johncai86@gmail.com \
    --cc=me@ttaylorr.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).