git@vger.kernel.org mailing list mirror (one of many)
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: ZheNing Hu <adlternative@gmail.com>
Cc: Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>,
	Christian Couder <christian.couder@gmail.com>,
	johncai86@gmail.com, Taylor Blau <me@ttaylorr.com>
Subject: Re: Question: How to execute git-gc correctly on the git server
Date: Thu, 08 Dec 2022 00:57:45 +0100
Message-ID: <221208.86a63y9309.gmgdl@evledraar.gmail.com>
In-Reply-To: <CAOLTT8Tt3jW2yvm6BRU3yG+EvW1WG9wWFq6PuOcaHNNLQAaGjg@mail.gmail.com>


On Wed, Dec 07 2022, ZheNing Hu wrote:

> I would like to run git gc on my git server periodically, which should help
> reduce storage space and optimize the read performance of the repository.
> I know GitHub and GitLab both have this process...
>
> But the concurrency between git gc and other git commands is holding
> me back a bit.
>
> git-gc [1] docs say:
>
>     On the other hand, when git gc runs concurrently with another process,
>     there is a risk of it deleting an object that the other process is using but
>     hasn’t created a reference to. This may just cause the other process to
>     fail or may corrupt the repository if the other process later adds
>     a reference to the deleted object.
>
> It seems that git gc is a dangerous operation that may cause data corruption
> when run concurrently with other git commands.
>
> Then I read GitHub's blog post [2]; git gc --cruft seems to be used
> to keep those expiring unreachable objects in a cruft pack, but the blog says
> GitHub uses some special "limbo" repository to keep the cruft pack for git data
> recovery. Well, a lot of the details here are pretty hard for me to understand :(
>
> However, on the other hand, my git server is still at v2.35, and --cruft was
> introduced in v2.38, so I'm actually more curious about: how did the server
> execute git gc correctly in the past? Do we need a repository level "big lock"
> that blocks most/all other git operations? What should the behavior of users'
> git clone/push be at this time? Report an error that the git server is
> performing git gc? Or just wait for git gc to complete?
>
> Thanks for any comments and help!
>
> [1]: https://git-scm.com/docs/git-gc
> [2]: https://github.blog/2022-09-13-scaling-gits-garbage-collection/

Is this for a very large hosting site that's anywhere near the scale of
GitHub, GitLab etc.?

A "git gc" on a "live" repo is always racy in theory, but the odds that
you'll run into data-corrupting trouble tend to approach zero as you
increase the gc.pruneExpire setting, with the default of 2 weeks being
more than enough for even the most paranoid user.
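
As a rough sketch of what that looks like in practice: the script below
pins gc.pruneExpire to its default value and then runs "git gc". The
temporary repository path is made up for illustration; on a real server
you'd point "repo" at an existing bare repository instead.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository, for illustration only; on a real
# server this would be the path of an existing bare repository.
repo=$(mktemp -d)/example.git
git init --quiet --bare "$repo"

# Pin the grace period explicitly. 2.weeks.ago is also the built-in
# default, so this only guards against someone having lowered it.
git -C "$repo" config gc.pruneExpire 2.weeks.ago

# Confirm the value that "git gc" will pick up.
prune_expire=$(git -C "$repo" config --get gc.pruneExpire)
echo "gc.pruneExpire=$prune_expire"

# Unreachable loose objects younger than the grace period are left alone.
git -C "$repo" gc --quiet
```

Raising the value (say, to 1.month.ago) widens the window further at the
cost of keeping unreachable objects around for longer.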

The "cruft pack" facility does many different things. My understanding
is that GitHub isn't using it only as an end-run around potential
corruption issues; some not-yet-in-tree patches on top of it also allow
a more aggressive "gc" without the fear of corruption.

So, I think you probably don't need to worry about it. Other major
hosting sites do run "git gc" on live repositories, but as always take
backups etc.
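
For the backup part, one possible approach (my assumption here, there are
plenty of other ways to do it) is to snapshot all refs into a bundle
before running "gc". The repository, the committer identity and the
bundle path below are made up so that the sketch runs on its own.

```shell
#!/bin/sh
set -eu

# Hypothetical throwaway repository with a single empty commit, so that
# there is something to back up.
tmp=$(mktemp -d)
work="$tmp/example"
bundle="$tmp/example-backup.bundle"
git init --quiet "$work"
git -C "$work" -c user.name=Example -c user.email=example@example.com \
	commit --quiet --allow-empty -m "initial commit"

# Write every ref and the objects reachable from them into one file.
git -C "$work" bundle create "$bundle" --all

# Check that the bundle is well-formed and its prerequisites are met.
git -C "$work" bundle verify "$bundle"

# Only then run the garbage collection.
git -C "$work" gc --quiet
```

A bundle only contains objects reachable from refs, so it is not a
substitute for a filesystem-level copy if you also care about
unreachable objects.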
