From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Git Mailing List <git@vger.kernel.org>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Christian Couder" <christian.couder@gmail.com>
Subject: Re: git gc --auto yelling at users where a repo legitimately has >6700 loose objects
Date: Thu, 08 Feb 2018 17:23:47 +0100
Message-ID: <87eflvmovg.fsf@evledraar.gmail.com>
In-Reply-To: <87inc89j38.fsf@evledraar.gmail.com>


On Thu, Jan 11 2018, Ævar Arnfjörð Bjarmason jotted:

> I recently dropped the gc.auto=0 setting and my nightly aggressive repack
> script on our big monorepo across our infra, relying instead on
> git gc --auto in the background to just do its thing.
>
> I didn't want users to wait for git-gc, and I'd written this nightly
> cronjob before git-gc learned to detach to the background.
>
> But now I have git-gc on some servers yelling at users on every pull
> command:
>
>     warning: There are too many unreachable loose objects; run 'git prune' to remove them.
>
> The reason is that I have all the values at git's default settings, and
> there legitimately are >~6700 loose objects that were created in the
> last 2 weeks.
>
> For those rusty on git-gc's defaults, this is what it looks like in this
> scenario:
>
>  1. User runs "git pull"
>  2. git gc --auto is called, there are >6700 loose objects
>  3. it forks into the background, tries to prune and repack, objects
>     older than gc.pruneExpire (2.weeks.ago) are pruned.
>  4. At the end of all this, we check *again* whether we have >6700
>     objects. If we do, we write "run 'git prune'" to .git/gc.log and
>     just emit that error for the next day. After that we unlink
>     gc.log and retry; see gc.logExpiry.
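
The four steps above can be sketched as a toy model in Python. This is
only an illustration under loud assumptions: every loose object is
treated as unreachable (so repacking cannot absorb any of them), and
the function and parameter names are made up for this example rather
than taken from git's source.

```python
from datetime import datetime, timedelta

GC_AUTO = 6700                      # default gc.auto threshold
PRUNE_EXPIRE = timedelta(weeks=2)   # default gc.pruneExpire (2.weeks.ago)


def gc_auto(loose_mtimes, now, threshold=GC_AUTO, prune_expire=PRUNE_EXPIRE):
    """Toy model: return (surviving mtimes, whether the warning is logged)."""
    if len(loose_mtimes) <= threshold:
        return loose_mtimes, False      # step 2: under the limit, do nothing
    cutoff = now - prune_expire
    # step 3: prune objects older than the expiry cutoff
    survivors = [m for m in loose_mtimes if m > cutoff]
    # step 4: check again; still over the limit means the warning is logged
    return survivors, len(survivors) > threshold


now = datetime(2018, 1, 11)
# 7000 loose objects, all created within the last two weeks
recent = [now - timedelta(days=d % 14) for d in range(7000)]
print(gc_auto(recent, now)[1])                                 # True
print(gc_auto(recent, now, prune_expire=timedelta(days=4))[1]) # False
```

In this model nothing is old enough to prune at the default expiry, so
the count stays above the limit and the warning fires. Shortening the
expiry to four days brings the count back under it.
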
>
> Right now I've just worked around this by setting gc.pruneExpire to a
> lower value (4.days.ago). But there's a larger issue to be addressed
> here, and I'm not sure how.
>
> When the warning was added in [1], git-gc didn't yet know how to detach
> to the background; that came in [2], followed by gc.log in [3].
>
> We could add another gc.auto-like limit, which could be set at some
> higher value than gc.auto. "Hey if I have more than 6700 loose objects,
> prune the <2wks old ones, but if at the end there's still >6700 I don't
> want to hear about it unless there's >6700*N".
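
A minimal sketch of that second limit, in Python. The name warn_factor
stands in for the "N" above and is purely hypothetical; it does not
correspond to any existing git setting, and the default of 4 is an
arbitrary choice for the example.

```python
GC_AUTO = 6700   # default gc.auto threshold


def should_warn(loose_after_prune, gc_auto=GC_AUTO, warn_factor=4):
    """Warn only once the post-prune count exceeds gc_auto * warn_factor.

    warn_factor is a hypothetical knob playing the role of "N".
    """
    return loose_after_prune > gc_auto * warn_factor


print(should_warn(7000))    # over gc.auto, but under gc.auto * 4
print(should_warn(30000))   # over gc.auto * 4
```

This only covers the decision to warn. It does not address the harder
part, which is keeping gc from re-running on every command while the
count sits between the two limits.
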
>
> I thought I'd just add that, but the details of how to pass that message
> around get nasty. With that solution we *also* don't want git gc to
> start churning in the background once we reach >6700 objects, so we need
> something like gc.logExpiry which defers the gc until the next day. We
> might need to create .git/gc-waitabit.marker, ew.
>
> More generally, these hard limits seem contrary to what the user cares
> about. E.g. I suspect that most of these loose objects come from
> branches since deleted in upstream, whose objects could have a different
> retention policy.
>
> Or we could say "I want 2 weeks of objects, but if that runs against the
> 6700 limit just keep the latest 6700/2".
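
That retention policy could look roughly like the Python sketch below.
It is an illustration only: select_prunable and its parameters are
hypothetical names, and objects are modelled as (name, mtime) pairs
rather than anything read from a real repository.

```python
from datetime import datetime, timedelta


def select_prunable(loose, now, limit=6700, expire=timedelta(weeks=2)):
    """Return the names of loose objects to prune.

    loose is a list of (name, mtime) pairs. Objects older than `expire`
    are always pruned. If the remainder still exceeds `limit`, only the
    newest limit // 2 objects are kept.
    """
    cutoff = now - expire
    prunable = [name for name, mtime in loose if mtime <= cutoff]
    recent = sorted(
        (pair for pair in loose if pair[1] > cutoff),
        key=lambda pair: pair[1],
        reverse=True,           # newest first
    )
    if len(recent) > limit:
        prunable += [name for name, _ in recent[limit // 2:]]
    return prunable


now = datetime(2018, 1, 11)
# 7000 objects, one per minute, all newer than two weeks
objs = [(f"obj{i}", now - timedelta(minutes=i)) for i in range(7000)]
print(len(select_prunable(objs, now)))   # 7000 - 6700 // 2
```
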
>
> 1. a087cc9819 ("git-gc --auto: protect ourselves from accumulated
>    cruft", 2007-09-17)
> 2. 9f673f9477 ("gc: config option for running --auto in background",
>    2014-02-08)
> 3. 329e6e8794 ("gc: save log from daemonized gc --auto and print it next
>    time", 2015-09-19)

My just-sent "How to produce a loose ref+size explosion via pruning +
git-gc", <87fu6bmr0j.fsf@evledraar.gmail.com>
(https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/),
shows an easy way to reproduce this.

After the steps outlined there, git-gc --auto will end up in a state
where it starts telling the user off for having too many loose
objects.
