git@vger.kernel.org mailing list mirror (one of many)
* git gc --auto yelling at users where a repo legitimately has >6700 loose objects
@ 2018-01-11 21:33 Ævar Arnfjörð Bjarmason
From: Ævar Arnfjörð Bjarmason @ 2018-01-11 21:33 UTC
  To: Git Mailing List
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	Christian Couder

I recently removed the gc.auto=0 setting and retired my nightly
aggressive repack script on our big monorepo across our infra, relying
instead on git gc --auto in the background to just do its thing.

I didn't want users to wait for git-gc, and I'd written this nightly
cronjob before git-gc learned to detach to the background.

But now I have git-gc on some servers yelling at users on every pull
command:

    warning: There are too many unreachable loose objects; run 'git prune' to remove them.

The reason is that I have all the values at git's default settings, and
there legitimately are >~6700 loose objects that were created in the
last 2 weeks.

For those rusty on git-gc's defaults, this is what it looks like in this
scenario:

 1. User runs "git pull"
 2. git gc --auto is called, there are >6700 loose objects
 3. it forks into the background, tries to prune and repack, objects
    older than gc.pruneExpire (2.weeks.ago) are pruned.
 4. At the end of all this, we check *again* if we have >6700 objects.
    If we do, we print "run 'git prune'" to .git/gc.log and just emit
    that error for the next day. After that we unlink the gc.log and
    retry; see gc.logExpiry.
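
The steps above can be sketched as a small simulation. This is only a
rough model, not git's actual implementation: it tracks unreachable
loose objects alone (real git-gc also repacks reachable ones), and the
function name and arguments are made up for illustration.

```python
from datetime import datetime, timedelta

GC_AUTO = 6700                       # gc.auto default
PRUNE_EXPIRE = timedelta(weeks=2)    # gc.pruneExpire default (2.weeks.ago)


def simulate_gc_auto(loose_mtimes, now, threshold=GC_AUTO, expire=PRUNE_EXPIRE):
    """Model one "git gc --auto" run (hypothetical helper, not a git API).

    loose_mtimes holds one datetime per unreachable loose object.
    Returns (surviving_mtimes, warning_or_None).
    """
    # Step 2: do nothing unless we are over the gc.auto threshold.
    if len(loose_mtimes) <= threshold:
        return list(loose_mtimes), None

    # Step 3: prune everything older than gc.pruneExpire.
    cutoff = now - expire
    survivors = [t for t in loose_mtimes if t >= cutoff]

    # Step 4: check again; if still over the limit, this is what would
    # end up in .git/gc.log.
    if len(survivors) > threshold:
        return survivors, "too many unreachable loose objects"
    return survivors, None
```

Feeding it 7000 objects that are all 3 days old returns all 7000 plus
the warning, which is the situation described here: nothing is old
enough to prune, so the second check fails every time.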

Right now I've just worked around this by setting gc.pruneExpire to a
lower value (4.days.ago). But there's a larger issue to be addressed
here, and I'm not sure how.

When the warning was added in [1], git-gc didn't yet know how to detach
to the background; that came in [2], and gc.log followed shortly after
in [3].

We could add another gc.auto-like limit, which could be set at some
higher value than gc.auto. "Hey if I have more than 6700 loose objects,
prune the <2wks old ones, but if at the end there's still >6700 I don't
want to hear about it unless there's >6700*N".
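
A minimal sketch of that two-limit idea, assuming a hypothetical
multiplier N (called warn_factor here; no such setting exists in git,
and the default of 4 is arbitrary):

```python
def gc_auto_decision(loose_before, loose_after, gc_auto=6700, warn_factor=4):
    """Decide whether to run gc and whether to warn afterwards.

    loose_before: loose object count when "git gc --auto" is invoked.
    loose_after:  loose object count once pruning has finished.
    warn_factor:  hypothetical N; warn only above gc_auto * N.
    Returns (run_gc, warn).
    """
    run_gc = loose_before > gc_auto
    # Only complain if pruning left us far above the normal threshold.
    warn = run_gc and loose_after > gc_auto * warn_factor
    return run_gc, warn
```

With these defaults a repo that goes from 7000 to 6900 loose objects
gets a gc run but no warning, while one still holding 28000 afterwards
does get warned.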

I thought I'd just add that, but the details of how to pass that message
around get nasty. With that solution we *also* don't want git gc to
start churning in the background once we reach >6700 objects, so we need
something like gc.logExpiry which defers the gc until the next day. We
might need to create .git/gc-waitabit.marker, ew.

More generally, these hard limits seem contrary to what the user cares
about. E.g. I suspect that most of these loose objects come from
branches since deleted in upstream, whose objects could have a different
retention policy.

Or we could say "I want 2 weeks of objects, but if that runs against the
6700 limit just keep the latest 6700/2".
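
That last policy could look roughly like the following. It is a sketch
under the assumption that each loose object's mtime is available as a
datetime; the helper name is invented and nothing like it exists in
git today.

```python
from datetime import datetime, timedelta


def select_survivors(loose_mtimes, now, limit=6700, expire=timedelta(weeks=2)):
    """Pick which unreachable loose objects to keep (hypothetical policy).

    Keep everything newer than `expire`, but if that still exceeds
    `limit`, keep only the newest limit // 2 objects.
    Returns the surviving mtimes, newest first.
    """
    cutoff = now - expire
    recent = sorted((t for t in loose_mtimes if t >= cutoff), reverse=True)
    if len(recent) > limit:
        return recent[: limit // 2]
    return recent
```

Cutting to half the limit rather than to the limit itself leaves some
headroom, so the very next gc --auto does not immediately trip the
threshold again.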

1. a087cc9819 ("git-gc --auto: protect ourselves from accumulated
   cruft", 2007-09-17)
2. 9f673f9477 ("gc: config option for running --auto in background",
   2014-02-08)
3. 329e6e8794 ("gc: save log from daemonized gc --auto and print it next
   time", 2015-09-19)

Thread overview: 9+ messages
2018-01-11 21:33 git gc --auto yelling at users where a repo legitimately has >6700 loose objects Ævar Arnfjörð Bjarmason
2018-01-12 12:07 ` Duy Nguyen
2018-01-12 13:41   ` Duy Nguyen
2018-01-12 14:44   ` Ævar Arnfjörð Bjarmason
2018-01-13 10:07     ` Jeff King
2018-01-12 13:46 ` Jeff King
2018-01-12 14:23   ` Duy Nguyen
2018-01-13  9:58     ` Jeff King
2018-02-08 16:23 ` Ævar Arnfjörð Bjarmason
