git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Jeff King <peff@peff.net>
Cc: "Jonathan Nieder" <jrnieder@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	git@vger.kernel.org, "Andrii Dehtiarov" <adehtiarov@google.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Wed, 18 Jul 2018 15:11:38 +0200	[thread overview]
Message-ID: <874lgwd6yd.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <20180717202700.GE26218@sigill.intra.peff.net>


On Tue, Jul 17 2018, Jeff King wrote:

> On Tue, Jul 17, 2018 at 10:59:50AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> That doesn't make sense to me. Just because git itself is happily
>> ignoring the exit code I don't think that should mean there shouldn't be
>> a meaningful exit code. Why don't you just ignore the exit code in the
>> repo tool as well?
>>
>> Now e.g. automation I have to see if git-gc ---auto is having issues
>> can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet
>> of servers, but will need to start caring if stderr was emitted to.
>
> If you're daemonizing gc, though, there are a number of cases where the
> exit code is not propagated. If you really care about the outcome,
> you're probably better off either:

In theory a lot of the other stuff can fail, but in practice I've only
seen it get stuck due to running out of disk space (easily detected in
other ways), and because of having too many loose objects (e.g. due to,
but not limited to
https://public-inbox.org/git/87fu6bmr0j.fsf@evledraar.gmail.com/).

>   - doing synchronous gc's, which will still return a meaningful code
>     after Jonathan's patches

(Reply to this and "we've found at GitHub..." later in your mail)

I'm targeting clients (dev machines, laptops, staging boxes) which have
clones of repos, some put in place by automation, some manually cloned
by users in ~.

So it's much easier to rely on shipping a /etc/gitconfig than setting
gc.auto=0 and having some system-wide job that goes and hunts for local
git repos to manually gc.

It's also a big advantage that it's driven by user activity, because
it's an implicit guard of only_do_this_if_the_user_is_active_here()
before "git-gc" on a fleet of machines, which when some of those get
their disk space from shared NFS resources cuts down on overall I/O.

>   - inspecting the log yourself. I know that comes close to the un-unixy
>     stderr thing. But in this case, the presence of a non-empty log is
>     literally the on-disk bit for "the previous run failed". And doing
>     so catches all of the daemonized cases, even the "first one" that
>     you'd miss by paying attention to the immediate exit code.
>
>     This will treat the zero-exit-code "warning" case as an error. I'm
>     not against propagating the true original error code, if somebody
>     wants to work on it. But I think Jonathan's patches are a strict
>     improvement over the current situation, and a patch to propagate
>     could come on top.

Yeah, I can check gc.log, if others are of a different opinion about
this #3 case at the end of the day I don't mind if it returns 0. It just
doesn't make any conceptual sense to me.

>> I think you're conflating two things here in a way that (if I'm reading
>> this correctly) produces a pretty bad regression for users.
>>
>>  a) If we have something in the gc.log we keep yelling at users until we
>>     reach the gc.logExpiry, this was the subject of my thread back in
>>     January. This sucks, and should be fixed somehow.
>>
>>     Maybe we should only emit the warning once, e.g. creating a
>>     .git/gc.log.wasemitted marker and as long as its ctime is later than
>>     the mtime on .git/gc.log we don't emit it again).
>>
>>     But that also creates the UX problem that it's easy to miss this
>>     message in the middle of some huge "fetch" output. Perhaps we should
>>     just start emitting this as part of "git status" or something (or
>>     solve the root cause, as Peff notes...).
>
> I kind of like that "git status" suggestion. Of course many users run
> "git status" more often than "git commit", so it may actually spam them
> more!

Maybe piggy-packing on the advice facility ...

>>  b) We *also* use this presence of a gc.log as a marker for "we failed
>>     too recently, let's not try again until after a day".
>>
>> The semantics of b) are very useful, and just because they're tied up
>> with a) right now, let's not throw out b) just because we're trying to
>> solve a).
>
> Yeah. In general my concern was the handling of (b), which I think this
> last crop of patches is fine with. As far as showing the message
> repeatedly or not, I don't have a strong opinion (it could even be
> configurable, once it's split from the "marker" behavior).
>
>> Right now one thing we do right is we always try to look at the actual
>> state of the repo with too_many_packs() and too_many_loose_objects().
>>
>> We don't assume the state of your repo hasn't drastically changed
>> recently. That means that we do the right thing and gc when the repo
>> needs it, not just because we GC'd recently enough.
>>
>> With these proposed semantics we'd skip a needed GC (potentially for
>> days, depending on the default) just because we recently ran one.
>
> Yeah, I agree that deferring repeated gc's based on time is going to run
> into pathological corner cases. OTOH, what we've found at GitHub is that
> "gc --auto" is quite insufficient for scheduling maintenance anyway
> (e.g., if a machine gets pushes to 100 separate repositories in quick
> succession, you probably want to queue and throttle any associated gc).

I'm sure you have better solutions for this at GitHub, but as an aside
it might be interesting to add some sort of gc flock + retry setting for
this use-case, i.e. even if you had 100 concurrent gc's due to
too_many_*(), they'd wait + retry until they could get the flock on a
given file.

Then again this is probably too specific, and could be done with a
pre-auto-gc hook too..

> But there are probably more mid-size sites that are big enough to have
> corner cases, but not so big that "gc --auto" is hopeless. ;)

FWIW I don't deal with gc on the server at all these days (that's our
internal GitLab's instance problem to solve). I just mentioned the edge
case of a growing number of pack files on the server upthread as
something that we'd be shipping with git if we had time-based backoff
semantics.

  reply	other threads:[~2018-07-18 13:11 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason [this message]
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren
2018-07-16 21:21           ` Jeff King
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874lgwd6yd.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=adehtiarov@google.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).