git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Jonathan Nieder" <jrnieder@gmail.com>,
	"Jonathan Tan" <jonathantanmy@google.com>,
	git@vger.kernel.org, "Andrii Dehtiarov" <adehtiarov@google.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Tue, 17 Jul 2018 16:27:00 -0400	[thread overview]
Message-ID: <20180717202700.GE26218@sigill.intra.peff.net> (raw)
In-Reply-To: <87d0vmck55.fsf@evledraar.gmail.com>

On Tue, Jul 17, 2018 at 10:59:50AM +0200, Ævar Arnfjörð Bjarmason wrote:

> That doesn't make sense to me. Just because git itself is happily
> ignoring the exit code I don't think that should mean there shouldn't be
> a meaningful exit code. Why don't you just ignore the exit code in the
> repo tool as well?
> 
> Now e.g. automation I have to see if git-gc ---auto is having issues
> can't just be 'git gc --auto || echo fail @ {hostname}' across a fleet
> of servers, but will need to start caring if stderr was emitted to.

If you're daemonizing gc, though, there are a number of cases where the
exit code is not propagated. If you really care about the outcome,
you're probably better off either:

  - doing synchronous gc's, which will still return a meaningful code
    after Jonathan's patches

  - inspecting the log yourself. I know that comes close to the un-unixy
    stderr thing. But in this case, the presence of a non-empty log is
    literally the on-disk bit for "the previous run failed". And doing
    so catches all of the daemonized cases, even the "first one" that
    you'd miss by paying attention to the immediate exit code.

    This will treat the zero-exit-code "warning" case as an error. I'm
    not against propagating the true original error code, if somebody
    wants to work on it. But I think Jonathan's patches are a strict
    improvement over the current situation, and a patch to propagate
    could come on top.

> I think you're conflating two things here in a way that (if I'm reading
> this correctly) produces a pretty bad regression for users.
> 
>  a) If we have something in the gc.log we keep yelling at users until we
>     reach the gc.logExpiry, this was the subject of my thread back in
>     January. This sucks, and should be fixed somehow.
> 
>     Maybe we should only emit the warning once, e.g. creating a
>     .git/gc.log.wasemitted marker and as long as its ctime is later than
>     the mtime on .git/gc.log we don't emit it again).
> 
>     But that also creates the UX problem that it's easy to miss this
>     message in the middle of some huge "fetch" output. Perhaps we should
>     just start emitting this as part of "git status" or something (or
>     solve the root cause, as Peff notes...).

I kind of like that "git status" suggestion. Of course many users run
"git status" more often than "git commit", so it may actually spam them
more!

>  b) We *also* use this presence of a gc.log as a marker for "we failed
>     too recently, let's not try again until after a day".
> 
> The semantics of b) are very useful, and just because they're tied up
> with a) right now, let's not throw out b) just because we're trying to
> solve a).

Yeah. In general my concern was the handling of (b), which I think this
last crop of patches is fine with. As far as showing the message
repeatedly or not, I don't have a strong opinion (it could even be
configurable, once it's split from the "marker" behavior).

> Right now one thing we do right is we always try to look at the actual
> state of the repo with too_many_packs() and too_many_loose_objects().
> 
> We don't assume the state of your repo hasn't drastically changed
> recently. That means that we do the right thing and gc when the repo
> needs it, not just because we GC'd recently enough.
> 
> With these proposed semantics we'd skip a needed GC (potentially for
> days, depending on the default) just because we recently ran one.

Yeah, I agree that deferring repeated gc's based on time is going to run
into pathological corner cases. OTOH, what we've found at GitHub is that
"gc --auto" is quite insufficient for scheduling maintenance anyway
(e.g., if a machine gets pushes to 100 separate repositories in quick
succession, you probably want to queue and throttle any associated gc).

But there are probably more mid-size sites that are big enough to have
corner cases, but not so big that "gc --auto" is hopeless. ;)

-Peff

  parent reply	other threads:[~2018-07-17 20:27 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King [this message]
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren
2018-07-16 21:21           ` Jeff King
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180717202700.GE26218@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=adehtiarov@google.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).