git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Mon, 16 Jul 2018 17:09:38 -0400	[thread overview]
Message-ID: <20180716210938.GF25189@sigill.intra.peff.net> (raw)
In-Reply-To: <20180716203753.GE11513@aiede.svl.corp.google.com>

On Mon, Jul 16, 2018 at 01:37:53PM -0700, Jonathan Nieder wrote:

> >                                                        and I think the
> > right solution is to pack the unreachable objects instead of exploding
> > them.
> 
> That seems like a huge widening in scope relative to what this
> original patch tackled.

I'm not at all saying that Jonathan needs to work on that. It's a big
topic that we've been putting off for years. I _am_ saying that I don't
think his patch as-is should be merged, as it makes the daemonized
behavior here worse.

> I'm not aware of a way to do this without
> breaking compatibility with the current (broken) race prevention.  If
> you're saying that breaking compatibility in that way is okay with
> you, then great!

It depends what you're trying to fix. If you're trying to fix auto-gc
outputting an annoying message (or repeatedly doing useless invocations
of pack-objects), you don't have to break compatibility at all. You just
have to put the objects into a pack instead of exploding them loose.
Worst case if you use an old implementation to run "git gc", it may
accidentally freshen a cruft pack's timestamp.

If you want to improve the raciness, then yes, I think you need stronger
rules around marking packs as cruft/garbage, and making operations that
want to use those objects copy them out to a non-cruft location (like
you describe in the hash transition document). There you risk corruption
if an old implementation fails to make the non-cruft copy. But that's
really no worse than we are today.

So I don't even really see it as a backwards-compatibility break. But
even if it were, I'd probably still be fine with it. This is a
local-repo thing, and in the absolute worst case we could bump
core.repositoryformatversion (though again, I don't even think that
would be necessary since it would degrade to behaving just as the
current code does).

> > With the current code, that produces a bunch of annoying warnings for
> > every commit ("I couldn't gc because the last gc reported a warning").
> > But after Jonathan's patch, every single commit will do a full gc of the
> > repository. In this tiny repository that's relatively quick, but in a
> > real repo it's a ton of CPU and I/O, all for nothing.
> 
> I see.  Do I understand correctly that if we find a way to print the
> warning but not error out, that would be good enough for you?

Yes. I thought that's what I proposed originally. ;)

The key thing is that the presence of the warning/log still suppress
the actual gc for the gc.logExpiry period.

> >> Have you looked over the discussion in "Loose objects and unreachable
> >> objects" in Documentation/technical/hash-function-transition.txt?  Do
> >> you have thoughts on it (preferrably in a separate thread)?
> >
> > It seems to propose putting the unreachable objects into a pack. Which
> > yes, I absolutely agree with (as I thought I'd been saying in every
> > single email in this thread).
> 
> I figured you were proposing something like
> https://public-inbox.org/git/20180113100734.GA30698@sigill.intra.peff.net/,
> which is still racy (because it does not handle the freshening in a safe
> way).

That message is just me telling somebody that their idea _won't_ work. ;)

But I assume you mean the general idea of putting things in a cruft
pack[1]. Yes, I was only suggesting going that far as a solution to the
auto-gc woes. Solving races is another issue entirely (though obviously
it may make sense to build on the single-pack work, if it exists).

[1] The best message discussion on that is I think:

      https://public-inbox.org/git/20170610080626.sjujpmgkli4muh7h@sigill.intra.peff.net/

    which I think I linked earlier, so possibly that is even what you
    meant. :)

> What decides it for me is that the user did not invoke "git gc --auto"
> explicitly, so anything "git gc --auto" prints is tangential to what
> the user was trying to do.  If the gc failed, that is worth telling
> them, but if e.g. it encountered a disk I/O error reading and
> succeeded on retry (to make up a fake example), then that's likely
> worth logging to syslog but it's not something the user asked to be
> directly informed about.

I'm not sure I agree. If a repack discovered that you had a bit
corruption on disk, but you happened to have another copy of the object
available that made the repack succeed, I'd like to know that I'm
getting bit corruptions, and earlier is better. I think we need to
surface that information to the user eventually, and I don't think we
can count on syslog being universally available.

By definition we're daemonized here, so we can't count on even having
access to the user's terminal. But it would make more sense to me if the
logic were:

  - at the end of a "gc --auto" run, gc should write a machine-readable
    indication of its exit code (either as a final line in the log file,
    or to a gc.exitcode file). The warnings/errors remain intermingled
    in the logfile.

  - on the _next_ run, if:

    - the exit code was non-zero, dump the log output to stderr and exit
      with the same code. This is the same as the current behavior (with
      the exception that we propagate the code instead of returning -1,
      which is a minor bonus).

    - the exit code was zero, report to the user synchronously that the
      last run found some oddities, and dump the log. Proceed with the
      rest of auto-gc (either exiting 0 if nothing to be done, or doing
      useful work).

      Optionally, delete the log after showing it so the user is not
      spammed with the warning on every commit. Though for the warning
      under discussion, this does mean that if they don't take any
      action, they'll end up running a useless gc every _other_ commit.
      So I'd probably keep the log and let it expire naturally after the
      1-day gc.logExpiry period.

-Peff

  reply	other threads:[~2018-07-16 21:09 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King [this message]
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren
2018-07-16 21:21           ` Jeff King
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180716210938.GF25189@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).