git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: Jonathan Tan <jonathantanmy@google.com>, git@vger.kernel.org
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Mon, 16 Jul 2018 16:29:16 -0400	[thread overview]
Message-ID: <20180716202915.GC25189@sigill.intra.peff.net> (raw)
In-Reply-To: <20180716195431.GD11513@aiede.svl.corp.google.com>

On Mon, Jul 16, 2018 at 12:54:31PM -0700, Jonathan Nieder wrote:

> Even restricting attention to the warning, not the exit code, I think
> you are saying the current behavior is acceptable.  I do not believe
> it is.

I didn't say that at all. The current situation sucks, and I think the
right solution is to pack the unreachable objects instead of exploding
them.

> I think you are saying Jonathan's patch makes the behavior
> worse, and I'm not seeing it.  Can you describe an example user
> interaction where the current behavior would be better than the
> behavior after Jonathan's patch?  That might help make this discussion
> more concrete.

It makes it worse because there is nothing to throttle a "gc --auto"
that is making no progress.

Try this:

-- >8 --
#!/bin/sh

rm -rf repo

# new mostly-empty repo
git init repo
cd repo
git commit --allow-empty -m base

# make a bunch of unreachable blobs
for i in $(seq 7000); do
  echo "blob"
  echo "data <<EOD"
  echo "cruft $i"
  echo "EOD"
done |
git fast-import

# This gc will explode them (doing a "gc --auto" isn't sufficient here, because
# we don't have enough _other_ material in the repo to trigger a gc. But you
# can imagine that a normal repo would eventually "gc --auto".
git gc

# Now simulate some real work in the repo.
for i in $(seq 50); do
  git commit --allow-empty -m "commit $i"
  sleep 30
done
-- 8< --

With the current code, that produces a bunch of annoying warnings for
every commit ("I couldn't gc because the last gc reported a warning").
But after Jonathan's patch, every single commit will do a full gc of the
repository. In this tiny repository that's relatively quick, but in a
real repo it's a ton of CPU and I/O, all for nothing.

> [...]
> > See the thread I linked earlier on putting unreachable objects into a
> > pack, which I think is the real solution.
> 
> Have you looked over the discussion in "Loose objects and unreachable
> objects" in Documentation/technical/hash-function-transition.txt?  Do
> you have thoughts on it (preferrably in a separate thread)?

It seems to propose putting the unreachable objects into a pack. Which
yes, I absolutely agree with (as I thought I'd been saying in every
single email in this thread).

> > I mean that if you do not write a persistent log, then "gc --auto" will
> > do an unproductive gc every time it is invoked. I.e., it will see "oh,
> > there are too many loose objects", and then waste a bunch of CPU every
> > time you commit.
> 
> If so, then this would already be the behavior in non daemon mode.
> Are you saying this accidental effect of daemon mode is in fact
> intentional?

I'm not sure if I'd call it intentional, since I don't recall ever
seeing this side effect discussed. But since daemonizing is the default,
I think that's the case people have focused on when they hit annoying
cases. E.g., a831c06a2b (gc: ignore old gc.log files, 2017-02-10).

I'd consider the fact that "gc --auto" with gc.autoDetach=false will
repeatedly do useless work to be a minor bug. But I think Jonathan's
patch makes it even worse because we do not even tell people that
useless work is being done (while making them wait for each gc to
finish!). Even if we were going to remove this message to help the
daemonized case, I think we'd want to retain it for the non-daemon case.

> > A daemonized git-gc runs a bunch of sub-commands (e.g., "git prune")
> > with their stderr redirected into the logfile. If you want to have
> > warnings go somewhere else, then either:
> >
> >   - you need some way to tell those sub-commands to send the warnings
> >     elsewhere (i.e., _not_ stderr)
> >
> >     or
> >
> >   - you have to post-process the output they send to separate warnings
> >     from other errors. Right now that means scraping. If you are
> >     proposing a system of machine-readable output, it would need to work
> >     not just for git-gc, but for every sub-command it runs.
> 
>       or
> 
>     - you can simply record and trust the exit code
> 
> A simple way to do that without changing formats is to truncate the
> file when exiting with status 0.

That's a different behavior than what we do now (and what was suggested
earlier), in that it assumes that anything written to stderr is OK for
gc to hide from the user if the process exits with code zero.

That's probably OK in most cases, though I wonder if there are corner
cases. For example, if you have a corrupt ref, we used to say "error:
refs/heads/foo does not point to a valid object!" but otherwise ignore
it. These days git-gc sets REF_PARANOIA=1, so we'll actually barf on a
corrupt ref. But I wonder if there are other cases lurking.

-Peff

  reply	other threads:[~2018-07-16 20:29 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King [this message]
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren
2018-07-16 21:21           ` Jeff King
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180716202915.GC25189@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).