From: Jeff King <peff@peff.net>
To: Lars Schneider <larsxschneider@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: "There are too many unreachable loose objects" - why don't we run 'git prune' automatically?
Date: Sat, 10 Jun 2017 04:06:27 -0400 [thread overview]
Message-ID: <20170610080626.sjujpmgkli4muh7h@sigill.intra.peff.net> (raw)
In-Reply-To: <7497DFA7-3F4E-4DB2-B31B-FDDEB2F30BB8@gmail.com>
On Fri, Jun 09, 2017 at 02:03:18PM +0200, Lars Schneider wrote:
> > I agree the existing message isn't great. There should probably be a big
> > advise() block explaining what's going on (and that expert users can
> > disable).
>
> How about this?
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index c2c61a57bb..12ee212544 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -473,9 +473,18 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
> if (pack_garbage.nr > 0)
> clean_pack_garbage();
>
> - if (auto_gc && too_many_loose_objects())
> - warning(_("There are too many unreachable loose objects; "
> - "run 'git prune' to remove them."));
> + if (auto_gc && too_many_loose_objects()) {
> + warning(_("Auto packing did not lead to optimal results as the "
> + "repository contains too many unreachable objects."));
> + advice(_("Unreachable objects are Git objects (commits, files, ...) "
> + "that are not referenced by any branch or tag. This might happen "
> + "if you use 'git rebase' or if you delete branches. Auto packing "
> + "only prunes unreachable objects that are older than 2 weeks "
> + "(default, overridable by the config variable 'gc.pruneExpire'). "
> + "Please run 'git prune' to prune all unreachable objects for "
> + "optimal repository performance."));
> + }
s/advice/advise/, of course. This probably be protected by a new entry
in advice_config[] in advice.c.
But I assume you are most interested in the text. I think it
simultaneously goes into too much and too little detail. I think the
warning itself should just say _what_ we observed: after garbage
collection, there were still enough objects to trigger a gc. And then
the hint doesn't need to go into the details of why we prune or what
unreachable objects are. Those can be cross-referenced with other
documentation. I think we need to focus on what the warning means, and
whether and how they would correct it.
Maybe:
warning: too many loose objects remain after garbage collection
hint: Automatic garbage collection is triggered when there are a
hint: large number of unpacked objects in the repository. Unreachable
hint: objects that are more recent than gc.pruneExpire are not
hint: pruned. If there are too many of these recent loose
hint: objects, automatic garbage collection may be triggered more
hint: frequently than necessary. You may run "git prune" now to
hint: prune all unreachable objects, regardless of their age.
I was tempted to suggest that we find and report the correct "prune"
cutoff that would let us avoid auto-gc. I.e., sort the unreachable
objects by timestamp and find the cutoff that will drop enough to leave
fewer than `gc.auto`. That in theory makes things a bit safer. That's
probably not a good idea, though:
1. Telling the user to run `git prune --expire=37.minutes.ago` is
just going to confuse them. We could hide it behind a command line
option like `git prune --expire-auto-gc` or something, though.
2. Now that we try to keep recent chunks, the analysis isn't quite so
easy. You may have a single recent commit that references a ton of
old history, and only dropping that commit would help. So the
analysis is harder than a simple sort-and-cutoff, but it also means
that the prune times are likely to skew close to "now".
3. If we just show them how to prune the minimal amount, then they're
likely to just hit this message again soon.
So that's probably a dead end.
To be honest, the fact that we have to write this warning at all is a
sign that Git is not doing a very good job. The best place to spend
effort would be to teach git-gc to pack all of the unreachable objects
into a single "cruft" pack, so this problem doesn't happen at all (and
it's way more efficient, as well).
The big problem with that approach is that we lose individual-object
timestamps. Each object just gets the timestamp of its surrounding pack,
so as we continually ran auto-gc, the cruft-pack timestamp would get
updated and we'd never drop objects. So we'd need some auxiliary file
(e.g., pack-1234abcd.times) that stores the per-object timestamps. This
can be as small as a 32- or 64-bit int per object, since we can just
index it by the existing object list in the pack .idx.
The trickiest part would be when an object's timestamp gets freshened
(because somebody tried to write it again but we optimized out the
write). Updating the timestamps in the .times file would probably work
atomically, though we usually avoid writing in the middle of a file (we
certainly can't portably do so via mmap, and I can't think of another
case where we do seeked writes). It might be sufficient for objects in
the cruft pack to just do the actual loose object write instead of
optimizing it out. It should happen very rarely.
-Peff
next prev parent reply other threads:[~2017-06-10 8:06 UTC|newest]
Thread overview: 6+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-08 12:45 "There are too many unreachable loose objects" - why don't we run 'git prune' automatically? Lars Schneider
2017-06-09 5:27 ` Jeff King
2017-06-09 12:03 ` Lars Schneider
2017-06-10 8:06 ` Jeff King [this message]
2017-06-18 13:22 ` Lars Schneider
2017-06-20 14:08 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170610080626.sjujpmgkli4muh7h@sigill.intra.peff.net \
--to=peff@peff.net \
--cc=git@vger.kernel.org \
--cc=larsxschneider@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).