git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Elijah Newren <newren@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Git Mailing List <git@vger.kernel.org>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Mon, 16 Jul 2018 14:09:43 -0700	[thread overview]
Message-ID: <CABPp-BEpCF9FE7eJwZWjY+bMsjDQnnDaSrHO+e3DtDDsR-=7Hg@mail.gmail.com> (raw)
In-Reply-To: <20180716203847.GE25189@sigill.intra.peff.net>

On Mon, Jul 16, 2018 at 1:38 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Jul 16, 2018 at 01:16:45PM -0700, Elijah Newren wrote:
>
>> >> The basic problem here, at least for us, is that gc has enough
>> >> information to know it could expunge some objects, but because of how
>> >> it is structured in terms of several substeps (reflog expiration,
>> >> repack, prune), the information is lost between the steps and it
>> >> instead writes them out as unreachable objects.  If we could prune (or
>> >> avoid exploding) loose objects that are only reachable from reflog
>> >> entries that we are expiring, then the problem goes away for us.  (I
>> >> totally understand that other repos may have enough unreachable
>> >> objects for other reasons that Peff's suggestion to just pack up
>> >> unreachable objects is still a really good idea.  But on its own, it
>> >> seems like a waste since it's packing stuff that we know we could just
>> >> expunge.)
>> >
>> > No, we should have expunged everything that could be during the "repack"
>> > and "prune" steps. We feed the expiration time to repack, so that it
>> > knows to drop objects entirely instead of exploding them loose.
>>
>> Um, except it doesn't actually do that.  The testcase I provided shows
>> that it leaves around 10000 objects that are totally deletable and
>> were only previously referenced by reflog entries -- entries that gc
>> removed without removing the corresponding objects.
>
> What's your definition of "totally deletable"?

The point of gc is to: expire reflogs, repack referenced objects, and
delete loose objects that (1) are no longer referenced and (2) are
"old enough".

The "old enough" bit is the special part here.  Before the gc, those
objects were referenced (only by old reflog entries) and were found in
a pack.  git currently writes those objects out to disk and gives them
the age of the packfile they are contained in, which I think is the
source of the bug.  We have a reference for those objects from the
reflogs, know they aren't referenced anywhere else, so those objects
to me are the age of those reflog entries: 90 days.  As such, they are
"old enough" and should be deleted.

With big repos, it's easy to get into situations where there are well
more than 10000 objects satisfying these conditions.  In fact, it's
not all that rare that the repo has far more loose objects after a git
gc than before.

I never got around to fixing it properly, though, because 'git prune'
is such a handy workaround that for now.  Having people nuke all their
loose objects is a bit risky, but the only other workaround people had
was to re-clone (waiting the requisite 20+ minutes for the repo to
download) and throw away their old clone.  (Though some people even
went that route, IIRC.)

> If the pack they were in is less than 2 weeks old, then they are
> unreachable but "fresh", and are intentionally retained. If it was older
> than 2 weeks, they should have been deleted.

That's what's currently implemented, yes, but as above I think that
logic is bad.

> If you don't like that policy, you can set gc.pruneExpire to something
> lower (including "now", but watch out, as that can race with other
> operations!). But I don't think that's an implementation short-coming.
> It's intentional to keep those objects. It's just that the format
> they're in is inefficient and confuses later gc runs.

I'm aware of pruneExpire, but I think it's the wrong way to handle this.

I totally agree with your general plan to put unreferenced loose
objects into a pack.  However, I don't think these objects should be
part of that pack; they should just be deleted instead.

  reply	other threads:[~2018-07-16 21:09 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren [this message]
2018-07-16 21:21           ` Jeff King
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CABPp-BEpCF9FE7eJwZWjY+bMsjDQnnDaSrHO+e3DtDDsR-=7Hg@mail.gmail.com' \
    --to=newren@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).