git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Elijah Newren <newren@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Jonathan Tan <jonathantanmy@google.com>,
	Jonathan Nieder <jrnieder@gmail.com>
Subject: Re: [PATCH] gc: do not warn about too many loose objects
Date: Mon, 16 Jul 2018 18:55:20 -0400	[thread overview]
Message-ID: <20180716225520.GC12482@sigill.intra.peff.net> (raw)
In-Reply-To: <CABPp-BHFinoE1=1bOhiwOrYpLB+kB3yAKbNg77K9kqKDH_1JLA@mail.gmail.com>

On Mon, Jul 16, 2018 at 03:07:34PM -0700, Elijah Newren wrote:

> > If we were to delete those objects, wouldn't it be exactly the same as
> > running "git prune"? Or setting your gc.pruneExpire to "now" or even "5
> > minutes"?  Or are you concerned with taking other objects along for the
> > ride that weren't part of old reflogs? I think that's a valid concern,
> 
> Yes, I was worried about taking other objects along for the ride that
> weren't part of old reflogs.
> 
> > but it's also an issue for objects which were previously referenced in
> > a reflog, but are part of another current operation.
> 
> I'm not certain what you're referring to here.

I mean that an ongoing operation could refer to a blob that just
recently became unreachable via reflog pruning. And then we would delete
it, leaving the repository corrupt.

One of the protections we have against that is that if I ask to write
blob XYZ and we find that we already have the object, Git will freshen
the mtime of the loose object or pack that contains it, protecting it
from pruning. But with your suggestion, we'd delete it immediately,
regardless of the mtime of the containing pack.

Another way to think of it is this: a reflog mentioning an object does
not mean it is the exclusive user of that object. So when we expire it,
that does not mean that it is OK to delete it immediately; there may be
other users.

> > Also, what do you do if there weren't reflogs in the repo? Or the
> > reflogs were deleted (e.g., because branch deletion drops the associated
> > reflog entirely)?
> 
> Yes, there are issues this rule won't help with, but in my experience
> it was a primary (if not sole) actual cause in practice.  (I believe I
> even said elsewhere in this thread that I knew there were unreachable
> objects for other reasons and they might also become large in number).
> At $DAYJOB we've had multiple people including myself hit the "too
> many unreachable loose objects" nasty loop issue (some of us multiple
> different times), and as far as I can tell, most (perhaps even all) of
> them would have been avoided by just "properly" deleting garbage as
> per my object-age-is-reflog-age-if-not-otherwise-referenced rule.

I agree with you that this is a frequent cause, and probably even the
primary one. But my concern is that your loosening increases the risk of
corruption for other cases.

> > I assume by "these objects" you mean ones which used to be reachable
> > from a reflog, but that reflog entry just expired.  I think you'd be
> > sacrificing some race-safety in that case.
> 
> Is that inherently any more race unsafe than 'git prune
> --expire=2.weeks.ago'?  I thought it'd be racy in the same ways, and
> thus a tradeoff folks are already accepting (at least implicitly) when
> running git-gc.  Since these objects are actually 90 days old rather
> than a mere two weeks, it actually seemed more safe to me.  But maybe
> I'm overlooking something with the pack->loose transition that makes
> it more racy?

I think it's worse in terms of race-safety because you're losing one of
the signals that users of the objects can provide to git-prune to tell
it the object is useful: updating the mtime. So yes, you think of the
objects as "90 days old", but we don't know if there are other users.
Has somebody else been accessing them in the meantime?

To be honest, I'm not sure how meaningful that distinction is in
practice. The current scheme is still racy, even if the windows are
shorter in some circumstances. But it seems like cruft packfiles are
a similar amount of work to your scheme, cover more cases, and are
slightly safer. And possibly even give us a long-term route to true
race-free pruning (via the "garbage pack" mark that Jonathan mentioned).

Assuming you buy into the idea that objects in a cruft-pack are not
hurting anything aside from a little wasted storage for up to 2 weeks
(which it sounds like you're at least partially on board with ;) ).

-Peff

  reply	other threads:[~2018-07-16 22:55 UTC|newest]

Thread overview: 57+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16 17:27 [PATCH] gc: do not warn about too many loose objects Jonathan Tan
2018-07-16 17:51 ` Jeff King
2018-07-16 18:22   ` Jonathan Nieder
2018-07-16 18:52     ` Jeff King
2018-07-16 19:09       ` Jonathan Nieder
2018-07-16 19:41         ` Jeff King
2018-07-16 19:54           ` Jonathan Nieder
2018-07-16 20:29             ` Jeff King
2018-07-16 20:37               ` Jonathan Nieder
2018-07-16 21:09                 ` Jeff King
2018-07-16 21:40                   ` Jonathan Nieder
2018-07-16 21:45                     ` Jeff King
2018-07-16 22:03                       ` Jonathan Nieder
2018-07-16 22:43                         ` Jeff King
2018-07-16 22:56                           ` Jonathan Nieder
2018-07-16 23:26                             ` Jeff King
2018-07-17  1:53                               ` Jonathan Nieder
2018-07-17  8:59                                 ` Ævar Arnfjörð Bjarmason
2018-07-17 14:03                                   ` Jonathan Nieder
2018-07-17 15:24                                     ` Ævar Arnfjörð Bjarmason
2018-07-17 20:27                                   ` Jeff King
2018-07-18 13:11                                     ` Ævar Arnfjörð Bjarmason
2018-07-18 17:29                                       ` Jeff King
2018-07-17 15:59                                 ` Duy Nguyen
2018-07-17 18:09                                 ` Junio C Hamano
2018-07-16 19:15 ` Elijah Newren
2018-07-16 19:19   ` Jonathan Nieder
2018-07-16 20:21     ` Elijah Newren
2018-07-16 20:35       ` Jeff King
2018-07-16 20:56         ` Jonathan Nieder
2018-07-16 21:12           ` Jeff King
2018-07-16 19:52   ` Jeff King
2018-07-16 20:16     ` Elijah Newren
2018-07-16 20:38       ` Jeff King
2018-07-16 21:09         ` Elijah Newren
2018-07-16 21:21           ` Jeff King
2018-07-16 22:07             ` Elijah Newren
2018-07-16 22:55               ` Jeff King [this message]
2018-07-16 23:06                 ` Elijah Newren
2018-07-16 21:31           ` Jonathan Nieder
2018-07-17  6:51 ` [PATCH v2 0/3] gc --auto: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17  6:53   ` [PATCH 1/3] gc: improve handling of errors reading gc.log Jonathan Nieder
2018-07-17 18:19     ` Junio C Hamano
2018-07-17 19:58     ` Jeff King
2018-07-17  6:54   ` [PATCH 2/3] gc: exit with status 128 on failure Jonathan Nieder
2018-07-17 18:22     ` Junio C Hamano
2018-07-17 19:59     ` Jeff King
2018-09-17 18:33       ` Jeff King
2018-09-17 18:40         ` Jonathan Nieder
2018-09-18 17:30           ` Jeff King
2018-07-17  6:57   ` [PATCH 3/3] gc: do not return error for prior errors in daemonized mode Jonathan Nieder
2018-07-17 20:13     ` Jeff King
2018-07-18 16:21       ` Junio C Hamano
2018-07-18 17:22         ` Jeff King
2018-07-18 18:19           ` Junio C Hamano
2018-07-18 19:06             ` Jeff King
2018-07-18 19:55               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180716225520.GC12482@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).