git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [RFC] git gc "--prune=now" semantics considered harmful
Date: Fri, 1 Jun 2018 03:04:56 -0400
Message-ID: <20180601070456.GB15578@sigill.intra.peff.net>
In-Reply-To: <xmqqd0xim1tp.fsf@gitster-ct.c.googlers.com>

On Sun, May 27, 2018 at 08:31:14AM +0900, Junio C Hamano wrote:

> > So I actually would much prefer that for git gc, "--prune=now" means
> >
> >  (a) "now"
> >
> >  (b) now at the _start_ of the "git gc" operation, not the time at
> >      the _end_ of the operation when we've already spent a minute or
> >      two doing repacking and are now doing the final pruning.
> >
> > Anyway, with that explanation in mind, I'm appending a patch that is
> > pretty small and does that. It's a bit hacky, but I think it still makes
> > sense.
> >
> > Comments?
> 
> Closing the possibility of racing a running "gc" and new object
> creation like the above generally makes sense, I would think,
> whether the creation is due to 'pull/fetch', 'add', or even 'push'.

I think Linus's suggestion is an obvious improvement. It does shorten
the window for confusing things to happen, and I think it makes things
much easier to reason about if all parts of the gc are using the same
timestamp.

Regarding the implementation:

> > -	if (prune_expire && parse_expiry_date(prune_expire, &dummy))
> > -		die(_("failed to parse prune expiry value %s"), prune_expire);
> > +	if (prune_expire) {
> > +		if (!strcmp(prune_expire, "now"))
> > +			prune_expire = show_date(time(NULL), 0, DATE_MODE(ISO8601));
> > +		if (parse_expiry_date(prune_expire, &dummy))
> > +			die(_("failed to parse prune expiry value %s"), prune_expire);
> > +	}

We'd also accept relative times like "5.minutes.ago" (in fact, the
default is a relative 2.weeks.ago, though that's long enough that the
difference between "2 weeks" and "2 weeks plus 5 minutes" may not matter
much). So we probably ought to just normalize _everything_ to an absolute
time up front, without even bothering to match "now". It's a no-op for
non-relative times, but that's OK.
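
To illustrate what "normalize everything" could look like, here is a
rough standalone sketch. It is not git's code: resolve_expiry() and
normalize_expiry() are hypothetical names, and the tiny parser only
understands "now" and "<N>.<unit>.ago", which is far less than git's
real date parsing handles. The point is just that the spec is resolved
once against a start time, and every later step sees the same absolute
value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not git's parse_expiry_date()): resolve "now" or
 * "<N>.<unit>.ago" against a caller-supplied start time.
 * Returns 0 on success, -1 if the spec is not understood.
 */
int resolve_expiry(const char *spec, time_t start, time_t *out)
{
	static const struct { const char *name; long secs; } units[] = {
		{ "seconds", 1 }, { "minutes", 60 }, { "hours", 3600 },
		{ "days", 86400 }, { "weeks", 604800 },
	};
	char *end;
	long n;
	size_t i;

	if (!strcmp(spec, "now")) {
		*out = start;
		return 0;
	}
	n = strtol(spec, &end, 10);
	if (end == spec || n < 0 || *end != '.')
		return -1;
	end++; /* skip the '.' between the count and the unit */
	for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		size_t len = strlen(units[i].name);
		if (!strncmp(end, units[i].name, len) &&
		    !strcmp(end + len, ".ago")) {
			*out = start - (time_t)(n * units[i].secs);
			return 0;
		}
	}
	return -1;
}

/*
 * Hypothetical helper: turn any accepted spec into an absolute UTC
 * timestamp string, so that later steps no longer depend on when
 * they happen to run.
 */
int normalize_expiry(const char *spec, time_t start, char *buf, size_t len)
{
	time_t when;
	struct tm *tm;

	assert(buf && len);
	if (resolve_expiry(spec, start, &when))
		return -1;
	tm = gmtime(&when);
	if (!tm || !strftime(buf, len, "%Y-%m-%d %H:%M:%S +0000", tm))
		return -1;
	return 0;
}
```

A caller would grab time(NULL) once at the start of the operation, pass
it as "start", and hand the resulting string to both the repack and the
prune steps. The sketch does not guard against overflow for absurdly
large counts.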

> I however have to wonder if there is an opposite "oops" end-user
> operation we also need to worry about, i.e. we are doing a large-ish
> fetch, and get bored and run a gc from another terminal.  Perhaps
> *that* is a bit too stupid to worry about?  Auto-gc deliberately
> does not use 'now' because it wants to leave a grace period to avoid
> exactly that kind of race.

There are still possibilities for a race, even with the grace period.
You can have an unreferenced 2-week-old object sitting on disk, and
somebody can choose to reference it at the same time as we are pruning
it. My freshness patches from a few years ago made things a bit better:

  - when we optimize out the write of an existing object, we now at
    least update its timestamp

  - we consider non-fresh objects reachable from fresh ones to also be
    fresh
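
The first point can be sketched roughly as below. This is a simplified
standalone model using plain files and POSIX stat()/utime(), not git's
actual object code; write_object() and is_prune_candidate() are
hypothetical names, and the reachability rule from the second point is
not modeled at all.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/*
 * Hypothetical writer: if the object file already exists, skip the
 * write but bump its mtime to "now" so a concurrent prune sees it as
 * fresh. Otherwise create it. Returns 0 on success.
 */
int write_object(const char *path, const void *data, size_t len)
{
	struct stat st;
	FILE *f;

	assert(path);
	if (!stat(path, &st))
		return utime(path, NULL); /* already on disk: just freshen */

	f = fopen(path, "wb");
	if (!f)
		return -1;
	if (fwrite(data, 1, len, f) != len) {
		fclose(f);
		return -1;
	}
	return fclose(f) ? -1 : 0;
}

/*
 * Hypothetical prune check: an unreferenced object is a candidate only
 * if its mtime is older than the expiry cutoff.
 */
int is_prune_candidate(const char *path, time_t expire)
{
	struct stat st;

	if (stat(path, &st))
		return 0; /* cannot stat it: leave it alone */
	return st.st_mtime < expire;
}
```

Note that the check and the eventual unlink are still two separate
steps, so a writer can freshen the file in between them.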

But fundamentally none of this is atomic. You can have an old tree, and
while you're pruning somebody writes a new commit referencing it and
sticks that in a ref. It's more common if your grace period is "now",
but it can still happen with any grace period.

-Peff
