git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Git List" <git@vger.kernel.org>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Michael Haggerty" <mhagger@alum.mit.edu>,
	"Stefan Beller" <stefanbeller@gmail.com>,
	"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: BUG: Race condition due to reflog expiry in "gc"
Date: Wed, 13 Mar 2019 12:02:05 -0400	[thread overview]
Message-ID: <20190313160204.GD24101@sigill.intra.peff.net> (raw)
In-Reply-To: <87sgvrax0o.fsf@evledraar.gmail.com>

On Wed, Mar 13, 2019 at 11:28:39AM +0100, Ævar Arnfjörð Bjarmason wrote:

> I was under the impression that git-gc was supposed to support operating
> on a repository that's concurrently being modified, as long as you don't
> set the likes of gc.pruneExpire too aggressively.

To some degree. If it has to take locks to modify items, then inherently
there's going to be lock contention. But we may be able to work around
it some.

A big one we used to hit at GitHub is that running `git pack-refs` needs
to take the individual ref locks when pruning loose refs that have just
been packed. We dealt with that by adding retry-with-timeout logic to
ref lock acquisition, in 4ff0f01cb7 (refs: retry acquiring reference
locks for 100ms, 2017-08-21). Since then, I can't remember seeing a
single instance of this coming up in production use.

I don't think we use a timeout with the reflog lock. Maybe we ought to.
It might need to be longer than the 100ms default for refs, since I
think we'd do more significant work during the reflog expiration. On the
other hand, I think we hold the lock on the ref itself, too, during that
expiration.

We don't actually expire reflogs regularly at GitHub, so I can't say one
way or the other if there would be lock contention there.

> Running a "gc" in a loop without "git reflog expire --all" and when
> watching the repository being GC'd with:
> 
>     fswatch -l 0.1 -t -r . 2>&1 | grep lock
> 
> We only create .git/MERGE_RR.lock, .git/gc.pid.lock and
> git/packed-refs.lock. These are all things that would only cause another
> concurrent GC to fail, not a normal git command.

The packed-refs.lock can conflict with a normal operation; regular
writers need to update it when they delete a ref.

You'd also be locking regular refs as part of "pack-refs --prune", but
you probably don't see it running gc in a loop, because all of the loose
refs are pruned after the first run.

> I'm just including that as illustration that add_reflogs_to_pending() in
> revision.c during "gc" already iterates over the reflogs without locking
> anything, but of course it's just reading them.

Right. It's always safe to read without locking (refs, packed-refs, and
reflogs).

> So just this fixes that:
> 
>     diff --git a/refs/files-backend.c b/refs/files-backend.c
>     index ef053f716c..b6576f28b9 100644
>     --- a/refs/files-backend.c
>     +++ b/refs/files-backend.c
>     @@ -3037,7 +3037,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
>      	 * reference itself, plus we might need to update the
>      	 * reference if --updateref was specified:
>      	 */
>     -	lock = lock_ref_oid_basic(refs, refname, oid,
>     +	lock = lock_ref_oid_basic(refs, refname, NULL,
>      				  NULL, NULL, REF_NO_DEREF,
>      				  &type, &err);
>      	if (!lock) {
> 
> Which seems sensible to me. We'll still get the lock, we just don't
> assert that the refname we use to get the lock must be at that
> SHA-1. We'll still use it for the purposes of expiry.

I _think_ this should be OK, because we don't open the reflog until
after we hold the lock. So we don't really care what the value was at.
Wherever we got the lock, we'll do the expiration process atomically at
that point.

I don't think this makes all your problems go away, though, because
you'd still have the immediate contention of actually taking the lock
(at any value).

> But maybe I've missed some caveat in reflog_expiry_prepare() and friends
> and we really do need the reflog at that OID, then this would suck less:

If I'm reading the code correctly, we don't call reflog_expiry_prepare()
until we're holding the lock (because we pass it in as a callback to
reflog_expire()). So I think that would be OK.

>      	if (!lock) {
>     +		if (errno == EBUSY) {
>     +			warning("cannot lock ref '%s': %s. Skipping!", refname, err.buf);
>     +			strbuf_release(&err);
>     +			return -2;
>     +		}
>      		error("cannot lock ref '%s': %s", refname, err.buf);
>      		strbuf_release(&err);
>      		return -1;
> 
> I.e. we just detect the EBUSY that verify_lock() sets if the OID doesn't
> match, and don't prune that reflog.

That saves git-gc from failing. But do we have the problem in the other
direction? I.e., that the gc would take a lock, and the actual user
request to update a ref would fail?

That's what the retry-with-timeout is supposed to address, so maybe it
works. But I wouldn't be surprised if it's insufficient in practice,
since the reflog code may walk big parts of the graph under lock,
checking reachability.

-Peff

  reply	other threads:[~2019-03-13 16:03 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-12 23:28 BUG: Race condition due to reflog expiry in "gc" Ævar Arnfjörð Bjarmason
2019-03-13  1:43 ` Junio C Hamano
2019-03-13 10:28   ` Ævar Arnfjörð Bjarmason
2019-03-13 16:02     ` Jeff King [this message]
2019-03-13 16:22       ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 0/5] gc: minor code cleanup + contention fixes Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 0/7] " Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 0/8] " Ævar Arnfjörð Bjarmason
2019-03-18  6:13               ` Junio C Hamano
2019-03-28 16:14               ` [PATCH v4 0/7] gc: tests and handle reflog expire config Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 5/7] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 6/7] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-28 16:14               ` [PATCH v4 7/7] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 1/8] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 2/8] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 3/8] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 4/8] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 5/8] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 6/8] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-19  6:20               ` Jeff King
2019-03-15 15:59             ` [PATCH v3 7/8] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59             ` [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
     [not found]               ` <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>
2019-03-21 14:10                 ` Ævar Arnfjörð Bjarmason
2019-03-19  6:27             ` [PATCH v2 0/7] gc: minor code cleanup + contention fixes Jeff King
2019-03-14 12:34           ` [PATCH v2 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15  9:51             ` Duy Nguyen
2019-03-15 10:42               ` Ævar Arnfjörð Bjarmason
2019-03-15 15:49                 ` Johannes Schindelin
2019-03-14 12:34           ` [PATCH v2 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 5/7] reflog: exit early if there's no work to do Ævar Arnfjörð Bjarmason
2019-03-15 10:01             ` Duy Nguyen
2019-03-15 15:51               ` Ævar Arnfjörð Bjarmason
2019-03-14 12:34           ` [PATCH v2 6/7] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-15 10:05             ` Duy Nguyen
2019-03-15 10:24               ` Ævar Arnfjörð Bjarmason
2019-03-15 10:32                 ` Duy Nguyen
2019-03-15 15:51                 ` Johannes Schindelin
2019-03-18  6:04                 ` Junio C Hamano
2019-03-14 12:34           ` [PATCH v2 7/7] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-15 11:10             ` Duy Nguyen
2019-03-15 15:52               ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54         ` [PATCH 1/5] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14  0:25           ` Jeff King
2019-03-13 23:54         ` [PATCH 2/5] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-14  0:26           ` Jeff King
2019-03-13 23:54         ` [PATCH 3/5] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14  0:30           ` Jeff King
2019-03-13 23:54         ` [PATCH 4/5] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-14  0:40           ` Jeff King
2019-03-14  4:51             ` Junio C Hamano
2019-03-14 19:26               ` Jeff King
2019-03-13 23:54         ` [PATCH 5/5] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-14  0:44           ` Jeff King
2019-03-14  0:25         ` BUG: Race condition due to reflog expiry in "gc" Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190313160204.GD24101@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jrnieder@gmail.com \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    --cc=stefanbeller@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).