From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Git List" <git@vger.kernel.org>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
"Michael Haggerty" <mhagger@alum.mit.edu>,
"Stefan Beller" <stefanbeller@gmail.com>,
"Jeff King" <peff@peff.net>,
"Jonathan Nieder" <jrnieder@gmail.com>
Subject: Re: BUG: Race condition due to reflog expiry in "gc"
Date: Wed, 13 Mar 2019 11:28:39 +0100 [thread overview]
Message-ID: <87sgvrax0o.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <xmqq1s3bh7ky.fsf@gitster-ct.c.googlers.com>
On Wed, Mar 13 2019, Junio C Hamano wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> I'm still working on fixing a race condition I encountered in "gc"
>> recently, but am not 100% sure of the fix. So I thought I'd send a
>> braindump of what I have so far in case it jolts any memories.
>>
>> The problem is that when we "gc" we run "reflog expire --all". This
>> iterates over the reflogs, and takes a *.lock for each reference.
>>
>> It'll fail intermittendly in two ways:
>>
>> 1. If something is concurrently committing to the repo it'll fail
>> because we for a tiny amount of time hold a HEAD.lock file, so HEAD
>> can't be updated.
>>
>> 2. On a repository that's just being "git fetch"'d by some concurrent
>> process the "gc" will fail, because the ref's SHA1 has changed,
>> which we inspect as we aquire the lock.
>
> Both sounds very much expected and expectable outcome. I am not
> sure how they need to be called bugs.
Let's leave aside that I started the subject with "BUG:" and let me
rephrase.
I was under the impression that git-gc was supposed to support operating
on a repository that's concurrently being modified, as long as you don't
set the likes of gc.pruneExpire too aggressively.
Running a "gc" in a loop without "git reflog expire --all" and when
watching the repository being GC'd with:
fswatch -l 0.1 -t -r . 2>&1 | grep lock
We only create .git/MERGE_RR.lock, .git/gc.pid.lock and
git/packed-refs.lock. These are all things that would only cause another
concurrent GC to fail, not a normal git command.
So the only reason a concurrent commit (case #1) fails is because of the
refs being locked during the reflog iteration, and similarly "gc" itself
will fail due to a concurrently updating ref (case #2).
It seems that first of all we need this, I'll submit that as a separate
patch sometime soon:
diff --git a/builtin/gc.c b/builtin/gc.c
index 020f725acc..ae488646e1 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -127,6 +127,12 @@ static void gc_config(void)
pack_refs = git_config_bool("gc.packrefs", value);
}
+ if (!git_config_get_value("gc.reflogexpire", &value) && value &&
+ !strcmp(value, "never") &&
+ !git_config_get_value("gc.reflogexpireunreachable", &value) && value &&
+ !strcmp(value, "never"))
+ prune_reflogs = 0;
+
git_config_get_int("gc.aggressivewindow", &aggressive_window);
git_config_get_int("gc.aggressivedepth", &aggressive_depth);
git_config_get_int("gc.auto", &gc_auto_threshold);
I.e. now even if your gc.* config says you don't want the reflogs
touched, we still pointlessly iterate over all of them. The case I'm
running into (a variant of #2) is one solved by that patch, i.e. I'm
fine "gc" just having the reflogs kept forever as a workaround in this
case.
Something like that should have been added back in 62aad1849f ("gc
--auto: do not lock refs in the background", 2014-05-25), i.e. now the
"prune_reflogs" variable is never used, it's just cargo-culted from a
copy/pasting of the "pack_refs" code.
In other "gc" phases in "pack-objects" and "prune" we also look at the
reflogs. This obviously bad patch ignores them entirely:
diff --git a/builtin/prune.c b/builtin/prune.c
index 97613eccb5..bccee7813e 100644
--- a/builtin/prune.c
+++ b/builtin/prune.c
@@ -41,7 +41,7 @@ static void perform_reachability_traversal(struct rev_info *revs)
if (show_progress)
progress = start_delayed_progress(_("Checking connectivity"), 0);
- mark_reachable_objects(revs, 1, expire, progress);
+ mark_reachable_objects(revs, 0, expire, progress);
stop_progress(&progress);
initialized = 1;
}
diff --git a/builtin/repack.c b/builtin/repack.c
index 67f8978043..618ffbfe0a 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -364,7 +364,6 @@ int cmd_repack(int argc, const char **argv, const char *prefix)
keep_pack_list.items[i].string);
argv_array_push(&cmd.args, "--non-empty");
argv_array_push(&cmd.args, "--all");
- argv_array_push(&cmd.args, "--reflog");
argv_array_push(&cmd.args, "--indexed-objects");
if (repository_format_partial_clone)
argv_array_push(&cmd.args, "--exclude-promisor-objects");
I'm just including that as illustration that add_reflogs_to_pending() in
revision.c during "gc" already iterates over the reflogs without locking
anything, but of course it's just reading them.
So one thing that would mitigate things a lot is if
files_reflog_expire() and its call to expire_reflog_ent() via
refs_for_each_reflog_ent() would lazily aquire the lock on the ref.
Digging a bit further that's actually what we're doing now since
4ff0f01cb7 ("refs: retry acquiring reference locks for 100ms",
2017-08-21).
But this runs into the logic we've had for a long time, or since your
bda3a31cc7 ("reflog-expire: Avoid creating new files in a directory
inside readdir(3) loop", 2008-01-25) where we first loop over all the
refs in the process of finding the reflogs, and then will try to lock
those refs at those expected SHA-1s. If they've changed in the meantime
we error out don't clean up the lockfile.
So just this fixes that:
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..b6576f28b9 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3037,7 +3037,7 @@ static int files_reflog_expire(struct ref_store *ref_store,
* reference itself, plus we might need to update the
* reference if --updateref was specified:
*/
- lock = lock_ref_oid_basic(refs, refname, oid,
+ lock = lock_ref_oid_basic(refs, refname, NULL,
NULL, NULL, REF_NO_DEREF,
&type, &err);
if (!lock) {
Which seems sensible to me. We'll still get the lock, we just don't
assert that the refname we use to get the lock must be at that
SHA-1. We'll still use it for the purposes of expiry.
But maybe I've missed some caveat in reflog_expiry_prepare() and friends
and we really do need the reflog at that OID, then this would suck less:
diff --git a/builtin/reflog.c b/builtin/reflog.c
index 4d3430900d..4bb272fdc8 100644
--- a/builtin/reflog.c
+++ b/builtin/reflog.c
@@ -625,12 +625,16 @@ static int cmd_reflog_expire(int argc, const char **argv, const char *prefix)
free_worktrees(worktrees);
for (i = 0; i < collected.nr; i++) {
struct collected_reflog *e = collected.e[i];
+ int st;
set_reflog_expiry_param(&cb.cmd, explicit_expiry, e->reflog);
- status |= reflog_expire(e->reflog, &e->oid, flags,
- reflog_expiry_prepare,
- should_expire_reflog_ent,
- reflog_expiry_cleanup,
- &cb);
+ st = reflog_expire(e->reflog, &e->oid, flags,
+ reflog_expiry_prepare,
+ should_expire_reflog_ent,
+ reflog_expiry_cleanup,
+ &cb);
+ if (st == -2)
+ continue;
+ status |= st;
free(e);
}
free(collected.e);
diff --git a/refs/files-backend.c b/refs/files-backend.c
index ef053f716c..8b0b6b7b85 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3041,6 +3041,11 @@ static int files_reflog_expire(struct ref_store *ref_store,
NULL, NULL, REF_NO_DEREF,
&type, &err);
if (!lock) {
+ if (errno == EBUSY) {
+ warning("cannot lock ref '%s': %s. Skipping!", refname, err.buf);
+ strbuf_release(&err);
+ return -2;
+ }
error("cannot lock ref '%s': %s", refname, err.buf);
strbuf_release(&err);
return -1;
I.e. we just detect the EBUSY that verify_lock() sets if the OID doesn't
match, and don't prune that reflog. As seen above "pack-objects" and
"prune" will still iterate over the same logs later for the purposes of
reachability, so this shouldn't get us into a corrupt state due to
throwing away objects referenced in those logs, we'll just prune fewer
things than we could have.
So I think I'll use the first patch noted above as a hack to solve the
narrow problem I have now, but any comments on the above most
welcome. I'm not very familiar with the ref code in case that wasn't
obvious already.
B.t.w. the mention of f3b661f766 ("expire_reflog(): use a lock_file for
rewriting the reflog file", 2014-12-12) upthread is irrelevant. That's a
commit where we use the lockfile code to write out the *new* reflog,
which is unrelated to all of this.
next prev parent reply other threads:[~2019-03-13 10:28 UTC|newest]
Thread overview: 60+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-12 23:28 BUG: Race condition due to reflog expiry in "gc" Ævar Arnfjörð Bjarmason
2019-03-13 1:43 ` Junio C Hamano
2019-03-13 10:28 ` Ævar Arnfjörð Bjarmason [this message]
2019-03-13 16:02 ` Jeff King
2019-03-13 16:22 ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54 ` [PATCH 0/5] gc: minor code cleanup + contention fixes Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 0/7] " Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 0/8] " Ævar Arnfjörð Bjarmason
2019-03-18 6:13 ` Junio C Hamano
2019-03-28 16:14 ` [PATCH v4 0/7] gc: tests and handle reflog expire config Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 5/7] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 6/7] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-28 16:14 ` [PATCH v4 7/7] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 1/8] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 2/8] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 3/8] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 4/8] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 5/8] reflog tests: test for the "points nowhere" warning Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 6/8] reflog tests: assert lack of early exit with expiry="never" Ævar Arnfjörð Bjarmason
2019-03-19 6:20 ` Jeff King
2019-03-15 15:59 ` [PATCH v3 7/8] gc: handle & check gc.reflogExpire config Ævar Arnfjörð Bjarmason
2019-03-15 15:59 ` [PATCH v3 8/8] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
[not found] ` <b870a17d-2103-41b8-3cbc-7389d5fff33a@alum.mit.edu>
2019-03-21 14:10 ` Ævar Arnfjörð Bjarmason
2019-03-19 6:27 ` [PATCH v2 0/7] gc: minor code cleanup + contention fixes Jeff King
2019-03-14 12:34 ` [PATCH v2 1/7] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 2/7] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-15 9:51 ` Duy Nguyen
2019-03-15 10:42 ` Ævar Arnfjörð Bjarmason
2019-03-15 15:49 ` Johannes Schindelin
2019-03-14 12:34 ` [PATCH v2 3/7] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 4/7] reflog tests: make use of "test_config" idiom Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 5/7] reflog: exit early if there's no work to do Ævar Arnfjörð Bjarmason
2019-03-15 10:01 ` Duy Nguyen
2019-03-15 15:51 ` Ævar Arnfjörð Bjarmason
2019-03-14 12:34 ` [PATCH v2 6/7] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-15 10:05 ` Duy Nguyen
2019-03-15 10:24 ` Ævar Arnfjörð Bjarmason
2019-03-15 10:32 ` Duy Nguyen
2019-03-15 15:51 ` Johannes Schindelin
2019-03-18 6:04 ` Junio C Hamano
2019-03-14 12:34 ` [PATCH v2 7/7] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-15 11:10 ` Duy Nguyen
2019-03-15 15:52 ` Ævar Arnfjörð Bjarmason
2019-03-13 23:54 ` [PATCH 1/5] gc: remove redundant check for gc_auto_threshold Ævar Arnfjörð Bjarmason
2019-03-14 0:25 ` Jeff King
2019-03-13 23:54 ` [PATCH 2/5] gc: convert to using the_hash_algo Ævar Arnfjörð Bjarmason
2019-03-14 0:26 ` Jeff King
2019-03-13 23:54 ` [PATCH 3/5] gc: refactor a "call me once" pattern Ævar Arnfjörð Bjarmason
2019-03-14 0:30 ` Jeff King
2019-03-13 23:54 ` [PATCH 4/5] gc: don't run "reflog expire" when keeping reflogs Ævar Arnfjörð Bjarmason
2019-03-14 0:40 ` Jeff King
2019-03-14 4:51 ` Junio C Hamano
2019-03-14 19:26 ` Jeff King
2019-03-13 23:54 ` [PATCH 5/5] reflog expire: don't assert the OID when locking refs Ævar Arnfjörð Bjarmason
2019-03-14 0:44 ` Jeff King
2019-03-14 0:25 ` BUG: Race condition due to reflog expiry in "gc" Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87sgvrax0o.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jrnieder@gmail.com \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=stefanbeller@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).