git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: Johan Herland <johan@herland.net>, git@vger.kernel.org
Subject: Re: another packed-refs race
Date: Fri, 3 May 2013 17:21:44 -0400
Message-ID: <20130503212144.GA17698@sigill.intra.peff.net>
In-Reply-To: <20130503083847.GA16542@sigill.intra.peff.net>

On Fri, May 03, 2013 at 04:38:47AM -0400, Jeff King wrote:

> For reference, here's a script that demonstrates the problem during
> enumeration (sometimes for-each-ref fails to realize that
> refs/heads/master exists at all):
> 
>   # run this in one terminal
>   git init repo &&
>   cd repo &&
>   git commit --allow-empty -m foo &&
>   base=`git rev-parse HEAD` &&
>   while true; do
>     # this re-creates the loose ref in .git/refs/heads/master
>     git update-ref refs/heads/master $base &&

It turns out this is wrong. Git is smart enough not to bother writing
out the loose ref if it isn't changing. So the script as I showed it
actually ends up in a state with _neither_ the packed-refs file nor the
loose ref for an instant.

The correct script looks like this (it just flips between two objects):

  git init -q repo &&
  cd repo &&
  git commit -q --allow-empty -m one &&
  one=`git rev-parse HEAD` &&
  git commit -q --allow-empty -m two &&
  two=`git rev-parse HEAD` &&
  sha1=$one &&
  while true; do
    # this re-creates the loose ref in .git/refs/heads/master
    if test "$sha1" = "$one"; then
      sha1=$two
    else
      sha1=$one
    fi &&
    git update-ref refs/heads/master $sha1 &&

    # we can remove packed-refs safely, as we know that
    # its only value is now stale. Real git would not do
    # this, but we are simulating the case that "master"
    # simply wasn't included in the last packed-refs file.
    rm -f .git/packed-refs &&

    # and now we repack, which will create an up-to-date
    # packed-refs file, and then delete the loose ref
    git pack-refs --all --prune
  done

And a racy lookup check could look like this:

  cd repo &&
  while true; do
    ref=`git rev-parse --verify master`
    echo "==> $ref"
    test -z "$ref" && break
  done

It doesn't know which of the two flipping refs it will get on any given
invocation, but it should never see nothing. It should get one or the
other. With stock git, running these two loops simultaneously typically
causes a failure in the second one within about 15 seconds for me.
The (messy, not ready for application) patch below fixes it (at least, I
let it run for 30 minutes without a problem).

The fix is actually two-fold:

  1. Re-load the packed-refs file after each loose ref lookup
     failure. This is made more palatable by using stat() to avoid
     re-reading the file in the common case that it wasn't updated.

  2. The loose ref reading itself is actually not atomic. We call
     lstat() on the ref to find out whether it exists (and whether it is
     a symlink). If we get ENOENT, we fall back to looking for the ref
     in packed-refs. If it does exist and is a regular file, we proceed
     to open() it. But if the ref gets packed and pruned in the interim,
     our open() will fail and we just return NULL to say "oops, I guess
     it doesn't exist". We want the same fallback-to-packed behavior we
     would get if the lstat() failed.

     We could potentially do the same when we readlink() a symbolic
     link, but I don't think it is necessary. We do not pack symbolic
     refs, so if readlink gets ENOENT, it's OK to say "nope, the ref
     does not exist".

This doesn't cover the for_each_ref enumeration case at all, which
should still fail.  I'll try to look at that next.

---
diff --git a/refs.c b/refs.c
index de2d8eb..45a7ee6 100644
--- a/refs.c
+++ b/refs.c
@@ -708,6 +708,7 @@ static struct ref_cache {
 	struct ref_cache *next;
 	struct ref_entry *loose;
 	struct ref_entry *packed;
+	struct stat packed_validity;
 	/* The submodule name, or "" for the main repo. */
 	char name[FLEX_ARRAY];
 } *ref_cache;
@@ -717,6 +718,7 @@ static void clear_packed_ref_cache(struct ref_cache *refs)
 	if (refs->packed) {
 		free_ref_entry(refs->packed);
 		refs->packed = NULL;
+		memset(&refs->packed_validity, 0, sizeof(refs->packed_validity));
 	}
 }
 
@@ -876,19 +878,57 @@ static struct ref_dir *get_packed_refs(struct ref_cache *refs)
 	}
 }
 
+/*
+ * Returns 1 if the cached stat information matches the
+ * current state of the file, and 0 otherwise. This should
+ * probably be refactored to share code with ce_match_stat_basic,
+ * which has platform-specific knobs for which fields to respect.
+ */
+static int check_stat_validity(const struct stat *old, const char *fn)
+{
+	static struct stat null;
+	struct stat cur;
+
+	if (stat(fn, &cur))
+		return errno == ENOENT && !memcmp(old, &null, sizeof(null));
+	return cur.st_ino == old->st_ino &&
+	       cur.st_size == old->st_size &&
+	       cur.st_mtime == old->st_mtime;
+}
+
+/*
+ * Call fstat, but zero out the stat structure if for whatever
+ * reason we can't get an answer.
+ */
+static int safe_fstat(int fd, struct stat *out)
+{
+	int r = fstat(fd, out);
+	if (r)
+		memset(out, 0, sizeof(*out));
+	return r;
+}
+
 static struct ref_dir *get_packed_refs(struct ref_cache *refs)
 {
+	const char *packed_refs_file;
+
+	if (*refs->name)
+		packed_refs_file = git_path_submodule(refs->name, "packed-refs");
+	else
+		packed_refs_file = git_path("packed-refs");
+
+	if (refs->packed &&
+	    !check_stat_validity(&refs->packed_validity, packed_refs_file))
+		clear_packed_ref_cache(refs);
+
 	if (!refs->packed) {
-		const char *packed_refs_file;
 		FILE *f;
 
 		refs->packed = create_dir_entry(refs, "", 0, 0);
-		if (*refs->name)
-			packed_refs_file = git_path_submodule(refs->name, "packed-refs");
-		else
-			packed_refs_file = git_path("packed-refs");
+
 		f = fopen(packed_refs_file, "r");
 		if (f) {
+			safe_fstat(fileno(f), &refs->packed_validity);
 			read_packed_refs(f, get_ref_dir(refs->packed));
 			fclose(f);
 		}
@@ -1108,6 +1148,13 @@ const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int rea
 		git_snpath(path, sizeof(path), "%s", refname);
 
 		if (lstat(path, &st) < 0) {
+			/*
+			 * this lets us reuse this code path
+			 * for later syscall failures; it should
+			 * almost certainly just get factored out into a
+			 * function though
+			 */
+fallback_to_packed:
 			if (errno != ENOENT)
 				return NULL;
 			/*
@@ -1156,7 +1203,7 @@ const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int rea
 		 */
 		fd = open(path, O_RDONLY);
 		if (fd < 0)
-			return NULL;
+			goto fallback_to_packed;
 		len = read_in_full(fd, buffer, sizeof(buffer)-1);
 		close(fd);
 		if (len < 0)

Thread overview: 45+ messages
2013-05-03  8:38 another packed-refs race Jeff King
2013-05-03  9:26 ` Johan Herland
2013-05-03 17:28   ` Jeff King
2013-05-03 18:26     ` Jeff King
2013-05-03 21:02       ` Johan Herland
2013-05-06 12:12     ` Michael Haggerty
2013-05-06 18:44       ` Jeff King
2013-05-03 21:21 ` Jeff King [this message]
2013-05-06 12:03 ` Michael Haggerty
2013-05-06 18:41   ` Jeff King
2013-05-06 22:18     ` Jeff King
2013-05-07  4:32     ` Michael Haggerty
2013-05-07  4:44       ` Jeff King
2013-05-07  8:03         ` Michael Haggerty
2013-05-07  2:36 ` [PATCH 0/4] fix packed-refs races Jeff King
2013-05-07  2:38   ` [PATCH 1/4] resolve_ref: close race condition for packed refs Jeff King
2013-05-12 22:56     ` Michael Haggerty
2013-05-16  3:47       ` Jeff King
2013-05-16  5:50         ` Michael Haggerty
2013-05-12 23:26     ` Michael Haggerty
2013-06-11 14:26     ` [PATCH 0/4] Fix a race condition when reading loose refs Michael Haggerty
2013-06-11 14:26       ` [PATCH 1/4] resolve_ref_unsafe(): extract function handle_missing_loose_ref() Michael Haggerty
2013-06-11 14:26       ` [PATCH 2/4] resolve_ref_unsafe(): handle the case of an SHA-1 within loop Michael Haggerty
2013-06-11 14:26       ` [PATCH 3/4] resolve_ref_unsafe(): nest reference-reading code in an infinite loop Michael Haggerty
2013-06-11 14:26       ` [PATCH 4/4] resolve_ref_unsafe(): close race condition reading loose refs Michael Haggerty
2013-06-12  8:04         ` Jeff King
2013-06-13  8:22         ` Thomas Rast
2013-06-14  7:17           ` Michael Haggerty
2013-06-11 20:57       ` [PATCH 0/4] Fix a race condition when " Junio C Hamano
2013-05-07  2:39   ` [PATCH 2/4] add a stat_validity struct Jeff King
2013-05-13  2:29     ` Michael Haggerty
2013-05-13  3:00       ` [RFC 0/2] Separate stat_data from cache_entry Michael Haggerty
2013-05-13  3:00         ` [RFC 1/2] Extract a struct " Michael Haggerty
2013-05-13  3:00         ` [RFC 2/2] add a stat_validity struct Michael Haggerty
2013-05-13  5:10         ` [RFC 0/2] Separate stat_data from cache_entry Junio C Hamano
2013-05-16  3:51       ` [PATCH 2/4] add a stat_validity struct Jeff King
2013-05-07  2:43   ` [PATCH 3/4] get_packed_refs: reload packed-refs file when it changes Jeff King
2013-05-07  2:54     ` [PATCH 0/2] peel_ref cleanups changes Jeff King
2013-05-07  2:56       ` [PATCH 1/2] peel_ref: rename "sha1" argument to "peeled" Jeff King
2013-05-07  3:06       ` [PATCH 2/2] peel_ref: refactor for safety with simultaneous update Jeff King
2013-05-09 19:18     ` [PATCH 3/4] get_packed_refs: reload packed-refs file when it changes Eric Sunshine
2013-05-13  2:43     ` Michael Haggerty
2013-05-07  2:51   ` [PATCH 4/4] for_each_ref: load all loose refs before packed refs Jeff King
2013-05-07  6:40   ` [PATCH 0/4] fix packed-refs races Junio C Hamano
2013-05-07 14:19     ` Jeff King
