From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>, Johan Herland <johan@herland.net>,
	git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH 11/12] for_each_ref: load all loose refs before packed refs
Date: Tue, 11 Jun 2013 23:48:31 +0200
Message-ID: <1370987312-6761-12-git-send-email-mhagger@alum.mit.edu>
In-Reply-To: <1370987312-6761-1-git-send-email-mhagger@alum.mit.edu>

From: Jeff King <peff@peff.net>

If we are iterating through the refs using for_each_ref (or
any of its sister functions), we can get into a race
condition with a simultaneous "pack-refs --prune" that looks
like this:

  0. We have a large number of loose refs, and a few packed
     refs. refs/heads/z/foo is loose, with no matching entry
     in the packed-refs file.

  1. Process A starts iterating through the refs. It loads
     the packed-refs file from disk, then starts lazily
     traversing through the loose ref directories.

  2. Process B, running "pack-refs --prune", writes out the
     new packed-refs file. It then deletes the newly packed
     refs, including refs/heads/z/foo.

  3. Meanwhile, process A has finally gotten to
     refs/heads/z (it traverses alphabetically). It
     descends, but finds nothing there.  It checks its
     cached view of the packed-refs file, but it does not
     mention anything in "refs/heads/z/" at all (it predates
     the new file written by B in step 2).

The traversal completes successfully without mentioning
refs/heads/z/foo at all. The name itself is not important,
but the more refs you have and the farther down the
alphabetical list a ref is, the more likely it is to hit
the race. If refs/heads/z/foo did exist in the packed-refs
file at step 0, we would see an entry for it, but it would
show whatever sha1 the ref had the last time it was packed
(which could be an arbitrarily long time ago).

This can be especially dangerous when process A is "git
prune", as it means our set of reachable tips will be
incomplete, and we may erroneously prune objects reachable
from that tip (the same thing can happen if "repack -ad" is
used, as it simply drops unreachable objects that are
packed).

This patch solves it by loading all of the loose refs for
our traversal into our in-memory cache, and then refreshing
the packed-refs cache. Because a pack-refs writer will
always put the new packed-refs file into place before
starting the prune, we know that any loose refs we fail to
see will either truly be missing, or will have already been
put in the packed-refs file by the time we refresh.
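
To make the ordering argument concrete, here is a toy model that is
not part of the patch. It tracks a single ref and enumerates every
interleaving of a two-step reader with a two-step "pack-refs --prune"
writer. All names in it (struct disk, struct view, writer_step,
reader_step, count_missed) are hypothetical and exist only for this
sketch; the real code in refs.c works on ref_dir caches, not booleans.

```c
#include <stdbool.h>

/* Toy on-disk state for one ref, e.g. refs/heads/z/foo. */
struct disk {
	bool loose;	/* a loose ref file exists */
	bool packed;	/* the packed-refs file has an entry */
};

/* What the reader has cached in memory so far. */
struct view {
	bool loose;
	bool packed;
};

enum read_order { PACKED_FIRST, LOOSE_FIRST };

/* Writer: step 0 writes packed-refs, step 1 deletes the loose ref. */
static void writer_step(int step, struct disk *d)
{
	if (step == 0)
		d->packed = true;
	else
		d->loose = false;
}

/* Reader: two reads, in the order selected by 'order'. */
static void reader_step(enum read_order order, int step,
			const struct disk *d, struct view *v)
{
	bool read_loose = (order == LOOSE_FIRST) ? (step == 0) : (step == 1);

	if (read_loose)
		v->loose = d->loose;
	else
		v->packed = d->packed;
}

/*
 * Count the interleavings (out of 6) in which the reader ends up
 * seeing the ref neither as loose nor as packed.
 */
static int count_missed(enum read_order order)
{
	int missed = 0;
	unsigned mask;

	/* Bit i of mask set means slot i of the schedule is a reader step. */
	for (mask = 0; mask < 16; mask++) {
		struct disk d = { true, false };	/* step 0: loose only */
		struct view v = { false, false };
		int i, bits = 0, rstep = 0, wstep = 0;

		for (i = 0; i < 4; i++)
			bits += (mask >> i) & 1;
		if (bits != 2)
			continue;	/* need exactly two reader steps */

		for (i = 0; i < 4; i++) {
			if ((mask >> i) & 1)
				reader_step(order, rstep++, &d, &v);
			else
				writer_step(wstep++, &d);
		}
		if (!v.loose && !v.packed)
			missed++;
	}
	return missed;
}
```

In this model count_missed(PACKED_FIRST) finds one losing schedule
(read packed, write packed, delete loose, read loose), which is the
race described above, while count_missed(LOOSE_FIRST) finds none.
The model relies on the same assumption as the patch: the writer
always puts the new packed-refs file in place before it deletes any
loose ref.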

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
Ditto.

 refs.c | 39 +++++++++++++++++++++++++++++++++++----
 1 file changed, 35 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 64f72ab..aa4641b 100644
--- a/refs.c
+++ b/refs.c
@@ -746,6 +746,21 @@ static int do_for_each_entry_in_dirs(struct ref_dir *dir1,
 }
 
 /*
+ * Load all of the refs from the dir into our in-memory cache. The hard work
+ * of loading loose refs is done by get_ref_dir(), so we just need to recurse
+ * through all of the sub-directories. We do not even need to care about
+ * sorting, as traversal order does not matter to us.
+ */
+static void prime_ref_dir(struct ref_dir *dir)
+{
+	int i;
+	for (i = 0; i < dir->nr; i++) {
+		struct ref_entry *entry = dir->entries[i];
+		if (entry->flag & REF_DIR)
+			prime_ref_dir(get_ref_dir(entry));
+	}
+}
+/*
  * Return true iff refname1 and refname2 conflict with each other.
  * Two reference names conflict if one of them exactly matches the
  * leading components of the other; e.g., "foo/bar" conflicts with
@@ -1600,15 +1615,31 @@ void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname)
 static int do_for_each_entry(struct ref_cache *refs, const char *base,
 			     each_ref_entry_fn fn, void *cb_data)
 {
-	struct packed_ref_cache *packed_ref_cache = get_packed_ref_cache(refs);
-	struct ref_dir *packed_dir = get_packed_ref_dir(packed_ref_cache);
-	struct ref_dir *loose_dir = get_loose_refs(refs);
+	struct packed_ref_cache *packed_ref_cache;
+	struct ref_dir *loose_dir;
+	struct ref_dir *packed_dir;
 	int retval = 0;
 
+	/*
+	 * We must make sure that all loose refs are read before accessing the
+	 * packed-refs file; this avoids a race condition in which loose refs
+	 * are migrated to the packed-refs file by a simultaneous process, but
+	 * our in-memory view is from before the migration. get_packed_ref_cache()
+	 * takes care of making sure our view is up to date with what is on
+	 * disk.
+	 */
+	loose_dir = get_loose_refs(refs);
+	if (base && *base) {
+		loose_dir = find_containing_dir(loose_dir, base, 0);
+	}
+	if (loose_dir)
+		prime_ref_dir(loose_dir);
+
+	packed_ref_cache = get_packed_ref_cache(refs);
 	acquire_packed_ref_cache(packed_ref_cache);
+	packed_dir = get_packed_ref_dir(packed_ref_cache);
 	if (base && *base) {
 		packed_dir = find_containing_dir(packed_dir, base, 0);
-		loose_dir = find_containing_dir(loose_dir, base, 0);
 	}
 
 	if (packed_dir && loose_dir) {
-- 
1.8.3
