git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org
Subject: another packed-refs race
Date: Fri, 3 May 2013 04:38:48 -0400	[thread overview]
Message-ID: <20130503083847.GA16542@sigill.intra.peff.net> (raw)

I found another race related to the packed-refs code. Consider for a
moment what happens when we are looking at refs and another process does
a simultaneous "git pack-refs --all --prune", updating packed-refs and
deleting the loose refs.

If we are resolving a single ref, then we will either find its loose
form or not. If we do, we're done. If not, we can fall back on what was
in the packed-refs file. If we read the packed-refs file at that point,
we're OK. If the loose ref existed before but was pruned before we could
read it, then we know the packed-refs file is already in place, because
pack-refs would not have deleted the loose ref until it had finished
writing the new file. But imagine that we already loaded the packed-refs
file into memory earlier. We may be looking at an _old_ version of it
that has an irrelevant sha1 from the older packed-refs file. Or it might
not even exist in the packed-refs file at all, and we think the ref does
not resolve.

We could fix this by making sure our packed-refs file is up to date
before using it. E.g., resolving a ref with the following sequence:

  1. Look for loose ref. If found, OK.

  2. Compare inode/size/mtime/etc of on-disk packed-refs to their values
     from the last time we loaded it. If they're not the same, reload
     packed-refs from disk. Otherwise, continue.

  3. Look for ref in in-memory packed-refs.

Not too bad. We introduce one extra stat() for a ref that has been
packed, and the scheme isn't very complicated.

But what about enumerating refs via for_each_ref? It's possible to have
the same problem there, and the packed-refs file may be moved into place
midway through the process of enumerating the loose refs. So we may see
refs/heads/master, but when we get to refs/remotes/origin/master, it has
now been packed and pruned. I _think_ we can get by with:

  1. Generate the complete sorted list of loose refs.

  2. Check that packed-refs is stat-clean, and reload if necessary, as
     above.

  3. Merge the sorted loose and packed lists, letting loose override
     packed (so even if we get repacked halfway through our loose
     traversal and get half of the refs there, it's OK to see duplicates
     in packed-refs, which we will ignore).

This is not very far off of what we do now. Obviously we don't do the
stat-clean check in step 2. But we also don't generate the complete list
of loose refs before hitting the packed-refs file. Instead we lazily
load the loose directories as needed. And of course we cache that
information in memory, even though it may go out of date. I think the
best we could do is keep a stat() for each individual directory we see,
and check it before using the in-memory contents. That may be a lot of
stats, but it's still better than actually opening each loose ref
separately.

So I think it's possible to fix, but I thought you might have some
insight on the simplest way to fit it into the current ref code.

Did I explain the problem well enough to understand? Can you think of
any simpler or better solutions (or is there a case where my proposed
solutions don't work?).

For reference, here's a script that demonstrates the problem during
enumeration (sometimes for-each-ref fails to realize that
refs/heads/master exists at all):

  # run this in one terminal
  git init repo &&
  cd repo &&
  git commit --allow-empty -m foo &&
  base=`git rev-parse HEAD` &&
  while true; do
    # this re-creates the loose ref in .git/refs/heads/master
    git update-ref refs/heads/master $base &&

    # we can remove packed-refs safely, as we know that
    # its only value is now stale. Real git would not do
    # this, but we are simulating the case that "master"
    # simply wasn't included in the last packed-refs file.
    rm -f .git/packed-refs &&

    # and now we repack, which will create an up-to-date
    # packed-refs file, and then delete the loose ref
    git pack-refs --all --prune
  done

  # then simultaneously run this in another
  cd repo &&
  while true; do
    refs=`git for-each-ref`
    echo "==> $refs"
    test -z "$refs" && break
  done

Obviously the "rm -f packed-refs" above is contrived, but it does
simulate a real possible state. You could also do it with a packed-refs
file that has an outdated value for refs/heads/master, and demonstrate
that for-each-ref sees the outdated value.

-Peff

             reply	other threads:[~2013-05-03  8:38 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-03  8:38 Jeff King [this message]
2013-05-03  9:26 ` another packed-refs race Johan Herland
2013-05-03 17:28   ` Jeff King
2013-05-03 18:26     ` Jeff King
2013-05-03 21:02       ` Johan Herland
2013-05-06 12:12     ` Michael Haggerty
2013-05-06 18:44       ` Jeff King
2013-05-03 21:21 ` Jeff King
2013-05-06 12:03 ` Michael Haggerty
2013-05-06 18:41   ` Jeff King
2013-05-06 22:18     ` Jeff King
2013-05-07  4:32     ` Michael Haggerty
2013-05-07  4:44       ` Jeff King
2013-05-07  8:03         ` Michael Haggerty
2013-05-07  2:36 ` [PATCH 0/4] fix packed-refs races Jeff King
2013-05-07  2:38   ` [PATCH 1/4] resolve_ref: close race condition for packed refs Jeff King
2013-05-12 22:56     ` Michael Haggerty
2013-05-16  3:47       ` Jeff King
2013-05-16  5:50         ` Michael Haggerty
2013-05-12 23:26     ` Michael Haggerty
2013-06-11 14:26     ` [PATCH 0/4] Fix a race condition when reading loose refs Michael Haggerty
2013-06-11 14:26       ` [PATCH 1/4] resolve_ref_unsafe(): extract function handle_missing_loose_ref() Michael Haggerty
2013-06-11 14:26       ` [PATCH 2/4] resolve_ref_unsafe(): handle the case of an SHA-1 within loop Michael Haggerty
2013-06-11 14:26       ` [PATCH 3/4] resolve_ref_unsafe(): nest reference-reading code in an infinite loop Michael Haggerty
2013-06-11 14:26       ` [PATCH 4/4] resolve_ref_unsafe(): close race condition reading loose refs Michael Haggerty
2013-06-12  8:04         ` Jeff King
2013-06-13  8:22         ` Thomas Rast
2013-06-14  7:17           ` Michael Haggerty
2013-06-11 20:57       ` [PATCH 0/4] Fix a race condition when " Junio C Hamano
2013-05-07  2:39   ` [PATCH 2/4] add a stat_validity struct Jeff King
2013-05-13  2:29     ` Michael Haggerty
2013-05-13  3:00       ` [RFC 0/2] Separate stat_data from cache_entry Michael Haggerty
2013-05-13  3:00         ` [RFC 1/2] Extract a struct " Michael Haggerty
2013-05-13  3:00         ` [RFC 2/2] add a stat_validity struct Michael Haggerty
2013-05-13  5:10         ` [RFC 0/2] Separate stat_data from cache_entry Junio C Hamano
2013-05-16  3:51       ` [PATCH 2/4] add a stat_validity struct Jeff King
2013-05-07  2:43   ` [PATCH 3/4] get_packed_refs: reload packed-refs file when it changes Jeff King
2013-05-07  2:54     ` [PATCH 0/2] peel_ref cleanups changes Jeff King
2013-05-07  2:56       ` [PATCH 1/2] peel_ref: rename "sha1" argument to "peeled" Jeff King
2013-05-07  3:06       ` [PATCH 2/2] peel_ref: refactor for safety with simultaneous update Jeff King
2013-05-09 19:18     ` [PATCH 3/4] get_packed_refs: reload packed-refs file when it changes Eric Sunshine
2013-05-13  2:43     ` Michael Haggerty
2013-05-07  2:51   ` [PATCH 4/4] for_each_ref: load all loose refs before packed refs Jeff King
2013-05-07  6:40   ` [PATCH 0/4] fix packed-refs races Junio C Hamano
2013-05-07 14:19     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130503083847.GA16542@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).