git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>, Johan Herland <johan@herland.net>,
	git@vger.kernel.org, Michael Haggerty <mhagger@alum.mit.edu>
Subject: [PATCH 00/12] Fix some reference-related races
Date: Tue, 11 Jun 2013 23:48:20 +0200	[thread overview]
Message-ID: <1370987312-6761-1-git-send-email-mhagger@alum.mit.edu> (raw)

*This patch series must be built on top of mh/reflife.*

This patch series fixes some races reading loose and packed refs.
Most of the problems, and some of the solutions, were pointed out by
Jeff King [1] but some other work was necessary to prevent his fixes
from causing problems elsewhere.

The basic race being addressed is that at any time "pack-refs --prune"
might be run at any time.  It rewrites the packed-refs file and then
deletes the just-packed loose refs.

Readers, then, to get a self-consistent snapshot of references [2],
must be sure to read all of the loose references it will need *before*
reading the packed references (or at least verifying that the packed
references that it read earlier are still valid).  But given the
lazy-loading of the loose references cache, this was not always the
case.

So the core of this patch series is to force loose references for an
iteration to be read all at once into the cache, and *then* to verify
that the packed-refs cache is up-to-date and if not to reload it.
Similarly, when looking up single references, a loose reference is
sought first, and then the validity of the packed-refs cache is
verified, and then the loose reference is sought in the cache.

The problem is that there was a lot of code that assumed that the
lifetime of the reference cache was essentially infinite.  The
mh/reflife patch series (which has made it to next) fixed callers who
retained pointers to refnames in the cache.

The other problem was that the for_each_ref() functions will die if
the ref cache that they are iterating over is freed out from under
them.  This problem is solved by using reference counts to avoid
freeing the old packed ref cache (even if it is no longer valid) until
all users are done with it.

Once those are done, it is possible to invalidate the packed refs
cache when needed.  So (1) we always read all loose references that
will be needed in an iteration before the iteration starts, and (2) we
add a check (based on file metadata) whenever the packed-refs cache is
accessed that it is still up-to-date WRT the packed-refs file, and if
not reread it (but leave the old copy in memory as long as its
refcount is nonzero).

Along the way, this patch series adds simple transactions around the
packed-refs file/cache.  The transaction interface is public.  I think
this is a step in a good direction, because other race conditions not
addressed by this patch series are likely to require transactions
across the whole reference namespace to be made 100% reliable.

As a stress test, the test suite can be run with a simulated
"hyperactive repository" in which the packed-refs file is made to look
like it changes every time it is checked (except when its lock is
held):

    ------------------------------------ refs.c ------------------------------------
    @@ -1075,8 +1075,8 @@ static struct packed_ref_cache *get_packed_ref_cache(struct ref_cache *refs)
     	else
     		packed_refs_file = git_path("packed-refs");

    -	if (refs->packed &&
    -	    !stat_validity_check(&refs->packed->validity, packed_refs_file))
    +	if (refs->packed && !refs->packed->lock
    +	    /*!stat_validity_check(&refs->packed->validity, packed_refs_file)*/)
     		clear_packed_ref_cache(refs);

     	if (!refs->packed) {


It passes the stress test.

[1] http://thread.gmane.org/gmane.comp.version-control.git/223299/focus=223526

Jeff King (2):
  get_packed_ref_cache: reload packed-refs file when it changes
  for_each_ref: load all loose refs before packed refs

Michael Haggerty (10):
  repack_without_ref(): split list curation and entry writing
  pack_refs(): split creation of packed refs and entry writing
  refs: wrap the packed refs cache in a level of indirection
  refs: implement simple transactions for the packed-refs file
  refs: manage lifetime of packed refs cache via reference counting
  do_for_each_entry(): increment the packed refs cache refcount
  packed_ref_cache: increment refcount when locked
  Extract a struct stat_data from cache_entry
  add a stat_validity struct
  refs: do not invalidate the packed-refs cache unnecessarily

 builtin/clone.c    |   7 +-
 builtin/ls-files.c |  12 ++-
 cache.h            |  60 +++++++++--
 read-cache.c       | 181 +++++++++++++++++++------------
 refs.c             | 308 ++++++++++++++++++++++++++++++++++++++++++++---------
 refs.h             |  27 ++++-
 6 files changed, 464 insertions(+), 131 deletions(-)

-- 
1.8.3

             reply	other threads:[~2013-06-11 21:48 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-06-11 21:48 Michael Haggerty [this message]
2013-06-11 21:48 ` [PATCH 01/12] repack_without_ref(): split list curation and entry writing Michael Haggerty
2013-06-12 11:38   ` Jeff King
2013-06-12 11:56     ` Michael Haggerty
2013-06-11 21:48 ` [PATCH 02/12] pack_refs(): split creation of packed refs " Michael Haggerty
2013-06-11 21:48 ` [PATCH 03/12] refs: wrap the packed refs cache in a level of indirection Michael Haggerty
2013-06-11 21:48 ` [PATCH 04/12] refs: implement simple transactions for the packed-refs file Michael Haggerty
2013-06-12 12:01   ` Jeff King
2013-06-11 21:48 ` [PATCH 05/12] refs: manage lifetime of packed refs cache via reference counting Michael Haggerty
2013-06-11 21:48 ` [PATCH 06/12] do_for_each_entry(): increment the packed refs cache refcount Michael Haggerty
2013-06-11 21:48 ` [PATCH 07/12] packed_ref_cache: increment refcount when locked Michael Haggerty
2013-06-11 21:48 ` [PATCH 08/12] Extract a struct stat_data from cache_entry Michael Haggerty
2013-06-11 21:48 ` [PATCH 09/12] add a stat_validity struct Michael Haggerty
2013-06-11 21:48 ` [PATCH 10/12] get_packed_ref_cache: reload packed-refs file when it changes Michael Haggerty
2013-06-11 21:48 ` [PATCH 11/12] for_each_ref: load all loose refs before packed refs Michael Haggerty
2013-06-11 21:48 ` [PATCH 12/12] refs: do not invalidate the packed-refs cache unnecessarily Michael Haggerty
2013-06-12 12:39   ` Jeff King
2013-06-12 12:52 ` [PATCH 00/12] Fix some reference-related races Jeff King
2013-06-15 20:13 ` Ramsay Jones
2013-06-16  5:50   ` Michael Haggerty
2013-06-18 18:13     ` Ramsay Jones
2013-06-19  5:51       ` Michael Haggerty

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1370987312-6761-1-git-send-email-mhagger@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=johan@herland.net \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).