git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: David Turner <David.Turner@twosigma.com>
Cc: 'Howard Chu' <hyc@symas.com>,
	"spearce@spearce.org" <spearce@spearce.org>,
	"avarab@gmail.com" <avarab@gmail.com>,
	"ben.alex@acegi.com.au" <ben.alex@acegi.com.au>,
	"dborowitz@google.com" <dborowitz@google.com>,
	"git@vger.kernel.org" <git@vger.kernel.org>,
	"gitster@pobox.com" <gitster@pobox.com>,
	"mhagger@alum.mit.edu" <mhagger@alum.mit.edu>,
	"sbeller@google.com" <sbeller@google.com>,
	"stoffe@gmail.com" <stoffe@gmail.com>
Subject: Re: reftable [v5]: new ref storage format
Date: Mon, 14 Aug 2017 23:54:33 -0400
Message-ID: <20170815035432.kyrrqoagoxouwyln@sigill.intra.peff.net>
In-Reply-To: <4c1c1fc9904f4678823b6c3054c02b4d@exmbdft7.ad.twosigma.com>

On Mon, Aug 14, 2017 at 04:05:05PM +0000, David Turner wrote:

> > All that aside, we could simply add an EXCLUSIVE open-flag to LMDB, and
> > prevent multiple processes from using the DB concurrently. In that case,
> > maintaining coherence with other NFS clients is a non-issue. It strikes me that git
> > doesn't require concurrent multi-process access anyway, and any particular
> > process would only use the DB for a short time before closing it and going away.
> 
> Git, in general, does require concurrent multi-process access, depending on what 
> that means.
> 
> For example, a post-receive hook might call some git command which opens the 
> ref database.  This means that git receive-pack would have to close and 
> re-open the ref database.  More generally, a fair number of git commands are
> implemented in terms of other git commands, and might need the same treatment.
> We could, in general, close and re-open the database around fork/exec, but I am
> not sure that this solves the general problem -- by mere happenstance, one might
> be e.g. pushing in one terminal while running git checkout in another.  This is 
> especially true with git worktrees, which share one ref database across multiple
> working directories.

Yeah, I'd agree that git's multi-process way of working would probably
cause some headaches if there were a broad lock.

I had the impression that Howard meant we would lock for _read_
operations, too. If so, I think that's probably going to cause a
noticeable performance problem for servers.  A repository which is
serving fetches to a lot of clients (even if some of those are no-ops)
has to send the current ref state out to each client. I don't think we'd
want to add a serial bottleneck to that portion of each process, which
can otherwise happen totally in parallel.

Serializing writes is probably not so big a deal as long as it is kept
to the portion where the process is actively writing out values, and as
long as there's a reasonable backoff/retry protocol. Right now we don't
generally bother retrying ref locks because they're taken individually,
so racing on a lock almost certainly[1] means that you've lost the
sha1-lease and need to restart the larger operation.
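
To make the backoff/retry idea concrete, here is a minimal sketch in
Python. It is not git's actual code: the function names, the timeout,
and the delay values are all hypothetical, and it assumes the broad
lock is a lock file created with O_EXCL. A writer that loses the race
sleeps for a random, growing interval and tries again until a deadline
passes.

```python
import os
import random
import time


def acquire_lock(path, timeout=1.0, initial_delay=0.001, max_delay=0.1):
    """Create `path` exclusively, retrying with jittered exponential backoff.

    Returns an open file descriptor on success, or None if `timeout`
    seconds pass without getting the lock. All defaults are illustrative.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            # O_EXCL makes creation fail if the lock file already exists.
            return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
        except FileExistsError:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return None
            # Sleep a random fraction of the current delay so that
            # competing writers do not retry in lockstep.
            time.sleep(min(remaining, random.uniform(0, delay)))
            delay = min(delay * 2, max_delay)


def release_lock(path, fd):
    """Close the descriptor and remove the lock file."""
    os.close(fd)
    os.unlink(path)
```

The point of the deadline is that a caller who gets None back can still
fall back to today's behavior and restart the larger operation, while
short write-side contention is absorbed by the retries.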

-Peff

[1] Actually, we've found this isn't always true. Things like ref
    packing require taking locks for correctness, which means they can
    interfere with actual ref updates. That's yet another thing it would
    be nice to get rid of when moving away from the loose/packed
    storage.


