git@vger.kernel.org mailing list mirror (one of many)
From: Michael Haggerty <mhagger@alum.mit.edu>
To: David Turner <dturner@twopensource.com>, git@vger.kernel.org
Subject: Re: [PATCH v4 13/21] refs: resolve symbolic refs first
Date: Thu, 18 Feb 2016 12:59:58 +0100
Message-ID: <56C5B23E.6090905@alum.mit.edu>
In-Reply-To: <1455755367.7528.57.camel@twopensource.com>

On 02/18/2016 01:29 AM, David Turner wrote:
> On Fri, 2016-02-12 at 15:09 +0100, Michael Haggerty wrote:
>> On 02/05/2016 08:44 PM, David Turner wrote:
>>> Before committing ref updates, split symbolic ref updates into two
>>> parts: an update to the underlying ref, and a log-only update to the
>>> symbolic ref.  This ensures that both references are locked correctly
>>> while their reflogs are updated.
>>>
>>> It is still possible to confuse git by concurrent updates, since the
>>> splitting of symbolic refs does not happen under lock. So a symbolic
>>> ref could be replaced by a plain ref in the middle of this operation,
>>> which would lead to reflog discontinuities and missed old-ref checks.
>>
>> This patch is doing too much at once for my little brain to follow.
>>
>> My first hangup is the change to setting RESOLVE_REF_NO_RECURSE
>> unconditionally in lock_ref_sha1_basic(). I count five callers of that
>> function and see no justification for why the change is OK in the
>> context of each caller. Here are some thoughts:
>>
>> * The call from files_create_symref() sets REF_NODEREF, so it is
>> unaffected by this change.
> 
> Yes.
> 
>> * The call from files_transaction_commit() is preceded by a call to
>> dereference_symrefs(), which I assume effectively replaces the need for
>> RESOLVE_REF_NO_RECURSE.
> 
> Yes.
> 
>> * There are two calls from files_rename_ref(). Why is it OK to do
>> without RESOLVE_REF_NO_RECURSE there?
>>
>>   * For the oldrefname call, I suppose the justification is the
>> "(flag & REF_ISSYMREF)" check earlier in the function. (But does this
>> introduce a significant TOCTOU race?)
> 
> The refs code as a whole seems likely to have TOCTOU issues. In
> general, anywhere we check/set flag & REF_ISSYMREF without holding a
> lock, we have a potential problem.  I haven't generally tried to handle
> these cases, since they're not presently handled.  

I agree that we don't do so well here, though I think that most races
would result in reading/writing a ref that was pointed to by the symref
a moment ago, which is usually indistinguishable to the user from their
update having gone through the moment before the symref was updated. So
I don't think your change makes this bit of code significantly worse.
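
To make the kind of race I mean concrete, here is a minimal sketch (in
Python rather than git's C, purely for brevity) of splitting a symref
update without holding a lock. Everything in it is made up for
illustration: the Update class, split_symref_updates(), and the
in-memory symrefs dict are assumptions, not git's actual data
structures or API, and it follows only one level of symref.

```python
from dataclasses import dataclass


@dataclass
class Update:
    refname: str
    new_value: str
    log_only: bool = False  # True: write a reflog entry only, not the ref


def split_symref_updates(updates, symrefs):
    """Split each update to a symbolic ref into two updates.

    `symrefs` is a hypothetical in-memory map from a symbolic ref name
    to the ref it currently points at. No lock is held while it is read.
    """
    split = []
    for update in updates:
        target = symrefs.get(update.refname)
        if target is None:
            # Plain ref: keep the update unchanged.
            split.append(update)
        else:
            # Symbolic ref: update the underlying ref, and record a
            # log-only update for the symbolic ref itself.
            split.append(Update(target, update.new_value))
            split.append(Update(update.refname, update.new_value, log_only=True))
    return split


symrefs = {"HEAD": "refs/heads/master"}
split = split_symref_updates([Update("HEAD", "abc123")], symrefs)

# A concurrent writer repoints HEAD after the split but before any lock
# is taken. The split update still names the old target.
symrefs["HEAD"] = "refs/heads/topic"
stale_target = split[0].refname
```

In this sketch the update lands on refs/heads/master, the ref that HEAD
pointed at a moment ago, which is the outcome described above.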

> The central problem with this area of the code is that commit interacts
> so intimately with the locking machinery.  I understand some of why
> it's done that way.  In particular, your change to ref locking to not
> hold lots of open files was a big win for us at Twitter.  But this
> means that it's hard to deal with cross-backend ref updates: you want
> to hold multiple locks, and backends don't have the machinery for it.
> 
> We could add backend hooks to specifically lock and unlock refs. Then
> the backend commit code would just be handed a bundle of locked refs
> and would commit them.  This might be hairy, but it could fix the
> TOCTOU problems.  So, first lock the outer refs, then split out updates
> for any which are symbolic refs, and lock those. Finally, commit all
> updates (split by backend).

As chance would have it, for an internal GitHub project I've implemented
hooks that can be called *during* a ref transaction. The hooks can, for
example, take arbitrary actions between the time that the ref locks are
all acquired and the time that the updates start to be committed. I
didn't submit this code upstream because I didn't think that it would
benefit other users, but maybe it would be useful for implementing
split-backend reference transaction commits. E.g., the primary reference
transaction could run the secondary backend's commit while holding the
locks for the primary backend references.
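
As a rough sketch of how the pieces could fit together (again in Python
for brevity, and again entirely hypothetical): lock the outer refs,
split symrefs while holding those locks, lock the targets, run a hook,
then commit per backend. The Backend class, commit_transaction(), the
hook argument, and the rule that HEAD lives in a "files" backend while
everything else lives in "lmdb" are assumptions for illustration. This
is not the GitHub-internal code and not git's refs API, and it ignores
reflogs, old-value checks, and conflicting updates to the same target.

```python
import threading
from collections import defaultdict


class Backend:
    """Hypothetical ref backend: ref values plus one lock per ref."""

    def __init__(self, name):
        self.name = name
        self.values = {}
        self.locks = defaultdict(threading.Lock)


def commit_transaction(updates, symrefs, backend_for, hook=None):
    """Commit `updates` (refname -> new value) across backends.

    `symrefs` maps a symbolic ref to its target; `backend_for` maps a
    refname to the Backend that stores it.
    """
    held = []  # (backend, refname) pairs in acquisition order

    def take(refname):
        backend = backend_for(refname)
        if (backend, refname) not in held:
            backend.locks[refname].acquire()
            held.append((backend, refname))
        return backend

    try:
        # 1. Lock the outer refs.
        for refname in sorted(updates):
            take(refname)
        # 2. Split symrefs under lock, locking each target as we go.
        per_backend = defaultdict(dict)
        for refname in sorted(updates):
            target = symrefs.get(refname, refname)
            per_backend[take(target)][target] = updates[refname]
        # 3. Run the hook while every lock is still held.
        if hook is not None:
            hook(held)
        # 4. Commit the updates, grouped by backend.
        for backend, changes in per_backend.items():
            backend.values.update(changes)
    finally:
        for backend, refname in reversed(held):
            backend.locks[refname].release()


files = Backend("files")
lmdb = Backend("lmdb")


def backend_for(refname):
    return files if refname == "HEAD" else lmdb


seen = []
commit_transaction(
    {"HEAD": "abc123"},
    {"HEAD": "refs/heads/master"},
    backend_for,
    hook=lambda held: seen.extend(name for _, name in held),
)
```

Because the symref is resolved only after its own lock is taken, a
concurrent writer that honors the same locks cannot repoint it between
the split and the commit. Taking the target locks in whatever order
the split produces them could deadlock against another transaction, so
a real implementation would need a consistent lock order across
backends.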

Let me think about it.

I don't think this is urgent though. The current code is not
significantly racy in mainstream usage scenarios, right?

> One downside of this is that right now, the backend API is relatively
> close to the front-end, and this would leak what should be an
> implementation detail.  But maybe this is necessary to knit multiple
> backends together.  
> 
> But I'm not sure that this is necessary right now, because I'm not sure
> that I'm actually making TOCTOU issues much worse. 

Agreed.

> [...]
> That's a legit complaint.  The problem, as you note, is that doing some
> of these steps completely independently doesn't work.  But I'll try
> splitting out what I can.

Thanks!

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu
