From: Michael Haggerty <mhagger@alum.mit.edu>
To: David Turner <dturner@twopensource.com>, git@vger.kernel.org
Subject: Re: [PATCH v4 13/21] refs: resolve symbolic refs first
Date: Thu, 18 Feb 2016 12:59:58 +0100
Message-ID: <56C5B23E.6090905@alum.mit.edu>
In-Reply-To: <1455755367.7528.57.camel@twopensource.com>

On 02/18/2016 01:29 AM, David Turner wrote:
> On Fri, 2016-02-12 at 15:09 +0100, Michael Haggerty wrote:
>> On 02/05/2016 08:44 PM, David Turner wrote:
>>> Before committing ref updates, split symbolic ref updates into two
>>> parts: an update to the underlying ref, and a log-only update to the
>>> symbolic ref. This ensures that both references are locked correctly
>>> while their reflogs are updated.
>>>
>>> It is still possible to confuse git by concurrent updates, since the
>>> splitting of symbolic refs does not happen under lock. So a symbolic
>>> ref could be replaced by a plain ref in the middle of this operation,
>>> which would lead to reflog discontinuities and missed old-ref checks.
>>
>> This patch is doing too much at once for my little brain to follow.
>>
>> My first hangup is the change to setting RESOLVE_REF_NO_RECURSE
>> unconditionally in lock_ref_sha1_basic(). I count five callers of that
>> function and see no justification for why the change is OK in the
>> context of each caller. Here are some thoughts:
>>
>> * The call from files_create_symref() sets REF_NODEREF, so it is
>> unaffected by this change.
>
> Yes.
>
>> * The call from files_transaction_commit() is preceded by a call to
>> dereference_symrefs(), which I assume effectively replaces the need for
>> RESOLVE_REF_NO_RECURSE.
>
> Yes.
>
>> * There are two calls from files_rename_ref(). Why is it OK to do
>> without RESOLVE_REF_NO_RECURSE there?
>>
>> * For the oldrefname call, I suppose the justification is the
>> "(flag & REF_ISSYMREF)" check earlier in the function. (But does this
>> introduce a significant TOCTOU race?)
>
> The refs code as a whole seems likely to have TOCTOU issues. In
> general, anywhere we check/set flag & REF_ISSYMREF without holding a
> lock, we have a potential problem. I haven't generally tried to handle
> these cases, since they're not presently handled.

I agree that we don't do so well here, though I think that most races
would result in reading/writing a ref that was pointed to by the symref
a moment ago, which is usually indistinguishable to the user from their
update having gone through the moment before the symref was updated. So
I don't think your change makes this bit of code significantly worse.

> The central problem with this area of the code is that commit interacts
> so intimately with the locking machinery. I understand some of why
> it's done that way. In particular, your change to ref locking to not
> hold lots of open files was a big win for us at Twitter. But this
> means that it's hard to deal with cross-backend ref updates: you want
> to hold multiple locks, and backends don't have the machinery for it.
>
> We could add backend hooks to specifically lock and unlock refs. Then
> the backend commit code would just be handed a bundle of locked refs
> and would commit them. This might be hairy, but it could fix the
> TOCTOU problems. So, first lock the outer refs, then split out updates
> for any which are symbolic refs, and lock those. Finally, commit all
> updates (split by backend).
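
The ordering proposed above (lock the outer ref, then look at whether it
is symbolic, then lock its target, then commit) can be sketched roughly
as follows. This is only a toy model under stated assumptions: the
struct toy_ref table, lock_ref() and toy_update_ref() are hypothetical
names invented for illustration and are not git's refs API, the "lock"
is a plain flag rather than a lock file, and grouping the updates by
backend is left out entirely.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy in-memory ref (hypothetical, not git's struct ref_update). */
struct toy_ref {
    const char *name;
    const char *symref_target; /* NULL for a plain ref */
    char value[41];            /* current value of a plain ref */
    int locked;                /* 1 while some updater holds the lock */
    int log_entries;           /* number of reflog lines written */
};

static struct toy_ref *find_ref(struct toy_ref *refs, int n, const char *name)
{
    for (int i = 0; i < n; i++)
        if (!strcmp(refs[i].name, name))
            return &refs[i];
    return NULL;
}

/* Take the lock; fail if it is already held. */
static int lock_ref(struct toy_ref *ref)
{
    if (ref->locked)
        return -1;
    ref->locked = 1;
    return 0;
}

/*
 * Update `name` to `new_value`:
 *   1. lock the outer ref,
 *   2. only then check whether it is symbolic (no unlocked check),
 *   3. if it is, lock the target; the outer ref becomes log-only,
 *   4. write the value and reflog entries, then release the locks.
 * Returns 0 on success, -1 if a ref is missing or a lock is busy.
 */
static int toy_update_ref(struct toy_ref *refs, int n,
                          const char *name, const char *new_value)
{
    struct toy_ref *outer = find_ref(refs, n, name);
    struct toy_ref *target = NULL;

    if (!outer || lock_ref(outer))
        return -1;
    if (outer->symref_target) {
        target = find_ref(refs, n, outer->symref_target);
        if (!target || lock_ref(target)) {
            outer->locked = 0; /* back out the outer lock */
            return -1;
        }
    }

    struct toy_ref *dst = target ? target : outer;
    snprintf(dst->value, sizeof(dst->value), "%s", new_value);
    dst->log_entries++;
    if (target) {
        outer->log_entries++; /* log-only update on the symref */
        target->locked = 0;
    }
    outer->locked = 0;
    return 0;
}
```

Because the symref check happens only after the outer lock is held, a
concurrent updater that follows the same protocol cannot replace the
symref with a plain ref between the check and the write. Calling
toy_update_ref() on a "HEAD" entry that points at "refs/heads/master"
changes only the target's value and adds one reflog entry to each.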

As chance would have it, for an internal GitHub project I've implemented
hooks that can be called *during* a ref transaction. The hooks can, for
example, take arbitrary actions between the time that the ref locks are
all acquired and the time that the updates start to be committed. I
didn't submit this code upstream because I didn't think that it would
benefit other users, but maybe it would be useful for implementing
split-backend reference transaction commits. E.g., the primary reference
transaction could run the secondary backend's commit while holding the
locks for the primary backend references.

Let me think about it.
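
To make the idea concrete, here is a minimal sketch of a commit routine
that calls a hook after all locks are acquired and before any update is
written. It is not the internal code mentioned above, which is not
shown in this thread; locked_hook_fn, struct toy_write, toy_commit() and
commit_secondary() are all hypothetical names, and the locks are plain
integer flags standing in for real ref locks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs once all locks are held, before commit. */
typedef int (*locked_hook_fn)(void *data);

struct toy_write {
    int *slot;     /* storage the update writes to */
    int new_value;
    int *lock;     /* 0 = free, 1 = held */
};

/* Release the locks of the first `n` writes. */
static void unlock_first(struct toy_write *w, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *w[i].lock = 0;
}

/*
 * Commit `n` writes as one transaction.
 * Returns 0 on success, -1 if a lock is busy, -2 if the hook failed.
 * On failure nothing is written and every lock taken here is released.
 */
static int toy_commit(struct toy_write *w, size_t n,
                      locked_hook_fn hook, void *data)
{
    size_t i;

    for (i = 0; i < n; i++) {          /* phase 1: acquire all locks */
        if (*w[i].lock) {
            unlock_first(w, i);
            return -1;
        }
        *w[i].lock = 1;
    }
    if (hook && hook(data) != 0) {     /* phase 2: hook, locks held */
        unlock_first(w, n);
        return -2;
    }
    for (i = 0; i < n; i++)            /* phase 3: write the updates */
        *w[i].slot = w[i].new_value;
    unlock_first(w, n);
    return 0;
}

/* Example hook: commit a toy "secondary backend" value. */
struct secondary {
    int value;
    int pending;
    int fail;      /* set to simulate a failed secondary commit */
};

static int commit_secondary(void *data)
{
    struct secondary *s = data;

    if (s->fail)
        return -1;
    s->value = s->pending;
    return 0;
}
```

Passing commit_secondary as the hook runs the secondary commit while the
primary locks are held; if it fails, the primary writes are abandoned.
Note that a secondary commit that has already succeeded is not rolled
back if a later step fails, so this sketch does not by itself give full
atomicity across backends.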

I don't think this is urgent though. The current code is not
significantly racy in mainstream usage scenarios, right?

> One downside of this is that right now, the backend API is relatively
> close to the front-end, and this would leak what should be an
> implementation detail. But maybe this is necessary to knit multiple
> backends together.
>
> But I'm not sure that this is necessary right now, because I'm not sure
> that I'm actually making TOCTOU issues much worse.

Agreed.

> [...]
> That's a legit complaint. The problem, as you note, is that doing some
> of these steps completely independently doesn't work. But I'll try
> splitting out what I can.

Thanks!
Michael
--
Michael Haggerty
mhagger@alum.mit.edu