From: Martin Fick <mfick@codeaurora.org>
To: Michael Haggerty <mhagger@alum.mit.edu>, Shawn Pearce <sop@google.com>
Cc: Jeff King <peff@peff.net>,
git@vger.kernel.org, Junio C Hamano <gitster@pobox.com>
Subject: Re: Lockless Refs? (Was [PATCH] refs: do not use cached refs in repack_without_ref)
Date: Mon, 31 Dec 2012 03:30:53 -0700 [thread overview]
Message-ID: <201212310330.53835.mfick@codeaurora.org> (raw)
In-Reply-To: <201212271611.52203.mfick@codeaurora.org>
On Thursday, December 27, 2012 04:11:51 pm Martin Fick wrote:
> It concerns me that git uses any locking at all, even for
> refs since it has the potential to leave around stale
> locks.
> ...
> [a previous not so great attempt to fix this]
> ...
I may have finally figured out a working loose ref update
mechanism which I think can avoid stale locks. Unfortunately
it requires atomic directory renames and universally unique
identifiers (uuids). These may be no-go criteria? But I
figure it is worth at least exploring this idea because of the
potential benefits?
The general approach is to setup a transaction and either
commit or abort it. A transaction can be setup by renaming
an appropriately setup directory to the "ref.lock" name. If
the rename succeeds, the transaction is begun. Any actor can
abort the transaction (up until it is committed) by simply
deleting the "ref.lock" directory, so it is not at risk of
going stale. However, once the actor who sets up the
transaction commits it, deleting the "ref.lock" directory
simply aids in cleaning it up for the next transaction
(instead of aborting it).
One important piece of the transaction is the use of uuids.
The uuids provide a mechanism to tie the atomic commit pieces
to the transactions and thus to prevent long sleeping process
from inadvertently performing actions which could be out of
date when they wake finally up. In each case, the atomic
commit piece is the renaming of a file. For the create and
update pieces, a file is renamed from the "ref.lock" dir to
the "ref" file resulting in an update to the sha for the ref.
However, in the delete case, the "ref" file is instead renamed
to end up in the "ref.lock" directory resulting in a delete
of the ref. This scheme does not affect the way refs are read
today,
To prepare for a transaction, an actor first generates a uuid
(an exercise I will delay for now). Next, a tmp directory
named after the uuid is generated in the parent directory for
the ref to be updated, perhaps something like: ".lock_uuid".
In this directory is places either a file or a directory named
after the uuid, something like: ".lock_uuid/,uuid". In the
case of a create or an update, the new sha is written to this
file. In the case of a delete, it is a directory.
Once the tmp directory is setup, the initiating actor
attempts to start the transaction by renaming the tmp
directory to "ref.lock". If the rename fails, the update
fails. If the rename succeeds, the actor can then attempt to
commit the transaction (before another actor aborts it).
In the case of a create, the actor verifies that "ref" does
not currently exist, and then renames the now named
"ref.lock/uuid" file to "ref". On success, the ref was
created.
In the case of an update, the actor verifies that "ref"
currently contains the old sha, and then also renames the now
named "ref.lock/uuid" file to "ref". On success, the ref was
updated.
In the case of a delete, the actor may verify that "ref"
currently contains the sha to "prune" if it needs to, and
then renames the "ref" file to "ref.lock/uuid/delete". On
success, the ref was deleted.
Whether successful or not, the actor may now simply delete
the "ref.lock" directory, clearing the way for a new
transaction. Any other actor may delete this directory at
any time also, likely either on conflict (if they are
attempting to initiate a transaction), or after a grace
period just to cleanup the FS. Any actor may also safely
cleanup the tmp directories, preferably also after a grace
period.
One neat part about this scheme is that I believe it would be
backwards compatible with the current locking mechanism since
the transaction directory will simply appear to be a lock to
older clients. And the old lock file should continue to lock
out these newer transactions.
Due to this backwards compatibility, I believe that this
could be incrementally employed today without affecting very
much. It could be deployed in place of any updates which
only hold ref.locks to update the loose ref. So for example
I think it could replace step 4a below from Michael
Haggerty's description of today's loose ref pruning during
ref packing:
> * Pack references:
...
> 4. prune_refs(): for each ref in the ref_to_prune list,
> call prune_ref():
>
> a. Lock the reference using lock_ref_sha1(),
> verifying that the recorded SHA1 is still valid. If it
> is, unlink the loose reference file then free the lock;
> otherwise leave the loose reference file untouched.
I think it would also therefore be able to replace the loose
ref locking in Michael's new ref-packing scheme as well as
the locking in Michael's new ref deletion scheme (again steps
4):
> * Delete reference foo:
...
> 4. Delete loose ref for "foo":
>
> a. Acquire the lock $GIT_DIR/refs/heads/foo.lock
>
> b. Unlink $GIT_DIR/refs/heads/foo if it is unchanged.
> If it is changed, leave it untouched. If it is deleted,
> that is OK too.
>
> c. Release lock $GIT_DIR/refs/heads/foo.lock
...
> * Pack references:
...
> 4. prune_refs(): for each ref in the ref_to_prune list,
> call prune_ref():
>
> a. Lock the loose reference using lock_ref_sha1(),
> verifying that the recorded SHA1 is still valid
>
> b. If it is, unlink the loose reference file
> (otherwise, leave it untouched)
>
> c. Release the lock on the loose reference
To be honest, I suspect I missed something obvious because
this seems almost too simple to work. I am ashamed that it
took me so long to come up with (of course, I will be even
more ashamed :( when it is shown to be flawed!) This scheme
also feels extensible. if there are no obvious flaws in it, I
will try to post solutions for ref packing and for multiple
repository/ref transactions also soon.
I welcome any comments/criticisms,
-Martin
next prev parent reply other threads:[~2012-12-31 10:31 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-12-21 8:04 [PATCH] refs: do not use cached refs in repack_without_ref Jeff King
2012-12-26 8:24 ` Michael Haggerty
2012-12-27 23:11 ` Lockless Refs? (Was [PATCH] refs: do not use cached refs in repack_without_ref) Martin Fick
2012-12-28 14:50 ` Martin Fick
2012-12-28 17:15 ` Lockless Refs? Junio C Hamano
2012-12-29 8:16 ` Jeff King
2012-12-29 21:15 ` Martin Fick
2012-12-29 8:12 ` Lockless Refs? (Was [PATCH] refs: do not use cached refs in repack_without_ref) Jeff King
2012-12-29 21:29 ` Martin Fick
2012-12-28 16:58 ` Lockless Refs? Junio C Hamano
2012-12-29 1:07 ` Martin Fick
2012-12-29 8:10 ` Lockless Refs? (Was [PATCH] refs: do not use cached refs in repack_without_ref) Jeff King
2012-12-29 22:18 ` Martin Fick
2012-12-30 17:03 ` Martin Fick
2012-12-31 10:30 ` Martin Fick [this message]
2013-01-03 23:52 ` Martin Fick
2013-01-04 17:52 ` Pyeron, Jason J CTR (US)
2013-01-04 18:01 ` Martin Fick
2013-01-04 21:28 ` Lockless Refs? Junio C Hamano
2013-01-05 16:12 ` Lockless Refs? (Was [PATCH] refs: do not use cached refs in repack_without_ref) Jeff King
2013-01-22 4:31 ` Drew Northup
2012-12-29 7:16 ` [PATCH] refs: do not use cached refs in repack_without_ref Jeff King
[not found] ` <201301071109.12086.mfick@codeaurora.org>
2013-01-07 18:14 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=201212310330.53835.mfick@codeaurora.org \
--to=mfick@codeaurora.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
--cc=peff@peff.net \
--cc=sop@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).