Re: [PATCH v2 42/43] refs: add LMDB refs backend

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: David Turner <dturner@twopensource.com>
To: Michael Haggerty <mhagger@alum.mit.edu>
Cc: git@vger.kernel.org
Subject: Re: [PATCH v2 42/43] refs: add LMDB refs backend
Date: Tue, 06 Oct 2015 21:51:00 -0400	[thread overview]
Message-ID: <1444182660.7739.77.camel@twopensource.com> (raw)
In-Reply-To: <56129B77.1030409@alum.mit.edu>

On Mon, 2015-10-05 at 17:47 +0200, Michael Haggerty wrote:
> On 09/29/2015 12:02 AM, David Turner wrote:
> > Add a database backend for refs using LMDB.  This backend runs git
> > for-each-ref about 30% faster than the files backend with fully-packed
> > refs on a repo with ~120k refs.  It's also about 4x faster than using
> > fully-unpacked refs.  In addition, and perhaps more importantly , it
> > avoids case-conflict issues on OS X.
> > 
> > LMDB has a few features that make it suitable for usage in git:
> > 
> > 1. It is relatively lightweight; it requires only one header file, and
> > the library code takes under 64k at runtime.
> > 
> > 2. It is well-tested: it's been used in OpenLDAP for years.
> > 
> > 3. It's very fast.  LMDB's benchmarks show that it is among
> > the fastest key-value stores.
> > 
> > 4. It has a relatively simple concurrency story; readers don't
> > block writers and writers don't block readers.
> 
> It would be great if you would go into more detail about this point in
> the permanent technical documentation
> (Documentation/technical/refs-be-lmdb.txt): how exactly do readers &
> writers interact with each others, and how do writers interact with
> other writers?

I'll add a bit of text on that.

> I think you have said before that if one writer holds the write lock on
> the DB, then other writers fail immediately. Is that correct? If so, is
> that something that can be adjusted? I think it would be preferable for
> the second writer to retry acquiring the write lock for a little while
> with a timeout (as we now do when trying to acquire the packed-refs
> lock). Otherwise you could have the unhappy situation that somebody
> spends a long time pushing a packfile to a server, only to have the
> reference update be rejected at the last moment due to a lock conflict
> with another process that was touching completely different references.
> We already do before/after consistency checks when updating references,
> so you wouldn't even have to add such code in the backend yourself.

No, the second writer waits for the first writer to unlock (or for it to
crash).

> > +Keys for refs are the name of the ref (e.g. refs/heads/master).
> > +Values are the value of the ref
> > +(e.g. 61f23eb0f81357c19fa91e2b8c6f3906c3a8f9b0).
> 
> Please document more unambiguously whether the value is stored as
> 40-char hexadecimal or 20-byte binary.

Will fix.

> Do you store "peeled" reference values for tags, as is done in
> packed-refs? I think that is an important optimization.

No.  Do you happen to know in what situations this is a performance
benefit, so that I can benchmark?  I suspect it would matter much less
for the LMDB backend, because lookups are pretty quick.

> > +All per-worktree refs (refs/bisect/* and HEAD) are store using
> > +the traditional files-based backend.
> > +
> > +Reflogs are stored as a series of database entries.
> > +
> > +For an empty reflog, there is a "header" entry to show that a reflog
> > +exists.
> 
> What key is used for the "header" entry? Does it have a value?

Will fix.

> > +         For non-empty reflogs, there is one entry per logged ref
> > +update.  The key format is logs/[refname]\0[timestamp].  The timestamp
> > +is a 64-bit unsigned integer number of nanoseconds since 1/1/1970.
> > +This means that reflog entries are chronologically ordered.  Because
> > +LMDB is a btree database, we can efficiently iterate over these keys.
> 
> Currently we discard the reflog for a reference when the reference is
> deleted. This is mostly due to a limitation of the filesystem-based
> storage scheme--otherwise the reflog left over from a deleted reference
> could prevent the creation of a reflog for another reference whose name
> overlaps with it (in the sense of is_refname_available()).
> 
> It is a pretty nasty limitation because it is one of the few ways that
> your actions can cause Git to lose data in a way that it is hard to
> recover from.
> 
> Have you thought about removing this limitation in the lbdb backend?

I'm going for feature parity first.  We can always add new functionality
later.  This particular function would be pretty straightforward to add,
I think.

> > +Reflog values are in the same format as the original files-based
> > +reflog.
> > +
> > +Weaknesses:
> > +-----------
> > +
> > +The reflog format is somewhat inefficient: a binary format could store
> > +reflog date/time information in somewhat less space.
> > +
> > +The rsync and file:// transports don't work yet, because they
> > +don't use the refs API.
> 
> Do they fail gracefully?

Not particularly gracefully.

rsync: link_stat "/home/dturner/git/t/trash
directory.t5510-fetch/.git/packed-refs" failed: No such file or
directory (2)
rsync error: some files/attrs were not transferred (see previous errors)
(code 23) at main.c(1183) [sender=3.1.1]
fatal: Could not run rsync to get refs
-------------

The problem is that rsync on the client assumes that packed-refs exists.
We could hack it to also check for refdb.

> > +#ifdef USE_LIBLMDB
> > +			refs_backend_type = refdb_data.refs_backend_type;
> > +			register_refs_backend(&refs_be_lmdb);
> > +			set_refs_backend(refs_backend_type, &refdb_data);
> > +#else
> > +			die("Git was not built with USE_LIBLMDB, so the db refs backend is not available");
> 
> s/db/lmdb/
> 
> I'm somewhat surprised that you only register the lmdb backend if it is
> used in the main repo. I would expect the backend to be registered
> unconditionally on startup. The cost is trivial, isn't it?

Yeah, but this was the easiest place to do it.

> It appears that you recognize "lmdb" but you treat a missing value *or
> any other value* like "files". I think you should be more conservative
> here and die() if you see a backend type that you don't recognize.

Will fix, thanks.

> > diff --git a/contrib/workdir/git-new-workdir b/contrib/workdir/git-new-workdir
> > index 888c34a..9761a1a 100755
> > --- a/contrib/workdir/git-new-workdir
> > +++ b/contrib/workdir/git-new-workdir
> > @@ -28,6 +28,8 @@ git_dir=$(cd "$orig_git" 2>/dev/null &&
> >    git rev-parse --git-dir 2>/dev/null) ||
> >    die "Not a git repository: \"$orig_git\""
> >  
> > +test -d $git_dir/refdb && die "git-new-workdir is incompatible with the refs db backend"
> > +
> 
> It would be safer to check $(git config core.refs-backend-type) to avoid
> depending on an implementation detail of the LMDB backend (and as a bit
> of safety against any backends that might be added in the future).

Will fix, thanks.

> > +static void init_env(MDB_env **env, const char *path)
> > +{
> > +	int ret;
> > +	if (*env)
> > +		return;
> > +
> > +	if ((ret = mdb_env_create(env)) != MDB_SUCCESS)
> > +		die("mdb_env_create failed: %s", mdb_strerror(ret));
> 
> The LMDB docs seem to say that their functions "return a non-zero error
> value on failure and 0 on success". So can we avoid writing MDB_SUCCESS
> everywhere and just compare to zero?

Sure.

> Also, the Git project CodingGuidelines say to avoid assignments in the
> condition of an "if" statement.
> 
> So all in all I think these should be written like
> 
>         ret = mdb_env_create(env);
>         if (ret)
>                 die("mdb_env_create failed: %s", mdb_strerror(ret));

OK.

> > +
> > +int ref_update_cmp(const void *entry, const void *entry_or_key, const void *keydata)
> 
> The usual convention for a "cmp" function is that it returns <0, =0, or
> >0 depending on the relative ordering of its arguments. This one can
> only be used for equality/inequality. Please add a comment warning the
> reader of this (and maybe rename it, e.g., to ref_update_not_equal(), to
> further reduce the chance of confusion).

Will fix.

> > +	if (total_commands_run != last_commands_run) {
> > +		/*
> > +		 * Since each transaction sees a consistent view of
> > +		 * the db, downstream processes that write the db
> > +		 * won't be seen in this transaction.  We don't know
> > +		 * whether any given downstream process has made any
> > +		 * writes, so if there have been any downstream processes,
> > +		 * we had better reopen the transaction.
> > +		 */
> 
> Is it possible to tell from the database whether there has been a write
> since the current read transaction was started? 

Actually, yes.  Fixed.

> It seems like this would
> be a more reliable method, given that an unrelated process might have
> updated references while this command was running, or a long-running
> background process (like Duy's background `git gc --auto`) might still
> continue running even after the original command exits.
> 
> Of course in the end checks like this can never 100% prevent races. The
> only way, it would seem, would be to check-and-set during a single write
> transaction that holds a write lock on the DB.
> 
> That's exactly what we do for file-based references: ideally, the only
> allowed changes are compare-and-set operations that are made atomic by
> holding a lock while it is occurring (one or more loose reference
> lockfiles and/or the packed-refs lockfile). That is why our ref_updates
> have old_sha1 and new_sha1, and old_sha1 is always used for operations
> that start before a lock is held.
> 
> So I guess my question is this: to what extent are you relying on this
> total_commands_run mechanism for correctness, vs. just using it to avoid
> using reference values that are *really* old?

Some tests fail without it, unfortunately.  (I can't remember which ones
now).

> It seems to me that a very important feature of Git is the following:
> 
>     Two processes *must not* block each other (except possibly
>     for a very brief time) if they are updating disjoint sets
>     of references.
>
> With the file-backed reference backend, this feature is accomplished via
> the following mechanisms:
> 
> * Locks are only held for very short periods of time, and never when
>   calling other processes.
> 
> * Locks usually only apply to single references (loose references).
>   Sometimes the packed-refs file has to be locked, but in such cases
>   a second writer retries the lock acquisition for long enough that
>   it should usually succeed.
> 
> * If a reference value was checked before a lock was held, then it
>   is always re-checked after acquiring the lock to make sure that
>   another process didn't change it in the meantime. (If the
>   reference *was* changed from its expected value, it is ok to
>   die().) This is necessary to preserve safety even if a lock is
>   not held for the duration of a command.
> 
> In particular, locks are only held for the duration of the call to
> ref_transaction_commit(), *not* while building up a transaction using
> ref_transaction_update() etc.
> 
> I am afraid that the LMDB backend as currently implemented will be more
> subject to locking conflicts because the LMDB locks are broader in both
> time and space:
> 
> 1. You are holding the LMDB transaction open for longer (during the
>    whole time that the ref_transaction is being built up). This is
>    worse simply because it is a longer time. But it is also worse
>    because the process that is building up the transaction might be
>    doing other things while holding the write lock. Have you checked
>    whether any callers do things that could change references (for
>    example, invoke user hooks) between the calls to
>    ref_transaction_begin() and ref_transaction_commit()? With the
>    files backend this isn't an issue because the lock isn't acquired
>    until ref_transaction_commit() is called.
> 
> 2. The LMDB write lock is database-wide. This means that a single
>    writer, even if modifying only a single reference, blocks all
>    writes to all references.
> 
> Therefore I am worried that the LMDB backend will be susceptable to more
> lock conflicts than the files backend.
> 
> [What] would be the disadvantage of letting ref_transaction_update() and
> its friends build up `struct ref_update` objects in memory (much like
> they do in the files backend), and only start the LMDB write transaction
> when ref_transaction_commit() is called? Obviously it would cost some
> more bookkeeping, but that code is already written. This would be much
> closer to the files backend model.
> 
> This is where I stopped for the day.

I think you have convinced me to switch over to this method.  Unless
something weird comes up while I'm doing the switch, in which case I'll
send email.

> I have a bunch of small stylistic and formatting suggestions regarding
> this patch in particular. Rather than describe them all in prose, I just
> made the suggested changes and pushed them to my GitHub fork [1], branch
> "refs-be-lmdb-housecleaning". Feel free to squash the ones you want into
> this commit.

I took them all.  THanks.

> A lot of these problems were found by gcc, which I apparently have
> configured more strictly than yours. You might want to pick some options
> from my config.mak [2] to make gcc enforce some of the Git project
> policies automatically. (Every dev on this list probably has his/her own
> variation on this file.)

I've adopted yours, thanks.

> I apologize again for not having reviewed your patches more promptly. I
> really think it is good and important work. I've just been so busy with
> other things and I knew it would take a couple of solid days of work to
> get through it. I'll try to review the rest tomorrow, but since I don't
> know anything about the LMDB API, the review will either be superficial
> or it will take a long time...

Thanks for the hard work here.  I know this is a big chunk of code to
review, and I'm grateful for the thoughts.

next prev parent reply	other threads:[~2015-10-07  1:51 UTC|newest]

Thread overview: 90+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-28 22:01 [PATCH v2 00/43] lmdb ref backend David Turner
2015-09-28 22:01 ` [PATCH v2 01/43] refs.c: create a public version of verify_refname_available David Turner
2015-10-03  5:02   ` Torsten Bögershausen
2015-10-03 16:50     ` David Turner
2015-10-03 21:07       ` Torsten Bögershausen
2015-10-04  6:07       ` Torsten Bögershausen
2015-10-05  4:29   ` Michael Haggerty
2015-10-05 20:23     ` David Turner
2015-09-28 22:01 ` [PATCH v2 02/43] refs: make repack_without_refs and is_branch public David Turner
2015-10-05  4:34   ` Michael Haggerty
2015-10-05 20:26     ` David Turner
2015-09-28 22:01 ` [PATCH v2 03/43] refs-be-files.c: rename refs to refs-be-files David Turner
2015-09-28 22:01 ` [PATCH v2 04/43] refs.c: add a new refs.c file to hold all common refs code David Turner
2015-09-28 22:01 ` [PATCH v2 05/43] refs.c: move update_ref to refs.c David Turner
2015-09-28 22:01 ` [PATCH v2 06/43] refs.c: move delete_ref and delete_refs to the common code David Turner
2015-09-28 22:01 ` [PATCH v2 07/43] refs.c: move read_ref_at to the common refs file David Turner
2015-09-28 22:01 ` [PATCH v2 08/43] refs.c: move the hidden refs functions to the common code David Turner
2015-09-28 22:01 ` [PATCH v2 09/43] refs.c: move dwim and friend functions to the common refs code David Turner
2015-09-28 22:01 ` [PATCH v2 10/43] refs.c: move warn_if_dangling_symref* to the common code David Turner
2015-09-28 22:01 ` [PATCH v2 11/43] refs.c: move read_ref, read_ref_full and ref_exists " David Turner
2015-09-28 22:01 ` [PATCH v2 12/43] refs.c: move resolve_refdup to common David Turner
2015-09-28 22:01 ` [PATCH v2 13/43] refs.c: move check_refname_format to the common code David Turner
2015-09-28 22:01 ` [PATCH v2 14/43] refs.c: move is_branch " David Turner
2015-09-28 22:01 ` [PATCH v2 15/43] refs.c: move prettify_refname " David Turner
2015-09-28 22:01 ` [PATCH v2 16/43] refs.c: move ref iterators " David Turner
2015-09-28 22:01 ` [PATCH v2 17/43] refs.c: move head_ref_namespaced " David Turner
2015-09-28 22:01 ` [PATCH v2 18/43] refs-be-files.c: add a backend method structure with transaction functions David Turner
2015-10-05  8:03   ` Michael Haggerty
2015-10-05 17:25     ` Junio C Hamano
2015-10-06  0:20       ` David Turner
2015-09-28 22:01 ` [PATCH v2 19/43] refs-be-files.c: add methods for misc ref operations David Turner
2015-09-28 22:01 ` [PATCH v2 20/43] refs-be-files.c: add methods for the ref iterators David Turner
2015-09-28 22:01 ` [PATCH v2 21/43] refs-be-files.c: add method for for_each_reftype_ David Turner
2015-09-28 22:01 ` [PATCH v2 22/43] refs-be-files.c: add do_for_each_per_worktree_ref David Turner
2015-10-05  8:19   ` Michael Haggerty
2015-10-05 20:14     ` David Turner
2015-09-28 22:01 ` [PATCH v2 23/43] refs.c: move refname_is_safe to the common code David Turner
2015-09-28 22:01 ` [PATCH v2 24/43] refs.h: document make refname_is_safe and add it to header David Turner
2015-09-28 22:02 ` [PATCH v2 25/43] refs.c: move copy_msg to the common code David Turner
2015-09-28 22:02 ` [PATCH v2 26/43] refs.c: move peel_object " David Turner
2015-09-28 22:02 ` [PATCH v2 27/43] refs.c: move should_autocreate_reflog to " David Turner
2015-10-02 20:57   ` Junio C Hamano
2015-09-28 22:02 ` [PATCH v2 28/43] refs.c: add ref backend init function David Turner
2015-10-05  8:37   ` Michael Haggerty
2015-10-05 20:37     ` David Turner
2015-09-28 22:02 ` [PATCH v2 29/43] refs.c: add methods for reflog David Turner
2015-09-28 22:02 ` [PATCH v2 30/43] refs-be-files.c: add method to expire reflogs David Turner
2015-10-05  8:41   ` Michael Haggerty
2015-09-28 22:02 ` [PATCH v2 31/43] refs.c: add method for initial ref transaction commit David Turner
2015-09-28 22:02 ` [PATCH v2 32/43] initdb: move safe_create_dir into common code David Turner
2015-09-28 22:02 ` [PATCH v2 33/43] refs.c: add method for initializing refs db David Turner
2015-09-28 22:02 ` [PATCH v2 34/43] refs.c: make struct ref_transaction generic David Turner
2015-10-06 17:43   ` Michael Blume
2015-10-06 17:53     ` David Turner
2015-09-28 22:02 ` [PATCH v2 35/43] refs-be-files.c: add method to rename refs David Turner
2015-09-28 22:02 ` [PATCH v2 36/43] run-command: track total number of commands run David Turner
2015-09-28 22:02 ` [PATCH v2 37/43] refs: move some defines from refs-be-files.c to refs.h David Turner
2015-10-05  8:57   ` Michael Haggerty
2015-09-28 22:02 ` [PATCH v2 38/43] refs: make some files backend functions public David Turner
2015-10-05  9:03   ` Michael Haggerty
2015-10-06  1:24     ` David Turner
2015-10-07  1:25     ` David Turner
2015-10-07 16:00       ` Michael Haggerty
2015-10-07 17:20         ` Junio C Hamano
2015-10-07 20:55         ` David Turner
2015-10-07 22:47           ` Michael Haggerty
2015-09-28 22:02 ` [PATCH v2 39/43] refs: break out a ref conflict check David Turner
2015-10-05  9:06   ` Michael Haggerty
2015-10-06  0:28     ` David Turner
2015-09-28 22:02 ` [PATCH v2 40/43] refs: allow ref backend to be set for clone David Turner
2015-10-05  9:32   ` Michael Haggerty
2015-10-06  1:29     ` David Turner
2015-10-06  1:58       ` Jeff King
2015-10-06 18:09         ` David Turner
2015-10-12  9:00           ` Carlos Martín Nieto
2015-10-05 11:55   ` Michael Haggerty
2015-10-06  1:16     ` David Turner
2015-09-28 22:02 ` [PATCH v2 41/43] refs: add register_refs_backend David Turner
2015-09-28 22:02 ` [PATCH v2 42/43] refs: add LMDB refs backend David Turner
2015-10-02 21:35   ` Junio C Hamano
2015-10-05 15:47   ` Michael Haggerty
2015-10-07  1:51     ` David Turner [this message]
2015-10-07 18:31       ` Michael Haggerty
2015-10-07 19:08         ` Junio C Hamano
2015-10-07 19:20         ` David Turner
2015-10-07 22:13           ` Michael Haggerty
2015-09-28 22:02 ` [PATCH v2 43/43] refs: tests for db backend David Turner
2015-10-03 17:39   ` Dennis Kaarsemaker
2015-10-05 16:56     ` Junio C Hamano
2015-10-06  0:20       ` David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1444182660.7739.77.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).