From: David Turner <dturner@twopensource.com>
To: Michael Haggerty <mhagger@alum.mit.edu>, git@vger.kernel.org
Subject: Re: [PATCH v4 20/21] refs: add LMDB refs storage backend
Date: Fri, 12 Feb 2016 20:23:43 -0500 [thread overview]
Message-ID: <1455326623.10888.3.camel@twopensource.com> (raw)
In-Reply-To: <56BE0FE1.5030407@alum.mit.edu>
On Fri, 2016-02-12 at 18:01 +0100, Michael Haggerty wrote:
> On 02/05/2016 08:44 PM, David Turner wrote:
> > Add a database backend for refs using LMDB. This backend runs git
> > for-each-ref about 30% faster than the files backend with fully
> > -packed
> > refs on a repo with ~120k refs. It's also about 4x faster than
> > using
> > fully-unpacked refs. In addition, and perhaps more importantly, it
> > avoids case-conflict issues on OS X.
> >
> > LMDB has a few features that make it suitable for usage in git:
> >
>
> 0. It is licensed under the OpenLDAP Public License, which is
> apparently
> GPL compatible [1].
>
> [1] http://www.gnu.org/licenses/license-list.en.html#newOpenLDAP
Oh, yeah, I checked that before choosing it and then forgot all about
it.
> > 1. It is relatively lightweight; it requires only one header file,
> > and
> > the library code takes under 64k at runtime.
> >
> > 2. It is well-tested: it's been used in OpenLDAP for years.
> >
> > 3. It's very fast. LMDB's benchmarks show that it is among
> > the fastest key-value stores.
> >
> > 4. It has a relatively simple concurrency story; readers don't
> > block writers and writers don't block readers.
> >
> > Ronnie Sahlberg's original version of this patchset used tdb. The
> > major disadvantage of tdb is that tdb is hard to build on OS X.
> > It's
> > also not in homebrew. So lmdb seemed simpler.
> >
> > To test this backend's correctness, I hacked test-lib.sh and
> > test-lib-functions.sh to run all tests under the refs backend.
> > Dozens
> > of tests use manual ref/reflog reading/writing, or create
> > submodules
> > without passing --ref-storage to git init. If those tests are
> > changed to use the update-ref machinery or test-refs-lmdb-backend
> > (or,
> > in the case of packed-refs, corrupt refs, and dumb fetch tests, are
> > skipped), the only remaining failing tests are the git-new-workdir
> > tests and the gitweb tests.
>
> Would it possible to build your "hack" into the test suite? For
> example,
> perhaps one could implement an option that requests tests to use LMDB
> wherever possible and skip the tests that are not LMDB-compatible.
> Over
> time, we could try to increase the fraction of tests that are
> LMDB-compatible by (for example) using `git update-ref` and `git
> rev-parse` rather than reading/writing reference files via the
> filesystem directly.
>
> Otherwise, I'm worried that LMDB support will bitrot quickly because
> it
> will be too much of a nuisance to exercise the code.
I can add that. It requires minor changes to a large number of tests,
but that's OK.
> > Signed-off-by: David Turner <dturner@twopensource.com>
> > ---
> > .gitignore | 1 +
> > Documentation/config.txt | 7 +
> > Documentation/git-clone.txt | 3 +-
> > Documentation/git-init.txt | 2 +-
> > Documentation/technical/refs-lmdb-backend.txt | 52 +
> > Documentation/technical/repository-version.txt | 5 +
> > Makefile | 12 +
> > configure.ac | 33 +
> > contrib/workdir/git-new-workdir | 3 +
> > refs.c | 3 +
> > refs.h | 2 +
> > refs/lmdb-backend.c | 2052
> > ++++++++++++++++++++++++
> > setup.c | 11 +-
> > test-refs-lmdb-backend.c | 64 +
> > transport.c | 7 +-
> > 15 files changed, 2250 insertions(+), 7 deletions(-)
> > create mode 100644 Documentation/technical/refs-lmdb-backend.txt
> > create mode 100644 refs/lmdb-backend.c
> > create mode 100644 test-refs-lmdb-backend.c
> >
> > diff --git a/.gitignore b/.gitignore
> > index 1c2f832..87d45a2 100644
> > --- a/.gitignore
> > +++ b/.gitignore
> > @@ -199,6 +199,7 @@
> > /test-path-utils
> > /test-prio-queue
> > /test-read-cache
> > +/test-refs-lmdb-backend
> > /test-regex
> > /test-revision-walking
> > /test-run-command
> > diff --git a/Documentation/config.txt b/Documentation/config.txt
> > index 06d3659..bc222c9 100644
> > --- a/Documentation/config.txt
> > +++ b/Documentation/config.txt
> > @@ -1153,6 +1153,13 @@ difftool.<tool>.cmd::
> > difftool.prompt::
> > Prompt before each invocation of the diff tool.
> >
> > +extensions.refsStorage::
> > + Type of refs storage backend. Default is to use the
> > original
> > + files based ref storage. When set to "lmdb", refs are
> > stored in
> > + the lmdb database backend. This setting reflects the refs
> > + backend that is currently in use; it is not possible to
> > change
> > + the backend by changing this setting.
> > +
>
> Let's give the users a pointer to how they *can* change this setting.
> Maybe
>
> Type of refs storage backend. Default is to use the original
> files based ref storage. When set to "lmdb", refs are stored
> in
> an LMDB database. This setting reflects the refs storage
> backend that was chosen via the --ref-storage option when the
> repository was originally created. It is currently
> not possible to change the refs storage backend of an
> existing repository.
OK.
> > fetch.recurseSubmodules::
> > This option can be either set to a boolean value or to 'on
> > -demand'.
> > Setting it to a boolean changes the behavior of fetch and
> > pull to
> > diff --git a/Documentation/git-clone.txt b/Documentation/git
> > -clone.txt
> > index 68f56a7..b9c52ce 100644
> > --- a/Documentation/git-clone.txt
> > +++ b/Documentation/git-clone.txt
> > @@ -227,7 +227,8 @@ objects from the source repository into a pack
> > in the cloned repository.
> >
> > --ref-storage=<name>::
> > Type of ref storage backend. Default is to use the
> > original files
> > - based ref storage.
> > + based ref storage. Set to "lmdb" to store refs in the lmdb
> > database
> > + backend.
>
> Should the second "lmdb" be upper-case? Maybe also in other places
> where
> the LMDB database is being referred to?
OK, I think I got most of them.
> > <repository>::
> > The (possibly remote) repository to clone from. See the
> > [...]
>
> I still haven't reviewed the actual lmdb-backend code. I guess I'm
> going
> to have to make myself an LMDB expert one of these days...
I highly recommend it! The API is pretty reasonable, and it does seem
to be fast. The error messages could use some work, but that's my only
complaint.
next prev parent reply other threads:[~2016-02-13 1:24 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-05 19:44 [PATCH v4 00/20] refs backend David Turner
2016-02-05 19:44 ` [PATCH v4 01/21] refs: add a backend method structure with transaction functions David Turner
2016-02-05 19:44 ` [PATCH v4 02/21] refs: add methods for misc ref operations David Turner
2016-02-11 7:45 ` Michael Haggerty
2016-02-12 1:09 ` David Turner
2016-02-05 19:44 ` [PATCH v4 03/21] refs: add methods for the ref iterators David Turner
2016-02-11 8:42 ` Michael Haggerty
2016-02-12 1:08 ` David Turner
2016-02-05 19:44 ` [PATCH v4 04/21] refs: add do_for_each_per_worktree_ref David Turner
2016-02-05 19:44 ` [PATCH v4 05/21] refs: add methods for reflog David Turner
2016-02-05 19:44 ` [PATCH v4 06/21] refs: add method for initial ref transaction commit David Turner
2016-02-05 19:44 ` [PATCH v4 07/21] refs: add method for delete_refs David Turner
2016-02-05 19:44 ` [PATCH v4 08/21] refs: add methods to init refs db David Turner
2016-02-11 8:54 ` Michael Haggerty
2016-02-11 21:15 ` David Turner
2016-02-05 19:44 ` [PATCH v4 09/21] refs: add method to rename refs David Turner
2016-02-11 9:00 ` Michael Haggerty
2016-02-11 21:12 ` David Turner
2016-02-05 19:44 ` [PATCH v4 10/21] refs: make lock generic David Turner
2016-02-05 19:44 ` [PATCH v4 11/21] refs: move duplicate check to common code David Turner
2016-02-05 19:44 ` [PATCH v4 12/21] refs: allow log-only updates David Turner
2016-02-11 10:03 ` Michael Haggerty
2016-02-11 21:23 ` David Turner
2016-02-05 19:44 ` [PATCH v4 13/21] refs: resolve symbolic refs first David Turner
2016-02-12 14:09 ` Michael Haggerty
2016-02-18 0:29 ` David Turner
2016-02-18 11:59 ` Michael Haggerty
2016-02-05 19:44 ` [PATCH v4 14/21] refs: always handle non-normal refs in files backend David Turner
2016-02-12 15:07 ` Michael Haggerty
2016-02-18 2:44 ` David Turner
2016-02-18 12:07 ` Michael Haggerty
2016-02-18 18:32 ` David Turner
2016-02-05 19:44 ` [PATCH v4 15/21] init: allow alternate ref strorage to be set for new repos David Turner
2016-02-12 15:26 ` Michael Haggerty
2016-02-17 20:47 ` David Turner
2016-02-18 14:12 ` Michael Haggerty
2016-02-05 19:44 ` [PATCH v4 16/21] refs: check submodules ref storage config David Turner
2016-02-05 19:44 ` [PATCH v4 17/21] clone: allow ref storage backend to be set for clone David Turner
2016-02-05 19:44 ` [PATCH v4 18/21] svn: learn ref-storage argument David Turner
2016-02-05 19:44 ` [PATCH v4 19/21] refs: add register_ref_storage_backends() David Turner
2016-02-12 15:42 ` Michael Haggerty
2016-02-17 20:32 ` David Turner
2016-02-05 19:44 ` [PATCH v4 20/21] refs: add LMDB refs storage backend David Turner
2016-02-11 8:48 ` Michael Haggerty
2016-02-11 21:21 ` David Turner
2016-02-12 17:01 ` Michael Haggerty
2016-02-13 1:23 ` David Turner [this message]
2016-02-14 12:04 ` Duy Nguyen
2016-02-15 9:57 ` Duy Nguyen
2016-02-16 22:01 ` David Turner
2016-02-17 20:32 ` David Turner
2016-02-05 19:44 ` [PATCH v4 21/21] refs: tests for lmdb backend David Turner
2016-02-08 23:37 ` [PATCH v4 00/20] refs backend Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1455326623.10888.3.camel@twopensource.com \
--to=dturner@twopensource.com \
--cc=git@vger.kernel.org \
--cc=mhagger@alum.mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).