git@vger.kernel.org mailing list mirror (one of many)
From: David Turner <dturner@twopensource.com>
To: Jeff King <peff@peff.net>
Cc: git mailing list <git@vger.kernel.org>
Subject: Re: RFC/Pull Request: Refs db backend
Date: Tue, 23 Jun 2015 14:18:36 -0400
Message-ID: <1435083516.28466.24.camel@twopensource.com>
In-Reply-To: <20150623114716.GC12518@peff.net>

On Tue, 2015-06-23 at 07:47 -0400, Jeff King wrote:
> On Mon, Jun 22, 2015 at 08:50:56PM -0400, David Turner wrote:
> 
> > The db backend runs git for-each-ref about 30% faster than the files
> > backend with fully-packed refs on a repo with ~120k refs.  It's also
> > about 4x faster than using fully-unpacked refs.  In addition, and
> > perhaps more importantly, it avoids case-conflict issues on OS X.
> 
> Neat.
> 
> Can you describe a bit more about the reflog handling?
> 
> One of the problems we've had with large-ref repos is that the reflog
> storage is quite inefficient. You can pack all the refs, but you may
> still be stuck with a bunch of reflog files with one entry, wasting a
> whole inode. Doing a "git repack" when you have a million of those has
> horrible cold-cache performance. Basically anything that isn't
> one-file-per-reflog would be a welcome change. :)

Reflogs are stored in the database as well.  There is one header entry
per ref to indicate that a reflog is present, and then one database
entry per reflog entry; the entries are stored consecutively and
immediately following the header so that it's fast to iterate over them.
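
As a rough illustration of that layout (not the code from the series),
the sketch below uses a small in-memory sorted store in place of LMDB.
The "logs/" prefix, the NUL terminator, and the 8-byte big-endian
sequence number are assumptions chosen so that a header sorts directly
before its own entries; the real key format may differ.

```python
import bisect

class OrderedStore:
    """In-memory stand-in for an ordered key-value store (keys kept sorted)."""

    def __init__(self):
        self._keys = []   # sorted list of keys
        self._vals = {}   # key -> value

    def put(self, key: bytes, value: bytes) -> None:
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def scan_prefix(self, prefix: bytes):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1

def header_key(ref: str) -> bytes:
    # Hypothetical format: the NUL keeps "master" from matching "master-2".
    return b"logs/" + ref.encode() + b"\x00"

def entry_key(ref: str, seq: int) -> bytes:
    # Big-endian sequence numbers sort in numeric order as raw bytes.
    return header_key(ref) + seq.to_bytes(8, "big")

def append_reflog(store: OrderedStore, ref: str, seq: int, line: str) -> None:
    store.put(header_key(ref), b"")          # header: "a reflog is present"
    store.put(entry_key(ref, seq), line.encode())

def read_reflog(store: OrderedStore, ref: str) -> list:
    # One seek to the header, then a sequential walk over its entries.
    hk = header_key(ref)
    return [v.decode() for k, v in store.scan_prefix(hk) if k != hk]
```

Reading a reflog is then a single seek followed by a forward scan, which
is the access pattern an ordered store handles well.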

> It has also been a dream of mine to stop tying the reflogs specifically
> to the refs. I.e., have a spot for reflogs of branches that no longer
> exist, which allows us to retain them for deleted branches. Then you can
> possibly recover from a branch deletion, whereas now you have to dig
> through "git fsck"'s dangling output. And the reflog, if you don't
> expire it, becomes a suitable audit log to find out what happened to
> each branch when (whereas now it is full of holes when things get
> deleted).

That would be cool, and I don't think it would be hard to add to my
current code; we could simply replace the header with a "tombstone".
But I would prefer to wait until the series is merged; then we can build
on top of it.
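
A minimal sketch of the tombstone idea, under stated assumptions: a
plain dict stands in for the database, and the key format and the
HEADER/TOMBSTONE marker values are hypothetical, not taken from the
series. Deleting a ref only rewrites the header value, so the entries
that follow it stay readable.

```python
HEADER = b"header"        # hypothetical marker: ref exists, reflog present
TOMBSTONE = b"tombstone"  # hypothetical marker: ref deleted, reflog retained

def header_key(ref: str) -> bytes:
    # Assumed key format; the NUL separates the ref name from the entry suffix.
    return b"logs/" + ref.encode() + b"\x00"

def append_entry(db: dict, ref: str, seq: int, line: str) -> None:
    hk = header_key(ref)
    db[hk] = HEADER  # (re)creating the ref turns a tombstone back into a header
    db[hk + seq.to_bytes(8, "big")] = line.encode()

def delete_ref(db: dict, ref: str) -> None:
    hk = header_key(ref)
    if hk in db:
        db[hk] = TOMBSTONE  # swap the header for a tombstone, keep the entries

def reflog_state(db: dict, ref: str) -> str:
    marker = db.get(header_key(ref))
    if marker is None:
        return "absent"
    return "deleted" if marker == TOMBSTONE else "live"

def entries(db: dict, ref: str) -> list:
    hk = header_key(ref)
    keys = sorted(k for k in db if k.startswith(hk) and k != hk)
    return [db[k].decode() for k in keys]
```

With this shape, recovering a deleted branch would amount to reading the
last entry under a tombstoned header.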

> I dunno. Maybe I am overthinking it. But it really feels like the _refs_
> are a key/value thing, but the _reflogs_ are not. You can cram them into
> a key/value store, but you're probably operating on them as a big blob,
> then.

Reflogs are, conceptually, queues. I agree that a raw key-value store is
not a good way to store queues, but a B-Tree is not so terrible, since
it offers relatively fast in-order iteration (amortized constant time
per entry, IIRC).

> > I chose to use LMDB for the database.  LMDB has a few features that make
> > it suitable for usage in git:
> 
> One of the complaints that Shawn had about sqlite is that there is no
> native Java implementation, which makes it hard for JGit to ship a
> compatible backend. I suspect the same is true for LMDB, but it is
> probably a lot simpler than sqlite (so reimplementation might be
> possible).
> 
> But it may also be worth going with a slightly slower database if we can
> get wider compatibility for free.

There's a JNI interface to LMDB, which is, of course, not native.  I
don't think it would be too hard to entirely rewrite LMDB in Java, but
I'm not going to have time to do it for the foreseeable future.  I've
asked Howard Chu if he knows of any efforts in progress.

> > To test this backend's correctness, I hacked test-lib.sh and
> > test-lib-functions.sh to run all tests under the refs backend. Dozens
> > of tests use manual ref/reflog reading/writing, or create submodules
> > without passing --refs-backend-type to git init.  If those tests are
> > changed to use the update-ref machinery or test-refs-be-db (or, in the
> > case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
> > the only remaining failing tests are the git-new-workdir tests and the
> > gitweb tests.
> 
> I think we'll need to bump core.repositoryformatversion, too. See the
> patches I just posted here:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/272447

Thanks, that's valuable.  For the refs backend, opening the LMDB
database for writing is sufficient to block other writers.  Do you think
it would be valuable to provide a git hold-ref-lock command that simply
reads refs from stdin and keeps them locked until it reads EOF from
stdin?  That would allow cross-backend ref locking. 
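
To make the proposal concrete, here is a sketch of what such a helper
might do for the files backend. No hold-ref-lock command exists; the
function name and behaviour below are assumptions. It relies only on the
files backend's convention of locking a ref by exclusively creating
"<ref>.lock" under the git directory.

```python
import os

def lock_path(git_dir: str, ref: str) -> str:
    # The files backend locks refs/heads/foo by creating refs/heads/foo.lock.
    return os.path.join(git_dir, ref + ".lock")

def hold_ref_locks(git_dir: str, stream) -> int:
    """Lock each ref named on `stream`; release all locks once EOF is reached.

    Hypothetical helper. A real tool would also validate ref names
    (for example with `git check-ref-format`) before touching the filesystem.
    """
    held = []
    try:
        for line in stream:          # the loop only ends when the stream hits EOF
            ref = line.strip()
            if not ref:
                continue
            path = lock_path(git_dir, ref)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            # O_EXCL makes creation fail if another process holds the lock.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o666)
            os.close(fd)
            held.append(path)
    finally:
        for path in held:            # also runs if taking a lock failed
            os.unlink(path)
    return len(held)
```

A caller would run something like hold_ref_locks(".git", sys.stdin) in a
child process and close the child's stdin to release the locks.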

