git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: David Turner <dturner@twopensource.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>,
	Michael Haggerty <mhagger@alum.mit.edu>
Subject: Re: [PATCH v5 25/27] refs: add LMDB refs storage backend
Date: Wed, 24 Feb 2016 15:37:22 -0500	[thread overview]
Message-ID: <1456346242.18017.16.camel@twopensource.com> (raw)
In-Reply-To: <CACsJy8BQZCBpfmvXk+o5PqM7=zad7cxgA9B2995Rb0D0YBxEVw@mail.gmail.com>

On Sat, 2016-02-20 at 15:59 +0700, Duy Nguyen wrote:
> On Thu, Feb 18, 2016 at 12:17 PM, David Turner <
> dturner@twopensource.com> wrote:
> > LMDB has a few features that make it suitable for usage in git:
> > ...
> 
> I'm reading lmdb documents and hitting  the caveat section [1].
> Random thoughts
> 
> * "There is normally no pure read-only mode, since readers need write
> access to locks and lock file.".
> 
> This will be a problem for server side that serves git:// protocol
> only. Some of those servers disable write access to the entire
> repository and git still works fine (but won't when lmdb is used).
> Should we do something in this case? Just tell server admins to relax
> file access, or use MDB_NOLOCK (and when? based on config var?)

MDB_NOLOCK is a good idea. I'm going to add this to the "Weaknesses"
section of the docs and plan to fix it later, unless you feel strongly
otherwise.

> *  " Use an MDB_env* in the process which opened it, without
> fork()ing.". We do use fork on non-Windows in run-command.c, but it
> should be followed by exec() with no ref access in between, so we're
> almost good.
> 
> I notice atexit() is used in this for/exec code, which reminds me we
> also use atexit() in many other places. I hope none of them access
> refs, or we could be in trouble.

I don't think they should do that anyway -- it's too much complexity
for an atexit handler.

> * "Do not have open an LMDB database twice in the same process at the
> same time. Not even from a plain open() call - close()ing it breaks
> flock() advisory locking"
> 
> I wonder what happens if we do open twice, will it error out or
> silently ignore and move on? Because if it's the latter case, we need
> some protection from the caller and I'm not sure if
> lmdb-backend.c:init_env() has it, especially when it open submodule's
> lmdb.

I think LMDB could check this, but it doesn't seem to. I've refactored
some of the submodule transaction stuff so that now we have more
protection there.

> * "Avoid long-lived transactions...."
> 
> OK we don't have a problem with this. But it makes me realize lmdb
> transactions do not map with ref transactions. We don't open lmdb
> transaction at ref_transaction_begin(), for example. Is it simply
> more
> convenient to do transactions the current way, or is it impossible or
> incorrect to attach lmdb transactions to ref_transaction_*()?

That was what I did originally, but reviewers here noted that it had
some problems:
1. What if, while a transaction is open, git opens a subprocess that
wants to make its own transaction?  There can only be one writer
transaction open at a time.
2. As you note, long-lived transactions.

Also, the files backend also doesn't do any locking until the last
moment, and it's reasonable to try to be compatible with that.

> * "Avoid aborting a process with an active transaction. The
> transaction becomes "long-lived" as above until a check for stale
> readers is performed or the lockfile is reset, since the process may
> not remove it from the lockfile."
> 
> Does it mean we should at atexit() and signal handler to release
> currently active transaction?

Probably a good idea.

> * "Do not use LMDB databases on remote filesystems, even between
> processes on the same host. This breaks flock() on some OSes,
> possibly
> memory map sync, and certainly sync between programs on different
> hosts."
> 
> OK can't do anything about it anyway, but maybe it should be
> mentioned
> somewhere in Git documentation.

Sure.

> * "Opening a database can fail if another process is opening or
> closing it at exactly the same time."
> 
> We have some retry logic in resolve_ref_1(). Do we need the same for
> lmdb? Not very important though.
> 
> [1] http://symas.com/mdb/doc/

wat.

OK, I'll add a retry loop on that.  I guess we can just keep retrying
on EACCES or EAGAIN.

  reply	other threads:[~2016-02-24 20:37 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-18  5:17 [PATCH v5 00/27] refs backends David Turner
2016-02-18  5:17 ` [PATCH v5 01/27] refs: Move head_ref{,_submodule} to the common code David Turner
2016-02-18  5:17 ` [PATCH v5 02/27] refs: move for_each_*ref* functions into " David Turner
2016-02-18  5:17 ` [PATCH v5 03/27] refs: add a backend method structure with transaction functions David Turner
2016-02-18  5:17 ` [PATCH v5 04/27] refs: add methods for misc ref operations David Turner
2016-02-18  5:17 ` [PATCH v5 05/27] refs: add method for do_for_each_ref David Turner
2016-02-18  5:17 ` [PATCH v5 06/27] refs: add do_for_each_per_worktree_ref David Turner
2016-02-18  5:17 ` [PATCH v5 07/27] refs: add methods for reflog David Turner
2016-02-18  5:17 ` [PATCH v5 08/27] refs: add method for initial ref transaction commit David Turner
2016-02-18  5:17 ` [PATCH v5 09/27] refs: add method for delete_refs David Turner
2016-02-18  5:17 ` [PATCH v5 10/27] refs: add methods to init refs db David Turner
2016-02-18  5:17 ` [PATCH v5 11/27] refs: add method to rename refs David Turner
2016-02-18  5:17 ` [PATCH v5 12/27] refs: forbid cross-backend ref renames David Turner
2016-02-20  4:30   ` Duy Nguyen
2016-02-24 20:48     ` David Turner
2016-02-18  5:17 ` [PATCH v5 13/27] refs: make lock generic David Turner
2016-02-18  5:17 ` [PATCH v5 14/27] refs: move duplicate check to common code David Turner
2016-02-18  5:17 ` [PATCH v5 15/27] refs: allow log-only updates David Turner
2016-02-18  5:17 ` [PATCH v5 16/27] refs: don't dereference on rename David Turner
2016-02-18  5:17 ` [PATCH v5 17/27] refs: on symref reflog expire, lock symref not referrent David Turner
2016-02-18  5:17 ` [PATCH v5 18/27] refs: resolve symbolic refs first David Turner
2016-02-18  5:17 ` [PATCH v5 19/27] refs: always handle non-normal refs in files backend David Turner
2016-02-18  5:17 ` [PATCH v5 20/27] init: allow alternate ref strorage to be set for new repos David Turner
2016-02-18  5:17 ` [PATCH v5 21/27] refs: check submodules' ref storage config David Turner
2016-02-18  5:17 ` [PATCH v5 22/27] clone: allow ref storage backend to be set for clone David Turner
2016-02-18  5:17 ` [PATCH v5 23/27] svn: learn ref-storage argument David Turner
2016-02-20 23:55   ` Eric Wong
2016-02-23 18:08     ` David Turner
2016-02-18  5:17 ` [PATCH v5 24/27] refs: add register_ref_storage_backends() David Turner
2016-02-18  5:17 ` [PATCH v5 25/27] refs: add LMDB refs storage backend David Turner
2016-02-18  8:50   ` Duy Nguyen
2016-02-18 20:23     ` David Turner
2016-02-18 21:15       ` Junio C Hamano
2016-02-19  2:54       ` Duy Nguyen
2016-02-19 19:10         ` David Turner
2016-02-20 13:14       ` Duy Nguyen
2016-02-24 20:41         ` David Turner
2016-02-20 21:32       ` Junio C Hamano
2016-02-19 22:49     ` David Turner
2016-02-19 23:08       ` Junio C Hamano
2016-02-20  2:58       ` Duy Nguyen
2016-02-24 20:43         ` David Turner
2016-02-25 10:07           ` Duy Nguyen
2016-02-20  8:59   ` Duy Nguyen
2016-02-24 20:37     ` David Turner [this message]
2016-02-25 10:12       ` Duy Nguyen
2016-02-25 20:05         ` [PATCH] refs: document transaction semantics David Turner
2016-02-25 20:10           ` David Turner
2016-02-25 20:34             ` Junio C Hamano
2016-02-25 20:50               ` David Turner
2016-02-18  5:17 ` [PATCH v5 26/27] refs: tests for lmdb backend David Turner
2016-02-18  5:17 ` [PATCH v5 27/27] tests: add ref-storage argument David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1456346242.18017.16.camel@twopensource.com \
    --to=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    --cc=pclouds@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).