git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Duy Nguyen <pclouds@gmail.com>
To: David Turner <dturner@twopensource.com>
Cc: git@vger.kernel.org, mhagger@alum.mit.edu
Subject: Re: [PATCH v5 25/27] refs: add LMDB refs storage backend
Date: Thu, 18 Feb 2016 15:50:23 +0700	[thread overview]
Message-ID: <20160218085023.GA30049@lanh> (raw)
In-Reply-To: <1455772670-21142-26-git-send-email-dturner@twopensource.com>

Caveat: I did not study how to use lmdb. I just guessed what it does
based on function names. I don't know much about refs handling either
(especially since the transaction thing is introduced)

> diff --git a/Documentation/technical/refs-lmdb-backend.txt b/Documentation/technical/refs-lmdb-backend.txt
> new file mode 100644
> index 0000000..eb81465
> --- /dev/null
> +++ b/Documentation/technical/refs-lmdb-backend.txt
> +Reflog values are in the same format as the original files-based
> +reflog, including the trailing LF. The date in the reflog value
> +matches the date in the timestamp field.

..except that SHA-1s are stored in raw values instead of hex strings.

> diff --git a/Documentation/technical/repository-version.txt b/Documentation/technical/repository-version.txt
> index 00ad379..fca5ecd 100644
> --- a/Documentation/technical/repository-version.txt
> +++ b/Documentation/technical/repository-version.txt
> @@ -86,3 +86,8 @@ for testing format-1 compatibility.
>  When the config key `extensions.preciousObjects` is set to `true`,
>  objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
>  `git repack -d`).
> +
> +`refStorage`
> +~~~~~~~~~~~~
> +This extension allows the use of alternate ref storage backends.  The
> +only defined value is `lmdb`.

refStorage accepts empty string and `files` as well, should probably
be worth mentioning.

> diff --git a/refs/lmdb-backend.c b/refs/lmdb-backend.c
> +#include "../cache.h"
> +#include <lmdb.h>
> +#include "../object.h"
> +#include "../refs.h"
> +#include "refs-internal.h"
> +#include "../tag.h"
> +#include "../lockfile.h"

I'm quite sure we don't need "../". We have plenty of source files in
subdirs and many of them (haven't checked all) just go with
#include "cache.h".

> +struct lmdb_transaction transaction;

static?

> +
> +static int in_write_transaction(void)
> +{
> +	return transaction.txn && !(transaction.flags & MDB_RDONLY);
> +}
> +
> +static void init_env(MDB_env **env, const char *path)
> +{
> +	int ret;
> +	if (*env)
> +		return;
> +
> +	assert(path);
> +
> +	ret = mdb_env_create(env);
> +	if (ret)
> +		die("BUG: mdb_env_create failed: %s", mdb_strerror(ret));
> +	ret = mdb_env_set_maxreaders(*env, 1000);
> +	if (ret)
> +		die("BUG: mdb_env_set_maxreaders failed: %s", mdb_strerror(ret));
> +	ret = mdb_env_set_mapsize(*env, (1<<30));
> +	if (ret)
> +		die("BUG: mdb_set_mapsize failed: %s", mdb_strerror(ret));
> +	ret = mdb_env_open(*env, path, 0 , 0664);

This permission makes me wonder if we need adjust_shared_perm() here
and some other places.

> +	if (ret)
> +		die("BUG: mdb_env_open (%s) failed: %s", path,
> +		    mdb_strerror(ret));
> +}
> +
> +static int lmdb_init_db(int shared, struct strbuf *err)
> +{
> +	/*
> +	 * To create a db, all we need to do is make a directory for
> +	 * it to live in; lmdb will do the rest.
> +	 */
> +
> +	if (!db_path)
> +		db_path = xstrdup(real_path(git_path("refs.lmdb")));
> +
> +	if (mkdir(db_path, 0775) && errno != EEXIST) {
> +		strbuf_addf(err, "%s", strerror(errno));

maybe strbuf_addstr, unless want to add something more, "mkdir failed"?

> +static int read_per_worktree_ref(const char *submodule, const char *refname,
> +				 struct MDB_val *val, int *needs_free)

>From what I read, I suspect these _per_worktree functions will be
identical for the next backend. Should we just hand over the job for
files backend? For all entry points that may deal with per-worktree
refs, e.g. lmdb_resolve_ref_unsafe, can we check ref_type() first
thing, if it's per-worktree we call refs_be_files.resolve_ref_unsafe()
instead?  It could even be done at frontend level,
e.g. refs.c:resolve_ref_unsafe().

Though I may be talking rubbish here because I don't know how whether
it has anything to do with transactions.

> +{
> +	struct strbuf sb = STRBUF_INIT;
> +	struct strbuf path = STRBUF_INIT;
> +	struct stat st;
> +	int ret = -1;
> +
> +	submodule_path(&path, submodule, refname);
> +
> +#ifndef NO_SYMLINK_HEAD

It started with the compiler warns about unused "st" when this macro
is defined. Which makes me wonder, should we do something like this to
make sure this code compiles unconditionally?

+#ifndef NO_SYMLINK_HEAD
+       int no_symlink_head = 0;
+#else
+       int no_symlink_head = 1;
+#endif
...
+       if (!no_symlink_head) {
...


> +int lmdb_transaction_begin_flags(struct strbuf *err, unsigned int flags)

static?

> +#define MAXDEPTH 5
> +
> +static const char *parse_ref_data(struct lmdb_transaction *transaction,
> +				  const char *refname, const char *ref_data,
> +				  unsigned char *sha1, int resolve_flags,
> +				  int *flags, int bad_name)
> +{
> +	int depth = MAXDEPTH;
> +	const char *buf;
> +	static struct strbuf refname_buffer = STRBUF_INIT;
> +	static struct strbuf refdata_buffer = STRBUF_INIT;
> +	MDB_val key, val;
> +	int needs_free = 0;
> +
> +	for (;;) {
> +		if (--depth < 0)
> +			return NULL;
> +
> +		if (!starts_with(ref_data, "ref:")) {
> +			if (get_sha1_hex(ref_data, sha1) ||
> +			    (ref_data[40] != '\0' && !isspace(ref_data[40]))) {
> +				if (flags)
> +					*flags |= REF_ISBROKEN;
> +				errno = EINVAL;
> +				return NULL;
> +			}
> +
> +			if (bad_name) {
> +				hashclr(sha1);
> +				if (flags)
> +					*flags |= REF_ISBROKEN;
> +			} else if (is_null_sha1(sha1)) {
> +				if (flags)
> +					*flags |= REF_ISBROKEN;
> +			}
> +			return refname;
> +		}
> +		if (flags)
> +			*flags |= REF_ISSYMREF;
> +		buf = ref_data + 4;
> +		while (isspace(*buf))
> +			buf++;
> +		strbuf_reset(&refname_buffer);
> +		strbuf_addstr(&refname_buffer, buf);
> +		refname = refname_buffer.buf;
> +		if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
> +			hashclr(sha1);
> +			return refname;
> +		}
> +		if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL)) {
> +			if (flags)
> +				*flags |= REF_ISBROKEN;
> +
> +			if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
> +			    !refname_is_safe(buf)) {
> +				errno = EINVAL;
> +				return NULL;
> +			}
> +			bad_name = 1;
> +		}

This code looks a lot like near the end of resolve_ref_1(). Maybe we
could share the code in refs/backend-common.c or something and call
here instead?

> +static const char *resolve_ref_unsafe_txn(struct lmdb_transaction *transaction,
> +					  const char *refname,
> +					  int resolve_flags,
> +					  unsigned char *sha1,
> +					  int *flags)
> +{
...
> +	if (flags)
> +		*flags = 0;
> +
> +	if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
> +		if (flags)
> +			*flags |= REF_BAD_NAME;
> +
> +		if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
> +		    !refname_is_safe(refname)) {
> +			errno = EINVAL;
> +			return NULL;
> +		}
> +		/*
> +		 * dwim_ref() uses REF_ISBROKEN to distinguish between
> +		 * missing refs and refs that were present but invalid,
> +		 * to complain about the latter to stderr.
> +		 *
> +		 * We don't know whether the ref exists, so don't set
> +		 * REF_ISBROKEN yet.
> +		 */
> +		bad_name = 1;
> +	}

This code too.

> +static int show_one_reflog_ent(struct strbuf *sb, each_reflog_ent_fn fn, void *cb_data)
> +{
> +	unsigned char osha1[20], nsha1[20];
> +	char *email_end, *message;
> +	unsigned long timestamp;
> +	int tz;
> +
> +	/* old (raw sha) new (raw sha) name <email> SP time TAB msg LF */

Hmm.. since you're going with raw sha-1, this is clearly not a text
string anymore, any reason to still keep LF at the end?

> +static int lmdb_delete_reflog(const char *refname)
> +{
> +	MDB_val key, val;
> +	char *log_path;
> +	int len;
> +	MDB_cursor *cursor;
> +	int ret = 0;
> +	int mdb_ret;
> +	struct strbuf err = STRBUF_INIT;
> +	int in_transaction;
> +
> +	if (ref_type(refname) != REF_TYPE_NORMAL)
> +		return refs_be_files.delete_reflog(refname);

Yay.. delegating work to files backend. I still think doing this in
refs.c:delete_reflog() may be a good idea.

> +int lmdb_reflog_expire(const char *refname, const unsigned char *sha1,

static?

> +static int lmdb_create_symref(const char *ref_target,
> +			      const char *refs_heads_master,
> +			      const char *logmsg)
> +{
> +
...
> +	mdb_put_or_die(&transaction, &key, &val, 0);
> +
> +	/* TODO: Don't create ref d/f conflicts */

I'm not sure I get this comment. D/F conflicts are no longer a thing
for lmdb backend, right?

> +MDB_env *submodule_txn_begin(struct lmdb_transaction *transaction)

static?

> +{
> +	int ret;
> +	MDB_env *submodule_env = NULL;
> +	struct strbuf path = STRBUF_INIT;
> +
> +	strbuf_git_path_submodule(&path, transaction->submodule, "refs.lmdb");
> +
> +	if (!is_directory(path.buf))
> +		goto done;
> +
> +	mkdir(path.buf, 0775);

A few other places where mkdir() is called, we may need to
adjust_shared_perm().

> diff --git a/setup.c b/setup.c
> index 1a62277..00625ab 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -279,7 +279,7 @@ int ref_storage_backend_config(const char *var, const char *value, void *ptr)
>   *
>   *  - either an objects/ directory _or_ the proper
>   *    GIT_OBJECT_DIRECTORY environment variable
> - *  - a refs/ directory
> + *  - a refs.lmdb/ directory or a refs/ directory
>   *  - either a HEAD symlink or a HEAD file that is formatted as
>   *    a proper "ref:", or a regular file HEAD that has a properly
>   *    formatted sha1 object name.
> @@ -313,8 +313,13 @@ int is_git_directory(const char *suspect)
>  
>  	strbuf_setlen(&path, len);
>  	strbuf_addstr(&path, "/refs");
> -	if (access(path.buf, X_OK))
> -		goto done;
> +
> +	if (access(path.buf, X_OK)) {
> +		strbuf_setlen(&path, len);
> +		strbuf_addstr(&path, "/refs.lmdb");
> +		if (access(path.buf, X_OK))
> +			goto done;
> +	}
>

I think it's ok leaving this function unmodified, which means "refs"
directory will always be there and "refs.lmdb" does not matter. If
somehow "refs" is deleted, old binaries get confused anyway so we
can't delete it.

> @@ -1089,8 +1089,11 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
>  		goto out;
>  	/* Is this a git repository with refs? */
>  	memcpy(other + len - 8, "/refs", 6);
> -	if (!is_directory(other))
> -		goto out;
> +	if (!is_directory(other)) {
> +		memcpy(other + len - 8, "/refs.lmdb", 11);
> +		if (!is_directory(other))
> +			goto out;
> +	}

and probably the same here. I have no idea what this code does though,
but if it's about detecting git directory, it should call
is_git_directory() instead.

>  	other[len - 8] = '\0';
>  	remote = remote_get(other);
>  	transport = transport_get(remote, other);
--
Duy

  reply	other threads:[~2016-02-18  8:50 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-02-18  5:17 [PATCH v5 00/27] refs backends David Turner
2016-02-18  5:17 ` [PATCH v5 01/27] refs: Move head_ref{,_submodule} to the common code David Turner
2016-02-18  5:17 ` [PATCH v5 02/27] refs: move for_each_*ref* functions into " David Turner
2016-02-18  5:17 ` [PATCH v5 03/27] refs: add a backend method structure with transaction functions David Turner
2016-02-18  5:17 ` [PATCH v5 04/27] refs: add methods for misc ref operations David Turner
2016-02-18  5:17 ` [PATCH v5 05/27] refs: add method for do_for_each_ref David Turner
2016-02-18  5:17 ` [PATCH v5 06/27] refs: add do_for_each_per_worktree_ref David Turner
2016-02-18  5:17 ` [PATCH v5 07/27] refs: add methods for reflog David Turner
2016-02-18  5:17 ` [PATCH v5 08/27] refs: add method for initial ref transaction commit David Turner
2016-02-18  5:17 ` [PATCH v5 09/27] refs: add method for delete_refs David Turner
2016-02-18  5:17 ` [PATCH v5 10/27] refs: add methods to init refs db David Turner
2016-02-18  5:17 ` [PATCH v5 11/27] refs: add method to rename refs David Turner
2016-02-18  5:17 ` [PATCH v5 12/27] refs: forbid cross-backend ref renames David Turner
2016-02-20  4:30   ` Duy Nguyen
2016-02-24 20:48     ` David Turner
2016-02-18  5:17 ` [PATCH v5 13/27] refs: make lock generic David Turner
2016-02-18  5:17 ` [PATCH v5 14/27] refs: move duplicate check to common code David Turner
2016-02-18  5:17 ` [PATCH v5 15/27] refs: allow log-only updates David Turner
2016-02-18  5:17 ` [PATCH v5 16/27] refs: don't dereference on rename David Turner
2016-02-18  5:17 ` [PATCH v5 17/27] refs: on symref reflog expire, lock symref not referrent David Turner
2016-02-18  5:17 ` [PATCH v5 18/27] refs: resolve symbolic refs first David Turner
2016-02-18  5:17 ` [PATCH v5 19/27] refs: always handle non-normal refs in files backend David Turner
2016-02-18  5:17 ` [PATCH v5 20/27] init: allow alternate ref strorage to be set for new repos David Turner
2016-02-18  5:17 ` [PATCH v5 21/27] refs: check submodules' ref storage config David Turner
2016-02-18  5:17 ` [PATCH v5 22/27] clone: allow ref storage backend to be set for clone David Turner
2016-02-18  5:17 ` [PATCH v5 23/27] svn: learn ref-storage argument David Turner
2016-02-20 23:55   ` Eric Wong
2016-02-23 18:08     ` David Turner
2016-02-18  5:17 ` [PATCH v5 24/27] refs: add register_ref_storage_backends() David Turner
2016-02-18  5:17 ` [PATCH v5 25/27] refs: add LMDB refs storage backend David Turner
2016-02-18  8:50   ` Duy Nguyen [this message]
2016-02-18 20:23     ` David Turner
2016-02-18 21:15       ` Junio C Hamano
2016-02-19  2:54       ` Duy Nguyen
2016-02-19 19:10         ` David Turner
2016-02-20 13:14       ` Duy Nguyen
2016-02-24 20:41         ` David Turner
2016-02-20 21:32       ` Junio C Hamano
2016-02-19 22:49     ` David Turner
2016-02-19 23:08       ` Junio C Hamano
2016-02-20  2:58       ` Duy Nguyen
2016-02-24 20:43         ` David Turner
2016-02-25 10:07           ` Duy Nguyen
2016-02-20  8:59   ` Duy Nguyen
2016-02-24 20:37     ` David Turner
2016-02-25 10:12       ` Duy Nguyen
2016-02-25 20:05         ` [PATCH] refs: document transaction semantics David Turner
2016-02-25 20:10           ` David Turner
2016-02-25 20:34             ` Junio C Hamano
2016-02-25 20:50               ` David Turner
2016-02-18  5:17 ` [PATCH v5 26/27] refs: tests for lmdb backend David Turner
2016-02-18  5:17 ` [PATCH v5 27/27] tests: add ref-storage argument David Turner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160218085023.GA30049@lanh \
    --to=pclouds@gmail.com \
    --cc=dturner@twopensource.com \
    --cc=git@vger.kernel.org \
    --cc=mhagger@alum.mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).