From: David Turner <dturner@twopensource.com>
To: Duy Nguyen <pclouds@gmail.com>,
git mailing list <git@vger.kernel.org>,
mhagger@alum.mit.edu
Subject: Re: [PATCH v5 25/27] refs: add LMDB refs storage backend
Date: Fri, 19 Feb 2016 17:49:46 -0500 [thread overview]
Message-ID: <1455922186.7528.97.camel@twopensource.com> (raw)
In-Reply-To: <20160218085023.GA30049@lanh>
On Thu, 2016-02-18 at 15:50 +0700, Duy Nguyen wrote:
> Caveat: I did not study how to use lmdb. I just guessed what it does
> based on function names. I don't know much about refs handling either
> (especially since the transaction thing is introduced)
>
> > diff --git a/Documentation/technical/refs-lmdb-backend.txt
> > b/Documentation/technical/refs-lmdb-backend.txt
> > new file mode 100644
> > index 0000000..eb81465
> > --- /dev/null
> > +++ b/Documentation/technical/refs-lmdb-backend.txt
> > +Reflog values are in the same format as the original files-based
> > +reflog, including the trailing LF. The date in the reflog value
> > +matches the date in the timestamp field.
>
> ..except that SHA-1s are stored in raw values instead of hex strings.
>
> > diff --git a/Documentation/technical/repository-version.txt
> > b/Documentation/technical/repository-version.txt
> > index 00ad379..fca5ecd 100644
> > --- a/Documentation/technical/repository-version.txt
> > +++ b/Documentation/technical/repository-version.txt
> > @@ -86,3 +86,8 @@ for testing format-1 compatibility.
> > When the config key `extensions.preciousObjects` is set to `true`,
> > objects in the repository MUST NOT be deleted (e.g., by `git
> > -prune` or
> > `git repack -d`).
> > +
> > +`refStorage`
> > +~~~~~~~~~~~~
> > +This extension allows the use of alternate ref storage backends.
> > The
> > +only defined value is `lmdb`.
>
> refStorage accepts empty string and `files` as well, should probably
> be worth mentioning.
>
> > diff --git a/refs/lmdb-backend.c b/refs/lmdb-backend.c
> > +#include "../cache.h"
> > +#include <lmdb.h>
> > +#include "../object.h"
> > +#include "../refs.h"
> > +#include "refs-internal.h"
> > +#include "../tag.h"
> > +#include "../lockfile.h"
>
> I'm quite sure we don't need "../". We have plenty of source files in
> subdirs and many of them (haven't checked all) just go with
> #include "cache.h".
>
> > +struct lmdb_transaction transaction;
>
> static?
>
> > +
> > +static int in_write_transaction(void)
> > +{
> > + return transaction.txn && !(transaction.flags &
> > MDB_RDONLY);
> > +}
> > +
> > +static void init_env(MDB_env **env, const char *path)
> > +{
> > + int ret;
> > + if (*env)
> > + return;
> > +
> > + assert(path);
> > +
> > + ret = mdb_env_create(env);
> > + if (ret)
> > + die("BUG: mdb_env_create failed: %s",
> > mdb_strerror(ret));
> > + ret = mdb_env_set_maxreaders(*env, 1000);
> > + if (ret)
> > + die("BUG: mdb_env_set_maxreaders failed: %s",
> > mdb_strerror(ret));
> > + ret = mdb_env_set_mapsize(*env, (1<<30));
> > + if (ret)
> > + die("BUG: mdb_set_mapsize failed: %s",
> > mdb_strerror(ret));
> > + ret = mdb_env_open(*env, path, 0 , 0664);
>
> This permission makes me wonder if we need adjust_shared_perm() here
> and some other places.
>
> > + if (ret)
> > + die("BUG: mdb_env_open (%s) failed: %s", path,
> > + mdb_strerror(ret));
> > +}
> > +
> > +static int lmdb_init_db(int shared, struct strbuf *err)
> > +{
> > + /*
> > + * To create a db, all we need to do is make a directory
> > for
> > + * it to live in; lmdb will do the rest.
> > + */
> > +
> > + if (!db_path)
> > + db_path =
> > xstrdup(real_path(git_path("refs.lmdb")));
> > +
> > + if (mkdir(db_path, 0775) && errno != EEXIST) {
> > + strbuf_addf(err, "%s", strerror(errno));
>
> maybe strbuf_addstr, unless want to add something more, "mkdir
> failed"?
>
> > +static int read_per_worktree_ref(const char *submodule, const char
> > *refname,
> > + struct MDB_val *val, int
> > *needs_free)
>
> From what I read, I suspect these _per_worktree functions will be
> identical for the next backend. Should we just hand over the job for
> files backend? For all entry points that may deal with per-worktree
> refs, e.g. lmdb_resolve_ref_unsafe, can we check ref_type() first
> thing, if it's per-worktree we call
> refs_be_files.resolve_ref_unsafe()
> instead? It could even be done at frontend level,
> e.g. refs.c:resolve_ref_unsafe().
>
> Though I may be talking rubbish here because I don't know how whether
> it has anything to do with transactions.
>
> > +{
> > + struct strbuf sb = STRBUF_INIT;
> > + struct strbuf path = STRBUF_INIT;
> > + struct stat st;
> > + int ret = -1;
> > +
> > + submodule_path(&path, submodule, refname);
> > +
> > +#ifndef NO_SYMLINK_HEAD
>
> It started with the compiler warns about unused "st" when this macro
> is defined. Which makes me wonder, should we do something like this
> to
> make sure this code compiles unconditionally?
>
> +#ifndef NO_SYMLINK_HEAD
> + int no_symlink_head = 0;
> +#else
> + int no_symlink_head = 1;
> +#endif
> ...
> + if (!no_symlink_head) {
> ...
>
>
> > +int lmdb_transaction_begin_flags(struct strbuf *err, unsigned int
> > flags)
>
> static?
>
> > +#define MAXDEPTH 5
> > +
> > +static const char *parse_ref_data(struct lmdb_transaction
> > *transaction,
> > + const char *refname, const char
> > *ref_data,
> > + unsigned char *sha1, int
> > resolve_flags,
> > + int *flags, int bad_name)
> > +{
> > + int depth = MAXDEPTH;
> > + const char *buf;
> > + static struct strbuf refname_buffer = STRBUF_INIT;
> > + static struct strbuf refdata_buffer = STRBUF_INIT;
> > + MDB_val key, val;
> > + int needs_free = 0;
> > +
> > + for (;;) {
> > + if (--depth < 0)
> > + return NULL;
> > +
> > + if (!starts_with(ref_data, "ref:")) {
> > + if (get_sha1_hex(ref_data, sha1) ||
> > + (ref_data[40] != '\0' &&
> > !isspace(ref_data[40]))) {
> > + if (flags)
> > + *flags |= REF_ISBROKEN;
> > + errno = EINVAL;
> > + return NULL;
> > + }
> > +
> > + if (bad_name) {
> > + hashclr(sha1);
> > + if (flags)
> > + *flags |= REF_ISBROKEN;
> > + } else if (is_null_sha1(sha1)) {
> > + if (flags)
> > + *flags |= REF_ISBROKEN;
> > + }
> > + return refname;
> > + }
> > + if (flags)
> > + *flags |= REF_ISSYMREF;
> > + buf = ref_data + 4;
> > + while (isspace(*buf))
> > + buf++;
> > + strbuf_reset(&refname_buffer);
> > + strbuf_addstr(&refname_buffer, buf);
> > + refname = refname_buffer.buf;
> > + if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
> > + hashclr(sha1);
> > + return refname;
> > + }
> > + if (check_refname_format(buf,
> > REFNAME_ALLOW_ONELEVEL)) {
> > + if (flags)
> > + *flags |= REF_ISBROKEN;
> > +
> > + if (!(resolve_flags &
> > RESOLVE_REF_ALLOW_BAD_NAME) ||
> > + !refname_is_safe(buf)) {
> > + errno = EINVAL;
> > + return NULL;
> > + }
> > + bad_name = 1;
> > + }
>
> This code looks a lot like near the end of resolve_ref_1(). Maybe we
> could share the code in refs/backend-common.c or something and call
> here instead?
Something like the following?
commit aad6b84fd1869f6e1cf6ed15bcece0c2f6429e9d
Author: David Turner <dturner@twopensource.com>
Date: Thu Feb 18 17:09:29 2016 -0500
refs: break out some functions from resolve_ref_1
A bunch of resolve_ref_1 is not backend-specific, so we can
break it out into separate internal functions that other
backends can use.
Signed-off-by: David Turner <dturner@twopensource.com>
diff --git a/refs.c b/refs.c
index c9fa34d..680c2a5 100644
--- a/refs.c
+++ b/refs.c
@@ -1221,6 +1221,66 @@ int for_each_rawref(each_ref_fn fn, void
*cb_data)
DO_FOR_EACH_INCLUDE_BROKEN, cb_data);
}
+int parse_simple_ref(const char *buf, unsigned char *sha1, unsigned
int *flags, int bad_name)
+{
+ /*
+ * Please note that FETCH_HEAD has a second
+ * line containing other data.
+ */
+ if (get_sha1_hex(buf, sha1) ||
+ (buf[40] != '\0' && !isspace(buf[40]))) {
+ if (flags)
+ *flags |= REF_ISBROKEN;
+ errno = EINVAL;
+ return -1;
+ }
+ if (bad_name) {
+ hashclr(sha1);
+ if (flags)
+ *flags |= REF_ISBROKEN;
+ }
+ return 0;
+}
+
+int check_bad_refname(const char *refname, int *flags, int
resolve_flags)
+{
+ if (!check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
+ return 0;
+
+ if (flags)
+ *flags |= REF_BAD_NAME;
+
+ if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
+ !refname_is_safe(refname)) {
+ errno = EINVAL;
+ return -1;
+ }
+ /*
+ * dwim_ref() uses REF_ISBROKEN to distinguish between
+ * missing refs and refs that were present but invalid,
+ * to complain about the latter to stderr.
+ *
+ * We don't know whether the ref exists, so don't set
+ * REF_ISBROKEN yet.
+ */
+ return 1;
+}
+
+/*
+ * Parse a refname out of the contents of a symref into a provided
+ * strbuf. Return a pointer to the strbuf's contents.
+ */
+char *parse_symref_data(const char *buf, struct strbuf *sb_refname)
+{
+ buf += 4;
+ while (isspace(*buf))
+ buf++;
+ strbuf_reset(sb_refname);
+ strbuf_addstr(sb_refname, buf);
+ return sb_refname->buf;
+}
+
+
/* backend functions */
int refs_init_db(int shared, struct strbuf *err)
{
diff --git a/refs/files-backend.c b/refs/files-backend.c
index da06408..52972e6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1427,25 +1427,9 @@ static const char *resolve_ref_1(const char
*refname,
if (flags)
*flags = 0;
- if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
- if (flags)
- *flags |= REF_BAD_NAME;
-
- if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
- !refname_is_safe(refname)) {
- errno = EINVAL;
- return NULL;
- }
- /*
- * dwim_ref() uses REF_ISBROKEN to distinguish between
- * missing refs and refs that were present but
invalid,
- * to complain about the latter to stderr.
- *
- * We don't know whether the ref exists, so don't set
- * REF_ISBROKEN yet.
- */
- bad_name = 1;
- }
+ bad_name = check_bad_refname(refname, flags, resolve_flags);
+ if (bad_name < 0)
+ return NULL;
for (;;) {
const char *path;
struct stat st;
@@ -1541,47 +1525,20 @@ static const char *resolve_ref_1(const char
*refname,
* Is it a symbolic ref?
*/
if (!starts_with(sb_contents->buf, "ref:")) {
- /*
- * Please note that FETCH_HEAD has a second
- * line containing other data.
- */
- if (get_sha1_hex(sb_contents->buf, sha1) ||
- (sb_contents->buf[40] != '\0' &&
!isspace(sb_contents->buf[40]))) {
- if (flags)
- *flags |= REF_ISBROKEN;
- errno = EINVAL;
- return NULL;
- }
- if (bad_name) {
- hashclr(sha1);
- if (flags)
- *flags |= REF_ISBROKEN;
- }
+ if (parse_simple_ref(sb_contents->buf, sha1,
flags, bad_name))
+ refname = NULL;
return refname;
}
if (flags)
*flags |= REF_ISSYMREF;
- buf = sb_contents->buf + 4;
- while (isspace(*buf))
- buf++;
- strbuf_reset(sb_refname);
- strbuf_addstr(sb_refname, buf);
- refname = sb_refname->buf;
+ refname = parse_symref_data(sb_contents->buf,
sb_refname);
if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
hashclr(sha1);
return refname;
}
- if (check_refname_format(buf, REFNAME_ALLOW_ONELEVEL))
{
- if (flags)
- *flags |= REF_ISBROKEN;
-
- if (!(resolve_flags &
RESOLVE_REF_ALLOW_BAD_NAME) ||
- !refname_is_safe(buf)) {
- errno = EINVAL;
- return NULL;
- }
- bad_name = 1;
- }
+ bad_name |= check_bad_refname(refname, flags,
resolve_flags);
+ if (bad_name < 0)
+ return NULL;
}
}
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index efdde82..7cdfffe 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -218,6 +218,26 @@ int do_for_each_per_worktree_ref(const char
*submodule, const char *base,
int do_for_each_ref(const char *submodule, const char *base,
each_ref_fn fn, int trim, int flags, void
*cb_data);
+/*
+ * Parse a non-symref -- a buf hopefully containing 40 hex characters.
+ * Set errno and flags appropriately. If the buf can be parsed but
+ * bad_name is set, the ref is broken: zero out sha1.
+ */
+int parse_simple_ref(const char *buf, unsigned char *sha1, unsigned
int *flags,
+ int bad_name);
+/*
+ * Parse a refname out of the contents of a symref into a provided
+ * strbuf. Return a pointer to the strbuf's contents.
+ */
+char *parse_symref_data(const char *buf, struct strbuf *sb_refname);
+
+/*
+ * Check the format of refname. Set flags and errno appropriately.
+ * Returns 0 if the refname is good, -1 if it is bad enough that we
+ * have to stop parsing and 1 if we just have to note that it is bad.
+ */
+int check_bad_refname(const char *refname, int *flags, int
resolve_flags);
+
/* refs backends */
typedef int ref_init_db_fn(int shared, struct strbuf *err);
typedef int ref_transaction_commit_fn(struct ref_transaction
*transaction,
followed by this version of parse_ref_data:
static const char *parse_ref_data(struct lmdb_transaction *transaction,
const char *refname, const char
*ref_data,
unsigned char *sha1, int
resolve_flags,
int *flags, int bad_name)
{
int depth = MAXDEPTH;
const char *buf;
static struct strbuf refname_buffer = STRBUF_INIT;
static struct strbuf refdata_buffer = STRBUF_INIT;
MDB_val key, val;
int needs_free = 0;
for (;;) {
if (--depth < 0)
return NULL;
/*
* Is it a symbolic ref?
*/
if (!starts_with(ref_data, "ref:")) {
if (parse_simple_ref(ref_data, sha1, flags,
bad_name))
refname = NULL;
if (is_null_sha1(sha1) && flags)
*flags |= REF_ISBROKEN;
return refname;
}
if (flags)
*flags |= REF_ISSYMREF;
refname = parse_symref_data(ref_data, &refname_buffer);
if (resolve_flags & RESOLVE_REF_NO_RECURSE) {
hashclr(sha1);
return refname;
}
bad_name |= check_bad_refname(refname, flags,
resolve_flags);
if (bad_name < 0)
return NULL;
key.mv_data = (char *)refname;
key.mv_size = strlen(refname) + 1;
if (mdb_get_or_die(transaction, &key, &val,
&needs_free)) {
hashclr(sha1);
if (bad_name) {
if (flags)
*flags |= REF_ISBROKEN;
}
if (resolve_flags & RESOLVE_REF_READING)
return NULL;
return refname;
}
strbuf_reset(&refdata_buffer);
strbuf_add(&refdata_buffer, val.mv_data, val.mv_size);
if (needs_free)
free(val.mv_data);
ref_data = refdata_buffer.buf;
}
return refname;
}
----------------
I'm not sure I like it, because it breaks out these weird tiny
functions that take a lot of arguments. But maybe it's worth it? What
do you think?
next prev parent reply other threads:[~2016-02-19 22:49 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-18 5:17 [PATCH v5 00/27] refs backends David Turner
2016-02-18 5:17 ` [PATCH v5 01/27] refs: Move head_ref{,_submodule} to the common code David Turner
2016-02-18 5:17 ` [PATCH v5 02/27] refs: move for_each_*ref* functions into " David Turner
2016-02-18 5:17 ` [PATCH v5 03/27] refs: add a backend method structure with transaction functions David Turner
2016-02-18 5:17 ` [PATCH v5 04/27] refs: add methods for misc ref operations David Turner
2016-02-18 5:17 ` [PATCH v5 05/27] refs: add method for do_for_each_ref David Turner
2016-02-18 5:17 ` [PATCH v5 06/27] refs: add do_for_each_per_worktree_ref David Turner
2016-02-18 5:17 ` [PATCH v5 07/27] refs: add methods for reflog David Turner
2016-02-18 5:17 ` [PATCH v5 08/27] refs: add method for initial ref transaction commit David Turner
2016-02-18 5:17 ` [PATCH v5 09/27] refs: add method for delete_refs David Turner
2016-02-18 5:17 ` [PATCH v5 10/27] refs: add methods to init refs db David Turner
2016-02-18 5:17 ` [PATCH v5 11/27] refs: add method to rename refs David Turner
2016-02-18 5:17 ` [PATCH v5 12/27] refs: forbid cross-backend ref renames David Turner
2016-02-20 4:30 ` Duy Nguyen
2016-02-24 20:48 ` David Turner
2016-02-18 5:17 ` [PATCH v5 13/27] refs: make lock generic David Turner
2016-02-18 5:17 ` [PATCH v5 14/27] refs: move duplicate check to common code David Turner
2016-02-18 5:17 ` [PATCH v5 15/27] refs: allow log-only updates David Turner
2016-02-18 5:17 ` [PATCH v5 16/27] refs: don't dereference on rename David Turner
2016-02-18 5:17 ` [PATCH v5 17/27] refs: on symref reflog expire, lock symref not referrent David Turner
2016-02-18 5:17 ` [PATCH v5 18/27] refs: resolve symbolic refs first David Turner
2016-02-18 5:17 ` [PATCH v5 19/27] refs: always handle non-normal refs in files backend David Turner
2016-02-18 5:17 ` [PATCH v5 20/27] init: allow alternate ref strorage to be set for new repos David Turner
2016-02-18 5:17 ` [PATCH v5 21/27] refs: check submodules' ref storage config David Turner
2016-02-18 5:17 ` [PATCH v5 22/27] clone: allow ref storage backend to be set for clone David Turner
2016-02-18 5:17 ` [PATCH v5 23/27] svn: learn ref-storage argument David Turner
2016-02-20 23:55 ` Eric Wong
2016-02-23 18:08 ` David Turner
2016-02-18 5:17 ` [PATCH v5 24/27] refs: add register_ref_storage_backends() David Turner
2016-02-18 5:17 ` [PATCH v5 25/27] refs: add LMDB refs storage backend David Turner
2016-02-18 8:50 ` Duy Nguyen
2016-02-18 20:23 ` David Turner
2016-02-18 21:15 ` Junio C Hamano
2016-02-19 2:54 ` Duy Nguyen
2016-02-19 19:10 ` David Turner
2016-02-20 13:14 ` Duy Nguyen
2016-02-24 20:41 ` David Turner
2016-02-20 21:32 ` Junio C Hamano
2016-02-19 22:49 ` David Turner [this message]
2016-02-19 23:08 ` Junio C Hamano
2016-02-20 2:58 ` Duy Nguyen
2016-02-24 20:43 ` David Turner
2016-02-25 10:07 ` Duy Nguyen
2016-02-20 8:59 ` Duy Nguyen
2016-02-24 20:37 ` David Turner
2016-02-25 10:12 ` Duy Nguyen
2016-02-25 20:05 ` [PATCH] refs: document transaction semantics David Turner
2016-02-25 20:10 ` David Turner
2016-02-25 20:34 ` Junio C Hamano
2016-02-25 20:50 ` David Turner
2016-02-18 5:17 ` [PATCH v5 26/27] refs: tests for lmdb backend David Turner
2016-02-18 5:17 ` [PATCH v5 27/27] tests: add ref-storage argument David Turner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1455922186.7528.97.camel@twopensource.com \
--to=dturner@twopensource.com \
--cc=git@vger.kernel.org \
--cc=mhagger@alum.mit.edu \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).