From: Duy Nguyen <pclouds@gmail.com>
To: David Turner <dturner@twopensource.com>
Cc: git mailing list <git@vger.kernel.org>,
mhagger@alum.mit.edu, Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v5 25/27] refs: add LMDB refs storage backend
Date: Sat, 20 Feb 2016 09:58:27 +0700 [thread overview]
Message-ID: <20160220025826.GA9338@lanh> (raw)
In-Reply-To: <1455922186.7528.97.camel@twopensource.com>
> On Fri, 2016-02-19 at 09:54 +0700, Duy Nguyen wrote:
>> On Fri, Feb 19, 2016 at 3:23 AM, David Turner <
>> dturner@twopensource.com> wrote:
>> > > > +static int read_per_worktree_ref(const char *submodule, const
>> > > > char
>> > > > *refname,
>> > > > + struct MDB_val *val, int
>> > > > *needs_free)
>> > >
>> > > From what I read, I suspect these _per_worktree functions will be
>> > > identical for the next backend. Should we just hand over the job
>> > > for
>> > > files backend? For all entry points that may deal with per
>> > > -worktree
>> > > refs, e.g. lmdb_resolve_ref_unsafe, can we check ref_type() first
>> > > thing, if it's per-worktree we call
>> > > refs_be_files.resolve_ref_unsafe()
>> > > instead? It could even be done at frontend level,
>> > > e.g. refs.c:resolve_ref_unsafe().
>> > >
>> > > Though I may be talking rubbish here because I don't know how
>> > > whether
>> > > it has anything to do with transactions.
>> >
>> > The reason I did it this way is that some ref chains cross backend
>> > boundaries (e.g. HEAD -> refs/heads/master). But if we have other
>> > backends later, we could generalize.
>>
>> Crossing backends should go through frontend again, imo. But I don't
>> really know if it's efficient.
>
> It's pretty tricky to maintain state (e.g. count of symref redirects)
> across that barrier. So I'm not sure how to do it cleanly.
I notice files backend does pretty much the same thing. "files"
backend looks more like two backends combined in one, one is files,
the other is packed-refs. And it looks like we could solve it by
providing a lower level api, read_raw_ref() or something, that
retrieves the ref without any validation or link following. More on
this later.
>> > > I'm not sure I get this comment. D/F conflicts are no longer a
>> > > thing
>> > > for lmdb backend, right?
>> >
>> > I'm trying to avoid the lmdb backend creating a set of refs that
>> > the
>> > files backend can't handle. This would make collaboration with
>> > other
>> > versions of git more difficult.
>>
>> It already is. If you create refs "foo" and "FOO" on case sensitive
>> file system and clone it on a case-insensitive one, you face the same
>> problem. We may have an optional configuration knob to prevent
>> incompatibilities with files backend, but I think that should be done
>> (and enforced if necessary) outside backends.
>
> Sure, the current state isn't perfect, but why make it worse?
I see it from a different angle. The current state isn't perfect, but
we will be moving to a better future where "files" backend may
eventually be deprecated. Why hold back?
But this line of reasoning only works if we have a new backend capable
of replacing "files" without regressions or introducing new
dependency. Which is why I suggest a new backend [1] (or implement
Shawn's RefTree if it's proven as good with small repos)
I have no problem if you want to stay strictly compatible with "files"
though.
[1] http://thread.gmane.org/gmane.comp.version-control.git/285893/focus=286654
On Fri, Feb 19, 2016 at 05:49:46PM -0500, David Turner wrote:
> >
> > This code looks a lot like near the end of resolve_ref_1(). Maybe we
> > could share the code in refs/backend-common.c or something and call
> > here instead?
>
> Something like the following?
>
> commit aad6b84fd1869f6e1cf6ed15bcece0c2f6429e9d
> Author: David Turner <dturner@twopensource.com>
> Date: Thu Feb 18 17:09:29 2016 -0500
>
> refs: break out some functions from resolve_ref_1
>
> A bunch of resolve_ref_1 is not backend-specific, so we can
> break it out into separate internal functions that other
> backends can use.
...
>
> I'm not sure I like it, because it breaks out these weird tiny
> functions that take a lot of arguments. But maybe it's worth it? What
> do you think?
OK how about we keep resolve_ref_1() whole and split real backend code
out? Something like these three patches (only built, did not test). A
bit ugly with continue_symlink, but it's just demonstration.
commit ef46fcdc62ef89fd5260ca054cd1d98f9f2d7c2b
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date: Sat Feb 20 09:18:45 2016 +0700
refs/files: move ref I/O code out of resolve_refs_1()
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4bddfb3..f54f2ae 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1407,6 +1407,95 @@ static int resolve_missing_loose_ref(const char *refname,
}
}
+static const char *continue_normal_ref = "read_ref returns a normal ref";
+static const char *continue_symlink = "read_ref returns a symlink";
+
+/*
+ * Read a ref from backend. Returning any values except
+ * continue_normal_ref or continue_symlink ends resolve_ref_1()
+ * execution. If the return value is not NULL, sha1 and flags must be
+ * updated correctly. except REF_ISBROKEN which is set by
+ * resolve_ref_1().
+ *
+ * If continue_* is returned, sb_contents must contain the ref data.
+ */
+static const char *parse_ref(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents)
+{
+ const char *path;
+ struct stat st;
+ int fd;
+
+ strbuf_reset(sb_path);
+ strbuf_git_path(sb_path, "%s", refname);
+ path = sb_path->buf;
+
+ /*
+ * We might have to loop back here to avoid a race
+ * condition: first we lstat() the file, then we try
+ * to read it as a link or as a file. But if somebody
+ * changes the type of the file (file <-> directory
+ * <-> symlink) between the lstat() and reading, then
+ * we don't want to report that as an error but rather
+ * try again starting with the lstat().
+ */
+stat_ref:
+ if (lstat(path, &st) < 0) {
+ if (errno != ENOENT)
+ return NULL;
+ if (resolve_missing_loose_ref(refname, resolve_flags,
+ sha1, flags))
+ return NULL;
+ return refname;
+ }
+
+ /* Follow "normalized" - ie "refs/.." symlinks by hand */
+ if (S_ISLNK(st.st_mode)) {
+ strbuf_reset(sb_contents);
+ if (strbuf_readlink(sb_contents, path, 0) < 0) {
+ if (errno == ENOENT || errno == EINVAL)
+ /* inconsistent with lstat; retry */
+ goto stat_ref;
+ else
+ return NULL;
+ }
+ return continue_symlink;
+ }
+
+ /* Is it a directory? */
+ if (S_ISDIR(st.st_mode)) {
+ errno = EISDIR;
+ return NULL;
+ }
+
+ /*
+ * Anything else, just open it and try to use it as
+ * a ref
+ */
+ fd = open(path, O_RDONLY);
+ if (fd < 0) {
+ if (errno == ENOENT)
+ /* inconsistent with lstat; retry */
+ goto stat_ref;
+ else
+ return NULL;
+ }
+ strbuf_reset(sb_contents);
+ if (strbuf_read(sb_contents, fd, 256) < 0) {
+ int save_errno = errno;
+ close(fd);
+ errno = save_errno;
+ return NULL;
+ }
+ close(fd);
+
+ return continue_normal_ref;
+}
+
/* This function needs to return a meaningful errno on failure */
static const char *resolve_ref_1(const char *refname,
int resolve_flags,
@@ -1442,54 +1531,18 @@ static const char *resolve_ref_1(const char *refname,
bad_name = 1;
}
for (;;) {
- const char *path;
- struct stat st;
+ const char *ret;
char *buf;
- int fd;
if (--depth < 0) {
errno = ELOOP;
return NULL;
}
- strbuf_reset(sb_path);
- strbuf_git_path(sb_path, "%s", refname);
- path = sb_path->buf;
+ ret = parse_ref(refname, resolve_flags, sha1,
+ flags, sb_path, sb_contents);
- /*
- * We might have to loop back here to avoid a race
- * condition: first we lstat() the file, then we try
- * to read it as a link or as a file. But if somebody
- * changes the type of the file (file <-> directory
- * <-> symlink) between the lstat() and reading, then
- * we don't want to report that as an error but rather
- * try again starting with the lstat().
- */
- stat_ref:
- if (lstat(path, &st) < 0) {
- if (errno != ENOENT)
- return NULL;
- if (resolve_missing_loose_ref(refname, resolve_flags,
- sha1, flags))
- return NULL;
- if (bad_name) {
- hashclr(sha1);
- if (flags)
- *flags |= REF_ISBROKEN;
- }
- return refname;
- }
-
- /* Follow "normalized" - ie "refs/.." symlinks by hand */
- if (S_ISLNK(st.st_mode)) {
- strbuf_reset(sb_contents);
- if (strbuf_readlink(sb_contents, path, 0) < 0) {
- if (errno == ENOENT || errno == EINVAL)
- /* inconsistent with lstat; retry */
- goto stat_ref;
- else
- return NULL;
- }
+ if (ret == continue_symlink) {
if (starts_with(sb_contents->buf, "refs/") &&
!check_refname_format(sb_contents->buf, 0)) {
strbuf_swap(sb_refname, sb_contents);
@@ -1502,35 +1555,10 @@ static const char *resolve_ref_1(const char *refname,
}
continue;
}
- }
-
- /* Is it a directory? */
- if (S_ISDIR(st.st_mode)) {
- errno = EISDIR;
- return NULL;
- }
-
- /*
- * Anything else, just open it and try to use it as
- * a ref
- */
- fd = open(path, O_RDONLY);
- if (fd < 0) {
- if (errno == ENOENT)
- /* inconsistent with lstat; retry */
- goto stat_ref;
- else
- return NULL;
- }
- strbuf_reset(sb_contents);
- if (strbuf_read(sb_contents, fd, 256) < 0) {
- int save_errno = errno;
- close(fd);
- errno = save_errno;
- return NULL;
- }
- close(fd);
- strbuf_rtrim(sb_contents);
+ } else if (ret == refname)
+ break;
+ else if (ret != continue_normal_ref)
+ return ret;
/*
* Is it a symbolic ref?
@@ -1547,12 +1575,7 @@ static const char *resolve_ref_1(const char *refname,
errno = EINVAL;
return NULL;
}
- if (bad_name) {
- hashclr(sha1);
- if (flags)
- *flags |= REF_ISBROKEN;
- }
- return refname;
+ break;
}
if (flags)
*flags |= REF_ISSYMREF;
@@ -1578,6 +1601,13 @@ static const char *resolve_ref_1(const char *refname,
bad_name = 1;
}
}
+
+ if (bad_name) {
+ hashclr(sha1);
+ if (flags)
+ *flags |= REF_ISBROKEN;
+ }
+ return refname;
}
static const char *files_resolve_ref_unsafe(const char *refname,
After this resolve_ref_1() is backend independent. So we can make it
take parse_ref() as a function pointer instead.
commit 50d96b6f79b30b5ba17fa00ec3ee42845546a123
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date: Sat Feb 20 09:22:03 2016 +0700
refs/files-backend.c: let resolve_refs_1() accept parse_ref as callback
diff --git a/refs/files-backend.c b/refs/files-backend.c
index f54f2ae..9b4de9f 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1424,7 +1424,8 @@ static const char *parse_ref(const char *refname,
unsigned char *sha1,
int *flags,
struct strbuf *sb_path,
- struct strbuf *sb_contents)
+ struct strbuf *sb_contents,
+ void *cb_data)
{
const char *path;
struct stat st;
@@ -1497,13 +1498,21 @@ stat_ref:
}
/* This function needs to return a meaningful errno on failure */
-static const char *resolve_ref_1(const char *refname,
- int resolve_flags,
- unsigned char *sha1,
- int *flags,
- struct strbuf *sb_refname,
- struct strbuf *sb_path,
- struct strbuf *sb_contents)
+const char *resolve_ref_1(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_refname,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents,
+ const char *(*parse_ref)(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents,
+ void *cb_data),
+ void *cb_data)
{
int depth = MAXDEPTH;
int bad_name = 0;
@@ -1540,7 +1549,8 @@ static const char *resolve_ref_1(const char *refname,
}
ret = parse_ref(refname, resolve_flags, sha1,
- flags, sb_path, sb_contents);
+ flags, sb_path, sb_contents,
+ cb_data);
if (ret == continue_symlink) {
if (starts_with(sb_contents->buf, "refs/") &&
@@ -1621,7 +1631,8 @@ static const char *files_resolve_ref_unsafe(const char *refname,
const char *ret;
ret = resolve_ref_1(refname, resolve_flags, sha1, flags,
- &sb_refname, &sb_path, &sb_contents);
+ &sb_refname, &sb_path, &sb_contents,
+ parse_ref, NULL);
strbuf_release(&sb_path);
strbuf_release(&sb_contents);
return ret;
And finally we can make lmdb use resolve_ref_1(). lmdb-specific code
is in the new retrieve_ref() function.
commit 62a5df3117c0f825bc26fd09dda29e713f94d743 (HEAD -> lmdb)
Author: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Date: Sat Feb 20 09:33:01 2016 +0700
refs/lmdb-backend: use resolve_ref_1()
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9b4de9f..44b7136 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1407,7 +1407,7 @@ static int resolve_missing_loose_ref(const char *refname,
}
}
-static const char *continue_normal_ref = "read_ref returns a normal ref";
+const char *continue_normal_ref = "read_ref returns a normal ref";
static const char *continue_symlink = "read_ref returns a symlink";
/*
diff --git a/refs/lmdb-backend.c b/refs/lmdb-backend.c
index 6c0d7fb..7f169bd 100644
--- a/refs/lmdb-backend.c
+++ b/refs/lmdb-backend.c
@@ -544,74 +544,64 @@ done:
return ret;
}
+extern const char *continue_normal_ref;
+
+static const char *retrieve_ref(const char *refname,
+ int resolve_flags,
+ unsigned char *sha1,
+ int *flags,
+ struct strbuf *sb_path,
+ struct strbuf *sb_contents,
+ void *cb_data)
+{
+ struct lmdb_transaction *transaction = cb_data;
+ MDB_val key, val;
+ int needs_free; /* dont care, leak */
+
+ key.mv_data = (void *)refname;
+ key.mv_size = strlen(refname) + 1;
+
+ val.mv_data = NULL;
+ val.mv_size = 0;
+
+ if (mdb_get_or_die(transaction, &key, &val, &needs_free)) {
+ struct strbuf err = STRBUF_INIT;
+
+ if (resolve_flags & RESOLVE_REF_READING)
+ return NULL;
+
+ if (verify_refname_available_txn(transaction,
+ refname, NULL, NULL, &err)) {
+ error("%s", err.buf);
+ strbuf_release(&err);
+ return NULL;
+ }
+
+ hashclr(sha1);
+ return refname;
+ }
+
+ strbuf_reset(sb_contents);
+ strbuf_add(sb_contents, val.mv_data, val.mv_size);
+ return continue_normal_ref;
+}
+
static const char *resolve_ref_unsafe_txn(struct lmdb_transaction *transaction,
const char *refname,
int resolve_flags,
unsigned char *sha1,
int *flags)
{
- int bad_name = 0;
- char *ref_data;
- struct MDB_val key, val;
- struct strbuf err = STRBUF_INIT;
- int needs_free = 0;
+ static struct strbuf sb_refname = STRBUF_INIT;
+ struct strbuf sb_contents = STRBUF_INIT;
+ struct strbuf sb_path = STRBUF_INIT;
const char *ret;
- val.mv_size = 0;
- val.mv_data = NULL;
-
- if (flags)
- *flags = 0;
-
- if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
- if (flags)
- *flags |= REF_BAD_NAME;
-
- if (!(resolve_flags & RESOLVE_REF_ALLOW_BAD_NAME) ||
- !refname_is_safe(refname)) {
- errno = EINVAL;
- return NULL;
- }
- /*
- * dwim_ref() uses REF_ISBROKEN to distinguish between
- * missing refs and refs that were present but invalid,
- * to complain about the latter to stderr.
- *
- * We don't know whether the ref exists, so don't set
- * REF_ISBROKEN yet.
- */
- bad_name = 1;
- }
-
- key.mv_data = (void *)refname;
- key.mv_size = strlen(refname) + 1;
- if (mdb_get_or_die(transaction, &key, &val, &needs_free)) {
- if (bad_name) {
- hashclr(sha1);
- if (flags)
- *flags |= REF_ISBROKEN;
- }
-
- if (resolve_flags & RESOLVE_REF_READING)
- return NULL;
-
- if (verify_refname_available_txn(transaction, refname, NULL, NULL, &err)) {
- error("%s", err.buf);
- strbuf_release(&err);
- return NULL;
- }
-
- hashclr(sha1);
- return refname;
- }
-
- ref_data = val.mv_data;
- assert(ref_data[val.mv_size - 1] == 0);
-
- ret = parse_ref_data(transaction, refname, ref_data, sha1,
- resolve_flags, flags, bad_name);
- if (needs_free)
- free(ref_data);
+ ret = resolve_ref_1(refname, resolve_flags, sha1, flags,
+ &sb_refname, &sb_path, &sb_contents,
+ retrieve_ref, transaction);
+ strbuf_release(&sb_path);
+ strbuf_release(&sb_contents);
return ret;
}
lmdb-backend.c:retrieve_ref(), files-backend.c:parse_ref() can be made
part of ref api that, given a ref name, returns the ref raw data and
type. The frontend can decide what backend callback to use based on
refname, so retrieve_ref() in the end does not have to call
read_per_worktree_ref() internally anymore.
Hmm?
--
Duy
next prev parent reply other threads:[~2016-02-20 2:58 UTC|newest]
Thread overview: 52+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-02-18 5:17 [PATCH v5 00/27] refs backends David Turner
2016-02-18 5:17 ` [PATCH v5 01/27] refs: Move head_ref{,_submodule} to the common code David Turner
2016-02-18 5:17 ` [PATCH v5 02/27] refs: move for_each_*ref* functions into " David Turner
2016-02-18 5:17 ` [PATCH v5 03/27] refs: add a backend method structure with transaction functions David Turner
2016-02-18 5:17 ` [PATCH v5 04/27] refs: add methods for misc ref operations David Turner
2016-02-18 5:17 ` [PATCH v5 05/27] refs: add method for do_for_each_ref David Turner
2016-02-18 5:17 ` [PATCH v5 06/27] refs: add do_for_each_per_worktree_ref David Turner
2016-02-18 5:17 ` [PATCH v5 07/27] refs: add methods for reflog David Turner
2016-02-18 5:17 ` [PATCH v5 08/27] refs: add method for initial ref transaction commit David Turner
2016-02-18 5:17 ` [PATCH v5 09/27] refs: add method for delete_refs David Turner
2016-02-18 5:17 ` [PATCH v5 10/27] refs: add methods to init refs db David Turner
2016-02-18 5:17 ` [PATCH v5 11/27] refs: add method to rename refs David Turner
2016-02-18 5:17 ` [PATCH v5 12/27] refs: forbid cross-backend ref renames David Turner
2016-02-20 4:30 ` Duy Nguyen
2016-02-24 20:48 ` David Turner
2016-02-18 5:17 ` [PATCH v5 13/27] refs: make lock generic David Turner
2016-02-18 5:17 ` [PATCH v5 14/27] refs: move duplicate check to common code David Turner
2016-02-18 5:17 ` [PATCH v5 15/27] refs: allow log-only updates David Turner
2016-02-18 5:17 ` [PATCH v5 16/27] refs: don't dereference on rename David Turner
2016-02-18 5:17 ` [PATCH v5 17/27] refs: on symref reflog expire, lock symref not referrent David Turner
2016-02-18 5:17 ` [PATCH v5 18/27] refs: resolve symbolic refs first David Turner
2016-02-18 5:17 ` [PATCH v5 19/27] refs: always handle non-normal refs in files backend David Turner
2016-02-18 5:17 ` [PATCH v5 20/27] init: allow alternate ref strorage to be set for new repos David Turner
2016-02-18 5:17 ` [PATCH v5 21/27] refs: check submodules' ref storage config David Turner
2016-02-18 5:17 ` [PATCH v5 22/27] clone: allow ref storage backend to be set for clone David Turner
2016-02-18 5:17 ` [PATCH v5 23/27] svn: learn ref-storage argument David Turner
2016-02-20 23:55 ` Eric Wong
2016-02-23 18:08 ` David Turner
2016-02-18 5:17 ` [PATCH v5 24/27] refs: add register_ref_storage_backends() David Turner
2016-02-18 5:17 ` [PATCH v5 25/27] refs: add LMDB refs storage backend David Turner
2016-02-18 8:50 ` Duy Nguyen
2016-02-18 20:23 ` David Turner
2016-02-18 21:15 ` Junio C Hamano
2016-02-19 2:54 ` Duy Nguyen
2016-02-19 19:10 ` David Turner
2016-02-20 13:14 ` Duy Nguyen
2016-02-24 20:41 ` David Turner
2016-02-20 21:32 ` Junio C Hamano
2016-02-19 22:49 ` David Turner
2016-02-19 23:08 ` Junio C Hamano
2016-02-20 2:58 ` Duy Nguyen [this message]
2016-02-24 20:43 ` David Turner
2016-02-25 10:07 ` Duy Nguyen
2016-02-20 8:59 ` Duy Nguyen
2016-02-24 20:37 ` David Turner
2016-02-25 10:12 ` Duy Nguyen
2016-02-25 20:05 ` [PATCH] refs: document transaction semantics David Turner
2016-02-25 20:10 ` David Turner
2016-02-25 20:34 ` Junio C Hamano
2016-02-25 20:50 ` David Turner
2016-02-18 5:17 ` [PATCH v5 26/27] refs: tests for lmdb backend David Turner
2016-02-18 5:17 ` [PATCH v5 27/27] tests: add ref-storage argument David Turner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160220025826.GA9338@lanh \
--to=pclouds@gmail.com \
--cc=dturner@twopensource.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mhagger@alum.mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).