git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 00/20] Separate `ref_cache` into a separate module
@ 2017-03-20 16:33 Michael Haggerty
  2017-03-20 16:33 ` [PATCH 01/20] get_ref_dir(): don't call read_loose_refs() for "refs/bisect" Michael Haggerty
                   ` (23 more replies)
  0 siblings, 24 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

I had a window of opportunity last week to hack intensely on Git, with
the following goals:

* Separate `ref_cache` out of `files_ref_cache`.

* Separate a new `packed_ref_cache` class out of `files_ref_cache`.
  Change the latter to use an instance of the former for all of its
  interactions with the `packed-refs` file.

* Mmap `packed-refs` files rather than reading-and-parsing.

* Use the mmapped version of the `packed-refs` file as the "cache"
  rather than using a separate `ref_cache`.

* (And the main goal): Avoid reading and parsing the *whole
  `packed-refs` file* (as we do now) every time any part of it is
  needed. Instead, use binary search to find the reference and/or
  range of references that we want, and parse the info out of the
  mmapped image on the fly.

I've completed a draft of an epic 48-patch series implementing all of
the above points on my GitHub fork [1] as branch
`wip/mmap-packed-refs`. It dramatically speeds up performance and
reduces memory usage for some tasks in repositories with very many
packed references.

But the later parts of that series aren't completely polished yet, and
such a large patch series would be indigestible anyway, so here I
submit the first part...

This patch series extracts a `ref_cache` module out of
`files_ref_cache`, and goes some way to disentangling those two
modules, which until now were overly intimate with each other:

* Remove `verify_refname_available()` from the refs VTABLE, instead
  implementing it in a generic way that uses only the usual refs API
  to talk to the `ref_store`.

* Split `ref_cache`-related code into a new module,
  `refs/ref-cache.{c,h}`. Encapsulate the data structure in a new
  class, `struct ref_cache`.

* Change the lazy-filling mechanism of `ref_cache` to call back to its
  backing `ref_store` via a callback function rather than calling
  `read_loose_refs()` directly.

* Move the special handling of `refs/bisect/` from `ref_cache` to
  `files_ref_store`.

* Make `cache_ref_iterator_begin()` smarter, and change external users
  to iterate via this interface instead of using
  `do_for_each_entry_in_dir()`.

Even after this patch series, the modules are still too intimate for
my taste, but I think this is a big step forward, and it is enough to
allow the other changes that I've been working on.

These patches depend on Duy's nd/files-backend-git-dir branch, v6 [2].
They are also available from my GitHub fork [1] as branch
`separate-ref-cache`.

Happily, this patch series actually removes a few more lines than it
adds, mostly thanks to the simpler `verify_refname_available()`
implementation.

Michael

[1] https://github.com/mhagger/git
[2] http://public-inbox.org/git/20170318020337.22767-1-pclouds@gmail.com/

Michael Haggerty (20):
  get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
  refs_read_raw_ref(): new function
  refs_ref_iterator_begin(): new function
  refs_verify_refname_available(): implement once for all backends
  refs_verify_refname_available(): use function in more places
  Rename `add_ref()` to `add_ref_entry()`
  Rename `find_ref()` to `find_ref_entry()`
  Rename `remove_entry()` to `remove_entry_from_dir()`
  refs: split `ref_cache` code into separate files
  ref-cache: introduce a new type, ref_cache
  refs: record the ref_store in ref_cache, not ref_dir
  ref-cache: use a callback function to fill the cache
  refs: handle "refs/bisect/" in `loose_fill_ref_dir()`
  do_for_each_entry_in_dir(): eliminate `offset` argument
  get_loose_ref_dir(): function renamed from get_loose_refs()
  get_loose_ref_cache(): new function
  cache_ref_iterator_begin(): make function smarter
  commit_packed_refs(): use reference iteration
  files_pack_refs(): use reference iteration
  do_for_each_entry_in_dir(): delete function

 Makefile             |    1 +
 refs.c               |  111 ++++-
 refs.h               |    2 +-
 refs/files-backend.c | 1229 +++++++-------------------------------------------
 refs/ref-cache.c     |  523 +++++++++++++++++++++
 refs/ref-cache.h     |  267 +++++++++++
 refs/refs-internal.h |   22 +-
 7 files changed, 1066 insertions(+), 1089 deletions(-)
 create mode 100644 refs/ref-cache.c
 create mode 100644 refs/ref-cache.h

-- 
2.11.0


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH 01/20] get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 02/20] refs_read_raw_ref(): new function Michael Haggerty
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Since references under "refs/bisect/" are per-worktree, they have to
be sought in the worktree rather than in the main repository. But
since loose references are found by traversing directories, the
reference iterator won't even get the idea to look for a
"refs/bisect/" directory in the worktree if there is not a directory
with that name in the main repository. Thus `get_ref_dir()` manually
inserts a dir_entry for "refs/bisect/" whenever it reads the entry for
"refs/".

The current code then immediately calls `read_loose_refs()` on that
directory. But since the dir_entry is created with its `incomplete`
flag set, any traversal that gets to this point will read the
directory automatically. So there is no need to call
`read_loose_refs()` explicitly; the lazy mechanism suffices.

And in fact, the attempt to `read_loose_refs()` was broken anyway.
That function needs its `dirname` argument to have a trailing `/`
character, but the invocation here was passing it "refs/bisect"
without a trailing slash. So `read_loose_refs()` would read
`$GIT_DIR/refs/bisect" correctly, but if it found an entry "foo" in
that directory, it would try to read "$GIT_DIR/refs/bisectfoo".
Normally it wouldn't find anything at that path, but the failure was
canceled out because `get_ref_dir()` *also* forgot to reset the
`REF_INCOMPLETE` bit on the dir_entry. So the read was attempted again
when it was accessed, via the lazy mechanism, and this time the read
was done correctly.

This code has been broken since it was first introduced.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4242486118..e7a075e864 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -191,8 +191,6 @@ static struct ref_dir *get_ref_dir(struct ref_entry *entry)
 							       "refs/bisect/",
 							       12, 1);
 				add_entry_to_dir(dir, child_entry);
-				read_loose_refs("refs/bisect",
-						&child_entry->u.subdir);
 			}
 		}
 		entry->flag &= ~REF_INCOMPLETE;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 02/20] refs_read_raw_ref(): new function
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
  2017-03-20 16:33 ` [PATCH 01/20] get_ref_dir(): don't call read_loose_refs() for "refs/bisect" Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 03/20] refs_ref_iterator_begin(): " Michael Haggerty
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Extract a new function from `refs_resolve_ref_unsafe()`. It will be
useful elsewhere.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c               | 11 +++++++++--
 refs/refs-internal.h |  4 ++++
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index 77a39f8b17..0ed6c3c7a4 100644
--- a/refs.c
+++ b/refs.c
@@ -1326,6 +1326,13 @@ int for_each_rawref(each_ref_fn fn, void *cb_data)
 	return refs_for_each_rawref(get_main_ref_store(), fn, cb_data);
 }
 
+int refs_read_raw_ref(struct ref_store *ref_store,
+		      const char *refname, unsigned char *sha1,
+		      struct strbuf *referent, unsigned int *type)
+{
+	return ref_store->be->read_raw_ref(ref_store, refname, sha1, referent, type);
+}
+
 /* This function needs to return a meaningful errno on failure */
 const char *refs_resolve_ref_unsafe(struct ref_store *refs,
 				    const char *refname,
@@ -1362,8 +1369,8 @@ const char *refs_resolve_ref_unsafe(struct ref_store *refs,
 	for (symref_count = 0; symref_count < SYMREF_MAXDEPTH; symref_count++) {
 		unsigned int read_flags = 0;
 
-		if (refs->be->read_raw_ref(refs, refname,
-					   sha1, &sb_refname, &read_flags)) {
+		if (refs_read_raw_ref(refs, refname,
+				      sha1, &sb_refname, &read_flags)) {
 			*flags |= read_flags;
 			if (errno != ENOENT || (resolve_flags & RESOLVE_REF_READING))
 				return NULL;
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 690498698e..6ee9f20dbc 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -165,6 +165,10 @@ struct ref_update {
 	const char refname[FLEX_ARRAY];
 };
 
+int refs_read_raw_ref(struct ref_store *ref_store,
+		      const char *refname, unsigned char *sha1,
+		      struct strbuf *referent, unsigned int *type);
+
 /*
  * Add a ref_update with the specified properties to transaction, and
  * return a pointer to the new object. This function does not verify
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 03/20] refs_ref_iterator_begin(): new function
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
  2017-03-20 16:33 ` [PATCH 01/20] get_ref_dir(): don't call read_loose_refs() for "refs/bisect" Michael Haggerty
  2017-03-20 16:33 ` [PATCH 02/20] refs_read_raw_ref(): new function Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 04/20] refs_verify_refname_available(): implement once for all backends Michael Haggerty
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Extract a new function from `do_for_each_ref()`. It will be useful
elsewhere.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c               | 15 +++++++++++++--
 refs/refs-internal.h | 11 +++++++++++
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index 0ed6c3c7a4..aeb704ab92 100644
--- a/refs.c
+++ b/refs.c
@@ -1230,6 +1230,18 @@ int head_ref(each_ref_fn fn, void *cb_data)
 	return head_ref_submodule(NULL, fn, cb_data);
 }
 
+struct ref_iterator *refs_ref_iterator_begin(
+		struct ref_store *refs,
+		const char *prefix, int trim, int flags)
+{
+	struct ref_iterator *iter;
+
+	iter = refs->be->iterator_begin(refs, prefix, flags);
+	iter = prefix_ref_iterator_begin(iter, prefix, trim);
+
+	return iter;
+}
+
 /*
  * Call fn for each reference in the specified submodule for which the
  * refname begins with prefix. If trim is non-zero, then trim that
@@ -1247,8 +1259,7 @@ static int do_for_each_ref(struct ref_store *refs, const char *prefix,
 	if (!refs)
 		return 0;
 
-	iter = refs->be->iterator_begin(refs, prefix, flags);
-	iter = prefix_ref_iterator_begin(iter, prefix, trim);
+	iter = refs_ref_iterator_begin(refs, prefix, trim, flags);
 
 	return do_for_each_ref_iterator(iter, fn, cb_data);
 }
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 6ee9f20dbc..545989ae7f 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -335,6 +335,17 @@ struct ref_iterator *empty_ref_iterator_begin(void);
  */
 int is_empty_ref_iterator(struct ref_iterator *ref_iterator);
 
+/*
+ * Return an iterator that goes over each reference in `refs` for
+ * which the refname begins with prefix. If trim is non-zero, then
+ * trim that many characters off the beginning of each refname. flags
+ * can be DO_FOR_EACH_INCLUDE_BROKEN to include broken references in
+ * the iteration.
+ */
+struct ref_iterator *refs_ref_iterator_begin(
+		struct ref_store *refs,
+		const char *prefix, int trim, int flags);
+
 /*
  * A callback function used to instruct merge_ref_iterator how to
  * interleave the entries from iter0 and iter1. The function should
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 04/20] refs_verify_refname_available(): implement once for all backends
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (2 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 03/20] refs_ref_iterator_begin(): " Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 17:42   ` Jeff King
  2017-03-20 16:33 ` [PATCH 05/20] refs_verify_refname_available(): use function in more places Michael Haggerty
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

It turns out that we can now implement
`refs_verify_refname_available()` based on the other virtual
functions, so there is no need for it to be defined at the backend
level. Instead, define it once in `refs.c` and remove the
`files_backend` definition.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs.c               | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 refs.h               |  2 +-
 refs/files-backend.c | 39 +++++-------------------
 refs/refs-internal.h |  7 -----
 4 files changed, 92 insertions(+), 41 deletions(-)

diff --git a/refs.c b/refs.c
index aeb704ab92..2cf531b9c7 100644
--- a/refs.c
+++ b/refs.c
@@ -5,6 +5,7 @@
 #include "cache.h"
 #include "hashmap.h"
 #include "lockfile.h"
+#include "iterator.h"
 #include "refs.h"
 #include "refs/refs-internal.h"
 #include "object.h"
@@ -1661,11 +1662,91 @@ int ref_transaction_commit(struct ref_transaction *transaction,
 
 int refs_verify_refname_available(struct ref_store *refs,
 				  const char *refname,
-				  const struct string_list *extra,
+				  const struct string_list *extras,
 				  const struct string_list *skip,
 				  struct strbuf *err)
 {
-	return refs->be->verify_refname_available(refs, refname, extra, skip, err);
+	const char *slash;
+	const char *extra_refname;
+	struct strbuf dirname = STRBUF_INIT;
+	struct strbuf referent = STRBUF_INIT;
+	struct object_id oid;
+	unsigned int type;
+	struct ref_iterator *iter;
+	int ok;
+	int ret = -1;
+
+	/*
+	 * For the sake of comments in this function, suppose that
+	 * refname is "refs/foo/bar".
+	 */
+
+	assert(err);
+
+	strbuf_grow(&dirname, strlen(refname) + 1);
+	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+		/* Expand dirname to the new prefix, not including the trailing slash: */
+		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
+
+		/*
+		 * We are still at a leading dir of the refname (e.g.,
+		 * "refs/foo"; if there is a reference with that name,
+		 * it is a conflict, *unless* it is in skip.
+		 */
+		if (skip && string_list_has_string(skip, dirname.buf))
+			continue;
+
+		if (!refs_read_raw_ref(refs, dirname.buf, oid.hash, &referent, &type)) {
+			strbuf_addf(err, "'%s' exists; cannot create '%s'",
+				    dirname.buf, refname);
+			goto cleanup;
+		}
+
+		if (extras && string_list_has_string(extras, dirname.buf)) {
+			strbuf_addf(err, "cannot process '%s' and '%s' at the same time",
+				    refname, dirname.buf);
+			goto cleanup;
+		}
+	}
+
+	/*
+	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
+	 * There is no point in searching for a reference with that
+	 * name, because a refname isn't considered to conflict with
+	 * itself. But we still need to check for references whose
+	 * names are in the "refs/foo/bar/" namespace, because they
+	 * *do* conflict.
+	 */
+	strbuf_addstr(&dirname, refname + dirname.len);
+	strbuf_addch(&dirname, '/');
+
+	iter = refs_ref_iterator_begin(refs, dirname.buf, 0,
+				       DO_FOR_EACH_INCLUDE_BROKEN);
+	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+		if (skip &&
+		    string_list_has_string(skip, iter->refname))
+			continue;
+
+		strbuf_addf(err, "'%s' exists; cannot create '%s'",
+			    iter->refname, refname);
+		ok = ref_iterator_abort(iter);
+		goto cleanup;
+	}
+
+	if (ok != ITER_DONE)
+		die("BUG: error while iterating over references");
+
+	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
+	if (extra_refname)
+		strbuf_addf(err, "cannot process '%s' and '%s' at the same time",
+			    refname, extra_refname);
+	else
+		ret = 0;
+
+cleanup:
+	strbuf_release(&referent);
+	strbuf_release(&dirname);
+	return ret;
 }
 
 int refs_for_each_reflog(struct ref_store *refs, each_ref_fn fn, void *cb_data)
diff --git a/refs.h b/refs.h
index 49e97d7d5f..07cf4cd41b 100644
--- a/refs.h
+++ b/refs.h
@@ -97,7 +97,7 @@ int read_ref(const char *refname, unsigned char *sha1);
 
 int refs_verify_refname_available(struct ref_store *refs,
 				  const char *refname,
-				  const struct string_list *extra,
+				  const struct string_list *extras,
 				  const struct string_list *skip,
 				  struct strbuf *err);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index e7a075e864..447f2767f3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1725,10 +1725,9 @@ static int lock_raw_ref(struct files_ref_store *refs,
 				goto error_return;
 			} else if (remove_dir_recursively(&ref_file,
 							  REMOVE_DIR_EMPTY_ONLY)) {
-				if (verify_refname_available_dir(
-						    refname, extras, skip,
-						    get_loose_refs(refs),
-						    err)) {
+				if (refs_verify_refname_available(
+						    &refs->base, refname,
+						    extras, skip, err)) {
 					/*
 					 * The error message set by
 					 * verify_refname_available() is OK.
@@ -2097,9 +2096,9 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs,
 		 */
 		if (remove_empty_directories(&ref_file)) {
 			last_errno = errno;
-			if (!verify_refname_available_dir(
-					    refname, extras, skip,
-					    get_loose_refs(refs), err))
+			if (!refs_verify_refname_available(
+					    &refs->base,
+					    refname, extras, skip, err))
 				strbuf_addf(err, "there are still refs under '%s'",
 					    refname);
 			goto error_return;
@@ -2111,9 +2110,8 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs,
 	if (!resolved) {
 		last_errno = errno;
 		if (last_errno != ENOTDIR ||
-		    !verify_refname_available_dir(
-				    refname, extras, skip,
-				    get_loose_refs(refs), err))
+		    !refs_verify_refname_available(&refs->base, refname,
+						   extras, skip, err))
 			strbuf_addf(err, "unable to resolve reference '%s': %s",
 				    refname, strerror(last_errno));
 
@@ -2609,26 +2607,6 @@ static int rename_tmp_log(struct files_ref_store *refs, const char *newrefname)
 	return ret;
 }
 
-static int files_verify_refname_available(struct ref_store *ref_store,
-					  const char *newname,
-					  const struct string_list *extras,
-					  const struct string_list *skip,
-					  struct strbuf *err)
-{
-	struct files_ref_store *refs =
-		files_downcast(ref_store, REF_STORE_READ, "verify_refname_available");
-	struct ref_dir *packed_refs = get_packed_refs(refs);
-	struct ref_dir *loose_refs = get_loose_refs(refs);
-
-	if (verify_refname_available_dir(newname, extras, skip,
-					 packed_refs, err) ||
-	    verify_refname_available_dir(newname, extras, skip,
-					 loose_refs, err))
-		return -1;
-
-	return 0;
-}
-
 static int write_ref_to_lockfile(struct ref_lock *lock,
 				 const unsigned char *sha1, struct strbuf *err);
 static int commit_ref_update(struct files_ref_store *refs,
@@ -4291,7 +4269,6 @@ struct ref_storage_be refs_be_files = {
 
 	files_ref_iterator_begin,
 	files_read_raw_ref,
-	files_verify_refname_available,
 
 	files_reflog_iterator_begin,
 	files_for_each_reflog_ent,
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 545989ae7f..3d46131efb 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -590,12 +590,6 @@ typedef int read_raw_ref_fn(struct ref_store *ref_store,
 			    const char *refname, unsigned char *sha1,
 			    struct strbuf *referent, unsigned int *type);
 
-typedef int verify_refname_available_fn(struct ref_store *ref_store,
-					const char *newname,
-					const struct string_list *extras,
-					const struct string_list *skip,
-					struct strbuf *err);
-
 struct ref_storage_be {
 	struct ref_storage_be *next;
 	const char *name;
@@ -612,7 +606,6 @@ struct ref_storage_be {
 
 	ref_iterator_begin_fn *iterator_begin;
 	read_raw_ref_fn *read_raw_ref;
-	verify_refname_available_fn *verify_refname_available;
 
 	reflog_iterator_begin_fn *reflog_iterator_begin;
 	for_each_reflog_ent_fn *for_each_reflog_ent;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 05/20] refs_verify_refname_available(): use function in more places
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (3 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 04/20] refs_verify_refname_available(): implement once for all backends Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 06/20] Rename `add_ref()` to `add_ref_entry()` Michael Haggerty
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Change `lock_raw_ref()` and `lock_ref_sha1_basic()` to use
`refs_verify_refname_available()` instead of
`verify_refname_available_dir()`. This means that those callsites now
check for conflicts with all references rather than just packed refs,
but the performance cost shouldn't be significant (and will be
regained later).

These were the last callers of `verify_refname_available_dir()`, so
also delete that (very complicated) function.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 171 ++++-----------------------------------------------
 1 file changed, 11 insertions(+), 160 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 447f2767f3..cad56efb04 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -738,152 +738,6 @@ static struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir)
 	return ref_iterator;
 }
 
-struct nonmatching_ref_data {
-	const struct string_list *skip;
-	const char *conflicting_refname;
-};
-
-static int nonmatching_ref_fn(struct ref_entry *entry, void *vdata)
-{
-	struct nonmatching_ref_data *data = vdata;
-
-	if (data->skip && string_list_has_string(data->skip, entry->name))
-		return 0;
-
-	data->conflicting_refname = entry->name;
-	return 1;
-}
-
-/*
- * Return 0 if a reference named refname could be created without
- * conflicting with the name of an existing reference in dir.
- * See verify_refname_available for more information.
- */
-static int verify_refname_available_dir(const char *refname,
-					const struct string_list *extras,
-					const struct string_list *skip,
-					struct ref_dir *dir,
-					struct strbuf *err)
-{
-	const char *slash;
-	const char *extra_refname;
-	int pos;
-	struct strbuf dirname = STRBUF_INIT;
-	int ret = -1;
-
-	/*
-	 * For the sake of comments in this function, suppose that
-	 * refname is "refs/foo/bar".
-	 */
-
-	assert(err);
-
-	strbuf_grow(&dirname, strlen(refname) + 1);
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		/* Expand dirname to the new prefix, not including the trailing slash: */
-		strbuf_add(&dirname, refname + dirname.len, slash - refname - dirname.len);
-
-		/*
-		 * We are still at a leading dir of the refname (e.g.,
-		 * "refs/foo"; if there is a reference with that name,
-		 * it is a conflict, *unless* it is in skip.
-		 */
-		if (dir) {
-			pos = search_ref_dir(dir, dirname.buf, dirname.len);
-			if (pos >= 0 &&
-			    (!skip || !string_list_has_string(skip, dirname.buf))) {
-				/*
-				 * We found a reference whose name is
-				 * a proper prefix of refname; e.g.,
-				 * "refs/foo", and is not in skip.
-				 */
-				strbuf_addf(err, "'%s' exists; cannot create '%s'",
-					    dirname.buf, refname);
-				goto cleanup;
-			}
-		}
-
-		if (extras && string_list_has_string(extras, dirname.buf) &&
-		    (!skip || !string_list_has_string(skip, dirname.buf))) {
-			strbuf_addf(err, "cannot process '%s' and '%s' at the same time",
-				    refname, dirname.buf);
-			goto cleanup;
-		}
-
-		/*
-		 * Otherwise, we can try to continue our search with
-		 * the next component. So try to look up the
-		 * directory, e.g., "refs/foo/". If we come up empty,
-		 * we know there is nothing under this whole prefix,
-		 * but even in that case we still have to continue the
-		 * search for conflicts with extras.
-		 */
-		strbuf_addch(&dirname, '/');
-		if (dir) {
-			pos = search_ref_dir(dir, dirname.buf, dirname.len);
-			if (pos < 0) {
-				/*
-				 * There was no directory "refs/foo/",
-				 * so there is nothing under this
-				 * whole prefix. So there is no need
-				 * to continue looking for conflicting
-				 * references. But we need to continue
-				 * looking for conflicting extras.
-				 */
-				dir = NULL;
-			} else {
-				dir = get_ref_dir(dir->entries[pos]);
-			}
-		}
-	}
-
-	/*
-	 * We are at the leaf of our refname (e.g., "refs/foo/bar").
-	 * There is no point in searching for a reference with that
-	 * name, because a refname isn't considered to conflict with
-	 * itself. But we still need to check for references whose
-	 * names are in the "refs/foo/bar/" namespace, because they
-	 * *do* conflict.
-	 */
-	strbuf_addstr(&dirname, refname + dirname.len);
-	strbuf_addch(&dirname, '/');
-
-	if (dir) {
-		pos = search_ref_dir(dir, dirname.buf, dirname.len);
-
-		if (pos >= 0) {
-			/*
-			 * We found a directory named "$refname/"
-			 * (e.g., "refs/foo/bar/"). It is a problem
-			 * iff it contains any ref that is not in
-			 * "skip".
-			 */
-			struct nonmatching_ref_data data;
-
-			data.skip = skip;
-			data.conflicting_refname = NULL;
-			dir = get_ref_dir(dir->entries[pos]);
-			sort_ref_dir(dir);
-			if (do_for_each_entry_in_dir(dir, 0, nonmatching_ref_fn, &data)) {
-				strbuf_addf(err, "'%s' exists; cannot create '%s'",
-					    data.conflicting_refname, refname);
-				goto cleanup;
-			}
-		}
-	}
-
-	extra_refname = find_descendant_ref(dirname.buf, extras, skip);
-	if (extra_refname)
-		strbuf_addf(err, "cannot process '%s' and '%s' at the same time",
-			    refname, extra_refname);
-	else
-		ret = 0;
-
-cleanup:
-	strbuf_release(&dirname);
-	return ret;
-}
-
 struct packed_ref_cache {
 	struct ref_entry *root;
 
@@ -1563,7 +1417,7 @@ static void unlock_ref(struct ref_lock *lock)
  *
  * If the reference doesn't already exist, verify that refname doesn't
  * have a D/F conflict with any existing references. extras and skip
- * are passed to verify_refname_available_dir() for this check.
+ * are passed to refs_verify_refname_available() for this check.
  *
  * If mustexist is not set and the reference is not found or is
  * broken, lock the reference anyway but clear sha1.
@@ -1578,7 +1432,7 @@ static void unlock_ref(struct ref_lock *lock)
  *
  * but it includes a lot more code to
  * - Deal with possible races with other processes
- * - Avoid calling verify_refname_available_dir() when it can be
+ * - Avoid calling refs_verify_refname_available() when it can be
  *   avoided, namely if we were successfully able to read the ref
  * - Generate informative error messages in the case of failure
  */
@@ -1635,7 +1489,8 @@ static int lock_raw_ref(struct files_ref_store *refs,
 			} else {
 				/*
 				 * The error message set by
-				 * verify_refname_available_dir() is OK.
+				 * refs_verify_refname_available() is
+				 * OK.
 				 */
 				ret = TRANSACTION_NAME_CONFLICT;
 			}
@@ -1759,16 +1614,13 @@ static int lock_raw_ref(struct files_ref_store *refs,
 
 		/*
 		 * If the ref did not exist and we are creating it,
-		 * make sure there is no existing packed ref whose
-		 * name begins with our refname, nor a packed ref
-		 * whose name is a proper prefix of our refname.
+		 * make sure there is no existing ref that conflicts
+		 * with refname:
 		 */
-		if (verify_refname_available_dir(
-				    refname, extras, skip,
-				    get_packed_refs(refs),
-				    err)) {
+		if (refs_verify_refname_available(
+				    &refs->base, refname,
+				    extras, skip, err))
 			goto error_return;
-		}
 	}
 
 	ret = 0;
@@ -2125,9 +1977,8 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs,
 	 * our refname.
 	 */
 	if (is_null_oid(&lock->old_oid) &&
-	    verify_refname_available_dir(refname, extras, skip,
-					 get_packed_refs(refs),
-					 err)) {
+	    refs_verify_refname_available(&refs->base, refname,
+					  extras, skip, err)) {
 		last_errno = ENOTDIR;
 		goto error_return;
 	}
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 06/20] Rename `add_ref()` to `add_ref_entry()`
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (4 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 05/20] refs_verify_refname_available(): use function in more places Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 07/20] Rename `find_ref()` to `find_ref_entry()` Michael Haggerty
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

This function's visibility is about to be increased, so give it a more
distinctive name.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index cad56efb04..4d579cbdac 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -455,7 +455,7 @@ static int remove_entry(struct ref_dir *dir, const char *refname)
  * subdirectories as necessary.  dir must represent the top-level
  * directory.  Return 0 on success.
  */
-static int add_ref(struct ref_dir *dir, struct ref_entry *ref)
+static int add_ref_entry(struct ref_dir *dir, struct ref_entry *ref)
 {
 	dir = find_containing_dir(dir, ref->name, 1);
 	if (!dir)
@@ -994,7 +994,7 @@ static void read_packed_refs(FILE *f, struct ref_dir *dir)
 			if (peeled == PEELED_FULLY ||
 			    (peeled == PEELED_TAGS && starts_with(refname, "refs/tags/")))
 				last->flag |= REF_KNOWS_PEELED;
-			add_ref(dir, last);
+			add_ref_entry(dir, last);
 			continue;
 		}
 		if (last &&
@@ -1116,7 +1116,7 @@ static void add_packed_ref(struct files_ref_store *refs,
 
 	if (!packed_ref_cache->lock)
 		die("internal error: packed refs not locked");
-	add_ref(get_packed_ref_dir(packed_ref_cache),
+	add_ref_entry(get_packed_ref_dir(packed_ref_cache),
 		create_ref_entry(refname, sha1, REF_ISPACKED, 1));
 }
 
@@ -2179,7 +2179,7 @@ static int pack_if_possible_fn(struct ref_entry *entry, void *cb_data)
 	} else {
 		packed_entry = create_ref_entry(entry->name, entry->u.value.oid.hash,
 						REF_ISPACKED | REF_KNOWS_PEELED, 0);
-		add_ref(cb->packed_refs, packed_entry);
+		add_ref_entry(cb->packed_refs, packed_entry);
 	}
 	oidcpy(&packed_entry->u.value.peeled, &entry->u.value.peeled);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 07/20] Rename `find_ref()` to `find_ref_entry()`
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (5 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 06/20] Rename `add_ref()` to `add_ref_entry()` Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 08/20] Rename `remove_entry()` to `remove_entry_from_dir()` Michael Haggerty
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

This function's visibility is about to be increased, so give it a more
distinctive name.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 4d579cbdac..6768c8c86b 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -385,7 +385,7 @@ static struct ref_dir *find_containing_dir(struct ref_dir *dir,
  * and recursing into subdirectories as necessary.  If the name is not
  * found or it corresponds to a directory entry, return NULL.
  */
-static struct ref_entry *find_ref(struct ref_dir *dir, const char *refname)
+static struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
 {
 	int entry_index;
 	struct ref_entry *entry;
@@ -1227,7 +1227,7 @@ static struct ref_dir *get_loose_refs(struct files_ref_store *refs)
 static struct ref_entry *get_packed_ref(struct files_ref_store *refs,
 					const char *refname)
 {
-	return find_ref(get_packed_refs(refs), refname);
+	return find_ref_entry(get_packed_refs(refs), refname);
 }
 
 /*
@@ -2171,7 +2171,7 @@ static int pack_if_possible_fn(struct ref_entry *entry, void *cb_data)
 	if (peel_status != PEEL_PEELED && peel_status != PEEL_NON_TAG)
 		die("internal error peeling reference %s (%s)",
 		    entry->name, oid_to_hex(&entry->u.value.oid));
-	packed_entry = find_ref(cb->packed_refs, entry->name);
+	packed_entry = find_ref_entry(cb->packed_refs, entry->name);
 	if (packed_entry) {
 		/* Overwrite existing packed entry with info from loose entry */
 		packed_entry->flag = REF_ISPACKED | REF_KNOWS_PEELED;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 08/20] Rename `remove_entry()` to `remove_entry_from_dir()`
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (6 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 07/20] Rename `find_ref()` to `find_ref_entry()` Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 09/20] refs: split `ref_cache` code into separate files Michael Haggerty
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

This function's visibility is about to be increased, so give it a more
distinctive name.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 6768c8c86b..b4c11afadf 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -413,7 +413,7 @@ static struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname
  * empty by the removal.  dir must represent the top-level directory
  * and must already be complete.
  */
-static int remove_entry(struct ref_dir *dir, const char *refname)
+static int remove_entry_from_dir(struct ref_dir *dir, const char *refname)
 {
 	int refname_len = strlen(refname);
 	int entry_index;
@@ -2338,7 +2338,7 @@ static int repack_without_refs(struct files_ref_store *refs,
 
 	/* Remove refnames from the cache */
 	for_each_string_list_item(refname, refnames)
-		if (remove_entry(packed, refname->string) != -1)
+		if (remove_entry_from_dir(packed, refname->string) != -1)
 			removed = 1;
 	if (!removed) {
 		/*
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 09/20] refs: split `ref_cache` code into separate files
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (7 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 08/20] Rename `remove_entry()` to `remove_entry_from_dir()` Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 17:49   ` Jeff King
  2017-03-20 16:33 ` [PATCH 10/20] ref-cache: introduce a new type, ref_cache Michael Haggerty
                   ` (14 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

The `ref_cache` code is currently too tightly coupled to
`files-backend`, making the code harder to understand and making it
awkward for new code to use `ref_cache` (as we indeed have planned).
Start loosening that coupling by splitting `ref_cache` into a separate
module.

This commit moves code, adds declarations, and changes the visibility
of some functions, but doesn't change any code.

The modules are still too tightly coupled, but the situation will be
improved in subsequent commits.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 Makefile             |   1 +
 refs/files-backend.c | 736 +--------------------------------------------------
 refs/ref-cache.c     | 512 +++++++++++++++++++++++++++++++++++
 refs/ref-cache.h     | 251 ++++++++++++++++++
 4 files changed, 767 insertions(+), 733 deletions(-)
 create mode 100644 refs/ref-cache.c
 create mode 100644 refs/ref-cache.h

diff --git a/Makefile b/Makefile
index 5f3844e33e..2f30580cde 100644
--- a/Makefile
+++ b/Makefile
@@ -807,6 +807,7 @@ LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += refs/files-backend.o
 LIB_OBJS += refs/iterator.o
+LIB_OBJS += refs/ref-cache.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += remote.o
 LIB_OBJS += replace_object.o
diff --git a/refs/files-backend.c b/refs/files-backend.c
index b4c11afadf..f07e93d522 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1,6 +1,7 @@
 #include "../cache.h"
 #include "../refs.h"
 #include "refs-internal.h"
+#include "ref-cache.h"
 #include "../iterator.h"
 #include "../dir-iterator.h"
 #include "../lockfile.h"
@@ -13,509 +14,6 @@ struct ref_lock {
 	struct object_id old_oid;
 };
 
-struct ref_entry;
-
-/*
- * Information used (along with the information in ref_entry) to
- * describe a single cached reference.  This data structure only
- * occurs embedded in a union in struct ref_entry, and only when
- * (ref_entry->flag & REF_DIR) is zero.
- */
-struct ref_value {
-	/*
-	 * The name of the object to which this reference resolves
-	 * (which may be a tag object).  If REF_ISBROKEN, this is
-	 * null.  If REF_ISSYMREF, then this is the name of the object
-	 * referred to by the last reference in the symlink chain.
-	 */
-	struct object_id oid;
-
-	/*
-	 * If REF_KNOWS_PEELED, then this field holds the peeled value
-	 * of this reference, or null if the reference is known not to
-	 * be peelable.  See the documentation for peel_ref() for an
-	 * exact definition of "peelable".
-	 */
-	struct object_id peeled;
-};
-
-struct files_ref_store;
-
-/*
- * Information used (along with the information in ref_entry) to
- * describe a level in the hierarchy of references.  This data
- * structure only occurs embedded in a union in struct ref_entry, and
- * only when (ref_entry.flag & REF_DIR) is set.  In that case,
- * (ref_entry.flag & REF_INCOMPLETE) determines whether the references
- * in the directory have already been read:
- *
- *     (ref_entry.flag & REF_INCOMPLETE) unset -- a directory of loose
- *         or packed references, already read.
- *
- *     (ref_entry.flag & REF_INCOMPLETE) set -- a directory of loose
- *         references that hasn't been read yet (nor has any of its
- *         subdirectories).
- *
- * Entries within a directory are stored within a growable array of
- * pointers to ref_entries (entries, nr, alloc).  Entries 0 <= i <
- * sorted are sorted by their component name in strcmp() order and the
- * remaining entries are unsorted.
- *
- * Loose references are read lazily, one directory at a time.  When a
- * directory of loose references is read, then all of the references
- * in that directory are stored, and REF_INCOMPLETE stubs are created
- * for any subdirectories, but the subdirectories themselves are not
- * read.  The reading is triggered by get_ref_dir().
- */
-struct ref_dir {
-	int nr, alloc;
-
-	/*
-	 * Entries with index 0 <= i < sorted are sorted by name.  New
-	 * entries are appended to the list unsorted, and are sorted
-	 * only when required; thus we avoid the need to sort the list
-	 * after the addition of every reference.
-	 */
-	int sorted;
-
-	/* A pointer to the files_ref_store that contains this ref_dir. */
-	struct files_ref_store *ref_store;
-
-	struct ref_entry **entries;
-};
-
-/*
- * Bit values for ref_entry::flag.  REF_ISSYMREF=0x01,
- * REF_ISPACKED=0x02, REF_ISBROKEN=0x04 and REF_BAD_NAME=0x08 are
- * public values; see refs.h.
- */
-
-/*
- * The field ref_entry->u.value.peeled of this value entry contains
- * the correct peeled value for the reference, which might be
- * null_sha1 if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x10
-
-/* ref_entry represents a directory of references */
-#define REF_DIR 0x20
-
-/*
- * Entry has not yet been read from disk (used only for REF_DIR
- * entries representing loose references)
- */
-#define REF_INCOMPLETE 0x40
-
-/*
- * A ref_entry represents either a reference or a "subdirectory" of
- * references.
- *
- * Each directory in the reference namespace is represented by a
- * ref_entry with (flags & REF_DIR) set and containing a subdir member
- * that holds the entries in that directory that have been read so
- * far.  If (flags & REF_INCOMPLETE) is set, then the directory and
- * its subdirectories haven't been read yet.  REF_INCOMPLETE is only
- * used for loose reference directories.
- *
- * References are represented by a ref_entry with (flags & REF_DIR)
- * unset and a value member that describes the reference's value.  The
- * flag member is at the ref_entry level, but it is also needed to
- * interpret the contents of the value field (in other words, a
- * ref_value object is not very much use without the enclosing
- * ref_entry).
- *
- * Reference names cannot end with slash and directories' names are
- * always stored with a trailing slash (except for the top-level
- * directory, which is always denoted by "").  This has two nice
- * consequences: (1) when the entries in each subdir are sorted
- * lexicographically by name (as they usually are), the references in
- * a whole tree can be generated in lexicographic order by traversing
- * the tree in left-to-right, depth-first order; (2) the names of
- * references and subdirectories cannot conflict, and therefore the
- * presence of an empty subdirectory does not block the creation of a
- * similarly-named reference.  (The fact that reference names with the
- * same leading components can conflict *with each other* is a
- * separate issue that is regulated by verify_refname_available().)
- *
- * Please note that the name field contains the fully-qualified
- * reference (or subdirectory) name.  Space could be saved by only
- * storing the relative names.  But that would require the full names
- * to be generated on the fly when iterating in do_for_each_ref(), and
- * would break callback functions, who have always been able to assume
- * that the name strings that they are passed will not be freed during
- * the iteration.
- */
-struct ref_entry {
-	unsigned char flag; /* ISSYMREF? ISPACKED? */
-	union {
-		struct ref_value value; /* if not (flags&REF_DIR) */
-		struct ref_dir subdir; /* if (flags&REF_DIR) */
-	} u;
-	/*
-	 * The full name of the reference (e.g., "refs/heads/master")
-	 * or the full name of the directory with a trailing slash
-	 * (e.g., "refs/heads/"):
-	 */
-	char name[FLEX_ARRAY];
-};
-
-static void read_loose_refs(const char *dirname, struct ref_dir *dir);
-static int search_ref_dir(struct ref_dir *dir, const char *refname, size_t len);
-static struct ref_entry *create_dir_entry(struct files_ref_store *ref_store,
-					  const char *dirname, size_t len,
-					  int incomplete);
-static void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
-static int files_log_ref_write(struct files_ref_store *refs,
-			       const char *refname, const unsigned char *old_sha1,
-			       const unsigned char *new_sha1, const char *msg,
-			       int flags, struct strbuf *err);
-
-static struct ref_dir *get_ref_dir(struct ref_entry *entry)
-{
-	struct ref_dir *dir;
-	assert(entry->flag & REF_DIR);
-	dir = &entry->u.subdir;
-	if (entry->flag & REF_INCOMPLETE) {
-		read_loose_refs(entry->name, dir);
-
-		/*
-		 * Manually add refs/bisect, which, being
-		 * per-worktree, might not appear in the directory
-		 * listing for refs/ in the main repo.
-		 */
-		if (!strcmp(entry->name, "refs/")) {
-			int pos = search_ref_dir(dir, "refs/bisect/", 12);
-			if (pos < 0) {
-				struct ref_entry *child_entry;
-				child_entry = create_dir_entry(dir->ref_store,
-							       "refs/bisect/",
-							       12, 1);
-				add_entry_to_dir(dir, child_entry);
-			}
-		}
-		entry->flag &= ~REF_INCOMPLETE;
-	}
-	return dir;
-}
-
-static struct ref_entry *create_ref_entry(const char *refname,
-					  const unsigned char *sha1, int flag,
-					  int check_name)
-{
-	struct ref_entry *ref;
-
-	if (check_name &&
-	    check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
-		die("Reference has invalid format: '%s'", refname);
-	FLEX_ALLOC_STR(ref, name, refname);
-	hashcpy(ref->u.value.oid.hash, sha1);
-	oidclr(&ref->u.value.peeled);
-	ref->flag = flag;
-	return ref;
-}
-
-static void clear_ref_dir(struct ref_dir *dir);
-
-static void free_ref_entry(struct ref_entry *entry)
-{
-	if (entry->flag & REF_DIR) {
-		/*
-		 * Do not use get_ref_dir() here, as that might
-		 * trigger the reading of loose refs.
-		 */
-		clear_ref_dir(&entry->u.subdir);
-	}
-	free(entry);
-}
-
-/*
- * Add a ref_entry to the end of dir (unsorted).  Entry is always
- * stored directly in dir; no recursion into subdirectories is
- * done.
- */
-static void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry)
-{
-	ALLOC_GROW(dir->entries, dir->nr + 1, dir->alloc);
-	dir->entries[dir->nr++] = entry;
-	/* optimize for the case that entries are added in order */
-	if (dir->nr == 1 ||
-	    (dir->nr == dir->sorted + 1 &&
-	     strcmp(dir->entries[dir->nr - 2]->name,
-		    dir->entries[dir->nr - 1]->name) < 0))
-		dir->sorted = dir->nr;
-}
-
-/*
- * Clear and free all entries in dir, recursively.
- */
-static void clear_ref_dir(struct ref_dir *dir)
-{
-	int i;
-	for (i = 0; i < dir->nr; i++)
-		free_ref_entry(dir->entries[i]);
-	free(dir->entries);
-	dir->sorted = dir->nr = dir->alloc = 0;
-	dir->entries = NULL;
-}
-
-/*
- * Create a struct ref_entry object for the specified dirname.
- * dirname is the name of the directory with a trailing slash (e.g.,
- * "refs/heads/") or "" for the top-level directory.
- */
-static struct ref_entry *create_dir_entry(struct files_ref_store *ref_store,
-					  const char *dirname, size_t len,
-					  int incomplete)
-{
-	struct ref_entry *direntry;
-	FLEX_ALLOC_MEM(direntry, name, dirname, len);
-	direntry->u.subdir.ref_store = ref_store;
-	direntry->flag = REF_DIR | (incomplete ? REF_INCOMPLETE : 0);
-	return direntry;
-}
-
-static int ref_entry_cmp(const void *a, const void *b)
-{
-	struct ref_entry *one = *(struct ref_entry **)a;
-	struct ref_entry *two = *(struct ref_entry **)b;
-	return strcmp(one->name, two->name);
-}
-
-static void sort_ref_dir(struct ref_dir *dir);
-
-struct string_slice {
-	size_t len;
-	const char *str;
-};
-
-static int ref_entry_cmp_sslice(const void *key_, const void *ent_)
-{
-	const struct string_slice *key = key_;
-	const struct ref_entry *ent = *(const struct ref_entry * const *)ent_;
-	int cmp = strncmp(key->str, ent->name, key->len);
-	if (cmp)
-		return cmp;
-	return '\0' - (unsigned char)ent->name[key->len];
-}
-
-/*
- * Return the index of the entry with the given refname from the
- * ref_dir (non-recursively), sorting dir if necessary.  Return -1 if
- * no such entry is found.  dir must already be complete.
- */
-static int search_ref_dir(struct ref_dir *dir, const char *refname, size_t len)
-{
-	struct ref_entry **r;
-	struct string_slice key;
-
-	if (refname == NULL || !dir->nr)
-		return -1;
-
-	sort_ref_dir(dir);
-	key.len = len;
-	key.str = refname;
-	r = bsearch(&key, dir->entries, dir->nr, sizeof(*dir->entries),
-		    ref_entry_cmp_sslice);
-
-	if (r == NULL)
-		return -1;
-
-	return r - dir->entries;
-}
-
-/*
- * Search for a directory entry directly within dir (without
- * recursing).  Sort dir if necessary.  subdirname must be a directory
- * name (i.e., end in '/').  If mkdir is set, then create the
- * directory if it is missing; otherwise, return NULL if the desired
- * directory cannot be found.  dir must already be complete.
- */
-static struct ref_dir *search_for_subdir(struct ref_dir *dir,
-					 const char *subdirname, size_t len,
-					 int mkdir)
-{
-	int entry_index = search_ref_dir(dir, subdirname, len);
-	struct ref_entry *entry;
-	if (entry_index == -1) {
-		if (!mkdir)
-			return NULL;
-		/*
-		 * Since dir is complete, the absence of a subdir
-		 * means that the subdir really doesn't exist;
-		 * therefore, create an empty record for it but mark
-		 * the record complete.
-		 */
-		entry = create_dir_entry(dir->ref_store, subdirname, len, 0);
-		add_entry_to_dir(dir, entry);
-	} else {
-		entry = dir->entries[entry_index];
-	}
-	return get_ref_dir(entry);
-}
-
-/*
- * If refname is a reference name, find the ref_dir within the dir
- * tree that should hold refname.  If refname is a directory name
- * (i.e., ends in '/'), then return that ref_dir itself.  dir must
- * represent the top-level directory and must already be complete.
- * Sort ref_dirs and recurse into subdirectories as necessary.  If
- * mkdir is set, then create any missing directories; otherwise,
- * return NULL if the desired directory cannot be found.
- */
-static struct ref_dir *find_containing_dir(struct ref_dir *dir,
-					   const char *refname, int mkdir)
-{
-	const char *slash;
-	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
-		size_t dirnamelen = slash - refname + 1;
-		struct ref_dir *subdir;
-		subdir = search_for_subdir(dir, refname, dirnamelen, mkdir);
-		if (!subdir) {
-			dir = NULL;
-			break;
-		}
-		dir = subdir;
-	}
-
-	return dir;
-}
-
-/*
- * Find the value entry with the given name in dir, sorting ref_dirs
- * and recursing into subdirectories as necessary.  If the name is not
- * found or it corresponds to a directory entry, return NULL.
- */
-static struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
-{
-	int entry_index;
-	struct ref_entry *entry;
-	dir = find_containing_dir(dir, refname, 0);
-	if (!dir)
-		return NULL;
-	entry_index = search_ref_dir(dir, refname, strlen(refname));
-	if (entry_index == -1)
-		return NULL;
-	entry = dir->entries[entry_index];
-	return (entry->flag & REF_DIR) ? NULL : entry;
-}
-
-/*
- * Remove the entry with the given name from dir, recursing into
- * subdirectories as necessary.  If refname is the name of a directory
- * (i.e., ends with '/'), then remove the directory and its contents.
- * If the removal was successful, return the number of entries
- * remaining in the directory entry that contained the deleted entry.
- * If the name was not found, return -1.  Please note that this
- * function only deletes the entry from the cache; it does not delete
- * it from the filesystem or ensure that other cache entries (which
- * might be symbolic references to the removed entry) are updated.
- * Nor does it remove any containing dir entries that might be made
- * empty by the removal.  dir must represent the top-level directory
- * and must already be complete.
- */
-static int remove_entry_from_dir(struct ref_dir *dir, const char *refname)
-{
-	int refname_len = strlen(refname);
-	int entry_index;
-	struct ref_entry *entry;
-	int is_dir = refname[refname_len - 1] == '/';
-	if (is_dir) {
-		/*
-		 * refname represents a reference directory.  Remove
-		 * the trailing slash; otherwise we will get the
-		 * directory *representing* refname rather than the
-		 * one *containing* it.
-		 */
-		char *dirname = xmemdupz(refname, refname_len - 1);
-		dir = find_containing_dir(dir, dirname, 0);
-		free(dirname);
-	} else {
-		dir = find_containing_dir(dir, refname, 0);
-	}
-	if (!dir)
-		return -1;
-	entry_index = search_ref_dir(dir, refname, refname_len);
-	if (entry_index == -1)
-		return -1;
-	entry = dir->entries[entry_index];
-
-	memmove(&dir->entries[entry_index],
-		&dir->entries[entry_index + 1],
-		(dir->nr - entry_index - 1) * sizeof(*dir->entries)
-		);
-	dir->nr--;
-	if (dir->sorted > entry_index)
-		dir->sorted--;
-	free_ref_entry(entry);
-	return dir->nr;
-}
-
-/*
- * Add a ref_entry to the ref_dir (unsorted), recursing into
- * subdirectories as necessary.  dir must represent the top-level
- * directory.  Return 0 on success.
- */
-static int add_ref_entry(struct ref_dir *dir, struct ref_entry *ref)
-{
-	dir = find_containing_dir(dir, ref->name, 1);
-	if (!dir)
-		return -1;
-	add_entry_to_dir(dir, ref);
-	return 0;
-}
-
-/*
- * Emit a warning and return true iff ref1 and ref2 have the same name
- * and the same sha1.  Die if they have the same name but different
- * sha1s.
- */
-static int is_dup_ref(const struct ref_entry *ref1, const struct ref_entry *ref2)
-{
-	if (strcmp(ref1->name, ref2->name))
-		return 0;
-
-	/* Duplicate name; make sure that they don't conflict: */
-
-	if ((ref1->flag & REF_DIR) || (ref2->flag & REF_DIR))
-		/* This is impossible by construction */
-		die("Reference directory conflict: %s", ref1->name);
-
-	if (oidcmp(&ref1->u.value.oid, &ref2->u.value.oid))
-		die("Duplicated ref, and SHA1s don't match: %s", ref1->name);
-
-	warning("Duplicated ref: %s", ref1->name);
-	return 1;
-}
-
-/*
- * Sort the entries in dir non-recursively (if they are not already
- * sorted) and remove any duplicate entries.
- */
-static void sort_ref_dir(struct ref_dir *dir)
-{
-	int i, j;
-	struct ref_entry *last = NULL;
-
-	/*
-	 * This check also prevents passing a zero-length array to qsort(),
-	 * which is a problem on some platforms.
-	 */
-	if (dir->sorted == dir->nr)
-		return;
-
-	QSORT(dir->entries, dir->nr, ref_entry_cmp);
-
-	/* Remove any duplicates: */
-	for (i = 0, j = 0; j < dir->nr; j++) {
-		struct ref_entry *entry = dir->entries[j];
-		if (last && is_dup_ref(last, entry))
-			free_ref_entry(entry);
-		else
-			last = dir->entries[i++] = entry;
-	}
-	dir->sorted = dir->nr = i;
-}
-
 /*
  * Return true if refname, which has the specified oid and flags, can
  * be resolved to an object in the database. If the referred-to object
@@ -545,199 +43,6 @@ static int entry_resolves_to_object(struct ref_entry *entry)
 				      &entry->u.value.oid, entry->flag);
 }
 
-typedef int each_ref_entry_fn(struct ref_entry *entry, void *cb_data);
-
-/*
- * Call fn for each reference in dir that has index in the range
- * offset <= index < dir->nr.  Recurse into subdirectories that are in
- * that index range, sorting them before iterating.  This function
- * does not sort dir itself; it should be sorted beforehand.  fn is
- * called for all references, including broken ones.
- */
-static int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
-				    each_ref_entry_fn fn, void *cb_data)
-{
-	int i;
-	assert(dir->sorted == dir->nr);
-	for (i = offset; i < dir->nr; i++) {
-		struct ref_entry *entry = dir->entries[i];
-		int retval;
-		if (entry->flag & REF_DIR) {
-			struct ref_dir *subdir = get_ref_dir(entry);
-			sort_ref_dir(subdir);
-			retval = do_for_each_entry_in_dir(subdir, 0, fn, cb_data);
-		} else {
-			retval = fn(entry, cb_data);
-		}
-		if (retval)
-			return retval;
-	}
-	return 0;
-}
-
-/*
- * Load all of the refs from the dir into our in-memory cache. The hard work
- * of loading loose refs is done by get_ref_dir(), so we just need to recurse
- * through all of the sub-directories. We do not even need to care about
- * sorting, as traversal order does not matter to us.
- */
-static void prime_ref_dir(struct ref_dir *dir)
-{
-	int i;
-	for (i = 0; i < dir->nr; i++) {
-		struct ref_entry *entry = dir->entries[i];
-		if (entry->flag & REF_DIR)
-			prime_ref_dir(get_ref_dir(entry));
-	}
-}
-
-/*
- * A level in the reference hierarchy that is currently being iterated
- * through.
- */
-struct cache_ref_iterator_level {
-	/*
-	 * The ref_dir being iterated over at this level. The ref_dir
-	 * is sorted before being stored here.
-	 */
-	struct ref_dir *dir;
-
-	/*
-	 * The index of the current entry within dir (which might
-	 * itself be a directory). If index == -1, then the iteration
-	 * hasn't yet begun. If index == dir->nr, then the iteration
-	 * through this level is over.
-	 */
-	int index;
-};
-
-/*
- * Represent an iteration through a ref_dir in the memory cache. The
- * iteration recurses through subdirectories.
- */
-struct cache_ref_iterator {
-	struct ref_iterator base;
-
-	/*
-	 * The number of levels currently on the stack. This is always
-	 * at least 1, because when it becomes zero the iteration is
-	 * ended and this struct is freed.
-	 */
-	size_t levels_nr;
-
-	/* The number of levels that have been allocated on the stack */
-	size_t levels_alloc;
-
-	/*
-	 * A stack of levels. levels[0] is the uppermost level that is
-	 * being iterated over in this iteration. (This is not
-	 * necessary the top level in the references hierarchy. If we
-	 * are iterating through a subtree, then levels[0] will hold
-	 * the ref_dir for that subtree, and subsequent levels will go
-	 * on from there.)
-	 */
-	struct cache_ref_iterator_level *levels;
-};
-
-static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
-{
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
-
-	while (1) {
-		struct cache_ref_iterator_level *level =
-			&iter->levels[iter->levels_nr - 1];
-		struct ref_dir *dir = level->dir;
-		struct ref_entry *entry;
-
-		if (level->index == -1)
-			sort_ref_dir(dir);
-
-		if (++level->index == level->dir->nr) {
-			/* This level is exhausted; pop up a level */
-			if (--iter->levels_nr == 0)
-				return ref_iterator_abort(ref_iterator);
-
-			continue;
-		}
-
-		entry = dir->entries[level->index];
-
-		if (entry->flag & REF_DIR) {
-			/* push down a level */
-			ALLOC_GROW(iter->levels, iter->levels_nr + 1,
-				   iter->levels_alloc);
-
-			level = &iter->levels[iter->levels_nr++];
-			level->dir = get_ref_dir(entry);
-			level->index = -1;
-		} else {
-			iter->base.refname = entry->name;
-			iter->base.oid = &entry->u.value.oid;
-			iter->base.flags = entry->flag;
-			return ITER_OK;
-		}
-	}
-}
-
-static enum peel_status peel_entry(struct ref_entry *entry, int repeel);
-
-static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
-				   struct object_id *peeled)
-{
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
-	struct cache_ref_iterator_level *level;
-	struct ref_entry *entry;
-
-	level = &iter->levels[iter->levels_nr - 1];
-
-	if (level->index == -1)
-		die("BUG: peel called before advance for cache iterator");
-
-	entry = level->dir->entries[level->index];
-
-	if (peel_entry(entry, 0))
-		return -1;
-	oidcpy(peeled, &entry->u.value.peeled);
-	return 0;
-}
-
-static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
-{
-	struct cache_ref_iterator *iter =
-		(struct cache_ref_iterator *)ref_iterator;
-
-	free(iter->levels);
-	base_ref_iterator_free(ref_iterator);
-	return ITER_DONE;
-}
-
-static struct ref_iterator_vtable cache_ref_iterator_vtable = {
-	cache_ref_iterator_advance,
-	cache_ref_iterator_peel,
-	cache_ref_iterator_abort
-};
-
-static struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir)
-{
-	struct cache_ref_iterator *iter;
-	struct ref_iterator *ref_iterator;
-	struct cache_ref_iterator_level *level;
-
-	iter = xcalloc(1, sizeof(*iter));
-	ref_iterator = &iter->base;
-	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
-	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
-
-	iter->levels_nr = 1;
-	level = &iter->levels[0];
-	level->index = -1;
-	level->dir = dir;
-
-	return ref_iterator;
-}
-
 struct packed_ref_cache {
 	struct ref_entry *root;
 
@@ -1117,7 +422,7 @@ static void add_packed_ref(struct files_ref_store *refs,
 	if (!packed_ref_cache->lock)
 		die("internal error: packed refs not locked");
 	add_ref_entry(get_packed_ref_dir(packed_ref_cache),
-		create_ref_entry(refname, sha1, REF_ISPACKED, 1));
+		      create_ref_entry(refname, sha1, REF_ISPACKED, 1));
 }
 
 /*
@@ -1125,7 +430,7 @@ static void add_packed_ref(struct files_ref_store *refs,
  * (without recursing).  dirname must end with '/'.  dir must be the
  * directory entry corresponding to dirname.
  */
-static void read_loose_refs(const char *dirname, struct ref_dir *dir)
+void read_loose_refs(const char *dirname, struct ref_dir *dir)
 {
 	struct files_ref_store *refs = dir->ref_store;
 	DIR *d;
@@ -1635,41 +940,6 @@ static int lock_raw_ref(struct files_ref_store *refs,
 	return ret;
 }
 
-/*
- * Peel the entry (if possible) and return its new peel_status.  If
- * repeel is true, re-peel the entry even if there is an old peeled
- * value that is already stored in it.
- *
- * It is OK to call this function with a packed reference entry that
- * might be stale and might even refer to an object that has since
- * been garbage-collected.  In such a case, if the entry has
- * REF_KNOWS_PEELED then leave the status unchanged and return
- * PEEL_PEELED or PEEL_NON_TAG; otherwise, return PEEL_INVALID.
- */
-static enum peel_status peel_entry(struct ref_entry *entry, int repeel)
-{
-	enum peel_status status;
-
-	if (entry->flag & REF_KNOWS_PEELED) {
-		if (repeel) {
-			entry->flag &= ~REF_KNOWS_PEELED;
-			oidclr(&entry->u.value.peeled);
-		} else {
-			return is_null_oid(&entry->u.value.peeled) ?
-				PEEL_NON_TAG : PEEL_PEELED;
-		}
-	}
-	if (entry->flag & REF_ISBROKEN)
-		return PEEL_BROKEN;
-	if (entry->flag & REF_ISSYMREF)
-		return PEEL_IS_SYMREF;
-
-	status = peel_object(entry->u.value.oid.hash, entry->u.value.peeled.hash);
-	if (status == PEEL_PEELED || status == PEEL_NON_TAG)
-		entry->flag |= REF_KNOWS_PEELED;
-	return status;
-}
-
 static int files_peel_ref(struct ref_store *ref_store,
 			  const char *refname, unsigned char *sha1)
 {
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
new file mode 100644
index 0000000000..4274a43a9b
--- /dev/null
+++ b/refs/ref-cache.c
@@ -0,0 +1,512 @@
+#include "../cache.h"
+#include "../refs.h"
+#include "refs-internal.h"
+#include "ref-cache.h"
+#include "../iterator.h"
+
+/* FIXME: This declaration shouldn't be here */
+void read_loose_refs(const char *dirname, struct ref_dir *dir);
+
+void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry)
+{
+	ALLOC_GROW(dir->entries, dir->nr + 1, dir->alloc);
+	dir->entries[dir->nr++] = entry;
+	/* optimize for the case that entries are added in order */
+	if (dir->nr == 1 ||
+	    (dir->nr == dir->sorted + 1 &&
+	     strcmp(dir->entries[dir->nr - 2]->name,
+		    dir->entries[dir->nr - 1]->name) < 0))
+		dir->sorted = dir->nr;
+}
+
+struct ref_dir *get_ref_dir(struct ref_entry *entry)
+{
+	struct ref_dir *dir;
+	assert(entry->flag & REF_DIR);
+	dir = &entry->u.subdir;
+	if (entry->flag & REF_INCOMPLETE) {
+		read_loose_refs(entry->name, dir);
+
+		/*
+		 * Manually add refs/bisect, which, being
+		 * per-worktree, might not appear in the directory
+		 * listing for refs/ in the main repo.
+		 */
+		if (!strcmp(entry->name, "refs/")) {
+			int pos = search_ref_dir(dir, "refs/bisect/", 12);
+			if (pos < 0) {
+				struct ref_entry *child_entry;
+				child_entry = create_dir_entry(dir->ref_store,
+							       "refs/bisect/",
+							       12, 1);
+				add_entry_to_dir(dir, child_entry);
+			}
+		}
+		entry->flag &= ~REF_INCOMPLETE;
+	}
+	return dir;
+}
+
+struct ref_entry *create_ref_entry(const char *refname,
+				   const unsigned char *sha1, int flag,
+				   int check_name)
+{
+	struct ref_entry *ref;
+
+	if (check_name &&
+	    check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
+		die("Reference has invalid format: '%s'", refname);
+	FLEX_ALLOC_STR(ref, name, refname);
+	hashcpy(ref->u.value.oid.hash, sha1);
+	oidclr(&ref->u.value.peeled);
+	ref->flag = flag;
+	return ref;
+}
+
+static void clear_ref_dir(struct ref_dir *dir);
+
+void free_ref_entry(struct ref_entry *entry)
+{
+	if (entry->flag & REF_DIR) {
+		/*
+		 * Do not use get_ref_dir() here, as that might
+		 * trigger the reading of loose refs.
+		 */
+		clear_ref_dir(&entry->u.subdir);
+	}
+	free(entry);
+}
+
+/*
+ * Clear and free all entries in dir, recursively.
+ */
+static void clear_ref_dir(struct ref_dir *dir)
+{
+	int i;
+	for (i = 0; i < dir->nr; i++)
+		free_ref_entry(dir->entries[i]);
+	free(dir->entries);
+	dir->sorted = dir->nr = dir->alloc = 0;
+	dir->entries = NULL;
+}
+
+struct ref_entry *create_dir_entry(struct files_ref_store *ref_store,
+				   const char *dirname, size_t len,
+				   int incomplete)
+{
+	struct ref_entry *direntry;
+	FLEX_ALLOC_MEM(direntry, name, dirname, len);
+	direntry->u.subdir.ref_store = ref_store;
+	direntry->flag = REF_DIR | (incomplete ? REF_INCOMPLETE : 0);
+	return direntry;
+}
+
+static int ref_entry_cmp(const void *a, const void *b)
+{
+	struct ref_entry *one = *(struct ref_entry **)a;
+	struct ref_entry *two = *(struct ref_entry **)b;
+	return strcmp(one->name, two->name);
+}
+
+static void sort_ref_dir(struct ref_dir *dir);
+
+struct string_slice {
+	size_t len;
+	const char *str;
+};
+
+static int ref_entry_cmp_sslice(const void *key_, const void *ent_)
+{
+	const struct string_slice *key = key_;
+	const struct ref_entry *ent = *(const struct ref_entry * const *)ent_;
+	int cmp = strncmp(key->str, ent->name, key->len);
+	if (cmp)
+		return cmp;
+	return '\0' - (unsigned char)ent->name[key->len];
+}
+
+int search_ref_dir(struct ref_dir *dir, const char *refname, size_t len)
+{
+	struct ref_entry **r;
+	struct string_slice key;
+
+	if (refname == NULL || !dir->nr)
+		return -1;
+
+	sort_ref_dir(dir);
+	key.len = len;
+	key.str = refname;
+	r = bsearch(&key, dir->entries, dir->nr, sizeof(*dir->entries),
+		    ref_entry_cmp_sslice);
+
+	if (r == NULL)
+		return -1;
+
+	return r - dir->entries;
+}
+
+/*
+ * Search for a directory entry directly within dir (without
+ * recursing).  Sort dir if necessary.  subdirname must be a directory
+ * name (i.e., end in '/').  If mkdir is set, then create the
+ * directory if it is missing; otherwise, return NULL if the desired
+ * directory cannot be found.  dir must already be complete.
+ */
+static struct ref_dir *search_for_subdir(struct ref_dir *dir,
+					 const char *subdirname, size_t len,
+					 int mkdir)
+{
+	int entry_index = search_ref_dir(dir, subdirname, len);
+	struct ref_entry *entry;
+	if (entry_index == -1) {
+		if (!mkdir)
+			return NULL;
+		/*
+		 * Since dir is complete, the absence of a subdir
+		 * means that the subdir really doesn't exist;
+		 * therefore, create an empty record for it but mark
+		 * the record complete.
+		 */
+		entry = create_dir_entry(dir->ref_store, subdirname, len, 0);
+		add_entry_to_dir(dir, entry);
+	} else {
+		entry = dir->entries[entry_index];
+	}
+	return get_ref_dir(entry);
+}
+
+struct ref_dir *find_containing_dir(struct ref_dir *dir,
+				    const char *refname, int mkdir)
+{
+	const char *slash;
+	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
+		size_t dirnamelen = slash - refname + 1;
+		struct ref_dir *subdir;
+		subdir = search_for_subdir(dir, refname, dirnamelen, mkdir);
+		if (!subdir) {
+			dir = NULL;
+			break;
+		}
+		dir = subdir;
+	}
+
+	return dir;
+}
+
+struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname)
+{
+	int entry_index;
+	struct ref_entry *entry;
+	dir = find_containing_dir(dir, refname, 0);
+	if (!dir)
+		return NULL;
+	entry_index = search_ref_dir(dir, refname, strlen(refname));
+	if (entry_index == -1)
+		return NULL;
+	entry = dir->entries[entry_index];
+	return (entry->flag & REF_DIR) ? NULL : entry;
+}
+
+int remove_entry_from_dir(struct ref_dir *dir, const char *refname)
+{
+	int refname_len = strlen(refname);
+	int entry_index;
+	struct ref_entry *entry;
+	int is_dir = refname[refname_len - 1] == '/';
+	if (is_dir) {
+		/*
+		 * refname represents a reference directory.  Remove
+		 * the trailing slash; otherwise we will get the
+		 * directory *representing* refname rather than the
+		 * one *containing* it.
+		 */
+		char *dirname = xmemdupz(refname, refname_len - 1);
+		dir = find_containing_dir(dir, dirname, 0);
+		free(dirname);
+	} else {
+		dir = find_containing_dir(dir, refname, 0);
+	}
+	if (!dir)
+		return -1;
+	entry_index = search_ref_dir(dir, refname, refname_len);
+	if (entry_index == -1)
+		return -1;
+	entry = dir->entries[entry_index];
+
+	memmove(&dir->entries[entry_index],
+		&dir->entries[entry_index + 1],
+		(dir->nr - entry_index - 1) * sizeof(*dir->entries)
+		);
+	dir->nr--;
+	if (dir->sorted > entry_index)
+		dir->sorted--;
+	free_ref_entry(entry);
+	return dir->nr;
+}
+
+int add_ref_entry(struct ref_dir *dir, struct ref_entry *ref)
+{
+	dir = find_containing_dir(dir, ref->name, 1);
+	if (!dir)
+		return -1;
+	add_entry_to_dir(dir, ref);
+	return 0;
+}
+
+/*
+ * Emit a warning and return true iff ref1 and ref2 have the same name
+ * and the same sha1.  Die if they have the same name but different
+ * sha1s.
+ */
+static int is_dup_ref(const struct ref_entry *ref1, const struct ref_entry *ref2)
+{
+	if (strcmp(ref1->name, ref2->name))
+		return 0;
+
+	/* Duplicate name; make sure that they don't conflict: */
+
+	if ((ref1->flag & REF_DIR) || (ref2->flag & REF_DIR))
+		/* This is impossible by construction */
+		die("Reference directory conflict: %s", ref1->name);
+
+	if (oidcmp(&ref1->u.value.oid, &ref2->u.value.oid))
+		die("Duplicated ref, and SHA1s don't match: %s", ref1->name);
+
+	warning("Duplicated ref: %s", ref1->name);
+	return 1;
+}
+
+/*
+ * Sort the entries in dir non-recursively (if they are not already
+ * sorted) and remove any duplicate entries.
+ */
+static void sort_ref_dir(struct ref_dir *dir)
+{
+	int i, j;
+	struct ref_entry *last = NULL;
+
+	/*
+	 * This check also prevents passing a zero-length array to qsort(),
+	 * which is a problem on some platforms.
+	 */
+	if (dir->sorted == dir->nr)
+		return;
+
+	QSORT(dir->entries, dir->nr, ref_entry_cmp);
+
+	/* Remove any duplicates: */
+	for (i = 0, j = 0; j < dir->nr; j++) {
+		struct ref_entry *entry = dir->entries[j];
+		if (last && is_dup_ref(last, entry))
+			free_ref_entry(entry);
+		else
+			last = dir->entries[i++] = entry;
+	}
+	dir->sorted = dir->nr = i;
+}
+
+int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
+			     each_ref_entry_fn fn, void *cb_data)
+{
+	int i;
+	assert(dir->sorted == dir->nr);
+	for (i = offset; i < dir->nr; i++) {
+		struct ref_entry *entry = dir->entries[i];
+		int retval;
+		if (entry->flag & REF_DIR) {
+			struct ref_dir *subdir = get_ref_dir(entry);
+			sort_ref_dir(subdir);
+			retval = do_for_each_entry_in_dir(subdir, 0, fn, cb_data);
+		} else {
+			retval = fn(entry, cb_data);
+		}
+		if (retval)
+			return retval;
+	}
+	return 0;
+}
+
+void prime_ref_dir(struct ref_dir *dir)
+{
+	/*
+	 * The hard work of loading loose refs is done by get_ref_dir(), so we
+	 * just need to recurse through all of the sub-directories. We do not
+	 * even need to care about sorting, as traversal order does not matter
+	 * to us.
+	 */
+	int i;
+	for (i = 0; i < dir->nr; i++) {
+		struct ref_entry *entry = dir->entries[i];
+		if (entry->flag & REF_DIR)
+			prime_ref_dir(get_ref_dir(entry));
+	}
+}
+
+/*
+ * A level in the reference hierarchy that is currently being iterated
+ * through.
+ */
+struct cache_ref_iterator_level {
+	/*
+	 * The ref_dir being iterated over at this level. The ref_dir
+	 * is sorted before being stored here.
+	 */
+	struct ref_dir *dir;
+
+	/*
+	 * The index of the current entry within dir (which might
+	 * itself be a directory). If index == -1, then the iteration
+	 * hasn't yet begun. If index == dir->nr, then the iteration
+	 * through this level is over.
+	 */
+	int index;
+};
+
+/*
+ * Represent an iteration through a ref_dir in the memory cache. The
+ * iteration recurses through subdirectories.
+ */
+struct cache_ref_iterator {
+	struct ref_iterator base;
+
+	/*
+	 * The number of levels currently on the stack. This is always
+	 * at least 1, because when it becomes zero the iteration is
+	 * ended and this struct is freed.
+	 */
+	size_t levels_nr;
+
+	/* The number of levels that have been allocated on the stack */
+	size_t levels_alloc;
+
+	/*
+	 * A stack of levels. levels[0] is the uppermost level that is
+	 * being iterated over in this iteration. (This is not
+	 * necessary the top level in the references hierarchy. If we
+	 * are iterating through a subtree, then levels[0] will hold
+	 * the ref_dir for that subtree, and subsequent levels will go
+	 * on from there.)
+	 */
+	struct cache_ref_iterator_level *levels;
+};
+
+static int cache_ref_iterator_advance(struct ref_iterator *ref_iterator)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	while (1) {
+		struct cache_ref_iterator_level *level =
+			&iter->levels[iter->levels_nr - 1];
+		struct ref_dir *dir = level->dir;
+		struct ref_entry *entry;
+
+		if (level->index == -1)
+			sort_ref_dir(dir);
+
+		if (++level->index == level->dir->nr) {
+			/* This level is exhausted; pop up a level */
+			if (--iter->levels_nr == 0)
+				return ref_iterator_abort(ref_iterator);
+
+			continue;
+		}
+
+		entry = dir->entries[level->index];
+
+		if (entry->flag & REF_DIR) {
+			/* push down a level */
+			ALLOC_GROW(iter->levels, iter->levels_nr + 1,
+				   iter->levels_alloc);
+
+			level = &iter->levels[iter->levels_nr++];
+			level->dir = get_ref_dir(entry);
+			level->index = -1;
+		} else {
+			iter->base.refname = entry->name;
+			iter->base.oid = &entry->u.value.oid;
+			iter->base.flags = entry->flag;
+			return ITER_OK;
+		}
+	}
+}
+
+enum peel_status peel_entry(struct ref_entry *entry, int repeel)
+{
+	enum peel_status status;
+
+	if (entry->flag & REF_KNOWS_PEELED) {
+		if (repeel) {
+			entry->flag &= ~REF_KNOWS_PEELED;
+			oidclr(&entry->u.value.peeled);
+		} else {
+			return is_null_oid(&entry->u.value.peeled) ?
+				PEEL_NON_TAG : PEEL_PEELED;
+		}
+	}
+	if (entry->flag & REF_ISBROKEN)
+		return PEEL_BROKEN;
+	if (entry->flag & REF_ISSYMREF)
+		return PEEL_IS_SYMREF;
+
+	status = peel_object(entry->u.value.oid.hash, entry->u.value.peeled.hash);
+	if (status == PEEL_PEELED || status == PEEL_NON_TAG)
+		entry->flag |= REF_KNOWS_PEELED;
+	return status;
+}
+
+static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
+				   struct object_id *peeled)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+	struct cache_ref_iterator_level *level;
+	struct ref_entry *entry;
+
+	level = &iter->levels[iter->levels_nr - 1];
+
+	if (level->index == -1)
+		die("BUG: peel called before advance for cache iterator");
+
+	entry = level->dir->entries[level->index];
+
+	if (peel_entry(entry, 0))
+		return -1;
+	oidcpy(peeled, &entry->u.value.peeled);
+	return 0;
+}
+
+static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
+{
+	struct cache_ref_iterator *iter =
+		(struct cache_ref_iterator *)ref_iterator;
+
+	free(iter->levels);
+	base_ref_iterator_free(ref_iterator);
+	return ITER_DONE;
+}
+
+static struct ref_iterator_vtable cache_ref_iterator_vtable = {
+	cache_ref_iterator_advance,
+	cache_ref_iterator_peel,
+	cache_ref_iterator_abort
+};
+
+struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir)
+{
+	struct cache_ref_iterator *iter;
+	struct ref_iterator *ref_iterator;
+	struct cache_ref_iterator_level *level;
+
+	iter = xcalloc(1, sizeof(*iter));
+	ref_iterator = &iter->base;
+	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
+	ALLOC_GROW(iter->levels, 10, iter->levels_alloc);
+
+	iter->levels_nr = 1;
+	level = &iter->levels[0];
+	level->index = -1;
+	level->dir = dir;
+
+	return ref_iterator;
+}
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
new file mode 100644
index 0000000000..2e7b1a366e
--- /dev/null
+++ b/refs/ref-cache.h
@@ -0,0 +1,251 @@
+#ifndef REFS_REF_CACHE_H
+#define REFS_REF_CACHE_H
+
+/*
+ * Information used (along with the information in ref_entry) to
+ * describe a single cached reference.  This data structure only
+ * occurs embedded in a union in struct ref_entry, and only when
+ * (ref_entry->flag & REF_DIR) is zero.
+ */
+struct ref_value {
+	/*
+	 * The name of the object to which this reference resolves
+	 * (which may be a tag object).  If REF_ISBROKEN, this is
+	 * null.  If REF_ISSYMREF, then this is the name of the object
+	 * referred to by the last reference in the symlink chain.
+	 */
+	struct object_id oid;
+
+	/*
+	 * If REF_KNOWS_PEELED, then this field holds the peeled value
+	 * of this reference, or null if the reference is known not to
+	 * be peelable.  See the documentation for peel_ref() for an
+	 * exact definition of "peelable".
+	 */
+	struct object_id peeled;
+};
+
+/*
+ * Information used (along with the information in ref_entry) to
+ * describe a level in the hierarchy of references.  This data
+ * structure only occurs embedded in a union in struct ref_entry, and
+ * only when (ref_entry.flag & REF_DIR) is set.  In that case,
+ * (ref_entry.flag & REF_INCOMPLETE) determines whether the references
+ * in the directory have already been read:
+ *
+ *     (ref_entry.flag & REF_INCOMPLETE) unset -- a directory of loose
+ *         or packed references, already read.
+ *
+ *     (ref_entry.flag & REF_INCOMPLETE) set -- a directory of loose
+ *         references that hasn't been read yet (nor has any of its
+ *         subdirectories).
+ *
+ * Entries within a directory are stored within a growable array of
+ * pointers to ref_entries (entries, nr, alloc).  Entries 0 <= i <
+ * sorted are sorted by their component name in strcmp() order and the
+ * remaining entries are unsorted.
+ *
+ * Loose references are read lazily, one directory at a time.  When a
+ * directory of loose references is read, then all of the references
+ * in that directory are stored, and REF_INCOMPLETE stubs are created
+ * for any subdirectories, but the subdirectories themselves are not
+ * read.  The reading is triggered by get_ref_dir().
+ */
+struct ref_dir {
+	int nr, alloc;
+
+	/*
+	 * Entries with index 0 <= i < sorted are sorted by name.  New
+	 * entries are appended to the list unsorted, and are sorted
+	 * only when required; thus we avoid the need to sort the list
+	 * after the addition of every reference.
+	 */
+	int sorted;
+
+	/* A pointer to the files_ref_store that contains this ref_dir. */
+	struct files_ref_store *ref_store;
+
+	struct ref_entry **entries;
+};
+
+/*
+ * Bit values for ref_entry::flag.  REF_ISSYMREF=0x01,
+ * REF_ISPACKED=0x02, REF_ISBROKEN=0x04 and REF_BAD_NAME=0x08 are
+ * public values; see refs.h.
+ */
+
+/*
+ * The field ref_entry->u.value.peeled of this value entry contains
+ * the correct peeled value for the reference, which might be
+ * null_sha1 if the reference is not a tag or if it is broken.
+ */
+#define REF_KNOWS_PEELED 0x10
+
+/* ref_entry represents a directory of references */
+#define REF_DIR 0x20
+
+/*
+ * Entry has not yet been read from disk (used only for REF_DIR
+ * entries representing loose references)
+ */
+#define REF_INCOMPLETE 0x40
+
+/*
+ * A ref_entry represents either a reference or a "subdirectory" of
+ * references.
+ *
+ * Each directory in the reference namespace is represented by a
+ * ref_entry with (flags & REF_DIR) set and containing a subdir member
+ * that holds the entries in that directory that have been read so
+ * far.  If (flags & REF_INCOMPLETE) is set, then the directory and
+ * its subdirectories haven't been read yet.  REF_INCOMPLETE is only
+ * used for loose reference directories.
+ *
+ * References are represented by a ref_entry with (flags & REF_DIR)
+ * unset and a value member that describes the reference's value.  The
+ * flag member is at the ref_entry level, but it is also needed to
+ * interpret the contents of the value field (in other words, a
+ * ref_value object is not very much use without the enclosing
+ * ref_entry).
+ *
+ * Reference names cannot end with slash and directories' names are
+ * always stored with a trailing slash (except for the top-level
+ * directory, which is always denoted by "").  This has two nice
+ * consequences: (1) when the entries in each subdir are sorted
+ * lexicographically by name (as they usually are), the references in
+ * a whole tree can be generated in lexicographic order by traversing
+ * the tree in left-to-right, depth-first order; (2) the names of
+ * references and subdirectories cannot conflict, and therefore the
+ * presence of an empty subdirectory does not block the creation of a
+ * similarly-named reference.  (The fact that reference names with the
+ * same leading components can conflict *with each other* is a
+ * separate issue that is regulated by refs_verify_refname_available().)
+ *
+ * Please note that the name field contains the fully-qualified
+ * reference (or subdirectory) name.  Space could be saved by only
+ * storing the relative names.  But that would require the full names
+ * to be generated on the fly when iterating in do_for_each_ref(), and
+ * would break callback functions, who have always been able to assume
+ * that the name strings that they are passed will not be freed during
+ * the iteration.
+ */
+struct ref_entry {
+	unsigned char flag; /* ISSYMREF? ISPACKED? */
+	union {
+		struct ref_value value; /* if not (flags&REF_DIR) */
+		struct ref_dir subdir; /* if (flags&REF_DIR) */
+	} u;
+	/*
+	 * The full name of the reference (e.g., "refs/heads/master")
+	 * or the full name of the directory with a trailing slash
+	 * (e.g., "refs/heads/"):
+	 */
+	char name[FLEX_ARRAY];
+};
+
+/*
+ * Return the index of the entry with the given refname from the
+ * ref_dir (non-recursively), sorting dir if necessary.  Return -1 if
+ * no such entry is found.  dir must already be complete.
+ */
+int search_ref_dir(struct ref_dir *dir, const char *refname, size_t len);
+
+struct ref_dir *get_ref_dir(struct ref_entry *entry);
+
+/*
+ * Create a struct ref_entry object for the specified dirname.
+ * dirname is the name of the directory with a trailing slash (e.g.,
+ * "refs/heads/") or "" for the top-level directory.
+ */
+struct ref_entry *create_dir_entry(struct files_ref_store *ref_store,
+				   const char *dirname, size_t len,
+				   int incomplete);
+
+struct ref_entry *create_ref_entry(const char *refname,
+				   const unsigned char *sha1, int flag,
+				   int check_name);
+
+void free_ref_entry(struct ref_entry *entry);
+
+/*
+ * Add a ref_entry to the end of dir (unsorted).  Entry is always
+ * stored directly in dir; no recursion into subdirectories is
+ * done.
+ */
+void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry);
+
+/*
+ * Remove the entry with the given name from dir, recursing into
+ * subdirectories as necessary.  If refname is the name of a directory
+ * (i.e., ends with '/'), then remove the directory and its contents.
+ * If the removal was successful, return the number of entries
+ * remaining in the directory entry that contained the deleted entry.
+ * If the name was not found, return -1.  Please note that this
+ * function only deletes the entry from the cache; it does not delete
+ * it from the filesystem or ensure that other cache entries (which
+ * might be symbolic references to the removed entry) are updated.
+ * Nor does it remove any containing dir entries that might be made
+ * empty by the removal.  dir must represent the top-level directory
+ * and must already be complete.
+ */
+int remove_entry_from_dir(struct ref_dir *dir, const char *refname);
+
+/*
+ * Add a ref_entry to the ref_dir (unsorted), recursing into
+ * subdirectories as necessary.  dir must represent the top-level
+ * directory.  Return 0 on success.
+ */
+int add_ref_entry(struct ref_dir *dir, struct ref_entry *ref);
+
+/*
+ * If refname is a reference name, find the ref_dir within the dir
+ * tree that should hold refname. If refname is a directory name
+ * (i.e., it ends in '/'), then return that ref_dir itself. dir must
+ * represent the top-level directory and must already be complete.
+ * Sort ref_dirs and recurse into subdirectories as necessary. If
+ * mkdir is set, then create any missing directories; otherwise,
+ * return NULL if the desired directory cannot be found.
+ */
+struct ref_dir *find_containing_dir(struct ref_dir *dir,
+				    const char *refname, int mkdir);
+
+/*
+ * Find the value entry with the given name in dir, sorting ref_dirs
+ * and recursing into subdirectories as necessary.  If the name is not
+ * found or it corresponds to a directory entry, return NULL.
+ */
+struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
+
+struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir);
+
+typedef int each_ref_entry_fn(struct ref_entry *entry, void *cb_data);
+
+/*
+ * Call fn for each reference in dir that has index in the range
+ * offset <= index < dir->nr.  Recurse into subdirectories that are in
+ * that index range, sorting them before iterating.  This function
+ * does not sort dir itself; it should be sorted beforehand.  fn is
+ * called for all references, including broken ones.
+ */
+int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
+			     each_ref_entry_fn fn, void *cb_data);
+
+/*
+ * Peel the entry (if possible) and return its new peel_status.  If
+ * repeel is true, re-peel the entry even if there is an old peeled
+ * value that is already stored in it.
+ *
+ * It is OK to call this function with a packed reference entry that
+ * might be stale and might even refer to an object that has since
+ * been garbage-collected.  In such a case, if the entry has
+ * REF_KNOWS_PEELED then leave the status unchanged and return
+ * PEEL_PEELED or PEEL_NON_TAG; otherwise, return PEEL_INVALID.
+ */
+enum peel_status peel_entry(struct ref_entry *entry, int repeel);
+
+/*
+ * Load all of the refs from `dir` into our in-memory cache.
+ */
+void prime_ref_dir(struct ref_dir *dir);
+
+#endif /* REFS_REF_CACHE_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 10/20] ref-cache: introduce a new type, ref_cache
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (8 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 09/20] refs: split `ref_cache` code into separate files Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir Michael Haggerty
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

For now, it just wraps a `ref_entry *` that points at the root of the
tree. Soon it will hold more information.

Add two new functions, `create_ref_cache()` and `free_ref_cache()`.
Make `free_ref_entry()` private.

Change files-backend to use this type to hold its caches.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 28 +++++++++++++++++-----------
 refs/ref-cache.c     | 16 +++++++++++++++-
 refs/ref-cache.h     | 15 ++++++++++++++-
 3 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f07e93d522..a7d912ae39 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -44,7 +44,7 @@ static int entry_resolves_to_object(struct ref_entry *entry)
 }
 
 struct packed_ref_cache {
-	struct ref_entry *root;
+	struct ref_cache *cache;
 
 	/*
 	 * Count of references to the data structure in this instance,
@@ -79,7 +79,7 @@ struct files_ref_store {
 	char *gitcommondir;
 	char *packed_refs_path;
 
-	struct ref_entry *loose;
+	struct ref_cache *loose;
 	struct packed_ref_cache *packed;
 };
 
@@ -101,7 +101,7 @@ static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
 static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
 {
 	if (!--packed_refs->referrers) {
-		free_ref_entry(packed_refs->root);
+		free_ref_cache(packed_refs->cache);
 		stat_validity_clear(&packed_refs->validity);
 		free(packed_refs);
 		return 1;
@@ -125,7 +125,7 @@ static void clear_packed_ref_cache(struct files_ref_store *refs)
 static void clear_loose_ref_cache(struct files_ref_store *refs)
 {
 	if (refs->loose) {
-		free_ref_entry(refs->loose);
+		free_ref_cache(refs->loose);
 		refs->loose = NULL;
 	}
 }
@@ -387,11 +387,12 @@ static struct packed_ref_cache *get_packed_ref_cache(struct files_ref_store *ref
 
 		refs->packed = xcalloc(1, sizeof(*refs->packed));
 		acquire_packed_ref_cache(refs->packed);
-		refs->packed->root = create_dir_entry(refs, "", 0, 0);
+		refs->packed->cache = create_ref_cache(refs);
+		refs->packed->cache->root->flag &= ~REF_INCOMPLETE;
 		f = fopen(packed_refs_file, "r");
 		if (f) {
 			stat_validity_update(&refs->packed->validity, fileno(f));
-			read_packed_refs(f, get_ref_dir(refs->packed->root));
+			read_packed_refs(f, get_ref_dir(refs->packed->cache->root));
 			fclose(f);
 		}
 	}
@@ -400,7 +401,7 @@ static struct packed_ref_cache *get_packed_ref_cache(struct files_ref_store *ref
 
 static struct ref_dir *get_packed_ref_dir(struct packed_ref_cache *packed_ref_cache)
 {
-	return get_ref_dir(packed_ref_cache->root);
+	return get_ref_dir(packed_ref_cache->cache->root);
 }
 
 static struct ref_dir *get_packed_refs(struct files_ref_store *refs)
@@ -515,14 +516,19 @@ static struct ref_dir *get_loose_refs(struct files_ref_store *refs)
 		 * are about to read the only subdirectory that can
 		 * hold references:
 		 */
-		refs->loose = create_dir_entry(refs, "", 0, 0);
+		refs->loose = create_ref_cache(refs);
+
+		/* We're going to fill the top level ourselves: */
+		refs->loose->root->flag &= ~REF_INCOMPLETE;
+
 		/*
-		 * Create an incomplete entry for "refs/":
+		 * Add an incomplete entry for "refs/" (to be filled
+		 * lazily):
 		 */
-		add_entry_to_dir(get_ref_dir(refs->loose),
+		add_entry_to_dir(get_ref_dir(refs->loose->root),
 				 create_dir_entry(refs, "refs/", 5, 1));
 	}
-	return get_ref_dir(refs->loose);
+	return get_ref_dir(refs->loose->root);
 }
 
 /*
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 4274a43a9b..bf911028c8 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -63,9 +63,17 @@ struct ref_entry *create_ref_entry(const char *refname,
 	return ref;
 }
 
+struct ref_cache *create_ref_cache(struct files_ref_store *refs)
+{
+	struct ref_cache *ret = xcalloc(1, sizeof(*ret));
+
+	ret->root = create_dir_entry(refs, "", 0, 1);
+	return ret;
+}
+
 static void clear_ref_dir(struct ref_dir *dir);
 
-void free_ref_entry(struct ref_entry *entry)
+static void free_ref_entry(struct ref_entry *entry)
 {
 	if (entry->flag & REF_DIR) {
 		/*
@@ -77,6 +85,12 @@ void free_ref_entry(struct ref_entry *entry)
 	free(entry);
 }
 
+void free_ref_cache(struct ref_cache *cache)
+{
+	free_ref_entry(cache->root);
+	free(cache);
+}
+
 /*
  * Clear and free all entries in dir, recursively.
  */
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 2e7b1a366e..da5388c136 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -1,6 +1,10 @@
 #ifndef REFS_REF_CACHE_H
 #define REFS_REF_CACHE_H
 
+struct ref_cache {
+	struct ref_entry *root;
+};
+
 /*
  * Information used (along with the information in ref_entry) to
  * describe a single cached reference.  This data structure only
@@ -165,7 +169,16 @@ struct ref_entry *create_ref_entry(const char *refname,
 				   const unsigned char *sha1, int flag,
 				   int check_name);
 
-void free_ref_entry(struct ref_entry *entry);
+/*
+ * Return a pointer to a new `ref_cache`. Its top-level starts out
+ * marked incomplete.
+ */
+struct ref_cache *create_ref_cache(struct files_ref_store *refs);
+
+/*
+ * Free the `ref_cache` and all of its associated data.
+ */
+void free_ref_cache(struct ref_cache *cache);
 
 /*
  * Add a ref_entry to the end of dir (unsorted).  Entry is always
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (9 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 10/20] ref-cache: introduce a new type, ref_cache Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 17:51   ` Jeff King
  2017-03-20 16:33 ` [PATCH 12/20] ref-cache: use a callback function to fill the cache Michael Haggerty
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Instead of keeping a pointer to the ref_store in every ref_dir entry,
store it once in `struct ref_cache`, and change `struct ref_dir` to
include a pointer to its containing `ref_cache` instead. This makes it
easier to add to the information that is stored in `struct ref_cache`
without inflating the size of the individual entries.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c |  6 +++---
 refs/ref-cache.c     | 12 +++++++-----
 refs/ref-cache.h     |  9 ++++++---
 3 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index a7d912ae39..78572e55a0 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -433,7 +433,7 @@ static void add_packed_ref(struct files_ref_store *refs,
  */
 void read_loose_refs(const char *dirname, struct ref_dir *dir)
 {
-	struct files_ref_store *refs = dir->ref_store;
+	struct files_ref_store *refs = dir->cache->ref_store;
 	DIR *d;
 	struct dirent *de;
 	int dirnamelen = strlen(dirname);
@@ -469,7 +469,7 @@ void read_loose_refs(const char *dirname, struct ref_dir *dir)
 		} else if (S_ISDIR(st.st_mode)) {
 			strbuf_addch(&refname, '/');
 			add_entry_to_dir(dir,
-					 create_dir_entry(refs, refname.buf,
+					 create_dir_entry(dir->cache, refname.buf,
 							  refname.len, 1));
 		} else {
 			if (!refs_resolve_ref_unsafe(&refs->base,
@@ -526,7 +526,7 @@ static struct ref_dir *get_loose_refs(struct files_ref_store *refs)
 		 * lazily):
 		 */
 		add_entry_to_dir(get_ref_dir(refs->loose->root),
-				 create_dir_entry(refs, "refs/", 5, 1));
+				 create_dir_entry(refs->loose, "refs/", 5, 1));
 	}
 	return get_ref_dir(refs->loose->root);
 }
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index bf911028c8..96da094788 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -36,7 +36,7 @@ struct ref_dir *get_ref_dir(struct ref_entry *entry)
 			int pos = search_ref_dir(dir, "refs/bisect/", 12);
 			if (pos < 0) {
 				struct ref_entry *child_entry;
-				child_entry = create_dir_entry(dir->ref_store,
+				child_entry = create_dir_entry(dir->cache,
 							       "refs/bisect/",
 							       12, 1);
 				add_entry_to_dir(dir, child_entry);
@@ -67,7 +67,8 @@ struct ref_cache *create_ref_cache(struct files_ref_store *refs)
 {
 	struct ref_cache *ret = xcalloc(1, sizeof(*ret));
 
-	ret->root = create_dir_entry(refs, "", 0, 1);
+	ret->ref_store = refs;
+	ret->root = create_dir_entry(ret, "", 0, 1);
 	return ret;
 }
 
@@ -104,13 +105,14 @@ static void clear_ref_dir(struct ref_dir *dir)
 	dir->entries = NULL;
 }
 
-struct ref_entry *create_dir_entry(struct files_ref_store *ref_store,
+struct ref_entry *create_dir_entry(struct ref_cache *cache,
 				   const char *dirname, size_t len,
 				   int incomplete)
 {
 	struct ref_entry *direntry;
+
 	FLEX_ALLOC_MEM(direntry, name, dirname, len);
-	direntry->u.subdir.ref_store = ref_store;
+	direntry->u.subdir.cache = cache;
 	direntry->flag = REF_DIR | (incomplete ? REF_INCOMPLETE : 0);
 	return direntry;
 }
@@ -181,7 +183,7 @@ static struct ref_dir *search_for_subdir(struct ref_dir *dir,
 		 * therefore, create an empty record for it but mark
 		 * the record complete.
 		 */
-		entry = create_dir_entry(dir->ref_store, subdirname, len, 0);
+		entry = create_dir_entry(dir->cache, subdirname, len, 0);
 		add_entry_to_dir(dir, entry);
 	} else {
 		entry = dir->entries[entry_index];
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index da5388c136..83051854ff 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -3,6 +3,9 @@
 
 struct ref_cache {
 	struct ref_entry *root;
+
+	/* A pointer to the files_ref_store whose cache this is: */
+	struct files_ref_store *ref_store;
 };
 
 /*
@@ -66,8 +69,8 @@ struct ref_dir {
 	 */
 	int sorted;
 
-	/* A pointer to the files_ref_store that contains this ref_dir. */
-	struct files_ref_store *ref_store;
+	/* The ref_cache containing this entry: */
+	struct ref_cache *cache;
 
 	struct ref_entry **entries;
 };
@@ -161,7 +164,7 @@ struct ref_dir *get_ref_dir(struct ref_entry *entry);
  * dirname is the name of the directory with a trailing slash (e.g.,
  * "refs/heads/") or "" for the top-level directory.
  */
-struct ref_entry *create_dir_entry(struct files_ref_store *ref_store,
+struct ref_entry *create_dir_entry(struct ref_cache *cache,
 				   const char *dirname, size_t len,
 				   int incomplete);
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 12/20] ref-cache: use a callback function to fill the cache
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (10 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 13/20] refs: handle "refs/bisect/" in `loose_fill_ref_dir()` Michael Haggerty
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

It is a leveling violation for `ref_cache` to know about
`files_ref_store` or that it should call `read_loose_refs()` to lazily
fill cache directories. So instead, have its constructor take as an
argument a callback function that it should use for lazy-filling, and
change `files_ref_store` to supply a pointer to function
`read_loose_refs` (renamed to `loose_fill_ref_dir`) when creating the
ref cache for its loose refs.

This means that we can generify the type of the back-pointer in
`struct ref_cache` from `files_ref_store` to `ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 10 ++++++----
 refs/ref-cache.c     | 12 +++++++-----
 refs/ref-cache.h     | 29 +++++++++++++++++++++++++----
 3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 78572e55a0..e4d78393ac 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -387,7 +387,7 @@ static struct packed_ref_cache *get_packed_ref_cache(struct files_ref_store *ref
 
 		refs->packed = xcalloc(1, sizeof(*refs->packed));
 		acquire_packed_ref_cache(refs->packed);
-		refs->packed->cache = create_ref_cache(refs);
+		refs->packed->cache = create_ref_cache(&refs->base, NULL);
 		refs->packed->cache->root->flag &= ~REF_INCOMPLETE;
 		f = fopen(packed_refs_file, "r");
 		if (f) {
@@ -431,9 +431,11 @@ static void add_packed_ref(struct files_ref_store *refs,
  * (without recursing).  dirname must end with '/'.  dir must be the
  * directory entry corresponding to dirname.
  */
-void read_loose_refs(const char *dirname, struct ref_dir *dir)
+static void loose_fill_ref_dir(struct ref_store *ref_store,
+			       struct ref_dir *dir, const char *dirname)
 {
-	struct files_ref_store *refs = dir->cache->ref_store;
+	struct files_ref_store *refs =
+		files_downcast(ref_store, REF_STORE_READ, "fill_ref_dir");
 	DIR *d;
 	struct dirent *de;
 	int dirnamelen = strlen(dirname);
@@ -516,7 +518,7 @@ static struct ref_dir *get_loose_refs(struct files_ref_store *refs)
 		 * are about to read the only subdirectory that can
 		 * hold references:
 		 */
-		refs->loose = create_ref_cache(refs);
+		refs->loose = create_ref_cache(&refs->base, loose_fill_ref_dir);
 
 		/* We're going to fill the top level ourselves: */
 		refs->loose->root->flag &= ~REF_INCOMPLETE;
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 96da094788..7f247b9170 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -4,9 +4,6 @@
 #include "ref-cache.h"
 #include "../iterator.h"
 
-/* FIXME: This declaration shouldn't be here */
-void read_loose_refs(const char *dirname, struct ref_dir *dir);
-
 void add_entry_to_dir(struct ref_dir *dir, struct ref_entry *entry)
 {
 	ALLOC_GROW(dir->entries, dir->nr + 1, dir->alloc);
@@ -25,7 +22,10 @@ struct ref_dir *get_ref_dir(struct ref_entry *entry)
 	assert(entry->flag & REF_DIR);
 	dir = &entry->u.subdir;
 	if (entry->flag & REF_INCOMPLETE) {
-		read_loose_refs(entry->name, dir);
+		if (!dir->cache->fill_ref_dir)
+			die("BUG: incomplete ref_store without fill_ref_dir function");
+
+		dir->cache->fill_ref_dir(dir->cache->ref_store, dir, entry->name);
 
 		/*
 		 * Manually add refs/bisect, which, being
@@ -63,11 +63,13 @@ struct ref_entry *create_ref_entry(const char *refname,
 	return ref;
 }
 
-struct ref_cache *create_ref_cache(struct files_ref_store *refs)
+struct ref_cache *create_ref_cache(struct ref_store *refs,
+				   fill_ref_dir_fn *fill_ref_dir)
 {
 	struct ref_cache *ret = xcalloc(1, sizeof(*ret));
 
 	ret->ref_store = refs;
+	ret->fill_ref_dir = fill_ref_dir;
 	ret->root = create_dir_entry(ret, "", 0, 1);
 	return ret;
 }
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 83051854ff..ed51e80d88 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -1,11 +1,27 @@
 #ifndef REFS_REF_CACHE_H
 #define REFS_REF_CACHE_H
 
+struct ref_dir;
+
+/*
+ * If this ref_cache is filled lazily, this function is used to load
+ * information into the specified ref_dir (shallow or deep, at the
+ * option of the ref_store). dirname includes a trailing slash.
+ */
+typedef void fill_ref_dir_fn(struct ref_store *ref_store,
+			     struct ref_dir *dir, const char *dirname);
+
 struct ref_cache {
 	struct ref_entry *root;
 
-	/* A pointer to the files_ref_store whose cache this is: */
-	struct files_ref_store *ref_store;
+	/* A pointer to the ref_store whose cache this is: */
+	struct ref_store *ref_store;
+
+	/*
+	 * Function used (if necessary) to lazily-fill cache. May be
+	 * NULL.
+	 */
+	fill_ref_dir_fn *fill_ref_dir;
 };
 
 /*
@@ -174,9 +190,14 @@ struct ref_entry *create_ref_entry(const char *refname,
 
 /*
  * Return a pointer to a new `ref_cache`. Its top-level starts out
- * marked incomplete.
+ * marked incomplete. If `fill_ref_dir` is non-NULL, it is the
+ * function called to fill in incomplete directories in the
+ * `ref_cache` when they are accessed. If it is NULL, then the whole
+ * `ref_cache` must be filled (including clearing its directories'
+ * `REF_INCOMPLETE` bits) before it is used.
  */
-struct ref_cache *create_ref_cache(struct files_ref_store *refs);
+struct ref_cache *create_ref_cache(struct ref_store *refs,
+				   fill_ref_dir_fn *fill_ref_dir);
 
 /*
  * Free the `ref_cache` and all of its associated data.
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 13/20] refs: handle "refs/bisect/" in `loose_fill_ref_dir()`
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (11 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 12/20] ref-cache: use a callback function to fill the cache Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 14/20] do_for_each_entry_in_dir(): eliminate `offset` argument Michael Haggerty
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

That "refs/bisect/" has to be handled specially when filling the
ref_cache for loose references is a peculiarity of the files backend,
and the ref-cache code shouldn't need to know about it. So move this
code to the callback function, `loose_fill_ref_dir()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 15 +++++++++++++++
 refs/ref-cache.c     | 16 ----------------
 2 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index e4d78393ac..7b5f5c1240 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -508,6 +508,21 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
 	strbuf_release(&refname);
 	strbuf_release(&path);
 	closedir(d);
+
+	/*
+	 * Manually add refs/bisect, which, being per-worktree, might
+	 * not appear in the directory listing for refs/ in the main
+	 * repo.
+	 */
+	if (!strcmp(dirname, "refs/")) {
+		int pos = search_ref_dir(dir, "refs/bisect/", 12);
+
+		if (pos < 0) {
+			struct ref_entry *child_entry = create_dir_entry(
+					dir->cache, "refs/bisect/", 12, 1);
+			add_entry_to_dir(dir, child_entry);
+		}
+	}
 }
 
 static struct ref_dir *get_loose_refs(struct files_ref_store *refs)
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 7f247b9170..44440e0c13 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -26,22 +26,6 @@ struct ref_dir *get_ref_dir(struct ref_entry *entry)
 			die("BUG: incomplete ref_store without fill_ref_dir function");
 
 		dir->cache->fill_ref_dir(dir->cache->ref_store, dir, entry->name);
-
-		/*
-		 * Manually add refs/bisect, which, being
-		 * per-worktree, might not appear in the directory
-		 * listing for refs/ in the main repo.
-		 */
-		if (!strcmp(entry->name, "refs/")) {
-			int pos = search_ref_dir(dir, "refs/bisect/", 12);
-			if (pos < 0) {
-				struct ref_entry *child_entry;
-				child_entry = create_dir_entry(dir->cache,
-							       "refs/bisect/",
-							       12, 1);
-				add_entry_to_dir(dir, child_entry);
-			}
-		}
 		entry->flag &= ~REF_INCOMPLETE;
 	}
 	return dir;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 14/20] do_for_each_entry_in_dir(): eliminate `offset` argument
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (12 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 13/20] refs: handle "refs/bisect/" in `loose_fill_ref_dir()` Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 15/20] get_loose_ref_dir(): function renamed from get_loose_refs() Michael Haggerty
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

It was never used.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c |  4 ++--
 refs/ref-cache.c     |  6 +++---
 refs/ref-cache.h     | 11 +++++------
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 7b5f5c1240..0ff5df6b46 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1390,7 +1390,7 @@ static int commit_packed_refs(struct files_ref_store *refs)
 
 	fprintf_or_die(out, "%s", PACKED_REFS_HEADER);
 	do_for_each_entry_in_dir(get_packed_ref_dir(packed_ref_cache),
-				 0, write_packed_entry_fn, out);
+				 write_packed_entry_fn, out);
 
 	if (commit_lock_file(packed_ref_cache->lock)) {
 		save_errno = errno;
@@ -1584,7 +1584,7 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 	lock_packed_refs(refs, LOCK_DIE_ON_ERROR);
 	cbdata.packed_refs = get_packed_refs(refs);
 
-	do_for_each_entry_in_dir(get_loose_refs(refs), 0,
+	do_for_each_entry_in_dir(get_loose_refs(refs),
 				 pack_if_possible_fn, &cbdata);
 
 	if (commit_packed_refs(refs))
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 44440e0c13..38d4c31985 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -307,18 +307,18 @@ static void sort_ref_dir(struct ref_dir *dir)
 	dir->sorted = dir->nr = i;
 }
 
-int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
+int do_for_each_entry_in_dir(struct ref_dir *dir,
 			     each_ref_entry_fn fn, void *cb_data)
 {
 	int i;
 	assert(dir->sorted == dir->nr);
-	for (i = offset; i < dir->nr; i++) {
+	for (i = 0; i < dir->nr; i++) {
 		struct ref_entry *entry = dir->entries[i];
 		int retval;
 		if (entry->flag & REF_DIR) {
 			struct ref_dir *subdir = get_ref_dir(entry);
 			sort_ref_dir(subdir);
-			retval = do_for_each_entry_in_dir(subdir, 0, fn, cb_data);
+			retval = do_for_each_entry_in_dir(subdir, fn, cb_data);
 		} else {
 			retval = fn(entry, cb_data);
 		}
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index ed51e80d88..6eecdf4276 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -258,13 +258,12 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir);
 typedef int each_ref_entry_fn(struct ref_entry *entry, void *cb_data);
 
 /*
- * Call fn for each reference in dir that has index in the range
- * offset <= index < dir->nr.  Recurse into subdirectories that are in
- * that index range, sorting them before iterating.  This function
- * does not sort dir itself; it should be sorted beforehand.  fn is
- * called for all references, including broken ones.
+ * Call `fn` for each reference in `dir`. Recurse into subdirectories,
+ * sorting them before iterating. This function does not sort `dir`
+ * itself; it should be sorted beforehand. `fn` is called for all
+ * references, including broken ones.
  */
-int do_for_each_entry_in_dir(struct ref_dir *dir, int offset,
+int do_for_each_entry_in_dir(struct ref_dir *dir,
 			     each_ref_entry_fn fn, void *cb_data);
 
 /*
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 15/20] get_loose_ref_dir(): function renamed from get_loose_refs()
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (13 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 14/20] do_for_each_entry_in_dir(): eliminate `offset` argument Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 16/20] get_loose_ref_cache(): new function Michael Haggerty
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

The new name is more analogous to `get_packed_ref_dir()`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0ff5df6b46..0a16f6196c 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -525,7 +525,7 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
 	}
 }
 
-static struct ref_dir *get_loose_refs(struct files_ref_store *refs)
+static struct ref_dir *get_loose_ref_dir(struct files_ref_store *refs)
 {
 	if (!refs->loose) {
 		/*
@@ -1113,7 +1113,7 @@ static struct ref_iterator *files_ref_iterator_begin(
 	 * date with what is on disk, and re-reads it if not.
 	 */
 
-	loose_dir = get_loose_refs(refs);
+	loose_dir = get_loose_ref_dir(refs);
 
 	if (prefix && *prefix)
 		loose_dir = find_containing_dir(loose_dir, prefix, 0);
@@ -1584,7 +1584,7 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 	lock_packed_refs(refs, LOCK_DIE_ON_ERROR);
 	cbdata.packed_refs = get_packed_refs(refs);
 
-	do_for_each_entry_in_dir(get_loose_refs(refs),
+	do_for_each_entry_in_dir(get_loose_ref_dir(refs),
 				 pack_if_possible_fn, &cbdata);
 
 	if (commit_packed_refs(refs))
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 16/20] get_loose_ref_cache(): new function
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (14 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 15/20] get_loose_ref_dir(): function renamed from get_loose_refs() Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 17/20] cache_ref_iterator_begin(): make function smarter Michael Haggerty
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Extract a new function, `get_loose_ref_cache()`, from
get_loose_ref_dir(). The function returns the `ref_cache` for the
loose refs of a `files_ref_store`.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0a16f6196c..fefc29433a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -525,7 +525,7 @@ static void loose_fill_ref_dir(struct ref_store *ref_store,
 	}
 }
 
-static struct ref_dir *get_loose_ref_dir(struct files_ref_store *refs)
+static struct ref_cache *get_loose_ref_cache(struct files_ref_store *refs)
 {
 	if (!refs->loose) {
 		/*
@@ -545,7 +545,12 @@ static struct ref_dir *get_loose_ref_dir(struct files_ref_store *refs)
 		add_entry_to_dir(get_ref_dir(refs->loose->root),
 				 create_dir_entry(refs->loose, "refs/", 5, 1));
 	}
-	return get_ref_dir(refs->loose->root);
+	return refs->loose;
+}
+
+static struct ref_dir *get_loose_ref_dir(struct files_ref_store *refs)
+{
+	return get_ref_dir(get_loose_ref_cache(refs)->root);
 }
 
 /*
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 17/20] cache_ref_iterator_begin(): make function smarter
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (15 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 16/20] get_loose_ref_cache(): new function Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 18/20] commit_packed_refs(): use reference iteration Michael Haggerty
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Change `cache_ref_iterator_begin()` to take two new arguments:

* `prefix` -- to iterate only over references with the specified
  prefix.

* `prime_dir` -- to "prime" (i.e., pre-load) the cache before starting
  the iteration.

The new functionality makes it possible for
`files_ref_iterator_begin()` to be made more ignorant of the internals
of `ref_cache`, and `find_containing_dir()` and `prime_ref_dir()` to
be made private.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 44 +++++++++++++-------------------------------
 refs/ref-cache.c     | 38 ++++++++++++++++++++++++++++++++++----
 refs/ref-cache.h     | 27 +++++++++------------------
 3 files changed, 56 insertions(+), 53 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index fefc29433a..736a6c9ff7 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1083,7 +1083,6 @@ static struct ref_iterator *files_ref_iterator_begin(
 		const char *prefix, unsigned int flags)
 {
 	struct files_ref_store *refs;
-	struct ref_dir *loose_dir, *packed_dir;
 	struct ref_iterator *loose_iter, *packed_iter;
 	struct files_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
@@ -1109,41 +1108,24 @@ static struct ref_iterator *files_ref_iterator_begin(
 	 * condition if loose refs are migrated to the packed-refs
 	 * file by a simultaneous process, but our in-memory view is
 	 * from before the migration. We ensure this as follows:
-	 * First, we call prime_ref_dir(), which pre-reads the loose
-	 * references for the subtree into the cache. (If they've
-	 * already been read, that's OK; we only need to guarantee
-	 * that they're read before the packed refs, not *how much*
-	 * before.) After that, we call get_packed_ref_cache(), which
-	 * internally checks whether the packed-ref cache is up to
-	 * date with what is on disk, and re-reads it if not.
+	 * First, we call start the loose refs iteration with its
+	 * `prime_ref` argument set to true. This causes the loose
+	 * references in the subtree to be pre-read into the cache.
+	 * (If they've already been read, that's OK; we only need to
+	 * guarantee that they're read before the packed refs, not
+	 * *how much* before.) After that, we call
+	 * get_packed_ref_cache(), which internally checks whether the
+	 * packed-ref cache is up to date with what is on disk, and
+	 * re-reads it if not.
 	 */
 
-	loose_dir = get_loose_ref_dir(refs);
-
-	if (prefix && *prefix)
-		loose_dir = find_containing_dir(loose_dir, prefix, 0);
-
-	if (loose_dir) {
-		prime_ref_dir(loose_dir);
-		loose_iter = cache_ref_iterator_begin(loose_dir);
-	} else {
-		/* There's nothing to iterate over. */
-		loose_iter = empty_ref_iterator_begin();
-	}
+	loose_iter = cache_ref_iterator_begin(get_loose_ref_cache(refs),
+					      prefix, 1);
 
 	iter->packed_ref_cache = get_packed_ref_cache(refs);
 	acquire_packed_ref_cache(iter->packed_ref_cache);
-	packed_dir = get_packed_ref_dir(iter->packed_ref_cache);
-
-	if (prefix && *prefix)
-		packed_dir = find_containing_dir(packed_dir, prefix, 0);
-
-	if (packed_dir) {
-		packed_iter = cache_ref_iterator_begin(packed_dir);
-	} else {
-		/* There's nothing to iterate over. */
-		packed_iter = empty_ref_iterator_begin();
-	}
+	packed_iter = cache_ref_iterator_begin(iter->packed_ref_cache->cache,
+					       prefix, 0);
 
 	iter->iter0 = overlay_ref_iterator_begin(loose_iter, packed_iter);
 	iter->flags = flags;
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 38d4c31985..b3a30350d7 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -177,8 +177,17 @@ static struct ref_dir *search_for_subdir(struct ref_dir *dir,
 	return get_ref_dir(entry);
 }
 
-struct ref_dir *find_containing_dir(struct ref_dir *dir,
-				    const char *refname, int mkdir)
+/*
+ * If refname is a reference name, find the ref_dir within the dir
+ * tree that should hold refname. If refname is a directory name
+ * (i.e., it ends in '/'), then return that ref_dir itself. dir must
+ * represent the top-level directory and must already be complete.
+ * Sort ref_dirs and recurse into subdirectories as necessary. If
+ * mkdir is set, then create any missing directories; otherwise,
+ * return NULL if the desired directory cannot be found.
+ */
+static struct ref_dir *find_containing_dir(struct ref_dir *dir,
+					   const char *refname, int mkdir)
 {
 	const char *slash;
 	for (slash = strchr(refname, '/'); slash; slash = strchr(slash + 1, '/')) {
@@ -328,7 +337,11 @@ int do_for_each_entry_in_dir(struct ref_dir *dir,
 	return 0;
 }
 
-void prime_ref_dir(struct ref_dir *dir)
+/*
+ * Load all of the refs from `dir` (recursively) into our in-memory
+ * cache.
+ */
+static void prime_ref_dir(struct ref_dir *dir)
 {
 	/*
 	 * The hard work of loading loose refs is done by get_ref_dir(), so we
@@ -494,12 +507,25 @@ static struct ref_iterator_vtable cache_ref_iterator_vtable = {
 	cache_ref_iterator_abort
 };
 
-struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir)
+struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
+					      const char *prefix,
+					      int prime_dir)
 {
+	struct ref_dir *dir;
 	struct cache_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	struct cache_ref_iterator_level *level;
 
+	dir = get_ref_dir(cache->root);
+	if (prefix && *prefix)
+		dir = find_containing_dir(dir, prefix, 0);
+	if (!dir)
+		/* There's nothing to iterate over. */
+		return  empty_ref_iterator_begin();
+
+	if (prime_dir)
+		prime_ref_dir(dir);
+
 	iter = xcalloc(1, sizeof(*iter));
 	ref_iterator = &iter->base;
 	base_ref_iterator_init(ref_iterator, &cache_ref_iterator_vtable);
@@ -510,5 +536,9 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir)
 	level->index = -1;
 	level->dir = dir;
 
+	if (prefix && *prefix)
+		ref_iterator = prefix_ref_iterator_begin(ref_iterator,
+							 prefix, 0);
+
 	return ref_iterator;
 }
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 6eecdf4276..5e7a918ac0 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -234,18 +234,6 @@ int remove_entry_from_dir(struct ref_dir *dir, const char *refname);
  */
 int add_ref_entry(struct ref_dir *dir, struct ref_entry *ref);
 
-/*
- * If refname is a reference name, find the ref_dir within the dir
- * tree that should hold refname. If refname is a directory name
- * (i.e., it ends in '/'), then return that ref_dir itself. dir must
- * represent the top-level directory and must already be complete.
- * Sort ref_dirs and recurse into subdirectories as necessary. If
- * mkdir is set, then create any missing directories; otherwise,
- * return NULL if the desired directory cannot be found.
- */
-struct ref_dir *find_containing_dir(struct ref_dir *dir,
-				    const char *refname, int mkdir);
-
 /*
  * Find the value entry with the given name in dir, sorting ref_dirs
  * and recursing into subdirectories as necessary.  If the name is not
@@ -253,7 +241,15 @@ struct ref_dir *find_containing_dir(struct ref_dir *dir,
  */
 struct ref_entry *find_ref_entry(struct ref_dir *dir, const char *refname);
 
-struct ref_iterator *cache_ref_iterator_begin(struct ref_dir *dir);
+/*
+ * Start iterating over references in `cache`. If `prefix` is
+ * specified, only include references whose names start with that
+ * prefix. If `prime_dir` is true, then fill any incomplete
+ * directories before beginning the iteration.
+ */
+struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
+					      const char *prefix,
+					      int prime_dir);
 
 typedef int each_ref_entry_fn(struct ref_entry *entry, void *cb_data);
 
@@ -279,9 +275,4 @@ int do_for_each_entry_in_dir(struct ref_dir *dir,
  */
 enum peel_status peel_entry(struct ref_entry *entry, int repeel);
 
-/*
- * Load all of the refs from `dir` into our in-memory cache.
- */
-void prime_ref_dir(struct ref_dir *dir);
-
 #endif /* REFS_REF_CACHE_H */
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 18/20] commit_packed_refs(): use reference iteration
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (16 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 17/20] cache_ref_iterator_begin(): make function smarter Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 18:05   ` Jeff King
  2017-03-20 16:33 ` [PATCH 19/20] files_pack_refs(): " Michael Haggerty
                   ` (5 subsequent siblings)
  23 siblings, 1 reply; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Use reference iteration rather than do_for_each_entry_in_dir() in the
definition of commit_packed_refs().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 38 +++++++++++++++++---------------------
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 736a6c9ff7..0ea42826c8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1293,30 +1293,15 @@ static struct ref_lock *lock_ref_sha1_basic(struct files_ref_store *refs,
  * Write an entry to the packed-refs file for the specified refname.
  * If peeled is non-NULL, write it as the entry's peeled value.
  */
-static void write_packed_entry(FILE *fh, char *refname, unsigned char *sha1,
-			       unsigned char *peeled)
+static void write_packed_entry(FILE *fh, const char *refname,
+			       const unsigned char *sha1,
+			       const unsigned char *peeled)
 {
 	fprintf_or_die(fh, "%s %s\n", sha1_to_hex(sha1), refname);
 	if (peeled)
 		fprintf_or_die(fh, "^%s\n", sha1_to_hex(peeled));
 }
 
-/*
- * An each_ref_entry_fn that writes the entry to a packed-refs file.
- */
-static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
-{
-	enum peel_status peel_status = peel_entry(entry, 0);
-
-	if (peel_status != PEEL_PEELED && peel_status != PEEL_NON_TAG)
-		error("internal error: %s is not a valid packed reference!",
-		      entry->name);
-	write_packed_entry(cb_data, entry->name, entry->u.value.oid.hash,
-			   peel_status == PEEL_PEELED ?
-			   entry->u.value.peeled.hash : NULL);
-	return 0;
-}
-
 /*
  * Lock the packed-refs file for writing. Flags is passed to
  * hold_lock_file_for_update(). Return 0 on success. On errors, set
@@ -1362,9 +1347,10 @@ static int commit_packed_refs(struct files_ref_store *refs)
 {
 	struct packed_ref_cache *packed_ref_cache =
 		get_packed_ref_cache(refs);
-	int error = 0;
+	int ok, error = 0;
 	int save_errno = 0;
 	FILE *out;
+	struct ref_iterator *iter;
 
 	files_assert_main_repository(refs, "commit_packed_refs");
 
@@ -1376,8 +1362,18 @@ static int commit_packed_refs(struct files_ref_store *refs)
 		die_errno("unable to fdopen packed-refs descriptor");
 
 	fprintf_or_die(out, "%s", PACKED_REFS_HEADER);
-	do_for_each_entry_in_dir(get_packed_ref_dir(packed_ref_cache),
-				 write_packed_entry_fn, out);
+
+	iter = cache_ref_iterator_begin(packed_ref_cache->cache, NULL, 0);
+	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+		struct object_id peeled;
+		int peel_error = ref_iterator_peel(iter, &peeled);
+
+		write_packed_entry(out, iter->refname, iter->oid->hash,
+				   peel_error ? NULL : peeled.hash);
+	}
+
+	if (ok != ITER_DONE)
+		die("error while iterating over references");
 
 	if (commit_lock_file(packed_ref_cache->lock)) {
 		save_errno = errno;
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 19/20] files_pack_refs(): use reference iteration
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (17 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 18/20] commit_packed_refs(): use reference iteration Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 16:33 ` [PATCH 20/20] do_for_each_entry_in_dir(): delete function Michael Haggerty
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Use reference iteration rather than do_for_each_entry_in_dir() in the
definition of files_pack_refs().

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/files-backend.c | 143 +++++++++++++++++++++------------------------------
 1 file changed, 60 insertions(+), 83 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0ea42826c8..81d59e0de3 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -32,17 +32,6 @@ static int ref_resolves_to_object(const char *refname,
 	return 1;
 }
 
-/*
- * Return true if the reference described by entry can be resolved to
- * an object in the database; otherwise, emit a warning and return
- * false.
- */
-static int entry_resolves_to_object(struct ref_entry *entry)
-{
-	return ref_resolves_to_object(entry->name,
-				      &entry->u.value.oid, entry->flag);
-}
-
 struct packed_ref_cache {
 	struct ref_cache *cache;
 
@@ -548,11 +537,6 @@ static struct ref_cache *get_loose_ref_cache(struct files_ref_store *refs)
 	return refs->loose;
 }
 
-static struct ref_dir *get_loose_ref_dir(struct files_ref_store *refs)
-{
-	return get_ref_dir(get_loose_ref_cache(refs)->root);
-}
-
 /*
  * Return the ref_entry for the given refname from the packed
  * references.  If it does not exist, return NULL.
@@ -1411,65 +1395,6 @@ struct ref_to_prune {
 	char name[FLEX_ARRAY];
 };
 
-struct pack_refs_cb_data {
-	unsigned int flags;
-	struct ref_dir *packed_refs;
-	struct ref_to_prune *ref_to_prune;
-};
-
-/*
- * An each_ref_entry_fn that is run over loose references only.  If
- * the loose reference can be packed, add an entry in the packed ref
- * cache.  If the reference should be pruned, also add it to
- * ref_to_prune in the pack_refs_cb_data.
- */
-static int pack_if_possible_fn(struct ref_entry *entry, void *cb_data)
-{
-	struct pack_refs_cb_data *cb = cb_data;
-	enum peel_status peel_status;
-	struct ref_entry *packed_entry;
-	int is_tag_ref = starts_with(entry->name, "refs/tags/");
-
-	/* Do not pack per-worktree refs: */
-	if (ref_type(entry->name) != REF_TYPE_NORMAL)
-		return 0;
-
-	/* ALWAYS pack tags */
-	if (!(cb->flags & PACK_REFS_ALL) && !is_tag_ref)
-		return 0;
-
-	/* Do not pack symbolic or broken refs: */
-	if ((entry->flag & REF_ISSYMREF) || !entry_resolves_to_object(entry))
-		return 0;
-
-	/* Add a packed ref cache entry equivalent to the loose entry. */
-	peel_status = peel_entry(entry, 1);
-	if (peel_status != PEEL_PEELED && peel_status != PEEL_NON_TAG)
-		die("internal error peeling reference %s (%s)",
-		    entry->name, oid_to_hex(&entry->u.value.oid));
-	packed_entry = find_ref_entry(cb->packed_refs, entry->name);
-	if (packed_entry) {
-		/* Overwrite existing packed entry with info from loose entry */
-		packed_entry->flag = REF_ISPACKED | REF_KNOWS_PEELED;
-		oidcpy(&packed_entry->u.value.oid, &entry->u.value.oid);
-	} else {
-		packed_entry = create_ref_entry(entry->name, entry->u.value.oid.hash,
-						REF_ISPACKED | REF_KNOWS_PEELED, 0);
-		add_ref_entry(cb->packed_refs, packed_entry);
-	}
-	oidcpy(&packed_entry->u.value.peeled, &entry->u.value.peeled);
-
-	/* Schedule the loose reference for pruning if requested. */
-	if ((cb->flags & PACK_REFS_PRUNE)) {
-		struct ref_to_prune *n;
-		FLEX_ALLOC_STR(n, name, entry->name);
-		hashcpy(n->sha1, entry->u.value.oid.hash);
-		n->next = cb->ref_to_prune;
-		cb->ref_to_prune = n;
-	}
-	return 0;
-}
-
 enum {
 	REMOVE_EMPTY_PARENTS_REF = 0x01,
 	REMOVE_EMPTY_PARENTS_REFLOG = 0x02
@@ -1559,21 +1484,73 @@ static int files_pack_refs(struct ref_store *ref_store, unsigned int flags)
 	struct files_ref_store *refs =
 		files_downcast(ref_store, REF_STORE_WRITE | REF_STORE_ODB,
 			       "pack_refs");
-	struct pack_refs_cb_data cbdata;
-
-	memset(&cbdata, 0, sizeof(cbdata));
-	cbdata.flags = flags;
+	struct ref_iterator *iter;
+	struct ref_dir *packed_refs;
+	int ok;
+	struct ref_to_prune *refs_to_prune = NULL;
 
 	lock_packed_refs(refs, LOCK_DIE_ON_ERROR);
-	cbdata.packed_refs = get_packed_refs(refs);
+	packed_refs = get_packed_refs(refs);
+
+	iter = cache_ref_iterator_begin(get_loose_ref_cache(refs), NULL, 0);
+	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
+		/*
+		 * If the loose reference can be packed, add an entry
+		 * in the packed ref cache. If the reference should be
+		 * pruned, also add it to refs_to_prune.
+		 */
+		struct ref_entry *packed_entry;
+		int is_tag_ref = starts_with(iter->refname, "refs/tags/");
+
+		/* Do not pack per-worktree refs: */
+		if (ref_type(iter->refname) != REF_TYPE_NORMAL)
+			continue;
+
+		/* ALWAYS pack tags */
+		if (!(flags & PACK_REFS_ALL) && !is_tag_ref)
+			continue;
+
+		/* Do not pack symbolic or broken refs: */
+		if (iter->flags & REF_ISSYMREF)
+			continue;
 
-	do_for_each_entry_in_dir(get_loose_ref_dir(refs),
-				 pack_if_possible_fn, &cbdata);
+		if (!ref_resolves_to_object(iter->refname, iter->oid, iter->flags))
+			continue;
+
+		/*
+		 * Create an entry in the packed-refs cache equivalent
+		 * to the one from the loose refs cache, except that
+		 * we don't copy the peeled status, because we want it
+		 * to be re-peeled.
+		 */
+		packed_entry = find_ref_entry(packed_refs, iter->refname);
+		if (packed_entry) {
+			/* Overwrite existing packed entry with info from loose entry */
+			packed_entry->flag = REF_ISPACKED;
+			oidcpy(&packed_entry->u.value.oid, iter->oid);
+		} else {
+			packed_entry = create_ref_entry(iter->refname, iter->oid->hash,
+							REF_ISPACKED, 0);
+			add_ref_entry(packed_refs, packed_entry);
+		}
+		oidclr(&packed_entry->u.value.peeled);
+
+		/* Schedule the loose reference for pruning if requested. */
+		if ((flags & PACK_REFS_PRUNE)) {
+			struct ref_to_prune *n;
+			FLEX_ALLOC_STR(n, name, iter->refname);
+			hashcpy(n->sha1, iter->oid->hash);
+			n->next = refs_to_prune;
+			refs_to_prune = n;
+		}
+	}
+	if (ok != ITER_DONE)
+		die("error while iterating over references");
 
 	if (commit_packed_refs(refs))
 		die_errno("unable to overwrite old ref-pack file");
 
-	prune_refs(refs, cbdata.ref_to_prune);
+	prune_refs(refs, refs_to_prune);
 	return 0;
 }
 
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH 20/20] do_for_each_entry_in_dir(): delete function
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (18 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 19/20] files_pack_refs(): " Michael Haggerty
@ 2017-03-20 16:33 ` Michael Haggerty
  2017-03-20 17:25 ` [PATCH 00/20] Separate `ref_cache` into a separate module Junio C Hamano
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 16:33 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git, Michael Haggerty

Its only remaining caller was itself.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/ref-cache.c | 21 ---------------------
 refs/ref-cache.h | 11 -----------
 2 files changed, 32 deletions(-)

diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index b3a30350d7..6059362f1d 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -316,27 +316,6 @@ static void sort_ref_dir(struct ref_dir *dir)
 	dir->sorted = dir->nr = i;
 }
 
-int do_for_each_entry_in_dir(struct ref_dir *dir,
-			     each_ref_entry_fn fn, void *cb_data)
-{
-	int i;
-	assert(dir->sorted == dir->nr);
-	for (i = 0; i < dir->nr; i++) {
-		struct ref_entry *entry = dir->entries[i];
-		int retval;
-		if (entry->flag & REF_DIR) {
-			struct ref_dir *subdir = get_ref_dir(entry);
-			sort_ref_dir(subdir);
-			retval = do_for_each_entry_in_dir(subdir, fn, cb_data);
-		} else {
-			retval = fn(entry, cb_data);
-		}
-		if (retval)
-			return retval;
-	}
-	return 0;
-}
-
 /*
  * Load all of the refs from `dir` (recursively) into our in-memory
  * cache.
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index 5e7a918ac0..ffdc54f3f0 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -251,17 +251,6 @@ struct ref_iterator *cache_ref_iterator_begin(struct ref_cache *cache,
 					      const char *prefix,
 					      int prime_dir);
 
-typedef int each_ref_entry_fn(struct ref_entry *entry, void *cb_data);
-
-/*
- * Call `fn` for each reference in `dir`. Recurse into subdirectories,
- * sorting them before iterating. This function does not sort `dir`
- * itself; it should be sorted beforehand. `fn` is called for all
- * references, including broken ones.
- */
-int do_for_each_entry_in_dir(struct ref_dir *dir,
-			     each_ref_entry_fn fn, void *cb_data);
-
 /*
  * Peel the entry (if possible) and return its new peel_status.  If
  * repeel is true, re-peel the entry even if there is an old peeled
-- 
2.11.0


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/20] Separate `ref_cache` into a separate module
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (19 preceding siblings ...)
  2017-03-20 16:33 ` [PATCH 20/20] do_for_each_entry_in_dir(): delete function Michael Haggerty
@ 2017-03-20 17:25 ` Junio C Hamano
  2017-03-20 18:12 ` Jeff King
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2017-03-20 17:25 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> * (And the main goal): Avoid reading and parsing the *whole
>   `packed-refs` file* (as we do now) every time any part of it is
>   needed. Instead, use binary search to find the reference and/or
>   range of references that we want, and parse the info out of the
>   mmapped image on the fly.

Oooohh....  Juicy.

> This patch series extracts a `ref_cache` module out of
> `files_ref_cache`, and goes some way to disentangling those two
> modules, which until now were overly intimate with each other:
>
> * Remove `verify_refname_available()` from the refs VTABLE, instead
>   implementing it in a generic way that uses only the usual refs API
>   to talk to the `ref_store`.
>
> * Split `ref_cache`-related code into a new module,
>   `refs/ref-cache.{c,h}`. Encapsulate the data structure in a new
>   class, `struct ref_cache`.
>
> * Change the lazy-filling mechanism of `ref_cache` to call back to its
>   backing `ref_store` via a callback function rather than calling
>   `read_loose_refs()` directly.

Nice.

> * Move the special handling of `refs/bisect/` from `ref_cache` to
>   `files_ref_store`.
>
> * Make `cache_ref_iterator_begin()` smarter, and change external users
>   to iterate via this interface instead of using
>   `do_for_each_entry_in_dir()`.
>
> Even after this patch series, the modules are still too intimate for
> my taste, but I think this is a big step forward, and it is enough to
> allow the other changes that I've been working on.
>
> These patches depend on Duy's nd/files-backend-git-dir branch, v6 [2].
> They are also available from my GitHub fork [1] as branch
> `separate-ref-cache`.
>
> Happily, this patch series actually removes a few more lines than it
> adds, mostly thanks to the simpler `verify_refname_available()`
> implementation.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/20] refs_verify_refname_available(): implement once for all backends
  2017-03-20 16:33 ` [PATCH 04/20] refs_verify_refname_available(): implement once for all backends Michael Haggerty
@ 2017-03-20 17:42   ` Jeff King
  2017-03-20 22:20     ` Michael Haggerty
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2017-03-20 17:42 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On Mon, Mar 20, 2017 at 05:33:09PM +0100, Michael Haggerty wrote:

> It turns out that we can now implement
> `refs_verify_refname_available()` based on the other virtual
> functions, so there is no need for it to be defined at the backend
> level. Instead, define it once in `refs.c` and remove the
> `files_backend` definition.

Does this mean that backends can no longer provide storage for
D/F-conflicted refnames (i.e., "refs/foo" and "refs/foo/bar")? It looks
like the global verify_refname_available() has that logic baked in.

I know that was a complex subject because of the potential compatibility
issues (e.g., fetching from a server with a more capable backend), but
I'd just worry we are shutting a door on some options. OTOH, it probably
wouldn't be that hard for the global function to check a flag specific
to the ref_store, and allow such refs when it is set.

>  int refs_verify_refname_available(struct ref_store *refs,
>  				  const char *refname,
> -				  const struct string_list *extra,
> +				  const struct string_list *extras,
>  				  const struct string_list *skip,
>  				  struct strbuf *err)
>  {
> [...]
> +		/*
> +		 * We are still at a leading dir of the refname (e.g.,
> +		 * "refs/foo"; if there is a reference with that name,
> +		 * it is a conflict, *unless* it is in skip.
> +		 */
> +		if (skip && string_list_has_string(skip, dirname.buf))
> +			continue;
> +
> +		if (!refs_read_raw_ref(refs, dirname.buf, oid.hash, &referent, &type)) {
> +			strbuf_addf(err, "'%s' exists; cannot create '%s'",
> +				    dirname.buf, refname);
> +			goto cleanup;
> +		}

We don't really care about reading the ref value here; we just care if
it exists. Does that matter for efficiency (e.g., for the files backend
it's a stat() versus an open/read/close)? I guess the difference only
matters when it _does_ exist, which is the uncommon error case.

(Also, I suspect the loose ref cache always just reads everything in the
current code, though with the iterator approach in theory we could stop
doing that).

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/20] refs: split `ref_cache` code into separate files
  2017-03-20 16:33 ` [PATCH 09/20] refs: split `ref_cache` code into separate files Michael Haggerty
@ 2017-03-20 17:49   ` Jeff King
  2017-03-20 19:47     ` Junio C Hamano
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2017-03-20 17:49 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On Mon, Mar 20, 2017 at 05:33:14PM +0100, Michael Haggerty wrote:

> The `ref_cache` code is currently too tightly coupled to
> `files-backend`, making the code harder to understand and making it
> awkward for new code to use `ref_cache` (as we indeed have planned).
> Start loosening that coupling by splitting `ref_cache` into a separate
> module.
> 
> This commit moves code, adds declarations, and changes the visibility
> of some functions, but doesn't change any code.
> 
> The modules are still too tightly coupled, but the situation will be
> improved in subsequent commits.
> 
> Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
> ---
>  Makefile             |   1 +
>  refs/files-backend.c | 736 +--------------------------------------------------
>  refs/ref-cache.c     | 512 +++++++++++++++++++++++++++++++++++
>  refs/ref-cache.h     | 251 ++++++++++++++++++
>  4 files changed, 767 insertions(+), 733 deletions(-)
>  create mode 100644 refs/ref-cache.c
>  create mode 100644 refs/ref-cache.h

I'm not surprised that stock format-patch didn't treat this as a rename,
but I would have thought it would with "-B". It doesn't seem to, though.
Perhaps because of the code movement, etc.

It might have been a bit easier to review as separate steps for that
reason, but I doubt it's worth going back and re-doing. I'll take your
word that nothing substantive changed.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir
  2017-03-20 16:33 ` [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir Michael Haggerty
@ 2017-03-20 17:51   ` Jeff King
  2017-03-20 22:39     ` Michael Haggerty
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2017-03-20 17:51 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On Mon, Mar 20, 2017 at 05:33:16PM +0100, Michael Haggerty wrote:

> Instead of keeping a pointer to the ref_store in every ref_dir entry,
> store it once in `struct ref_cache`, and change `struct ref_dir` to
> include a pointer to its containing `ref_cache` instead. This makes it
> easier to add to the information that is stored in `struct ref_cache`
> without inflating the size of the individual entries.

This last sentence confused me. It's a pointer either way, no?

Do you just mean that we are free to add whatever we like to the
"ref_cache" without polluting the "ref_store" that is a more public data
structure?

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 18/20] commit_packed_refs(): use reference iteration
  2017-03-20 16:33 ` [PATCH 18/20] commit_packed_refs(): use reference iteration Michael Haggerty
@ 2017-03-20 18:05   ` Jeff King
  2017-03-22  8:42     ` Michael Haggerty
  0 siblings, 1 reply; 38+ messages in thread
From: Jeff King @ 2017-03-20 18:05 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On Mon, Mar 20, 2017 at 05:33:23PM +0100, Michael Haggerty wrote:

> -/*
> - * An each_ref_entry_fn that writes the entry to a packed-refs file.
> - */
> -static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
> -{
> -	enum peel_status peel_status = peel_entry(entry, 0);
> -
> -	if (peel_status != PEEL_PEELED && peel_status != PEEL_NON_TAG)
> -		error("internal error: %s is not a valid packed reference!",
> -		      entry->name);
> -	write_packed_entry(cb_data, entry->name, entry->u.value.oid.hash,
> -			   peel_status == PEEL_PEELED ?
> -			   entry->u.value.peeled.hash : NULL);
> -	return 0;
> -}

This assertion goes away. It can't be moved into write_packed_entry()
because the peel status is only known in the caller.

But here:

> @@ -1376,8 +1362,18 @@ static int commit_packed_refs(struct files_ref_store *refs)
>  		die_errno("unable to fdopen packed-refs descriptor");
>  
>  	fprintf_or_die(out, "%s", PACKED_REFS_HEADER);
> -	do_for_each_entry_in_dir(get_packed_ref_dir(packed_ref_cache),
> -				 write_packed_entry_fn, out);
> +
> +	iter = cache_ref_iterator_begin(packed_ref_cache->cache, NULL, 0);
> +	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
> +		struct object_id peeled;
> +		int peel_error = ref_iterator_peel(iter, &peeled);
> +
> +		write_packed_entry(out, iter->refname, iter->oid->hash,
> +				   peel_error ? NULL : peeled.hash);
> +	}

Should we be checking that peel_error is only PEELED or NON_TAG?

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/20] Separate `ref_cache` into a separate module
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (20 preceding siblings ...)
  2017-03-20 17:25 ` [PATCH 00/20] Separate `ref_cache` into a separate module Junio C Hamano
@ 2017-03-20 18:12 ` Jeff King
  2017-03-20 18:24 ` Ævar Arnfjörð Bjarmason
  2017-03-20 22:32 ` Junio C Hamano
  23 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2017-03-20 18:12 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On Mon, Mar 20, 2017 at 05:33:05PM +0100, Michael Haggerty wrote:

> I've completed a draft of an epic 48-patch series implementing all of
> the above points on my GitHub fork [1] as branch
> `wip/mmap-packed-refs`. It dramatically speeds up performance and
> reduces memory usage for some tasks in repositories with very many
> packed references.

Having played a bit with the work-in-progress, I'm very excited about
the performance improvements.

> But the later parts of that series aren't completely polished yet, and
> such a large patch series would be indigestible anyway, so here I
> submit the first part...
> 
> This patch series extracts a `ref_cache` module out of
> `files_ref_cache`, and goes some way to disentangling those two
> modules, which until now were overly intimate with each other:
> [...]

I just read through it and didn't see anything objectionable. I had a
few questions, but I expect that most can be answered with an
explanation rather than a re-roll.

> Even after this patch series, the modules are still too intimate for
> my taste, but I think this is a big step forward, and it is enough to
> allow the other changes that I've been working on.

The resulting code looks like a big improvement in modularity to me. My
one complaint is that the virtual functions make it hard to dig through
the code. E.g., when looking at one patch I saw that we called
ref_iterator_peel(), and I wanted to know what that entailed. My editor
helpfully jumps straight to the definition, but of course it has nothing
useful; it's just a vtable wrapper. And I had to go dig up the name
"files_peel_ref()" manually.

I guess that's the price we pay for modularity.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/20] Separate `ref_cache` into a separate module
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (21 preceding siblings ...)
  2017-03-20 18:12 ` Jeff King
@ 2017-03-20 18:24 ` Ævar Arnfjörð Bjarmason
  2017-03-20 18:30   ` Jeff King
  2017-03-20 22:32 ` Junio C Hamano
  23 siblings, 1 reply; 38+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-03-20 18:24 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, Jeff King, Git Mailing List

On Mon, Mar 20, 2017 at 5:33 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:

> I've completed a draft of an epic 48-patch series implementing all of
> the above points on my GitHub fork [1] as branch
> `wip/mmap-packed-refs`. It dramatically speeds up performance and
> reduces memory usage for some tasks in repositories with very many
> packed references.

Just curious, what sort of performance numbers on mainly what sort of
operations? I'd expect the fixed cost of e.g. `git log -1 >/dev/null`
to go down, which can take as long as a second on some repos with a
huge amount of refs, but anything else?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/20] Separate `ref_cache` into a separate module
  2017-03-20 18:24 ` Ævar Arnfjörð Bjarmason
@ 2017-03-20 18:30   ` Jeff King
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2017-03-20 18:30 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Michael Haggerty, Junio C Hamano,
	Nguyễn Thái Ngọc Duy, David Turner,
	Git Mailing List

On Mon, Mar 20, 2017 at 07:24:12PM +0100, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Mar 20, 2017 at 5:33 PM, Michael Haggerty <mhagger@alum.mit.edu> wrote:
> 
> > I've completed a draft of an epic 48-patch series implementing all of
> > the above points on my GitHub fork [1] as branch
> > `wip/mmap-packed-refs`. It dramatically speeds up performance and
> > reduces memory usage for some tasks in repositories with very many
> > packed references.
> 
> Just curious, what sort of performance numbers on mainly what sort of
> operations? I'd expect the fixed cost of e.g. `git log -1 >/dev/null`
> to go down, which can take as long as a second on some repos with a
> huge amount of refs, but anything else?

Anything that reads refs (and that includes anything that touches an
object, because it looks at refs/replace) loads the entire packed-refs
file into memory. So I'd expect you're paying that "1 second" cost on a
lot of commands.

Even commands that traverse the entire ref namespace should improve, as
we'd spend a lot less time allocating and copying data into the
in-memory refs cache. Not to mention that the memory usage is reduced,
though that probably only really matters for pathological cases.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/20] refs: split `ref_cache` code into separate files
  2017-03-20 17:49   ` Jeff King
@ 2017-03-20 19:47     ` Junio C Hamano
  2017-03-20 20:35       ` Stefan Beller
  0 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2017-03-20 19:47 UTC (permalink / raw)
  To: Jeff King
  Cc: Michael Haggerty, Nguyễn Thái Ngọc Duy,
	David Turner, git

Jeff King <peff@peff.net> writes:

> It might have been a bit easier to review as separate steps for that
> reason, but I doubt it's worth going back and re-doing. I'll take your
> word that nothing substantive changed.

In such a case when I do not want to "take your word", I often do
"blame -C" on the result, setting the bottom of the range close-by.
I should mostly see the ^bottom lines (or with "blame -b", lines
intended with blanks without any commit object name).


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/20] refs: split `ref_cache` code into separate files
  2017-03-20 19:47     ` Junio C Hamano
@ 2017-03-20 20:35       ` Stefan Beller
  2017-03-20 22:40         ` Junio C Hamano
  0 siblings, 1 reply; 38+ messages in thread
From: Stefan Beller @ 2017-03-20 20:35 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Jeff King, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, David Turner,
	git@vger.kernel.org

On Mon, Mar 20, 2017 at 12:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>
>> It might have been a bit easier to review as separate steps for that
>> reason, but I doubt it's worth going back and re-doing. I'll take your
>> word that nothing substantive changed.
>
> In such a case when I do not want to "take your word", I often do
> "blame -C" on the result, setting the bottom of the range close-by.
> I should mostly see the ^bottom lines (or with "blame -b", lines
> intended with blanks without any commit object name).

FYI:
I still want to revive the series that colors moved text differently.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 04/20] refs_verify_refname_available(): implement once for all backends
  2017-03-20 17:42   ` Jeff King
@ 2017-03-20 22:20     ` Michael Haggerty
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 22:20 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On 03/20/2017 06:42 PM, Jeff King wrote:
> On Mon, Mar 20, 2017 at 05:33:09PM +0100, Michael Haggerty wrote:
> 
>> It turns out that we can now implement
>> `refs_verify_refname_available()` based on the other virtual
>> functions, so there is no need for it to be defined at the backend
>> level. Instead, define it once in `refs.c` and remove the
>> `files_backend` definition.
> 
> Does this mean that backends can no longer provide storage for
> D/F-conflicted refnames (i.e., "refs/foo" and "refs/foo/bar")? It looks
> like the global verify_refname_available() has that logic baked in.

The `verify_refname_available()` function implements the "no D/F
conflict" policy. But it's called from the backends, not from the common
code, and nobody says that a backend needs to call the function.

> [...]
>>  int refs_verify_refname_available(struct ref_store *refs,
>>  				  const char *refname,
>> -				  const struct string_list *extra,
>> +				  const struct string_list *extras,
>>  				  const struct string_list *skip,
>>  				  struct strbuf *err)
>>  {
>> [...]
>> +		/*
>> +		 * We are still at a leading dir of the refname (e.g.,
>> +		 * "refs/foo"; if there is a reference with that name,
>> +		 * it is a conflict, *unless* it is in skip.
>> +		 */
>> +		if (skip && string_list_has_string(skip, dirname.buf))
>> +			continue;
>> +
>> +		if (!refs_read_raw_ref(refs, dirname.buf, oid.hash, &referent, &type)) {
>> +			strbuf_addf(err, "'%s' exists; cannot create '%s'",
>> +				    dirname.buf, refname);
>> +			goto cleanup;
>> +		}
> 
> We don't really care about reading the ref value here; we just care if
> it exists. Does that matter for efficiency (e.g., for the files backend
> it's a stat() versus an open/read/close)? I guess the difference only
> matters when it _does_ exist, which is the uncommon error case.

Yes, I assume that the conflict cases are unusual so I wasn't worrying
too much about their performance.

Not quite so obvious is that the new code sometimes checks for conflicts
against both loose and packed references in situations where the old
code only checked against packed references. Specifically, this happens
when the code has just read, or locked, or failed to find a loose
reference, so it could infer that there are no loose references that
could conflict. I don't think that will be noticeable, because it is the
reading of the whole `packed-refs` file that is a big expensive
operation that it is important to avoid. Anyway, later patches (i.e.,
after there is a `packed_ref_store`) can switch back to checking only
packed references and get back the old optimization.

> (Also, I suspect the loose ref cache always just reads everything in the
> current code, though with the iterator approach in theory we could stop
> doing that).

The loose ref cache reads directories into the cache lazily,
breadth-first. So it reads all of the entries in the directories leading
to the reference being looked up, but no others. When iterating, it
reads the parent directories of the prefix that is being iterated over
plus the whole subtree under the prefix, and that's it (though this
optimization is not yet wired through to `git for-each-ref`).

Michael


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/20] Separate `ref_cache` into a separate module
  2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
                   ` (22 preceding siblings ...)
  2017-03-20 18:24 ` Ævar Arnfjörð Bjarmason
@ 2017-03-20 22:32 ` Junio C Hamano
  2017-03-20 22:48   ` Michael Haggerty
  23 siblings, 1 reply; 38+ messages in thread
From: Junio C Hamano @ 2017-03-20 22:32 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git

Michael Haggerty <mhagger@alum.mit.edu> writes:

> Michael Haggerty (20):
>   get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
>   refs_read_raw_ref(): new function
>   refs_ref_iterator_begin(): new function
>   refs_verify_refname_available(): implement once for all backends
>   refs_verify_refname_available(): use function in more places
>   Rename `add_ref()` to `add_ref_entry()`
>   Rename `find_ref()` to `find_ref_entry()`
>   Rename `remove_entry()` to `remove_entry_from_dir()`
>   refs: split `ref_cache` code into separate files
>   ref-cache: introduce a new type, ref_cache
>   refs: record the ref_store in ref_cache, not ref_dir
>   ref-cache: use a callback function to fill the cache
>   refs: handle "refs/bisect/" in `loose_fill_ref_dir()`
>   do_for_each_entry_in_dir(): eliminate `offset` argument
>   get_loose_ref_dir(): function renamed from get_loose_refs()
>   get_loose_ref_cache(): new function
>   cache_ref_iterator_begin(): make function smarter
>   commit_packed_refs(): use reference iteration
>   files_pack_refs(): use reference iteration
>   do_for_each_entry_in_dir(): delete function

These mostly seem to be ref-cache but there is one mention of "refs
cache", and also the topic cover says ref_cache.  Which one is the
canonical one?  I'd assume it is "ref-cache" (or "ref cache").

The reason I am asking is because I wanted to give the three
"Rename" ones an "<area>:" prefix, and then noticed that the
shortlog output looked somewhat incoherent.

Thanks.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir
  2017-03-20 17:51   ` Jeff King
@ 2017-03-20 22:39     ` Michael Haggerty
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 22:39 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On 03/20/2017 06:51 PM, Jeff King wrote:
> On Mon, Mar 20, 2017 at 05:33:16PM +0100, Michael Haggerty wrote:
> 
>> Instead of keeping a pointer to the ref_store in every ref_dir entry,
>> store it once in `struct ref_cache`, and change `struct ref_dir` to
>> include a pointer to its containing `ref_cache` instead. This makes it
>> easier to add to the information that is stored in `struct ref_cache`
>> without inflating the size of the individual entries.
> 
> This last sentence confused me. It's a pointer either way, no?
> 
> Do you just mean that we are free to add whatever we like to the
> "ref_cache" without polluting the "ref_store" that is a more public data
> structure?

Yeah, that's explained poorly. It might be clearer as

> This makes it easier to add to the information that is accessible
> from a `ref_dir` without increasing the size of every `ref_dir`
> instance.

It used to be that `ref_dir` contained a pointer to the `ref_store` that
contains it. (BTW, such a pointer *can't* be turned into a pointer to
the `ref_cache` because (1) the `ref_dir` shouldn't have to know what
kind of `ref_store` it is being used with, and (2) a `packed_ref_cache`
can be detached from its old `ref_cache` if the stat info indicates that
the `packed-refs` file has been modified since it was last read.)

But we want to add a `fill_ref_dir` callback pointer in a place that is
reachable from the `ref_dir` so that it can fill itself when necessary.
We could add it to the `ref_dir` struct, but that would inflate its
size. Instead, it makes more sense for the `ref_dir` to have a pointer
to the enclosing `ref_cache`, and store the `fill_ref_dir` pointer in
`ref_cache`. The `ref_cache` also gets a pointer to the enclosing
`ref_store`; that way a `ref_dir` also has a way to access the
`ref_store` that contains it.

An alternative would be to pass a pointer to the `ref_cache` down the
call stack when it is being accessed, but that seemed like a lot of work
that wouldn't lead to a very clean result.

Michael


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 09/20] refs: split `ref_cache` code into separate files
  2017-03-20 20:35       ` Stefan Beller
@ 2017-03-20 22:40         ` Junio C Hamano
  0 siblings, 0 replies; 38+ messages in thread
From: Junio C Hamano @ 2017-03-20 22:40 UTC (permalink / raw)
  To: Stefan Beller
  Cc: Jeff King, Michael Haggerty,
	Nguyễn Thái Ngọc Duy, David Turner,
	git@vger.kernel.org

Stefan Beller <sbeller@google.com> writes:

> On Mon, Mar 20, 2017 at 12:47 PM, Junio C Hamano <gitster@pobox.com> wrote:
>> Jeff King <peff@peff.net> writes:
>>
>>> It might have been a bit easier to review as separate steps for that
>>> reason, but I doubt it's worth going back and re-doing. I'll take your
>>> word that nothing substantive changed.
>>
>> In such a case when I do not want to "take your word", I often do
>> "blame -C" on the result, setting the bottom of the range close-by.
>> I should mostly see the ^bottom lines (or with "blame -b", lines
>> intended with blanks without any commit object name).
>
> FYI:
> I still want to revive the series that colors moved text differently.

I agree that the mode would help viewing this change with "git
show".

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 00/20] Separate `ref_cache` into a separate module
  2017-03-20 22:32 ` Junio C Hamano
@ 2017-03-20 22:48   ` Michael Haggerty
  0 siblings, 0 replies; 38+ messages in thread
From: Michael Haggerty @ 2017-03-20 22:48 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Nguyễn Thái Ngọc Duy, David Turner, Jeff King,
	git

On 03/20/2017 11:32 PM, Junio C Hamano wrote:
> Michael Haggerty <mhagger@alum.mit.edu> writes:
> 
>> Michael Haggerty (20):
>>   get_ref_dir(): don't call read_loose_refs() for "refs/bisect"
>>   refs_read_raw_ref(): new function
>>   refs_ref_iterator_begin(): new function
>>   refs_verify_refname_available(): implement once for all backends
>>   refs_verify_refname_available(): use function in more places
>>   Rename `add_ref()` to `add_ref_entry()`
>>   Rename `find_ref()` to `find_ref_entry()`
>>   Rename `remove_entry()` to `remove_entry_from_dir()`
>>   refs: split `ref_cache` code into separate files
>>   ref-cache: introduce a new type, ref_cache
>>   refs: record the ref_store in ref_cache, not ref_dir
>>   ref-cache: use a callback function to fill the cache
>>   refs: handle "refs/bisect/" in `loose_fill_ref_dir()`
>>   do_for_each_entry_in_dir(): eliminate `offset` argument
>>   get_loose_ref_dir(): function renamed from get_loose_refs()
>>   get_loose_ref_cache(): new function
>>   cache_ref_iterator_begin(): make function smarter
>>   commit_packed_refs(): use reference iteration
>>   files_pack_refs(): use reference iteration
>>   do_for_each_entry_in_dir(): delete function
> 
> These mostly seem to be ref-cache but there is one mention of "refs
> cache", and also the topic cover says ref_cache.  Which one is the
> canonical one?  I'd assume it is "ref-cache" (or "ref cache").
> 
> The reason I am asking is because I wanted to give the three
> "Rename" ones an "<area>:" prefix, and then noticed that the
> shortlog output looked somewhat incoherent.

It's meant to be "ref-cache", short for "reference cache", which seems
more natural than "references cache". But I admit that I often type it
wrong because of muscle memory trained on other things named "refs_blah".

Thanks for noticing.

Michael


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 18/20] commit_packed_refs(): use reference iteration
  2017-03-20 18:05   ` Jeff King
@ 2017-03-22  8:42     ` Michael Haggerty
  2017-03-22 13:06       ` Jeff King
  0 siblings, 1 reply; 38+ messages in thread
From: Michael Haggerty @ 2017-03-22  8:42 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On 03/20/2017 07:05 PM, Jeff King wrote:
> On Mon, Mar 20, 2017 at 05:33:23PM +0100, Michael Haggerty wrote:
> 
>> -/*
>> - * An each_ref_entry_fn that writes the entry to a packed-refs file.
>> - */
>> -static int write_packed_entry_fn(struct ref_entry *entry, void *cb_data)
>> -{
>> -	enum peel_status peel_status = peel_entry(entry, 0);
>> -
>> -	if (peel_status != PEEL_PEELED && peel_status != PEEL_NON_TAG)
>> -		error("internal error: %s is not a valid packed reference!",
>> -		      entry->name);
>> -	write_packed_entry(cb_data, entry->name, entry->u.value.oid.hash,
>> -			   peel_status == PEEL_PEELED ?
>> -			   entry->u.value.peeled.hash : NULL);
>> -	return 0;
>> -}
> 
> This assertion goes away. It can't be moved into write_packed_entry()
> because the peel status is only known in the caller.
> 
> But here:
> 
>> @@ -1376,8 +1362,18 @@ static int commit_packed_refs(struct files_ref_store *refs)
>>  		die_errno("unable to fdopen packed-refs descriptor");
>>  
>>  	fprintf_or_die(out, "%s", PACKED_REFS_HEADER);
>> -	do_for_each_entry_in_dir(get_packed_ref_dir(packed_ref_cache),
>> -				 write_packed_entry_fn, out);
>> +
>> +	iter = cache_ref_iterator_begin(packed_ref_cache->cache, NULL, 0);
>> +	while ((ok = ref_iterator_advance(iter)) == ITER_OK) {
>> +		struct object_id peeled;
>> +		int peel_error = ref_iterator_peel(iter, &peeled);
>> +
>> +		write_packed_entry(out, iter->refname, iter->oid->hash,
>> +				   peel_error ? NULL : peeled.hash);
>> +	}
> 
> Should we be checking that peel_error is only PEELED or NON_TAG?

This is a good question, and it took me a while to figure out the whole
answer. At first I deleted this check without much thought because it
was just an internal consistency check that should never trigger. But
actually the story is more complicated than that.

Tl;dr: the old code is slightly wrong and I think the new code is correct.

First the superficial answer: we can't `peel_error` in
`commit_packed_refs()` as you suggested, because `ref_iterator_peel()`
doesn't return an `enum peel_status`. It only returns 0 on OK /  nonzero
for problems. A legitimate reference should never have a status
`PEEL_INVALID`. That status is meant for stale packed refs that are
hidden by more recent loose refs, which *can* legitimately point at
objects that have since been GCed. Since `ref_iterator_peel()` is
someday meant to be an exposed part of the API, I didn't want it to give
out more information than pass/fail [1]. Also, the reason for a peel
failure is not necessarily known accurately (information is lost when a
reference is packed then the packed-refs file is read).

So, when could the old error message have been emitted? It is when there
is an entry in the packed-ref cache that is not `REF_KNOWS_PEELED`, and
when the peel is attempted, the result is `PEEL_INVALID`,
`PEEL_IS_SYMREF`, or `PEEL_BROKEN`. Since a packed ref cannot be a
symref, `PEEL_IS_SYMREF` and `PEEL_BROKEN` can be ruled out. So we're
left with `PEEL_INVALID`.

How could an entry get into the packed-refs cache without
`REF_KNOWS_PEELED`? One of the following:

* It was read from a `packed-refs` file that didn't have the
`fully-peeled` attribute. In that case, we *don't want* to emit an
error, because the broken value is presumably masked by a loose version
of the same reference (which we just don't happen to be packing this
time). The old code incorrectly emits the error message in this case. It
was probably never reported as a bug because this scenario is rare.

* It was a loose reference that was just added to the packed ref cache
by `files_packed_refs()` via `pack_if_possible_fn()` in preparation for
being packed. The latter function refuses to pack a reference for which
`entry_resolves_to_object()` returns false, and otherwise calls
`peel_entry()` itself and checks the return value. So a reference added
this way should always be `REF_KNOWS_PEELED`.

Therefore, I think it is a good thing to remove this check. I'll improve
the commit message to make this story clearer.

Michael

[1] We could change this policy, for example by only documenting the
pass/fail return value externally, while distinguishing between the
types of failure when iterating internal to the module.


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH 18/20] commit_packed_refs(): use reference iteration
  2017-03-22  8:42     ` Michael Haggerty
@ 2017-03-22 13:06       ` Jeff King
  0 siblings, 0 replies; 38+ messages in thread
From: Jeff King @ 2017-03-22 13:06 UTC (permalink / raw)
  To: Michael Haggerty
  Cc: Junio C Hamano, Nguyễn Thái Ngọc Duy,
	David Turner, git

On Wed, Mar 22, 2017 at 09:42:58AM +0100, Michael Haggerty wrote:

> > Should we be checking that peel_error is only PEELED or NON_TAG?
> 
> This is a good question, and it took me a while to figure out the whole
> answer. At first I deleted this check without much thought because it
> was just an internal consistency check that should never trigger. But
> actually the story is more complicated than that.
> 
> Tl;dr: the old code is slightly wrong and I think the new code is correct.

OK, that all makes sense. Thanks for digging.

-Peff

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2017-03-22 13:06 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-20 16:33 [PATCH 00/20] Separate `ref_cache` into a separate module Michael Haggerty
2017-03-20 16:33 ` [PATCH 01/20] get_ref_dir(): don't call read_loose_refs() for "refs/bisect" Michael Haggerty
2017-03-20 16:33 ` [PATCH 02/20] refs_read_raw_ref(): new function Michael Haggerty
2017-03-20 16:33 ` [PATCH 03/20] refs_ref_iterator_begin(): " Michael Haggerty
2017-03-20 16:33 ` [PATCH 04/20] refs_verify_refname_available(): implement once for all backends Michael Haggerty
2017-03-20 17:42   ` Jeff King
2017-03-20 22:20     ` Michael Haggerty
2017-03-20 16:33 ` [PATCH 05/20] refs_verify_refname_available(): use function in more places Michael Haggerty
2017-03-20 16:33 ` [PATCH 06/20] Rename `add_ref()` to `add_ref_entry()` Michael Haggerty
2017-03-20 16:33 ` [PATCH 07/20] Rename `find_ref()` to `find_ref_entry()` Michael Haggerty
2017-03-20 16:33 ` [PATCH 08/20] Rename `remove_entry()` to `remove_entry_from_dir()` Michael Haggerty
2017-03-20 16:33 ` [PATCH 09/20] refs: split `ref_cache` code into separate files Michael Haggerty
2017-03-20 17:49   ` Jeff King
2017-03-20 19:47     ` Junio C Hamano
2017-03-20 20:35       ` Stefan Beller
2017-03-20 22:40         ` Junio C Hamano
2017-03-20 16:33 ` [PATCH 10/20] ref-cache: introduce a new type, ref_cache Michael Haggerty
2017-03-20 16:33 ` [PATCH 11/20] refs: record the ref_store in ref_cache, not ref_dir Michael Haggerty
2017-03-20 17:51   ` Jeff King
2017-03-20 22:39     ` Michael Haggerty
2017-03-20 16:33 ` [PATCH 12/20] ref-cache: use a callback function to fill the cache Michael Haggerty
2017-03-20 16:33 ` [PATCH 13/20] refs: handle "refs/bisect/" in `loose_fill_ref_dir()` Michael Haggerty
2017-03-20 16:33 ` [PATCH 14/20] do_for_each_entry_in_dir(): eliminate `offset` argument Michael Haggerty
2017-03-20 16:33 ` [PATCH 15/20] get_loose_ref_dir(): function renamed from get_loose_refs() Michael Haggerty
2017-03-20 16:33 ` [PATCH 16/20] get_loose_ref_cache(): new function Michael Haggerty
2017-03-20 16:33 ` [PATCH 17/20] cache_ref_iterator_begin(): make function smarter Michael Haggerty
2017-03-20 16:33 ` [PATCH 18/20] commit_packed_refs(): use reference iteration Michael Haggerty
2017-03-20 18:05   ` Jeff King
2017-03-22  8:42     ` Michael Haggerty
2017-03-22 13:06       ` Jeff King
2017-03-20 16:33 ` [PATCH 19/20] files_pack_refs(): " Michael Haggerty
2017-03-20 16:33 ` [PATCH 20/20] do_for_each_entry_in_dir(): delete function Michael Haggerty
2017-03-20 17:25 ` [PATCH 00/20] Separate `ref_cache` into a separate module Junio C Hamano
2017-03-20 18:12 ` Jeff King
2017-03-20 18:24 ` Ævar Arnfjörð Bjarmason
2017-03-20 18:30   ` Jeff King
2017-03-20 22:32 ` Junio C Hamano
2017-03-20 22:48   ` Michael Haggerty

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).