git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Michael Haggerty <mhagger@alum.mit.edu>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Stefan Beller" <sbeller@google.com>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Brandon Williams" <bmwill@google.com>,
	git@vger.kernel.org, "Michael Haggerty" <mhagger@alum.mit.edu>
Subject: [PATCH v3 15/21] packed_ref_iterator_begin(): iterate using `mmapped_ref_iterator`
Date: Mon, 25 Sep 2017 10:00:12 +0200	[thread overview]
Message-ID: <e81507d182b1ccf17e21594a98bf065b809539f3.1506325610.git.mhagger@alum.mit.edu> (raw)
In-Reply-To: <cover.1506325610.git.mhagger@alum.mit.edu>

Now that we have an efficient way to iterate, in order, over the
mmapped contents of the `packed-refs` file, we can use that directly
to implement reference iteration for the `packed_ref_store`, rather
than iterating over the `ref_cache`. This is the next step towards
getting rid of the `ref_cache` entirely.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
---
 refs/packed-backend.c | 109 ++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 106 insertions(+), 3 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index a7fc613c06..abf14a1405 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -397,6 +397,27 @@ static int cmp_packed_ref_entries(const void *v1, const void *v2)
 	}
 }
 
+/*
+ * Compare a packed-refs record pointed to by `rec` to the specified
+ * NUL-terminated refname.
+ */
+static int cmp_entry_to_refname(const char *rec, const char *refname)
+{
+	const char *r1 = rec + GIT_SHA1_HEXSZ + 1;
+	const char *r2 = refname;
+
+	while (1) {
+		if (*r1 == '\n')
+			return *r2 ? -1 : 0;
+		if (!*r2)
+			return 1;
+		if (*r1 != *r2)
+			return (unsigned char)*r1 < (unsigned char)*r2 ? -1 : +1;
+		r1++;
+		r2++;
+	}
+}
+
 /*
  * `packed_refs->buf` is not known to be sorted. Check whether it is,
  * and if not, sort it into new memory and munmap/free the old
@@ -503,6 +524,17 @@ static const char *find_start_of_record(const char *buf, const char *p)
 	return p;
 }
 
+/*
+ * Return a pointer to the start of the record following the record
+ * that contains `*p`. If none is found before `end`, return `end`.
+ */
+static const char *find_end_of_record(const char *p, const char *end)
+{
+	while (++p < end && (p[-1] != '\n' || p[0] == '^'))
+		;
+	return p;
+}
+
 /*
  * We want to be able to compare mmapped reference records quickly,
  * without totally parsing them. We can do so because the records are
@@ -592,6 +624,65 @@ static int load_contents(struct packed_ref_cache *packed_refs)
 	return 1;
 }
 
+/*
+ * Find the place in `cache->buf` where the start of the record for
+ * `refname` starts. If `mustexist` is true and the reference doesn't
+ * exist, then return NULL. If `mustexist` is false and the reference
+ * doesn't exist, then return the point where that reference would be
+ * inserted. In the latter mode, `refname` doesn't have to be a proper
+ * reference name; for example, one could search for "refs/replace/"
+ * to find the start of any replace references.
+ *
+ * The record is sought using a binary search, so `cache->buf` must be
+ * sorted.
+ */
+static const char *find_reference_location(struct packed_ref_cache *cache,
+					   const char *refname, int mustexist)
+{
+	/*
+	 * This is not *quite* a garden-variety binary search, because
+	 * the data we're searching is made up of records, and we
+	 * always need to find the beginning of a record to do a
+	 * comparison. A "record" here is one line for the reference
+	 * itself and zero or one peel lines that start with '^'. Our
+	 * loop invariant is described in the next two comments.
+	 */
+
+	/*
+	 * A pointer to the character at the start of a record whose
+	 * preceding records all have reference names that come
+	 * *before* `refname`.
+	 */
+	const char *lo = cache->buf + cache->header_len;
+
+	/*
+	 * A pointer to a the first character of a record whose
+	 * reference name comes *after* `refname`.
+	 */
+	const char *hi = cache->eof;
+
+	while (lo < hi) {
+		const char *mid, *rec;
+		int cmp;
+
+		mid = lo + (hi - lo) / 2;
+		rec = find_start_of_record(lo, mid);
+		cmp = cmp_entry_to_refname(rec, refname);
+		if (cmp < 0) {
+			lo = find_end_of_record(mid, hi);
+		} else if (cmp > 0) {
+			hi = rec;
+		} else {
+			return rec;
+		}
+	}
+
+	if (mustexist)
+		return NULL;
+	else
+		return lo;
+}
+
 /*
  * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
@@ -888,6 +979,8 @@ static struct ref_iterator *packed_ref_iterator_begin(
 		const char *prefix, unsigned int flags)
 {
 	struct packed_ref_store *refs;
+	struct packed_ref_cache *packed_refs;
+	const char *start;
 	struct packed_ref_iterator *iter;
 	struct ref_iterator *ref_iterator;
 	unsigned int required_flags = REF_STORE_READ;
@@ -905,13 +998,23 @@ static struct ref_iterator *packed_ref_iterator_begin(
 	 * the packed-ref cache is up to date with what is on disk,
 	 * and re-reads it if not.
 	 */
+	iter->cache = packed_refs = get_packed_ref_cache(refs);
+	acquire_packed_ref_cache(packed_refs);
 
-	iter->cache = get_packed_ref_cache(refs);
-	acquire_packed_ref_cache(iter->cache);
-	iter->iter0 = cache_ref_iterator_begin(iter->cache->cache, prefix, 0);
+	if (prefix && *prefix)
+		start = find_reference_location(packed_refs, prefix, 0);
+	else
+		start = packed_refs->buf + packed_refs->header_len;
+
+	iter->iter0 = mmapped_ref_iterator_begin(
+			packed_refs, start, packed_refs->eof);
 
 	iter->flags = flags;
 
+	if (prefix && *prefix)
+		/* Stop iteration after we've gone *past* prefix: */
+		ref_iterator = prefix_ref_iterator_begin(ref_iterator, prefix, 0);
+
 	return ref_iterator;
 }
 
-- 
2.14.1


  parent reply	other threads:[~2017-09-25  8:01 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-25  7:59 [PATCH v3 00/21] Read `packed-refs` using mmap() Michael Haggerty
2017-09-25  7:59 ` [PATCH v3 01/21] ref_iterator: keep track of whether the iterator output is ordered Michael Haggerty
2017-09-25  7:59 ` [PATCH v3 02/21] prefix_ref_iterator: break when we leave the prefix Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 03/21] packed_ref_cache: add a backlink to the associated `packed_ref_store` Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 04/21] die_unterminated_line(), die_invalid_line(): new functions Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 05/21] read_packed_refs(): use mmap to read the `packed-refs` file Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 06/21] read_packed_refs(): only check for a header at the top of the file Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 07/21] read_packed_refs(): make parsing of the header line more robust Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 08/21] read_packed_refs(): read references with minimal copying Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 09/21] packed_ref_cache: remember the file-wide peeling state Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 10/21] mmapped_ref_iterator: add iterator over a packed-refs file Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 11/21] mmapped_ref_iterator_advance(): no peeled value for broken refs Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 12/21] packed-backend.c: reorder some definitions Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 13/21] packed_ref_cache: keep the `packed-refs` file mmapped if possible Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 14/21] read_packed_refs(): ensure that references are ordered when read Michael Haggerty
2017-09-25  8:00 ` Michael Haggerty [this message]
2017-09-25  8:00 ` [PATCH v3 16/21] packed_read_raw_ref(): read the reference from the mmapped buffer Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 17/21] ref_store: implement `refs_peel_ref()` generically Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 18/21] packed_ref_store: get rid of the `ref_cache` entirely Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 19/21] ref_cache: remove support for storing peeled values Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 20/21] mmapped_ref_iterator: inline into `packed_ref_iterator` Michael Haggerty
2017-09-25  8:00 ` [PATCH v3 21/21] packed-backend.c: rename a bunch of things and update comments Michael Haggerty
2017-09-25 12:22 ` [PATCH v3 00/21] Read `packed-refs` using mmap() Jeff King
2017-09-29  2:13   ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e81507d182b1ccf17e21594a98bf065b809539f3.1506325610.git.mhagger@alum.mit.edu \
    --to=mhagger@alum.mit.edu \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=avarab@gmail.com \
    --cc=bmwill@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=pclouds@gmail.com \
    --cc=peff@peff.net \
    --cc=sbeller@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).