git@vger.kernel.org mailing list mirror (one of many)
From: Jonathan Tan <jonathantanmy@google.com>
To: git@vger.kernel.org
Cc: Jonathan Tan <jonathantanmy@google.com>, stolee@gmail.com
Subject: [PATCH 2/2] packfile: refactor hash search with fanout table
Date: Fri,  2 Feb 2018 14:36:31 -0800
Message-ID: <007f3a4c84cb1c07255404ad1ea9f797129c5cf0.1517609773.git.jonathantanmy@google.com>
In-Reply-To: <cover.1517609773.git.jonathantanmy@google.com>

Subsequent patches will introduce file formats that make use of a fanout
array and a sorted table containing hashes, just like packfiles.
Refactor the hash search in packfile.c into its own function, so that
those patches can make use of it as well.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
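For readers unfamiliar with the layout the commit message refers to, here is
a minimal sketch of how a fanout array relates to a sorted hash table. It is
not part of this patch: demo_build_fanout() and demo_fanout_range() are
hypothetical names, and the sketch assumes the first byte of each entry is
the first byte of its hash.

```c
#include <arpa/inet.h>	/* htonl(), ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of this patch): fill fanout[] so that
 * fanout[i] holds, in network byte order, the number of entries in the
 * sorted table whose first byte is less than or equal to i.
 */
void demo_build_fanout(uint32_t fanout[256], const unsigned char *table,
		       uint32_t nr, size_t stride)
{
	uint32_t counts[256] = { 0 };
	uint32_t total = 0;
	uint32_t i;

	assert(table || !nr);
	/* Count how many entries start with each possible first byte. */
	for (i = 0; i < nr; i++)
		counts[table[i * stride]]++;
	/* Turn the per-byte counts into cumulative counts. */
	for (i = 0; i < 256; i++) {
		total += counts[i];
		fanout[i] = htonl(total);
	}
}

/*
 * Hypothetical helper: compute the half-open interval [*lo, *hi) of table
 * indexes whose hashes start with first_byte. This is the interval that a
 * binary search has to cover.
 */
void demo_fanout_range(const uint32_t fanout[256], unsigned char first_byte,
		       uint32_t *lo, uint32_t *hi)
{
	*hi = ntohl(fanout[first_byte]);
	*lo = first_byte ? ntohl(fanout[first_byte - 1]) : 0;
}
```

With four entries whose first bytes are 0x00, 0x02, 0x02 and 0xff, the
interval for 0x02 is [1, 3) and the interval for 0x01 is empty, so a lookup
for a hash starting with 0x01 does not need to compare any hashes at all.
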
 packfile.c    | 19 +++++--------------
 sha1-lookup.c | 24 ++++++++++++++++++++++++
 sha1-lookup.h | 21 +++++++++++++++++++++
 3 files changed, 50 insertions(+), 14 deletions(-)

diff --git a/packfile.c b/packfile.c
index 58bdced3b..29f5dc239 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1712,7 +1712,8 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 {
 	const uint32_t *level1_ofs = p->index_data;
 	const unsigned char *index = p->index_data;
-	unsigned hi, lo, stride;
+	unsigned stride;
+	int ret;
 
 	if (!index) {
 		if (open_pack_index(p))
@@ -1725,8 +1726,6 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		index += 8;
 	}
 	index += 4 * 256;
-	hi = ntohl(level1_ofs[*sha1]);
-	lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1]));
 	if (p->index_version > 1) {
 		stride = 20;
 	} else {
@@ -1734,17 +1733,9 @@ off_t find_pack_entry_one(const unsigned char *sha1,
 		index += 4;
 	}
 
-	while (lo < hi) {
-		unsigned mi = lo + (hi - lo) / 2;
-		int cmp = hashcmp(index + mi * stride, sha1);
-
-		if (!cmp)
-			return nth_packed_object_offset(p, mi);
-		if (cmp > 0)
-			hi = mi;
-		else
-			lo = mi+1;
-	}
+	ret = bsearch_hash(sha1, level1_ofs, index, stride);
+	if (ret >= 0)
+		return nth_packed_object_offset(p, ret);
 	return 0;
 }
 
diff --git a/sha1-lookup.c b/sha1-lookup.c
index 4cf3ebd92..d11c5e526 100644
--- a/sha1-lookup.c
+++ b/sha1-lookup.c
@@ -99,3 +99,27 @@ int sha1_pos(const unsigned char *sha1, void *table, size_t nr,
 	} while (lo < hi);
 	return -lo-1;
 }
+
+int bsearch_hash(const unsigned char *sha1, const void *fanout_,
+		 const void *table_, size_t stride)
+{
+	const uint32_t *fanout = fanout_;
+	const unsigned char *table = table_;
+	int hi, lo;
+
+	hi = ntohl(fanout[*sha1]);
+	lo = ((*sha1 == 0x0) ? 0 : ntohl(fanout[*sha1 - 1]));
+
+	while (lo < hi) {
+		unsigned mi = lo + (hi - lo) / 2;
+		int cmp = hashcmp(table + mi * stride, sha1);
+
+		if (!cmp)
+			return mi;
+		if (cmp > 0)
+			hi = mi;
+		else
+			lo = mi + 1;
+	}
+	return -lo - 1;
+}
diff --git a/sha1-lookup.h b/sha1-lookup.h
index cf5314f40..3c59e9cb1 100644
--- a/sha1-lookup.h
+++ b/sha1-lookup.h
@@ -7,4 +7,25 @@ extern int sha1_pos(const unsigned char *sha1,
 		    void *table,
 		    size_t nr,
 		    sha1_access_fn fn);
+
+/*
+ * Searches for sha1 in table, using the given fanout table to determine the
+ * interval to search, then using binary search. Returns the element index of
+ * the position found if successful, -i-1 if not (where i is the index of the
+ * least element that is greater than sha1).
+ *
+ * Takes the following parameters:
+ *
+ *  - sha1: the hash to search for
+ *  - fanout: a 256-element array of NETWORK-order 32-bit integers; the integer
+ *    at position i represents the number of elements in table whose first byte
+ *    is less than or equal to i
+ *  - table: a sorted list of hashes with optional extra information in between
+ *  - stride: distance between two consecutive elements in table (should be
+ *    GIT_MAX_RAWSZ or greater)
+ *
+ * This function does not verify the validity of the fanout table.
+ */
+extern int bsearch_hash(const unsigned char *sha1, const void *fanout,
+			const void *table, size_t stride);
 #endif
-- 
2.16.0.rc1.238.g530d649a79-goog
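
As an illustration of the return convention documented in sha1-lookup.h
above, the following sketch shows how a caller might tell a hit from a miss
and recover the insertion index. It is an assumption-laden stand-in, not the
code from this patch: demo_bsearch_hash() mirrors bsearch_hash() but uses
memcmp() over an assumed 20-byte hash instead of hashcmp(), and
demo_insertion_index() is a hypothetical helper.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RAWSZ 20	/* assumed raw hash length (SHA-1) */

/*
 * Stand-in for bsearch_hash(): returns the index of sha1 in table, or
 * -i-1 where i is the index at which sha1 would have to be inserted.
 * Entries are stride bytes apart; only the first DEMO_RAWSZ bytes of
 * each entry are compared, so trailing payload bytes are ignored.
 */
int demo_bsearch_hash(const unsigned char *sha1, const uint32_t *fanout,
		      const unsigned char *table, size_t stride)
{
	int hi, lo;

	assert(stride >= DEMO_RAWSZ);
	hi = ntohl(fanout[*sha1]);
	lo = (*sha1 == 0x0) ? 0 : ntohl(fanout[*sha1 - 1]);

	while (lo < hi) {
		int mi = lo + (hi - lo) / 2;
		int cmp = memcmp(table + (size_t)mi * stride, sha1, DEMO_RAWSZ);

		if (!cmp)
			return mi;
		if (cmp > 0)
			hi = mi;
		else
			lo = mi + 1;
	}
	return -lo - 1;
}

/* Hypothetical helper: decode a "not found" result into an insertion index. */
int demo_insertion_index(int ret)
{
	assert(ret < 0);
	return -ret - 1;
}
```

A caller would treat any non-negative result as the element index and pass a
negative result to demo_insertion_index(). For a table of three entries
starting with 0x10, 0x20 and 0x30, a search for a hash starting with 0x25
returns -3, which decodes to insertion index 2.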



Thread overview: 13+ messages
2018-02-02 22:36 [PATCH 0/2] Refactor hash search with fanout table Jonathan Tan
2018-02-02 22:36 ` [PATCH 1/2] packfile: remove GIT_DEBUG_LOOKUP log statements Jonathan Tan
2018-02-02 22:36 ` Jonathan Tan [this message]
2018-02-09 18:03   ` [PATCH 2/2] packfile: refactor hash search with fanout table René Scharfe
2018-02-09 19:50     ` Jonathan Tan
2018-02-02 23:30 ` [PATCH 0/2] Refactor " Junio C Hamano
2018-02-03  2:09   ` Derrick Stolee
2018-02-13 18:39 ` [PATCH v2 " Jonathan Tan
2018-02-13 18:39   ` [PATCH v2 1/2] packfile: remove GIT_DEBUG_LOOKUP log statements Jonathan Tan
2018-02-13 18:39   ` [PATCH v2 2/2] packfile: refactor hash search with fanout table Jonathan Tan
2018-02-13 18:52   ` [PATCH v2 0/2] Refactor " Derrick Stolee
2018-02-13 19:57   ` Junio C Hamano
2018-02-13 20:15     ` Jonathan Tan
