[PATCH] sha1_name: use bsearch_hash() for abbreviations

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH] sha1_name: use bsearch_hash() for abbreviations
@ 2018-03-20 20:03 Derrick Stolee
  2018-03-20 22:25 ` Jonathan Tan
  0 siblings, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2018-03-20 20:03 UTC (permalink / raw)
  To: git; +Cc: jonathantanmy, stolee, Derrick Stolee

This patch updates the abbreviation code to use bsearch_hash() as defined
in [1]. It gets a nice speedup since the old implementation did not use
the fanout table at all.

One caveat about the patch: there is a place where I cast a sha1 hash
into a struct object_id pointer. This is because the abbreviation code
still uses 'const unsigned char *' instead of structs. I wanted to avoid
a hashcpy() in these calls, but perhaps that is not too heavy a cost.

I look forward to feedback on this.

Thanks,
-Stolee

[1] https://public-inbox.org/git/007f3a4c84cb1c07255404ad1ea9f797129c5cf0.1517609773.git.jonathantanmy@google.com/
    [PATCH 2/2] packfile: refactor hash search with fanout table

-- >8 --

When computing abbreviation lengths for an object ID against a single
packfile, the method find_abbrev_len_for_pack() currently implements
binary search. This is one of several implementations. One issue with
this implementation is that it ignores the fanout table in the pack-
index.

Translate this binary search to use the existing bsearch_hash() method
that correctly uses a fanout table. To keep the details about pack-
index version 1 or 2 out of sha1_name.c, create a bsearch_pack() method
in packfile.c.

Due to the use of the fanout table, the abbreviation computation is
slightly faster than before. For a fully-repacked copy of the Linux
repo, the following 'git log' commands improved:

* git log --oneline --parents --raw
  Before: 66.83s
  After:  64.85s
  Rel %:  -2.9%

* git log --oneline --parents
  Before: 7.85s
  After:  7.29s
  Rel %: -7.1%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 packfile.c  | 23 +++++++++++++++++++++++
 packfile.h  |  8 ++++++++
 sha1_name.c | 24 ++++--------------------
 3 files changed, 35 insertions(+), 20 deletions(-)

diff --git a/packfile.c b/packfile.c
index 7c1a2519fc..ea0388f6dd 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1654,6 +1654,29 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 	return data;
 }
 
+int bsearch_pack(const struct object_id *oid, const struct packed_git *p, uint32_t *result)
+{
+	const unsigned char *index_fanout = p->index_data;
+	const unsigned char *index_lookup;
+	int index_lookup_width;
+
+	if (!index_fanout)
+		BUG("bsearch_pack called without a valid pack-index");
+
+	index_lookup = index_fanout + 4 * 256;
+	if (p->index_version == 1) {
+		index_lookup_width = 24;
+		index_lookup += 4;
+	} else {
+		index_lookup_width = 20;
+		index_fanout += 8;
+		index_lookup += 8;
+	}
+
+	return bsearch_hash(oid->hash, (const uint32_t*)index_fanout,
+			    index_lookup, index_lookup_width, result);
+}
+
 const unsigned char *nth_packed_object_sha1(struct packed_git *p,
 					    uint32_t n)
 {
diff --git a/packfile.h b/packfile.h
index a7fca598d6..ec08cb2bb0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -78,6 +78,14 @@ extern struct packed_git *add_packed_git(const char *path, size_t path_len, int
  */
 extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
 
+/*
+ * Perform binary search on a pack-index for a given oid. Packfile is expected to
+ * have a valid pack-index.
+ *
+ * See 'bsearch_hash' for more information.
+ */
+int bsearch_pack(const struct object_id *oid, const struct packed_git *p, uint32_t *result);
+
 /*
  * Return the SHA-1 of the nth object within the specified packfile.
  * Open the index if it is not already open.  The return value points
diff --git a/sha1_name.c b/sha1_name.c
index 735c1c0b8e..be3627b915 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -512,32 +512,16 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
 				     struct min_abbrev_data *mad)
 {
 	int match = 0;
-	uint32_t num, last, first = 0;
+	uint32_t num, first = 0;
 	struct object_id oid;
+	const struct object_id *mad_oid;
 
 	if (open_pack_index(p) || !p->num_objects)
 		return;
 
 	num = p->num_objects;
-	last = num;
-	while (first < last) {
-		uint32_t mid = first + (last - first) / 2;
-		const unsigned char *current;
-		int cmp;
-
-		current = nth_packed_object_sha1(p, mid);
-		cmp = hashcmp(mad->hash, current);
-		if (!cmp) {
-			match = 1;
-			first = mid;
-			break;
-		}
-		if (cmp > 0) {
-			first = mid + 1;
-			continue;
-		}
-		last = mid;
-	}
+	mad_oid = (const struct object_id *)mad->hash;
+	match = bsearch_pack(mad_oid, p, &first);
 
 	/*
 	 * first is now the position in the packfile where we would insert
-- 
2.17.0.rc0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] sha1_name: use bsearch_hash() for abbreviations
  2018-03-20 20:03 [PATCH] sha1_name: use bsearch_hash() for abbreviations Derrick Stolee
@ 2018-03-20 22:25 ` Jonathan Tan
  2018-03-21 13:24   ` Derrick Stolee
  0 siblings, 1 reply; 12+ messages in thread
From: Jonathan Tan @ 2018-03-20 22:25 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, stolee

On Tue, 20 Mar 2018 16:03:25 -0400
Derrick Stolee <dstolee@microsoft.com> wrote:

> This patch updates the abbreviation code to use bsearch_hash() as defined
> in [1]. It gets a nice speedup since the old implementation did not use
> the fanout table at all.

You can refer to the patch as:

  b4e00f7306a1 ("packfile: refactor hash search with fanout table",
  2018-02-15)

Also, might be worth noting that this patch builds on
jt/binsearch-with-fanout.

> One caveat about the patch: there is a place where I cast a sha1 hash
> into a struct object_id pointer. This is because the abbreviation code
> still uses 'const unsigned char *' instead of structs. I wanted to avoid
> a hashcpy() in these calls, but perhaps that is not too heavy a cost.

I recall a discussion that there were alignment issues with doing this,
but I might have be remembering wrongly - in my limited knowledge of C
alignment, both "unsigned char *" and "struct object_id *" have the same
constraints, but I'm not sure.

> +	const unsigned char *index_fanout = p->index_data;
[snip]
> +	return bsearch_hash(oid->hash, (const uint32_t*)index_fanout,
> +			    index_lookup, index_lookup_width, result);

This cast to "const uint32_t *" is safe, because p->index_data points to
a mmap-ed region (which has very good alignment, as far as I know). I
wonder if we should document alignment guarantees on p->index_data, and
if yes, what guarantees to declare.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] sha1_name: use bsearch_hash() for abbreviations
  2018-03-20 22:25 ` Jonathan Tan
@ 2018-03-21 13:24   ` Derrick Stolee
  2018-03-21 22:42     ` brian m. carlson
  0 siblings, 1 reply; 12+ messages in thread
From: Derrick Stolee @ 2018-03-21 13:24 UTC (permalink / raw)
  To: Jonathan Tan, Derrick Stolee; +Cc: git, sandals

On 3/20/2018 6:25 PM, Jonathan Tan wrote:
> On Tue, 20 Mar 2018 16:03:25 -0400
> Derrick Stolee <dstolee@microsoft.com> wrote:
>
>> This patch updates the abbreviation code to use bsearch_hash() as defined
>> in [1]. It gets a nice speedup since the old implementation did not use
>> the fanout table at all.
> You can refer to the patch as:
>
>    b4e00f7306a1 ("packfile: refactor hash search with fanout table",
>    2018-02-15)
>
> Also, might be worth noting that this patch builds on
> jt/binsearch-with-fanout.

Thanks. That commit is in master and v2.17.0-rc0, so hopefully it isn't 
difficult to apply the patch.

>
>> One caveat about the patch: there is a place where I cast a sha1 hash
>> into a struct object_id pointer. This is because the abbreviation code
>> still uses 'const unsigned char *' instead of structs. I wanted to avoid
>> a hashcpy() in these calls, but perhaps that is not too heavy a cost.
> I recall a discussion that there were alignment issues with doing this,
> but I might have be remembering wrongly - in my limited knowledge of C
> alignment, both "unsigned char *" and "struct object_id *" have the same
> constraints, but I'm not sure.

Adding Brian M. Carlson in the CC line for advice on how to do this 
translation between old sha1's and new object_ids. If it isn't safe, 
then we could do a hashcpy() until the translation makes it unnecessary.

I should have compared the two methods before sending the patch, but 
running the "git log --oneline --parents" test with a hashcpy() versus a 
cast has no measurable difference in performance for Linux. Probably 
best to do the safest thing here if there is no cost to perf.

>
>> +	const unsigned char *index_fanout = p->index_data;
> [snip]
>> +	return bsearch_hash(oid->hash, (const uint32_t*)index_fanout,
>> +			    index_lookup, index_lookup_width, result);
> This cast to "const uint32_t *" is safe, because p->index_data points to
> a mmap-ed region (which has very good alignment, as far as I know). I
> wonder if we should document alignment guarantees on p->index_data, and
> if yes, what guarantees to declare.

The existing application in find_pack_entry_one() [1] does similar 
pack-index arithmetic before calling. A similar amount of arithmetic and 
alignment concerns appear in the commit-graph patch. Where is the right 
place to declare these alignment requirements?

In v2, I'll have find_pack_entry_one() call bsearch_pack() so this logic 
is not duplicated.

Thanks,
-Stolee

[1] 
https://github.com/git/git/blob/c6284da4ff4afbde8211efe5d03f3604b1c6b9d6/packfile.c#L1721-L1749
     find_pack_entry_one() in packfile.c

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] sha1_name: use bsearch_hash() for abbreviations
  2018-03-21 13:24   ` Derrick Stolee
@ 2018-03-21 22:42     ` brian m. carlson
  2018-03-22 17:40       ` [PATCH v2 0/3] Use " Derrick Stolee
  0 siblings, 1 reply; 12+ messages in thread
From: brian m. carlson @ 2018-03-21 22:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Jonathan Tan, Derrick Stolee, git

[-- Attachment #1: Type: text/plain, Size: 3837 bytes --]

On Wed, Mar 21, 2018 at 09:24:06AM -0400, Derrick Stolee wrote:
> On 3/20/2018 6:25 PM, Jonathan Tan wrote:
> > On Tue, 20 Mar 2018 16:03:25 -0400
> > Derrick Stolee <dstolee@microsoft.com> wrote:
> > > One caveat about the patch: there is a place where I cast a sha1 hash
> > > into a struct object_id pointer. This is because the abbreviation code
> > > still uses 'const unsigned char *' instead of structs. I wanted to avoid
> > > a hashcpy() in these calls, but perhaps that is not too heavy a cost.
> > I recall a discussion that there were alignment issues with doing this,
> > but I might have be remembering wrongly - in my limited knowledge of C
> > alignment, both "unsigned char *" and "struct object_id *" have the same
> > constraints, but I'm not sure.
> 
> Adding Brian M. Carlson in the CC line for advice on how to do this
> translation between old sha1's and new object_ids. If it isn't safe, then we
> could do a hashcpy() until the translation makes it unnecessary.
> 
> I should have compared the two methods before sending the patch, but running
> the "git log --oneline --parents" test with a hashcpy() versus a cast has no
> measurable difference in performance for Linux. Probably best to do the
> safest thing here if there is no cost to perf.

There is no alignment difference.  The alignment of struct object_id is
going to be the same as the underlying hash.  My concern in the past has
been strict aliasing violations, which compilers can sometimes exploit
to generate incorrect code.

However, the bigger concern tends to be that when we switch to a new
hash function, we may extend struct object_id with a hash type byte.
The current hash function transition plan certainly makes this a likely
scenario.  In such a case, a cast would end reading past the end of the
underlying array should we read the type byte.

If this isn't a performance critical path, I'd recommend simply making a
copy.  I can clean up the definition of struct min_abbrev_data in a
future series, or I can do something like the following on top of the
last series I sent, which is in next (only compile tested).  If you're
willing to wait until it hits master, you can just drop the patch in.

-- >8 --
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: "brian m. carlson" <sandals@crustytoothpaste.net>
Date: Wed, 21 Mar 2018 22:38:09 +0000
Subject: [PATCH] sha1_name: convert struct min_abbrev_data to object_id

This structure is only written to in one place, where we already have a
struct object_id.  Convert the struct to use a struct object_id instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 sha1_name.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 39e911c8ba..16e0003396 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,7 +480,7 @@ struct min_abbrev_data {
 	unsigned int init_len;
 	unsigned int cur_len;
 	char *hex;
-	const unsigned char *hash;
+	const struct object_id *oid;
 };
 
 static inline char get_hex_char_from_oid(const struct object_id *oid,
@@ -526,7 +526,7 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
 		int cmp;
 
 		current = nth_packed_object_sha1(p, mid);
-		cmp = hashcmp(mad->hash, current);
+		cmp = hashcmp(mad->oid->hash, current);
 		if (!cmp) {
 			match = 1;
 			first = mid;
@@ -603,7 +603,7 @@ int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;
-	mad.hash = oid->hash;
+	mad.oid = oid;
 
 	find_abbrev_len_packed(&mad);
 
-- >8 --
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 867 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 0/3] Use bsearch_hash() for abbreviations
  2018-03-21 22:42     ` brian m. carlson
@ 2018-03-22 17:40       ` Derrick Stolee
  2018-03-22 17:40         ` [PATCH v2 1/3] sha1_name: convert struct min_abbrev_data to object_id Derrick Stolee
                           ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Derrick Stolee @ 2018-03-22 17:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jonathantanmy, sandals, Derrick Stolee

Thanks to Jonathan and Brian for the help with the proper way to handle
OIDs and existing callers to bsearch_hash(). This patch includes one
commit that Brian sent in the previous discussion (included again here
for completeness).

Derrick Stolee (2):
  packfile: define and use bsearch_pack()
  sha1_name: use bsearch_pack() for abbreviations

brian m. carlson (1):
  sha1_name: use bsearch_hash() for abbreviations

 packfile.c  | 42 ++++++++++++++++++++++++++----------------
 packfile.h  |  8 ++++++++
 sha1_name.c | 28 ++++++----------------------
 3 files changed, 40 insertions(+), 38 deletions(-)


base-commit: 1a750441a7360b29fff7a414649ece1d35acaca6
-- 
2.17.0.rc0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/3] sha1_name: convert struct min_abbrev_data to object_id
  2018-03-22 17:40       ` [PATCH v2 0/3] Use " Derrick Stolee
@ 2018-03-22 17:40         ` Derrick Stolee
  2018-03-22 17:40         ` [PATCH v2 2/3] packfile: define and use bsearch_pack() Derrick Stolee
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2018-03-22 17:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jonathantanmy, sandals

From: "brian m. carlson" <sandals@crustytoothpaste.net>

This structure is only written to in one place, where we already have a
struct object_id.  Convert the struct to use a struct object_id instead.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
---
 sha1_name.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 39e911c8ba..16e0003396 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,7 +480,7 @@ struct min_abbrev_data {
 	unsigned int init_len;
 	unsigned int cur_len;
 	char *hex;
-	const unsigned char *hash;
+	const struct object_id *oid;
 };
 
 static inline char get_hex_char_from_oid(const struct object_id *oid,
@@ -526,7 +526,7 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
 		int cmp;
 
 		current = nth_packed_object_sha1(p, mid);
-		cmp = hashcmp(mad->hash, current);
+		cmp = hashcmp(mad->oid->hash, current);
 		if (!cmp) {
 			match = 1;
 			first = mid;
@@ -603,7 +603,7 @@ int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;
-	mad.hash = oid->hash;
+	mad.oid = oid;
 
 	find_abbrev_len_packed(&mad);
 

base-commit: 1a750441a7360b29fff7a414649ece1d35acaca6
-- 
2.17.0.rc0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/3] packfile: define and use bsearch_pack()
  2018-03-22 17:40       ` [PATCH v2 0/3] Use " Derrick Stolee
  2018-03-22 17:40         ` [PATCH v2 1/3] sha1_name: convert struct min_abbrev_data to object_id Derrick Stolee
@ 2018-03-22 17:40         ` Derrick Stolee
  2018-03-22 17:40         ` [PATCH v2 3/3] sha1_name: use bsearch_pack() for abbreviations Derrick Stolee
  2018-03-24 16:41         ` [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack() René Scharfe
  3 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2018-03-22 17:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jonathantanmy, sandals, Derrick Stolee

The method bsearch_hash() generalizes binary searches using a
fanout table. The only consumer is currently find_pack_entry_one().
It requires a bit of pointer arithmetic to align the fanout table
and the lookup table depending on the pack-index version.

Extract the pack-index pointer arithmetic to a new method,
bsearch_pack(), so this can be re-used in other code paths.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 packfile.c | 42 ++++++++++++++++++++++++++----------------
 packfile.h |  8 ++++++++
 2 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/packfile.c b/packfile.c
index f26395ecab..69d3afda6c 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1654,6 +1654,29 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 	return data;
 }
 
+int bsearch_pack(const struct object_id *oid, const struct packed_git *p, uint32_t *result)
+{
+	const unsigned char *index_fanout = p->index_data;
+	const unsigned char *index_lookup;
+	int index_lookup_width;
+
+	if (!index_fanout)
+		BUG("bsearch_pack called without a valid pack-index");
+
+	index_lookup = index_fanout + 4 * 256;
+	if (p->index_version == 1) {
+		index_lookup_width = 24;
+		index_lookup += 4;
+	} else {
+		index_lookup_width = 20;
+		index_fanout += 8;
+		index_lookup += 8;
+	}
+
+	return bsearch_hash(oid->hash, (const uint32_t*)index_fanout,
+			    index_lookup, index_lookup_width, result);
+}
+
 const unsigned char *nth_packed_object_sha1(struct packed_git *p,
 					    uint32_t n)
 {
@@ -1720,30 +1743,17 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
 off_t find_pack_entry_one(const unsigned char *sha1,
 				  struct packed_git *p)
 {
-	const uint32_t *level1_ofs = p->index_data;
 	const unsigned char *index = p->index_data;
-	unsigned stride;
+	struct object_id oid;
 	uint32_t result;
 
 	if (!index) {
 		if (open_pack_index(p))
 			return 0;
-		level1_ofs = p->index_data;
-		index = p->index_data;
-	}
-	if (p->index_version > 1) {
-		level1_ofs += 2;
-		index += 8;
-	}
-	index += 4 * 256;
-	if (p->index_version > 1) {
-		stride = 20;
-	} else {
-		stride = 24;
-		index += 4;
 	}
 
-	if (bsearch_hash(sha1, level1_ofs, index, stride, &result))
+	hashcpy(oid.hash, sha1);
+	if (bsearch_pack(&oid, p, &result))
 		return nth_packed_object_offset(p, result);
 	return 0;
 }
diff --git a/packfile.h b/packfile.h
index a7fca598d6..ec08cb2bb0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -78,6 +78,14 @@ extern struct packed_git *add_packed_git(const char *path, size_t path_len, int
  */
 extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
 
+/*
+ * Perform binary search on a pack-index for a given oid. Packfile is expected to
+ * have a valid pack-index.
+ *
+ * See 'bsearch_hash' for more information.
+ */
+int bsearch_pack(const struct object_id *oid, const struct packed_git *p, uint32_t *result);
+
 /*
  * Return the SHA-1 of the nth object within the specified packfile.
  * Open the index if it is not already open.  The return value points
-- 
2.17.0.rc0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 3/3] sha1_name: use bsearch_pack() for abbreviations
  2018-03-22 17:40       ` [PATCH v2 0/3] Use " Derrick Stolee
  2018-03-22 17:40         ` [PATCH v2 1/3] sha1_name: convert struct min_abbrev_data to object_id Derrick Stolee
  2018-03-22 17:40         ` [PATCH v2 2/3] packfile: define and use bsearch_pack() Derrick Stolee
@ 2018-03-22 17:40         ` Derrick Stolee
  2018-03-24 16:41         ` [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack() René Scharfe
  3 siblings, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2018-03-22 17:40 UTC (permalink / raw)
  To: git; +Cc: stolee, jonathantanmy, sandals, Derrick Stolee

When computing abbreviation lengths for an object ID against a single
packfile, the method find_abbrev_len_for_pack() currently implements
binary search. This is one of several implementations. One issue with
this implementation is that it ignores the fanout table in the pack-
index.

Translate this binary search to use the existing bsearch_pack() method
that correctly uses a fanout table.

Due to the use of the fanout table, the abbreviation computation is
slightly faster than before. For a fully-repacked copy of the Linux
repo, the following 'git log' commands improved:

* git log --oneline --parents --raw
  Before: 59.2s
  After:  56.9s
  Rel %:  -3.8%

* git log --oneline --parents
  Before: 6.48s
  After:  5.91s
  Rel %: -8.9%

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 24 ++++--------------------
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 16e0003396..24894b3dbe 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -512,32 +512,16 @@ static void find_abbrev_len_for_pack(struct packed_git *p,
 				     struct min_abbrev_data *mad)
 {
 	int match = 0;
-	uint32_t num, last, first = 0;
+	uint32_t num, first = 0;
 	struct object_id oid;
+	const struct object_id *mad_oid;
 
 	if (open_pack_index(p) || !p->num_objects)
 		return;
 
 	num = p->num_objects;
-	last = num;
-	while (first < last) {
-		uint32_t mid = first + (last - first) / 2;
-		const unsigned char *current;
-		int cmp;
-
-		current = nth_packed_object_sha1(p, mid);
-		cmp = hashcmp(mad->oid->hash, current);
-		if (!cmp) {
-			match = 1;
-			first = mid;
-			break;
-		}
-		if (cmp > 0) {
-			first = mid + 1;
-			continue;
-		}
-		last = mid;
-	}
+	mad_oid = mad->oid;
+	match = bsearch_pack(mad_oid, p, &first);
 
 	/*
 	 * first is now the position in the packfile where we would insert
-- 
2.17.0.rc0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack()
  2018-03-22 17:40       ` [PATCH v2 0/3] Use " Derrick Stolee
                           ` (2 preceding siblings ...)
  2018-03-22 17:40         ` [PATCH v2 3/3] sha1_name: use bsearch_pack() for abbreviations Derrick Stolee
@ 2018-03-24 16:41         ` René Scharfe
  2018-03-25 16:19           ` Junio C Hamano
  2018-03-25 18:21           ` Derrick Stolee
  3 siblings, 2 replies; 12+ messages in thread
From: René Scharfe @ 2018-03-24 16:41 UTC (permalink / raw)
  To: Derrick Stolee, git; +Cc: stolee, jonathantanmy, sandals

Replace the custom binary search in unique_in_pack() with a call to
bsearch_pack().  This reduces code duplication and makes use of the
fan-out table of packs.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
---
This is basically the same replacement as done by patch 3.  Speed is
less of a concern here -- at least I don't know a commonly used
command that needs to resolve lots of short hashes.

 sha1_name.c | 21 ++-------------------
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 24894b3dbe..0185c6081a 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -150,31 +150,14 @@ static int match_sha(unsigned len, const unsigned char *a, const unsigned char *
 static void unique_in_pack(struct packed_git *p,
 			   struct disambiguate_state *ds)
 {
-	uint32_t num, last, i, first = 0;
+	uint32_t num, i, first = 0;
 	const struct object_id *current = NULL;
 
 	if (open_pack_index(p) || !p->num_objects)
 		return;
 
 	num = p->num_objects;
-	last = num;
-	while (first < last) {
-		uint32_t mid = first + (last - first) / 2;
-		const unsigned char *current;
-		int cmp;
-
-		current = nth_packed_object_sha1(p, mid);
-		cmp = hashcmp(ds->bin_pfx.hash, current);
-		if (!cmp) {
-			first = mid;
-			break;
-		}
-		if (cmp > 0) {
-			first = mid+1;
-			continue;
-		}
-		last = mid;
-	}
+	bsearch_pack(&ds->bin_pfx, p, &first);
 
 	/*
 	 * At this point, "first" is the location of the lowest object
-- 
2.16.3

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack()
  2018-03-24 16:41         ` [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack() René Scharfe
@ 2018-03-25 16:19           ` Junio C Hamano
  2018-03-25 16:32             ` René Scharfe
  2018-03-25 18:21           ` Derrick Stolee
  1 sibling, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2018-03-25 16:19 UTC (permalink / raw)
  To: René Scharfe; +Cc: Derrick Stolee, git, stolee, jonathantanmy, sandals

René Scharfe <l.s.r@web.de> writes:

> Replace the custom binary search in unique_in_pack() with a call to
> bsearch_pack().  This reduces code duplication and makes use of the
> fan-out table of packs.
>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
> This is basically the same replacement as done by patch 3.  Speed is
> less of a concern here -- at least I don't know a commonly used
> command that needs to resolve lots of short hashes.

Looks correct.  Did you find this by eyeballing, or do you have some
interesting tool you use?

>
>  sha1_name.c | 21 ++-------------------
>  1 file changed, 2 insertions(+), 19 deletions(-)
>
> diff --git a/sha1_name.c b/sha1_name.c
> index 24894b3dbe..0185c6081a 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -150,31 +150,14 @@ static int match_sha(unsigned len, const unsigned char *a, const unsigned char *
>  static void unique_in_pack(struct packed_git *p,
>  			   struct disambiguate_state *ds)
>  {
> -	uint32_t num, last, i, first = 0;
> +	uint32_t num, i, first = 0;
>  	const struct object_id *current = NULL;
>  
>  	if (open_pack_index(p) || !p->num_objects)
>  		return;
>  
>  	num = p->num_objects;
> -	last = num;
> -	while (first < last) {
> -		uint32_t mid = first + (last - first) / 2;
> -		const unsigned char *current;
> -		int cmp;
> -
> -		current = nth_packed_object_sha1(p, mid);
> -		cmp = hashcmp(ds->bin_pfx.hash, current);
> -		if (!cmp) {
> -			first = mid;
> -			break;
> -		}
> -		if (cmp > 0) {
> -			first = mid+1;
> -			continue;
> -		}
> -		last = mid;
> -	}
> +	bsearch_pack(&ds->bin_pfx, p, &first);
>  
>  	/*
>  	 * At this point, "first" is the location of the lowest object

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack()
  2018-03-25 16:19           ` Junio C Hamano
@ 2018-03-25 16:32             ` René Scharfe
  0 siblings, 0 replies; 12+ messages in thread
From: René Scharfe @ 2018-03-25 16:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, git, stolee, jonathantanmy, sandals

Am 25.03.2018 um 18:19 schrieb Junio C Hamano:
> René Scharfe <l.s.r@web.de> writes:
> 
>> Replace the custom binary search in unique_in_pack() with a call to
>> bsearch_pack().  This reduces code duplication and makes use of the
>> fan-out table of packs.
>>
>> Signed-off-by: Rene Scharfe <l.s.r@web.de>
>> ---
>> This is basically the same replacement as done by patch 3.  Speed is
>> less of a concern here -- at least I don't know a commonly used
>> command that needs to resolve lots of short hashes.
> 
> Looks correct.  Did you find this by eyeballing, or do you have some
> interesting tool you use?

I was looking for SHA1 binary searches using something like this:

	git grep -e '/ 2' -e hashcmp -W --all-match

René

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack()
  2018-03-24 16:41         ` [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack() René Scharfe
  2018-03-25 16:19           ` Junio C Hamano
@ 2018-03-25 18:21           ` Derrick Stolee
  1 sibling, 0 replies; 12+ messages in thread
From: Derrick Stolee @ 2018-03-25 18:21 UTC (permalink / raw)
  To: René Scharfe, Derrick Stolee, git; +Cc: jonathantanmy, sandals

On 3/24/2018 12:41 PM, René Scharfe wrote:
> Replace the custom binary search in unique_in_pack() with a call to
> bsearch_pack().  This reduces code duplication and makes use of the
> fan-out table of packs.
>
> Signed-off-by: Rene Scharfe <l.s.r@web.de>
> ---
> This is basically the same replacement as done by patch 3.  Speed is
> less of a concern here -- at least I don't know a commonly used
> command that needs to resolve lots of short hashes.

Thanks, René! Good teamwork on this patch series.

Reviewed-by: Derrick Stolee <dstolee@microsoft.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2018-03-25 18:21 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-20 20:03 [PATCH] sha1_name: use bsearch_hash() for abbreviations Derrick Stolee
2018-03-20 22:25 ` Jonathan Tan
2018-03-21 13:24   ` Derrick Stolee
2018-03-21 22:42     ` brian m. carlson
2018-03-22 17:40       ` [PATCH v2 0/3] Use " Derrick Stolee
2018-03-22 17:40         ` [PATCH v2 1/3] sha1_name: convert struct min_abbrev_data to object_id Derrick Stolee
2018-03-22 17:40         ` [PATCH v2 2/3] packfile: define and use bsearch_pack() Derrick Stolee
2018-03-22 17:40         ` [PATCH v2 3/3] sha1_name: use bsearch_pack() for abbreviations Derrick Stolee
2018-03-24 16:41         ` [PATCH 4/3] sha1_name: use bsearch_pack() in unique_in_pack() René Scharfe
2018-03-25 16:19           ` Junio C Hamano
2018-03-25 16:32             ` René Scharfe
2018-03-25 18:21           ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).