git@vger.kernel.org mailing list mirror (one of many)
 help / Atom feed
* [PATCH v4 0/4] Improve abbreviation disambiguation
@ 2017-10-08 18:49 Derrick Stolee
  2017-10-08 18:49 ` [PATCH v4 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-08 18:49 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, peff, ramsay, sbeller, Derrick Stolee

Changes since previous version:

* Fixed an overflow error in the binary search. I sent a separate patch
  to fix this error in existing searches; that patch should be applied
  before this one.

* Removed test-list-objects and test-abbrev in favor of a new git log
  test in p4211-line-log.sh. Limited perf numbers for Linux repo are
  given in cover letter and commit 4/4.

* Silently skip packfiles that fail to open with open_pack_index()

Thanks for all the comments from Jeff, Junio, Ramsey, and Stefan!

Thanks,
 Stolee

---

When displaying object ids, we frequently want to see an abbreviation
for easier typing. That abbreviation must be unambiguous among all
object ids.

The current implementation of find_unique_abbrev() performs a loop
checking if each abbreviation length is unambiguous until finding one
that works. This causes multiple round-trips to the disk when starting
with the default abbreviation length (usually 7) but needing up to 12
characters for an unambiguous short-sha. For very large repos, this
effect is pronounced and causes issues with several commands, from
obvious consumers `status` and `log` to less obvious commands such as
`fetch` and `push`.

This patch improves performance by iterating over objects matching the
short abbreviation only once, inspecting each object id, and reporting
the minimum length of an unambiguous abbreviation.

Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time
computing diffs. Below we report performance statistics for perf test
4211.6 from p4211-line-log.sh using three copies of the Linux repo:

| Packs | Loose  | Base Time | New Time | Rel%  |
|-------|--------|-----------|----------|-------|
|  1    |      0 |   41.27 s |  38.93 s | -4.8% |
| 24    |      0 |   98.04 s |  91.35 s | -5.7% |
| 23    | 323952 |  117.78 s | 112.18 s | -4.8% |

Derrick Stolee (4):
  p4211-line-log.sh: add log --online --raw --parents perf test
  sha1_name: Unroll len loop in find_unique_abbrev_r
  sha1_name: Parse less while finding common prefix
  sha1_name: Minimize OID comparisons during disambiguation

 sha1_name.c              | 129 +++++++++++++++++++++++++++++++++++++++++------
 t/perf/p4211-line-log.sh |   4 ++
 2 files changed, 118 insertions(+), 15 deletions(-)

-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 1/4] p4211-line-log.sh: add log --online --raw --parents perf test
  2017-10-08 18:49 [PATCH v4 0/4] Improve abbreviation disambiguation Derrick Stolee
@ 2017-10-08 18:49 ` Derrick Stolee
  2017-10-08 18:49 ` [PATCH v4 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-08 18:49 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, peff, ramsay, sbeller, Derrick Stolee

Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time computing
diffs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p4211-line-log.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/perf/p4211-line-log.sh b/t/perf/p4211-line-log.sh
index b7ff68d4f..e0ed05907 100755
--- a/t/perf/p4211-line-log.sh
+++ b/t/perf/p4211-line-log.sh
@@ -31,4 +31,8 @@ test_perf 'git log -L (renames on)' '
 	git log -M -L 1:"$file" >/dev/null
 '
 
+test_perf 'git log --oneline --raw --parents' '
+	git log --oneline --raw --parents >/dev/null
+'
+
 test_done
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 2/4] sha1_name: unroll len loop in find_unique_abbrev_r
  2017-10-08 18:49 [PATCH v4 0/4] Improve abbreviation disambiguation Derrick Stolee
  2017-10-08 18:49 ` [PATCH v4 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
@ 2017-10-08 18:49 ` Derrick Stolee
  2017-10-08 18:49 ` [PATCH v4 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
  2017-10-08 18:49 ` [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee
  3 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-08 18:49 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, peff, ramsay, sbeller, Derrick Stolee

Unroll the while loop inside find_unique_abbrev_r to avoid iterating
through all loose objects and packfiles multiple times when the short
name is longer than the predicted length.

Instead, inspect each object that collides with the estimated
abbreviation to find the longest common prefix.

The focus of this change is to refactor the existing method in a way
that clearly does not change the current behavior. In some cases, the
new method is slower than the previous method. Later changes will
correct all performance loss.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 57 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 134ac9742..f2a1ebe49 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -474,10 +474,32 @@ static unsigned msb(unsigned long val)
 	return r;
 }
 
-int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+struct min_abbrev_data {
+	unsigned int init_len;
+	unsigned int cur_len;
+	char *hex;
+};
+
+static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
-	int status, exists;
+	struct min_abbrev_data *mad = cb_data;
+
+	char *hex = oid_to_hex(oid);
+	unsigned int i = mad->init_len;
+	while (mad->hex[i] && mad->hex[i] == hex[i])
+		i++;
+
+	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
+		mad->cur_len = i + 1;
+
+	return 0;
+}
 
+int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+{
+	struct disambiguate_state ds;
+	struct min_abbrev_data mad;
+	struct object_id oid_ret;
 	if (len < 0) {
 		unsigned long count = approximate_object_count();
 		/*
@@ -503,19 +525,24 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	sha1_to_hex_r(hex, sha1);
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
-	exists = has_sha1_file(sha1);
-	while (len < GIT_SHA1_HEXSZ) {
-		struct object_id oid_ret;
-		status = get_short_oid(hex, len, &oid_ret, GET_OID_QUIETLY);
-		if (exists
-		    ? !status
-		    : status == SHORT_NAME_NOT_FOUND) {
-			hex[len] = 0;
-			return len;
-		}
-		len++;
-	}
-	return len;
+
+	if (init_object_disambiguation(hex, len, &ds) < 0)
+		return -1;
+
+	mad.init_len = len;
+	mad.cur_len = len;
+	mad.hex = hex;
+
+	ds.fn = extend_abbrev_len;
+	ds.always_call_fn = 1;
+	ds.cb_data = (void*)&mad;
+
+	find_short_object_filename(&ds);
+	find_short_packed_object(&ds);
+	(void)finish_object_disambiguation(&ds, &oid_ret);
+
+	hex[mad.cur_len] = 0;
+	return mad.cur_len;
 }
 
 const char *find_unique_abbrev(const unsigned char *sha1, int len)
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 3/4] sha1_name: parse less while finding common prefix
  2017-10-08 18:49 [PATCH v4 0/4] Improve abbreviation disambiguation Derrick Stolee
  2017-10-08 18:49 ` [PATCH v4 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
  2017-10-08 18:49 ` [PATCH v4 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
@ 2017-10-08 18:49 ` Derrick Stolee
  2017-10-09 13:42   ` Jeff King
  2017-10-08 18:49 ` [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee
  3 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2017-10-08 18:49 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, peff, ramsay, sbeller, Derrick Stolee

Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index f2a1ebe49..5081aeb71 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,13 +480,23 @@ struct min_abbrev_data {
 	char *hex;
 };
 
+static inline char get_hex_char_from_oid(const struct object_id *oid,
+					 int pos)
+{
+	static const char hex[] = "0123456789abcdef";
+
+	if ((pos & 1) == 0)
+		return hex[oid->hash[pos >> 1] >> 4];
+	else
+		return hex[oid->hash[pos >> 1] & 0xf];
+}
+
 static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
 	struct min_abbrev_data *mad = cb_data;
 
-	char *hex = oid_to_hex(oid);
 	unsigned int i = mad->init_len;
-	while (mad->hex[i] && mad->hex[i] == hex[i])
+	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
 		i++;
 
 	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-08 18:49 [PATCH v4 0/4] Improve abbreviation disambiguation Derrick Stolee
                   ` (2 preceding siblings ...)
  2017-10-08 18:49 ` [PATCH v4 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
@ 2017-10-08 18:49 ` Derrick Stolee
  2017-10-09 13:49   ` Jeff King
  3 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2017-10-08 18:49 UTC (permalink / raw)
  To: git; +Cc: gitster, stolee, peff, ramsay, sbeller, Derrick Stolee

Minimize OID comparisons during disambiguatiosn of packfile OIDs.

Teach git to use binary search with the full OID to find the object's
position (or insertion position, if not present) in the pack-index.
The object before and immediately after (or the one at the insertion
position) give the maximum common prefix.  No subsequent linear search
is required.

Take care of which two to inspect, in case the object id exists in the
packfile.

If the input to find_unique_abbrev_r() is a partial prefix, then the
OID used for the binary search is padded with zeroes so the object will
not exist in the repo (with high probability) and the same logic
applies.

This commit completes a series of three changes to OID abbreviation
code, and the overall change can be seen using standard commands for
large repos. Below we report performance statistics for perf test 4211.6
from p4211-line-log.sh using three copies of the Linux repo:

| Packs | Loose  | HEAD~3   | HEAD     | Rel%  |
|-------|--------|----------|----------|-------|
|  1    |      0 |  41.27 s |  38.93 s | -4.8% |
| 24    |      0 |  98.04 s |  91.35 s | -5.7% |
| 23    | 323952 | 117.78 s | 112.18 s | -4.8% |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 5081aeb71..49ba67955 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -478,6 +478,7 @@ struct min_abbrev_data {
 	unsigned int init_len;
 	unsigned int cur_len;
 	char *hex;
+	const unsigned char *hash;
 };
 
 static inline char get_hex_char_from_oid(const struct object_id *oid,
@@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 	return 0;
 }
 
+static void find_abbrev_len_for_pack(struct packed_git *p,
+				     struct min_abbrev_data *mad)
+{
+	int match = 0;
+	uint32_t num, last, first = 0;
+	struct object_id oid;
+
+	open_pack_index(p);
+	num = p->num_objects;
+	last = num;
+	while (first < last) {
+		uint32_t mid = first + (last - first) / 2;
+		const unsigned char *current;
+		int cmp;
+
+		current = nth_packed_object_sha1(p, mid);
+		cmp = hashcmp(mad->hash, current);
+		if (!cmp) {
+			match = 1;
+			first = mid;
+			break;
+		}
+		if (cmp > 0) {
+			first = mid + 1;
+			continue;
+		}
+		last = mid;
+	}
+
+	/*
+	 * first is now the position in the packfile where we would insert
+	 * mad->hash if it does not exist (or the position of mad->hash if
+	 * it does exist). Hence, we consider a maximum of three objects
+	 * nearby for the abbreviation length.
+	 */
+	mad->init_len = 0;
+	if (!match) {
+		nth_packed_object_oid(&oid, p, first);
+		extend_abbrev_len(&oid, mad);
+	} else if (first < num - 1) {
+		nth_packed_object_oid(&oid, p, first + 1);
+		extend_abbrev_len(&oid, mad);
+	}
+	if (first > 0) {
+		nth_packed_object_oid(&oid, p, first - 1);
+		extend_abbrev_len(&oid, mad);
+	}
+	mad->init_len = mad->cur_len;
+}
+
+static void find_abbrev_len_packed(struct min_abbrev_data *mad)
+{
+	struct packed_git *p;
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next)
+		find_abbrev_len_for_pack(p, mad);
+}
+
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	struct disambiguate_state ds;
@@ -536,19 +596,21 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
 
-	if (init_object_disambiguation(hex, len, &ds) < 0)
-		return -1;
-
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;
+	mad.hash = sha1;
+
+	find_abbrev_len_packed(&mad);
+
+	if (init_object_disambiguation(hex, mad.cur_len, &ds) < 0)
+		return -1;
 
 	ds.fn = extend_abbrev_len;
 	ds.always_call_fn = 1;
 	ds.cb_data = (void*)&mad;
 
 	find_short_object_filename(&ds);
-	find_short_packed_object(&ds);
 	(void)finish_object_disambiguation(&ds, &oid_ret);
 
 	hex[mad.cur_len] = 0;
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 3/4] sha1_name: parse less while finding common prefix
  2017-10-08 18:49 ` [PATCH v4 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
@ 2017-10-09 13:42   ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2017-10-09 13:42 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, gitster, stolee, ramsay, sbeller

On Sun, Oct 08, 2017 at 02:49:41PM -0400, Derrick Stolee wrote:

> +static inline char get_hex_char_from_oid(const struct object_id *oid,
> +					 int pos)
> +{
> +	static const char hex[] = "0123456789abcdef";
> +
> +	if ((pos & 1) == 0)
> +		return hex[oid->hash[pos >> 1] >> 4];
> +	else
> +		return hex[oid->hash[pos >> 1] & 0xf];
> +}

Should "pos" be unsigned? I don't think it matters much in practice (as
long as it's not negative, the results are well defined by the standard,
and this clearly will be between 0 and 40). But it seems funny that we
consistently use unsigned in the rest of the caller and then implicitly
convert to signed here.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-08 18:49 ` [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee
@ 2017-10-09 13:49   ` Jeff King
  2017-10-10 12:16     ` Derrick Stolee
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2017-10-09 13:49 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, gitster, stolee, ramsay, sbeller

On Sun, Oct 08, 2017 at 02:49:42PM -0400, Derrick Stolee wrote:

> @@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
>  	return 0;
>  }
>  
> +static void find_abbrev_len_for_pack(struct packed_git *p,
> +				     struct min_abbrev_data *mad)
> +{
> +	int match = 0;
> +	uint32_t num, last, first = 0;
> +	struct object_id oid;
> +
> +	open_pack_index(p);
> +	num = p->num_objects;
> +	last = num;
> +	while (first < last) {
> [...]

Your cover letter lists:

  * Silently skip packfiles that fail to open with open_pack_index()

as a change from the previous version. But this looks the same as the
last round. I think this _does_ end up skipping such packfiles because
p->num_objects will be zero. Is it worth having a comment to that
effect (or even just an early return) to make it clear that the
situation is intentional?

Although...

> +	/*
> +	 * first is now the position in the packfile where we would insert
> +	 * mad->hash if it does not exist (or the position of mad->hash if
> +	 * it does exist). Hence, we consider a maximum of three objects
> +	 * nearby for the abbreviation length.
> +	 */
> +	mad->init_len = 0;
> +	if (!match) {
> +		nth_packed_object_oid(&oid, p, first);
> +		extend_abbrev_len(&oid, mad);

If we have zero objects in the pack, what would nth_packed_object_oid()
be returning here?

So I actually think we do want an early return, not just when
open_packed_index() fails, but also when p->num_objects is zero.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-09 13:49   ` Jeff King
@ 2017-10-10 12:16     ` Derrick Stolee
  2017-10-10 12:36       ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2017-10-10 12:16 UTC (permalink / raw)
  To: Jeff King, Derrick Stolee; +Cc: git, gitster, ramsay, sbeller

On 10/9/2017 9:49 AM, Jeff King wrote:
> On Sun, Oct 08, 2017 at 02:49:42PM -0400, Derrick Stolee wrote:
>
>> @@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
>>   	return 0;
>>   }
>>   
>> +static void find_abbrev_len_for_pack(struct packed_git *p,
>> +				     struct min_abbrev_data *mad)
>> +{
>> +	int match = 0;
>> +	uint32_t num, last, first = 0;
>> +	struct object_id oid;
>> +
>> +	open_pack_index(p);
>> +	num = p->num_objects;
>> +	last = num;
>> +	while (first < last) {
>> [...]
> Your cover letter lists:
>
>    * Silently skip packfiles that fail to open with open_pack_index()
>
> as a change from the previous version. But this looks the same as the
> last round. I think this _does_ end up skipping such packfiles because
> p->num_objects will be zero. Is it worth having a comment to that
> effect (or even just an early return) to make it clear that the
> situation is intentional?
>
> Although...
>
>> +	/*
>> +	 * first is now the position in the packfile where we would insert
>> +	 * mad->hash if it does not exist (or the position of mad->hash if
>> +	 * it does exist). Hence, we consider a maximum of three objects
>> +	 * nearby for the abbreviation length.
>> +	 */
>> +	mad->init_len = 0;
>> +	if (!match) {
>> +		nth_packed_object_oid(&oid, p, first);
>> +		extend_abbrev_len(&oid, mad);
> If we have zero objects in the pack, what would nth_packed_object_oid()
> be returning here?
>
> So I actually think we do want an early return, not just when
> open_packed_index() fails, but also when p->num_objects is zero.
>
> -Peff

Sorry about this. I caught this while I was writing my cover letter and 
amended my last commit to include the following:

     if (open_pack_index(p))
         return;

After I amended the commit, I forgot to 'format-patch' again. I can send 
a diff between the commits after review has calmed.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-10 12:16     ` Derrick Stolee
@ 2017-10-10 12:36       ` Jeff King
  2017-10-10 12:56         ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2017-10-10 12:36 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee, git, gitster, ramsay, sbeller

On Tue, Oct 10, 2017 at 08:16:27AM -0400, Derrick Stolee wrote:

> > > +	mad->init_len = 0;
> > > +	if (!match) {
> > > +		nth_packed_object_oid(&oid, p, first);
> > > +		extend_abbrev_len(&oid, mad);
> > If we have zero objects in the pack, what would nth_packed_object_oid()
> > be returning here?
> > 
> > So I actually think we do want an early return, not just when
> > open_packed_index() fails, but also when p->num_objects is zero.
> > 
> > -Peff
> 
> Sorry about this. I caught this while I was writing my cover letter and
> amended my last commit to include the following:
> 
>     if (open_pack_index(p))
>         return;
> 
> After I amended the commit, I forgot to 'format-patch' again. I can send a
> diff between the commits after review has calmed.

OK, I think that makes more sense. But note the p->num_objects thing I
mentioned. If I do:

  git pack-objects .git/objects/pack/pack </dev/null

then I have a pack with zero objects, which I think we'd similarly want
to return early from. I.e., I think we need:

  if (p->num_objects)
	return;

Technically that also covers open_pack_index() failure, too, but that's
a subtlety I don't think we should rely on.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-10 12:36       ` Jeff King
@ 2017-10-10 12:56         ` Junio C Hamano
  2017-10-10 13:09           ` Jeff King
  2017-10-10 13:11           ` Derrick Stolee
  0 siblings, 2 replies; 22+ messages in thread
From: Junio C Hamano @ 2017-10-10 12:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Derrick Stolee, Derrick Stolee, git, ramsay, sbeller

Jeff King <peff@peff.net> writes:

> OK, I think that makes more sense. But note the p->num_objects thing I
> mentioned. If I do:
>
>   git pack-objects .git/objects/pack/pack </dev/null
>
> then I have a pack with zero objects, which I think we'd similarly want
> to return early from. I.e., I think we need:
>
>   if (p->num_objects)
> 	return;
>
> Technically that also covers open_pack_index() failure, too, but that's
> a subtlety I don't think we should rely on.

True.  I notice that the early part of the two functions look almost
identical.  Do we need error condition handling for the other one,
too?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-10 12:56         ` Junio C Hamano
@ 2017-10-10 13:09           ` Jeff King
  2017-10-10 13:11           ` Derrick Stolee
  1 sibling, 0 replies; 22+ messages in thread
From: Jeff King @ 2017-10-10 13:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, Derrick Stolee, git, ramsay, sbeller

On Tue, Oct 10, 2017 at 09:56:38PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > OK, I think that makes more sense. But note the p->num_objects thing I
> > mentioned. If I do:
> >
> >   git pack-objects .git/objects/pack/pack </dev/null
> >
> > then I have a pack with zero objects, which I think we'd similarly want
> > to return early from. I.e., I think we need:
> >
> >   if (p->num_objects)
> > 	return;
> >
> > Technically that also covers open_pack_index() failure, too, but that's
> > a subtlety I don't think we should rely on.
> 
> True.  I notice that the early part of the two functions look almost
> identical.  Do we need error condition handling for the other one,
> too?

I'm not sure which two you mean. Do you mean find_pack_entry_one() in
packfile.c as the other one? If so, I think it is fine in the
zero-object case, because it does not do the "this is the sha1 at the
position where it _would_ be found" trick, which is what causes us to
potentially dereference nonsense.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-10 12:56         ` Junio C Hamano
  2017-10-10 13:09           ` Jeff King
@ 2017-10-10 13:11           ` Derrick Stolee
  2017-10-10 13:30             ` Jeff King
  1 sibling, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2017-10-10 13:11 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: Derrick Stolee, git, ramsay, sbeller

On 10/10/2017 8:56 AM, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
>
>> OK, I think that makes more sense. But note the p->num_objects thing I
>> mentioned. If I do:
>>
>>    git pack-objects .git/objects/pack/pack </dev/null
>>
>> then I have a pack with zero objects, which I think we'd similarly want
>> to return early from. I.e., I think we need:
>>
>>    if (p->num_objects)
>> 	return;
>>
>> Technically that also covers open_pack_index() failure, too, but that's
>> a subtlety I don't think we should rely on.
> True.  I notice that the early part of the two functions look almost
> identical.  Do we need error condition handling for the other one,
> too?

I prefer to fix the problem in all code clones when they cause review 
friction, so I'll send a fifth commit showing just the diff for these 
packfile issues in sha1_name.c. See patch below.

Should open_pack_index() return a non-zero status if the packfile is 
empty? Or, is there a meaningful reason to have empty packfiles?

Thanks,
-Stolee


diff --git a/sha1_name.c b/sha1_name.c
index 49ba67955..9f8a33e82 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -153,7 +153,9 @@ static void unique_in_pack(struct packed_git *p,
         uint32_t num, last, i, first = 0;
         const struct object_id *current = NULL;

-       open_pack_index(p);
+       if (open_pack_index(p) || !p->num_objects)
+               return;
+
         num = p->num_objects;
         last = num;
         while (first < last) {
@@ -513,7 +515,9 @@ static void find_abbrev_len_for_pack(struct 
packed_git *p,
         uint32_t num, last, first = 0;
         struct object_id oid;

-       open_pack_index(p);
+       if (open_pack_index(p) || !p->num_objects)
+               return;
+
         num = p->num_objects;
         last = num;
         while (first < last) {


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-10 13:11           ` Derrick Stolee
@ 2017-10-10 13:30             ` Jeff King
  2017-10-11 13:58               ` Derrick Stolee
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2017-10-10 13:30 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Junio C Hamano, Derrick Stolee, git, ramsay, sbeller

On Tue, Oct 10, 2017 at 09:11:15AM -0400, Derrick Stolee wrote:

> On 10/10/2017 8:56 AM, Junio C Hamano wrote:
> > Jeff King <peff@peff.net> writes:
> > 
> > > OK, I think that makes more sense. But note the p->num_objects thing I
> > > mentioned. If I do:
> > > 
> > >    git pack-objects .git/objects/pack/pack </dev/null
> > > 
> > > then I have a pack with zero objects, which I think we'd similarly want
> > > to return early from. I.e., I think we need:
> > > 
> > >    if (p->num_objects)
> > > 	return;
> > > 
> > > Technically that also covers open_pack_index() failure, too, but that's
> > > a subtlety I don't think we should rely on.
> > True.  I notice that the early part of the two functions look almost
> > identical.  Do we need error condition handling for the other one,
> > too?
> 
> I prefer to fix the problem in all code clones when they cause review
> friction, so I'll send a fifth commit showing just the diff for these
> packfile issues in sha1_name.c. See patch below.

Ah, that answers my earlier question. Junio mean unique_in_pack(). And
yeah, I think it suffers from the same problem.

> Should open_pack_index() return a non-zero status if the packfile is empty?
> Or, is there a meaningful reason to have empty packfiles?

I can't think of a compelling reason to have an empty packfile. But nor
do I think we should consider them an error when we can handle them
quietly (and returning non-zero status would cause Git to complain on
many operations in a repository that has such a file).

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-10 13:30             ` Jeff King
@ 2017-10-11 13:58               ` Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 0/4] Improve abbreviation " Derrick Stolee
                                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-11 13:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Derrick Stolee, git, ramsay, sbeller

On 10/10/2017 9:30 AM, Jeff King wrote:
> On Tue, Oct 10, 2017 at 09:11:15AM -0400, Derrick Stolee wrote:
>
>> On 10/10/2017 8:56 AM, Junio C Hamano wrote:
>>> Jeff King <peff@peff.net> writes:
>>>
>>>> OK, I think that makes more sense. But note the p->num_objects thing I
>>>> mentioned. If I do:
>>>>
>>>>     git pack-objects .git/objects/pack/pack </dev/null
>>>>
>>>> then I have a pack with zero objects, which I think we'd similarly want
>>>> to return early from. I.e., I think we need:
>>>>
>>>>     if (p->num_objects)
>>>> 	return;
>>>>
>>>> Technically that also covers open_pack_index() failure, too, but that's
>>>> a subtlety I don't think we should rely on.
>>> True.  I notice that the early part of the two functions look almost
>>> identical.  Do we need error condition handling for the other one,
>>> too?
>> I prefer to fix the problem in all code clones when they cause review
>> friction, so I'll send a fifth commit showing just the diff for these
>> packfile issues in sha1_name.c. See patch below.
> Ah, that answers my earlier question. Junio mean unique_in_pack(). And
> yeah, I think it suffers from the same problem.
>
>> Should open_pack_index() return a non-zero status if the packfile is empty?
>> Or, is there a meaningful reason to have empty packfiles?
> I can't think of a compelling reason to have an empty packfile. But nor
> do I think we should consider them an error when we can handle them
> quietly (and returning non-zero status would cause Git to complain on
> many operations in a repository that has such a file).
>
> -Peff

Thanks for the comments. I found some typos in my commit messages, too.

I plan to send a (hopefully) final version tomorrow (Thursday). It will 
include:

* Make 'pos' unsigned in get_hex_char_from_oid()

* Check response from open_pack_index()

* Small typos in commit messages

If there are other issues, then please let me know.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 0/4] Improve abbreviation disambiguation
  2017-10-11 13:58               ` Derrick Stolee
@ 2017-10-12 12:02                 ` " Derrick Stolee
  2017-10-12 12:04                   ` Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
                                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2017-10-12 12:02 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, stolee, Derrick Stolee

Changes since previous version:

* Make 'pos' unsigned in get_hex_char_from_oid()

* Check response from open_pack_index()

* Small typos in commit messages 

Thanks,
 Stolee

---

When displaying object ids, we frequently want to see an abbreviation
for easier typing. That abbreviation must be unambiguous among all
object ids.

The current implementation of find_unique_abbrev() performs a loop
checking if each abbreviation length is unambiguous until finding one
that works. This causes multiple round-trips to the disk when starting
with the default abbreviation length (usually 7) but needing up to 12
characters for an unambiguous short-sha. For very large repos, this
effect is pronounced and causes issues with several commands, from
obvious consumers `status` and `log` to less obvious commands such as
`fetch` and `push`.

This patch improves performance by iterating over objects matching the
short abbreviation only once, inspecting each object id, and reporting
the minimum length of an unambiguous abbreviation.

Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time
computing diffs. Below we report performance statistics for perf test
4211.6 from p4211-line-log.sh using three copies of the Linux repo:

| Packs | Loose  | Base Time | New Time | Rel%  |
|-------|--------|-----------|----------|-------|
|  1    |      0 |   41.27 s |  38.93 s | -4.8% |
| 24    |      0 |   98.04 s |  91.35 s | -5.7% |
| 23    | 323952 |  117.78 s | 112.18 s | -4.8% |

Derrick Stolee (4):
  p4211-line-log.sh: add log --online --raw --parents perf test
  sha1_name: unroll len loop in find_unique_abbrev_r
  sha1_name: parse less while finding common prefix
  sha1_name: minimize OID comparisons during disambiguation
 sha1_name.c              | 135 +++++++++++++++++++++++++++++++++++++++++------
 t/perf/p4211-line-log.sh |   4 ++
 2 files changed, 123 insertions(+), 16 deletions(-)

-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 1/4] p4211-line-log.sh: add log --online --raw --parents perf test
  2017-10-11 13:58               ` Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 0/4] Improve abbreviation " Derrick Stolee
@ 2017-10-12 12:02                 ` Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-12 12:02 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, stolee, Derrick Stolee

Add a new perf test for testing the performance of log while computing
OID abbreviations. Using --oneline --raw and --parents options maximizes
the number of OIDs to abbreviate while still spending some time computing
diffs.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/perf/p4211-line-log.sh | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/t/perf/p4211-line-log.sh b/t/perf/p4211-line-log.sh
index b7ff68d4f..e0ed05907 100755
--- a/t/perf/p4211-line-log.sh
+++ b/t/perf/p4211-line-log.sh
@@ -31,4 +31,8 @@ test_perf 'git log -L (renames on)' '
 	git log -M -L 1:"$file" >/dev/null
 '
 
+test_perf 'git log --oneline --raw --parents' '
+	git log --oneline --raw --parents >/dev/null
+'
+
 test_done
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 2/4] sha1_name: unroll len loop in find_unique_abbrev_r
  2017-10-11 13:58               ` Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 0/4] Improve abbreviation " Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
@ 2017-10-12 12:02                 ` Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee
  4 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-12 12:02 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, stolee, Derrick Stolee

Unroll the while loop inside find_unique_abbrev_r to avoid iterating
through all loose objects and packfiles multiple times when the short
name is longer than the predicted length.

Instead, inspect each object that collides with the estimated
abbreviation to find the longest common prefix.

The focus of this change is to refactor the existing method in a way
that clearly does not change the current behavior. In some cases, the
new method is slower than the previous method. Later changes will
correct all performance loss.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 57 ++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 42 insertions(+), 15 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index c7c5ab376..19603713f 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -474,10 +474,32 @@ static unsigned msb(unsigned long val)
 	return r;
 }
 
-int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+struct min_abbrev_data {
+	unsigned int init_len;
+	unsigned int cur_len;
+	char *hex;
+};
+
+static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
-	int status, exists;
+	struct min_abbrev_data *mad = cb_data;
+
+	char *hex = oid_to_hex(oid);
+	unsigned int i = mad->init_len;
+	while (mad->hex[i] && mad->hex[i] == hex[i])
+		i++;
+
+	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
+		mad->cur_len = i + 1;
+
+	return 0;
+}
 
+int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
+{
+	struct disambiguate_state ds;
+	struct min_abbrev_data mad;
+	struct object_id oid_ret;
 	if (len < 0) {
 		unsigned long count = approximate_object_count();
 		/*
@@ -503,19 +525,24 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	sha1_to_hex_r(hex, sha1);
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
-	exists = has_sha1_file(sha1);
-	while (len < GIT_SHA1_HEXSZ) {
-		struct object_id oid_ret;
-		status = get_short_oid(hex, len, &oid_ret, GET_OID_QUIETLY);
-		if (exists
-		    ? !status
-		    : status == SHORT_NAME_NOT_FOUND) {
-			hex[len] = 0;
-			return len;
-		}
-		len++;
-	}
-	return len;
+
+	if (init_object_disambiguation(hex, len, &ds) < 0)
+		return -1;
+
+	mad.init_len = len;
+	mad.cur_len = len;
+	mad.hex = hex;
+
+	ds.fn = extend_abbrev_len;
+	ds.always_call_fn = 1;
+	ds.cb_data = (void*)&mad;
+
+	find_short_object_filename(&ds);
+	find_short_packed_object(&ds);
+	(void)finish_object_disambiguation(&ds, &oid_ret);
+
+	hex[mad.cur_len] = 0;
+	return mad.cur_len;
 }
 
 const char *find_unique_abbrev(const unsigned char *sha1, int len)
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 3/4] sha1_name: parse less while finding common prefix
  2017-10-11 13:58               ` Derrick Stolee
                                   ` (2 preceding siblings ...)
  2017-10-12 12:02                 ` [PATCH v5 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
@ 2017-10-12 12:02                 ` Derrick Stolee
  2017-10-12 12:02                 ` [PATCH v5 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee
  4 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-12 12:02 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, stolee, Derrick Stolee

Create get_hex_char_from_oid() to parse oids one hex character at a
time. This prevents unnecessary copying of hex characters in
extend_abbrev_len() when finding the length of a common prefix.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 19603713f..fdd2711a6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -480,13 +480,23 @@ struct min_abbrev_data {
 	char *hex;
 };
 
+static inline char get_hex_char_from_oid(const struct object_id *oid,
+					 unsigned int pos)
+{
+	static const char hex[] = "0123456789abcdef";
+
+	if ((pos & 1) == 0)
+		return hex[oid->hash[pos >> 1] >> 4];
+	else
+		return hex[oid->hash[pos >> 1] & 0xf];
+}
+
 static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 {
 	struct min_abbrev_data *mad = cb_data;
 
-	char *hex = oid_to_hex(oid);
 	unsigned int i = mad->init_len;
-	while (mad->hex[i] && mad->hex[i] == hex[i])
+	while (mad->hex[i] && mad->hex[i] == get_hex_char_from_oid(oid, i))
 		i++;
 
 	if (i < GIT_MAX_RAWSZ && i >= mad->cur_len)
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v5 4/4] sha1_name: minimize OID comparisons during disambiguation
  2017-10-11 13:58               ` Derrick Stolee
                                   ` (3 preceding siblings ...)
  2017-10-12 12:02                 ` [PATCH v5 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
@ 2017-10-12 12:02                 ` Derrick Stolee
  4 siblings, 0 replies; 22+ messages in thread
From: Derrick Stolee @ 2017-10-12 12:02 UTC (permalink / raw)
  To: git; +Cc: peff, gitster, stolee, Derrick Stolee

Minimize OID comparisons during disambiguation of packfile OIDs.

Teach git to use binary search with the full OID to find the object's
position (or insertion position, if not present) in the pack-index.
The object before and immediately after (or the one at the insertion
position) give the maximum common prefix.  No subsequent linear search
is required.

Take care of which two to inspect, in case the object id exists in the
packfile.

If the input to find_unique_abbrev_r() is a partial prefix, then the
OID used for the binary search is padded with zeroes so the object will
not exist in the repo (with high probability) and the same logic
applies.

This commit completes a series of three changes to OID abbreviation
code, and the overall change can be seen using standard commands for
large repos. Below we report performance statistics for perf test 4211.6
from p4211-line-log.sh using three copies of the Linux repo:

| Packs | Loose  | HEAD~3   | HEAD     | Rel%  |
|-------|--------|----------|----------|-------|
|  1    |      0 |  41.27 s |  38.93 s | -4.8% |
| 24    |      0 |  98.04 s |  91.35 s | -5.7% |
| 23    | 323952 | 117.78 s | 112.18 s | -4.8% |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 76 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 71 insertions(+), 5 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index fdd2711a6..7fd5f5f71 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -153,7 +153,9 @@ static void unique_in_pack(struct packed_git *p,
 	uint32_t num, last, i, first = 0;
 	const struct object_id *current = NULL;
 
-	open_pack_index(p);
+	if (open_pack_index(p) || !p->num_objects)
+		return;
+
 	num = p->num_objects;
 	last = num;
 	while (first < last) {
@@ -478,6 +480,7 @@ struct min_abbrev_data {
 	unsigned int init_len;
 	unsigned int cur_len;
 	char *hex;
+	const unsigned char *hash;
 };
 
 static inline char get_hex_char_from_oid(const struct object_id *oid,
@@ -505,6 +508,67 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 	return 0;
 }
 
+static void find_abbrev_len_for_pack(struct packed_git *p,
+				     struct min_abbrev_data *mad)
+{
+	int match = 0;
+	uint32_t num, last, first = 0;
+	struct object_id oid;
+
+	if (open_pack_index(p) || !p->num_objects)
+		return;
+
+	num = p->num_objects;
+	last = num;
+	while (first < last) {
+		uint32_t mid = first + (last - first) / 2;
+		const unsigned char *current;
+		int cmp;
+
+		current = nth_packed_object_sha1(p, mid);
+		cmp = hashcmp(mad->hash, current);
+		if (!cmp) {
+			match = 1;
+			first = mid;
+			break;
+		}
+		if (cmp > 0) {
+			first = mid + 1;
+			continue;
+		}
+		last = mid;
+	}
+
+	/*
+	 * first is now the position in the packfile where we would insert
+	 * mad->hash if it does not exist (or the position of mad->hash if
+	 * it does exist). Hence, we consider a maximum of three objects
+	 * nearby for the abbreviation length.
+	 */
+	mad->init_len = 0;
+	if (!match) {
+		nth_packed_object_oid(&oid, p, first);
+		extend_abbrev_len(&oid, mad);
+	} else if (first < num - 1) {
+		nth_packed_object_oid(&oid, p, first + 1);
+		extend_abbrev_len(&oid, mad);
+	}
+	if (first > 0) {
+		nth_packed_object_oid(&oid, p, first - 1);
+		extend_abbrev_len(&oid, mad);
+	}
+	mad->init_len = mad->cur_len;
+}
+
+static void find_abbrev_len_packed(struct min_abbrev_data *mad)
+{
+	struct packed_git *p;
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next)
+		find_abbrev_len_for_pack(p, mad);
+}
+
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	struct disambiguate_state ds;
@@ -536,19 +600,21 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
 
-	if (init_object_disambiguation(hex, len, &ds) < 0)
-		return -1;
-
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;
+	mad.hash = sha1;
+
+	find_abbrev_len_packed(&mad);
+
+	if (init_object_disambiguation(hex, mad.cur_len, &ds) < 0)
+		return -1;
 
 	ds.fn = extend_abbrev_len;
 	ds.always_call_fn = 1;
 	ds.cb_data = (void*)&mad;
 
 	find_short_object_filename(&ds);
-	find_short_packed_object(&ds);
 	(void)finish_object_disambiguation(&ds, &oid_ret);
 
 	hex[mad.cur_len] = 0;
-- 
2.14.1.538.g56ec8fc98.dirty


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/4] Improve abbreviation disambiguation
  2017-10-12 12:02                 ` [PATCH v5 0/4] Improve abbreviation " Derrick Stolee
@ 2017-10-12 12:04                   ` Derrick Stolee
  2017-10-12 12:21                     ` Junio C Hamano
  0 siblings, 1 reply; 22+ messages in thread
From: Derrick Stolee @ 2017-10-12 12:04 UTC (permalink / raw)
  To: Derrick Stolee, git; +Cc: peff, gitster

On 10/12/2017 8:02 AM, Derrick Stolee wrote:
> Changes since previous version:
>
> * Make 'pos' unsigned in get_hex_char_from_oid()
>
> * Check response from open_pack_index()
>
> * Small typos in commit messages
>
> Thanks,
>   Stolee
>
I forgot to mention that I rebased on master this morning to be sure 
this doesn't conflict with the binary-search patch that was queued this 
week.

Thanks,
  Stolee

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/4] Improve abbreviation disambiguation
  2017-10-12 12:04                   ` Derrick Stolee
@ 2017-10-12 12:21                     ` Junio C Hamano
  2017-10-12 14:22                       ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Junio C Hamano @ 2017-10-12 12:21 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Derrick Stolee, git, peff

Derrick Stolee <stolee@gmail.com> writes:

> On 10/12/2017 8:02 AM, Derrick Stolee wrote:
>> Changes since previous version:
>>
>> * Make 'pos' unsigned in get_hex_char_from_oid()
>>
>> * Check response from open_pack_index()
>>
>> * Small typos in commit messages
>>
>> Thanks,
>>   Stolee
>>
> I forgot to mention that I rebased on master this morning to be sure
> this doesn't conflict with the binary-search patch that was queued
> this week.

Thanks for being extra careful.  

I've re-applied minor style fix s/(void\*)/(void \*)/ to 2/4 and 4/4
but other than that, the difference between this round and what has
been queued looked all reasonable.

Will replace.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v5 0/4] Improve abbreviation disambiguation
  2017-10-12 12:21                     ` Junio C Hamano
@ 2017-10-12 14:22                       ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2017-10-12 14:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Derrick Stolee, Derrick Stolee, git

On Thu, Oct 12, 2017 at 09:21:15PM +0900, Junio C Hamano wrote:

> Derrick Stolee <stolee@gmail.com> writes:
> 
> > On 10/12/2017 8:02 AM, Derrick Stolee wrote:
> >> Changes since previous version:
> >>
> >> * Make 'pos' unsigned in get_hex_char_from_oid()
> >>
> >> * Check response from open_pack_index()
> >>
> >> * Small typos in commit messages
> >>
> >> Thanks,
> >>   Stolee
> >>
> > I forgot to mention that I rebased on master this morning to be sure
> > this doesn't conflict with the binary-search patch that was queued
> > this week.
> 
> Thanks for being extra careful.  
> 
> I've re-applied minor style fix s/(void\*)/(void \*)/ to 2/4 and 4/4
> but other than that, the difference between this round and what has
> been queued looked all reasonable.
> 
> Will replace.

This looks good to me, too.

At first I was going to point out the nit that unique_in_pack() is
quietly fixed in 4/4 for the empty-pack case. But I don't think it's
actually buggy. The examination of nth_packed_object_oid() would never
be triggered if "num" is 0. So it probably is fine simply to fix it
quietly along with adding the new function.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, back to index

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-08 18:49 [PATCH v4 0/4] Improve abbreviation disambiguation Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
2017-10-09 13:42   ` Jeff King
2017-10-08 18:49 ` [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee
2017-10-09 13:49   ` Jeff King
2017-10-10 12:16     ` Derrick Stolee
2017-10-10 12:36       ` Jeff King
2017-10-10 12:56         ` Junio C Hamano
2017-10-10 13:09           ` Jeff King
2017-10-10 13:11           ` Derrick Stolee
2017-10-10 13:30             ` Jeff King
2017-10-11 13:58               ` Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 0/4] Improve abbreviation " Derrick Stolee
2017-10-12 12:04                   ` Derrick Stolee
2017-10-12 12:21                     ` Junio C Hamano
2017-10-12 14:22                       ` Jeff King
2017-10-12 12:02                 ` [PATCH v5 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee

git@vger.kernel.org mailing list mirror (one of many)

Archives are clonable:
	git clone --mirror https://public-inbox.org/git
	git clone --mirror http://ou63pmih66umazou.onion/git
	git clone --mirror http://czquwvybam4bgbro.onion/git
	git clone --mirror http://hjrcffqmbrq6wope.onion/git

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.version-control.git
	nntp://ou63pmih66umazou.onion/inbox.comp.version-control.git
	nntp://czquwvybam4bgbro.onion/inbox.comp.version-control.git
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.version-control.git
	nntp://news.gmane.org/gmane.comp.version-control.git

 note: .onion URLs require Tor: https://www.torproject.org/
       or Tor2web: https://www.tor2web.org/

AGPL code for this site: git clone https://public-inbox.org/ public-inbox