git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <dstolee@microsoft.com>
To: git@vger.kernel.org
Cc: gitster@pobox.com, stolee@gmail.com, peff@peff.net,
	ramsay@ramsayjones.plus.com, sbeller@google.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation
Date: Sun,  8 Oct 2017 14:49:42 -0400	[thread overview]
Message-ID: <20171008184942.69444-5-dstolee@microsoft.com> (raw)
In-Reply-To: <20171008184942.69444-1-dstolee@microsoft.com>

Minimize OID comparisons during disambiguatiosn of packfile OIDs.

Teach git to use binary search with the full OID to find the object's
position (or insertion position, if not present) in the pack-index.
The object before and immediately after (or the one at the insertion
position) give the maximum common prefix.  No subsequent linear search
is required.

Take care of which two to inspect, in case the object id exists in the
packfile.

If the input to find_unique_abbrev_r() is a partial prefix, then the
OID used for the binary search is padded with zeroes so the object will
not exist in the repo (with high probability) and the same logic
applies.

This commit completes a series of three changes to OID abbreviation
code, and the overall change can be seen using standard commands for
large repos. Below we report performance statistics for perf test 4211.6
from p4211-line-log.sh using three copies of the Linux repo:

| Packs | Loose  | HEAD~3   | HEAD     | Rel%  |
|-------|--------|----------|----------|-------|
|  1    |      0 |  41.27 s |  38.93 s | -4.8% |
| 24    |      0 |  98.04 s |  91.35 s | -5.7% |
| 23    | 323952 | 117.78 s | 112.18 s | -4.8% |

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 sha1_name.c | 70 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 66 insertions(+), 4 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 5081aeb71..49ba67955 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -478,6 +478,7 @@ struct min_abbrev_data {
 	unsigned int init_len;
 	unsigned int cur_len;
 	char *hex;
+	const unsigned char *hash;
 };
 
 static inline char get_hex_char_from_oid(const struct object_id *oid,
@@ -505,6 +506,65 @@ static int extend_abbrev_len(const struct object_id *oid, void *cb_data)
 	return 0;
 }
 
+static void find_abbrev_len_for_pack(struct packed_git *p,
+				     struct min_abbrev_data *mad)
+{
+	int match = 0;
+	uint32_t num, last, first = 0;
+	struct object_id oid;
+
+	open_pack_index(p);
+	num = p->num_objects;
+	last = num;
+	while (first < last) {
+		uint32_t mid = first + (last - first) / 2;
+		const unsigned char *current;
+		int cmp;
+
+		current = nth_packed_object_sha1(p, mid);
+		cmp = hashcmp(mad->hash, current);
+		if (!cmp) {
+			match = 1;
+			first = mid;
+			break;
+		}
+		if (cmp > 0) {
+			first = mid + 1;
+			continue;
+		}
+		last = mid;
+	}
+
+	/*
+	 * first is now the position in the packfile where we would insert
+	 * mad->hash if it does not exist (or the position of mad->hash if
+	 * it does exist). Hence, we consider a maximum of three objects
+	 * nearby for the abbreviation length.
+	 */
+	mad->init_len = 0;
+	if (!match) {
+		nth_packed_object_oid(&oid, p, first);
+		extend_abbrev_len(&oid, mad);
+	} else if (first < num - 1) {
+		nth_packed_object_oid(&oid, p, first + 1);
+		extend_abbrev_len(&oid, mad);
+	}
+	if (first > 0) {
+		nth_packed_object_oid(&oid, p, first - 1);
+		extend_abbrev_len(&oid, mad);
+	}
+	mad->init_len = mad->cur_len;
+}
+
+static void find_abbrev_len_packed(struct min_abbrev_data *mad)
+{
+	struct packed_git *p;
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next)
+		find_abbrev_len_for_pack(p, mad);
+}
+
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	struct disambiguate_state ds;
@@ -536,19 +596,21 @@ int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
 
-	if (init_object_disambiguation(hex, len, &ds) < 0)
-		return -1;
-
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;
+	mad.hash = sha1;
+
+	find_abbrev_len_packed(&mad);
+
+	if (init_object_disambiguation(hex, mad.cur_len, &ds) < 0)
+		return -1;
 
 	ds.fn = extend_abbrev_len;
 	ds.always_call_fn = 1;
 	ds.cb_data = (void*)&mad;
 
 	find_short_object_filename(&ds);
-	find_short_packed_object(&ds);
 	(void)finish_object_disambiguation(&ds, &oid_ret);
 
 	hex[mad.cur_len] = 0;
-- 
2.14.1.538.g56ec8fc98.dirty


  parent reply	other threads:[~2017-10-08 18:50 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-10-08 18:49 [PATCH v4 0/4] Improve abbreviation disambiguation Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-10-08 18:49 ` [PATCH v4 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
2017-10-09 13:42   ` Jeff King
2017-10-08 18:49 ` Derrick Stolee [this message]
2017-10-09 13:49   ` [PATCH v4 4/4] sha1_name: minimize OID comparisons during disambiguation Jeff King
2017-10-10 12:16     ` Derrick Stolee
2017-10-10 12:36       ` Jeff King
2017-10-10 12:56         ` Junio C Hamano
2017-10-10 13:09           ` Jeff King
2017-10-10 13:11           ` Derrick Stolee
2017-10-10 13:30             ` Jeff King
2017-10-11 13:58               ` Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 0/4] Improve abbreviation disambiguation Derrick Stolee
2017-10-12 12:04                   ` Derrick Stolee
2017-10-12 12:21                     ` Junio C Hamano
2017-10-12 14:22                       ` Jeff King
2017-10-12 12:02                 ` [PATCH v5 1/4] p4211-line-log.sh: add log --online --raw --parents perf test Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 2/4] sha1_name: unroll len loop in find_unique_abbrev_r Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 3/4] sha1_name: parse less while finding common prefix Derrick Stolee
2017-10-12 12:02                 ` [PATCH v5 4/4] sha1_name: minimize OID comparisons during disambiguation Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20171008184942.69444-5-dstolee@microsoft.com \
    --to=dstolee@microsoft.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=ramsay@ramsayjones.plus.com \
    --cc=sbeller@google.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).