git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Junio C Hamano <gitster@pobox.com>
Cc: Jeff King <peff@peff.net>, Johannes Sixt <j6t@kdbg.org>,
	Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
Date: Thu, 29 Sep 2016 18:18:03 -0700	[thread overview]
Message-ID: <CA+55aFyHn0Q-qPq4dPEJ7X_4jf5UbsVw2vE-4LoWYbPn6gS10g@mail.gmail.com> (raw)
In-Reply-To: <CA+55aFxsfxvDQqi2M3TUVvAHUx3Qm1hHQ4DMyzXzN6V2v7o-3A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

On Thu, Sep 29, 2016 at 5:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Actually, all the other cases seem to be "parse a SHA1 with a known
> length", so they really don't have a negative length.  So this seems
> ok, and is easier to verify than the "what all contexts might use
> DEFAULT_ABBREV" thing. There's only a few callers, and it's a static
> function so it's easy to check it locally in sha1_name.c.

Here's my original patch with just a tiny change that instead of
starting the automatic guessing at 7 each time, it starts at
"default_automatic_abbrev", which is initialized to 7.

The difference is that if we decide that "oh, that was too small, need
to repeat", we also update that "default_automatic_abbrev" value, so
that we won't start at the number that we now know was too small.

So it still loops over the abbrev values, but now it only loops a
couple of times.

I actually verified the performance impact by doing

      time git rev-list --abbrev-commit HEAD > /dev/null

on the kernel git tree, and it does actually matter. With my original
patch, we wasted a noticeable amount of time on just the extra
looping, with this it's down to the same performance as just doing it
once at init time (it's about 12s vs 9s on my laptop).

So this patch may actually be "production ready" apart from the fact
that some tests still fail (at least t2027-worktree-list.sh) because
of different short SHA1 cases.

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 3393 bytes --]

 cache.h       |  1 +
 environment.c |  2 +-
 sha1_name.c   | 26 +++++++++++++++++++++++++-
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6e33f2f28..d2da6d186 100644
--- a/cache.h
+++ b/cache.h
@@ -1207,6 +1207,7 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_AUTOMATIC	 0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/environment.c b/environment.c
index c1442df9a..fd6681e46 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd7c..1003c96ea 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
+	unsigned int nrobjects;
 	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,12 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 
 			if (strlen(de->d_name) != 38)
 				continue;
+
+			// We only look at the one subdirectory, and we assume
+			// each subdirectory is roughly similar, so each object
+			// we find probably has 255 other objects in the other
+			// fan-out directories
+			ds->nrobjects += 256;
 			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p,
 
 	open_pack_index(p);
 	num = p->num_objects;
+	ds->nrobjects += num;
 	last = num;
 	while (first < last) {
 		uint32_t mid = (first + last) / 2;
@@ -380,6 +388,9 @@ static int show_ambiguous_object(const unsigned char *sha1, void *data)
 	return 0;
 }
 
+// Why seven? That's our historical default before the automatic abbreviation
+static int default_automatic_abbrev = 7;
+
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
@@ -426,6 +437,14 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
 	}
 
+	if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
+		unsigned int expect_collision = 1 << (len * 2);
+		if (ds.nrobjects > expect_collision) {
+			default_automatic_abbrev = len+1;
+			return SHORT_NAME_AMBIGUOUS;
+		}
+	}
+
 	return status;
 }
 
@@ -458,14 +477,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
+	int flags = GET_SHA1_QUIETLY;
 
+	if (len < 0) {
+		flags |= GET_SHA1_AUTOMATIC;
+		len = default_automatic_abbrev;
+	}
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
 	exists = has_sha1_file(sha1);
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
+		status = get_short_sha1(hex, len, sha1_ret, flags);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {

  reply	other threads:[~2016-09-30  1:18 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
2016-09-26  3:46 ` Junio C Hamano
2016-09-26  4:34   ` Jeff King
2016-09-26  4:45     ` Junio C Hamano
2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
2016-09-26 16:37           ` Junio C Hamano
2016-09-26 17:21             ` Jeff King
2016-09-26 17:50               ` Junio C Hamano
2016-09-26 11:59         ` [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE Jeff King
2016-09-26 11:59         ` [PATCH 03/10] get_sha1: propagate flags to child functions Jeff King
2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
2016-09-26 12:11           ` Jeff King
2016-09-26 16:55           ` Junio C Hamano
2016-09-26 17:23             ` Jeff King
2016-09-26 12:00         ` [PATCH 05/10] get_short_sha1: refactor init of disambiguation code Jeff King
2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
2016-09-26 17:10           ` Junio C Hamano
2016-09-26 17:25             ` Jeff King
2016-09-26 17:36               ` Junio C Hamano
2016-09-26 12:00         ` [PATCH 07/10] get_short_sha1: mark ambiguity error for translation Jeff King
2016-09-26 12:00         ` [PATCH 08/10] sha1_array: let callbacks interrupt iteration Jeff King
2016-09-26 12:00         ` [PATCH 09/10] for_each_abbrev: drop duplicate objects Jeff King
2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
2016-09-26 16:36           ` Linus Torvalds
2016-09-27  5:42             ` Jacob Keller
2016-09-27 12:38             ` Jeff King
2016-09-29 13:01             ` Kyle J. McKay
2016-09-29 13:24               ` Jeff King
2016-09-29 14:36                 ` Kyle J. McKay
2016-09-29 14:55                   ` Jeff King
2016-09-26 17:30           ` Junio C Hamano
2016-09-26 17:34             ` Jeff King
2016-09-26 17:39               ` Junio C Hamano
2016-09-29 11:46           ` Kyle J. McKay
2016-09-29 13:03             ` Jeff King
2016-09-29 17:19               ` Junio C Hamano
2016-09-30  5:51                 ` Jacob Keller
2019-02-04 16:12     ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Ævar Arnfjörð Bjarmason
2019-02-04 19:13       ` Junio C Hamano
2019-02-04 20:04       ` Junio C Hamano
2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
2019-02-04 23:32         ` Jeff King
2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
2019-02-06 18:29             ` Jeff King
2019-02-06 18:36               ` Ævar Arnfjörð Bjarmason
2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
2016-09-26 12:09     ` Jeff King
2016-09-29 13:01   ` Kyle J. McKay
2016-09-26  7:13 ` Christian Couder
2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
2016-09-29  9:53     ` Jakub Narębski
2016-09-29 17:20       ` Junio C Hamano
2016-09-29 17:45         ` Matthieu Moy
2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
2016-09-29  9:01     ` Jeff King
2016-09-29 18:13       ` Junio C Hamano
2016-09-29 18:26         ` Jeff King
2016-09-29 18:57           ` Junio C Hamano
2016-09-29 19:18             ` Jeff King
2016-09-29 19:57               ` Junio C Hamano
2016-09-29 19:06           ` Junio C Hamano
2016-09-29 19:26             ` Jeff King
2016-09-29 21:03               ` Junio C Hamano
2016-09-29 21:08                 ` Jeff King
2016-09-28 23:30   ` [PATCH 3/4] worktree: honor configuration variables Junio C Hamano
2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
2016-09-29  2:44     ` SZEDER Gábor
2016-09-29  5:27       ` Lukas Fleischer
2016-09-29  9:22         ` Jeff King
2016-09-29  9:15       ` Jeff King
2016-09-29 10:03         ` Matthieu Moy
2016-09-29 12:52         ` SZEDER Gábor
2016-09-29  5:58     ` Johannes Sixt
2016-09-29 18:05       ` Junio C Hamano
2016-09-29 18:37         ` Linus Torvalds
2016-09-29 18:55           ` Linus Torvalds
2016-09-29 19:06             ` Linus Torvalds
2016-09-29 19:42               ` Junio C Hamano
2016-09-30  0:56               ` Mike Hommey
2016-09-30  1:01                 ` Linus Torvalds
2016-09-30 19:41                   ` Ævar Arnfjörð Bjarmason
2016-09-29 19:16             ` Jeff King
2016-09-29 19:40               ` Linus Torvalds
2016-09-29 19:45                 ` Junio C Hamano
2016-09-29 21:53                   ` Linus Torvalds
2016-09-29 23:13                     ` Junio C Hamano
2016-09-29 23:20                       ` Junio C Hamano
2016-09-30  0:20                       ` Linus Torvalds
2016-09-30  0:28                         ` Linus Torvalds
2016-09-30  0:57                           ` Linus Torvalds
2016-09-30  1:18                             ` Linus Torvalds [this message]
2016-09-30  3:54                               ` Junio C Hamano
2016-09-30  4:10                                 ` Junio C Hamano
2016-09-30  4:18                                   ` Linus Torvalds
2016-09-30  4:29                                     ` Linus Torvalds
2016-09-30  4:27                                   ` Junio C Hamano
2016-09-30  4:35                                     ` Junio C Hamano
2016-09-30 18:40                                     ` Junio C Hamano
2016-09-30 18:51                                       ` Linus Torvalds
2016-09-30 19:00                                         ` Junio C Hamano
2016-09-30  4:11                                 ` Linus Torvalds
2016-09-30  8:06                               ` Jeff King
2016-09-30 17:54                                 ` Linus Torvalds
2016-09-30 18:05                                   ` Jeff King
2016-09-30 18:21                                   ` Linus Torvalds
2016-09-30 20:01                                     ` Junio C Hamano
2016-09-30 17:56                                 ` Junio C Hamano
2016-09-30  7:47                       ` Jeff King
2016-09-29  9:25     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFyHn0Q-qPq4dPEJ7X_4jf5UbsVw2vE-4LoWYbPn6gS10g@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).