git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/3] auto-sizing default abbreviation length
@ 2016-10-01  0:19 Junio C Hamano
  2016-10-01  0:19 ` [PATCH 1/3] abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-01  0:19 UTC (permalink / raw)
  To: git

So here is what I queued in 'pu' tonight after back and forth with
Linus and Peff.  The third step from Linus needs to be signed off
and also a meaningful log message for it needs to be written, and
also may need to be updated to include what Linus did in [*1*], but
otherwise I think these are in good enough shape for people to start
playing with them.

They apply on top of Peff's jk/ambiguous-short-object-names topic
that ends at 5b33cb1fd7 ("get_short_sha1: make default
disambiguation configurable", 2016-09-27).


*1*

http://public-inbox.org/git/<CA+55aFxyF=xX84AXr8MG14MRHwdrQw00PBM20UfqBdidaeqdMg@mail.gmail.com>


Junio C Hamano (2):
  abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
  abbrev: prepare for new world order

Linus Torvalds (1):
  abbrev: auto size the default abbreviation

 builtin/fetch.c     |  3 +++
 builtin/rev-parse.c |  5 +++--
 cache.h             |  4 ++++
 diff.c              | 25 +++++++++++++++++++++++--
 environment.c       |  2 +-
 sha1_name.c         | 28 +++++++++++++++++++++++++++-
 transport.h         |  3 +--
 7 files changed, 62 insertions(+), 8 deletions(-)

-- 
2.10.0-622-g05f606bbb0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH 1/3] abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing
  2016-10-01  0:19 [PATCH 0/3] auto-sizing default abbreviation length Junio C Hamano
@ 2016-10-01  0:19 ` Junio C Hamano
  2016-10-01  0:19 ` [PATCH 2/3] abbrev: prepare for new world order Junio C Hamano
  2016-10-01  0:19 ` [PATCH 3/3] abbrev: auto size the default abbreviation Junio C Hamano
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-01  0:19 UTC (permalink / raw)
  To: git

We'll be introducing a new way to decide the default abbreviation
length by initialising DEFAULT_ABBREV to -1 to signal the first call
to "find unique abbreviation" codepath to compute a reasonable value
based on the number of objects we have to avoid collisions.

We have long relied on DEFAULT_ABBREV being a positive concrete
value that is used as the abbreviation length when no extra
configuration or command line option has overridden it.  Some
codepaths wants to use such a positive concrete default value
even before making their first request to actually trigger the
computation for the auto sized default.

Introduce FALLBACK_DEFAULT_ABBREV and use it to the code that
attempts to align the report from "git fetch".  For now, this
macro is also used to initialize the default_abbrev variable,
but the auto-sizing code will use -1 and then use the value of
FALLBACK_DEFAULT_ABBREV as the starting point of auto-sizing.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/fetch.c | 3 +++
 cache.h         | 3 +++
 environment.c   | 2 +-
 transport.h     | 3 +--
 4 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 164623bb6f..a9f12cc5cf 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -17,6 +17,9 @@
 #include "argv-array.h"
 #include "utf8.h"
 
+#define TRANSPORT_SUMMARY(x) \
+	(int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
+
 static const char * const builtin_fetch_usage[] = {
 	N_("git fetch [<options>] [<repository> [<refspec>...]]"),
 	N_("git fetch [<options>] <group>"),
diff --git a/cache.h b/cache.h
index f346c01708..5a651b8435 100644
--- a/cache.h
+++ b/cache.h
@@ -1183,6 +1183,9 @@ static inline int hex2chr(const char *s)
 #define MINIMUM_ABBREV minimum_abbrev
 #define DEFAULT_ABBREV default_abbrev
 
+/* used when the code does not know or care what the default abbrev is */
+#define FALLBACK_DEFAULT_ABBREV 7
+
 struct object_context {
 	unsigned char tree[20];
 	char path[PATH_MAX];
diff --git a/environment.c b/environment.c
index cd5aa57179..44fb107b8a 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/transport.h b/transport.h
index 6fe3485325..e783377e40 100644
--- a/transport.h
+++ b/transport.h
@@ -142,8 +142,7 @@ struct transport {
 #define TRANSPORT_PUSH_ATOMIC 8192
 #define TRANSPORT_PUSH_OPTIONS 16384
 
-#define TRANSPORT_SUMMARY_WIDTH (2 * DEFAULT_ABBREV + 3)
-#define TRANSPORT_SUMMARY(x) (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
+#define TRANSPORT_SUMMARY_WIDTH (2 * FALLBACK_DEFAULT_ABBREV + 3)
 
 /* Returns a transport suitable for the url */
 struct transport *transport_get(struct remote *, const char *);
-- 
2.10.0-622-g05f606bbb0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] abbrev: prepare for new world order
  2016-10-01  0:19 [PATCH 0/3] auto-sizing default abbreviation length Junio C Hamano
  2016-10-01  0:19 ` [PATCH 1/3] abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing Junio C Hamano
@ 2016-10-01  0:19 ` Junio C Hamano
  2016-10-01  0:19 ` [PATCH 3/3] abbrev: auto size the default abbreviation Junio C Hamano
  2 siblings, 0 replies; 13+ messages in thread
From: Junio C Hamano @ 2016-10-01  0:19 UTC (permalink / raw)
  To: git

The code that sets custom abbreviation length, in response to
command line argument, often does something like this:

	if (skip_prefix(arg, "--abbrev=", &arg))
		abbrev = atoi(arg);
	else if (!strcmp("--abbrev", &arg))
		abbrev = DEFAULT_ABBREV;
	/* make the value sane */
	if (abbrev < 0 || 40 < abbrev)
		abbrev = ... some sane value ...

However, it is pointless to sanity-check and tweak the value
obtained from DEFAULT_ABBREV.  We are going to allow it to be
initially set to -1 to signal that the default abbreviation length
must be auto sized upon the first request to abbreviate, based on
the number of objects in the repository, and when that happens,
rejecting or tweaking a negative value to a "saner" one will
negatively interfere with the auto sizing.  The codepaths for

    git rev-parse --short <object>
    git diff --raw --abbrev

do exactly that; allow them to pass possibly negative abbrevs
intact, that will come from DEFAULT_ABBREV in the future.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/rev-parse.c | 5 +++--
 diff.c              | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 76cf05e2ad..17cbfabdde 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -643,8 +643,9 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				filter &= ~(DO_FLAGS|DO_NOREV);
 				verify = 1;
 				abbrev = DEFAULT_ABBREV;
-				if (arg[7] == '=')
-					abbrev = strtoul(arg + 8, NULL, 10);
+				if (!arg[7])
+					continue;
+				abbrev = strtoul(arg + 8, NULL, 10);
 				if (abbrev < MINIMUM_ABBREV)
 					abbrev = MINIMUM_ABBREV;
 				else if (40 <= abbrev)
diff --git a/diff.c b/diff.c
index c6da383c56..cefc13eb8e 100644
--- a/diff.c
+++ b/diff.c
@@ -3399,7 +3399,7 @@ void diff_setup_done(struct diff_options *options)
 			 */
 			read_cache();
 	}
-	if (options->abbrev <= 0 || 40 < options->abbrev)
+	if (40 < options->abbrev)
 		options->abbrev = 40; /* full */
 
 	/*
-- 
2.10.0-622-g05f606bbb0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-01  0:19 [PATCH 0/3] auto-sizing default abbreviation length Junio C Hamano
  2016-10-01  0:19 ` [PATCH 1/3] abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing Junio C Hamano
  2016-10-01  0:19 ` [PATCH 2/3] abbrev: prepare for new world order Junio C Hamano
@ 2016-10-01  0:19 ` Junio C Hamano
  2016-10-03 22:27   ` Jeff King
  2 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-10-01  0:19 UTC (permalink / raw)
  To: git; +Cc: Linus Torvalds

From: Linus Torvalds <torvalds@linux-foundation.org>

In fairly early days we somehow decided to abbreviate object names
down to 7-hexdigits, but as projects grow, it is becoming more and
more likely to see such a short object names made in earlier days
and recorded in the log messages no longer unique.

Currently the Linux kernel project needs 11 to 12 hexdigits, while
Git itself needs 10 hexdigits to uniquely identify the objects they
have, while many smaller projects may still be fine with the
original 7-hexdigit default.  One-size does not fit all projects.

Introduce a mechanism, where we estimate the number of objects in
the repository upon the first request to abbreviate an object name
with the default setting and come up with a sane default for the
repository.  Based on the expectation that we would see collision in
a repository with 2^(2N) objects when using object names shortened
to first N bits, use sufficient number of hexdigits to cover the
number of objects in the repository.  Each hexdigit (4-bits) we add
to the shortened name allows us to have four times (2-bits) as many
objects in the repository.

---
 cache.h       |  1 +
 environment.c |  2 +-
 sha1_name.c   | 28 +++++++++++++++++++++++++++-
 3 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 5a651b8435..0e2a0595e5 100644
--- a/cache.h
+++ b/cache.h
@@ -1204,6 +1204,7 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_AUTOMATIC	 0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/environment.c b/environment.c
index 44fb107b8a..6f9d290563 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd7cf..beb7ab588b 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
+	unsigned int nrobjects;
 	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,14 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 
 			if (strlen(de->d_name) != 38)
 				continue;
+
+			/*
+			 * We only look at the one subdirectory, and we assume
+			 * each subdirectory is roughly similar, so each
+			 * object we find probably has 255 other objects in
+			 * the other fan-out directories.
+			 */
+			ds->nrobjects += 256;
 			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +160,7 @@ static void unique_in_pack(struct packed_git *p,
 
 	open_pack_index(p);
 	num = p->num_objects;
+	ds->nrobjects += num;
 	last = num;
 	while (first < last) {
 		uint32_t mid = (first + last) / 2;
@@ -380,6 +390,9 @@ static int show_ambiguous_object(const unsigned char *sha1, void *data)
 	return 0;
 }
 
+/* start from our historical default before the automatic abbreviation */
+static int default_automatic_abbrev = FALLBACK_DEFAULT_ABBREV;
+
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
@@ -426,6 +439,14 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
 	}
 
+	if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
+		unsigned int expect_collision = 1 << (len * 2);
+		if (ds.nrobjects > expect_collision) {
+			default_automatic_abbrev = len+1;
+			return SHORT_NAME_AMBIGUOUS;
+		}
+	}
+
 	return status;
 }
 
@@ -458,14 +479,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
+	int flags = GET_SHA1_QUIETLY;
 
+	if (len < 0) {
+		flags |= GET_SHA1_AUTOMATIC;
+		len = default_automatic_abbrev;
+	}
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
 	exists = has_sha1_file(sha1);
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
+		status = get_short_sha1(hex, len, sha1_ret, flags);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {
-- 
2.10.0-622-g05f606bbb0


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-01  0:19 ` [PATCH 3/3] abbrev: auto size the default abbreviation Junio C Hamano
@ 2016-10-03 22:27   ` Jeff King
  2016-10-03 22:34     ` Linus Torvalds
  2016-11-02  1:33     ` Junio C Hamano
  0 siblings, 2 replies; 13+ messages in thread
From: Jeff King @ 2016-10-03 22:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Fri, Sep 30, 2016 at 05:19:37PM -0700, Junio C Hamano wrote:

> Introduce a mechanism, where we estimate the number of objects in
> the repository upon the first request to abbreviate an object name
> with the default setting and come up with a sane default for the
> repository.  Based on the expectation that we would see collision in
> a repository with 2^(2N) objects when using object names shortened
> to first N bits, use sufficient number of hexdigits to cover the
> number of objects in the repository.  Each hexdigit (4-bits) we add
> to the shortened name allows us to have four times (2-bits) as many
> objects in the repository.
> 
> ---
>  cache.h       |  1 +
>  environment.c |  2 +-
>  sha1_name.c   | 28 +++++++++++++++++++++++++++-
>  3 files changed, 29 insertions(+), 2 deletions(-)

For reference, here's a working version that just uses a separate
counting function (no commit message, because I would just steal the one
from Linus ;) ).

---
 cache.h       |  6 ++++++
 environment.c |  2 +-
 sha1_file.c   | 27 +++++++++++++++++++++++++++
 sha1_name.c   | 20 ++++++++++++++++++++
 4 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 5a651b8..f22ace5 100644
--- a/cache.h
+++ b/cache.h
@@ -1455,6 +1455,12 @@ extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
 
+/*
+ * Give a rough count of objects in the repository. This sacrifices accuracy
+ * for speed.
+ */
+unsigned long approximate_object_count(void);
+
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 					 struct packed_git *packs);
 
diff --git a/environment.c b/environment.c
index 44fb107..6f9d290 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = FALLBACK_DEFAULT_ABBREV;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..4882440 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1381,6 +1381,32 @@ static void prepare_packed_git_one(char *objdir, int local)
 	strbuf_release(&path);
 }
 
+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+	static unsigned long count;
+	if (!approximate_object_count_valid) {
+		struct packed_git *p;
+
+		prepare_packed_git();
+		count = 0;
+		for (p = packed_git; p; p = p->next) {
+			if (open_pack_index(p))
+				continue;
+			count += p->num_objects;
+		}
+	}
+	return count;
+}
+
 static void *get_next_packed_git(const void *p)
 {
 	return ((const struct packed_git *)p)->next;
@@ -1455,6 +1481,7 @@ void prepare_packed_git(void)
 
 void reprepare_packed_git(void)
 {
+	approximate_object_count_valid = 0;
 	prepare_packed_git_run_once = 0;
 	prepare_packed_git();
 }
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd..ecc4b54 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -455,10 +455,30 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 	return ret;
 }
 
+/*
+ * Return the slot of the most-significant bit set in "val". There are various
+ * ways to do this quickly with fls() or __builtin_clzl(), but speed is
+ * probably not a big deal here.
+ */
+unsigned msb(unsigned long val)
+{
+	unsigned r = 0;
+	while (val >>= 1)
+		r++;
+	return r;
+}
+
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
 
+	if (len < 0) {
+		unsigned long count = approximate_object_count();
+		len = (msb(count) + 1) / 2;
+		if (len < 0)
+			len = FALLBACK_DEFAULT_ABBREV;
+	}
+
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-03 22:27   ` Jeff King
@ 2016-10-03 22:34     ` Linus Torvalds
  2016-10-03 22:40       ` Jeff King
  2016-11-02  1:33     ` Junio C Hamano
  1 sibling, 1 reply; 13+ messages in thread
From: Linus Torvalds @ 2016-10-03 22:34 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

On Mon, Oct 3, 2016 at 3:27 PM, Jeff King <peff@peff.net> wrote:
>
> +       if (len < 0) {
> +               unsigned long count = approximate_object_count();
> +               len = (msb(count) + 1) / 2;
> +               if (len < 0)
> +                       len = FALLBACK_DEFAULT_ABBREV;
> +       }

that second "if (len < 0)" should probably be testing against
FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
MINIMUM_ABBREV. Because a two-character abbreviation won't even be
recognized, even if the git project is very small indeed.

            Linus

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-03 22:34     ` Linus Torvalds
@ 2016-10-03 22:40       ` Jeff King
  2016-10-03 22:52         ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2016-10-03 22:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Mon, Oct 03, 2016 at 03:34:03PM -0700, Linus Torvalds wrote:

> On Mon, Oct 3, 2016 at 3:27 PM, Jeff King <peff@peff.net> wrote:
> >
> > +       if (len < 0) {
> > +               unsigned long count = approximate_object_count();
> > +               len = (msb(count) + 1) / 2;
> > +               if (len < 0)
> > +                       len = FALLBACK_DEFAULT_ABBREV;
> > +       }
> 
> that second "if (len < 0)" should probably be testing against
> FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
> MINIMUM_ABBREV. Because a two-character abbreviation won't even be
> recognized, even if the git project is very small indeed.

Oops, yes, clearly it should be FALLBACK_DEFAULT_ABBREV. What is there
would not even pass the tests (it _does_ work on linux.git, of course,
because it is much too large for that code to be triggered).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-03 22:40       ` Jeff King
@ 2016-10-03 22:52         ` Junio C Hamano
  2016-10-03 23:47           ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-10-03 22:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> On Mon, Oct 03, 2016 at 03:34:03PM -0700, Linus Torvalds wrote:
>
>> On Mon, Oct 3, 2016 at 3:27 PM, Jeff King <peff@peff.net> wrote:
>> >
>> > +       if (len < 0) {
>> > +               unsigned long count = approximate_object_count();
>> > +               len = (msb(count) + 1) / 2;
>> > +               if (len < 0)
>> > +                       len = FALLBACK_DEFAULT_ABBREV;
>> > +       }
>> 
>> that second "if (len < 0)" should probably be testing against
>> FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
>> MINIMUM_ABBREV. Because a two-character abbreviation won't even be
>> recognized, even if the git project is very small indeed.
>
> Oops, yes, clearly it should be FALLBACK_DEFAULT_ABBREV. What is there
> would not even pass the tests (it _does_ work on linux.git, of course,
> because it is much too large for that code to be triggered).

OK, as Linus's "count at the point of use" is already in 'next',
could you make it incremental with a log message?

Thanks.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-03 22:52         ` Junio C Hamano
@ 2016-10-03 23:47           ` Jeff King
  2016-10-04  1:37             ` Junio C Hamano
  0 siblings, 1 reply; 13+ messages in thread
From: Jeff King @ 2016-10-03 23:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Mon, Oct 03, 2016 at 03:52:44PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Mon, Oct 03, 2016 at 03:34:03PM -0700, Linus Torvalds wrote:
> >
> >> On Mon, Oct 3, 2016 at 3:27 PM, Jeff King <peff@peff.net> wrote:
> >> >
> >> > +       if (len < 0) {
> >> > +               unsigned long count = approximate_object_count();
> >> > +               len = (msb(count) + 1) / 2;
> >> > +               if (len < 0)
> >> > +                       len = FALLBACK_DEFAULT_ABBREV;
> >> > +       }
> >> 
> >> that second "if (len < 0)" should probably be testing against
> >> FALLBACK_DEFAULT_ABBREV, not zero. Or at the very least
> >> MINIMUM_ABBREV. Because a two-character abbreviation won't even be
> >> recognized, even if the git project is very small indeed.
> >
> > Oops, yes, clearly it should be FALLBACK_DEFAULT_ABBREV. What is there
> > would not even pass the tests (it _does_ work on linux.git, of course,
> > because it is much too large for that code to be triggered).
> 
> OK, as Linus's "count at the point of use" is already in 'next',
> could you make it incremental with a log message?

Sure. I wasn't sure if you actually liked my direction or not, so I was
mostly just showing off what the completed one would look like.

Here it is as an incremental on top of lt/abbrev-auto. I also tweaked
the math a bit to round-up more aggressively, and commented it more (I
could also just make the math look exactly like Linus's, counting up
until hitting an expected collision. I dunno if that is more clear).

-- >8 --
Subject: [PATCH] find_unique_abbrev: move logic out of get_short_sha1()

The get_short_sha1() is only about reading short sha1s; we
do call it in a loop to check "is this long enough" for each
object, but otherwise it should not need to know about
things like our default_abbrev setting.

So instead of asking it to set default_automatic_abbrev as a
side-effect, let's just have find_unique_abbrev() pick the
right place to start its loop.  This requires a separate
approximate_object_count() function, but that naturally
belongs with the rest of sha1_file.c.

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     |  7 ++++++-
 sha1_file.c | 27 +++++++++++++++++++++++++++
 sha1_name.c | 60 +++++++++++++++++++++++++++++++++++-------------------------
 3 files changed, 68 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index 0e2a059..f22ace5 100644
--- a/cache.h
+++ b/cache.h
@@ -1204,7 +1204,6 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
-#define GET_SHA1_AUTOMATIC	 0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
@@ -1456,6 +1455,12 @@ extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
 extern void install_packed_git(struct packed_git *pack);
 
+/*
+ * Give a rough count of objects in the repository. This sacrifices accuracy
+ * for speed.
+ */
+unsigned long approximate_object_count(void);
+
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 					 struct packed_git *packs);
 
diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..4882440 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1381,6 +1381,32 @@ static void prepare_packed_git_one(char *objdir, int local)
 	strbuf_release(&path);
 }
 
+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+	static unsigned long count;
+	if (!approximate_object_count_valid) {
+		struct packed_git *p;
+
+		prepare_packed_git();
+		count = 0;
+		for (p = packed_git; p; p = p->next) {
+			if (open_pack_index(p))
+				continue;
+			count += p->num_objects;
+		}
+	}
+	return count;
+}
+
 static void *get_next_packed_git(const void *p)
 {
 	return ((const struct packed_git *)p)->next;
@@ -1455,6 +1481,7 @@ void prepare_packed_git(void)
 
 void reprepare_packed_git(void)
 {
+	approximate_object_count_valid = 0;
 	prepare_packed_git_run_once = 0;
 	prepare_packed_git();
 }
diff --git a/sha1_name.c b/sha1_name.c
index beb7ab5..76e6885 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,7 +15,6 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
-	unsigned int nrobjects;
 	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -119,14 +118,6 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 
 			if (strlen(de->d_name) != 38)
 				continue;
-
-			/*
-			 * We only look at the one subdirectory, and we assume
-			 * each subdirectory is roughly similar, so each
-			 * object we find probably has 255 other objects in
-			 * the other fan-out directories.
-			 */
-			ds->nrobjects += 256;
 			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
@@ -160,7 +151,6 @@ static void unique_in_pack(struct packed_git *p,
 
 	open_pack_index(p);
 	num = p->num_objects;
-	ds->nrobjects += num;
 	last = num;
 	while (first < last) {
 		uint32_t mid = (first + last) / 2;
@@ -390,9 +380,6 @@ static int show_ambiguous_object(const unsigned char *sha1, void *data)
 	return 0;
 }
 
-/* start from our historical default before the automatic abbreviation */
-static int default_automatic_abbrev = FALLBACK_DEFAULT_ABBREV;
-
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
@@ -439,14 +426,6 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
 	}
 
-	if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
-		unsigned int expect_collision = 1 << (len * 2);
-		if (ds.nrobjects > expect_collision) {
-			default_automatic_abbrev = len+1;
-			return SHORT_NAME_AMBIGUOUS;
-		}
-	}
-
 	return status;
 }
 
@@ -476,22 +455,53 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 	return ret;
 }
 
+/*
+ * Return the slot of the most-significant bit set in "val". There are various
+ * ways to do this quickly with fls() or __builtin_clzl(), but speed is
+ * probably not a big deal here.
+ */
+unsigned msb(unsigned long val)
+{
+	unsigned r = 0;
+	while (val >>= 1)
+		r++;
+	return r;
+}
+
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
-	int flags = GET_SHA1_QUIETLY;
 
 	if (len < 0) {
-		flags |= GET_SHA1_AUTOMATIC;
-		len = default_automatic_abbrev;
+		unsigned long count = approximate_object_count();
+		/*
+		 * Add one because the MSB only tells us the highest bit set,
+		 * not including the value of all the _other_ bits (so "15"
+		 * is only one off of 2^4, but the MSB is the 3rd bit.
+		 */
+		len = msb(count) + 1;
+		/*
+		 * We now know we have on the order of 2^len objects, which
+		 * expects a collision at 2^(len/2). But we also care about hex
+		 * chars, not bits, and there are 4 bits per hex. So all
+		 * together we need to divide by 2; but we also want to round
+		 * odd numbers up, hence adding one before dividing.
+		 */
+		len = (len + 1) / 2;
+		/*
+		 * For very small repos, we stick with our regular fallback.
+		 */
+		if (len < FALLBACK_DEFAULT_ABBREV)
+			len = FALLBACK_DEFAULT_ABBREV;
 	}
+
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
 	exists = has_sha1_file(sha1);
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, flags);
+		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-03 23:47           ` Jeff King
@ 2016-10-04  1:37             ` Junio C Hamano
  2016-10-04 12:18               ` Jeff King
  0 siblings, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-10-04  1:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

>> OK, as Linus's "count at the point of use" is already in 'next',
>> could you make it incremental with a log message?
>
> Sure. I wasn't sure if you actually liked my direction or not, so I was
> mostly just showing off what the completed one would look like.

To be quite honest, I am not just unsure if I liked your direction;
rather I am not sure if I actually understood what you perceived as
a difference that matters between the two approaches.  I wanted to
hear you explain the difference in terms of "Linus's does this, but
it is bad in X and Y way, so let's avoid it and do it like Z
instead".  One effective way to extract that out of you was to force
you to justify the "incremental" update.

And it seems that I succeeded ;-).

I am still not sure if I 100% agree with your first paragraph, but
at least now I think I see where you are coming from.

You probably will hear from Ramsay about extern-ness of msb().

> -- >8 --
> Subject: [PATCH] find_unique_abbrev: move logic out of get_short_sha1()
>
> The get_short_sha1() is only about reading short sha1s; we
> do call it in a loop to check "is this long enough" for each
> object, but otherwise it should not need to know about
> things like our default_abbrev setting.
>
> So instead of asking it to set default_automatic_abbrev as a
> side-effect, let's just have find_unique_abbrev() pick the
> right place to start its loop.  This requires a separate
> approximate_object_count() function, but that naturally
> belongs with the rest of sha1_file.c.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  cache.h     |  7 ++++++-
>  sha1_file.c | 27 +++++++++++++++++++++++++++
>  sha1_name.c | 60 +++++++++++++++++++++++++++++++++++-------------------------
>  3 files changed, 68 insertions(+), 26 deletions(-)
>
> diff --git a/cache.h b/cache.h
> index 0e2a059..f22ace5 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1204,7 +1204,6 @@ struct object_context {
>  #define GET_SHA1_TREEISH          020
>  #define GET_SHA1_BLOB             040
>  #define GET_SHA1_FOLLOW_SYMLINKS 0100
> -#define GET_SHA1_AUTOMATIC	 0200
>  #define GET_SHA1_ONLY_TO_DIE    04000
>  
>  #define GET_SHA1_DISAMBIGUATORS \
> @@ -1456,6 +1455,12 @@ extern void prepare_packed_git(void);
>  extern void reprepare_packed_git(void);
>  extern void install_packed_git(struct packed_git *pack);
>  
> +/*
> + * Give a rough count of objects in the repository. This sacrifices accuracy
> + * for speed.
> + */
> +unsigned long approximate_object_count(void);
> +
>  extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
>  					 struct packed_git *packs);
>  
> diff --git a/sha1_file.c b/sha1_file.c
> index b9c1fa3..4882440 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -1381,6 +1381,32 @@ static void prepare_packed_git_one(char *objdir, int local)
>  	strbuf_release(&path);
>  }
>  
> +static int approximate_object_count_valid;
> +
> +/*
> + * Give a fast, rough count of the number of objects in the repository. This
> + * ignores loose objects completely. If you have a lot of them, then either
> + * you should repack because your performance will be awful, or they are
> + * all unreachable objects about to be pruned, in which case they're not really
> + * interesting as a measure of repo size in the first place.
> + */
> +unsigned long approximate_object_count(void)
> +{
> +	static unsigned long count;
> +	if (!approximate_object_count_valid) {
> +		struct packed_git *p;
> +
> +		prepare_packed_git();
> +		count = 0;
> +		for (p = packed_git; p; p = p->next) {
> +			if (open_pack_index(p))
> +				continue;
> +			count += p->num_objects;
> +		}
> +	}
> +	return count;
> +}
> +
>  static void *get_next_packed_git(const void *p)
>  {
>  	return ((const struct packed_git *)p)->next;
> @@ -1455,6 +1481,7 @@ void prepare_packed_git(void)
>  
>  void reprepare_packed_git(void)
>  {
> +	approximate_object_count_valid = 0;
>  	prepare_packed_git_run_once = 0;
>  	prepare_packed_git();
>  }
> diff --git a/sha1_name.c b/sha1_name.c
> index beb7ab5..76e6885 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -15,7 +15,6 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
>  
>  struct disambiguate_state {
>  	int len; /* length of prefix in hex chars */
> -	unsigned int nrobjects;
>  	char hex_pfx[GIT_SHA1_HEXSZ + 1];
>  	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
>  
> @@ -119,14 +118,6 @@ static void find_short_object_filename(struct disambiguate_state *ds)
>  
>  			if (strlen(de->d_name) != 38)
>  				continue;
> -
> -			/*
> -			 * We only look at the one subdirectory, and we assume
> -			 * each subdirectory is roughly similar, so each
> -			 * object we find probably has 255 other objects in
> -			 * the other fan-out directories.
> -			 */
> -			ds->nrobjects += 256;
>  			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
>  				continue;
>  			memcpy(hex + 2, de->d_name, 38);
> @@ -160,7 +151,6 @@ static void unique_in_pack(struct packed_git *p,
>  
>  	open_pack_index(p);
>  	num = p->num_objects;
> -	ds->nrobjects += num;
>  	last = num;
>  	while (first < last) {
>  		uint32_t mid = (first + last) / 2;
> @@ -390,9 +380,6 @@ static int show_ambiguous_object(const unsigned char *sha1, void *data)
>  	return 0;
>  }
>  
> -/* start from our historical default before the automatic abbreviation */
> -static int default_automatic_abbrev = FALLBACK_DEFAULT_ABBREV;
> -
>  static int get_short_sha1(const char *name, int len, unsigned char *sha1,
>  			  unsigned flags)
>  {
> @@ -439,14 +426,6 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
>  		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
>  	}
>  
> -	if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
> -		unsigned int expect_collision = 1 << (len * 2);
> -		if (ds.nrobjects > expect_collision) {
> -			default_automatic_abbrev = len+1;
> -			return SHORT_NAME_AMBIGUOUS;
> -		}
> -	}
> -
>  	return status;
>  }
>  
> @@ -476,22 +455,53 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
>  	return ret;
>  }
>  
> +/*
> + * Return the slot of the most-significant bit set in "val". There are various
> + * ways to do this quickly with fls() or __builtin_clzl(), but speed is
> + * probably not a big deal here.
> + */
> +unsigned msb(unsigned long val)
> +{
> +	unsigned r = 0;
> +	while (val >>= 1)
> +		r++;
> +	return r;
> +}
> +
>  int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
>  {
>  	int status, exists;
> -	int flags = GET_SHA1_QUIETLY;
>  
>  	if (len < 0) {
> -		flags |= GET_SHA1_AUTOMATIC;
> -		len = default_automatic_abbrev;
> +		unsigned long count = approximate_object_count();
> +		/*
> +		 * Add one because the MSB only tells us the highest bit set,
> +		 * not including the value of all the _other_ bits (so "15"
> +		 * is only one off of 2^4, but the MSB is the 3rd bit.
> +		 */
> +		len = msb(count) + 1;
> +		/*
> +		 * We now know we have on the order of 2^len objects, which
> +		 * expects a collision at 2^(len/2). But we also care about hex
> +		 * chars, not bits, and there are 4 bits per hex. So all
> +		 * together we need to divide by 2; but we also want to round
> +		 * odd numbers up, hence adding one before dividing.
> +		 */
> +		len = (len + 1) / 2;
> +		/*
> +		 * For very small repos, we stick with our regular fallback.
> +		 */
> +		if (len < FALLBACK_DEFAULT_ABBREV)
> +			len = FALLBACK_DEFAULT_ABBREV;
>  	}
> +
>  	sha1_to_hex_r(hex, sha1);
>  	if (len == 40 || !len)
>  		return 40;
>  	exists = has_sha1_file(sha1);
>  	while (len < 40) {
>  		unsigned char sha1_ret[20];
> -		status = get_short_sha1(hex, len, sha1_ret, flags);
> +		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
>  		if (exists
>  		    ? !status
>  		    : status == SHORT_NAME_NOT_FOUND) {

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-04  1:37             ` Junio C Hamano
@ 2016-10-04 12:18               ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2016-10-04 12:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Mon, Oct 03, 2016 at 06:37:18PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> OK, as Linus's "count at the point of use" is already in 'next',
> >> could you make it incremental with a log message?
> >
> > Sure. I wasn't sure if you actually liked my direction or not, so I was
> > mostly just showing off what the completed one would look like.
> 
> To be quite honest, I am not just unsure if I liked your direction;
> rather I am not sure if I actually understood what you perceived as
> a difference that matters between the two approaches.  I wanted to
> hear you explain the difference in terms of "Linus's does this, but
> it is bad in X and Y way, so let's avoid it and do it like Z
> instead".  One effective way to extract that out of you was to force
> you to justify the "incremental" update.
> 
> And it seems that I succeeded ;-).
> 
> I am still not sure if I 100% agree with your first paragraph, but
> at least now I think I see where you are coming from.

For the record, I am OK with Linus's patch as-is. It's mostly "that's
not how I would have done it, and the flow seems confusing to me". But
that's subjective; I don't think there are any functional flaws in it.

> You probably will hear from Ramsay about extern-ness of msb().

Heh. I seem to have a real problem with that lately.

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-10-03 22:27   ` Jeff King
  2016-10-03 22:34     ` Linus Torvalds
@ 2016-11-02  1:33     ` Junio C Hamano
  2016-11-02  2:12       ` Jeff King
  1 sibling, 1 reply; 13+ messages in thread
From: Junio C Hamano @ 2016-11-02  1:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Linus Torvalds

Jeff King <peff@peff.net> writes:

> On Fri, Sep 30, 2016 at 05:19:37PM -0700, Junio C Hamano wrote:
>
>> Introduce a mechanism, where we estimate the number of objects in
>> the repository upon the first request to abbreviate an object name
>> with the default setting and come up with a sane default for the
>> repository.  Based on the expectation that we would see collision in
>> a repository with 2^(2N) objects when using object names shortened
>> to first N bits, use sufficient number of hexdigits to cover the
>> number of objects in the repository.  Each hexdigit (4-bits) we add
>> to the shortened name allows us to have four times (2-bits) as many
>> objects in the repository.

I was idly browsing the draft release notes and then documentation
and noticed that, even though the new default is to auto-scale,
there is no mention of that in the documentation and there is no way
to explicitly ask for auto-scaling.

I wonder if we want to have something like this.  I actually am
inclined to drop the change to config.c and remove the new mention
of "auto" in the documentation.

 Documentation/config.txt |  9 +++++----
 config.c                 | 14 ++++++++++----
 2 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a0ab66aae7..b02f8a4025 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -783,10 +783,11 @@ core.sparseCheckout::
 	linkgit:git-read-tree[1] for more information.
 
 core.abbrev::
-	Set the length object names are abbreviated to.  If unspecified,
-	many commands abbreviate to 7 hexdigits, which may not be enough
-	for abbreviated object names to stay unique for sufficiently long
-	time.
+	Set the length object names are abbreviated to.  If
+	unspecified or set to "auto", an appropriate value is
+	computed based on the approximate number of packed objects
+	in your repository, which hopefully is enough for
+	abbreviated object names to stay unique for some time.
 
 add.ignoreErrors::
 add.ignore-errors (deprecated)::
diff --git a/config.c b/config.c
index 83fdecb1bc..c363cca4a9 100644
--- a/config.c
+++ b/config.c
@@ -834,10 +834,16 @@ static int git_default_core_config(const char *var, const char *value)
 	}
 
 	if (!strcmp(var, "core.abbrev")) {
-		int abbrev = git_config_int(var, value);
-		if (abbrev < minimum_abbrev || abbrev > 40)
-			return -1;
-		default_abbrev = abbrev;
+		if (!value)
+			return config_error_nonbool(var);
+		if (!strcasecmp(value, "auto"))
+			default_abbrev = -1;
+		else {
+			int abbrev = git_config_int(var, value);
+			if (abbrev < minimum_abbrev || abbrev > 40)
+				return -1;
+			default_abbrev = abbrev;
+		}
 		return 0;
 	}
 

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] abbrev: auto size the default abbreviation
  2016-11-02  1:33     ` Junio C Hamano
@ 2016-11-02  2:12       ` Jeff King
  0 siblings, 0 replies; 13+ messages in thread
From: Jeff King @ 2016-11-02  2:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Linus Torvalds

On Tue, Nov 01, 2016 at 06:33:38PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Fri, Sep 30, 2016 at 05:19:37PM -0700, Junio C Hamano wrote:
> >
> >> Introduce a mechanism, where we estimate the number of objects in
> >> the repository upon the first request to abbreviate an object name
> >> with the default setting and come up with a sane default for the
> >> repository.  Based on the expectation that we would see collision in
> >> a repository with 2^(2N) objects when using object names shortened
> >> to first N bits, use sufficient number of hexdigits to cover the
> >> number of objects in the repository.  Each hexdigit (4-bits) we add
> >> to the shortened name allows us to have four times (2-bits) as many
> >> objects in the repository.
> 
> I was idly browsing the draft release notes and then documentation
> and noticed that, even though the new default is to auto-scale,
> there is no mention of that in the documentation and there is no way
> to explicitly ask for auto-scaling.
> 
> I wonder if we want to have something like this.  I actually am
> inclined to drop the change to config.c and remove the new mention
> of "auto" in the documentation.

I doubt anybody cares that much either way, but in theory
core.abbrev=auto is a way to override core.abbrev=10 in /etc/gitconfig
or something. Though I'm having trouble envisioning a case where anybody
would set it in /etc/gitconfig, or why somebody would then want to
override that back to auto.

So I think it is fine either way (but I do agree that the core.abbrev
needs _some_ update to mention the auto-scaling behavior).

> diff --git a/config.c b/config.c
> index 83fdecb1bc..c363cca4a9 100644
> --- a/config.c
> +++ b/config.c
> @@ -834,10 +834,16 @@ static int git_default_core_config(const char *var, const char *value)
>  	}
>  
>  	if (!strcmp(var, "core.abbrev")) {
> -		int abbrev = git_config_int(var, value);
> -		if (abbrev < minimum_abbrev || abbrev > 40)
> -			return -1;
> -		default_abbrev = abbrev;
> +		if (!value)
> +			return config_error_nonbool(var);
> +		if (!strcasecmp(value, "auto"))
> +			default_abbrev = -1;
> +		else {
> +			int abbrev = git_config_int(var, value);
> +			if (abbrev < minimum_abbrev || abbrev > 40)
> +				return -1;
> +			default_abbrev = abbrev;
> +		}

This isn't a new problem you added, but that "return -1" would probably
leave people confused:

  $ git -c core.abbrev=2 log
  fatal: unable to parse 'core.abbrev' from command-line config

Probably something like:

  return error("abbrev length out of range: %d", abbrev);

would be more descriptive. I doubt it's a big deal in practice, though
(why would you set core.abbrev to something silly in the first place?).

-Peff

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-11-02  2:12 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-01  0:19 [PATCH 0/3] auto-sizing default abbreviation length Junio C Hamano
2016-10-01  0:19 ` [PATCH 1/3] abbrev: add FALLBACK_DEFAULT_ABBREV to prepare for auto sizing Junio C Hamano
2016-10-01  0:19 ` [PATCH 2/3] abbrev: prepare for new world order Junio C Hamano
2016-10-01  0:19 ` [PATCH 3/3] abbrev: auto size the default abbreviation Junio C Hamano
2016-10-03 22:27   ` Jeff King
2016-10-03 22:34     ` Linus Torvalds
2016-10-03 22:40       ` Jeff King
2016-10-03 22:52         ` Junio C Hamano
2016-10-03 23:47           ` Jeff King
2016-10-04  1:37             ` Junio C Hamano
2016-10-04 12:18               ` Jeff King
2016-11-02  1:33     ` Junio C Hamano
2016-11-02  2:12       ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).