git@vger.kernel.org mailing list mirror (one of many)
* Changing the default for "core.abbrev"?
@ 2016-09-26  1:39 Linus Torvalds
  2016-09-26  3:46 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-26  1:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

The default value for commit abbreviation (environment.c: 19) is seven:

    int minimum_abbrev = 4, default_abbrev = 7;

which back in the dark early days of git was fairly reasonable.

It's probably still a perfectly fine default for lots of projects,
since 7 hex digits is a few hundred million unique values, and you
won't really start to get very many collisions in that until you get
closer to a million objects.

The kernel, these days, is at roughly 5 million objects, and while the
seven hex digits are still often enough for uniqueness (and git will
always add digits *until* it is unique), it's long been at the point
where I tell people to do

    git config --global core.abbrev 12

because even though git will extend the seven hex digits until the
object name is unique, that only reflects the *current* situation in
the repository. With 5 million objects and a very healthy growth rate,
a 7-8 hex digit number that is unique today is not necessarily unique
a month or two from now, and then it gets annoying when a commit
message has a short git ID that is no longer unique when you go back
and try to figure out what went wrong in that commit.
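A minimal sketch of the "extend until unique" rule. It assumes the object names sit in a sorted in-memory array; real git walks loose objects and pack indexes instead. In sorted order, the name that shares the longest prefix with an entry is always one of its two neighbours, so the abbreviation only has to be one digit longer than either common prefix. The helper names and the shortened 12-digit example hashes are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Number of leading hex digits two object names have in common. */
static int common_prefix(const char *a, const char *b)
{
	int n = 0;

	while (a[n] && a[n] == b[n])
		n++;
	return n;
}

/*
 * Shortest unique abbreviation for names[i] in a sorted array of nr
 * hex object names. Only the two sorted neighbours can share a longer
 * prefix than any other entry, so only they are compared. The result
 * is never below min_len (the configured default).
 */
static int unique_abbrev_len(const char *const *names, size_t nr,
			     size_t i, int min_len)
{
	int len = min_len;

	if (i > 0 && common_prefix(names[i - 1], names[i]) + 1 > len)
		len = common_prefix(names[i - 1], names[i]) + 1;
	if (i + 1 < nr && common_prefix(names[i], names[i + 1]) + 1 > len)
		len = common_prefix(names[i], names[i + 1]) + 1;
	return len;
}

/* How many names start with the first abbrev_len digits of abbrev. */
static size_t count_matches(const char *const *names, size_t nr,
			    const char *abbrev, size_t abbrev_len)
{
	size_t i, count = 0;

	for (i = 0; i < nr; i++)
		if (!strncmp(names[i], abbrev, abbrev_len))
			count++;
	return count;
}
```

With { "1234567abc00", "12345ff00000" }, the first entry abbreviates to 7 digits. Once "1234567abd00" is added, it needs 10 digits, and the old 7-digit form now matches two objects. That is exactly the problem with short IDs recorded in older commit messages.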

I can just keep reminding kernel maintainers and developers to update
their git config, but maybe it would be a good idea to just admit that
the defaults picked in 2005 weren't necessarily the best ones
possible, and those could be bumped up a bit?

I think I mentioned this some time ago, and it's not a huge deal, but
I thought I'd just mention it again because it came up again today for
me..

Thanks,

              Linus

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: Changing the default for "core.abbrev"?
  2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
@ 2016-09-26  3:46 ` Junio C Hamano
  2016-09-26  4:34   ` Jeff King
                     ` (2 more replies)
  2016-09-26  7:13 ` Christian Couder
  2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
  2 siblings, 3 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26  3:46 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> I can just keep reminding kernel maintainers and developers to update
> their git config, but maybe it would be a good idea to just admit that
> the defaults picked in 2005 weren't necessarily the best ones
> possible, and those could be bumped up a bit?
>
> I think I mentioned this some time ago, and it's not a huge deal, but
> I thought I'd just mention it again because it came up again today for
> me..

I am not quite sure how good any new default would be, though.  Just
like any timeout is not long enough for somebody, growing projects
will eventually hit whatever abbreviation length they start with.

Even if we bump it to 12 for everybody, the majority of projects at
GitHub would probably just be wasting 5 more hexdigits in addition
to whatever they are already wasting.  The kernel folks will keep
having a harder time looking up objects referred to by ancient
commits no matter what the new default is, and in several decades
they will again regret that we didn't bump it to 16 in 2016; by that
time both of us will probably be retired, so it may no longer be our
problem, though ;-)

I am not opposed to bump the default to 12 or whatever, but I
suspect any lengthening today may need to be accompanied by a tool
support that finds the set of objects that are reachable from a
commit whose names begin with non-unique abbreviations that appear
in the commit log message. Assuming that it is very hard to refer to
future objects in the log message you write today, such a tool may
find a single object that used to be the unique instance of that
abbreviation back then, and with reachability bitmap support, it may
not be too expensive to run.


* Re: Changing the default for "core.abbrev"?
  2016-09-26  3:46 ` Junio C Hamano
@ 2016-09-26  4:34   ` Jeff King
  2016-09-26  4:45     ` Junio C Hamano
  2019-02-04 16:12     ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Ævar Arnfjörð Bjarmason
  2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
  2016-09-29 13:01   ` Kyle J. McKay
  2 siblings, 2 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26  4:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Sun, Sep 25, 2016 at 08:46:39PM -0700, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
> 
> > I can just keep reminding kernel maintainers and developers to update
> > their git config, but maybe it would be a good idea to just admit that
> > the defaults picked in 2005 weren't necessarily the best ones
> > possible, and those could be bumped up a bit?
> >
> > I think I mentioned this some time ago, and it's not a huge deal, but
> > I thought I'd just mention it again because it came up again today for
> > me..
> 
> I am not quite sure how good any new default would be, though.  Just
> like any timeout is not long enough for somebody, growing projects
> will eventually hit whatever abbreviation length they start with.

I actually think "12" might be sane for a long time. That's 48 bits of
sha1, so we'd expect a 50% chance of a _single_ collision at 2^24, or 16
million.  The biggest repository I know about (in number of objects) is
the one holding all of the objects for all of the forks of
torvalds/linux on GitHub. It's at about 15 million objects.

Which _seems_ close, but remember that's the size where we expect to see
a single collision. They don't become common until much later (I didn't
compute an exact number, but Linus's 16x sounds about right). I know
that the growth of the kernel isn't really linear, but I think the need
to bump to "13" might not just be decades, but possibly a century or
more.
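The arithmetic behind these estimates is the birthday bound. The following sketch assumes uniformly random object names. With n objects there are n(n-1)/2 pairs, and each pair agrees in the first d hex digits with probability 16^-d. The chance of at least one collision is roughly 1 - e^-λ, where λ is the expected count, so it passes 50% near λ = ln 2 ≈ 0.69. The function name is hypothetical.

```c
#include <stdint.h>

/*
 * Expected number of pairs of objects whose names agree in the first
 * hex_digits hex digits, assuming uniformly random names:
 * n(n-1)/2 pairs, each colliding with probability 16^-hex_digits.
 * Valid for hex_digits <= 15 so that the shift stays within 64 bits.
 */
static double expected_colliding_pairs(uint64_t nr_objects, unsigned hex_digits)
{
	double n = (double)nr_objects;
	double space = (double)(UINT64_C(1) << (4 * hex_digits)); /* 16^d */

	return n * (n - 1.0) / 2.0 / space;
}
```

For 12 digits, λ is about 0.5 at 2^24 objects (roughly a 39% chance of any collision) and about 0.04 at 5 million objects. For 7 digits at 5 million objects, λ is over 46,000 colliding pairs, so git has to extend a noticeable fraction of 7-digit names.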

So 12 seems reasonable, and the only downside for it (or for "13", for
that matter) is a few extra bytes. I dunno, maybe people will really
hate that, but I have a feeling these are mostly cut-and-pasted anyway.

> I am not opposed to bump the default to 12 or whatever, but I
> suspect any lengthening today may need to be accompanied by a tool
> support that finds the set of objects that are reachable from a
> commit whose names begin with non-unique abbreviations that appear
> in the commit log message. Assuming that it is very hard to refer to
> future objects in the log message you write today, such a tool may
> find a single object that used to be the unique instance of that
> abbreviation back then, and with reachability bitmap support, it may
> not be too expensive to run.

I had a similar thought, but I think it's not just reachability. You
might refer to a short sha1 on an alternate branch that isn't reachable
from you. Or you may even use the short sha1 in an email message or a
bug tracker. So I think the extra context you want is probably a
timestamp: at time t, what was a reasonable guess for this sha1?

That's easy to answer for commits and tags (cull the ones that are too
new), but harder for blobs and trees (you'd want to know the earliest
commit which contains them).
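The timestamp idea can be sketched as a filter over the candidates that share a prefix. The struct, enum, and function names are hypothetical. Trees and blobs are simply kept, because dating them would require finding the earliest commit that contains them, and this sketch does not attempt that.

```c
#include <stddef.h>
#include <string.h>

enum obj_kind { KIND_COMMIT, KIND_TAG, KIND_TREE, KIND_BLOB };

struct candidate {
	const char *hex;      /* full object name */
	enum obj_kind kind;
	long long timestamp;  /* commit/tagger date; unused for trees and blobs */
};

/*
 * Count the candidates that match abbrev and could plausibly have been
 * meant at time `when`. Commits and tags created after `when` are
 * culled, while trees and blobs are kept because they carry no date.
 * *last receives the index of the last survivor, or -1 if none.
 */
static size_t plausible_candidates(const struct candidate *c, size_t nr,
				   const char *abbrev, long long when,
				   long *last)
{
	size_t i, count = 0;
	size_t len = strlen(abbrev);

	*last = -1;
	for (i = 0; i < nr; i++) {
		if (strncmp(c[i].hex, abbrev, len))
			continue;
		if ((c[i].kind == KIND_COMMIT || c[i].kind == KIND_TAG) &&
		    c[i].timestamp > when)
			continue;
		count++;
		*last = (long)i;
	}
	return count;
}
```

If exactly one candidate survives, it is a good guess for what the author meant at that time.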

An easier (but less automatic) tool would be to improve our error
message for the ambiguous case, and actually report details of the
candidates. I'm working up a patch now.

-Peff

* Re: Changing the default for "core.abbrev"?
  2016-09-26  4:34   ` Jeff King
@ 2016-09-26  4:45     ` Junio C Hamano
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
  2019-02-04 16:12     ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26  4:45 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

On Sun, Sep 25, 2016 at 9:34 PM, Jeff King <peff@peff.net> wrote:
>
> An easier (but less automatic) tool would be to improve our error
> message for the ambiguous case, and actually report details of the
> candidates. I'm working up a patch now.

That sounds like a fun little lunch-break project. Thanks.

* Re: Changing the default for "core.abbrev"?
  2016-09-26  3:46 ` Junio C Hamano
  2016-09-26  4:34   ` Jeff King
@ 2016-09-26  6:33   ` Matthieu Moy
  2016-09-26 12:09     ` Jeff King
  2016-09-29 13:01   ` Kyle J. McKay
  2 siblings, 1 reply; 111+ messages in thread
From: Matthieu Moy @ 2016-09-26  6:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> I am not opposed to bump the default to 12 or whatever, but I
> suspect any lengthening today may need to be accompanied by a tool
> support that finds the set of objects that are reachable from a
> commit whose names begin with non-unique abbreviations that appear
> in the commit log message.

Something much simpler would be to set core.abbrev at clone time,
depending on the size of the project just cloned. So, when cloning a
hello-world, we'd keep the 7 but when cloning a big project we'd get a
larger value.

This doesn't cover the case of someone growing his own project without
cloning, and isn't as clever as actually looking for collisions, but it
would probably provide a sane default in 99% of cases, and wouldn't be
worse than hardcoding 7 in the remaining 1% of cases.
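Here is one hypothetical way to turn "the size of the project" into a number. Pick the smallest length, never below a minimum such as 7, at which n^2 <= 16^len. At that length the expected number of colliding pairs, about n^2 / (2 * 16^len), stays at or below one half. The function name and the exact threshold are assumptions chosen for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical clone-time heuristic: smallest hex length >= min_len
 * such that nr_objects <= 4^len (equivalently n^2 <= 16^len).
 */
static unsigned abbrev_for_object_count(uint64_t nr_objects, unsigned min_len)
{
	unsigned len = min_len;

	/* 4^len == 1 << (2 * len); stop at 31 so the shift stays in range */
	while (len < 31 && nr_objects > (UINT64_C(1) << (2 * len)))
		len++;
	return len;
}
```

For roughly 5 million objects (the kernel figure above), this gives 12, and for 15 million objects it still gives 12. A project with 1,000 objects stays at 7. As noted above, a project that grows locally after the clone would keep its original value.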

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: Changing the default for "core.abbrev"?
  2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
  2016-09-26  3:46 ` Junio C Hamano
@ 2016-09-26  7:13 ` Christian Couder
  2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
  2 siblings, 0 replies; 111+ messages in thread
From: Christian Couder @ 2016-09-26  7:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Junio C Hamano, Git Mailing List,
	Ævar Arnfjörð Bjarmason

On Mon, Sep 26, 2016 at 3:39 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The kernel, these days, is at roughly 5 million objects, and while the
> seven hex digits are still often enough for uniqueness (and git will
> always add digits *until* it is unique), it's long been at the point
> where I tell people to do
>
>     git config --global core.abbrev 12
>
> because even though git will extend the seven hex digits until the
> object name is unique, that only reflects the *current* situation in
> the repository. With 5 million objects and a very healthy growth rate,
> a 7-8 hex digit number that is unique today is not necessarily unique
> a month or two from now, and then it gets annoying when a commit
> message has a short git ID that is no longer unique when you go back
> and try to figure out what went wrong in that commit.

AEvar sent a patch recently
(https://public-inbox.org/git/20160921114428.28664-3-avarab@gmail.com/)
to have gitweb link to "git describe"'d commits in log messages, and
this makes me wonder if it wouldn't be better for the kernel to also
use the output of a command like `git describe --verylong` or `git
describe --long=12` instead of a regular git ID in commit messages.

* [PATCH 0/10] helping people resolve ambiguous sha1s
  2016-09-26  4:45     ` Junio C Hamano
@ 2016-09-26 11:57       ` Jeff King
  2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
                           ` (9 more replies)
  0 siblings, 10 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 11:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Sun, Sep 25, 2016 at 09:45:18PM -0700, Junio C Hamano wrote:

> On Sun, Sep 25, 2016 at 9:34 PM, Jeff King <peff@peff.net> wrote:
> >
> > An easier (but less automatic) tool would be to improve our error
> > message for the ambiguous case, and actually report details of the
> > candidates. I'm working up a patch now.
> 
> That sounds like a fun little lunch-break project. Thanks.

That's what I thought, but it turned out to be quite involved. :)

I started by trying to teach get_short_sha1() to remember all of the
candidates it sees, but it turns out to be surprisingly complicated. I
did have something working, but I scrapped it in favor of just looking
at the object database again. It's the error code path, so it's OK to be
slower (especially if it keeps the non-error code path much simpler).

But then, being the diligent programmer that I am, I added a test.
And that failed because of an unrelated bug. Fixing that revealed
another bug. And so on.

The good news is that I think I've finally cleared up all of the
long-standing bugs where git will print the same error message twice.
Those have been annoying me for years (and apparently others[1]).

Patches 2-4 and 9 are all bugfixes. Patch 10 is the interesting part.
The rest are just cleanups and refactoring.

  [01/10]: get_sha1: detect buggy calls with multiple disambiguators
  [02/10]: get_sha1: avoid repeating ourselves via ONLY_TO_DIE
  [03/10]: get_sha1: propagate flags to child functions
  [04/10]: get_short_sha1: peel tags when looking for treeish
  [05/10]: get_short_sha1: refactor init of disambiguation code
  [06/10]: get_short_sha1: NUL-terminate hex prefix
  [07/10]: get_short_sha1: mark ambiguity error for translation
  [08/10]: sha1_array: let callbacks interrupt iteration
  [09/10]: for_each_abbrev: drop duplicate objects
  [10/10]: get_short_sha1: list ambiguous objects on error
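Patches 08 and 09 are listed here, but their bodies are not part of this excerpt. The following is a generic sketch of the two ideas. It assumes a sorted array of hex names rather than git's sha1_array. The walk skips adjacent duplicates (the same object can be found both loose and in one or more packs) and stops as soon as the callback returns non-zero. All names below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Callback returns 0 to continue, non-zero to stop the iteration. */
typedef int (*hex_iter_fn)(const char *hex, void *data);

/*
 * Visit each distinct name in a sorted array once, stopping early if
 * the callback returns non-zero; that value is passed back.
 */
static int for_each_unique_hex(const char *const *names, size_t nr,
			       hex_iter_fn fn, void *data)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		int ret;

		if (i > 0 && !strcmp(names[i - 1], names[i]))
			continue; /* duplicate of the previous entry */
		ret = fn(names[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count visits and stop once `limit` have been seen. */
struct visit_count {
	size_t seen;
	size_t limit;
};

static int count_until_limit(const char *hex, void *data)
{
	struct visit_count *vc = data;

	(void)hex;
	vc->seen++;
	return vc->seen >= vc->limit;
}
```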

Of course this is all totally orthogonal to Linus's original question. I
hope it will make things more pleasant when somebody does end up having
to look up a too-short sha1, but it's probably still a good idea to
bump the default.

-Peff

[1] http://public-inbox.org/git/504B91B7.1000406@avtalion.name/

* [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
@ 2016-09-26 11:59         ` Jeff King
  2016-09-26 16:37           ` Junio C Hamano
  2016-09-26 11:59         ` [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE Jeff King
                           ` (8 subsequent siblings)
  9 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-26 11:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

The get_sha1() family of functions takes a flags field, but
some of the flags are mutually exclusive. In particular, we
can only handle one disambiguating function, and the flags
quietly override each other. Let's instead detect these as
programming bugs.

Technically some of the flags are supersets of the others,
so treating COMMITTISH|TREEISH as just COMMITTISH is not
wrong, but it's a good sign the caller is confused. And
certainly asking for BLOB|TREE does not work.

We can do the check easily with some bit-twiddling, and as a
bonus, the bit-mask of disambiguators will come in handy in
a future patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     | 5 +++++
 sha1_name.c | 9 +++++++++
 2 files changed, 14 insertions(+)

diff --git a/cache.h b/cache.h
index d0494c8..7bd78ca 100644
--- a/cache.h
+++ b/cache.h
@@ -1203,6 +1203,11 @@ struct object_context {
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
 #define GET_SHA1_ONLY_TO_DIE    04000
 
+#define GET_SHA1_DISAMBIGUATORS \
+	(GET_SHA1_COMMIT | GET_SHA1_COMMITTISH | \
+	GET_SHA1_TREE | GET_SHA1_TREEISH | \
+	GET_SHA1_BLOB)
+
 extern int get_sha1(const char *str, unsigned char *sha1);
 extern int get_sha1_commit(const char *str, unsigned char *sha1);
 extern int get_sha1_committish(const char *str, unsigned char *sha1);
diff --git a/sha1_name.c b/sha1_name.c
index faf873c..f9812ff 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -310,6 +310,11 @@ static int prepare_prefixes(const char *name, int len,
 	return 0;
 }
 
+static int multiple_bits_set(unsigned flags)
+{
+	return !!(flags & (flags - 1));
+}
+
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
@@ -327,6 +332,10 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	prepare_alt_odb();
 
 	memset(&ds, 0, sizeof(ds));
+
+	if (multiple_bits_set(flags & GET_SHA1_DISAMBIGUATORS))
+		die("BUG: multiple get_short_sha1 disambiguator flags");
+
 	if (flags & GET_SHA1_COMMIT)
 		ds.fn = disambiguate_commit_only;
 	else if (flags & GET_SHA1_COMMITTISH)
-- 
2.10.0.492.g14f803f
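The bit trick in multiple_bits_set() works because x & (x - 1) clears the lowest set bit of x. The result is non-zero only if a second bit was set. Below is a standalone sketch of the same check. The DEMO_* flag values are made up for illustration and are not git's GET_SHA1_* values.

```c
/* Hypothetical flag values, for illustration only. */
#define DEMO_QUIETLY     (1u << 0)
#define DEMO_COMMIT      (1u << 1)
#define DEMO_COMMITTISH  (1u << 2)
#define DEMO_TREE        (1u << 3)
#define DEMO_TREEISH     (1u << 4)
#define DEMO_BLOB        (1u << 5)

#define DEMO_DISAMBIGUATORS \
	(DEMO_COMMIT | DEMO_COMMITTISH | DEMO_TREE | DEMO_TREEISH | DEMO_BLOB)

/* x & (x - 1) drops the lowest set bit; anything left means >= 2 bits. */
static int multiple_bits_set(unsigned flags)
{
	return !!(flags & (flags - 1));
}

/* Non-zero if the caller asked for more than one disambiguator. */
static int has_conflicting_disambiguators(unsigned flags)
{
	return multiple_bits_set(flags & DEMO_DISAMBIGUATORS);
}
```

Masking with the disambiguator set first means that unrelated flags such as QUIETLY never trigger the check.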


* [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
  2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
@ 2016-09-26 11:59         ` Jeff King
  2016-09-26 11:59         ` [PATCH 03/10] get_sha1: propagate flags to child functions Jeff King
                           ` (7 subsequent siblings)
  9 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 11:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

When the revision code cannot parse an argument like
"HEAD:foo", it will call maybe_die_on_misspelt_object_name(),
which re-runs get_sha1() with an extra ONLY_TO_DIE flag. We
then spend more effort to generate a better error message.

Unfortunately, a side effect is that our second call may
repeat the same error messages from the original get_sha1()
call. You can see this with:

  $ git show 0017
  error: short SHA1 0017 is ambiguous.
  error: short SHA1 0017 is ambiguous.
  fatal: ambiguous argument '0017': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

where the second "error:" line comes from the ONLY_TO_DIE
call.

To fix this, we can make ONLY_TO_DIE imply QUIETLY. This is
a little odd, because the whole point of ONLY_TO_DIE is to
output error messages. But what we want to do is tell the
rest of the get_sha1() code (particularly get_sha1_1()) that
the _regular_ messages should be quiet, but the only-to-die
ones should not.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c                         | 3 +++
 t/t1512-rev-parse-disambiguation.sh | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/sha1_name.c b/sha1_name.c
index f9812ff..fe05ba0 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -1391,6 +1391,9 @@ static int get_sha1_with_context_1(const char *name,
 	const char *cp;
 	int only_to_die = flags & GET_SHA1_ONLY_TO_DIE;
 
+	if (only_to_die)
+		flags |= GET_SHA1_QUIETLY;
+
 	memset(oc, 0, sizeof(*oc));
 	oc->mode = S_IFINVALID;
 	ret = get_sha1_1(name, namelen, sha1, flags);
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index e221167..16f9709 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -291,4 +291,10 @@ test_expect_success 'ambiguous short sha1 ref' '
 	grep "refname.*${REF}.*ambiguous" err
 '
 
+test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated' '
+	test_must_fail git rev-parse 00000 2>stderr &&
+	grep "is ambiguous" stderr >errors &&
+	test_line_count = 1 errors
+'
+
 test_done
-- 
2.10.0.492.g14f803f
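Here is a toy model of why making ONLY_TO_DIE imply QUIETLY removes the duplicate line. It uses a counter in place of the error() output, and the TOY_* flags and functions are hypothetical rather than git's code. The first, normal lookup reports the ambiguity once, and the retry made only to die no longer repeats it.

```c
/* Hypothetical flags for this toy model. */
#define TOY_QUIETLY     01u
#define TOY_ONLY_TO_DIE 02u

/* Stands in for the "short SHA1 ... is ambiguous" lines on stderr. */
static int ambiguity_messages;

/* Stand-in for the regular lookup: reports ambiguity unless quiet. */
static int toy_lookup(unsigned flags)
{
	if (!(flags & TOY_QUIETLY))
		ambiguity_messages++;
	return -1; /* pretend the name is always ambiguous */
}

/* Stand-in for get_sha1_with_context_1() with the fix applied. */
static int toy_get_sha1(unsigned flags)
{
	if (flags & TOY_ONLY_TO_DIE)
		flags |= TOY_QUIETLY; /* regular messages go quiet */
	return toy_lookup(flags);
}

/* The caller's pattern: a normal attempt, then a retry only to die. */
static void toy_parse_revision(void)
{
	if (toy_get_sha1(0))
		toy_get_sha1(TOY_ONLY_TO_DIE);
}
```

Without the two-line fix in toy_get_sha1(), the counter would reach 2, which mirrors the doubled "error:" line shown in the commit message.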


* [PATCH 03/10] get_sha1: propagate flags to child functions
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
  2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
  2016-09-26 11:59         ` [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE Jeff King
@ 2016-09-26 11:59         ` Jeff King
  2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
                           ` (6 subsequent siblings)
  9 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 11:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

The get_sha1() function is actually implemented by many
sub-functions, but we do not always pass our flags around to
all of those functions. As a result, we may forget that our
caller asked us to resolve with GET_SHA1_QUIETLY and output
messages anyway. The two triggerable cases are:

  1. Resolving treeish:path will resolve the "treeish"
     portion using GET_SHA1_TREEISH, dropping all other
     flags.

  2. The peel_onion() function did not take flags at all
     but recurses to get_sha1_1(), which does.

The solution for both is to bitwise-OR their new flags with
the existing ones (after dropping any mutually exclusive
disambiguation flags).

This bug can trigger with "git rev-parse --quiet", which
asks for quiet resolution. But it can also happen in a more
vanilla code path when we do a follow-up ONLY_TO_DIE
invocation of get_sha1(), and that's what the tests check.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c                         | 18 ++++++++++++------
 t/t1512-rev-parse-disambiguation.sh | 14 +++++++++++++-
 2 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index fe05ba0..38e51d9 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -686,12 +686,12 @@ struct object *peel_to_type(const char *name, int namelen,
 	}
 }
 
-static int peel_onion(const char *name, int len, unsigned char *sha1)
+static int peel_onion(const char *name, int len, unsigned char *sha1,
+		      unsigned lookup_flags)
 {
 	unsigned char outer[20];
 	const char *sp;
 	unsigned int expected_type = 0;
-	unsigned lookup_flags = 0;
 	struct object *o;
 
 	/*
@@ -731,10 +731,11 @@ static int peel_onion(const char *name, int len, unsigned char *sha1)
 	else
 		return -1;
 
+	lookup_flags &= ~GET_SHA1_DISAMBIGUATORS;
 	if (expected_type == OBJ_COMMIT)
-		lookup_flags = GET_SHA1_COMMITTISH;
+		lookup_flags |= GET_SHA1_COMMITTISH;
 	else if (expected_type == OBJ_TREE)
-		lookup_flags = GET_SHA1_TREEISH;
+		lookup_flags |= GET_SHA1_TREEISH;
 
 	if (get_sha1_1(name, sp - name - 2, outer, lookup_flags))
 		return -1;
@@ -835,7 +836,7 @@ static int get_sha1_1(const char *name, int len, unsigned char *sha1, unsigned l
 		return get_nth_ancestor(name, len1, sha1, num);
 	}
 
-	ret = peel_onion(name, len, sha1);
+	ret = peel_onion(name, len, sha1, lookup_flags);
 	if (!ret)
 		return 0;
 
@@ -1470,7 +1471,12 @@ static int get_sha1_with_context_1(const char *name,
 	if (*cp == ':') {
 		unsigned char tree_sha1[20];
 		int len = cp - name;
-		if (!get_sha1_1(name, len, tree_sha1, GET_SHA1_TREEISH)) {
+		unsigned sub_flags = flags;
+
+		sub_flags &= ~GET_SHA1_DISAMBIGUATORS;
+		sub_flags |= GET_SHA1_TREEISH;
+
+		if (!get_sha1_1(name, len, tree_sha1, sub_flags)) {
 			const char *filename = cp+1;
 			char *new_filename = NULL;
 
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 16f9709..30e0b80 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -291,10 +291,22 @@ test_expect_success 'ambiguous short sha1 ref' '
 	grep "refname.*${REF}.*ambiguous" err
 '
 
-test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated' '
+test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated (raw)' '
 	test_must_fail git rev-parse 00000 2>stderr &&
 	grep "is ambiguous" stderr >errors &&
 	test_line_count = 1 errors
 '
 
+test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated (treeish)' '
+	test_must_fail git rev-parse 00000:foo 2>stderr &&
+	grep "is ambiguous" stderr >errors &&
+	test_line_count = 1 errors
+'
+
+test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated (peel)' '
+	test_must_fail git rev-parse 00000^{commit} 2>stderr &&
+	grep "is ambiguous" stderr >errors &&
+	test_line_count = 1 errors
+'
+
 test_done
-- 
2.10.0.492.g14f803f
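The patch applies the same pattern in both places: clear the disambiguator bits, then OR in the one this step needs. Caller flags such as QUIETLY survive, while at most one disambiguator stays set. Here is a sketch with hypothetical EX_* values:

```c
/* Hypothetical flag values, for illustration only. */
#define EX_QUIETLY     01u
#define EX_COMMITTISH  04u
#define EX_TREEISH    020u
#define EX_BLOB       040u
#define EX_DISAMBIGUATORS (EX_COMMITTISH | EX_TREEISH | EX_BLOB)

/*
 * Flags for a recursive lookup: keep everything the caller asked for,
 * but replace any disambiguator with the one this step requires.
 */
static unsigned child_flags(unsigned flags, unsigned disambiguator)
{
	flags &= ~EX_DISAMBIGUATORS;
	return flags | disambiguator;
}
```

By contrast, the old code assigned the new disambiguator outright (lookup_flags = GET_SHA1_TREEISH), which silently dropped QUIETLY.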


* [PATCH 04/10] get_short_sha1: peel tags when looking for treeish
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
                           ` (2 preceding siblings ...)
  2016-09-26 11:59         ` [PATCH 03/10] get_sha1: propagate flags to child functions Jeff King
@ 2016-09-26 11:59         ` Jeff King
  2016-09-26 12:11           ` Jeff King
  2016-09-26 16:55           ` Junio C Hamano
  2016-09-26 12:00         ` [PATCH 05/10] get_short_sha1: refactor init of disambiguation code Jeff King
                           ` (5 subsequent siblings)
  9 siblings, 2 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 11:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

The treeish disambiguation function tries to peel tags, but
it does so by calling:

  deref_tag(lookup_object(sha1), ...);

This will only work if we have previously looked at the tag
and created a "struct tag" for it. Since parsing revision
arguments typically happens before anything else, this is
usually not the case, and we would fail to peel the tag (we
are lucky that deref_tag() gracefully handles the NULL and
does not segfault).

Instead, we can use parse_object(). Note that this is the
same fix done by 94d75d1 (get_short_sha1(): correctly
disambiguate type-limited abbreviation, 2013-07-01), but
that commit fixed only the committish disambiguator, and
left the bug in the treeish one.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c                         | 2 +-
 t/t1512-rev-parse-disambiguation.sh | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index 38e51d9..432a308 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -269,7 +269,7 @@ static int disambiguate_treeish_only(const unsigned char *sha1, void *cb_data_un
 		return 0;
 
 	/* We need to do this the hard way... */
-	obj = deref_tag(lookup_object(sha1), NULL, 0);
+	obj = deref_tag(parse_object(sha1), NULL, 0);
 	if (obj && (obj->type == OBJ_TREE || obj->type == OBJ_COMMIT))
 		return 1;
 	return 0;
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 30e0b80..dfd3567 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -264,6 +264,13 @@ test_expect_success 'ambiguous commit-ish' '
 	test_must_fail git log 000000000...
 '
 
+# There are three objects with this prefix: a blob, a tree, and a tag. We know
+# the blob will not pass as a treeish, but the tree and tag should (and thus
+# cause an error).
+test_expect_success 'ambiguous tags peel to treeish' '
+	test_must_fail git rev-parse 0000000000f^{tree}
+'
+
 test_expect_success 'rev-parse --disambiguate' '
 	# The test creates 16 objects that share the prefix and two
 	# commits created by commit-tree in earlier tests share a
-- 
2.10.0.492.g14f803f
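Here is a toy model of the difference. It is loosely patterned on lookup_object(), parse_object(), and deref_tag(), but does not use their real signatures. Lookup only finds objects that are already in memory, so peeling a tag that nobody has parsed yet starts from NULL. Parsing on demand, by contrast, lets the peel reach the tree. All types and names are hypothetical.

```c
#include <stddef.h>

enum kind { K_NONE, K_TAG, K_COMMIT, K_TREE, K_BLOB };

/* What is stored on "disk": an object's type and, for tags, its target. */
struct stored_object {
	enum kind kind;
	int tag_target;
};

/* In-memory object, filled in only once the object has been parsed. */
struct mem_object {
	enum kind kind;
	int index;
	int parsed;
};

/* Like lookup_object(): only finds objects that are already in memory. */
static struct mem_object *mem_lookup(struct mem_object *mem, int i)
{
	return mem[i].parsed ? &mem[i] : NULL;
}

/* Like parse_object(): reads the stored object on demand. */
static struct mem_object *mem_parse(const struct stored_object *db,
				    struct mem_object *mem, int i)
{
	mem[i].kind = db[i].kind;
	mem[i].index = i;
	mem[i].parsed = 1;
	return &mem[i];
}

/* Like deref_tag(): peel tags until a non-tag; NULL in, NULL out. */
static struct mem_object *peel_tags(const struct stored_object *db,
				    struct mem_object *mem,
				    struct mem_object *o)
{
	while (o && o->kind == K_TAG)
		o = mem_parse(db, mem, db[o->index].tag_target);
	return o;
}
```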


* [PATCH 05/10] get_short_sha1: refactor init of disambiguation code
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
                           ` (3 preceding siblings ...)
  2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
@ 2016-09-26 12:00         ` Jeff King
  2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
                           ` (4 subsequent siblings)
  9 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

The disambiguation machinery has two callers: get_short_sha1
and for_each_abbrev. Both need to repeat much of the same
setup: declaring buffers, sanity-checking lengths, preparing
the prefixes, etc.  Let's pull that into a single init
function so we can avoid repeating ourselves.

Pulling the buffers into the "struct disambiguate_state"
isn't strictly necessary, but it does make things simpler
for the callers, who no longer have to worry about sizing
them correctly (i.e., it's an implicit requirement that
the caller provide 20- and 40-byte buffers).

And while we're touching this code, we can convert any
magic-number sizes to the more modern GIT_SHA1_* constants.
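For reference, the prefix preparation in init_object_disambiguation() (visible in the hunk below) packs the hex digits two per byte, high nibble first. As a result, an odd-length prefix ends in a half-filled byte, and any comparison has to check that last nibble separately. Below is a standalone sketch. The function names and the RAWSZ/HEXSZ constants are hypothetical stand-ins for the GIT_SHA1_* ones. The matcher only approximates match_sha(), whose body is not shown here.

```c
#include <string.h>

#define RAWSZ 20 /* bytes in a SHA-1 */
#define HEXSZ 40 /* hex digits in a SHA-1 */

/* Pack up to 40 hex digits into bin, two per byte, high nibble first. */
static int hex_prefix_to_bin(const char *hex, int len, unsigned char *bin)
{
	int i;

	if (len < 0 || len > HEXSZ)
		return -1;
	memset(bin, 0, RAWSZ);
	for (i = 0; i < len; i++) {
		unsigned char c = hex[i], val;

		if (c >= '0' && c <= '9')
			val = c - '0';
		else if (c >= 'a' && c <= 'f')
			val = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			val = c - 'A' + 10;
		else
			return -1; /* not a hex digit */
		if (!(i & 1))
			val <<= 4; /* even positions fill the high nibble */
		bin[i >> 1] |= val;
	}
	return 0;
}

/* Does `full` start with the first len hex digits packed in pfx? */
static int bin_prefix_matches(int len, const unsigned char *pfx,
			      const unsigned char *full)
{
	int whole = len / 2;

	if (memcmp(pfx, full, whole))
		return 0;
	/* odd length: compare only the high nibble of the last byte */
	if ((len & 1) && (pfx[whole] & 0xf0) != (full[whole] & 0xf0))
		return 0;
	return 1;
}
```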

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c | 79 +++++++++++++++++++++++++++----------------------------------
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 432a308..79eb1ee 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -13,9 +13,13 @@ static int get_sha1_oneline(const char *, unsigned char *, struct commit_list *)
 typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
+	int len; /* length of prefix in hex chars */
+	char hex_pfx[GIT_SHA1_HEXSZ];
+	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
+
 	disambiguate_hint_fn fn;
 	void *cb_data;
-	unsigned char candidate[20];
+	unsigned char candidate[GIT_SHA1_RAWSZ];
 	unsigned candidate_exists:1;
 	unsigned candidate_checked:1;
 	unsigned candidate_ok:1;
@@ -72,10 +76,10 @@ static void update_candidates(struct disambiguate_state *ds, const unsigned char
 	/* otherwise, current can be discarded and candidate is still good */
 }
 
-static void find_short_object_filename(int len, const char *hex_pfx, struct disambiguate_state *ds)
+static void find_short_object_filename(struct disambiguate_state *ds)
 {
 	struct alternate_object_database *alt;
-	char hex[40];
+	char hex[GIT_SHA1_HEXSZ];
 	static struct alternate_object_database *fakeent;
 
 	if (!fakeent) {
@@ -95,7 +99,7 @@ static void find_short_object_filename(int len, const char *hex_pfx, struct disa
 	}
 	fakeent->next = alt_odb_list;
 
-	xsnprintf(hex, sizeof(hex), "%.2s", hex_pfx);
+	xsnprintf(hex, sizeof(hex), "%.2s", ds->hex_pfx);
 	for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) {
 		struct dirent *de;
 		DIR *dir;
@@ -103,7 +107,7 @@ static void find_short_object_filename(int len, const char *hex_pfx, struct disa
 		 * every alt_odb struct has 42 extra bytes after the base
 		 * for exactly this purpose
 		 */
-		xsnprintf(alt->name, 42, "%.2s/", hex_pfx);
+		xsnprintf(alt->name, 42, "%.2s/", ds->hex_pfx);
 		dir = opendir(alt->base);
 		if (!dir)
 			continue;
@@ -113,7 +117,7 @@ static void find_short_object_filename(int len, const char *hex_pfx, struct disa
 
 			if (strlen(de->d_name) != 38)
 				continue;
-			if (memcmp(de->d_name, hex_pfx + 2, len - 2))
+			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
 			if (!get_sha1_hex(hex, sha1))
@@ -138,9 +142,7 @@ static int match_sha(unsigned len, const unsigned char *a, const unsigned char *
 	return 1;
 }
 
-static void unique_in_pack(int len,
-			  const unsigned char *bin_pfx,
-			   struct packed_git *p,
+static void unique_in_pack(struct packed_git *p,
 			   struct disambiguate_state *ds)
 {
 	uint32_t num, last, i, first = 0;
@@ -155,7 +157,7 @@ static void unique_in_pack(int len,
 		int cmp;
 
 		current = nth_packed_object_sha1(p, mid);
-		cmp = hashcmp(bin_pfx, current);
+		cmp = hashcmp(ds->bin_pfx, current);
 		if (!cmp) {
 			first = mid;
 			break;
@@ -174,20 +176,19 @@ static void unique_in_pack(int len,
 	 */
 	for (i = first; i < num && !ds->ambiguous; i++) {
 		current = nth_packed_object_sha1(p, i);
-		if (!match_sha(len, bin_pfx, current))
+		if (!match_sha(ds->len, ds->bin_pfx, current))
 			break;
 		update_candidates(ds, current);
 	}
 }
 
-static void find_short_packed_object(int len, const unsigned char *bin_pfx,
-				     struct disambiguate_state *ds)
+static void find_short_packed_object(struct disambiguate_state *ds)
 {
 	struct packed_git *p;
 
 	prepare_packed_git();
 	for (p = packed_git; p && !ds->ambiguous; p = p->next)
-		unique_in_pack(len, bin_pfx, p, ds);
+		unique_in_pack(p, ds);
 }
 
 #define SHORT_NAME_NOT_FOUND (-1)
@@ -281,14 +282,17 @@ static int disambiguate_blob_only(const unsigned char *sha1, void *cb_data_unuse
 	return kind == OBJ_BLOB;
 }
 
-static int prepare_prefixes(const char *name, int len,
-			    unsigned char *bin_pfx,
-			    char *hex_pfx)
+static int init_object_disambiguation(const char *name, int len,
+				      struct disambiguate_state *ds)
 {
 	int i;
 
-	hashclr(bin_pfx);
-	memset(hex_pfx, 'x', 40);
+	if (len < MINIMUM_ABBREV || len > GIT_SHA1_HEXSZ)
+		return -1;
+
+	memset(ds, 0, sizeof(*ds));
+	memset(ds->hex_pfx, 'x', GIT_SHA1_HEXSZ);
+
 	for (i = 0; i < len ;i++) {
 		unsigned char c = name[i];
 		unsigned char val;
@@ -302,11 +306,14 @@ static int prepare_prefixes(const char *name, int len,
 		}
 		else
 			return -1;
-		hex_pfx[i] = c;
+		ds->hex_pfx[i] = c;
 		if (!(i & 1))
 			val <<= 4;
-		bin_pfx[i >> 1] |= val;
+		ds->bin_pfx[i >> 1] |= val;
 	}
+
+	ds->len = len;
+	prepare_alt_odb();
 	return 0;
 }
 
@@ -319,20 +326,12 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
 	int status;
-	char hex_pfx[40];
-	unsigned char bin_pfx[20];
 	struct disambiguate_state ds;
 	int quietly = !!(flags & GET_SHA1_QUIETLY);
 
-	if (len < MINIMUM_ABBREV || len > 40)
-		return -1;
-	if (prepare_prefixes(name, len, bin_pfx, hex_pfx) < 0)
+	if (init_object_disambiguation(name, len, &ds) < 0)
 		return -1;
 
-	prepare_alt_odb();
-
-	memset(&ds, 0, sizeof(ds));
-
 	if (multiple_bits_set(flags & GET_SHA1_DISAMBIGUATORS))
 		die("BUG: multiple get_short_sha1 disambiguator flags");
 
@@ -347,36 +346,28 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	else if (flags & GET_SHA1_BLOB)
 		ds.fn = disambiguate_blob_only;
 
-	find_short_object_filename(len, hex_pfx, &ds);
-	find_short_packed_object(len, bin_pfx, &ds);
+	find_short_object_filename(&ds);
+	find_short_packed_object(&ds);
 	status = finish_object_disambiguation(&ds, sha1);
 
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS))
-		return error("short SHA1 %.*s is ambiguous.", len, hex_pfx);
+		return error("short SHA1 %.*s is ambiguous.", ds.len, ds.hex_pfx);
 	return status;
 }
 
 int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 {
-	char hex_pfx[40];
-	unsigned char bin_pfx[20];
 	struct disambiguate_state ds;
-	int len = strlen(prefix);
 
-	if (len < MINIMUM_ABBREV || len > 40)
+	if (init_object_disambiguation(prefix, strlen(prefix), &ds) < 0)
 		return -1;
-	if (prepare_prefixes(prefix, len, bin_pfx, hex_pfx) < 0)
-		return -1;
-
-	prepare_alt_odb();
 
-	memset(&ds, 0, sizeof(ds));
 	ds.always_call_fn = 1;
 	ds.cb_data = cb_data;
 	ds.fn = fn;
 
-	find_short_object_filename(len, hex_pfx, &ds);
-	find_short_packed_object(len, bin_pfx, &ds);
+	find_short_object_filename(&ds);
+	find_short_packed_object(&ds);
 	return ds.ambiguous;
 }
 
-- 
2.10.0.492.g14f803f


* [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
                           ` (4 preceding siblings ...)
  2016-09-26 12:00         ` [PATCH 05/10] get_short_sha1: refactor init of disambiguation code Jeff King
@ 2016-09-26 12:00         ` Jeff King
  2016-09-26 17:10           ` Junio C Hamano
  2016-09-26 12:00         ` [PATCH 07/10] get_short_sha1: mark ambiguity error for translation Jeff King
                           ` (3 subsequent siblings)
  9 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

We store the hex prefix in a 40-byte buffer with the prefix
itself followed by 40-minus-len "x" characters. These x's
serve no purpose, and the lack of NUL termination makes the
prefix string annoying to use. Let's just terminate it.

Note that this is in contrast to the binary prefix, which
_must_ be zero-padded, because we look at the whole thing
during a binary search to find the first potential match in
each pack index. The loose-object hex search cannot use the
same trick because it has to do a linear walk through the
unsorted results of readdir() (and even if it could, you'd
want zeroes instead of x's).
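To illustrate why the two prefixes are handled differently, here is a minimal standalone sketch in C. It is not git's code; first_candidate(), hex_prefix_matches() and RAWSZ are hypothetical names. A zero-padded binary prefix compares less than or equal to every 20-byte name that starts with it, so a lower-bound binary search over a sorted table lands on the first possible match. Unsorted loose names instead get a plain prefix comparison, and for that a NUL-terminated hex string is all that is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RAWSZ 20 /* bytes in a binary SHA-1 */

/*
 * Lower-bound search over a sorted table of nr binary names stored back
 * to back. The bytes of bin_pfx past the prefix are zero, so any name
 * that starts with the prefix compares >= bin_pfx; the first match, if
 * there is one, is at the returned index (nr means "past the end").
 */
static size_t first_candidate(const unsigned char *table, size_t nr,
                              const unsigned char *bin_pfx)
{
	size_t lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (memcmp(table + mid * RAWSZ, bin_pfx, RAWSZ) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/*
 * Loose names arrive from readdir() unsorted, so each one is simply
 * compared against the NUL-terminated hex prefix; strlen() says how
 * many characters matter, and no padding is involved.
 */
static int hex_prefix_matches(const char *hex_name, const char *hex_pfx)
{
	size_t len = strlen(hex_pfx);

	assert(len <= 2 * RAWSZ);
	return !strncmp(hex_name, hex_pfx, len);
}
```

A caller would then walk forward from the returned index while names still match the prefix, much as unique_in_pack() does after its binary search.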

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index 79eb1ee..549ef3f 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -14,7 +14,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
-	char hex_pfx[GIT_SHA1_HEXSZ];
+	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
 	disambiguate_hint_fn fn;
@@ -291,7 +291,6 @@ static int init_object_disambiguation(const char *name, int len,
 		return -1;
 
 	memset(ds, 0, sizeof(*ds));
-	memset(ds->hex_pfx, 'x', GIT_SHA1_HEXSZ);
 
 	for (i = 0; i < len ;i++) {
 		unsigned char c = name[i];
@@ -313,6 +312,7 @@ static int init_object_disambiguation(const char *name, int len,
 	}
 
 	ds->len = len;
+	ds->hex_pfx[len] = '\0';
 	prepare_alt_odb();
 	return 0;
 }
@@ -351,7 +351,7 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	status = finish_object_disambiguation(&ds, sha1);
 
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS))
-		return error("short SHA1 %.*s is ambiguous.", ds.len, ds.hex_pfx);
+		return error("short SHA1 %s is ambiguous.", ds.hex_pfx);
 	return status;
 }
 
-- 
2.10.0.492.g14f803f


* [PATCH 07/10] get_short_sha1: mark ambiguity error for translation
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
                           ` (5 preceding siblings ...)
  2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
@ 2016-09-26 12:00         ` Jeff King
  2016-09-26 12:00         ` [PATCH 08/10] sha1_array: let callbacks interrupt iteration Jeff King
                           ` (2 subsequent siblings)
  9 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

This is a human-readable message, and there's no reason it
should not be translated. While we're at it, let's drop the
period from the end, which is not our usual style.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_name.c b/sha1_name.c
index 549ef3f..d4c7e26 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -351,7 +351,7 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	status = finish_object_disambiguation(&ds, sha1);
 
 	if (!quietly && (status == SHORT_NAME_AMBIGUOUS))
-		return error("short SHA1 %s is ambiguous.", ds.hex_pfx);
+		return error(_("short SHA1 %s is ambiguous"), ds.hex_pfx);
 	return status;
 }
 
-- 
2.10.0.492.g14f803f


* [PATCH 08/10] sha1_array: let callbacks interrupt iteration
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
                           ` (6 preceding siblings ...)
  2016-09-26 12:00         ` [PATCH 07/10] get_short_sha1: mark ambiguity error for translation Jeff King
@ 2016-09-26 12:00         ` Jeff King
  2016-09-26 12:00         ` [PATCH 09/10] for_each_abbrev: drop duplicate objects Jeff King
  2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
  9 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

The callbacks for iterating a sha1_array must have a void
return.  This is unlike our usual for_each semantics, where
a callback may interrupt iteration and have its value
propagated. Let's switch it to the usual form, which will
enable its use in more places (e.g., where we are replacing
an existing iteration with a different data structure).
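A minimal sketch of the calling convention described above, using plain sorted ints instead of sha1s so it stands alone. for_each_unique_int(), find_cb() and struct find_state are hypothetical names. Repeats are skipped, a non-zero callback return stops the walk and is passed back to the caller, and a complete walk returns 0.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the new for_each_sha1_fn convention: non-zero stops the walk. */
typedef int (*each_int_fn)(int value, void *data);

/*
 * Hypothetical stand-in for sha1_array_for_each_unique() over an
 * already-sorted int array: skip repeats, stop at the first non-zero
 * callback return and propagate it, otherwise return 0.
 */
static int for_each_unique_int(const int *v, size_t nr,
                               each_int_fn fn, void *data)
{
	size_t i;

	assert(v || !nr);
	for (i = 0; i < nr; i++) {
		int ret;

		if (i > 0 && v[i] == v[i - 1])
			continue;
		ret = fn(v[i], data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count visits and stop once the target is seen. */
struct find_state {
	int target;
	int visited;
};

static int find_cb(int value, void *data)
{
	struct find_state *st = data;

	st->visited++;
	return value == st->target; /* 1 ends the iteration early */
}
```

Callers that never want to stop early, like the ones converted in this patch, simply return 0 from every call.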

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/technical/api-sha1-array.txt | 8 ++++++--
 builtin/cat-file.c                         | 3 ++-
 builtin/receive-pack.c                     | 3 ++-
 sha1-array.c                               | 8 ++++++--
 sha1-array.h                               | 8 ++++----
 submodule.c                                | 3 ++-
 t/helper/test-sha1-array.c                 | 3 ++-
 7 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/Documentation/technical/api-sha1-array.txt b/Documentation/technical/api-sha1-array.txt
index 3e75497..dcc5294 100644
--- a/Documentation/technical/api-sha1-array.txt
+++ b/Documentation/technical/api-sha1-array.txt
@@ -38,16 +38,20 @@ Functions
 `sha1_array_for_each_unique`::
 	Efficiently iterate over each unique element of the list,
 	executing the callback function for each one. If the array is
-	not sorted, this function has the side effect of sorting it.
+	not sorted, this function has the side effect of sorting it. If
+	the callback returns a non-zero value, the iteration ends
+	immediately and the callback's return is propagated; otherwise,
+	0 is returned.
 
 Examples
 --------
 
 -----------------------------------------
-void print_callback(const unsigned char sha1[20],
+int print_callback(const unsigned char sha1[20],
 		    void *data)
 {
 	printf("%s\n", sha1_to_hex(sha1));
+	return 0; /* always continue */
 }
 
 void some_func(void)
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 94e67eb..cca97a8 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -401,11 +401,12 @@ struct object_cb_data {
 	struct expand_data *expand;
 };
 
-static void batch_object_cb(const unsigned char sha1[20], void *vdata)
+static int batch_object_cb(const unsigned char sha1[20], void *vdata)
 {
 	struct object_cb_data *data = vdata;
 	hashcpy(data->expand->oid.hash, sha1);
 	batch_object_write(NULL, data->opt, data->expand);
+	return 0;
 }
 
 static int batch_loose_object(const unsigned char *sha1,
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 896b16f..f7cd180 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -268,9 +268,10 @@ static int show_ref_cb(const char *path_full, const struct object_id *oid,
 	return 0;
 }
 
-static void show_one_alternate_sha1(const unsigned char sha1[20], void *unused)
+static int show_one_alternate_sha1(const unsigned char sha1[20], void *unused)
 {
 	show_ref(".have", sha1);
+	return 0;
 }
 
 static void collect_one_alternate_ref(const struct ref *ref, void *data)
diff --git a/sha1-array.c b/sha1-array.c
index 6f4a224..af1d7d5 100644
--- a/sha1-array.c
+++ b/sha1-array.c
@@ -42,7 +42,7 @@ void sha1_array_clear(struct sha1_array *array)
 	array->sorted = 0;
 }
 
-void sha1_array_for_each_unique(struct sha1_array *array,
+int sha1_array_for_each_unique(struct sha1_array *array,
 				for_each_sha1_fn fn,
 				void *data)
 {
@@ -52,8 +52,12 @@ void sha1_array_for_each_unique(struct sha1_array *array,
 		sha1_array_sort(array);
 
 	for (i = 0; i < array->nr; i++) {
+		int ret;
 		if (i > 0 && !hashcmp(array->sha1[i], array->sha1[i-1]))
 			continue;
-		fn(array->sha1[i], data);
+		ret = fn(array->sha1[i], data);
+		if (ret)
+			return ret;
 	}
+	return 0;
 }
diff --git a/sha1-array.h b/sha1-array.h
index 72bb33b..b3230be 100644
--- a/sha1-array.h
+++ b/sha1-array.h
@@ -14,10 +14,10 @@ void sha1_array_append(struct sha1_array *array, const unsigned char *sha1);
 int sha1_array_lookup(struct sha1_array *array, const unsigned char *sha1);
 void sha1_array_clear(struct sha1_array *array);
 
-typedef void (*for_each_sha1_fn)(const unsigned char sha1[20],
-				 void *data);
-void sha1_array_for_each_unique(struct sha1_array *array,
-				for_each_sha1_fn fn,
+typedef int (*for_each_sha1_fn)(const unsigned char sha1[20],
 				void *data);
+int sha1_array_for_each_unique(struct sha1_array *array,
+			       for_each_sha1_fn fn,
+			       void *data);
 
 #endif /* SHA1_ARRAY_H */
diff --git a/submodule.c b/submodule.c
index 0ef2ff4..aba94dd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -728,9 +728,10 @@ void check_for_new_submodule_commits(unsigned char new_sha1[20])
 	sha1_array_append(&ref_tips_after_fetch, new_sha1);
 }
 
-static void add_sha1_to_argv(const unsigned char sha1[20], void *data)
+static int add_sha1_to_argv(const unsigned char sha1[20], void *data)
 {
 	argv_array_push(data, sha1_to_hex(sha1));
+	return 0;
 }
 
 static void calculate_changed_submodule_paths(void)
diff --git a/t/helper/test-sha1-array.c b/t/helper/test-sha1-array.c
index 09f7790..f7a53c4 100644
--- a/t/helper/test-sha1-array.c
+++ b/t/helper/test-sha1-array.c
@@ -1,9 +1,10 @@
 #include "cache.h"
 #include "sha1-array.h"
 
-static void print_sha1(const unsigned char sha1[20], void *data)
+static int print_sha1(const unsigned char sha1[20], void *data)
 {
 	puts(sha1_to_hex(sha1));
+	return 0;
 }
 
 int cmd_main(int argc, const char **argv)
-- 
2.10.0.492.g14f803f


* [PATCH 09/10] for_each_abbrev: drop duplicate objects
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
                           ` (7 preceding siblings ...)
  2016-09-26 12:00         ` [PATCH 08/10] sha1_array: let callbacks interrupt iteration Jeff King
@ 2016-09-26 12:00         ` Jeff King
  2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
  9 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

If an object appears multiple times in the object database
(e.g., in both loose and packed form, or in two separate
packs), the disambiguation machinery may see it more than
once. The get_short_sha1() function handles this already,
but for_each_abbrev() blindly fires the callback for each
instance it finds.

We can fix this by collecting the output in a sha1 array and
de-duplicating it.  As a bonus, the sort done for the
de-duplication means that our output will be stable,
regardless of the order in which the objects are found.

Note that the old code normalized the callback's output to
0/1 to store in the 1-bit ds->ambiguous flag (which both
halted the iteration and was returned from the
for_each_abbrev function). Now that we are using sha1_array,
we can return the real value. In practice, it doesn't matter
as the sole caller only ever returns 0.
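The collect-then-deduplicate approach can be sketched as follows. This is an illustration, not the sha1_array implementation; struct name_array, NAME_ARRAY_INIT and the name_array_*() helpers are hypothetical. Whatever order the objects were discovered in, sorting first makes duplicates adjacent and fixes the output order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define RAWSZ 20

/* Hypothetical growable array of binary object names. */
struct name_array {
	unsigned char (*names)[RAWSZ];
	size_t nr, alloc;
};
#define NAME_ARRAY_INIT { NULL, 0, 0 }

static void name_array_append(struct name_array *a, const unsigned char *name)
{
	if (a->nr == a->alloc) {
		size_t alloc = a->alloc ? 2 * a->alloc : 8;
		void *p = realloc(a->names, alloc * sizeof(*a->names));

		assert(p); /* a sketch: real code would die() on allocation failure */
		a->names = p;
		a->alloc = alloc;
	}
	memcpy(a->names[a->nr++], name, RAWSZ);
}

static int name_cmp(const void *a, const void *b)
{
	return memcmp(a, b, RAWSZ);
}

/*
 * Sort, then squeeze out adjacent repeats in place. An object seen both
 * loose and packed (or in two packs) is kept once, and the result order
 * no longer depends on discovery order. Returns the new count.
 */
static size_t name_array_sort_unique(struct name_array *a)
{
	size_t i, out = 0;

	if (!a->nr)
		return 0;
	qsort(a->names, a->nr, sizeof(*a->names), name_cmp);
	for (i = 0; i < a->nr; i++) {
		if (out && !memcmp(a->names[i], a->names[out - 1], RAWSZ))
			continue;
		if (out != i)
			memcpy(a->names[out], a->names[i], RAWSZ);
		out++;
	}
	a->nr = out;
	return out;
}

static void name_array_clear(struct name_array *a)
{
	free(a->names);
	a->names = NULL;
	a->nr = a->alloc = 0;
}
```

For the handful of candidates a short prefix usually yields, the extra sort is cheap compared to scanning the object directories and pack indexes.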

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c                         | 19 +++++++++++++++----
 t/t1512-rev-parse-disambiguation.sh |  7 +++++++
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index d4c7e26..f7403d7 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -7,6 +7,7 @@
 #include "refs.h"
 #include "remote.h"
 #include "dir.h"
+#include "sha1-array.h"
 
 static int get_sha1_oneline(const char *, unsigned char *, struct commit_list *);
 
@@ -355,20 +356,30 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	return status;
 }
 
+static int collect_ambiguous(const unsigned char *sha1, void *data)
+{
+	sha1_array_append(data, sha1);
+	return 0;
+}
+
 int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 {
+	struct sha1_array collect = SHA1_ARRAY_INIT;
 	struct disambiguate_state ds;
+	int ret;
 
 	if (init_object_disambiguation(prefix, strlen(prefix), &ds) < 0)
 		return -1;
 
 	ds.always_call_fn = 1;
-	ds.cb_data = cb_data;
-	ds.fn = fn;
-
+	ds.fn = collect_ambiguous;
+	ds.cb_data = &collect;
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
-	return ds.ambiguous;
+
+	ret = sha1_array_for_each_unique(&collect, fn, cb_data);
+	sha1_array_clear(&collect);
+	return ret;
 }
 
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index dfd3567..1d8f550 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -280,6 +280,13 @@ test_expect_success 'rev-parse --disambiguate' '
 	test "$(sed -e "s/^\(.........\).*/\1/" actual | sort -u)" = 000000000
 '
 
+test_expect_success 'rev-parse --disambiguate drops duplicates' '
+	git rev-parse --disambiguate=000000000 >expect &&
+	git pack-objects .git/objects/pack/pack <expect &&
+	git rev-parse --disambiguate=000000000 >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'ambiguous 40-hex ref' '
 	TREE=$(git mktree </dev/null) &&
 	REF=$(git rev-parse HEAD) &&
-- 
2.10.0.492.g14f803f


* [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
                           ` (8 preceding siblings ...)
  2016-09-26 12:00         ` [PATCH 09/10] for_each_abbrev: drop duplicate objects Jeff King
@ 2016-09-26 12:00         ` Jeff King
  2016-09-26 16:36           ` Linus Torvalds
                             ` (2 more replies)
  9 siblings, 3 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

When the user gives us an ambiguous short sha1, we print an
error and refuse to resolve it. In some cases, the next step
is for them to feed us more characters (e.g., if they were
retyping or cut-and-pasting from a full sha1). But in other
cases, that might be all they have. For example, an old
commit message may have used a 7-character hex that was
unique at the time, but is now ambiguous.  Git doesn't
provide any information about the ambiguous objects it
found, so it's hard for the user to find out which one they
probably meant.

This patch teaches get_short_sha1() to list the sha1s of the
objects it found, along with a few bits of information that
may help the user decide which one they meant. Here's what
it looks like on git.git:

  $ git rev-parse b2e1
  error: short SHA1 b2e1 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
  hint:   b2e1759 blob
  hint:   b2e18954 blob
  hint:   b2e1895c blob
  fatal: ambiguous argument 'b2e1': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

We show the tagname for tags, and the date and subject for
commits. For trees and blobs, in theory we could dig in the
history to find the paths at which they were present. But
that's very expensive (on the order of 30s for the kernel),
and it's not likely to be all that helpful. Most short
references are to commits, so the useful information is
typically going to be that the object in question _isn't_ a
commit. So it's silly to spend a lot of CPU preemptively
digging up the path; the user can do it themselves if they
really need to.

And of course it's somewhat ironic that we abbreviate the
sha1s in the disambiguation hint. But full sha1s would cause
annoying line wrapping for the commit lines, and presumably
the user is going to just re-issue their command immediately
with the corrected sha1.

We also restrict the list to those that match any
disambiguation hint. E.g.:

  $ git rev-parse b2e1:foo
  error: short SHA1 b2e1 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
  fatal: Invalid object name 'b2e1'.

does not bother reporting the blobs, because they cannot
work as a treeish.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_name.c                         | 50 +++++++++++++++++++++++++++++++++++--
 t/t1512-rev-parse-disambiguation.sh | 24 ++++++++++++++++++
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index f7403d7..35d943d 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -318,6 +318,38 @@ static int init_object_disambiguation(const char *name, int len,
 	return 0;
 }
 
+static int show_ambiguous_object(const unsigned char *sha1, void *data)
+{
+	const struct disambiguate_state *ds = data;
+	struct strbuf desc = STRBUF_INIT;
+	int type;
+
+	if (ds->fn && !ds->fn(sha1, ds->cb_data))
+		return 0;
+
+	type = sha1_object_info(sha1, NULL);
+	if (type == OBJ_COMMIT) {
+		struct commit *commit = lookup_commit(sha1);
+		if (commit) {
+			struct pretty_print_context pp = {0};
+			pp.date_mode.type = DATE_SHORT;
+			format_commit_message(commit, " %ad - %s", &desc, &pp);
+		}
+	} else if (type == OBJ_TAG) {
+		struct tag *tag = lookup_tag(sha1);
+		if (!parse_tag(tag) && tag->tag)
+			strbuf_addf(&desc, " %s", tag->tag);
+	}
+
+	advise("  %s %s%s",
+	       find_unique_abbrev(sha1, DEFAULT_ABBREV),
+	       typename(type) ? typename(type) : "unknown type",
+	       desc.buf);
+
+	strbuf_release(&desc);
+	return 0;
+}
+
 static int multiple_bits_set(unsigned flags)
 {
 	return !!(flags & (flags - 1));
@@ -351,8 +383,22 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 	find_short_packed_object(&ds);
 	status = finish_object_disambiguation(&ds, sha1);
 
-	if (!quietly && (status == SHORT_NAME_AMBIGUOUS))
-		return error(_("short SHA1 %s is ambiguous"), ds.hex_pfx);
+	if (!quietly && (status == SHORT_NAME_AMBIGUOUS)) {
+		error(_("short SHA1 %s is ambiguous"), ds.hex_pfx);
+
+		/*
+		 * We may still have ambiguity if we simply saw a series of
+		 * candidates that did not satisfy our hint function. In
+		 * that case, we still want to show them, so disable the hint
+		 * function entirely.
+		 */
+		if (!ds.ambiguous)
+			ds.fn = NULL;
+
+		advise(_("The candidates are:"));
+		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
+	}
+
 	return status;
 }
 
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 1d8f550..c5447ef 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -323,4 +323,28 @@ test_expect_success C_LOCALE_OUTPUT 'ambiguity errors are not repeated (peel)' '
 	test_line_count = 1 errors
 '
 
+test_expect_success C_LOCALE_OUTPUT 'ambiguity hints' '
+	test_must_fail git rev-parse 000000000 2>stderr &&
+	grep ^hint: stderr >hints &&
+	# 16 candidates, plus one intro line
+	test_line_count = 17 hints
+'
+
+test_expect_success C_LOCALE_OUTPUT 'ambiguity hints respect type' '
+	test_must_fail git rev-parse 000000000^{commit} 2>stderr &&
+	grep ^hint: stderr >hints &&
+	# 5 commits, 1 tag (which is a commitish), plus intro line
+	test_line_count = 7 hints
+'
+
+test_expect_success C_LOCALE_OUTPUT 'failed type-selector still shows hint' '
+	# these two blobs share the same prefix "ee3d", but neither
+	# will pass for a commit
+	echo 851 | git hash-object --stdin -w &&
+	echo 872 | git hash-object --stdin -w &&
+	test_must_fail git rev-parse ee3d^{commit} 2>stderr &&
+	grep ^hint: stderr >hints &&
+	test_line_count = 3 hints
+'
+
 test_done
-- 
2.10.0.492.g14f803f

* Re: Changing the default for "core.abbrev"?
  2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
@ 2016-09-26 12:09     ` Jeff King
  0 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:09 UTC (permalink / raw)
  To: Matthieu Moy; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

On Mon, Sep 26, 2016 at 08:33:52AM +0200, Matthieu Moy wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > I am not opposed to bumping the default to 12 or whatever, but I
> > suspect any lengthening today may need to be accompanied by a tool
> > support that finds the set of objects that are reachable from a
> > commit whose names begin with non-unique abbreviations that appear
> > in the commit log message.
> 
> Something much simpler would be to set core.abbrev at clone time,
> depending on the size of the project just cloned. So, when cloning a
> hello-world, we'd keep the 7 but when cloning a big project we'd get a
> larger value.
> 
> This doesn't cover the case of someone growing his own project without
> cloning, and isn't as clever as actually looking for collisions, but it
> would probably provide a sane default in 99% of cases, and wouldn't be
> worse than hardcoding 7 in the remaining 1% of cases.

I think we could easily make this even more dynamic, and just base the
minimum for DEFAULT_ABBREV on the number of objects _currently_ in the
repository, plus some safety factor. We could do this cheaply by just
counting the number of objects in the packs (which we get for free when
we open their pack index). That misses loose objects, but if you have 4
million loose objects you have bigger problems than abbreviation
lengths, I think.
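The count-based default described above might look like this as a hypothetical sketch. Nothing like abbrev_for_object_count() exists in this series, and the criterion is an assumption: pick the smallest hex length len with 16^len >= n^2, where n is the object count times a safety factor, so that the expected number of colliding pairs, roughly n^2 / (2 * 16^len), stays at or below 1/2. The floor of 7 keeps today's default for small repositories.

```c
#include <assert.h>
#include <limits.h>

#define MIN_ABBREV 7  /* keep today's default as a floor */
#define MAX_ABBREV 40 /* full SHA-1 in hex */

/*
 * Hypothetical: smallest hex length len such that 16^len >= n^2, where
 * n = count * safety. Each hex digit carries 4 bits, so we need
 * 4 * len >= 2 * ceil(log2(n)).
 */
static int abbrev_for_object_count(unsigned long long count,
                                   unsigned long long safety)
{
	unsigned long long n;
	int bits = 0, len;

	assert(safety == 0 || count <= ULLONG_MAX / safety); /* no overflow */
	n = count * safety;

	/* bits = ceil(log2(n)); stays 0 for n <= 1 */
	while (bits < 64 && (1ULL << bits) < n)
		bits++;

	len = (bits + 1) / 2; /* round up so that 4 * len >= 2 * bits */
	if (len < MIN_ABBREV)
		len = MIN_ABBREV;
	if (len > MAX_ABBREV)
		len = MAX_ABBREV;
	return len;
}
```

Under this assumed criterion, 5 million objects with no safety margin yields 12, and 12 hex digits suffice until n reaches 2^24 (about 16.8 million). The safety factor is what has to anticipate future growth, which is the weak point of any size-based scheme.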

OTOH, any scheme that looks at the current repository size will
eventually grow outdated. The safety factor depends on how fast your
repository grows, and how big you expect it to eventually get. Such a
default might still have been using 7-character abbreviations on
linux.git in 2006, and we'd be stuck with them now.

The idea of a 12-character default is basically that we'd expect decades
or more for even the largest projects to get there, so you err on the
side of future-proofing.

-Peff

* Re: [PATCH 04/10] get_short_sha1: peel tags when looking for treeish
  2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
@ 2016-09-26 12:11           ` Jeff King
  2016-09-26 16:55           ` Junio C Hamano
  1 sibling, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 12:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Mon, Sep 26, 2016 at 07:59:48AM -0400, Jeff King wrote:

> Subject: Re: [PATCH 04/10] get_short_sha1: peel tags when looking for treeish
>
> The treeish disambiguation function tries to peel tags, but
> it does so by calling:

Probably the subject should be "parse tags when...". We already try to
peel, we just don't do it right.

-Peff

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
@ 2016-09-26 16:36           ` Linus Torvalds
  2016-09-27  5:42             ` Jacob Keller
                               ` (2 more replies)
  2016-09-26 17:30           ` Junio C Hamano
  2016-09-29 11:46           ` Kyle J. McKay
  2 siblings, 3 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-26 16:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Git Mailing List

On Mon, Sep 26, 2016 at 5:00 AM, Jeff King <peff@peff.net> wrote:
>
> This patch teaches get_short_sha1() to list the sha1s of the
> objects it found, along with a few bits of information that
> may help the user decide which one they meant.

This looks very good to me, but I wonder if it couldn't be even more aggressive.

In particular, the only hashes that most people ever use in short form
are commit hashes. Those are the ones you'd use in normal human
interactions to point to something happening.

So when the disambiguation notices that there is ambiguity, but there
is only _one_ commit, maybe it should just have an aggressive mode
that says "use that as if it wasn't ambiguous".

And then have an explicit command (or flag) to do disambiguation for
when you explicitly want it.

Rationale: you'd never care about short forms for tags. You'd just use
the tag name. And while blob ID's certainly show up in short form in
diff output (in the "index" line), very few people will use them. And
tree hashes are basically never seen outside of any plumbing commands
and then seldom in shortened form.

So I think it would make sense to default to a mode that just picks
the commit hash if there is only one such hash. Sure, some command
might want a "treeish", but a commit is still more likely than a tree
or a tag.
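The proposed policy could be sketched like this. It is hypothetical: pick_sole_commit(), struct candidate and the T_* type tags are illustrative names, not git API. If exactly one candidate is a commit, it is chosen; zero or several commits leave the name ambiguous.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative object types; not git's enum object_type. */
enum cand_type { T_COMMIT, T_TREE, T_BLOB, T_TAG };

struct candidate {
	const char *hex;     /* abbreviated name, for display */
	enum cand_type type;
};

/*
 * Return the index of the only commit among the candidates, or -1 if
 * there are no commits or more than one (still ambiguous).
 */
static int pick_sole_commit(const struct candidate *c, size_t nr)
{
	int found = -1;
	size_t i;

	assert(c || !nr);
	for (i = 0; i < nr; i++) {
		if (c[i].type != T_COMMIT)
			continue;
		if (found >= 0)
			return -1; /* a second commit: report ambiguity instead */
		found = (int)i;
	}
	return found;
}
```

Applied to the six b2e1 candidates listed in patch 10/10, this would pick b2e1632, the only commit among them.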

But regardless, this series looks like a good thing.

                        Linus

* Re: [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators
  2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
@ 2016-09-26 16:37           ` Junio C Hamano
  2016-09-26 17:21             ` Jeff King
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26 16:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> The get_sha1() family of functions takes a flags field, but
> some of the flags are mutually exclusive. In particular, we
> can only handle one disambiguating function, and the flags
> quietly override each other. Let's instead detect these as
> programming bugs.
>
> Technically some of the flags are supersets of the others,
> so treating COMMITTISH|TREEISH as just COMMITTISH is not
> wrong, but it's a good sign the caller is confused. And
> certainly asking for BLOB|TREE does not work.
>
> We can do the check easily with some bit-twiddling, and as a
> bonus, the bit-mask of disambiguators will come in handy in
> a future patch.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Other than your reinvention of HAS_MULTI_BITS(), which has been with
us since db7244bd ("parse-options new features.", 2007-11-07), this
looks like a reasonable thing to do.

;-)

>  cache.h     | 5 +++++
>  sha1_name.c | 9 +++++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/cache.h b/cache.h
> index d0494c8..7bd78ca 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1203,6 +1203,11 @@ struct object_context {
>  #define GET_SHA1_FOLLOW_SYMLINKS 0100
>  #define GET_SHA1_ONLY_TO_DIE    04000
>  
> +#define GET_SHA1_DISAMBIGUATORS \
> +	(GET_SHA1_COMMIT | GET_SHA1_COMMITTISH | \
> +	GET_SHA1_TREE | GET_SHA1_TREEISH | \
> +	GET_SHA1_BLOB)
> +
>  extern int get_sha1(const char *str, unsigned char *sha1);
>  extern int get_sha1_commit(const char *str, unsigned char *sha1);
>  extern int get_sha1_committish(const char *str, unsigned char *sha1);
> diff --git a/sha1_name.c b/sha1_name.c
> index faf873c..f9812ff 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -310,6 +310,11 @@ static int prepare_prefixes(const char *name, int len,
>  	return 0;
>  }
>  
> +static int multiple_bits_set(unsigned flags)
> +{
> +	return !!(flags & (flags - 1));
> +}
> +
>  static int get_short_sha1(const char *name, int len, unsigned char *sha1,
>  			  unsigned flags)
>  {
> @@ -327,6 +332,10 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
>  	prepare_alt_odb();
>  
>  	memset(&ds, 0, sizeof(ds));
> +
> +	if (multiple_bits_set(flags & GET_SHA1_DISAMBIGUATORS))
> +		die("BUG: multiple get_short_sha1 disambiguator flags");
> +
>  	if (flags & GET_SHA1_COMMIT)
>  		ds.fn = disambiguate_commit_only;
>  	else if (flags & GET_SHA1_COMMITTISH)

* Re: [PATCH 04/10] get_short_sha1: peel tags when looking for treeish
  2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
  2016-09-26 12:11           ` Jeff King
@ 2016-09-26 16:55           ` Junio C Hamano
  2016-09-26 17:23             ` Jeff King
  1 sibling, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26 16:55 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> The treeish disambiguation function tries to peel tags, but
> it does so by calling:
>
>   deref_tag(lookup_object(sha1), ...);
>
> This will only work if we have previously looked at the tag
> and created a "struct tag" for it. Since parsing revision
> arguments typically happens before anything else, this is
> usually not the case, and we would fail to peel the tag (we
> are lucky that deref_tag() gracefully handles the NULL and
> does not segfault).

Makes perfect sense.
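
Here is a toy model of why `lookup_object()` fails in this case. It assumes a simplified object store: an object becomes visible to a lookup only after something has parsed it, while a parse reads it from "disk" on demand. Every name below (`toy_lookup_object()`, `toy_parse_object()`, `toy_deref_tag()`) is hypothetical. This is a sketch of the failure mode, not git's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in git. */
enum toy_type { TOY_NONE, TOY_BLOB, TOY_TREE, TOY_COMMIT, TOY_TAG };

#define TOY_NOBJ 8

/* "On-disk" object store: every object exists here. */
static enum toy_type toy_db_type[TOY_NOBJ];
static int toy_db_target[TOY_NOBJ];   /* tagged object id, for tags */

/* In-memory objects, created only once something parses them. */
struct toy_object {
	enum toy_type type;
	int target;
};
static struct toy_object toy_mem[TOY_NOBJ];
static int toy_in_memory[TOY_NOBJ];

/* Like lookup_object(): finds only objects already created in memory. */
static struct toy_object *toy_lookup_object(int id)
{
	return toy_in_memory[id] ? &toy_mem[id] : NULL;
}

/* Like parse_object(): reads the object from "disk" if needed. */
static struct toy_object *toy_parse_object(int id)
{
	if (!toy_in_memory[id]) {
		toy_mem[id].type = toy_db_type[id];
		toy_mem[id].target = toy_db_target[id];
		toy_in_memory[id] = 1;
	}
	return &toy_mem[id];
}

/* Like deref_tag(): peels tags and quietly passes NULL through. */
static struct toy_object *toy_deref_tag(struct toy_object *o)
{
	while (o && o->type == TOY_TAG)
		o = toy_parse_object(o->target);
	return o;
}

static int toy_is_treeish(struct toy_object *o)
{
	o = toy_deref_tag(o);
	return o && (o->type == TOY_TREE || o->type == TOY_COMMIT);
}
```

If a tag has never been parsed, `toy_lookup_object()` returns NULL. `toy_deref_tag()` then passes the NULL through without complaint, and the tag is silently rejected as a treeish. Parsing first lets the tag peel to the tree.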

> Instead, we can use parse_object(). Note that this is the
> same fix done by 94d75d1 (get_short_sha1(): correctly
> disambiguate type-limited abbreviation, 2013-07-01), but
> that commit fixed only the committish disambiguator, and
> left the bug in the treeish one.

Can you share your secret tool you use to find this kind of thing?
Yes, the patch from that commit does look very similar to what we
see in this patch, but I'd love to see "I am fixing an incorrect
call to lookup-object by replacing it with parse-object; has there
been a similar fix?" automated ;-)

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_name.c                         | 2 +-
>  t/t1512-rev-parse-disambiguation.sh | 7 +++++++
>  2 files changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/sha1_name.c b/sha1_name.c
> index 38e51d9..432a308 100644
> --- a/sha1_name.c
> +++ b/sha1_name.c
> @@ -269,7 +269,7 @@ static int disambiguate_treeish_only(const unsigned char *sha1, void *cb_data_un
>  		return 0;
>  
>  	/* We need to do this the hard way... */
> -	obj = deref_tag(lookup_object(sha1), NULL, 0);
> +	obj = deref_tag(parse_object(sha1), NULL, 0);
>  	if (obj && (obj->type == OBJ_TREE || obj->type == OBJ_COMMIT))
>  		return 1;
>  	return 0;
> diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
> index 30e0b80..dfd3567 100755
> --- a/t/t1512-rev-parse-disambiguation.sh
> +++ b/t/t1512-rev-parse-disambiguation.sh
> @@ -264,6 +264,13 @@ test_expect_success 'ambiguous commit-ish' '
>  	test_must_fail git log 000000000...
>  '
>  
> +# There are three objects with this prefix: a blob, a tree, and a tag. We know
> +# the blob will not pass as a treeish, but the tree and tag should (and thus
> +# cause an error).
> +test_expect_success 'ambiguous tags peel to treeish' '
> +	test_must_fail git rev-parse 0000000000f^{tree}
> +'
> +
>  test_expect_success 'rev-parse --disambiguate' '
>  	# The test creates 16 objects that share the prefix and two
>  	# commits created by commit-tree in earlier tests share a

* Re: [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix
  2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
@ 2016-09-26 17:10           ` Junio C Hamano
  2016-09-26 17:25             ` Jeff King
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:10 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> We store the hex prefix in a 40-byte buffer with the prefix
> itself followed by 40-minus-len "x" characters. These x's
> serve no purpose, and the lack of NUL termination makes the
> prefix string annoying to use. Let's just terminate it.

> Note that this is in contrast to the binary prefix, which
> _must_ be zero-padded, because we look at the whole thing
> during a binary search to find the first potential match in
> each pack index. 

Makes sense.
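
Here is a sketch of why zero padding matters for the binary search. It assumes toy 4-byte ids in a sorted flat array; real pack indexes hold 20-byte SHA-1s, and the `toy_*` functions are hypothetical. A zero-padded prefix is the smallest id that can carry that prefix, so a lower-bound search lands on the first potential match. A forward walk then collects the rest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy 4-byte ids keep the example short; real SHA-1 ids are 20 bytes. */
#define TOY_RAWSZ 4

/*
 * Lower-bound binary search over a sorted, flat array of ids. Because
 * pfx is zero-padded, it is the smallest possible id carrying that
 * prefix, so the first entry >= pfx is the first potential match.
 */
static size_t toy_first_candidate(const unsigned char *ids, size_t n,
				  const unsigned char *pfx)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (memcmp(ids + mid * TOY_RAWSZ, pfx, TOY_RAWSZ) < 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}

/* Does id start with the first len hex digits (nibbles) of pfx? */
static int toy_match_prefix(const unsigned char *id,
			    const unsigned char *pfx, int len)
{
	int i;

	for (i = 0; i < len / 2; i++)
		if (id[i] != pfx[i])
			return 0;
	if ((len & 1) && (id[i] & 0xf0) != (pfx[i] & 0xf0))
		return 0;
	return 1;
}

/* Count ids sharing the prefix by walking forward from the first candidate. */
static size_t toy_count_matches(const unsigned char *ids, size_t n,
				const unsigned char *pfx, int len)
{
	size_t i, count = 0;

	assert(len > 0 && len <= 2 * TOY_RAWSZ);
	for (i = toy_first_candidate(ids, n, pfx);
	     i < n && toy_match_prefix(ids + i * TOY_RAWSZ, pfx, len); i++)
		count++;
	return count;
}
```

Padding with anything larger than zero (say, `0xff` bytes) would make the lower bound skip matching ids that sort before the padded value.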

> The loose-object hex search cannot use the
> same trick because it has to do a linear walk through the
> unsorted results of readdir() (and even if it could, you'd
> want zeroes instead of x's).

OK.
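
Here is a sketch of the NUL-terminated layout. It assumes a simplified state struct; `struct toy_state` and `toy_init_prefix()` are hypothetical, and the nibble packing into `bin_pfx` is an assumption about how the binary prefix is filled. Note that `name` need not end at `len` (as in `b2e1:foo`). That is why the terminator has to live in `hex_pfx` rather than come from the input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_RAWSZ 20                 /* SHA-1: 20 bytes */
#define TOY_HEXSZ (2 * TOY_RAWSZ)    /* ... or 40 hex digits */

/* Hypothetical, simplified stand-in for struct disambiguate_state. */
struct toy_state {
	int len;                         /* prefix length in hex digits */
	char hex_pfx[TOY_HEXSZ + 1];     /* +1 leaves room for the NUL */
	unsigned char bin_pfx[TOY_RAWSZ];
};

static int toy_hexval(unsigned char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/* Parse the first len chars of name; name need not end there. */
static int toy_init_prefix(const char *name, int len, struct toy_state *ds)
{
	int i;

	if (len < 1 || len > TOY_HEXSZ)
		return -1;
	memset(ds, 0, sizeof(*ds));      /* zero-fills both prefixes */
	for (i = 0; i < len; i++) {
		int val = toy_hexval((unsigned char)name[i]);

		if (val < 0)
			return -1;
		ds->hex_pfx[i] = name[i];
		if (!(i & 1))
			val <<= 4;               /* even index: high nibble */
		ds->bin_pfx[i >> 1] |= (unsigned char)val;
	}
	ds->len = len;
	ds->hex_pfx[len] = '\0';         /* already zero, but explicit */
	return 0;
}
```

Once `hex_pfx` is terminated, an error message can use a plain `%s` instead of `%.*s` with a separate length.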

>  struct disambiguate_state {
>  	int len; /* length of prefix in hex chars */
> -	char hex_pfx[GIT_SHA1_HEXSZ];
> +	char hex_pfx[GIT_SHA1_HEXSZ + 1];
>  	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
>  
>  	disambiguate_hint_fn fn;
> @@ -291,7 +291,6 @@ static int init_object_disambiguation(const char *name, int len,
>  		return -1;
>  
>  	memset(ds, 0, sizeof(*ds));
> -	memset(ds->hex_pfx, 'x', GIT_SHA1_HEXSZ);

As the whole thing is cleared here...

>  
>  	for (i = 0; i < len ;i++) {
>  		unsigned char c = name[i];
> @@ -313,6 +312,7 @@ static int init_object_disambiguation(const char *name, int len,
>  	}
>  
>  	ds->len = len;
> +	ds->hex_pfx[len] = '\0';

... do we even need this one?  It would not hurt, though.

> @@ -351,7 +351,7 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
>  	status = finish_object_disambiguation(&ds, sha1);
>  
>  	if (!quietly && (status == SHORT_NAME_AMBIGUOUS))
> -		return error("short SHA1 %.*s is ambiguous.", ds.len, ds.hex_pfx);
> +		return error("short SHA1 %s is ambiguous.", ds.hex_pfx);

Makes sense.

Thanks.

* Re: [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators
  2016-09-26 16:37           ` Junio C Hamano
@ 2016-09-26 17:21             ` Jeff King
  2016-09-26 17:50               ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-26 17:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Mon, Sep 26, 2016 at 09:37:10AM -0700, Junio C Hamano wrote:

> > We can do the check easily with some bit-twiddling, and as a
> > bonus, the bit-mask of disambiguators will come in handy in
> > a future patch.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> 
> Other than your reinvention of HAS_MULTI_BITS(), which has been with
> us since db7244bd ("parse-options new features.", 2007-11-07), this
> looks like a reasonable thing to do.

Heh, I _thought_ we had something like that but couldn't find it. I
grepped for "[^&]& .*-", which does match it, but stupidly did it only
in '*.c'. Definitely it should use the existing macro instead.

-Peff

* Re: [PATCH 04/10] get_short_sha1: peel tags when looking for treeish
  2016-09-26 16:55           ` Junio C Hamano
@ 2016-09-26 17:23             ` Jeff King
  0 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-26 17:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Mon, Sep 26, 2016 at 09:55:20AM -0700, Junio C Hamano wrote:

> > Instead, we can use parse_object(). Note that this is the
> > same fix done by 94d75d1 (get_short_sha1(): correctly
> > disambiguate type-limited abbreviation, 2013-07-01), but
> > that commit fixed only the committish disambiguator, and
> > left the bug in the treeish one.
> 
> Can you share your secret tool you use to find this kind of thing?
> Yes, the patch from that commit does look very similar to what we
> see in this patch, but I'd love to see "I am fixing an incorrect
> call to lookup-object by replacing it with parse-object; has there
> been a similar fix?" automated ;-)

I wish there was an answer besides "persistence and patience". I was
just finishing up the commit message for the final patch, and noticed
that the tag was not present in the second example output, which happens
to use the tree-ish syntax. And I noticed it was doubly weird that the
same bug did not show up in the test scripts, which look for
committishes. That made me peek at the implementation, and from there it
was an easy `git blame` away.

-Peff

* Re: [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix
  2016-09-26 17:10           ` Junio C Hamano
@ 2016-09-26 17:25             ` Jeff King
  2016-09-26 17:36               ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-26 17:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Mon, Sep 26, 2016 at 10:10:46AM -0700, Junio C Hamano wrote:

> >  struct disambiguate_state {
> >  	int len; /* length of prefix in hex chars */
> > -	char hex_pfx[GIT_SHA1_HEXSZ];
> > +	char hex_pfx[GIT_SHA1_HEXSZ + 1];
> >  	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
> >  
> >  	disambiguate_hint_fn fn;
> > @@ -291,7 +291,6 @@ static int init_object_disambiguation(const char *name, int len,
> >  		return -1;
> >  
> >  	memset(ds, 0, sizeof(*ds));
> > -	memset(ds->hex_pfx, 'x', GIT_SHA1_HEXSZ);
> 
> As the whole thing is cleared here...
> 
> >  
> >  	for (i = 0; i < len ;i++) {
> >  		unsigned char c = name[i];
> > @@ -313,6 +312,7 @@ static int init_object_disambiguation(const char *name, int len,
> >  	}
> >  
> >  	ds->len = len;
> > +	ds->hex_pfx[len] = '\0';
> 
> ... do we even need this one?  It would not hurt, though.

Sharp eyes. I noticed that while writing it, but wondered if anybody
else would. :)

I left the second one in to make the intention more explicit, and so
readers did not have to worry that the NULs were overwritten in the
loop. I'd be OK with it either way, though.

-Peff

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
  2016-09-26 16:36           ` Linus Torvalds
@ 2016-09-26 17:30           ` Junio C Hamano
  2016-09-26 17:34             ` Jeff King
  2016-09-29 11:46           ` Kyle J. McKay
  2 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:30 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> We also restrict the list to those that match any
> disambiguation hint. E.g.:
>
>   $ git rev-parse b2e1:foo
>   error: short SHA1 b2e1 is ambiguous
>   hint: The candidates are:
>   hint:   b2e1196 tag v2.8.0-rc1
>   hint:   b2e11d1 tree
>   hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
>   fatal: Invalid object name 'b2e1'.
>
> does not bother reporting the blobs, because they cannot
> work as a treeish.

That's a nice touch, and it even comes free--how wonderful.
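
Here is a sketch of the candidate filtering described above. It assumes each candidate is summarized by its type and by its type after peeling tags. `toy_hint_lines()` and the other `toy_*` names are hypothetical. The fallback when no candidate passes the hint is also an assumption: it is inferred from the 'failed type-selector still shows hint' test, which expects both blobs to be listed for `ee3d^{commit}`.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT, TOY_TAG };

struct toy_candidate {
	enum toy_type type;
	enum toy_type peeled;  /* type after peeling tags; == type otherwise */
};

/* Hypothetical hint function type, loosely modeled on disambiguate_hint_fn. */
typedef int (*toy_hint_fn)(const struct toy_candidate *);

static int toy_treeish_only(const struct toy_candidate *c)
{
	return c->peeled == TOY_TREE || c->peeled == TOY_COMMIT;
}

static int toy_committish_only(const struct toy_candidate *c)
{
	return c->peeled == TOY_COMMIT;
}

/*
 * Count how many "hint:" lines would be printed: one intro line plus one
 * line per candidate that passes the hint (a NULL hint passes everything).
 * Assumption: if no candidate passes, the hint is dropped and every
 * candidate is listed.
 */
static size_t toy_hint_lines(const struct toy_candidate *c, size_t n,
			     toy_hint_fn hint)
{
	size_t i, shown = 0;

	for (i = 0; i < n; i++)
		if (!hint || hint(&c[i]))
			shown++;
	if (!shown)
		shown = n;
	return shown + 1;
}
```

Consider a tag, a tree, a commit, and (hypothetically) two blobs sharing a prefix. A treeish hint lists three candidates plus the intro line, and the blobs are left out, as in the `b2e1:foo` example.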

It somehow felt strange to have an expensive (compared to no-op,
anyway) loop whose only externally visible effect is to call
advise(), but there does not appear to be a way to even disable this
advise() output, so it probably is OK, I guess.

>  
> +test_expect_success C_LOCALE_OUTPUT 'ambiguity hints' '
> +	test_must_fail git rev-parse 000000000 2>stderr &&
> +	grep ^hint: stderr >hints &&
> +	# 16 candidates, plus one intro line
> +	test_line_count = 17 hints
> +'
> +
> +test_expect_success C_LOCALE_OUTPUT 'ambiguity hints respect type' '
> +	test_must_fail git rev-parse 000000000^{commit} 2>stderr &&
> +	grep ^hint: stderr >hints &&
> +	# 5 commits, 1 tag (which is a commitish), plus intro line
> +	test_line_count = 7 hints
> +'
> +
> +test_expect_success C_LOCALE_OUTPUT 'failed type-selector still shows hint' '
> +	# these two blobs share the same prefix "ee3d", but neither
> +	# will pass for a commit
> +	echo 851 | git hash-object --stdin -w &&
> +	echo 872 | git hash-object --stdin -w &&
> +	test_must_fail git rev-parse ee3d^{commit} 2>stderr &&
> +	grep ^hint: stderr >hints &&
> +	test_line_count = 3 hints
> +'
> +
>  test_done

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 17:30           ` Junio C Hamano
@ 2016-09-26 17:34             ` Jeff King
  2016-09-26 17:39               ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-26 17:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Git Mailing List

On Mon, Sep 26, 2016 at 10:30:48AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > We also restrict the list to those that match any
> > disambiguation hint. E.g.:
> >
> >   $ git rev-parse b2e1:foo
> >   error: short SHA1 b2e1 is ambiguous
> >   hint: The candidates are:
> >   hint:   b2e1196 tag v2.8.0-rc1
> >   hint:   b2e11d1 tree
> >   hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
> >   fatal: Invalid object name 'b2e1'.
> >
> > does not bother reporting the blobs, because they cannot
> > work as a treeish.
> 
> That's a nice touch, and it even comes free--how wonderful.
> 
> It somehow felt strange to have an expensive (compared to no-op,
> anyway) loop whose only externally visible effect is to call
> advise(), but there does not appear to be a way to even disable this
> advise() output, so it probably is OK, I guess.

Right, advise() always has an effect. But that reminds me.  I wasn't
sure if we should attach an advice.* config to this. If we do, then the
right place to put the conditional is right after the error() call in
get_short_sha1().

Since it's attached to an error path, I'm guessing nobody will be too
upset about it, so my inclination was to wait and let somebody add the
conditional advice code if they're bothered.

-Peff

* Re: [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix
  2016-09-26 17:25             ` Jeff King
@ 2016-09-26 17:36               ` Junio C Hamano
  0 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> I left the second one in to make the intention more explicit, and so
> readers did not have to worry that the NULs were overwritten in the
> loop. I'd be OK with it either way, though.

Yes, I agree with that it is a good thing to make our intention more
explicit and I am perfectly fine with leaving it as-is.

Thanks.


* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 17:34             ` Jeff King
@ 2016-09-26 17:39               ` Junio C Hamano
  0 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:39 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

> Since it's attached to an error path, I'm guessing nobody will be too
> upset about it, so my inclination was to wait and let somebody add the
> conditional advice code if they're bothered.

Fair enough.  At the point of getting an error message, the only
thing they can do is start wondering what object the person who
gave them the now-non-unique abbreviation meant, so I suspect this is
one of the "advice" messages that can always be there.

* Re: [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators
  2016-09-26 17:21             ` Jeff King
@ 2016-09-26 17:50               ` Junio C Hamano
  0 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-26 17:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

>> Other than your reinvention of HAS_MULTI_BITS(), which has been with
>> us since db7244bd ("parse-options new features.", 2007-11-07), this
>> looks like a reasonable thing to do.
>
> Heh, I _thought_ we had something like that but couldn't find it. I
> grepped for "[^&]& .*-", which does match it, but stupidly did it only
> in '*.c'. Definitely it should use the existing macro instead.

OK, I'll queue this on top to be squashed, so no need to resend only
for this one.

 sha1_name.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/sha1_name.c b/sha1_name.c
index f9812ff..0ff83a9 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -310,11 +310,6 @@ static int prepare_prefixes(const char *name, int len,
 	return 0;
 }
 
-static int multiple_bits_set(unsigned flags)
-{
-	return !!(flags & (flags - 1));
-}
-
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
@@ -333,7 +328,7 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 
 	memset(&ds, 0, sizeof(ds));
 
-	if (multiple_bits_set(flags & GET_SHA1_DISAMBIGUATORS))
+	if (HAS_MULTI_BITS(flags & GET_SHA1_DISAMBIGUATORS))
 		die("BUG: multiple get_short_sha1 disambiguator flags");
 
 	if (flags & GET_SHA1_COMMIT)

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 16:36           ` Linus Torvalds
@ 2016-09-27  5:42             ` Jacob Keller
  2016-09-27 12:38             ` Jeff King
  2016-09-29 13:01             ` Kyle J. McKay
  2 siblings, 0 replies; 111+ messages in thread
From: Jacob Keller @ 2016-09-27  5:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Junio C Hamano, Git Mailing List

On Mon, Sep 26, 2016 at 9:36 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> This looks very good to me, but I wonder if it couldn't be even more aggressive.
>
> In particular, the only hashes that most people ever use in short form
> are commit hashes. Those are the ones you'd use in normal human
> interactions to point to something happening.
>
> So when the disambiguation notices that there is ambiguity, but there
> is only _one_ commit, maybe it should just have an aggressive mode
> that says "use that as if it wasn't ambiguous".
>
> And then have an explicit command (or flag) to do disambiguation for
> when you explicitly want it.
>
> Rationale: you'd never care about short forms for tags. You'd just use
> the tag name. And while blob ID's certainly show up in short form in
> diff output (in the "index" line), very few people will use them. And
> tree hashes are basically never seen outside of any plumbing commands
> and then seldom in shortened form.
>
> So I think it would make sense to default to a mode that just picks
> the commit hash if there is only one such hash. Sure, some command
> might want a "treeish", but a commit is still more likely than a tree
> or a tag.
>

I'd think we would want to phase this in over a few releases if we do
this? Maybe at least sort commits first in the list so that they are
faster to spot.

I am trying to think of what problems we'd cause by having the
behavior be this aggressive...

Thanks,
Jake

> But regardless, this series looks like a good thing.
>
>                         Linus

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 16:36           ` Linus Torvalds
  2016-09-27  5:42             ` Jacob Keller
@ 2016-09-27 12:38             ` Jeff King
  2016-09-29 13:01             ` Kyle J. McKay
  2 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-27 12:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

On Mon, Sep 26, 2016 at 09:36:23AM -0700, Linus Torvalds wrote:

> On Mon, Sep 26, 2016 at 5:00 AM, Jeff King <peff@peff.net> wrote:
> >
> > This patch teaches get_short_sha1() to list the sha1s of the
> > objects it found, along with a few bits of information that
> > may help the user decide which one they meant.
> 
> This looks very good to me, but I wonder if it couldn't be even more
> aggressive.
> 
> In particular, the only hashes that most people ever use in short form
> are commit hashes. Those are the ones you'd use in normal human
> interactions to point to something happening.
> 
> So when the disambiguation notices that there is ambiguity, but there
> is only _one_ commit, maybe it should just have an aggressive mode
> that says "use that as if it wasn't ambiguous".

You can basically get that by using "1234^{commit}" all the time, as
that turns on the committish disambiguator function (though it's not
quite the same, as it would pick a tag, too; you really want the
commit-only disambiguator). But presumably you'd want it on all the
time. See the patch below, which lets you do:

  git config --global core.disambiguate commit

and I think should do what you want. I'm up in the air on whether it is
a good idea or not, but then I do not usually run into ambiguous sha1s.

> And then have an explicit command (or flag) to do disambiguation for
> when you explicitly want it.

In my patch you can tweak the config variable off, though it might make
sense to also have some per-short-sha1 syntax.

> Rationale: you'd never care about short forms for tags. You'd just use
> the tag name. And while blob ID's certainly show up in short form in
> diff output (in the "index" line), very few people will use them. And
> tree hashes are basically never seen outside of any plumbing commands
> and then seldom in shortened form.

I think I do sometimes "git show $blob_sha1" based on a diff index line.
OTOH, I don't think of those as "long-term" references. I'm usually
trying to apply the patch at the time, so it's fairly fresh (it's true
that the short-sha1 may have been generated on the sender's side, who
has fewer objects, but I doubt that's a big problem in general; the real
issue is that it was unique at one point, and isn't a few years later).

But more importantly, any fallback like this should take a backseat to
context provided by the rest of git. So for instance, the index-building
in "am -3" uses the blob disambiguator, and should continue to do so
(and does with my patch).

> So I think it would make sense to default to a mode that just picks
> the commit hash if there is only one such hash. Sure, some command
> might want a "treeish", but a commit is still more likely than a tree
> or a tag.

By the same rule I just mentioned above, if you use the short sha1 in a
treeish context, it will look for any treeish (so "1234:foo" would
continue to look for any treeish, not just a commit). So that might not
be as desirable, but I think it does make sense (and of course it will
still tell you immediately what the options are, and you can decide what
to do).
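
Here is a sketch of that precedence. It assumes candidates are summarized by their type after peeling, and that `toy_default_hint` stands in for whatever `core.disambiguate` (from the patch below) was set to. All `toy_*` names and flags are hypothetical. The caller's context is consulted first; the configured default applies only when the caller gave no hint.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_BLOB, TOY_TREE, TOY_COMMIT, TOY_TAG };

struct toy_candidate {
	enum toy_type type;
	enum toy_type peeled;	/* type after peeling tags */
};

typedef int (*toy_hint_fn)(const struct toy_candidate *);

static int toy_committish_only(const struct toy_candidate *c)
{
	return c->peeled == TOY_COMMIT;
}

static int toy_treeish_only(const struct toy_candidate *c)
{
	return c->peeled == TOY_TREE || c->peeled == TOY_COMMIT;
}

/* Hypothetical caller-context flags. */
#define TOY_FLAG_COMMITTISH 01
#define TOY_FLAG_TREEISH    02

/* Stands in for a value set from core.disambiguate; NULL means "none". */
static toy_hint_fn toy_default_hint;

/* The caller's context wins; the configured default is only a fallback. */
static toy_hint_fn toy_pick_hint(unsigned flags)
{
	if (flags & TOY_FLAG_COMMITTISH)
		return toy_committish_only;
	if (flags & TOY_FLAG_TREEISH)
		return toy_treeish_only;
	return toy_default_hint;
}

/* Number of candidates that survive the chosen hint (1 means unique). */
static size_t toy_survivors(const struct toy_candidate *c, size_t n,
			    unsigned flags)
{
	toy_hint_fn hint = toy_pick_hint(flags);
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (!hint || hint(&c[i]))
			count++;
	return count;
}
```

Consider a tree and a tag of a commit sharing a prefix. With no hint, both candidates remain. A committish default narrows them to the tag. A treeish context ignores the default and keeps both, which mirrors the two new t1512 tests.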

-- >8 --
Subject: [PATCH] get_short_sha1: make default disambiguation configurable

When we find ambiguous short sha1s, we may get a
disambiguation rule from our caller's context. But if we
don't, we fall back to treating all sha1s the same, even
though most projects will tend to refer only to commits by
their short sha1s.

This patch introduces a configuration option that lets the
user pick a different fallback (e.g., only commits). It's
possible that we may want to make this the default, but it's
a good idea to start as a config option for two reasons:

  1. It lets people experiment with this and see if it's a
     good idea (i.e., the "tend to" above is an assumption;
     we don't really know if this will break some obscure
     cases).

  2. Even if we do flip the default, it gives people an
     escape hatch if it causes problems (you can sometimes
     override it by asking for "1234^{tree}", but not all
     combinations are possible).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h                             |  2 ++
 config.c                            |  3 +++
 sha1_name.c                         | 32 ++++++++++++++++++++++++++++++++
 t/t1512-rev-parse-disambiguation.sh | 14 ++++++++++++++
 4 files changed, 51 insertions(+)

diff --git a/cache.h b/cache.h
index 5df0f33..b9583c4 100644
--- a/cache.h
+++ b/cache.h
@@ -1224,6 +1224,8 @@ extern int get_oid(const char *str, struct object_id *oid);
 typedef int each_abbrev_fn(const unsigned char *sha1, void *);
 extern int for_each_abbrev(const char *prefix, each_abbrev_fn, void *);
 
+extern int set_disambiguate_hint_config(const char *var, const char *value);
+
 /*
  * Try to read a SHA1 in hexadecimal format from the 40 characters
  * starting at hex.  Write the 20-byte result to sha1 in binary form.
diff --git a/config.c b/config.c
index 1e4b617..83fdecb 100644
--- a/config.c
+++ b/config.c
@@ -841,6 +841,9 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.disambiguate"))
+		return set_disambiguate_hint_config(var, value);
+
 	if (!strcmp(var, "core.loosecompression")) {
 		int level = git_config_int(var, value);
 		if (level == -1)
diff --git a/sha1_name.c b/sha1_name.c
index 0513f14..3b647fd 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -283,6 +283,36 @@ static int disambiguate_blob_only(const unsigned char *sha1, void *cb_data_unuse
 	return kind == OBJ_BLOB;
 }
 
+static disambiguate_hint_fn default_disambiguate_hint;
+
+int set_disambiguate_hint_config(const char *var, const char *value)
+{
+	static const struct {
+		const char *name;
+		disambiguate_hint_fn fn;
+	} hints[] = {
+		{ "none", NULL },
+		{ "commit", disambiguate_commit_only },
+		{ "committish", disambiguate_committish_only },
+		{ "tree", disambiguate_tree_only },
+		{ "treeish", disambiguate_treeish_only },
+		{ "blob", disambiguate_blob_only }
+	};
+	int i;
+
+	if (!value)
+		return config_error_nonbool(var);
+
+	for (i = 0; i < ARRAY_SIZE(hints); i++) {
+		if (!strcasecmp(value, hints[i].name)) {
+			default_disambiguate_hint = hints[i].fn;
+			return 0;
+		}
+	}
+
+	return error("unknown hint type for '%s': %s", var, value);
+}
+
 static int init_object_disambiguation(const char *name, int len,
 				      struct disambiguate_state *ds)
 {
@@ -373,6 +403,8 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 		ds.fn = disambiguate_treeish_only;
 	else if (flags & GET_SHA1_BLOB)
 		ds.fn = disambiguate_blob_only;
+	else
+		ds.fn = default_disambiguate_hint;
 
 	find_short_object_filename(&ds);
 	find_short_packed_object(&ds);
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index c5447ef..7c659eb 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -347,4 +347,18 @@ test_expect_success C_LOCALE_OUTPUT 'failed type-selector still shows hint' '
 	test_line_count = 3 hints
 '
 
+test_expect_success 'core.disambiguate config can prefer types' '
+	# ambiguous between tree and tag
+	sha1=0000000000f &&
+	test_must_fail git rev-parse $sha1 &&
+	git rev-parse $sha1^{commit} &&
+	git -c core.disambiguate=committish rev-parse $sha1
+'
+
+test_expect_success 'core.disambiguate does not override context' '
+	# treeish ambiguous between tag and tree
+	test_must_fail \
+		git -c core.disambiguate=committish rev-parse $sha1^{tree}
+'
+
 test_done
-- 
2.10.0.564.g318c4ae


* [PATCH 0/4] raising core.abbrev default to 12 hexdigits
  2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
  2016-09-26  3:46 ` Junio C Hamano
  2016-09-26  7:13 ` Christian Couder
@ 2016-09-28 23:30 ` Junio C Hamano
  2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
                     ` (3 more replies)
  2 siblings, 4 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-28 23:30 UTC (permalink / raw)
  To: git; +Cc: peff, torvalds

Per request/suggestion by Linus. 

Keeping the existing tests working took far more effort than the
actual change.

Junio C Hamano (4):
  config: allow customizing /etc/gitconfig location
  t13xx: do not assume system config is empty
  worktree: honor configuration variables
  core.abbrev: raise the default abbreviation to 12 hexdigits

 builtin/worktree.c     |  2 ++
 cache.h                |  1 +
 config.c               |  2 ++
 environment.c          |  2 +-
 t/gitconfig-for-test   |  9 +++++++++
 t/t1300-repo-config.sh | 39 ++++++++++++++++++++++++++++-----------
 t/t1308-config-set.sh  |  1 +
 t/test-lib.sh          |  4 ++--
 8 files changed, 46 insertions(+), 14 deletions(-)
 create mode 100644 t/gitconfig-for-test

-- 
2.10.0-584-gc9e068c


* [PATCH 1/4] config: allow customizing /etc/gitconfig location
  2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
@ 2016-09-28 23:30   ` Junio C Hamano
  2016-09-29  9:53     ` Jakub Narębski
  2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-28 23:30 UTC (permalink / raw)
  To: git; +Cc: peff, torvalds

With a new environment variable GIT_ETC_GITCONFIG, users can
specify a file that is used instead of /etc/gitconfig to read (and
write) the system-wide configuration.

Earlier, we introduced the GIT_CONFIG_NOSYSTEM environment variable in
ab88c363 ("allow suppressing of global and system config",
2008-02-06), primarily to protect our tests from a random set of
configuration variables the system administrators would put in their
/etc/gitconfig file.  We can replace the use of this mechanism in
our tests by pointing GIT_ETC_GITCONFIG at our own file instead.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * The next step is to add "[core]abbrev=7" to this file and update
   default_abbrev to 12 in environment.c and see what breaks.  I
   suspect that "git worktree list" would break without my recent
   patch.  I also know some tests expect "git config -l" to show
   only values they set to their local configuration, which would
   need to be corrected.  We'll see them in next steps.

 cache.h                |  1 +
 config.c               |  2 ++
 t/gitconfig-for-test   |  6 ++++++
 t/t1300-repo-config.sh | 15 +++++++++++++++
 t/test-lib.sh          |  4 ++--
 5 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 t/gitconfig-for-test

diff --git a/cache.h b/cache.h
index b0dae4b..81a07bf 100644
--- a/cache.h
+++ b/cache.h
@@ -408,6 +408,7 @@ static inline enum object_type object_type(unsigned int mode)
 #define GIT_NAMESPACE_ENVIRONMENT "GIT_NAMESPACE"
 #define GIT_WORK_TREE_ENVIRONMENT "GIT_WORK_TREE"
 #define GIT_PREFIX_ENVIRONMENT "GIT_PREFIX"
+#define GIT_ETC_GITCONFIG_ENVIRONMENT "GIT_ETC_GITCONFIG"
 #define DEFAULT_GIT_DIR_ENVIRONMENT ".git"
 #define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"
 #define INDEX_ENVIRONMENT "GIT_INDEX_FILE"
diff --git a/config.c b/config.c
index 0dfed68..124699b 100644
--- a/config.c
+++ b/config.c
@@ -1253,6 +1253,8 @@ const char *git_etc_gitconfig(void)
 {
 	static const char *system_wide;
 	if (!system_wide)
+		system_wide = getenv(GIT_ETC_GITCONFIG_ENVIRONMENT);
+	if (!system_wide)
 		system_wide = system_path(ETC_GITCONFIG);
 	return system_wide;
 }
diff --git a/t/gitconfig-for-test b/t/gitconfig-for-test
new file mode 100644
index 0000000..4598885
--- /dev/null
+++ b/t/gitconfig-for-test
@@ -0,0 +1,6 @@
+;; This file is used as if it were /etc/gitconfig while running the
+;; test scripts in this directory.
+;;
+;; [user]
+;;	name = A U Thor
+;;	email = author@example.com
diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 923bfc5..1184f43 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1372,4 +1372,19 @@ test_expect_success !MINGW '--show-origin blob ref' '
 	test_cmp expect output
 '
 
+test_expect_success 'system-wide configuration' '
+	system="$TRASH_DIRECTORY/system-wide" &&
+	>"$system" &&
+	git config -f "$system" --add frotz.nitfol xyzzy &&
+
+	git config -f "$system" frotz.nitfol >expect &&
+	GIT_ETC_GITCONFIG="$system" \
+	git config --system frotz.nitfol >actual &&
+
+	GIT_ETC_GITCONFIG="$system" \
+	git config --system --replace-all frotz.nitfol blorb &&
+	echo blorb >expect &&
+	GIT_ETC_GITCONFIG="$system" git config --system frotz.nitfol >actual
+'
+
 test_done
diff --git a/t/test-lib.sh b/t/test-lib.sh
index ac56512..6803212 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -851,9 +851,9 @@ else # normal case, use ../bin-wrappers only unless $with_dashes:
 	fi
 fi
 GIT_TEMPLATE_DIR="$GIT_BUILD_DIR"/templates/blt
-GIT_CONFIG_NOSYSTEM=1
+GIT_ETC_GITCONFIG="$GIT_BUILD_DIR/t/gitconfig-for-test"
 GIT_ATTR_NOSYSTEM=1
-export PATH GIT_EXEC_PATH GIT_TEMPLATE_DIR GIT_CONFIG_NOSYSTEM GIT_ATTR_NOSYSTEM
+export PATH GIT_EXEC_PATH GIT_TEMPLATE_DIR GIT_ETC_GITCONFIG GIT_ATTR_NOSYSTEM
 
 if test -z "$GIT_TEST_CMP"
 then
-- 
2.10.0-584-gc9e068c


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
  2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
@ 2016-09-28 23:30   ` Junio C Hamano
  2016-09-29  9:01     ` Jeff King
  2016-09-28 23:30   ` [PATCH 3/4] worktree: honor configuration variables Junio C Hamano
  2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
  3 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-28 23:30 UTC (permalink / raw)
  To: git; +Cc: peff, torvalds

Most parts of these two tests want to read from the local
configuration file they prepare and make sure expected names and
values appear with "git config --list".

Once we add to the t/gitconfig-for-test file custom configuration items
that are meant to affect all tests globally, these tests will start
seeing those items and break.  Use --local to make it clear that they
only care about the contents of their local configuration.

The tests for the show-origin codepath in "git config", however, cannot
be tweaked with "--local" etc., because they also want to read from
$HOME/.gitconfig and verify where each value comes from.  Disable
reading from the system-wide config with GIT_CONFIG_NOSYSTEM=1 for
these tests.
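To see why "--local" is enough for most of these tests but not for the show-origin ones, here is a toy model of how a listing filters entries by scope. It is a simplified sketch, not git's implementation, and `enum scope`, `struct entry` and `count_visible()` are hypothetical names. Entries are kept in the order git reads them (system, global, local). "--local" keeps only local entries, whereas GIT_CONFIG_NOSYSTEM=1 drops just the system file and leaves the global and local entries that the show-origin tests compare.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scopes, in the order "git config --list" reads them. */
enum scope { SCOPE_SYSTEM, SCOPE_GLOBAL, SCOPE_LOCAL };

struct entry {
	enum scope scope;
	const char *key;
};

/*
 * Count the entries a listing would show.
 * only_local models "--local"; nosystem models GIT_CONFIG_NOSYSTEM=1.
 */
static size_t count_visible(const struct entry *e, size_t n,
			    bool only_local, bool nosystem)
{
	size_t shown = 0;

	assert(e != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (only_local && e[i].scope != SCOPE_LOCAL)
			continue;
		if (nosystem && e[i].scope == SCOPE_SYSTEM)
			continue;
		shown++;
	}
	return shown;
}
```

With one system, one global and two local entries, the full listing has four lines, "--local" gives two, and GIT_CONFIG_NOSYSTEM=1 gives three.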

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1300-repo-config.sh | 24 +++++++++++++-----------
 t/t1308-config-set.sh  |  1 +
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 1184f43..b998568 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -341,13 +341,11 @@ version.1.2.3eX.alpha=beta
 EOF
 
 test_expect_success 'working --list' '
-	git config --list > output &&
+	git config --local --list > output &&
 	test_cmp expect output
 '
-cat > expect << EOF
-EOF
-
-test_expect_success '--list without repo produces empty output' '
+test_expect_success '--list without repo shows only from the global' '
+	git config --system --list >expect &&
 	git --git-dir=nonexistent config --list >output &&
 	test_cmp expect output
 '
@@ -360,7 +358,7 @@ version.1.2.3eX.alpha
 EOF
 
 test_expect_success '--name-only --list' '
-	git config --name-only --list >output &&
+	git config --local --name-only --list >output &&
 	test_cmp expect output
 '
 
@@ -370,7 +368,7 @@ nextsection.nonewline wow2 for me
 EOF
 
 test_expect_success '--get-regexp' '
-	git config --get-regexp in >output &&
+	git config --local --get-regexp in >output &&
 	test_cmp expect output
 '
 
@@ -380,7 +378,7 @@ nextsection.nonewline
 EOF
 
 test_expect_success '--name-only --get-regexp' '
-	git config --name-only --get-regexp in >output &&
+	git config --local --name-only --get-regexp in >output &&
 	test_cmp expect output
 '
 
@@ -391,7 +389,7 @@ EOF
 
 test_expect_success '--add' '
 	git config --add nextsection.nonewline "wow4 for you" &&
-	git config --get-all nextsection.nonewline > output &&
+	git config --local --get-all nextsection.nonewline > output &&
 	test_cmp expect output
 '
 
@@ -935,7 +933,7 @@ section.quotecont=cont;inued
 EOF
 
 test_expect_success 'value continued on next line' '
-	git config --list > result &&
+	git config --local --list > result &&
 	test_cmp result expect
 '
 
@@ -959,7 +957,7 @@ Qsection.sub=section.val4
 Qsection.sub=section.val5Q
 EOF
 test_expect_success '--null --list' '
-	git config --null --list >result.raw &&
+	git config --null --local --list >result.raw &&
 	nul_to_q <result.raw >result &&
 	echo >>result &&
 	test_cmp expect result
@@ -1264,6 +1262,7 @@ test_expect_success '--show-origin with --list' '
 		file:.git/../include/relative.include	user.relative=include
 		command line:	user.cmdline=true
 	EOF
+	GIT_CONFIG_NOSYSTEM=1 \
 	git -c user.cmdline=true config --list --show-origin >output &&
 	test_cmp expect output
 '
@@ -1281,6 +1280,7 @@ test_expect_success '--show-origin with --list --null' '
 		includeQcommand line:Quser.cmdline
 		trueQ
 	EOF
+	GIT_CONFIG_NOSYSTEM=1 \
 	git -c user.cmdline=true config --null --list --show-origin >output.raw &&
 	nul_to_q <output.raw >output &&
 	# The here-doc above adds a newline that the --null output would not
@@ -1304,6 +1304,7 @@ test_expect_success '--show-origin with --get-regexp' '
 		file:$HOME/.gitconfig	user.global true
 		file:.git/config	user.local true
 	EOF
+	GIT_CONFIG_NOSYSTEM=1 \
 	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
 	test_cmp expect output
 '
@@ -1312,6 +1313,7 @@ test_expect_success '--show-origin getting a single key' '
 	cat >expect <<-\EOF &&
 		file:.git/config	local
 	EOF
+	GIT_CONFIG_NOSYSTEM=1 \
 	git config --show-origin user.override >output &&
 	test_cmp expect output
 '
diff --git a/t/t1308-config-set.sh b/t/t1308-config-set.sh
index 7655c94..5d5adb1 100755
--- a/t/t1308-config-set.sh
+++ b/t/t1308-config-set.sh
@@ -260,6 +260,7 @@ test_expect_success 'iteration shows correct origins' '
 	name=
 	scope=cmdline
 	EOF
+	GIT_CONFIG_NOSYSTEM=1 \
 	GIT_CONFIG_PARAMETERS=$cmdline_config test-config iterate >actual &&
 	test_cmp expect actual
 '
-- 
2.10.0-584-gc9e068c


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH 3/4] worktree: honor configuration variables
  2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
  2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
  2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
@ 2016-09-28 23:30   ` Junio C Hamano
  2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
  3 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-28 23:30 UTC (permalink / raw)
  To: git; +Cc: peff, torvalds

The command accesses default_abbrev (defined in environment.c and
updated via the core.abbrev configuration variable), but never calls
git_config(), so the output from "worktree list" ignores the abbrev
setting.

Make a call to git_config() to read the default set of configuration
variables at the beginning of the command.
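The fix depends on git's callback-driven config reading. A variable like default_abbrev keeps its compiled-in value until something feeds the configuration through a callback that recognizes core.abbrev. The code below is a minimal, self-contained model of that pattern, not git's code. `abbrev_config_cb()` and `read_pairs()` are hypothetical, the pairs come from an in-memory array rather than from config files, and value parsing is reduced to atoi().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Compiled-in default, standing in for the one in environment.c. */
static int default_abbrev = 7;

/* Hypothetical callback, invoked once per key/value pair read. */
static int abbrev_config_cb(const char *key, const char *value, void *data)
{
	(void)data;	/* unused in this sketch */
	if (!strcmp(key, "core.abbrev"))
		default_abbrev = atoi(value);	/* real code validates the value */
	return 0;
}

/*
 * Hypothetical reader: hands each in-memory pair to the callback and
 * stops at the first negative return, roughly like a config iterator.
 */
static int read_pairs(int (*fn)(const char *, const char *, void *),
		      const char *const pairs[][2], size_t n, void *data)
{
	assert(fn != NULL);
	for (size_t i = 0; i < n; i++) {
		int ret = fn(pairs[i][0], pairs[i][1], data);

		if (ret < 0)
			return ret;
	}
	return 0;
}
```

If read_pairs() is never called, default_abbrev stays at 7 regardless of the configuration. That is the "worktree list" symptom described above.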

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * This is already queued separately from this series.

 builtin/worktree.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/builtin/worktree.c b/builtin/worktree.c
index 6dcf7bd..5c4854d 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -528,6 +528,8 @@ int cmd_worktree(int ac, const char **av, const char *prefix)
 		OPT_END()
 	};
 
+	git_config(git_default_config, NULL);
+
 	if (ac < 2)
 		usage_with_options(worktree_usage, options);
 	if (!prefix)
-- 
2.10.0-584-gc9e068c


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
                     ` (2 preceding siblings ...)
  2016-09-28 23:30   ` [PATCH 3/4] worktree: honor configuration variables Junio C Hamano
@ 2016-09-28 23:30   ` Junio C Hamano
  2016-09-29  2:44     ` SZEDER Gábor
                       ` (2 more replies)
  3 siblings, 3 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-28 23:30 UTC (permalink / raw)
  To: git; +Cc: peff, torvalds

As Peff said, responding in a thread started by Linus's suggestion
to raise the default abbreviation to 12 hexdigits:

    I actually think "12" might be sane for a long time. That's 48 bits of
    sha1, so we'd expect a 50% change of a _single_ collision at 2^24, or 16
    million.  The biggest repository I know about (in number of objects) is
    the one holding all of the objects for all of the forks of
    torvalds/linux on GitHub. It's at about 15 million objects.

    Which _seems_ close, but remember that's the size where we expect to see
    a single collision. They don't become common until much later (I didn't
    compute an exact number, but Linus's 16x sounds about right). I know
    that the growth of the kernel isn't really linear, but I think the need
    to bump to "13" might not just be decades, but possibly a century or
    more.

    So 12 seems reasonable, and the only downside for it (or for "13", for
    that matter) is a few extra bytes. I dunno, maybe people will really
    hate that, but I have a feeling these are mostly cut-and-pasted anyway.

And this does exactly that.

Keep the tests working by explicitly asking for the old 7 hexdigits
setting in the fake system-wide configuration file used for tests.
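The arithmetic in the quoted paragraph can be checked with the birthday bound. The sketch below uses a hypothetical helper that is not part of this patch. It assumes object names behave like uniformly random values and computes the expected number of colliding pairs, C(n,2)/16^h, for n objects abbreviated to h hexdigits. At n = 2^24 and h = 12 the result is about 0.5 pairs. The matching probability of at least one collision, roughly 1 - e^(-0.5), is closer to 39% than to 50%, and the 50% point lies nearer 1.18 × 2^24, or about 19.8 million objects. "2^24" is therefore a fair order-of-magnitude estimate.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Expected number of colliding pairs among n objects whose names are
 * uniformly random, when names are cut to hex_digits hex characters:
 * C(n, 2) / 16^hex_digits.  While this is small, the probability of
 * at least one collision is roughly 1 - exp(-expected).
 */
static double expected_colliding_pairs(uint64_t n, int hex_digits)
{
	double space = 1.0;

	assert(hex_digits > 0 && hex_digits <= 40);
	for (int i = 0; i < hex_digits; i++)
		space *= 16.0;		/* 16^hex_digits, exact in a double */
	return ((double)n * (double)(n - 1) / 2.0) / space;
}
```

For a repository of about 5 million objects (the kernel's size, as mentioned earlier in the thread), this gives roughly 0.04 expected colliding pairs at 12 hexdigits, compared with about 46,000 at 7.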

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 environment.c        | 2 +-
 t/gitconfig-for-test | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/environment.c b/environment.c
index ca72464..25daddb 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = 12;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/t/gitconfig-for-test b/t/gitconfig-for-test
index 4598885..8c28442 100644
--- a/t/gitconfig-for-test
+++ b/t/gitconfig-for-test
@@ -4,3 +4,6 @@
 ;; [user]
 ;;	name = A U Thor
 ;;	email = author@example.com
+
+[core]
+	abbrev = 7
-- 
2.10.0-584-gc9e068c


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
@ 2016-09-29  2:44     ` SZEDER Gábor
  2016-09-29  5:27       ` Lukas Fleischer
  2016-09-29  9:15       ` Jeff King
  2016-09-29  5:58     ` Johannes Sixt
  2016-09-29  9:25     ` Jeff King
  2 siblings, 2 replies; 111+ messages in thread
From: SZEDER Gábor @ 2016-09-29  2:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: SZEDER Gábor, peff, torvalds, git

> As Peff said, responding in a thread started by Linus's suggestion
> to raise the default abbreviation to 12 hexdigits:
> 
>     I actually think "12" might be sane for a long time. That's 48 bits of
>     sha1, so we'd expect a 50% change of a _single_ collision at 2^24, or 16

s/change/chance/

I know it's quoted, but still.

>     million.  The biggest repository I know about (in number of objects) is
>     the one holding all of the objects for all of the forks of
>     torvalds/linux on GitHub. It's at about 15 million objects.
> 
>     Which _seems_ close, but remember that's the size where we expect to see
>     a single collision. They don't become common until much later (I didn't
>     compute an exact number, but Linus's 16x sounds about right). I know
>     that the growth of the kernel isn't really linear, but I think the need
>     to bump to "13" might not just be decades, but possibly a century or
>     more.
> 
>     So 12 seems reasonable, and the only downside for it (or for "13", for
>     that matter) is a few extra bytes. I dunno, maybe people will really
>     hate that, but I have a feeling these are mostly cut-and-pasted anyway.

I for one raise my hand in protest...

"few extra bytes" is not the only downside, and it's not at all about
how many characters are copy-and-pasted.  In my opinion it's much more
important that this change wastes 5 columns worth of valuable screen
real estate e.g. for 'git blame' or 'git log --oneline' in projects
that don't need it and certainly won't ever need it.

Sure, users working on smaller repos are free to reset core.abbrev to
its original value.  I don't have any numbers, of course, but I
suspect that there are many more smaller repos out there that this
change will affect disadvantageously, than there are large repos for
which it's beneficial.


> And this does exactly that.
> 
> Keep the tests working by explicitly asking for the old 7 hexdigits
> setting in the fake system-wide configuration file used for tests.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29  2:44     ` SZEDER Gábor
@ 2016-09-29  5:27       ` Lukas Fleischer
  2016-09-29  9:22         ` Jeff King
  2016-09-29  9:15       ` Jeff King
  1 sibling, 1 reply; 111+ messages in thread
From: Lukas Fleischer @ 2016-09-29  5:27 UTC (permalink / raw)
  To: git; +Cc: SZEDER Gábor, peff, torvalds, git

On Thu, 29 Sep 2016 at 04:44:00, SZEDER Gábor wrote:
> I for one raise my hand in protest...
> 
> "few extra bytes" is not the only downside, and it's not at all about
> how many characters are copy-and-pasted.  In my opinion it's much more
> important that this change wastes 5 columns worth of valuable screen
> real estate e.g. for 'git blame' or 'git log --oneline' in projects
> that don't need it and certainly won't ever need it.
> 
> Sure, users working on smaller repos are free to reset core.abbrev to
> its original value.  I don't have any numbers, of course, but I
> suspect that there are many more smaller repos out there that this
> change will affect disadvantageously, than there are large repos for
> which it's beneficial.

I know this suggestion comes a bit late, but would it make sense to let
the repository owner override the core.abbrev setting?

One possible way to implement this would be adding .gitconfig support to
repositories with a very limited set of whitelisted variables allowed in
there (could be core.abbrev only to begin with). Or some entirely
separate mechanism like .gitignore.
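The allow-list part of that idea could look roughly like the sketch below. It assumes a hypothetical in-tree config file whose keys have already been canonicalized to lowercase, as git does for section and variable names. `in_tree_allowed` and `in_tree_key_allowed()` are made-up names for illustration. This is not existing git code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical allow-list of keys a project may ship in-tree. */
static const char *const in_tree_allowed[] = {
	"core.abbrev",
};

/* Return true if a versioned, in-tree config file may set this key. */
static bool in_tree_key_allowed(const char *key)
{
	size_t n = sizeof(in_tree_allowed) / sizeof(in_tree_allowed[0]);

	assert(key != NULL);
	for (size_t i = 0; i < n; i++)
		if (!strcmp(key, in_tree_allowed[i]))
			return true;
	return false;
}
```

Any key not on the list, such as an alias, would simply be ignored when read from the in-tree file.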

With such a mechanism, we could keep the default of 7 which works fine
for most projects. Linus could bump the default to 12 for linux.git. If
some users are not happy with that, they can still override it in their
local Git config. Anybody starting a project could change the initial
value to a suitable value in one of the first commits -- provided they
already have an idea how much the project will grow. That way, hashes
will be "long enough" even for early commits, before any heuristics
could guess that the project would become large.

Opinions?

Regards,
Lukas

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
  2016-09-29  2:44     ` SZEDER Gábor
@ 2016-09-29  5:58     ` Johannes Sixt
  2016-09-29 18:05       ` Junio C Hamano
  2016-09-29  9:25     ` Jeff King
  2 siblings, 1 reply; 111+ messages in thread
From: Johannes Sixt @ 2016-09-29  5:58 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, peff, torvalds

Am 29.09.2016 um 01:30 schrieb Junio C Hamano:
> As Peff said, responding in a thread started by Linus's suggestion
> to raise the default abbreviation to 12 hexdigits:

This is waayy too large for a new default. The vast majority of 
repositories is smallish. For those, the long sequences of hex digits 
are an uglification that is almost unbearable.

I know that kernel developers are important, but they have long been
outnumbered by the anonymous and silent masses of users.

Personally, I use 8 digits just because it is a "rounder" number than 7, 
but in all of my repositories 7 would still work just as well.

-- Hannes


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
@ 2016-09-29  9:01     ` Jeff King
  2016-09-29 18:13       ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-29  9:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds

On Wed, Sep 28, 2016 at 04:30:45PM -0700, Junio C Hamano wrote:

> The tests for the show-origin codepath in "git config", however, cannot
> be tweaked with "--local" etc., because they also want to read from
> $HOME/.gitconfig and verify where each value comes from.  Disable
> reading from the system-wide config with GIT_CONFIG_NOSYSTEM=1 for
> these tests.

I think anytime you would use GIT_CONFIG_NOSYSTEM over --local, it is an
indication that the test is trying to check how multiple sources
interact. And the right thing to do for them is to set GIT_ETC_GITCONFIG
to some known quantity. We just couldn't do that before, so we skipped
it.

IOW, something like the patch below (on top of yours). Note that the
commands that are doing a "--get" and not a "--list" don't actually seem
to need either (because they are getting the values out of the local
file anyway), so we could drop the setting of GIT_ETC_GITCONFIG from
them entirely.

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index b998568..d2476a8 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1234,6 +1234,11 @@ test_expect_success 'set up --show-origin tests' '
 		[user]
 			relative = include
 	EOF
+	cat >"$HOME"/etc-gitconfig <<-\EOF &&
+		[user]
+			system = true
+			override = system
+	EOF
 	cat >"$HOME"/.gitconfig <<-EOF &&
 		[user]
 			global = true
@@ -1252,6 +1257,8 @@ test_expect_success 'set up --show-origin tests' '
 
 test_expect_success '--show-origin with --list' '
 	cat >expect <<-EOF &&
+		file:$HOME/etc-gitconfig	user.system=true
+		file:$HOME/etc-gitconfig	user.override=system
 		file:$HOME/.gitconfig	user.global=true
 		file:$HOME/.gitconfig	user.override=global
 		file:$HOME/.gitconfig	include.path=$INCLUDE_DIR/absolute.include
@@ -1262,14 +1269,16 @@ test_expect_success '--show-origin with --list' '
 		file:.git/../include/relative.include	user.relative=include
 		command line:	user.cmdline=true
 	EOF
-	GIT_CONFIG_NOSYSTEM=1 \
+	GIT_ETC_GITCONFIG=$HOME/etc-gitconfig \
 	git -c user.cmdline=true config --list --show-origin >output &&
 	test_cmp expect output
 '
 
 test_expect_success '--show-origin with --list --null' '
 	cat >expect <<-EOF &&
-		file:$HOME/.gitconfigQuser.global
+		file:$HOME/etc-gitconfigQuser.system
+		trueQfile:$HOME/etc-gitconfigQuser.override
+		systemQfile:$HOME/.gitconfigQuser.global
 		trueQfile:$HOME/.gitconfigQuser.override
 		globalQfile:$HOME/.gitconfigQinclude.path
 		$INCLUDE_DIR/absolute.includeQfile:$INCLUDE_DIR/absolute.includeQuser.absolute
@@ -1280,7 +1289,7 @@ test_expect_success '--show-origin with --list --null' '
 		includeQcommand line:Quser.cmdline
 		trueQ
 	EOF
-	GIT_CONFIG_NOSYSTEM=1 \
+	GIT_ETC_GITCONFIG=$HOME/etc-gitconfig \
 	git -c user.cmdline=true config --null --list --show-origin >output.raw &&
 	nul_to_q <output.raw >output &&
 	# The here-doc above adds a newline that the --null output would not
@@ -1304,7 +1313,7 @@ test_expect_success '--show-origin with --get-regexp' '
 		file:$HOME/.gitconfig	user.global true
 		file:.git/config	user.local true
 	EOF
-	GIT_CONFIG_NOSYSTEM=1 \
+	GIT_ETC_GITCONFIG=$HOME/etc-gitconfig \
 	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
 	test_cmp expect output
 '
@@ -1313,7 +1322,7 @@ test_expect_success '--show-origin getting a single key' '
 	cat >expect <<-\EOF &&
 		file:.git/config	local
 	EOF
-	GIT_CONFIG_NOSYSTEM=1 \
+	GIT_ETC_GITCONFIG=$HOME/etc-gitconfig \
 	git config --show-origin user.override >output &&
 	test_cmp expect output
 '

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29  2:44     ` SZEDER Gábor
  2016-09-29  5:27       ` Lukas Fleischer
@ 2016-09-29  9:15       ` Jeff King
  2016-09-29 10:03         ` Matthieu Moy
  2016-09-29 12:52         ` SZEDER Gábor
  1 sibling, 2 replies; 111+ messages in thread
From: Jeff King @ 2016-09-29  9:15 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Junio C Hamano, torvalds, git

On Thu, Sep 29, 2016 at 04:44:00AM +0200, SZEDER Gábor wrote:

> >     So 12 seems reasonable, and the only downside for it (or for "13", for
> >     that matter) is a few extra bytes. I dunno, maybe people will really
> >     hate that, but I have a feeling these are mostly cut-and-pasted anyway.
> 
> I for one raise my hand in protest...
> 
> "few extra bytes" is not the only downside, and it's not at all about
> how many characters are copy-and-pasted.  In my opinion it's much more
> important that this change wastes 5 columns worth of valuable screen
> real estate e.g. for 'git blame' or 'git log --oneline' in projects
> that don't need it and certainly won't ever need it.

True. The core of the issue is that we really only care about this
minimum length when _storing_ an abbreviation, but we don't know when
the user is just looking at it in the moment, and when they are going to
stick it in a commit message, email, or bug tracker.

In an ideal world, anybody who was about to store it would run "git
describe" or something to come up with some canonical reference format.
And we could just bump the default minimum there. Personally, I almost
exclusively cite commits as the output of:

  git log -1 --pretty='tformat:%h (%s, %ad)' --date=short

and I'd be fine to stick "--abbrev=12" in there for future-proofing. But
I don't know what the kernel or other projects do.

I'd also be curious to know if the patch I sent in [1] to more
aggressively prefer commits would make this less of an issue, and people
wouldn't care as much about using longer hashes in the first place. So
one option is to merge that (and possibly even make it the default) and
see if people still care in 6 months.

-Peff

[1] http://public-inbox.org/git/20160927123801.3bpdg3hap3kzzfmv@sigill.intra.peff.net/

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29  5:27       ` Lukas Fleischer
@ 2016-09-29  9:22         ` Jeff King
  0 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-29  9:22 UTC (permalink / raw)
  To: Lukas Fleischer; +Cc: git, SZEDER Gábor, torvalds

On Thu, Sep 29, 2016 at 07:27:01AM +0200, Lukas Fleischer wrote:

> > Sure, users working on smaller repos are free to reset core.abbrev to
> > its original value.  I don't have any numbers, of course, but I
> > suspect that there are many more smaller repos out there that this
> > change will affect disadvantageously, than there are large repos for
> > which it's beneficial.
> 
> I know this suggestion comes a bit late, but would it make sense to let
> the repository owner override the core.abbrev setting?
> 
> One possible way to implement this would be adding .gitconfig support to
> repositories with a very limited set of whitelisted variables allowed in
> there (could be core.abbrev only to begin with). Or some entirely
> separate mechanism like .gitignore.

The suggestion for versioned repository-level config comes up from time
to time; you can find other instances in the list archive. The biggest
issue is usually that nobody comes up with a good example of something
that the project would actually want to set. Setting "core.abbrev" at
least seems plausible.

Though...

> With such a mechanism, we could keep the default of 7 which works fine
> for most projects. Linus could bump the default to 12 for linux.git. If
> some users are not happy with that, they can still override it in their
> local Git config. Anybody starting a project could change the initial
> value to a suitable value in one of the first commits -- provided they
> already have an idea how much the project will grow. That way, hashes
> will be "long enough" even for early commits, before any heuristics
> could guess that the project would become large.

I wonder if in practice we would do just as well to size default_abbrev
dynamically based on the number of objects. That doesn't help projects
which are just starting, but will eventually grow gigantic.  But I doubt
that most projects would have the foresight to preemptively set
core.abbrev. And that would at least reduce the impact as the project
_does_ get big.
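One way to size the default from the object count is to pick enough hexdigits that the prefix space (4 bits per hexdigit) is at least the square of the object count. By the birthday bound, that keeps the expected number of colliding pairs below one half. Below is a minimal sketch. It assumes an approximate object count is available and keeps 7 as a floor. The function name and the floor are illustrative choices, not existing git code.

```c
#include <assert.h>

#define MIN_DEFAULT_ABBREV 7	/* illustrative floor: the historical default */

/*
 * Pick an abbreviation length for a repository holding about `count`
 * objects.  With count < 2^bits, h = ceil(bits / 2) hexdigits gives a
 * prefix space of 16^h >= 2^(2 * bits) > count^2, which keeps the
 * expected number of colliding pairs below one half.
 */
static int abbrev_for_count(unsigned long count)
{
	int bits = 0;	/* number of significant bits: count < 2^bits */
	int len;

	for (unsigned long c = count; c; c >>= 1)
		bits++;
	len = (bits + 1) / 2;	/* ceil(bits / 2) */
	if (len < MIN_DEFAULT_ABBREV)
		len = MIN_DEFAULT_ABBREV;
	assert(len <= 40);	/* a SHA-1 name has only 40 hexdigits */
	return len;
}
```

Under this rule, a kernel-sized repository of about 5 million objects, or the 15-million-object fork network mentioned earlier in the thread, gets 12 hexdigits. A repository with fewer than 16,384 objects stays at 7. Because the length only grows with the repository, names recorded while a project was still small remain short, which is the limitation noted above.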

-Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
  2016-09-29  2:44     ` SZEDER Gábor
  2016-09-29  5:58     ` Johannes Sixt
@ 2016-09-29  9:25     ` Jeff King
  2 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-29  9:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds

On Wed, Sep 28, 2016 at 04:30:47PM -0700, Junio C Hamano wrote:

> As Peff said, responding in a thread started by Linus's suggestion
> to raise the default abbreviation to 12 hexdigits:
> 
>     I actually think "12" might be sane for a long time. That's 48 bits of
>     sha1, so we'd expect a 50% change of a _single_ collision at 2^24, or 16
>     million.  The biggest repository I know about (in number of objects) is
>     the one holding all of the objects for all of the forks of
>     torvalds/linux on GitHub. It's at about 15 million objects.
> 
>     Which _seems_ close, but remember that's the size where we expect to see
>     a single collision. They don't become common until much later (I didn't
>     compute an exact number, but Linus's 16x sounds about right). I know
>     that the growth of the kernel isn't really linear, but I think the need
>     to bump to "13" might not just be decades, but possibly a century or
>     more.
> 
>     So 12 seems reasonable, and the only downside for it (or for "13", for
>     that matter) is a few extra bytes. I dunno, maybe people will really
>     hate that, but I have a feeling these are mostly cut-and-pasted anyway.

I am not sure my quote is a good rationale for this bump. It was meant
to be a rationale that "12" is big enough, but the "I dunno" at the end
kind of glosses over the downsides.

-Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 1/4] config: allow customizing /etc/gitconfig location
  2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
@ 2016-09-29  9:53     ` Jakub Narębski
  2016-09-29 17:20       ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jakub Narębski @ 2016-09-29  9:53 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: Jeff King, Linus Torvalds

W dniu 29.09.2016 o 01:30, Junio C Hamano pisze:
> With a new environment variable GIT_ETC_GITCONFIG, the users can
> specify a file that is used instead of /etc/gitconfig to read (and
> write) the system-wide configuration.

Why is it named GIT_ETC_GITCONFIG (which is a Unix-ism), and not
something OS-neutral like GIT_CONFIG_SYSTEM or GIT_CONFIG_SYSTEM_PATH?

-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29  9:15       ` Jeff King
@ 2016-09-29 10:03         ` Matthieu Moy
  2016-09-29 12:52         ` SZEDER Gábor
  1 sibling, 0 replies; 111+ messages in thread
From: Matthieu Moy @ 2016-09-29 10:03 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Junio C Hamano, torvalds, git

Jeff King <peff@peff.net> writes:

> On Thu, Sep 29, 2016 at 04:44:00AM +0200, SZEDER Gábor wrote:
>
>> >     So 12 seems reasonable, and the only downside for it (or for "13", for
>> >     that matter) is a few extra bytes. I dunno, maybe people will really
>> >     hate that, but I have a feeling these are mostly cut-and-pasted anyway.
>> 
>> I for one raise my hand in protest...
>> 
>> "few extra bytes" is not the only downside, and it's not at all about
>> how many characters are copy-and-pasted.  In my opinion it's much more
>> important that this change wastes 5 columns worth of valuable screen
>> real estate e.g. for 'git blame' or 'git log --oneline' in projects
>> that don't need it and certainly won't ever need it.
>
> True. The core of the issue is that we really only care about this
> minimum length when _storing_ an abbreviation, but we don't know when
> the user is just looking at it in the moment, and when they are going to
> stick it in a commit message, email, or bug tracker.

Perhaps a compromise would be to adapt the length to the size of the
project _and_ keep a huge margin. So, essentially, we'd have small
projects stick to the 7 characters, and very quickly bump to 12.

So, for a fast-growing project, there would be a short window at the
beginning of the project where people could cut-and-paste short hashes.
OTOH, small projects could keep these few columns of screen real-estate.

That said, I can certainly live without these 5 columns, don't take my
message as an objection to setting to 12 right away.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
  2016-09-26 16:36           ` Linus Torvalds
  2016-09-26 17:30           ` Junio C Hamano
@ 2016-09-29 11:46           ` Kyle J. McKay
  2016-09-29 13:03             ` Jeff King
  2 siblings, 1 reply; 111+ messages in thread
From: Kyle J. McKay @ 2016-09-29 11:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

On Sep 26, 2016, at 05:00, Jeff King wrote:

>  $ git rev-parse b2e1
>  error: short SHA1 b2e1 is ambiguous
>  hint: The candidates are:
>  hint:   b2e1196 tag v2.8.0-rc1
>  hint:   b2e11d1 tree
>  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
>  hint:   b2e1759 blob
>  hint:   b2e18954 blob
>  hint:   b2e1895c blob
>  fatal: ambiguous argument 'b2e1': unknown revision or path not in the working tree.
>  Use '--' to separate paths from revisions, like this:
>  'git <command> [<revision>...] -- [<file>...]'

This hint: information is excellent.  There needs to be a way to show  
it on demand.

$ git rev-parse --disambiguate=b2e1
b2e11962c5e6a9c81aa712c751c83a743fd4f384
b2e11d1bb40c5f81a2f4e37b9f9a60ec7474eeab
b2e163272c01aca4aee4684f5c683ba341c1953d
b2e18954c03ff502053cb74d142faab7d2a8dacb
b2e1895ca92ec2037349d88b945ba64ebf16d62d

Not nearly so helpful, but the operation of --disambiguate cannot be  
changed without breaking current scripts.

Could your excellent "hint:" output above be attached to the
--disambiguate option somehow, please?  Something like this perhaps:

$ git rev-parse --disambiguate-list=b2e1
b2e1196 tag v2.8.0-rc1
b2e11d1 tree
b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
b2e1759 blob
b2e18954 blob
b2e1895c blob

Any option name will do: --disambiguate-verbose,
--disambiguate-extended, --disambiguate-long, --disambiguate-log,
--disambiguate-help, --disambiguate-show-me-something-useful-to-humans-not-scripts ...

--Kyle

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29  9:15       ` Jeff King
  2016-09-29 10:03         ` Matthieu Moy
@ 2016-09-29 12:52         ` SZEDER Gábor
  1 sibling, 0 replies; 111+ messages in thread
From: SZEDER Gábor @ 2016-09-29 12:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, torvalds, git


Quoting Jeff King <peff@peff.net>:

> On Thu, Sep 29, 2016 at 04:44:00AM +0200, SZEDER Gábor wrote:
>
>> >     So 12 seems reasonable, and the only downside for it (or for "13", for
>> >     that matter) is a few extra bytes. I dunno, maybe people will really
>> >     hate that, but I have a feeling these are mostly cut-and-pasted anyway.
>>
>> I for one raise my hand in protest...
>>
>> "few extra bytes" is not the only downside, and it's not at all about
>> how many characters are copy-and-pasted.  In my opinion it's much more
>> important that this change wastes 5 columns worth of valuable screen
>> real estate e.g. for 'git blame' or 'git log --oneline' in projects
>> that don't need it and certainly won't ever need it.
>
> True. The core of the issue is that we really only care about this
> minimum length when _storing_ an abbreviation, but we don't know when
> the user is just looking at it in the moment, and when they are going to
> stick it in a commit message, email, or bug tracker.
>
> In an ideal world, anybody who was about to store it would run "git
> describe" or something to come up with some canonical reference format.
> And we could just bump the default minimum there. Personally, I almost
> exclusively cite commits as the output of:
>
>   git log -1 --pretty='tformat:%h (%s, %ad)' --date=short

Interesting, I have a pretty format alias that looks almost like this,
except that I carry a patch locally allowing me to say %as for short
date format :)

What I sometimes wished for is a pretty format specifier for 'git
describe --contains', which would make it convenient to cite commits
like this: v0.99~954 (Initial revision of "git", the information manager
from hell, 2005-04-07).  It's better than the abbreviated object name,
because it will stay unique, assuming that the chosen tag is never
deleted, and it carries extra information for humans (the first release
containing the referenced commit), while the abbreviated object name is
completely meaningless.

The obvious drawback that makes it a non-solution for the problem at
hand is that this format can only refer to commits that are reachable
from a tag and can't be used for commits that are descendants of the
most recent tag, e.g. when fixing a bug introduced after the last
release.  Oh, and the user has to fetch the tag first to be able to
make sense of such a reference.

> and I'd be fine to stick "--abbrev=12" in there for future-proofing. But
> I don't know what the kernel or other projects do.
>
> I'd also be curious to know if the patch I sent in [1] to more
> aggressively prefer commits would make this less of an issue, and people
> wouldn't care as much about using longer hashes in the first place. So
> one option is to merge that (and possibly even make it the default) and
> see if people still care in 6 months.
>
> -Peff
>
> [1] http://public-inbox.org/git/20160927123801.3bpdg3hap3kzzfmv@sigill.intra.peff.net/

* Re: Changing the default for "core.abbrev"?
  2016-09-26  3:46 ` Junio C Hamano
  2016-09-26  4:34   ` Jeff King
  2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
@ 2016-09-29 13:01   ` Kyle J. McKay
  2 siblings, 0 replies; 111+ messages in thread
From: Kyle J. McKay @ 2016-09-29 13:01 UTC (permalink / raw)
  To: Junio C Hamano, Linus Torvalds; +Cc: Git Mailing List

On Sep 25, 2016, at 18:39, Linus Torvalds wrote:

> The kernel, these days, is at roughly 5 million objects, and while the
> seven hex digits are still often enough for uniqueness (and git will
> always add digits *until* it is unique), it's long been at the point
> where I tell people to do
>
>    git config --global core.abbrev 12
>
> because even though git will extend the seven hex digits until the
> object name is unique, that only reflects the *current* situation in
> the repository. With 5 million objects and a very healthy growth rate,
> a 7-8 hex digit number that is unique today is not necessarily unique
> a month or two from now, and then it gets annoying when a commit
> message has a short git ID that is no longer unique when you go back
> and try to figure out what went wrong in that commit.

On Sep 25, 2016, at 20:46, Junio C Hamano wrote:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> I can just keep reminding kernel maintainers and developers to update
>> their git config, but maybe it would be a good idea to just admit that
>> the defaults picked in 2005 weren't necessarily the best ones
>> possible, and those could be bumped up a bit?
>
> I am not quite sure how good any new default would be, though.  Just
> like any timeout is not long enough for somebody, growing projects
> will eventually hit whatever abbreviation length they start with.

This made me curious what the situation is really like.  So I crunched  
some data.

Using a recent clone of $korg/torvalds/linux:

$ git rev-parse --verify d597639e203
error: short SHA1 d597639e203 is ambiguous.
fatal: Needed a single revision

So the kernel already has 11-character "short" SHA1s that are  
ambiguous.  Is a core.abbrev setting of 12 really good enough?
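
A rough way to see why 11-digit collisions already appear: if object names behave like uniformly random hex strings (an assumption, not something measured here), the expected number of object pairs sharing an n-digit prefix among N objects is about the birthday bound below. Using Linus's figure of roughly 5 million objects:

```latex
\[
E(n) \approx \frac{N^{2}}{2 \cdot 16^{n}}, \qquad N \approx 5 \times 10^{6}
\]
\[
E(7) \approx \frac{2.5 \times 10^{13}}{5.37 \times 10^{8}} \approx 4.7 \times 10^{4}, \qquad
E(11) \approx \frac{2.5 \times 10^{13}}{3.52 \times 10^{13}} \approx 0.71, \qquad
E(12) = \frac{E(11)}{16} \approx 0.044
\]
```

Under that assumption a couple of 11-digit collisions is unsurprising for a repository this size, each extra digit divides the expected count by 16, and the count grows with the square of the number of objects. The 7-digit estimate is of the same order of magnitude as the ambiguous-prefix counts in the stats that follow.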

Here are the stats on the kernel's repository:

Ambiguous length 11 (but not at length 12) info:
   prefixes:       2
                   0 (with 1 or more commit disambiguations)

Ambiguous length 10 (but not at length 11) info:
   prefixes:      12
                   3 (with 1 or more commit disambiguations)
                   0 (with 2 or more commit disambiguations)

Ambiguous length 9 (but not at length 10) info:
   prefixes:     186
                  43 (with 1 or more commit disambiguations)
                   1 (with 2 or more commit disambiguations)
                   0 (with 3 or more disambiguations)

Ambiguous length 8 (but not at length 9) info:
   prefixes:    2723
                 651 (with 1 or more commit disambiguations)
                  40 (with 2 or more commit disambiguations)
                   1 (with 3 or more disambiguations)
   maxambig:       3 (there is 1 of them)

Ambiguous length 7 (but not at length 8) info:
   prefixes:   41864
                9842 (with 1 or more commit disambiguations)
                 680 (with 2 or more commit disambiguations)
                 299 (with 3 or more disambiguations)
   maxambig:       3 (there are 299 of them)

The "maxambig" value is the maximum number of disambiguations for any  
single prefix at that prefix length.  So for prefixes of length 7  
there are 299 that disambiguate into 3 objects.

Just out of curiosity, generating stats on the Git repository gives:

Ambiguous length 8 (but not at length 9) info:
   prefixes:       7
                   3 (with 1 or more commit disambiguations)
                   2 (with 2 or more commit disambiguations)
                   0 (with 3 or more disambiguations)

Ambiguous length 7 (but not at length 8) info:
   prefixes:      87
                  36 (with 1 or more commit disambiguations)
                   3 (with 2 or more commit disambiguations)
                   0 (with 3 or more disambiguations)

Running the stats on $github/gitster/git produces some ambiguous  
length 9 prefixes (one of which contains a commit disambiguation).

--Kyle

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-26 16:36           ` Linus Torvalds
  2016-09-27  5:42             ` Jacob Keller
  2016-09-27 12:38             ` Jeff King
@ 2016-09-29 13:01             ` Kyle J. McKay
  2016-09-29 13:24               ` Jeff King
  2 siblings, 1 reply; 111+ messages in thread
From: Kyle J. McKay @ 2016-09-29 13:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Junio C Hamano, Git Mailing List

On Sep 26, 2016, at 09:36, Linus Torvalds wrote:

> On Mon, Sep 26, 2016 at 5:00 AM, Jeff King <peff@peff.net> wrote:
>>
>> This patch teaches get_short_sha1() to list the sha1s of the
>> objects it found, along with a few bits of information that
>> may help the user decide which one they meant.
>
> This looks very good to me, but I wonder if it couldn't be even more  
> aggressive.
>
> In particular, the only hashes that most people ever use in short form
> are commit hashes. Those are the ones you'd use in normal human
> interactions to point to something happening.
>
> So when the disambiguation notices that there is ambiguity, but there
> is only _one_ commit, maybe it should just have an aggressive mode
> that says "use that as if it wasn't ambiguous".

If you have this:

faa23ec9b437812ce2fc9a5b3d59418d672debc1 refs/heads/ambig
7f40afe646fa3f8a0f361b6f567d8f7d7a184c10 refs/tags/ambig

and you do this:

$ git rev-parse ambig
warning: refname 'ambig' is ambiguous.
7f40afe646fa3f8a0f361b6f567d8f7d7a184c10

Git automatically prefers the tag over the branch, but it does spit  
out a warning.

> And then have an explicit command (or flag) to do disambiguation for
> when you explicitly want it.

I think you don't even need that.  Git already does disambiguation for  
ref names, picks one and spits out a warning.

Why not do the same for short hash names when it makes sense?

> Rationale: you'd never care about short forms for tags. You'd just use
> the tag name. And while blob ID's certainly show up in short form in
> diff output (in the "index" line), very few people will use them. And
> tree hashes are basically never seen outside of any plumbing commands
> and then seldom in shortened form.
>
> So I think it would make sense to default to a mode that just picks
> the commit hash if there is only one such hash. Sure, some command
> might want a "treeish", but a commit is still more likely than a tree
> or a tag.
>
> But regardless, this series looks like a good thing.

I like it too.

But perhaps it makes sense to actually pick one if there's only one  
disambiguation of the type you're looking for.

For example given:

235234a blob
2352347 tag
235234f tree
2352340 commit

If you are doing "git cat-file blob 235234" it should pick the blob  
and spit out a warning (and similarly for other cat-file types).  But  
"git cat-file -p 235234" would give the fatal error with the  
disambiguation hints because it wants type "any".

If you are doing "git show 235234" it should pick the tag (if it peels  
to a committish) because Git has already set a precedent of preferring  
tags over commits when it disambiguates ref names and otherwise pick  
the commit.

Let's consider this approach using the stats for the Linux kernel:

> Ambiguous prefix length 7 counts:
>   prefixes:   44733
>    objects:   89766
>
> Ambiguous length 11 (but not at length 12) info:
>   prefixes:       2
>                   0 (with 1 or more commit disambiguations)
>
> Ambiguous length 10 (but not at length 11) info:
>   prefixes:      12
>                   3 (with 1 or more commit disambiguations)
>                   0 (with 2 or more commit disambiguations)
>
> Ambiguous length 9 (but not at length 10) info:
>   prefixes:     186
>                  43 (with 1 or more commit disambiguations)
>                   1 (with 2 or more commit disambiguations)
>
> Ambiguous length 8 (but not at length 9) info:
>   prefixes:    2723
>                 651 (with 1 or more commit disambiguations)
>                  40 (with 2 or more commit disambiguations)
>
> Ambiguous length 7 (but not at length 8) info:
>   prefixes:   41864
>                9842 (with 1 or more commit disambiguations)
>                 680 (with 2 or more commit disambiguations)

Of the 44733 ambiguous length 7 prefixes, only about 10539 of them  
disambiguate into one or more commit objects.

But if we apply the "spit a warning and prefer a commit object if  
there's only one and you're looking for a committish" rule, that drops  
the number from 10539 to about 721.  In other words, only about 7% of  
the previously ambiguous short commit SHA1 prefixes would continue to  
be ambiguous at length 7.  In fact it almost makes a prefix length of
9 good enough: there's just the one prefix at length 9 that disambiguates
into more than one commit (45f014c52).
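
These figures can be checked against the length 7 through 11 rows quoted above. Summing the "1 or more commit" rows gives the commit-bearing prefixes, and summing the "2 or more commit" rows gives those that would stay ambiguous:

```latex
\[
\begin{aligned}
\text{prefixes with} \ge 1 \text{ commit} &= 9842 + 651 + 43 + 3 + 0 = 10539\\
\text{prefixes with} \ge 2 \text{ commits} &= 680 + 40 + 1 + 0 = 721\\
721 / 10539 &\approx 0.068 \approx 7\%
\end{aligned}
\]
```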

--Kyle

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-29 11:46           ` Kyle J. McKay
@ 2016-09-29 13:03             ` Jeff King
  2016-09-29 17:19               ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-29 13:03 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Junio C Hamano, Linus Torvalds, Git Mailing List

On Thu, Sep 29, 2016 at 04:46:19AM -0700, Kyle J. McKay wrote:

> This hint: information is excellent.  There needs to be a way to show it on
> demand.
> 
> $ git rev-parse --disambiguate=b2e1
> b2e11962c5e6a9c81aa712c751c83a743fd4f384
> b2e11d1bb40c5f81a2f4e37b9f9a60ec7474eeab
> b2e163272c01aca4aee4684f5c683ba341c1953d
> b2e18954c03ff502053cb74d142faab7d2a8dacb
> b2e1895ca92ec2037349d88b945ba64ebf16d62d
> 
> Not nearly so helpful, but the operation of --disambiguate cannot be changed
> without breaking current scripts.
> 
> Can your excellent "hint:" output above be attached to the --disambiguate
> option somehow, please.  Something like this perhaps:
> 
> $ git rev-parse --disambiguate-list=b2e1
> b2e1196 tag v2.8.0-rc1
> b2e11d1 tree
> b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
> b2e1759 blob
> b2e18954 blob
> b2e1895c blob

I think the "right" way to do this is to pipe the list of sha1s into
another git command which can format them however you want.
Unfortunately, there isn't a single command that does a great job:

  - "cat-file --batch-check" can show you the sha1 and type, but it
    won't abbreviate sha1s, and it won't show you commit/tag information

  - "log --stdin --no-walk" will format the commit however you like, but
    skips the trees and blobs entirely, and the tag can only be seen via
    "%d"

  - "for-each-ref" has flexible formatting, too, but wants to format
    refs, not objects (and doesn't read from stdin).

IMHO that is a sign that our formatting tools aren't as good as they
could be (I think the right tool is cat-file, but it should be able to
do all of the formatting that the other commands can do).

Of course if you really just want human-readable output, then:

  $ git cat-file -e b2e1
  error: short SHA1 b2e1 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
  hint:   b2e1759 blob
  hint:   b2e18954 blob
  hint:   b2e1895c blob
  fatal: Not a valid object name b2e1

is pretty easy.

That being said, I don't mind if somebody wanted to do a rev-parse
option on top of my series. The formatting code is already split into
its own function.

-Peff

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-29 13:01             ` Kyle J. McKay
@ 2016-09-29 13:24               ` Jeff King
  2016-09-29 14:36                 ` Kyle J. McKay
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-29 13:24 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Thu, Sep 29, 2016 at 06:01:51AM -0700, Kyle J. McKay wrote:

> But perhaps it makes sense to actually pick one if there's only one
> disambiguation of the type you're looking for.
> 
> For example given:
> 
> 235234a blob
> 2352347 tag
> 235234f tree
> 2352340 commit
> 
> If you are doing "git cat-file blob 235234" it should pick the blob and spit
> out a warning (and similarly for other cat-file types).  But "git cat-file
> -p 235234" would give the fatal error with the disambiguation hints because
> it wants type "any".

That code is already there; it's just a matter of whether git has enough
information to know the context. E.g. (in git.git):

  $ git show b2e11
  error: short SHA1 b2e11 is ambiguous
  hint: The candidates are:
  hint:   b2e1196 tag v2.8.0-rc1
  hint:   b2e11d1 tree
  ...

  $ git log b2e11
  commit ab5d01a29eb7380ceab070f0807c2939849c44bc (tag: v2.8.0-rc1)
  ...

The "show" command can show anything, but "log" really wants
committishes, so it's able to disambiguate. It looks like cat-file never
learned to feed its context, but it's probably something like this:

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 94e67eb..ecbb959 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -56,12 +56,22 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct object_info oi = {NULL};
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = LOOKUP_REPLACE_OBJECT;
+	unsigned sha1_flags = 0;
 	const char *path = force_path;
 
 	if (unknown_type)
 		flags |= LOOKUP_UNKNOWN_OBJECT;
 
-	if (get_sha1_with_context(obj_name, 0, oid.hash, &obj_context))
+	if (exp_type) {
+		if (!strcmp(exp_type, "commit"))
+			sha1_flags |= GET_SHA1_COMMITTISH;
+		else if (!strcmp(exp_type, "tree"))
+			sha1_flags |= GET_SHA1_TREEISH;
+		else if (!strcmp(exp_type, "blob"))
+			sha1_flags |= GET_SHA1_BLOB;
+	}
+
+	if (get_sha1_with_context(obj_name, sha1_flags, oid.hash, &obj_context))
 		die("Not a valid object name %s", obj_name);
 
 	if (!path)

> If you are doing "git show 235234" it should pick the tag (if it peels to a
> committish) because Git has already set a precedent of preferring tags over
> commits when it disambiguates ref names and otherwise pick the commit.

I'm not convinced that picking the tag is actually helpful in this case;
I agree with Linus that feeding something to "git show" almost always
wants to choose the commit.

I also don't think tag ambiguity in short sha1s is all that interesting.
There are a tiny number of tag objects. Most of your collisions are
going to be with trees or blobs, which should generally outnumber
commits by a factor of 5-10, though it depends on your workflow (git.git
does not have a deep tree, so it's only a factor of 4).

And if you just want to choose a committish over trees and blobs, well,
then, I invite you to check out the core.disambiguate patch I sent
elsewhere in the thread. :)

-Peff

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-29 13:24               ` Jeff King
@ 2016-09-29 14:36                 ` Kyle J. McKay
  2016-09-29 14:55                   ` Jeff King
  0 siblings, 1 reply; 111+ messages in thread
From: Kyle J. McKay @ 2016-09-29 14:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Sep 29, 2016, at 06:24, Jeff King wrote:

>> If you are doing "git show 235234" it should pick the tag (if it
>> peels to a committish) because Git has already set a precedent of
>> preferring tags over commits when it disambiguates ref names and
>> otherwise pick the commit.
>
> I'm not convinced that picking the tag is actually helpful in this case;
> I agree with Linus that feeding something to "git show" almost always
> wants to choose the commit.

Since "git show" peels tags you end up seeing the commit it refers to  
(assuming it's a committish tag).

> I also don't think tag ambiguity in short sha1s is all that interesting.

The Linux repository has this:

    901069c:
       901069c71415a76d commit iwlagn: change Copyright to 2011
       901069c5c5b15532 tag    (v2.6.38-rc4) Linux 2.6.38-rc4

Since that tag peels to a commit, it seems like it would be incorrect  
to pick the commit over the tag when you're looking for a committish.

Either 901069c should resolve to the tag (which gets peeled to the  
commit) or it should error out with the hint messages.

The Git repository has this:

    c512b03:
       c512b035556eff4d commit Merge branch 'rc/maint-reflog-msg-for-forced
       c512b0344196931a tag    (v0.99.9a) GIT 0.99.9a

So perhaps it's a little bit more interesting than it first appears.  :)

--Kyle

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-29 14:36                 ` Kyle J. McKay
@ 2016-09-29 14:55                   ` Jeff King
  0 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-29 14:55 UTC (permalink / raw)
  To: Kyle J. McKay; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Thu, Sep 29, 2016 at 07:36:27AM -0700, Kyle J. McKay wrote:

> On Sep 29, 2016, at 06:24, Jeff King wrote:
> 
> > > If you are doing "git show 235234" it should pick the tag (if it
> > > peels to a committish) because Git has already set a precedent of
> > > preferring tags over commits when it disambiguates ref names and
> > > otherwise pick the commit.
> > 
> > I'm not convinced that picking the tag is actually helpful in this case;
> > I agree with Linus that feeding something to "git show" almost always
> > wants to choose the commit.
> 
> Since "git show" peels tags you end up seeing the commit it refers to
> (assuming it's a committish tag).

Yes, but it's almost certainly _not_ the commit you meant. From your
example:

>    c512b03:
>       c512b035556eff4d commit Merge branch 'rc/maint-reflog-msg-for-forced
>       c512b0344196931a tag    (v0.99.9a) GIT 0.99.9a

If I'm looking for the commit c512b03, then it almost certainly isn't
v0.99.9a. That tag's commit is e634aec. Or another way of thinking about
it: you want to guess what the _writer_ of the note meant. Why would
somebody write "c512b03" when they could have written "v0.99.9a"? And
they certainly would not have written it if they meant "e634aec". :)

> > I also don't think tag ambiguity in short sha1s is all that interesting.
> 
> The Linux repository has this:
> 
>    901069c:
>       901069c71415a76d commit iwlagn: change Copyright to 2011
>       901069c5c5b15532 tag    (v2.6.38-rc4) Linux 2.6.38-rc4

Sure, I'm not surprised there's a collision. But I'd expect those to be
a tiny fraction of collisions. Here's the breakdown of object types in
my clone of linux.git:

  $ git cat-file --batch-all-objects --batch-check='%(objecttype)' |
    sort | uniq -c
  1421198 blob
   618073 commit
      479 tag
  2877913 tree

That's a hundredth of a percent tag objects.  The chance that you have
_a_ 7-hex collision with a tag is relatively high. But the chance that
any given collision involves a tag is rather small.
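
Both claims can be put into numbers using the counts above, again assuming object names behave like uniformly random hex strings (an assumption for illustration only):

```latex
\[
N = 1421198 + 618073 + 479 + 2877913 = 4917663, \qquad \frac{479}{N} \approx 9.7 \times 10^{-5} \approx 0.01\%
\]
\[
\text{expected 7-digit collisions involving a tag} \approx 479 \cdot \frac{N}{16^{7}} = 479 \cdot \frac{4917663}{268435456} \approx 8.8
\]
\[
\text{expected 7-digit colliding pairs overall} \approx \frac{N^{2}}{2 \cdot 16^{7}} \approx 4.5 \times 10^{4}
\]
```

So roughly nine tag collisions are expected at seven digits, which makes at least one very likely, yet they would be only about 8.8 / 45000, or roughly 0.02%, of all collisions.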

-Peff

* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-29 13:03             ` Jeff King
@ 2016-09-29 17:19               ` Junio C Hamano
  2016-09-30  5:51                 ` Jacob Keller
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 17:19 UTC (permalink / raw)
  To: Jeff King; +Cc: Kyle J. McKay, Linus Torvalds, Git Mailing List

Jeff King <peff@peff.net> writes:

>> $ git rev-parse --disambiguate-list=b2e1
>> b2e1196 tag v2.8.0-rc1
>> b2e11d1 tree
>> b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
>> b2e1759 blob
>> b2e18954 blob
>> b2e1895c blob
>
> I think the "right" way to do this is to pipe the list of sha1s into
> another git command which can format them however you want.
> Unfortunately, there isn't a single command that does a great job:
>
>   - "cat-file --batch-check" can show you the sha1 and type, but it
>     won't abbreviate sha1s, and it won't show you commit/tag information
>
>   - "log --stdin --no-walk" will format the commit however you like, but
>     skips the trees and blobs entirely, and the tag can only be seen via
>     "%d"
>
>   - "for-each-ref" has flexible formatting, too, but wants to format
>     refs, not objects (and doesn't read from stdin).

    - "name-rev" is used to give "describe --contains", and can read
      from its standard input, but has no format customization.
      Another downside of it is that it only wants to see
      committishes.

> IMHO that is a sign that our formatting tools aren't as good as they
> could be (I think the right tool is cat-file, but it should be able to
> do all of the formatting that the other commands can do).
>
> Of course if you really just want human-readable output, then:
>
>   $ git cat-file -e b2e1
>   error: short SHA1 b2e1 is ambiguous
>   hint: The candidates are:
>   hint:   b2e1196 tag v2.8.0-rc1
>   hint:   b2e11d1 tree
>   hint:   b2e1632 commit 2007-11-14 - Merge branch 'bs/maint-commit-options'
>   hint:   b2e1759 blob
>   hint:   b2e18954 blob
>   hint:   b2e1895c blob
>   fatal: Not a valid object name b2e1
>
> is pretty easy.

Yes.  I think adding this to rev-parse that is meant for machines is
probably a mistake, as this "hint" machinery's output will become
even more human friendly over time as we gain experience.

 - If the hypothetical "--disambiguate-list" option wants to produce
   machine parseable output for scripts, it would mean its output
   (and whatever the reading script can do based on its output for
   humans) will become less useful for humans over time.

 - If the hypothetical "--disambiguate-list" option only wants to
   replicate the human readable output that is designed to be
   improved over time and expects its output _not_ to be interpreted
   by scripts but merely be relayed, then why aren't these scripts
   just invoking the commands that already give the "hint:" output
   and showing that directly to humans in the first place?

> That being said, I don't mind if somebody wanted to do a rev-parse
> option on top of my series. The formatting code is already split into
> its own function.

So let's not go there.

* Re: [PATCH 1/4] config: allow customizing /etc/gitconfig location
  2016-09-29  9:53     ` Jakub Narębski
@ 2016-09-29 17:20       ` Junio C Hamano
  2016-09-29 17:45         ` Matthieu Moy
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 17:20 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: git, Jeff King, Linus Torvalds

Jakub Narębski <jnareb@gmail.com> writes:

> On 29.09.2016 at 01:30, Junio C Hamano wrote:
>> With a new environment variable GIT_ETC_GITCONFIG, the users can
>> specify a file that is used instead of /etc/gitconfig to read (and
>> write) the system-wide configuration.
>
> Why is it named GIT_ETC_GITCONFIG (which is a Unix-ism), and not
> GIT_CONFIG_SYSTEM / GIT_CONFIG_SYSTEM_PATH, which would be
> OS-neutral?

Isn't "environment variable" something that came from POSIX world?

* Re: [PATCH 1/4] config: allow customizing /etc/gitconfig location
  2016-09-29 17:20       ` Junio C Hamano
@ 2016-09-29 17:45         ` Matthieu Moy
  0 siblings, 0 replies; 111+ messages in thread
From: Matthieu Moy @ 2016-09-29 17:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jakub Narębski, git, Jeff King, Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> Jakub Narębski <jnareb@gmail.com> writes:
>
>> On 29.09.2016 at 01:30, Junio C Hamano wrote:
>>> With a new environment variable GIT_ETC_GITCONFIG, the users can
>>> specify a file that is used instead of /etc/gitconfig to read (and
>>> write) the system-wide configuration.
>>
>> Why is it named GIT_ETC_GITCONFIG (which is a Unix-ism), and not
>> GIT_CONFIG_SYSTEM / GIT_CONFIG_SYSTEM_PATH, which would be
>> OS-neutral?
>
> Isn't "environment variable" something that came from POSIX world?

I don't know who invented the concept, but I think environment variables
have been there in the Windows world for as long as it has existed (they
already existed in MS-DOS).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29  5:58     ` Johannes Sixt
@ 2016-09-29 18:05       ` Junio C Hamano
  2016-09-29 18:37         ` Linus Torvalds
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 18:05 UTC (permalink / raw)
  To: Johannes Sixt; +Cc: git, peff, torvalds

Johannes Sixt <j6t@kdbg.org> writes:

> On 29.09.2016 at 01:30, Junio C Hamano wrote:
>> As Peff said, responding in a thread started by Linus's suggestion
>> to raise the default abbreviation to 12 hexdigits:
>
> This is waayy too large for a new default. The vast majority of
> repositories is smallish. For those, the long sequences of hex digits
> are an uglification that is almost unbearable.
>
> I know that kernel developers are important, but their importance has
> long been outnumbered by the anonymous and silent masses of users.
>
> Personally, I use 8 digits just because it is a "rounder" number than
> 7, but in all of my repositories 7 would still work just as well.

Yes, "git log --oneline" looks somewhat different and strange for
me, too ;-)

I am sure I'll get used to it if I keep using it, but I suspect that
I'd be irritated as I find myself typing 'q' more and more often to
"less -S" that is automatically invoked when I do "git log --oneline
master.." to see what commits are on my current topic branch.

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29  9:01     ` Jeff King
@ 2016-09-29 18:13       ` Junio C Hamano
  2016-09-29 18:26         ` Jeff King
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 18:13 UTC (permalink / raw)
  To: Jeff King; +Cc: git, torvalds

Jeff King <peff@peff.net> writes:

> I think anytime you would use GIT_CONFIG_NOSYSTEM over --local, it is an
> indication that the test is trying to check how multiple sources
> interact. And the right thing to do for them is to set GIT_ETC_GITCONFIG
> to some known quantity. We just couldn't do that before, so we skipped
> it.  IOW, something like the patch below (on top of yours).

OK, that way we can make sure that "multiple sources" operations do
look at the system-wide stuff.

> Note that the
> commands that are doing a "--get" and not a "--list" don't actually seem
> to need either (because they are getting the values out of the local
> file anyway), so we could drop the setting of GIT_ETC_GITCONFIG from
> them entirely.

"either" meaning "we do not need to add --local and we do not need
GIT_CONFIG_NOSYSTEM"?

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 18:13       ` Junio C Hamano
@ 2016-09-29 18:26         ` Jeff King
  2016-09-29 18:57           ` Junio C Hamano
  2016-09-29 19:06           ` Junio C Hamano
  0 siblings, 2 replies; 111+ messages in thread
From: Jeff King @ 2016-09-29 18:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds

On Thu, Sep 29, 2016 at 11:13:45AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > I think anytime you would use GIT_CONFIG_NOSYSTEM over --local, it is an
> > indication that the test is trying to check how multiple sources
> > interact. And the right thing to do for them is to set GIT_ETC_GITCONFIG
> > to some known quantity. We just couldn't do that before, so we skipped
> > it.  IOW, something like the patch below (on top of yours).
> 
> OK, that way we can make sure that "multiple sources" operations do
> look at the system-wide stuff.

Exactly.

> > Note that the
> > commands that are doing a "--get" and not a "--list" don't actually seem
> > to need either (because they are getting the values out of the local
> > file anyway), so we could drop the setting of GIT_ETC_GITCONFIG from
> > them entirely.
> 
> "either" meaning "we do not need to add --local and we do not need
> GIT_CONFIG_NOSYSTEM"?

Yes. I didn't test it with your core.abbrev patch 4/4, but I _didn't_
have to touch their expected output after pointing them at a non-empty
etc-gitconfig file in the trash directory. Which implies to me they
don't care either way (which makes sense; they are asking for a specific
key which is supposed to be found in one of the other files).

-Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 18:05       ` Junio C Hamano
@ 2016-09-29 18:37         ` Linus Torvalds
  2016-09-29 18:55           ` Linus Torvalds
  0 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-29 18:37 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List, Jeff King

On Thu, Sep 29, 2016 at 11:05 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Yes, "git log --oneline" looks somewhat different and strange for
> me, too ;-)

I'm playing with an early patch to make the default more dynamic.
Let's see how well it works in practice, but it looks fairly
promising. Let me test a bit more and send out an RFC patch..

              Linus

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 18:37         ` Linus Torvalds
@ 2016-09-29 18:55           ` Linus Torvalds
  2016-09-29 19:06             ` Linus Torvalds
  2016-09-29 19:16             ` Jeff King
  0 siblings, 2 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-29 18:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List, Jeff King

[-- Attachment #1: Type: text/plain, Size: 2343 bytes --]

On Thu, Sep 29, 2016 at 11:37 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I'm playing with an early patch to make the default more dynamic.
> Let's see how well it works in practice, but it looks fairly
> promising. Let me test a bit more and send out an RFC patch..

Ok, this is *very* rough, and it doesn't actually pass all the tests,
and I didn't even try to look at why. But it passes the trivial
smell-test, and in particular it actually makes mathematical sense...

I think the patch can speak for itself, but the basic core is this
section in get_short_sha1():

  +       if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
  +               unsigned int expect_collision = 1 << (len * 2);
  +               if (ds.nrobjects > expect_collision)
  +                       return SHORT_NAME_AMBIGUOUS;
  +       }

basically, what it says is that we will consider a sha1 ambiguous even
if it was *technically* unique (that's the '!status' part of the test)
if:

 - the length was 15 or less

*and*

 - the number of objects we have is larger than the expected point
where statistically we should start to expect to get one collision.

That "expect_collision" math is actually very simple: each hex
character adds four bits of range, but since we expect collisions at
the square root of the maximum number of objects, we shift by just two
bits per hex digit instead.
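The threshold test described above can be pulled out into a standalone sketch. The helper names are hypothetical (the patch inlines the check in get_short_sha1()), and uint64_t is used instead of unsigned int only for headroom; for len below 16 the patch's `1 << (len * 2)` already fits.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Object count at which a prefix of len hex digits should expect its first
 * collision: each digit adds 4 bits of range, and collisions show up around
 * the square root of the range, so only 2 bits per digit count here.
 * Same value as "1 << (len * 2)" in the patch.
 */
static uint64_t expect_collision(int len)
{
	assert(len >= 0 && len < 16);
	return (uint64_t)1 << (len * 2);
}

/*
 * Hypothetical helper: treat a technically unique abbreviation as still
 * too short when the repository already holds more objects than the
 * collision threshold for that length.
 */
static int too_short_for_growth(int len, uint64_t nrobjects)
{
	return len < 16 && nrobjects > expect_collision(len);
}
```

For example, too_short_for_growth(7, 20000) is true even when the 7-digit prefix happens to be unique right now, because 20000 exceeds 2^14.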

The rest of the patch is a trivial change to just initialize the
default short size to -1, and consider that to mean "enable the
automatic size checking with a minimum of 7". And the trivial code to
estimate the number of objects (which ignores duplicates between packs
etc _entirely_).
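The loose-object part of that estimate (the `ds->nrobjects += 256` hunk in the patch) extrapolates from the one fan-out directory already being scanned. A standalone sketch of the same idea, assuming POSIX opendir()/readdir(); the function name is hypothetical, and like the patch it only checks that a name is 38 characters long, not that it is hex.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/*
 * Estimate the loose-object count from a single fan-out directory such as
 * ".git/objects/ab": count entries whose names are 38 characters long (the
 * rest of a 40-hex-digit name) and assume each of the 256 fan-out
 * directories holds about as many. An unreadable directory contributes 0.
 */
static unsigned long estimate_loose_objects(const char *fanout_dir)
{
	unsigned long found = 0;
	struct dirent *de;
	DIR *dir;

	assert(fanout_dir != NULL);
	dir = opendir(fanout_dir);
	if (!dir)
		return 0;
	while ((de = readdir(dir)) != NULL) {
		if (strlen(de->d_name) == 38)
			found++;
	}
	closedir(dir);
	return found * 256;
}
```

Adding 256 per matching entry, as the patch does, and multiplying the final count by 256, as here, give the same number.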

For the kernel, just the *math* right now actually gives 12
characters. For current git it actually seems to say that 8 is the
correct number. For small projects, you'll still see 7.

ANYWAY. This patch is on top of Jeff's patches in 'pu' (I think those
are great regardless of this patch!), and as mentioned, it fails some
tests. I suspect that the failures might be due to default_abbrev
being -1, and some other code finds that surprising now. But as
mentioned, I didn't really even look at it.

What do you think? It's actually a fairly simple patch and I really do
think it makes sense and it seems to just DTRT automatically.

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 2976 bytes --]

 cache.h       |  1 +
 environment.c |  2 +-
 sha1_name.c   | 21 ++++++++++++++++++++-
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6e33f2f..d2da6d1 100644
--- a/cache.h
+++ b/cache.h
@@ -1207,6 +1207,7 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_AUTOMATIC	 0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/environment.c b/environment.c
index c1442df..fd6681e 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd..8791ff3 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
+	unsigned int nrobjects;
 	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,12 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 
 			if (strlen(de->d_name) != 38)
 				continue;
+
+			// We only look at the one subdirectory, and we assume
+			// each subdirectory is roughly similar, so each object
+			// we find probably has 255 other objects in the other
+			// fan-out directories
+			ds->nrobjects += 256;
 			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p,
 
 	open_pack_index(p);
 	num = p->num_objects;
+	ds->nrobjects += num;
 	last = num;
 	while (first < last) {
 		uint32_t mid = (first + last) / 2;
@@ -426,6 +434,12 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
 	}
 
+	if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
+		unsigned int expect_collision = 1 << (len * 2);
+		if (ds.nrobjects > expect_collision)
+			return SHORT_NAME_AMBIGUOUS;
+	}
+
 	return status;
 }
 
@@ -458,14 +472,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
+	int flags = GET_SHA1_QUIETLY;
 
+	if (len < 0) {
+		flags |= GET_SHA1_AUTOMATIC;
+		len = 7;
+	}
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
 	exists = has_sha1_file(sha1);
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
+		status = get_short_sha1(hex, len, sha1_ret, flags);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 18:26         ` Jeff King
@ 2016-09-29 18:57           ` Junio C Hamano
  2016-09-29 19:18             ` Jeff King
  2016-09-29 19:06           ` Junio C Hamano
  1 sibling, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 18:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, torvalds

Jeff King <peff@peff.net> writes:

>> "either" meaning "we do not need to add --local and we do not need
>> GIT_CONFIG_NOSYSTEM"?
>
> Yes. I didn't test it with your core.abbrev patch 4/4, but I _didn't_
> have to touch their expected output after pointing them at a non-empty
> etc-gitconfig file in the trash directory. Which implies to me they
> don't care either way (which makes sense; they are asking for a specific
> key which is supposed to be found in one of the other files).

There is a bit of a problem here, though.

 * If we make t1300 point at its own system-wide config, it will be
   in control of its contents, so "find this key" will find only what it
   wants to find (or we found a regression).

 * But then if it ever does something that depends on the default
   value of core.abbrev (or whatever we'd tweak in response to the
   next suggestion by Linus ;-), we cannot really allow it to do
   so.  We'd want t/gitconfig-for-test to be the single place that
   we can tweak these things, but we'll have to know t1300 uses its
   own and need to make the same change there, too.

So, I dunno.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 18:26         ` Jeff King
  2016-09-29 18:57           ` Junio C Hamano
@ 2016-09-29 19:06           ` Junio C Hamano
  2016-09-29 19:26             ` Jeff King
  1 sibling, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 19:06 UTC (permalink / raw)
  To: Jeff King; +Cc: git, torvalds

Jeff King <peff@peff.net> writes:

> On Thu, Sep 29, 2016 at 11:13:45AM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> > I think anytime you would use GIT_CONFIG_NOSYSTEM over --local, it is an
>> > indication that the test is trying to check how multiple sources
>> > interact. And the right thing to do for them is to set GIT_ETC_GITCONFIG
>> > to some known quantity. We just couldn't do that before, so we skipped
>> > it.  IOW, something like the patch below (on top of yours).
>> 
>> OK, that way we can make sure that "multiple sources" operations do
>> look at the system-wide stuff.
>
> Exactly.

I think it deserves a separate patch and the result is more
understandable.  I've queued this for now (on top of a revised 1/4
that uses GIT_CONFIG_SYSTEM_PATH instead).

-- >8 --
From: Jeff King <peff@peff.net>
Date: Thu, 29 Sep 2016 11:29:10 -0700
Subject: [PATCH] t1300: check also system-wide configuration file in
 --show-origin tests

Because we used to run our tests with GIT_CONFIG_NOSYSTEM, these did
not test that the system-wide configuration file is also read and
shown as one of the origins.  Create a custom/fake system-wide
configuration file and make sure it appears in the output, using the
newly introduced GIT_CONFIG_SYSTEM_PATH mechanism.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1300-repo-config.sh | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index 0543b62227bf..aa25577709c5 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1236,6 +1236,11 @@ test_expect_success 'set up --show-origin tests' '
 		[user]
 			relative = include
 	EOF
+	cat >"$HOME"/etc-gitconfig <<-\EOF &&
+		[user]
+			system = true
+			override = system
+	EOF
 	cat >"$HOME"/.gitconfig <<-EOF &&
 		[user]
 			global = true
@@ -1254,6 +1259,8 @@ test_expect_success 'set up --show-origin tests' '
 
 test_expect_success '--show-origin with --list' '
 	cat >expect <<-EOF &&
+		file:$HOME/etc-gitconfig	user.system=true
+		file:$HOME/etc-gitconfig	user.override=system
 		file:$HOME/.gitconfig	user.global=true
 		file:$HOME/.gitconfig	user.override=global
 		file:$HOME/.gitconfig	include.path=$INCLUDE_DIR/absolute.include
@@ -1264,13 +1271,16 @@ test_expect_success '--show-origin with --list' '
 		file:.git/../include/relative.include	user.relative=include
 		command line:	user.cmdline=true
 	EOF
+	GIT_CONFIG_SYSTEM_PATH=$HOME/etc-gitconfig \
 	git -c user.cmdline=true config --list --show-origin >output &&
 	test_cmp expect output
 '
 
 test_expect_success '--show-origin with --list --null' '
 	cat >expect <<-EOF &&
-		file:$HOME/.gitconfigQuser.global
+		file:$HOME/etc-gitconfigQuser.system
+		trueQfile:$HOME/etc-gitconfigQuser.override
+		systemQfile:$HOME/.gitconfigQuser.global
 		trueQfile:$HOME/.gitconfigQuser.override
 		globalQfile:$HOME/.gitconfigQinclude.path
 		$INCLUDE_DIR/absolute.includeQfile:$INCLUDE_DIR/absolute.includeQuser.absolute
@@ -1281,6 +1291,7 @@ test_expect_success '--show-origin with --list --null' '
 		includeQcommand line:Quser.cmdline
 		trueQ
 	EOF
+	GIT_CONFIG_SYSTEM_PATH=$HOME/etc-gitconfig \
 	git -c user.cmdline=true config --null --list --show-origin >output.raw &&
 	nul_to_q <output.raw >output &&
 	# The here-doc above adds a newline that the --null output would not
@@ -1304,6 +1315,7 @@ test_expect_success '--show-origin with --get-regexp' '
 		file:$HOME/.gitconfig	user.global true
 		file:.git/config	user.local true
 	EOF
+	GIT_CONFIG_SYSTEM_PATH=$HOME/etc-gitconfig \
 	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
 	test_cmp expect output
 '
@@ -1312,6 +1324,7 @@ test_expect_success '--show-origin getting a single key' '
 	cat >expect <<-\EOF &&
 		file:.git/config	local
 	EOF
+	GIT_CONFIG_SYSTEM_PATH=$HOME/etc-gitconfig \
 	git config --show-origin user.override >output &&
 	test_cmp expect output
 '
-- 
2.10.0-589-g5adf4e1


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 18:55           ` Linus Torvalds
@ 2016-09-29 19:06             ` Linus Torvalds
  2016-09-29 19:42               ` Junio C Hamano
  2016-09-30  0:56               ` Mike Hommey
  2016-09-29 19:16             ` Jeff King
  1 sibling, 2 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-29 19:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Sixt, Git Mailing List, Jeff King

On Thu, Sep 29, 2016 at 11:55 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> For the kernel, just the *math* right now actually gives 12
> characters. For current git it actually seems to say that 8 is the
> correct number. For small projects, you'll still see 7.

Sorry, the git number is 9, not 8. The reason is that git has roughly
212k objects, and 9 hex digits gets expected collisions at about 256k
objects.

So the logic means that we'll see 7 hex digits for projects with less
than 16k objects, 8 hex digits if there are less than 64k objects, and
9 hex digits for projects like git that currently have fewer than 256k
objects.

But git itself might not be *that* far from going to 10 hex digits
with my patch.

The kernel uses 12 hex digits because the collision math says that's
the right thing for a project with between 4M and 16M objects (with
the kernel being at 5M).
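The digit-count table above falls out of the patch's rule directly. A sketch, assuming only the object count matters (the real loop may still go longer when a prefix is genuinely ambiguous); auto_abbrev_len() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shortest length, never below 7, whose collision threshold 1 << (2 * len)
 * covers nrobjects. With GET_SHA1_AUTOMATIC, the loop in
 * find_unique_abbrev_r() cannot stop before this length. The patch only
 * applies the check below 16 digits, so the sketch stops there too.
 */
static int auto_abbrev_len(uint64_t nrobjects)
{
	int len = 7;

	while (len < 16 && nrobjects > ((uint64_t)1 << (2 * len)))
		len++;
	assert(len >= 7 && len <= 16);
	return len;
}
```

auto_abbrev_len(212000) returns 9 and auto_abbrev_len(5000000) returns 12, matching the git and kernel figures quoted above.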

So on the whole the patch really does seem to just do the right thing
automatically.

              Linus

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 18:55           ` Linus Torvalds
  2016-09-29 19:06             ` Linus Torvalds
@ 2016-09-29 19:16             ` Jeff King
  2016-09-29 19:40               ` Linus Torvalds
  1 sibling, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-29 19:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 11:55:46AM -0700, Linus Torvalds wrote:

> I think the patch can speak for itself, but the basic core is this
> section in get_short_sha1():
> 
>   +       if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
>   +               unsigned int expect_collision = 1 << (len * 2);
>   +               if (ds.nrobjects > expect_collision)
>   +                       return SHORT_NAME_AMBIGUOUS;
>   +       }

Hmm. So at length 7, we expect collisions at 2^14, which is 16384. That
seems really low. I mean, by the birthday paradox that's where we expect
a 50% chance of a collision. But that's a single collision. We
definitely don't expect them to be common at that size.

So I suspect this could be a bit looser. The real number we care about
is probably something like "there is probability 'p' of a collision when
we add a new object", but I'm not sure what that 'p' would be. Or
perhaps "we accept collisions in 'n' percent of objects". But again, I
don't know that 'n'.
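Both quantities mentioned here, the overall birthday figure and the per-new-object probability 'p', can be computed directly. This is a sketch that assumes object names behave like uniformly random hex strings, and the function names are hypothetical. Under that model, the chance of at least one collision among 2^14 objects at length 7 comes out near 39% (the 50% point sits nearer 1.18 times 2^14). The chance that a single new object collides with any existing one is only about n/N, roughly 6 in 100,000.

```c
#include <assert.h>
#include <stdint.h>

/* Size of the prefix space for len hex digits: 16^len = 2^(4 * len). */
static double prefix_space(int len)
{
	assert(len > 0 && len < 16);
	return (double)((uint64_t)1 << (4 * len));
}

/*
 * Birthday problem, multiplied out term by term: probability that n
 * uniformly random len-digit prefixes contain at least one colliding pair.
 * Linear in n, so meant for modest n.
 */
static double p_any_collision(uint64_t n, int len)
{
	double space = prefix_space(len);
	double p_none = 1.0;

	if ((double)n > space)
		return 1.0; /* pigeonhole: a collision is certain */
	for (uint64_t i = 1; i < n; i++)
		p_none *= 1.0 - (double)i / space;
	return 1.0 - p_none;
}

/*
 * Probability that one new object's len-digit prefix matches at least one
 * of n existing objects: 1 - (1 - 1/N)^n, which is close to n / N.
 */
static double p_new_object_collides(uint64_t n, int len)
{
	double space = prefix_space(len);
	double p_miss = 1.0;

	for (uint64_t i = 0; i < n; i++)
		p_miss *= 1.0 - 1.0 / space;
	return 1.0 - p_miss;
}
```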

I dunno. I suppose being overly conservative with this number leaves
room for growth. Repositories generally get bigger, not smaller. :)

> What do you think? It's actually a fairly simple patch and I really do
> think it makes sense and it seems to just DTRT automatically.

I like the general idea.

As far as the implementation, I was surprised to see it touch
get_short_sha1() at all. That's, after all, for lookups, and we would
never want to require more characters on the reading side.

I see you worked around it with a flag so that this behavior only kicks
in when called via find_unique_abbrev(). But if you look at the caller:

> @@ -458,14 +472,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
>  int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
>  {
>  	int status, exists;
> +	int flags = GET_SHA1_QUIETLY;
>  
> +	if (len < 0) {
> +		flags |= GET_SHA1_AUTOMATIC;
> +		len = 7;
> +	}
>  	sha1_to_hex_r(hex, sha1);
>  	if (len == 40 || !len)
>  		return 40;
>  	exists = has_sha1_file(sha1);
>  	while (len < 40) {
>  		unsigned char sha1_ret[20];
> -		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
> +		status = get_short_sha1(hex, len, sha1_ret, flags);
>  		if (exists
>  		    ? !status
>  		    : status == SHORT_NAME_NOT_FOUND) {

You can see that we're going to do more work than we would otherwise
need to. Because we start at 7, and ask get_short_sha1() "is this unique
enough?", and looping. But if we _know_ we won't accept any answer
shorter than some N based on the number of objects in the repository,
then we should start at that N.

IOW, something like:

  if (len < 0)
	len = ceil(log_base_2(repository_object_count()));

here, and then you don't have to touch get_short_sha1() at all.
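The pseudocode above is explicitly "something like"; taken literally, ceil(log2(count)) is a number of bits. Under the same square-root reasoning as the patch, each hex digit is worth 2 bits against collisions, so the digit count would be about half of that. Here is a floating-point-free sketch under that assumption. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of significant bits in x; 0 for x == 0. */
static int bit_length(uint64_t x)
{
	int bits = 0;

	while (x) {
		bits++;
		x >>= 1;
	}
	return bits;
}

/*
 * Starting abbreviation length computed straight from an object count.
 * The smallest b with nrobjects <= 2^b is bit_length(nrobjects - 1); round
 * b / 2 up to whole hex digits and never go below the traditional 7.
 */
static int initial_abbrev_len(uint64_t nrobjects)
{
	int len;

	if (nrobjects <= 1)
		return 7;
	len = (bit_length(nrobjects - 1) + 1) / 2;
	assert(len >= 1);
	return len < 7 ? 7 : len;
}
```

The uniqueness loop would then start at this length and only grow it for prefixes that are actually ambiguous, without get_short_sha1() needing to know about the policy.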

I suspect you pushed it down into get_short_sha1() because it kind-of
does the repository_object_count() step for "free" as it's looking at
the object anyway. But that step is really not very expensive. And I'd
even say you could just ignore loose objects entirely, and treat them
like a rounding error (the way that duplicate objects in packs are
treated).

That leaves you with just an O(# of packs) loop over a linked list. You
could even just keep a global object count up to date in
add_packed_git(), and then it's O(1).
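The O(1) variant could look roughly like the sketch below. struct pack_stub, register_pack() and approx_object_count are hypothetical stand-ins for git's real pack list and add_packed_git(). The sketch assumes a pack's object count is already known when the pack is registered; if the index is loaded lazily, the count would have to be added at that point instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pack list entry; the real struct
 * packed_git carries far more state. */
struct pack_stub {
	struct pack_stub *next;
	uint32_t num_objects;
};

static struct pack_stub *packs;
/* Running total; objects duplicated across packs are counted once per pack. */
static uint64_t approx_object_count;

/* Hypothetical analogue of add_packed_git(): link the pack, bump the total. */
static void register_pack(struct pack_stub *p)
{
	assert(p != NULL);
	p->next = packs;
	packs = p;
	approx_object_count += p->num_objects;
}

/* O(1): no walk over the pack list and no loose-object scan. */
static uint64_t repository_object_count(void)
{
	return approx_object_count;
}
```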

-Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 18:57           ` Junio C Hamano
@ 2016-09-29 19:18             ` Jeff King
  2016-09-29 19:57               ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-29 19:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds

On Thu, Sep 29, 2016 at 11:57:02AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> "either" meaning "we do not need to add --local and we do not need
> >> GIT_CONFIG_NOSYSTEM"?
> >
> > Yes. I didn't test it with your core.abbrev patch 4/4, but I _didn't_
> > have to touch their expected output after pointing them at a non-empty
> > etc-gitconfig file in the trash directory. Which implies to me they
> > don't care either way (which makes sense; they are asking for a specific
> > key which is supposed to be found in one of the other files).
> 
> There is a bit of a problem here, though.
> 
>  * If we make t1300 point at its own system-wide config, it will be
>    in control of its contents, so "find this key" will find only what it
>    wants to find (or we found a regression).
> 
>  * But then if it ever does something that depends on the default
>    value of core.abbrev (or whatever we'd tweak in response to the
>    next suggestion by Linus ;-), we cannot really allow it to do
>    so.  We'd want t/gitconfig-for-test to be the single place that
>    we can tweak these things, but we'll have to know t1300 uses its
>    own and need to make the same change there, too.

Right, but I think that's fine. Tests that care deeply about the
contents of etc-gitconfig are unlikely to care about core.abbrev. And in
the off chance that they do, then the worst case is...they get updated
to handle core.abbrev (either passing a command line option, or just
putting core.abbrev in their test file).

I just don't see it being a problem. Adding core.abbrev for the whole
test suite is just about not having a big flag day where we change all
the tests. Changing one or two tests (and again, I'd be surprised if we
even have to do that) doesn't seem like a big deal.

-Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 19:06           ` Junio C Hamano
@ 2016-09-29 19:26             ` Jeff King
  2016-09-29 21:03               ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2016-09-29 19:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds

On Thu, Sep 29, 2016 at 12:06:15PM -0700, Junio C Hamano wrote:

> I think it deserves a separate patch and the result is more
> understandable.  I've queued this for now (on top of a revised 1/4
> that uses GIT_CONFIG_SYSTEM_PATH instead).

Thanks, makes sense (and I like the new variable name better, by the
way).

> -- >8 --
> From: Jeff King <peff@peff.net>
> Date: Thu, 29 Sep 2016 11:29:10 -0700
> Subject: [PATCH] t1300: check also system-wide configuration file in
>  --show-origin tests
> 
> Because we used to run our tests with GIT_CONFIG_NOSYSTEM, these did
> not test that the system-wide configuration file is also read and
> shown as one of the origins.  Create a custom/fake system-wide
> configuration file and make sure it appears in the output, using the
> newly introduced GIT_CONFIG_SYSTEM_PATH mechanism.
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

Good description.

Signed-off-by: Jeff King <peff@peff.net>

of course.

> @@ -1304,6 +1315,7 @@ test_expect_success '--show-origin with --get-regexp' '
>  		file:$HOME/.gitconfig	user.global true
>  		file:.git/config	user.local true
>  	EOF
> +	GIT_CONFIG_SYSTEM_PATH=$HOME/etc-gitconfig \
>  	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
>  	test_cmp expect output
>  '

This one is trying to do a multi-file lookup, but we couldn't look in
the system config before. But to naturally extend it, it ought to look
like this on top:

diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
index d2476a8..4dd5ce3 100755
--- a/t/t1300-repo-config.sh
+++ b/t/t1300-repo-config.sh
@@ -1310,11 +1310,12 @@ test_expect_success '--show-origin with single file' '
 
 test_expect_success '--show-origin with --get-regexp' '
 	cat >expect <<-EOF &&
+		file:$HOME/etc-gitconfig	user.system true
 		file:$HOME/.gitconfig	user.global true
 		file:.git/config	user.local true
 	EOF
 	GIT_ETC_GITCONFIG=$HOME/etc-gitconfig \
-	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
+	git config --show-origin --get-regexp "user\.[g|l|s].*" >output &&
 	test_cmp expect output
 '
 
> @@ -1312,6 +1324,7 @@ test_expect_success '--show-origin getting a single key' '
>  	cat >expect <<-\EOF &&
>  		file:.git/config	local
>  	EOF
> +	GIT_CONFIG_SYSTEM_PATH=$HOME/etc-gitconfig \
>  	git config --show-origin user.override >output &&
>  	test_cmp expect output
>  '

And I was tempted to say this one should not need to care, but I guess
it is testing that we correctly read the override from the local config
over the global one. So likewise, it is good to check that we also
override the system config (it does not affect the "expect" output, but
that does not mean it is not enhancing the test).

-Peff

^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 19:16             ` Jeff King
@ 2016-09-29 19:40               ` Linus Torvalds
  2016-09-29 19:45                 ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-29 19:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 12:16 PM, Jeff King <peff@peff.net> wrote:
>
> Hmm. So at length 7, we expect collisions at 2^14, which is 16384. That
> seems really low. I mean, by the birthday paradox that's where we expect
> a 50% chance of a collision. But that's a single collision. We
> definitely don't expect them to be common at that size.
>
> So I suspect this could be a bit looser.

So I have to admit that I was surprised by how quickly it actually
decided that 7 isn't enough. In fact, the reason I initially said that
git used 8 digits was that I didn't count very closely, and just
verified that it was more than the default 7.

But quite frankly, I think the math is correct, and part of that is
that the logic is all about not just the current state, but the
"reasonably near future".

So it is indeed fairly aggressive, and the moment you have more
objects than the "we'd expect to probably see _one_ collision" it
grows the size. But looking at the kernel situation, that really is
what we'd want, because the whole problem with the existing code is
that it only takes the *current* situation into account. That's what
we want to get away from. We want git to pick a number that is sane
from a standpoint of "this project is still growing".

And git _already_ has commits that are ambiguous in 8 hex digits and
need 9. Yes, it's rare today, but the reason I'm telling kernel
developers to use 12 is because while a size-11 collision is very rare
today, it does actually happen, and we want to pick a value where it is
rare enough that even in the near future it's not going to be a big
deal.

Don't get me wrong: collisions aren't fatal. So it's not like we have
to absolutely avoid them, and I really like your patch series exactly
because it makes collisions even less of a deal (particularly since I
expect people will not upgrade immediately, so we'll continue to see
new 7-hex-digit short forms even in the kernel). So it's a
balance of making the hex string long enough that it's simply not a
big worry.

So I'm sure it *could* be looser, but I actually also really suspect
that git truly *should* use a 9-digit abbreviation rather than 8 (and
7 is definitely starting to be borderline, I think).

> As far as the implementation, I was surprised to see it touch
> get_short_sha1() at all. That's, after all, for lookups, and we would
> never want to require more characters on the reading side.

Heh. The implementation is crap. It was literally a "how can I make
the smallest possible patch" implementation. I was finishing it off
while at a talk by Nicolas Pitre at Linaro Connect where I am right
now.

So I agree - it does extra work just because that's where it all
slotted in with minimal effort.

At a minimum, once it finds a good new default, it should just memoize
that. So a minimal fix to the "it's stupidly recalculating things over
and over again" would be to just set "default_abbrev" to the value it
finds acceptable after the first time it finds something, so that it
doesn't end up looping _again_ in the future.

But you could easily also just instead have it do something like

      if (default_abbrev < 0)
            default_abbrev = initialize_abbrev();

at startup time if "abbrev_commit" is set, and just do it once and for
all rather than the odd looping behavior.

I really just wanted to see how well the concept worked, and I was
happy to see that it gave what I thought were the "correct" numbers.
And the loop was already there ...
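The memoization idea can be sketched as a lazily initialized value. It is computed once on first use rather than at startup. initialize_abbrev() is the name used in the message, but its body here, get_default_abbrev(), and repo_object_count_stub are hypothetical. A user-configured core.abbrev would simply set default_abbrev to a non-negative value and bypass the computation.

```c
#include <assert.h>
#include <stdint.h>

/* -1 means "choose automatically", as in the patch's environment.c change. */
static int default_abbrev = -1;

/* Hypothetical object count; a real version would ask the pack list. */
static uint64_t repo_object_count_stub;

/* Same threshold rule as the patch: 2 bits of collision headroom per digit. */
static int initialize_abbrev(void)
{
	int len = 7;

	while (len < 16 && repo_object_count_stub > ((uint64_t)1 << (2 * len)))
		len++;
	assert(len >= 7 && len <= 16);
	return len;
}

/* Compute once, then reuse the memoized value on every later call. */
static int get_default_abbrev(void)
{
	if (default_abbrev < 0)
		default_abbrev = initialize_abbrev();
	return default_abbrev;
}
```

Once the value is cached, later growth in the object count does not change it for the rest of the process, which is the point of doing it "once and for all".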

            Linus

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 19:06             ` Linus Torvalds
@ 2016-09-29 19:42               ` Junio C Hamano
  2016-09-30  0:56               ` Mike Hommey
  1 sibling, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 19:42 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Johannes Sixt, Git Mailing List, Jeff King

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Thu, Sep 29, 2016 at 11:55 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> For the kernel, just the *math* right now actually gives 12
>> characters. For current git it actually seems to say that 8 is the
>> correct number. For small projects, you'll still see 7.
>
> Sorry, the git number is 9, not 8. The reason is that git has roughly
> 212k objects, and 9 hex digits gets expected collisions at about 256k
> objects.
>
> So the logic means that we'll see 7 hex digits for projects with less
> than 16k objects, 8 hex digits if there are less than 64k objects, and
> 9 hex digits for projects like git that currently have fewer than 256k
> objects.

Whew.  I was wondering where my brain went wrong, as I knew we have
200k objects and 8 hexdigits means 1<<16 = 64k which is way too
short.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 19:40               ` Linus Torvalds
@ 2016-09-29 19:45                 ` Junio C Hamano
  2016-09-29 21:53                   ` Linus Torvalds
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 19:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> But you could easily also just instead have it do something like
>
>       if (default_abbrev < 0)
>             default_abbrev = initialize_abbrev();
>
> at startup time if "abbrev_commit" is set, and just do it once and for
> all rather than the odd looping behavior.

I think that is a reasonable way to go.

#define DEFAULT_ABBREV get_default_abbrev()

would help.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 19:18             ` Jeff King
@ 2016-09-29 19:57               ` Junio C Hamano
  0 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 19:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, torvalds

Jeff King <peff@peff.net> writes:

> I just don't see it being a problem. Adding core.abbrev for the whole
> test suite is just about not having a big flag day where we change all
> the tests. Changing one or two tests (and again, I'd be surprised if we
> even have to do that) doesn't seem like a big deal.

I've already wasted several hours whipping t1300 into shape, because
it was not written in a forward-looking, future-proofed way.  I am not
worried about core.abbrev but I am worried more about the next thing
that requires us to add an entry to t/gitconfig-for-test.  Adding a
corresponding entry to retain the old default for that new config to
two places may not be a big deal, but it still makes me feel a bit
uneasy.

In any case, I suspect that Linus's "auto" thing may still need the
custom system config with t1300 clean-up to pass the test, even
though I suspect it would compute that 7 is enough for most of the
tiny repositories our tests use, so I'll polish this a bit more
while waiting for that discussion to settle.

Thanks.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 19:26             ` Jeff King
@ 2016-09-29 21:03               ` Junio C Hamano
  2016-09-29 21:08                 ` Jeff King
  0 siblings, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 21:03 UTC (permalink / raw)
  To: Jeff King; +Cc: git, torvalds

Jeff King <peff@peff.net> writes:

> Good description.
>
> Signed-off-by: Jeff King <peff@peff.net>
>
> of course.
>
>> @@ -1304,6 +1315,7 @@ test_expect_success '--show-origin with --get-regexp' '
>>  		file:$HOME/.gitconfig	user.global true
>>  		file:.git/config	user.local true
>>  	EOF
>> +	GIT_CONFIG_SYSTEM_PATH=$HOME/etc-gitconfig \
>>  	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
>>  	test_cmp expect output
>>  '
>
> This one is trying to do a multi-file lookup, but we couldn't look in
> the system config before. But to naturally extend it, it ought to look
> like this on top:
>
> diff --git a/t/t1300-repo-config.sh b/t/t1300-repo-config.sh
> index d2476a8..4dd5ce3 100755
> --- a/t/t1300-repo-config.sh
> +++ b/t/t1300-repo-config.sh
> @@ -1310,11 +1310,12 @@ test_expect_success '--show-origin with single file' '
>  
>  test_expect_success '--show-origin with --get-regexp' '
>  	cat >expect <<-EOF &&
> +		file:$HOME/etc-gitconfig	user.system true
>  		file:$HOME/.gitconfig	user.global true
>  		file:.git/config	user.local true
>  	EOF
>  	GIT_ETC_GITCONFIG=$HOME/etc-gitconfig \
> -	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
> +	git config --show-origin --get-regexp "user\.[g|l|s].*" >output &&
>  	test_cmp expect output
>  '

Makes sense modulo you inherited useless vertical bars from the
original.  I'll squash something like that in but without || ;-)

Thanks.

* Re: [PATCH 2/4] t13xx: do not assume system config is empty
  2016-09-29 21:03               ` Junio C Hamano
@ 2016-09-29 21:08                 ` Jeff King
  0 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-29 21:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds

On Thu, Sep 29, 2016 at 02:03:39PM -0700, Junio C Hamano wrote:

> > -	git config --show-origin --get-regexp "user\.[g|l].*" >output &&
> > +	git config --show-origin --get-regexp "user\.[g|l|s].*" >output &&
> >  	test_cmp expect output
> >  '
> 
> Makes sense modulo you inherited useless vertical bars from the
> original.  I'll squash something like that in but without || ;-)

Heh, I glossed over that completely. Thanks.

-Peff

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 19:45                 ` Junio C Hamano
@ 2016-09-29 21:53                   ` Linus Torvalds
  2016-09-29 23:13                     ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-29 21:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 623 bytes --]

On Thu, Sep 29, 2016 at 12:45 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> I think that is a reasonable way to go.
>
> #define DEFAULT_ABBREV get_default_abbrev()
>
> would help.

So something like this that replaces the previous patch?

Somebody should really double-check my heuristics, to see that I did
the pack counting etc right.  It doesn't do alternate loose file
counting at all, and maybe it could matter.  The advantage of the
previous patch was that it got the object counting right almost
automatically, this actually has its own new object counting code and
maybe I screwed it up.

                Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 2090 bytes --]

 cache.h       |  3 ++-
 environment.c |  2 +-
 sha1_file.c   | 43 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6e33f2f28..a022e1bd2 100644
--- a/cache.h
+++ b/cache.h
@@ -1186,8 +1186,9 @@ static inline int hex2chr(const char *s)
 }
 
 /* Convert to/from hex/sha1 representation */
+extern int get_default_abbrev(void);
 #define MINIMUM_ABBREV minimum_abbrev
-#define DEFAULT_ABBREV default_abbrev
+#define DEFAULT_ABBREV get_default_abbrev()
 
 struct object_context {
 	unsigned char tree[20];
diff --git a/environment.c b/environment.c
index c1442df9a..fd6681e46 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_file.c b/sha1_file.c
index ca149a607..28ba04b65 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3720,3 +3720,46 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 	}
 	return r ? r : pack_errors;
 }
+
+static int init_default_abbrev(void)
+{
+	unsigned long count = 0;
+	struct packed_git *p;
+	struct strbuf buf = STRBUF_INIT;
+	DIR *dir;
+	char *name;
+	int ret;
+
+	prepare_packed_git();
+	for (p = packed_git; p; p = p->next) {
+		if (open_pack_index(p))
+			continue;
+		count += p->num_objects;
+	}
+
+	strbuf_addstr(&buf, get_object_directory());
+	strbuf_addstr(&buf, "/42/");
+	name = strbuf_detach(&buf, NULL);
+	dir = opendir(name);
+	free(name);
+	if (dir) {
+		struct dirent *de;
+		while ((de = readdir(dir)) != NULL) {
+			count += 256;
+		}
+		closedir(dir);
+	}
+	for (ret = 7; ret < 15; ret++) {
+		unsigned long expect_collision = 1ul << (ret * 2);
+		if (count < expect_collision)
+			break;
+	}
+	return ret;
+}
+
+int get_default_abbrev(void)
+{
+	if (default_abbrev < 0)
+		default_abbrev = init_default_abbrev();
+	return default_abbrev;
+}

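The heuristic in the patch above can be summarized in two steps: add up the objects in every pack index plus 256 for each entry in one loose-object fan-out directory (objects/42/), then pick the shortest length whose birthday bound, 2^(2*len) objects, is still above that count. A minimal standalone sketch of both steps follows. The helper names are hypothetical. Unlike the readdir() loop in the patch, it only counts 38-character names, as the later sha1_name.c version in this thread does, so the "." and ".." entries that readdir() normally returns are not scaled by 256.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: estimate the number of loose objects from the
 * names found in a single fan-out directory such as objects/42/.
 * A loose object's file name is the remaining 38 hex digits of its
 * SHA-1, so anything else (such as "." and "..") is skipped.
 * Assuming objects are spread evenly over all 256 fan-out directories,
 * each hit stands for 256 objects.
 */
static unsigned long estimate_loose(const char *const *names, size_t n)
{
	unsigned long count = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (strlen(names[i]) == 38)
			count += 256;
	return count;
}

/*
 * Shortest length in [7, 15] whose birthday bound is above `count`:
 * len hex digits give 16^len = 2^(4*len) prefixes, so collisions are
 * expected at about sqrt(2^(4*len)) = 2^(2*len) objects.
 */
static int abbrev_for_count(unsigned long count)
{
	int len;

	for (len = 7; len < 15; len++)
		if (count < (1ul << (len * 2)))
			break;
	return len;
}
```

Adding the pack-index counts to the result of estimate_loose() gives the total that abbrev_for_count() maps to a length; for the roughly 5 million objects cited for the kernel that comes out at 12.
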
* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 21:53                   ` Linus Torvalds
@ 2016-09-29 23:13                     ` Junio C Hamano
  2016-09-29 23:20                       ` Junio C Hamano
                                         ` (2 more replies)
  0 siblings, 3 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 23:13 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Somebody should really double-check my heuristics, to see that I did
> the pack counting etc right.  It doesn't do alternate loose file
> counting at all, and maybe it could matter.  The advantage of the
> previous patch was that it got the object counting right almost
> automatically, this actually has its own new object counting code and
> maybe I screwed it up.

One thing that worries me is whether we are ready to start accessing the
object store in all codepaths when we ask for DEFAULT_ABBREV.  The
worries are twofold:

 (1) Do we do the right thing if the object store is not available to
     us?  Some commands can be run outside a repository, and if our
     call to prepare_packed_git() or loose object iteration barfed
     in some way, that would introduce a regression.

 (2) Does calling prepare_packed_git() too early interfere with how
     commands expect their own prepare_packed_git() calls to work?
     That is, if a command has the sequence "ask DEFAULT_ABBREV,
     arrange things, and then call prepare_packed_git()", and the
     existing "arrange things" step does something that makes a new
     pack eligible to be read by prepare_packed_git(), like adding to
     the list of alternate object stores, its own prepare_packed_git()
     will now become a no-op.

I browsed through "tig grep DEFAULT_ABBREV \*.c" and it seems that
in the majority of the hits we are not just ready to start accessing
the object store but already have an object or two, which must have
come from an already open object store, so they are OK.  In
particular, the ones that use it as the last argument to
find_unique_abbrev() are OK, as we are about to open the object store
to do the computation.

There are very early ones in the program startup sequence in the
following files, but I cannot think of a reason why our new, early
call to prepare_packed_git() might be problematic, given that all of
them require access to the repository (i.e. this change cannot
introduce a regression where a command used to work outside a
repository but barfs when prepare_packed_git() is called early):

 - builtin/describe.c
 - builtin/rev-list.c
 - builtin/rev-parse.c

I thought that the one in diff.c might be problematic when the "git
diff" command is run outside a repository with the "--no-index"
option, but it appears that init_default_abbrev() is OK when run
outside a repository.

There is one in parse-options-cb.c that is used to parse the --abbrev
command line option.  This might cause a cosmetic problem, but when
the user is asking for an abbreviation, it is expected that we will
have access to the object store anyway, so it may be OK.

I am sorry that none of the above is about your math ;-)  I suck at
math so I won't comment.

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 23:13                     ` Junio C Hamano
@ 2016-09-29 23:20                       ` Junio C Hamano
  2016-09-30  0:20                       ` Linus Torvalds
  2016-09-30  7:47                       ` Jeff King
  2 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-29 23:20 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> The advantage of the
>> previous patch was that it got the object counting right almost
>> automatically, this actually has its own new object counting code and
>> maybe I screwed it up.

I guess another advantage of your original approach was that it
delayed the counting to the very last minute, so the things that
worried me in my previous response were automatically made
non-issues.

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 23:13                     ` Junio C Hamano
  2016-09-29 23:20                       ` Junio C Hamano
@ 2016-09-30  0:20                       ` Linus Torvalds
  2016-09-30  0:28                         ` Linus Torvalds
  2016-09-30  7:47                       ` Jeff King
  2 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  0:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 4:13 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> One thing that worries me is whether we are ready to start accessing the
> object store in all codepaths when we ask for DEFAULT_ABBREV.

Yes. That was my main worry too. I also looked at just doing an explicit

     if (abbrev_commit && default_abbrev < 0)
          default_abbrev = get_default_abbrev();

and in many ways that would be nicer exactly because the point where
this happens is then explicit, instead of being hidden behind that
macro that may end up being done in random places.

But it wasn't entirely obvious which code paths would need that
initialization either, so on the whole it was very much a "six of one,
half a dozen of the other" thing.

As you say, my original patch had neither of those issues. It just
stupidly re-did the loop over and over, and maybe the right thing to
do is to have that original code, but just short-circuit the "over and
over" behavior by just resetting default_abbrev to the value we do
find.

              Linus

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  0:20                       ` Linus Torvalds
@ 2016-09-30  0:28                         ` Linus Torvalds
  2016-09-30  0:57                           ` Linus Torvalds
  0 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  0:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 5:20 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> As you say, my original patch had neither of those issues.

To be fair, my original patch had a different worry that I didn't
bother with: what if one of the _other_ callers of "get_short_sha1()"
passed in -1 to it.  I only handled the -1 case in the one path I cared
about in that first RFC for testing. So I'm *not* suggesting you
should apply my first version; it has issues too.

Let me see if I can massage my first hacky RFC test-patch into
something more reliable.

              Linus

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 19:06             ` Linus Torvalds
  2016-09-29 19:42               ` Junio C Hamano
@ 2016-09-30  0:56               ` Mike Hommey
  2016-09-30  1:01                 ` Linus Torvalds
  1 sibling, 1 reply; 111+ messages in thread
From: Mike Hommey @ 2016-09-30  0:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List, Jeff King

On Thu, Sep 29, 2016 at 12:06:23PM -0700, Linus Torvalds wrote:
> On Thu, Sep 29, 2016 at 11:55 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > For the kernel, just the *math* right now actually gives 12
> > characters. For current git it actually seems to say that 8 is the
> > correct number. For small projects, you'll still see 7.
> 
> Sorry, the git number is 9, not 8. The reason is that git has roughly
> 212k objects, and 9 hex digits gets expected collisions at about 256k
> objects.
> 
> So the logic means that we'll see 7 hex digits for projects with less
> than 16k objects, 8 hex digits if there are less than 64k objects, and
> 9 hex digits for projects like git that currently have fewer than 256k
> objects.
> 
> But git itself might not be *that* far from going to 10 hex digits
> with my patch.
> 
> The kernel uses 12 hex digits because the collision math says that's
> the right thing for a project with between 4M and 16M objects (with
> the kernel being at 5M).

OTOH, how often does one refer to trees or blobs with abbreviated sha1s?
Most of the time, you'd use abbreviated sha1s for commits. And the number
of commits in the git and kernel repositories is much lower than the
number of overall objects.

rev-list --all --count on the git repo gives me 46790. On the kernel, it
gives 618078.
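
These counts can be run through the same birthday bound Linus describes in the quoted text: with len hex digits, collisions become likely at about 2^(2*len) objects, i.e. roughly 16k, 64k, 256k, 1M, 4M and 16M objects for 7 through 12 digits. A small sketch (the function name is made up for illustration) applying the bound: 46790 commits in git gives 8 digits and 618078 commits in the kernel gives 10, against 9 and 12 when all objects (about 212k and 5M) are counted.

```c
/*
 * Smallest abbreviation length (never below the historical 7) at which
 * nr_objects is still under the birthday bound 2^(2*len).  len is capped
 * at 31 so the shift stays inside a 64-bit unsigned long long.
 */
static int birthday_abbrev(unsigned long long nr_objects)
{
	int len = 7;

	while (len < 31 && nr_objects >= (1ull << (2 * len)))
		len++;
	return len;
}
```
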

Now, the interesting thing is looking at the *actual* collisions in
those spaces.

At 9 digits, there's only one commit collision in the kernel repo:
  45f014c5264f5e68ef0e51b36f4ef5ede3d18397
  45f014c52eef022873b19d6a20eb0ec9668f2b09

And two commit collisions at 8 digits in the git repo:
  1536dd9c1df0b7167b139f6666080cc4774ef63f
  1536dd9c61b5582cf079999057cb715dd6dc6620

  2e6e3e82ee36b3e1bec1db8db24817270080424e
  2e6e3e829f3759823d70e7af511bc04cd05ad0af

At 7 digits, there are 5 actual commit collisions in the git repo and
718 in the kernel repo; only one of those collisions involves more
than 2 commits.

Mike

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  0:28                         ` Linus Torvalds
@ 2016-09-30  0:57                           ` Linus Torvalds
  2016-09-30  1:18                             ` Linus Torvalds
  0 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  0:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 5:28 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> To be fair, my original patch had a different worry that I didn't
> bother with: what if one of the _other_ callers of "get_short_sha1()"
> passed in -1 to it.  I only handled the -1 case in the one path I cared
> about in that first RFC for testing. So I'm *not* suggesting you
> should apply my first version; it has issues too.

Actually, all the other cases seem to be "parse a SHA1 with a known
length", so they really don't have a negative length.  So this seems
ok, and is easier to verify than the "what all contexts might use
DEFAULT_ABBREV" thing. There are only a few callers, and it's a static
function so it's easy to check it locally in sha1_name.c.

               Linus

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  0:56               ` Mike Hommey
@ 2016-09-30  1:01                 ` Linus Torvalds
  2016-09-30 19:41                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  1:01 UTC (permalink / raw)
  To: Mike Hommey; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List, Jeff King

On Thu, Sep 29, 2016 at 5:56 PM, Mike Hommey <mh@glandium.org> wrote:
>
> OTOH, how often does one refer to trees or blobs with abbreviated sha1s?
> Most of the time, you'd use abbreviated sha1s for commits. And the number
> of commits in the git and kernel repositories is much lower than the
> number of overall objects.

See that whole other discussion about this. I agree. If we only ever
worried about just commits, the abbreviation length wouldn't need to
be grown nearly as aggressively. The current default would still be
wrong for the kernel, but it wouldn't be as noticeably wrong, and
updating it to 8 or 9 would be fine.

That said, people argued against that too. We *do* end up having
abbreviated SHA1's for blobs in the diff index. When I said that _I_
never use it, somebody piped up to say that they do.

So I'd rather just keep the existing semantics (a hash is a hash is a
hash), and just abbreviate at a sufficient point that we don't have to
worry too much about disambiguating further by object type.

      Linus

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  0:57                           ` Linus Torvalds
@ 2016-09-30  1:18                             ` Linus Torvalds
  2016-09-30  3:54                               ` Junio C Hamano
  2016-09-30  8:06                               ` Jeff King
  0 siblings, 2 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  1:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1470 bytes --]

On Thu, Sep 29, 2016 at 5:57 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> Actually, all the other cases seem to be "parse a SHA1 with a known
> length", so they really don't have a negative length.  So this seems
> ok, and is easier to verify than the "what all contexts might use
> DEFAULT_ABBREV" thing. There are only a few callers, and it's a static
> function so it's easy to check it locally in sha1_name.c.

Here's my original patch with just a tiny change that instead of
starting the automatic guessing at 7 each time, it starts at
"default_automatic_abbrev", which is initialized to 7.

The difference is that if we decide that "oh, that was too small, need
to repeat", we also update that "default_automatic_abbrev" value, so
that we won't start at the number that we now know was too small.

So it still loops over the abbrev values, but now it only loops a
couple of times.
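
The effect of remembering the starting length can be modeled in isolation. In this sketch (a simplification, not the patch itself) start_len plays the role of default_automatic_abbrev, nr_objects stands in for the count that get_short_sha1() accumulates in ds.nrobjects, and each pass of the loop stands in for one retried lookup. After the first call has walked up from 7, later calls start at the remembered length and need a single pass.

```c
/*
 * start_len plays the role of default_automatic_abbrev: it is only ever
 * raised, so a length that was once found too short is not retried.
 */
static int start_len = 7;

/*
 * Returns the chosen length and reports in *tries how many lengths were
 * attempted.  The "> 2^(2*len)" test mirrors the expect_collision check
 * in the patch.
 */
static int auto_abbrev(unsigned long nr_objects, int *tries)
{
	int len = start_len;

	*tries = 1;
	while (len < 16 && nr_objects > (1ul << (len * 2))) {
		start_len = ++len;	/* remember that the old len was too short */
		(*tries)++;
	}
	return len;
}
```

With about 5 million objects the first call tries 7 through 12 (six lengths) and returns 12; every later call starts at 12 and stops after one try.
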

I actually verified the performance impact by doing

      time git rev-list --abbrev-commit HEAD > /dev/null

on the kernel git tree, and it does actually matter. With my original
patch, we wasted a noticeable amount of time on just the extra
looping, with this it's down to the same performance as just doing it
once at init time (it's about 12s vs 9s on my laptop).

So this patch may actually be "production ready" apart from the fact
that some tests still fail (at least t2027-worktree-list.sh) because
of different short SHA1 cases.

                     Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 3393 bytes --]

 cache.h       |  1 +
 environment.c |  2 +-
 sha1_name.c   | 26 +++++++++++++++++++++++++-
 3 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/cache.h b/cache.h
index 6e33f2f28..d2da6d186 100644
--- a/cache.h
+++ b/cache.h
@@ -1207,6 +1207,7 @@ struct object_context {
 #define GET_SHA1_TREEISH          020
 #define GET_SHA1_BLOB             040
 #define GET_SHA1_FOLLOW_SYMLINKS 0100
+#define GET_SHA1_AUTOMATIC	 0200
 #define GET_SHA1_ONLY_TO_DIE    04000
 
 #define GET_SHA1_DISAMBIGUATORS \
diff --git a/environment.c b/environment.c
index c1442df9a..fd6681e46 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd7c..1003c96ea 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
+	unsigned int nrobjects;
 	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,12 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 
 			if (strlen(de->d_name) != 38)
 				continue;
+
+			// We only look at the one subdirectory, and we assume
+			// each subdirectory is roughly similar, so each object
+			// we find probably has 255 other objects in the other
+			// fan-out directories
+			ds->nrobjects += 256;
 			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p,
 
 	open_pack_index(p);
 	num = p->num_objects;
+	ds->nrobjects += num;
 	last = num;
 	while (first < last) {
 		uint32_t mid = (first + last) / 2;
@@ -380,6 +388,9 @@ static int show_ambiguous_object(const unsigned char *sha1, void *data)
 	return 0;
 }
 
+// Why seven? That's our historical default before the automatic abbreviation
+static int default_automatic_abbrev = 7;
+
 static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 			  unsigned flags)
 {
@@ -426,6 +437,14 @@ static int get_short_sha1(const char *name, int len, unsigned char *sha1,
 		for_each_abbrev(ds.hex_pfx, show_ambiguous_object, &ds);
 	}
 
+	if (len < 16 && !status && (flags & GET_SHA1_AUTOMATIC)) {
+		unsigned int expect_collision = 1 << (len * 2);
+		if (ds.nrobjects > expect_collision) {
+			default_automatic_abbrev = len+1;
+			return SHORT_NAME_AMBIGUOUS;
+		}
+	}
+
 	return status;
 }
 
@@ -458,14 +477,19 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
+	int flags = GET_SHA1_QUIETLY;
 
+	if (len < 0) {
+		flags |= GET_SHA1_AUTOMATIC;
+		len = default_automatic_abbrev;
+	}
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
 	exists = has_sha1_file(sha1);
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
+		status = get_short_sha1(hex, len, sha1_ret, flags);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  1:18                             ` Linus Torvalds
@ 2016-09-30  3:54                               ` Junio C Hamano
  2016-09-30  4:10                                 ` Junio C Hamano
  2016-09-30  4:11                                 ` Linus Torvalds
  2016-09-30  8:06                               ` Jeff King
  1 sibling, 2 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30  3:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> So this patch may actually be "production ready" apart from the fact
> that some tests still fail (at least t2027-worktree-list.sh) because
> of different short SHA1 cases.

t2027 has at least two problems.

 * "git worktree" does not read the core.abbrev configuration,
   without a recent fix in jc/worktree-config, i.e. d49028e6
   ("worktree: honor configuration variables", 2016-09-26).

 * The script uses "git rev-parse --short HEAD"; I suspect that it
   says "ah, default_abbrev is -1 and minimum_abbrev is 4, so let's
   try abbreviating to 4 hexdigits".

The first failure in t3203 seems to come from the same issue in
"rev-parse --short".
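
The ordering problem can be reproduced outside git. This sketch assumes the values from the patch under discussion (DEFAULT_ABBREV evaluating to -1, MINIMUM_ABBREV to 4) and uses made-up function names. The old order applies the MINIMUM_ABBREV clamp after picking the default, so a bare --short ends up as 4; clamping only an explicit --short=<n> lets the -1 "auto" value through to find_unique_abbrev().

```c
#include <stdlib.h>

#define MINIMUM_ABBREV 4
#define DEFAULT_ABBREV (-1)	/* "auto" sentinel, as in the patch */

/* Old order: pick the default first, then clamp; -1 becomes 4. */
static int parse_short_old(const char *arg)
{
	int abbrev = DEFAULT_ABBREV;

	if (arg[7] == '=')
		abbrev = strtoul(arg + 8, NULL, 10);
	if (abbrev < MINIMUM_ABBREV)
		abbrev = MINIMUM_ABBREV;
	else if (40 <= abbrev)
		abbrev = 40;
	return abbrev;
}

/* Fixed order: a bare "--short" keeps the sentinel; only =<n> is clamped. */
static int parse_short_new(const char *arg)
{
	int abbrev;

	if (!arg[7])
		return DEFAULT_ABBREV;
	abbrev = strtoul(arg + 8, NULL, 10);
	if (abbrev < MINIMUM_ABBREV)
		abbrev = MINIMUM_ABBREV;
	else if (40 <= abbrev)
		abbrev = 40;
	return abbrev;
}
```
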

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  3:54                               ` Junio C Hamano
@ 2016-09-30  4:10                                 ` Junio C Hamano
  2016-09-30  4:18                                   ` Linus Torvalds
  2016-09-30  4:27                                   ` Junio C Hamano
  2016-09-30  4:11                                 ` Linus Torvalds
  1 sibling, 2 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30  4:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> So this patch may actually be "production ready" apart from the fact
>> that some tests still fail (at least t2027-worktree-list.sh) because
>> of different short SHA1 cases.
>
> t2027 has at least two problems.
>
>  * "git worktree" does not read the core.abbrev configuration,
>    without a recent fix in jc/worktree-config, i.e. d49028e6
>    ("worktree: honor configuration variables", 2016-09-26).
>
>  * The script uses "git rev-parse --short HEAD"; I suspect that it
>    says "ah, default_abbrev is -1 and minimum_abbrev is 4, so let's
>    try abbreviating to 4 hexdigits".
>
> The first failure in t3203 seems to come from the same issue in
> "rev-parse --short".

A quick and dirty fix for it may look like this.

We leave the variable abbrev set to DEFAULT_ABBREV and let
find_unique_abbrev() react to "eh, -1? I need to do the
auto-scaling".  "git diff-tree --abbrev" seems to have a similar
problem, and the fix is the same.

There still are breakages seen in t5510 and t5526 that are about the
verbose output of "git fetch".  I'll stop digging at this point
tonight, and welcome others who look into it ;-)

 builtin/rev-parse.c | 14 ++++++++------
 diff.c              |  2 +-
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 76cf05e2ad..f8c8c6c22e 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -642,13 +642,15 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 			    starts_with(arg, "--short=")) {
 				filter &= ~(DO_FLAGS|DO_NOREV);
 				verify = 1;
-				abbrev = DEFAULT_ABBREV;
-				if (arg[7] == '=')
+				if (arg[7] != '=') {
+					abbrev = DEFAULT_ABBREV;
+				} else {
 					abbrev = strtoul(arg + 8, NULL, 10);
-				if (abbrev < MINIMUM_ABBREV)
-					abbrev = MINIMUM_ABBREV;
-				else if (40 <= abbrev)
-					abbrev = 40;
+					if (abbrev < MINIMUM_ABBREV)
+						abbrev = MINIMUM_ABBREV;
+					else if (40 <= abbrev)
+						abbrev = 40;
+				}
 				continue;
 			}
 			if (!strcmp(arg, "--sq")) {
diff --git a/diff.c b/diff.c
index c6da383c56..cefc13eb8e 100644
--- a/diff.c
+++ b/diff.c
@@ -3399,7 +3399,7 @@ void diff_setup_done(struct diff_options *options)
 			 */
 			read_cache();
 	}
-	if (options->abbrev <= 0 || 40 < options->abbrev)
+	if (40 < options->abbrev)
 		options->abbrev = 40; /* full */
 
 	/*

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  3:54                               ` Junio C Hamano
  2016-09-30  4:10                                 ` Junio C Hamano
@ 2016-09-30  4:11                                 ` Linus Torvalds
  1 sibling, 0 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  4:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1048 bytes --]

On Thu, Sep 29, 2016 at 8:54 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
>  * The script uses "git rev-parse --short HEAD"; I suspect that it
>    says "ah, default_abbrev is -1 and minimum_abbrev is 4, so let's
>    try abbreviating to 4 hexdigits".

Ahh, right you are. The logic there is

                                abbrev = DEFAULT_ABBREV;
                                if (arg[7] == '=')
                                        abbrev = strtoul(arg + 8, NULL, 10);
                                if (abbrev < MINIMUM_ABBREV)
                                        abbrev = MINIMUM_ABBREV;
                                ....

which now does something different than what it used to do because
DEFAULT_ABBREV is -1.

Putting the "sanity-check the abbrev range" tests inside the "if()"
statement that does strtoul() should fix it. Let me test...

[ short time passes ]

Yup. Incremental patch for that single issue attached.  I made it do
an early "continue" instead of adding another level of indentation.

                 Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 629 bytes --]

 builtin/rev-parse.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 4da1f1da2..cfb0f1510 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -671,8 +671,9 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				filter &= ~(DO_FLAGS|DO_NOREV);
 				verify = 1;
 				abbrev = DEFAULT_ABBREV;
-				if (arg[7] == '=')
-					abbrev = strtoul(arg + 8, NULL, 10);
+				if (!arg[7])
+					continue;
+				abbrev = strtoul(arg + 8, NULL, 10);
 				if (abbrev < MINIMUM_ABBREV)
 					abbrev = MINIMUM_ABBREV;
 				else if (40 <= abbrev)

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  4:10                                 ` Junio C Hamano
@ 2016-09-30  4:18                                   ` Linus Torvalds
  2016-09-30  4:29                                     ` Linus Torvalds
  2016-09-30  4:27                                   ` Junio C Hamano
  1 sibling, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  4:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 9:10 PM, Junio C Hamano <gitster@pobox.com> wrote:
>
> A quick and dirty fix for it may look like this.

Crossed emails.

Indeed, I just solved the builtin/rev-parse.c thing slightly differently.

And you found another failure in the diff code similarly not liking
the negative DEFAULT_ABBREV.  There are probably other things like
that.

              Linus

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  4:10                                 ` Junio C Hamano
  2016-09-30  4:18                                   ` Linus Torvalds
@ 2016-09-30  4:27                                   ` Junio C Hamano
  2016-09-30  4:35                                     ` Junio C Hamano
  2016-09-30 18:40                                     ` Junio C Hamano
  1 sibling, 2 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30  4:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> There still are breakages seen in t5510 and t5526 that are about the
> verbose output of "git fetch".  I'll stop digging at this point
> tonight, and welcome others who look into it ;-)

OK, just before I leave the keyboard for the night...

-- >8 --
From: Junio C Hamano <gitster@pobox.com>
Date: Thu, 29 Sep 2016 21:19:20 -0700
Subject: [PATCH] abbrev: adjust to the new world order

The default_abbrev used to be a concrete value usable as the default
abbreviation length.  The code that sets custom abbreviation length,
in response to command line argument, often did something like:

	if (skip_prefix(arg, "--abbrev=", &arg))
		abbrev = atoi(arg);
	else if (!strcmp("--abbrev", arg))
		abbrev = DEFAULT_ABBREV;
	/* make the value sane */
	if (abbrev < 0 || 40 < abbrev)
		abbrev = ... some sane value ...

The new world order however is that the default_abbrev is a negative
value that signals find_unique_abbrev() that it needs to dynamically
find out a good default value.  We shouldn't coerce a negative value
into a random positive value like the above sample code.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/rev-parse.c | 5 +++--
 diff.c              | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 76cf05e2ad..17cbfabdde 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -643,8 +643,9 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				filter &= ~(DO_FLAGS|DO_NOREV);
 				verify = 1;
 				abbrev = DEFAULT_ABBREV;
-				if (arg[7] == '=')
-					abbrev = strtoul(arg + 8, NULL, 10);
+				if (!arg[7])
+					continue;
+				abbrev = strtoul(arg + 8, NULL, 10);
 				if (abbrev < MINIMUM_ABBREV)
 					abbrev = MINIMUM_ABBREV;
 				else if (40 <= abbrev)
diff --git a/diff.c b/diff.c
index c6da383c56..cefc13eb8e 100644
--- a/diff.c
+++ b/diff.c
@@ -3399,7 +3399,7 @@ void diff_setup_done(struct diff_options *options)
 			 */
 			read_cache();
 	}
-	if (options->abbrev <= 0 || 40 < options->abbrev)
+	if (40 < options->abbrev)
 		options->abbrev = 40; /* full */
 
 	/*
-- 
2.10.0-612-g22341905f2
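
A minimal C sketch of the rule the patch above applies. It assumes stand-in constants TOY_MINIMUM_ABBREV (4, like minimum_abbrev in environment.c) and TOY_DEFAULT_ABBREV (-1, the new negative default), and parse_short_arg() and clamp_diff_abbrev() are hypothetical helpers, not git functions. The rev-parse hunk handles the seven-character `--short` option (inferred from the `arg[7]` and `arg + 8` offsets). A bare `--short` keeps the negative sentinel so that find_unique_abbrev() can size it later, and only an explicit value is clamped. The hunk is cut off after `else if (40 <= abbrev)`, so clamping that branch to 40 is an assumption.

```c
#include <stdlib.h>
#include <string.h>

#define TOY_MINIMUM_ABBREV 4    /* stand-in for minimum_abbrev */
#define TOY_DEFAULT_ABBREV (-1) /* stand-in for the new negative default */

/*
 * Hypothetical: parse "--short" or "--short=<n>".
 * Returns 0 and sets *abbrev on success, -1 if arg is not a --short option.
 */
static int parse_short_arg(const char *arg, int *abbrev)
{
	if (strncmp(arg, "--short", 7))
		return -1; /* not a --short option at all */
	if (!arg[7]) {
		/* Bare --short: keep the sentinel for find_unique_abbrev(). */
		*abbrev = TOY_DEFAULT_ABBREV;
		return 0;
	}
	if (arg[7] != '=')
		return -1; /* e.g. --shortstat */
	*abbrev = (int)strtoul(arg + 8, NULL, 10);
	if (*abbrev < TOY_MINIMUM_ABBREV)
		*abbrev = TOY_MINIMUM_ABBREV;
	else if (40 <= *abbrev)
		*abbrev = 40; /* assumed: the quoted hunk is cut off here */
	return 0;
}

/* diff.c analogue: cap only oversized values and let negatives through. */
static int clamp_diff_abbrev(int abbrev)
{
	return 40 < abbrev ? 40 : abbrev;
}
```

With this, `--short` leaves -1 in place, `--short=2` becomes 4, and `--short=99` becomes 40, while clamp_diff_abbrev(-1) stays -1 instead of being forced to a full 40-digit name.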


* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  4:18                                   ` Linus Torvalds
@ 2016-09-30  4:29                                     ` Linus Torvalds
  0 siblings, 0 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30  4:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 9:18 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> There are probably other things like that.

t5510-fetch.sh fails oddly, looks like the output is off by one character.

   not ok 77 - fetch aligned output

It has a magic "cut -c 22-" that expects the output at a specific
place, and now it's at column 21 instead of column 22. Strange test,
but it still seems to be aligned, just in a different column.

But clearly something changed.

             Linus

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  4:27                                   ` Junio C Hamano
@ 2016-09-30  4:35                                     ` Junio C Hamano
  2016-09-30 18:40                                     ` Junio C Hamano
  1 sibling, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30  4:35 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Jeff King, Johannes Sixt, Linus Torvalds

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> There still are breakages seen in t5510 and t5526 that are about the
>> verbose output of "git fetch".  I'll stop digging at this point
>> tonight, and welcome others who look into it ;-)
>
> OK, just before I leave the keyboard for the night...
>
> -- >8 --
> From: Junio C Hamano <gitster@pobox.com>
> Date: Thu, 29 Sep 2016 21:19:20 -0700
> Subject: [PATCH] abbrev: adjust to the new world order

To those who are following from sidelines, this builds on Linus's
third iteration patch (which is based on his first patch), applied
on Peff's "give disambiguation help when giving an ambiguity error"
series.  I didn't merge the work-in-progress going back and forth
between Linus and me tonight to any of the integration branches, but
it is available as lt/abbrev-auto-2 branch of the "broken down"
repository, i.e.

    git://github.com/gitster/git.git lt/abbrev-auto-2


* Re: [PATCH 10/10] get_short_sha1: list ambiguous objects on error
  2016-09-29 17:19               ` Junio C Hamano
@ 2016-09-30  5:51                 ` Jacob Keller
  0 siblings, 0 replies; 111+ messages in thread
From: Jacob Keller @ 2016-09-30  5:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Kyle J. McKay, Linus Torvalds, Git Mailing List

On Thu, Sep 29, 2016 at 10:19 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Jeff King <peff@peff.net> writes:
>>   - "cat-file --batch-check" can show you the sha1 and type, but it
>>     won't abbreviate sha1s, and it won't show you commit/tag information
>>
>>   - "log --stdin --no-walk" will format the commit however you like, but
>>     skips the trees and blobs entirely, and the tag can only be seen via
>>     "%d"
>>
>>   - "for-each-ref" has flexible formatting, too, but wants to format
>>     refs, not objects (and doesn't read from stdin).
>
>     - "name-rev" is used to give "describe --contains", and can read
>       from its standard input, but has no format customization.
>       Another downside of it is that it only wants to see
>       committishes.
>

Some tool which reads standard input and can be formatted would be
nice. Extending name-rev with the same format options as for-each-ref
would be nice.

Thanks,
Jake

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-29 23:13                     ` Junio C Hamano
  2016-09-29 23:20                       ` Junio C Hamano
  2016-09-30  0:20                       ` Linus Torvalds
@ 2016-09-30  7:47                       ` Jeff King
  2 siblings, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-30  7:47 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 04:13:38PM -0700, Junio C Hamano wrote:

> There are very early ones in the program startup sequence in the
> following functions, but I do not think of a reason why our new and
> early call to prepare_packed_git() might be problematic, given that
> all of them require us to have an access to the repository (i.e.
> this change cannot introduce a regression where a command used to
> work outside a repository but barf when prepare_packed_git() is
> called early):
> 
>  - builtin/describe.c
>  - builtin/rev-list.c
>  - builtin/rev-parse.c
> 
> I thought that the one in diff.c might be problematic when the "git
> diff" command is run outside a repository with the "--no-index"
> option, but it appears that init_default_abbrev() seems to be OK
> when run outside a repository.

Actually, "diff --no-index" is currently buggy in this regard. In the
followup series to jk/setup-sequence-update (which I mentioned but
haven't posted yet), I teach get_object_dir() not to blindly default to
".git", and found that "diff --no-index" is perfectly happy to look in
".git/objects" for find_unique_abbrev(), even when we know there's no
repository (or it has an unknown vintage).

I fixed it there by just using the default abbrev value for out-of-repo
diffs and skipping find_unique_abbrev() entirely. That would work here,
too.

But if we add object-store initialization at other times, it's a
potential conflict. IMHO this should stay inside find_unique_abbrev(),
where we know we already must look at the object store.

-Peff

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  1:18                             ` Linus Torvalds
  2016-09-30  3:54                               ` Junio C Hamano
@ 2016-09-30  8:06                               ` Jeff King
  2016-09-30 17:54                                 ` Linus Torvalds
  2016-09-30 17:56                                 ` Junio C Hamano
  1 sibling, 2 replies; 111+ messages in thread
From: Jeff King @ 2016-09-30  8:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List

On Thu, Sep 29, 2016 at 06:18:03PM -0700, Linus Torvalds wrote:

> On Thu, Sep 29, 2016 at 5:57 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > Actually, all the other cases seem to be "parse a SHA1 with a known
> > length", so they really don't have a negative length.  So this seems
> > ok, and is easier to verify than the "what all contexts might use
> > DEFAULT_ABBREV" thing. There's only a few callers, and it's a static
> > function so it's easy to check it locally in sha1_name.c.
> 
> Here's my original patch with just a tiny change that instead of
> starting the automatic guessing at 7 each time, it starts at
> "default_automatic_abbrev", which is initialized to 7.
> 
> The difference is that if we decide that "oh, that was too small, need
> to repeat", we also update that "default_automatic_abbrev" value, so
> that we won't start at the number that we now know was too small.
> 
> So it still loops over the abbrev values, but now it only loops a
> couple of times.
> 
> I actually verified the performance impact by doing
> 
>       time git rev-list --abbrev-commit HEAD > /dev/null
> 
> on the kernel git tree, and it does actually matter. With my original
> patch, we wasted a noticeable amount of time on just the extra
> looping, with this it's down to the same performance as just doing it
> once at init time (it's about 12s vs 9s on my laptop).

I agree that this deals with the performance concerns by caching the
default_abbrev_len and starting there. I still think it's unnecessarily
invasive to touch get_short_sha1() at all, which is otherwise only a
reading function.

So IMHO, the best combination is the init_default_abbrev() you posted in
[1], but initialized at the top of find_unique_abbrev(). And cached
there, obviously, in a similar way.

-Peff

[1] http://public-inbox.org/git/CA+55aFyVEQ+8TBBUm5KG9APtd9wy8cp_mRO=3nj12DXZNLAC9A@mail.gmail.com/
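
The shape Peff proposes (resolve the automatic length once at the top of find_unique_abbrev(), cache it, then extend until the prefix is unique) can be sketched in C against a toy in-memory store. struct toy_store, estimate_default_len(), count_matches() and toy_find_unique_abbrev() are hypothetical names; the estimate reuses the 2^(2*len) rule from Linus's later patch in this thread, and count_calls exists only to show that the estimate runs once.

```c
#include <stddef.h>
#include <string.h>

/* Toy object store (assumption): full 40-hex object names held in memory. */
struct toy_store {
	const char **names;
	size_t nr;
};

static int count_calls; /* demo only: how often the estimate ran */

/* Hypothetical default-length estimate based on the object count. */
static int estimate_default_len(const struct toy_store *st)
{
	int len;

	count_calls++;
	for (len = 7; len < 16; len++)
		if (st->nr < (1UL << (len * 2)))
			break;
	return len;
}

/* Number of stored names that start with the first len hex digits of hex. */
static size_t count_matches(const struct toy_store *st, const char *hex, int len)
{
	size_t i, n = 0;

	for (i = 0; i < st->nr; i++)
		if (!strncmp(st->names[i], hex, (size_t)len))
			n++;
	return n;
}

/* len < 0 means "automatic": computed once, then reused on every call. */
static int toy_find_unique_abbrev(const struct toy_store *st, const char *hex, int len)
{
	static int automatic_len = -1;

	if (len < 0) {
		if (automatic_len < 0)
			automatic_len = estimate_default_len(st);
		len = automatic_len;
	}
	/* Extend until only hex itself matches (hex is assumed to be in st). */
	while (len < 40 && count_matches(st, hex, len) > 1)
		len++;
	return len;
}
```

Two names sharing the prefix `123456789` force a 10-digit abbreviation for either of them, while an unrelated name stays at the cached starting length of 7, and the estimate itself is computed only on the first automatic call.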

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  8:06                               ` Jeff King
@ 2016-09-30 17:54                                 ` Linus Torvalds
  2016-09-30 18:05                                   ` Jeff King
  2016-09-30 18:21                                   ` Linus Torvalds
  2016-09-30 17:56                                 ` Junio C Hamano
  1 sibling, 2 replies; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30 17:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List

On Fri, Sep 30, 2016 at 1:06 AM, Jeff King <peff@peff.net> wrote:
>
> I agree that this deals with the performance concerns by caching the
> default_abbrev_len and starting there. I still think it's unnecessarily
> invasive to touch get_short_sha1() at all, which is otherwise only a
> reading function.

So the reason that doesn't work is that the "disambiguate_state" data
where we keep the number of objects is only visible within
get_short_sha1().

So outside that function, you don't have any sane way to figure out
how many objects. So then you have to do the extra counting function..

> So IMHO, the best combination is the init_default_abbrev() you posted in
> [1], but initialized at the top of find_unique_abbrev(). And cached
> there, obviously, in a similar way.

That's certainly possible, but I'm really not happy with how the
counting function looks.  And nobody actually stood up to say "yeah,
that gets alternate loose objects right" or "if you have tons of those
alternate loose objects you have other issues anyway". I think
somebody would have to "own" that counting function, the advantage of
just putting it into disambiguate_state is that we just get the
counting for free..

                         Linus

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  8:06                               ` Jeff King
  2016-09-30 17:54                                 ` Linus Torvalds
@ 2016-09-30 17:56                                 ` Junio C Hamano
  1 sibling, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30 17:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Linus Torvalds, Johannes Sixt, Git Mailing List

Jeff King <peff@peff.net> writes:

> I agree that this deals with the performance concerns by caching the
> default_abbrev_len and starting there. I still think it's unnecessarily
> invasive to touch get_short_sha1() at all, which is otherwise only a
> reading function.
>
> So IMHO, the best combination is the init_default_abbrev() you posted in
> [1], but initialized at the top of find_unique_abbrev(). And cached
> there, obviously, in a similar way.

Hmm. I am undecided; both approaches look OK to me.


* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30 17:54                                 ` Linus Torvalds
@ 2016-09-30 18:05                                   ` Jeff King
  2016-09-30 18:21                                   ` Linus Torvalds
  1 sibling, 0 replies; 111+ messages in thread
From: Jeff King @ 2016-09-30 18:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List

On Fri, Sep 30, 2016 at 10:54:16AM -0700, Linus Torvalds wrote:

> On Fri, Sep 30, 2016 at 1:06 AM, Jeff King <peff@peff.net> wrote:
> >
> > I agree that this deals with the performance concerns by caching the
> > default_abbrev_len and starting there. I still think it's unnecessarily
> > invasive to touch get_short_sha1() at all, which is otherwise only a
> > reading function.
> 
> So the reason that doesn't work is that the "disambiguate_state" data
> where we keep the number of objects is only visible within
> get_short_sha1().
> 
> So outside that function, you don't have any sane way to figure out
> how many objects. So then you have to do the extra counting function..

Right. I think you should do the extra counting function. It's a few
more lines, but the design is way less tangled.

> > So IMHO, the best combination is the init_default_abbrev() you posted in
> > [1], but initialized at the top of find_unique_abbrev(). And cached
> > there, obviously, in a similar way.
> 
> That's certainly possible, but I'm really not happy with how the
> counting function looks.  And nobody actually stood up to say "yeah,
> that gets alternate loose objects right" or "if you have tons of those
> alternate loose objects you have other issues anyway". I think
> somebody would have to "own" that counting function, the advantage of
> just putting it into disambiguate_state is that we just get the
> counting for free..

I don't think you _need_ to get the alternate loose objects right. In fact,
I don't think you need to care about loose objects at all. For the
scales we're talking about, they're a rounding error. I would have done
it like this:

diff --git a/sha1_file.c b/sha1_file.c
index 65deaf9..1845502 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1382,6 +1382,32 @@ static void prepare_packed_git_one(char *objdir, int local)
 	strbuf_release(&path);
 }
 
+static int approximate_object_count_valid;
+
+/*
+ * Give a fast, rough count of the number of objects in the repository. This
+ * ignores loose objects completely. If you have a lot of them, then either
+ * you should repack because your performance will be awful, or they are
+ * all unreachable objects about to be pruned, in which case they're not really
+ * interesting as a measure of repo size in the first place.
+ */
+unsigned long approximate_object_count(void)
+{
+	static unsigned long count;
+	if (!approximate_object_count_valid) {
+		struct packed_git *p;
+
+		prepare_packed_git();
+		count = 0;
+		for (p = packed_git; p; p = p->next) {
+			if (open_pack_index(p))
+				continue;
+			count += p->num_objects;
+		}
+	}
+	return count;
+}
+
 static void *get_next_packed_git(const void *p)
 {
 	return ((const struct packed_git *)p)->next;
@@ -1456,6 +1482,7 @@ void prepare_packed_git(void)
 
 void reprepare_packed_git(void)
 {
+	approximate_object_count_valid = 0;
 	prepare_packed_git_run_once = 0;
 	prepare_packed_git();
 }
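
As posted, approximate_object_count() never sets approximate_object_count_valid to 1, so the "cached" count would be recomputed on every call. A toy C sketch of the intended caching with the flag set after counting; struct toy_pack, toy_packs, toy_approximate_object_count() and toy_reprepare() are hypothetical stand-ins for struct packed_git, the packed_git list, the patch's counter and reprepare_packed_git(), and index_ok models open_pack_index() failing.

```c
#include <stddef.h>

/* Toy pack list (assumption): only the fields the patch reads. */
struct toy_pack {
	struct toy_pack *next;
	unsigned long num_objects;
	int index_ok; /* 0 models open_pack_index() failing */
};

static struct toy_pack *toy_packs;
static int approx_count_valid;
static int recounts; /* demo only: how many real recounts happened */

/* Sum object counts over packs whose index could be opened; loose objects are ignored. */
static unsigned long toy_approximate_object_count(void)
{
	static unsigned long count;

	if (!approx_count_valid) {
		struct toy_pack *p;

		recounts++;
		count = 0;
		for (p = toy_packs; p; p = p->next) {
			if (!p->index_ok)
				continue;
			count += p->num_objects;
		}
		approx_count_valid = 1; /* the step missing from the posted patch */
	}
	return count;
}

/* reprepare_packed_git() analogue: new packs may exist, so drop the cache. */
static void toy_reprepare(void)
{
	approx_count_valid = 0;
}
```

Repeated calls then cost one pointer check, and only an explicit reprepare (for example after a fetch adds a pack) triggers another walk over the pack list.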

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30 17:54                                 ` Linus Torvalds
  2016-09-30 18:05                                   ` Jeff King
@ 2016-09-30 18:21                                   ` Linus Torvalds
  2016-09-30 20:01                                     ` Junio C Hamano
  1 sibling, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30 18:21 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, Johannes Sixt, Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 1449 bytes --]

On Fri, Sep 30, 2016 at 10:54 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
>> So IMHO, the best combination is the init_default_abbrev() you posted in
>> [1], but initialized at the top of find_unique_abbrev(). And cached
>> there, obviously, in a similar way.
>
> That's certainly possible, but I'm really not happy with how the
> counting function looks.  And nobody actually stood up to say "yeah,
> that gets alternate loose objects right" or "if you have tons of those
> alternate loose objects you have other issues anyway". I think
> somebody would have to "own" that counting function, the advantage of
> just putting it into disambiguate_state is that we just get the
> counting for free..

Side note: maybe we can mix the two approaches, and keep the counting
in the disambiguation state, and just make the counting function do

    init_object_disambiguation();
    find_short_object_filename(&ds);
    find_short_packed_object(&ds);
    finish_object_disambiguation(&ds, sha1);

and then just use "ds.nrobjects". So the counting would still be done
by the disambiguation code, it just wouldn't be in get_short_sha1().

So here's another version that takes that approach. And if somebody
(hint hint) wants to do the counting differently, they can perhaps
send an incremental patch to do that.

(This patch also contains the few setup issues Junio found with the
new "default_abbrev is negative" model)

              Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 3705 bytes --]

 builtin/rev-parse.c |  5 +++--
 diff.c              |  2 +-
 environment.c       |  2 +-
 sha1_name.c         | 39 ++++++++++++++++++++++++++++++++++++++-
 4 files changed, 43 insertions(+), 5 deletions(-)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 4da1f1da2..cfb0f1510 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -671,8 +671,9 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 				filter &= ~(DO_FLAGS|DO_NOREV);
 				verify = 1;
 				abbrev = DEFAULT_ABBREV;
-				if (arg[7] == '=')
-					abbrev = strtoul(arg + 8, NULL, 10);
+				if (!arg[7])
+					continue;
+				abbrev = strtoul(arg + 8, NULL, 10);
 				if (abbrev < MINIMUM_ABBREV)
 					abbrev = MINIMUM_ABBREV;
 				else if (40 <= abbrev)
diff --git a/diff.c b/diff.c
index 59920747d..c6d445915 100644
--- a/diff.c
+++ b/diff.c
@@ -3421,7 +3421,7 @@ void diff_setup_done(struct diff_options *options)
 			 */
 			read_cache();
 	}
-	if (options->abbrev <= 0 || 40 < options->abbrev)
+	if (options->abbrev > 40)
 		options->abbrev = 40; /* full */
 
 	/*
diff --git a/environment.c b/environment.c
index c1442df9a..fd6681e46 100644
--- a/environment.c
+++ b/environment.c
@@ -16,7 +16,7 @@ int trust_executable_bit = 1;
 int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
-int minimum_abbrev = 4, default_abbrev = 7;
+int minimum_abbrev = 4, default_abbrev = -1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1_name.c b/sha1_name.c
index 3b647fd7c..684b36dba 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -15,6 +15,7 @@ typedef int (*disambiguate_hint_fn)(const unsigned char *, void *);
 
 struct disambiguate_state {
 	int len; /* length of prefix in hex chars */
+	unsigned int nrobjects;
 	char hex_pfx[GIT_SHA1_HEXSZ + 1];
 	unsigned char bin_pfx[GIT_SHA1_RAWSZ];
 
@@ -118,6 +119,12 @@ static void find_short_object_filename(struct disambiguate_state *ds)
 
 			if (strlen(de->d_name) != 38)
 				continue;
+
+			// We only look at the one subdirectory, and we assume
+			// each subdirectory is roughly similar, so each object
+			// we find probably has 255 other objects in the other
+			// fan-out directories
+			ds->nrobjects += 256;
 			if (memcmp(de->d_name, ds->hex_pfx + 2, ds->len - 2))
 				continue;
 			memcpy(hex + 2, de->d_name, 38);
@@ -151,6 +158,7 @@ static void unique_in_pack(struct packed_git *p,
 
 	open_pack_index(p);
 	num = p->num_objects;
+	ds->nrobjects += num;
 	last = num;
 	while (first < last) {
 		uint32_t mid = (first + last) / 2;
@@ -455,17 +463,46 @@ int for_each_abbrev(const char *prefix, each_abbrev_fn fn, void *cb_data)
 	return ret;
 }
 
+static int get_automatic_abbrev(const char *hex)
+{
+	static int len;
+	struct disambiguate_state ds;
+
+	if (init_object_disambiguation(hex, 7, &ds) < 0)
+		return 7;
+
+	find_short_object_filename(&ds);
+	find_short_packed_object(&ds);
+
+	for (len = 7; len < 16; len++) {
+		unsigned int expect_collision = 1 << (len * 2);
+		if (ds.nrobjects < expect_collision)
+			break;
+	}
+	return len;
+}
+
 int find_unique_abbrev_r(char *hex, const unsigned char *sha1, int len)
 {
 	int status, exists;
+	int flags = GET_SHA1_QUIETLY;
 
 	sha1_to_hex_r(hex, sha1);
 	if (len == 40 || !len)
 		return 40;
+
+	if (len < 0) {
+		static int automatic_abbrev = -1;
+
+		if (automatic_abbrev < 0)
+			automatic_abbrev = get_automatic_abbrev(hex);
+		len = automatic_abbrev;
+	}
+
 	exists = has_sha1_file(sha1);
 	while (len < 40) {
 		unsigned char sha1_ret[20];
-		status = get_short_sha1(hex, len, sha1_ret, GET_SHA1_QUIETLY);
+		status = get_short_sha1(hex, len, sha1_ret, flags);
 		if (exists
 		    ? !status
 		    : status == SHORT_NAME_NOT_FOUND) {
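
The length rule in get_automatic_abbrev() is a birthday bound: with n roughly uniform object names, a collision among len-hex-digit prefixes becomes likely once n approaches sqrt(16^len) = 2^(2*len), so the loop picks the smallest len in [7, 16) with n < 2^(2*len). A standalone C sketch of that rule plus the patch's loose-object sampling, where each name found in the one fan-out directory that was scanned stands for 256 objects; abbrev_len_for_count() and estimate_nrobjects() are hypothetical helper names, not git functions.

```c
/* Smallest len in [7, 16) whose birthday bound 2^(2*len) exceeds the count. */
static int abbrev_len_for_count(unsigned long nrobjects)
{
	int len;

	for (len = 7; len < 16; len++) {
		/* len <= 15 here, so the shift is at most 30 bits. */
		unsigned long expect_collision = 1UL << (len * 2);
		if (nrobjects < expect_collision)
			break;
	}
	return len; /* 16 if even 2^30 objects were not enough */
}

/*
 * Hypothetical estimate mirroring the patch: packs contribute their
 * exact count, and each loose object seen in the single sampled
 * fan-out directory is scaled by 256.
 */
static unsigned long estimate_nrobjects(unsigned long packed,
					unsigned long loose_in_one_dir)
{
	return packed + 256 * loose_in_one_dir;
}
```

For the kernel's roughly 5 million objects this yields 12, since 2^22 (about 4.2 million) is too small and 2^24 (about 16.8 million) is not; a repository with 100,000 objects gets 9.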

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  4:27                                   ` Junio C Hamano
  2016-09-30  4:35                                     ` Junio C Hamano
@ 2016-09-30 18:40                                     ` Junio C Hamano
  2016-09-30 18:51                                       ` Linus Torvalds
  1 sibling, 1 reply; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30 18:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> From: Junio C Hamano <gitster@pobox.com>
> Date: Thu, 29 Sep 2016 21:19:20 -0700
> Subject: [PATCH] abbrev: adjust to the new world order
>
> The default_abbrev used to be a concrete value usable as the default
> abbreviation length.  The code that sets custom abbreviation length,
> in response to command line argument, often did something like:
>
> 	if (skip_prefix(arg, "--abbrev=", &arg))
> 		abbrev = atoi(arg);
> 	else if (!strcmp("--abbrev", arg))
> 		abbrev = DEFAULT_ABBREV;
> 	/* make the value sane */
> 	if (abbrev < 0 || 40 < abbrev)
> 		abbrev = ... some sane value ...
>
> The new world order however is that the default_abbrev is a negative
> value that signals find_unique_abbrev() that it needs to dynamically
> find out a good default value.  We shouldn't coerce a negative value
> into a random positive value like the above sample code.
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

There is another instance buried deep in an obscure macro.  A
minimum fix may look like this, but I really hope somebody else
finds a better approach.  Peff alluded to "when it is still -1
substituting it with a reasonable value like 7" in a separate
thread, and we probably would want a way to allow accessing that
"reasonable value like 7" without triggering auto sizing logic
too early.

With this and the patch in the message I am responding to, your
patch from the last night seems to pass all the tests for me.

 transport.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/transport.h b/transport.h
index 6fe3485325..8a96e22bb0 100644
--- a/transport.h
+++ b/transport.h
@@ -142,7 +142,7 @@ struct transport {
 #define TRANSPORT_PUSH_ATOMIC 8192
 #define TRANSPORT_PUSH_OPTIONS 16384
 
-#define TRANSPORT_SUMMARY_WIDTH (2 * DEFAULT_ABBREV + 3)
+#define TRANSPORT_SUMMARY_WIDTH (2 * (DEFAULT_ABBREV < 0 ? 7 : DEFAULT_ABBREV) + 3)
 #define TRANSPORT_SUMMARY(x) (int)(TRANSPORT_SUMMARY_WIDTH + strlen(x) - gettext_width(x)), (x)
 
 /* Returns a transport suitable for the url */
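
The arithmetic behind the breakage, as a small C sketch: the old form expands to 2 * -1 + 3 == 1 once DEFAULT_ABBREV is negative, while the guarded form keeps the historical 17. These macros take the abbreviation as a parameter purely so both cases can be evaluated side by side; OLD_SUMMARY_WIDTH, FIXED_SUMMARY_WIDTH and TOY_DEFAULT_ABBREV are hypothetical names.

```c
/* Models the new negative default_abbrev ("size it automatically"). */
#define TOY_DEFAULT_ABBREV (-1)

/* Shape of the original macro: two abbreviated ids plus 3 extra columns. */
#define OLD_SUMMARY_WIDTH(abbrev) (2 * (abbrev) + 3)

/* Shape of the minimum fix: use 7 while the default is still "automatic". */
#define FIXED_SUMMARY_WIDTH(abbrev) (2 * ((abbrev) < 0 ? 7 : (abbrev)) + 3)
```

Evaluating OLD_SUMMARY_WIDTH(TOY_DEFAULT_ABBREV) gives 1 instead of 17, which is the kind of silent layout change the fetch output tests would notice.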

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30 18:40                                     ` Junio C Hamano
@ 2016-09-30 18:51                                       ` Linus Torvalds
  2016-09-30 19:00                                         ` Junio C Hamano
  0 siblings, 1 reply; 111+ messages in thread
From: Linus Torvalds @ 2016-09-30 18:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Sixt, Git Mailing List

On Fri, Sep 30, 2016 at 11:40 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> There is another instance buried deep in an obscure macro.  A
> minimum fix may look like this, but I really hope somebody else
> finds a better approach.

Heh. Yeah, that's just ugly. I assume this is why the odd git fetch
pretty-printing test was off by one column..

Considering that TRANSPORT_SUMMARY and TRANSPORT_SUMMARY_WIDTH are
both used in exactly one place each, I'd suggest getting rid of that
crazy macro, and just expanding it in those places to avoid these
kinds of crazy "hiding variables inside complex defines thing".

And maybe just deciding to hardcode TRANSPORT_SUMMARY_WIDTH to 17
(which was its original default value and presumably is what the test
is effectively hardcoded for too), and avoiding that complexity
entirely.

                Linus

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30 18:51                                       ` Linus Torvalds
@ 2016-09-30 19:00                                         ` Junio C Hamano
  0 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30 19:00 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jeff King, Johannes Sixt, Git Mailing List,
	Nguyễn Thái Ngọc Duy

Linus Torvalds <torvalds@linux-foundation.org> writes:

> Considering that TRANSPORT_SUMMARY and TRANSPORT_SUMMARY_WIDTH are
> both used in exactly one place each, I'd suggest getting rid of that
> crazy macro, and just expanding it in those places to avoid these
> kinds of crazy "hiding variables inside complex defines thing".
>
> And maybe just deciding to hardcode TRANSPORT_SUMMARY_WIDTH to 17
> (which was its original default value and presumably is what the test
> is effectively hardcoded for too), and avoiding that complexity
> entirely.

In all fairness, when the WIDTH thing was introduced, there were
two places that needed to reference it at f1863d0d16 ("refactor
duplicated code in builtin-send-pack.c and transport.c",
2010-02-16).  But that is no longer the case, and it makes sense to
hardcode it as 17 (or something derived from a symbolic constant
that gives the new "default to default").

What TRANSPORT_SUMMARY() does is even more crazy and it really
shouldn't be exposed as a public interface.  Let's move it to its
single calling place.
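
What TRANSPORT_SUMMARY(x) expands to is a printf field width followed by the string: the width is widened by strlen(x) - gettext_width(x) so that strings whose byte length exceeds their display width still pad to TRANSPORT_SUMMARY_WIDTH columns. A C sketch of moving that into a helper at the call site with the width hardcoded to 17, as suggested above; SUMMARY_WIDTH, summary_field_width(), utf8_columns() and print_summary() are hypothetical, and utf8_columns() is a crude stand-in for git's gettext_width() that assumes every UTF-8 code point is one column wide.

```c
#include <stdio.h>
#include <string.h>

/* Hardcoded width (assumption, per the suggestion above): 2 * 7 + 3. */
enum { SUMMARY_WIDTH = 17 };

/* Count UTF-8 code points by skipping continuation bytes (10xxxxxx). */
static size_t utf8_columns(const char *s)
{
	size_t cols = 0;

	for (; *s; s++)
		if (((unsigned char)*s & 0xC0) != 0x80)
			cols++;
	return cols;
}

/* printf pads by bytes, so add the byte/column difference to the width. */
static int summary_field_width(const char *s)
{
	return (int)(SUMMARY_WIDTH + strlen(s) - utf8_columns(s));
}

/* Prints one summary column padded to SUMMARY_WIDTH visible columns. */
static void print_summary(const char *summary, const char *rest)
{
	printf(" %-*s %s\n", summary_field_width(summary), summary, rest);
}
```

For plain ASCII such as "[new tag]" the field width is exactly 17; a translated label containing two 2-byte characters gets a width two bytes larger, so it still occupies 17 columns on screen.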

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30  1:01                 ` Linus Torvalds
@ 2016-09-30 19:41                   ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 111+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2016-09-30 19:41 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Mike Hommey, Junio C Hamano, Johannes Sixt, Git Mailing List,
	Jeff King

On Fri, Sep 30, 2016 at 3:01 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Thu, Sep 29, 2016 at 5:56 PM, Mike Hommey <mh@glandium.org> wrote:
>>
>> OTOH, how often does one refer to trees or blobs with abbreviated sha1s?
>> Most of the time, you'd use abbreviated sha1s for commits. And the number
>> of commits in git and the kernel repositories are much lower than the
>> number of overall objects.
>
> See that whole other discussion about this. I agree. If we only ever
> worried about just commits, the abbreviation length wouldn't need to
> be grown nearly as aggressively. The current default would still be
> wrong for the kernel, but it wouldn't be as noticeably wrong, and
> updating it to 8 or 9 would be fine.
>
> That said, people argued against that too. We *do* end up having
> abbreviated SHA1's for blobs in the diff index. When I said that _I_
> never use it, somebody piped up to say that they do.
>
> So I'd rather just keep the existing semantics (a hash is a hash is a
> hash), and just abbreviate at a sufficient point that we don't have to
> worry too much about disambiguating further by object type.

I work on a repo that's around the size of linux.git in every way
(commits, objects etc.), and growing twice as fast.

So I also see 8 or 9 digit abbreviations on a daily basis, even with
the current core.abbrev default, but I still think growing it so
aggressively is the wrong thing to do.

The fact that we have a core.abbrev option at all and nobody's talking
about getting rid of it entirely means we all acknowledge the UX
convenience of short SHA1s.

I don't think it's a good idea for such UX options to have defaults
that really only make sense for repositories at the very far end of
the bell curve, which is the case with linux.git and the repo I work
on.

Either way you're going to waste somebody's time. I think it's a
better trade-off that some kernel dev occasionally has to look at
Peff's new disambiguation output, than have the vast hordes of
everyday Git users have less screen real estate, need to recite longer
sha1s over the phone during outages (people do that), and any number
of other every day use cases.

I think if anything we should be talking about making the default
shorter & then have some clever auto-scaling by repository size as has
been discussed in this thread to deal with the repositories at the far
end of the bell curve.

* Re: [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits
  2016-09-30 18:21                                   ` Linus Torvalds
@ 2016-09-30 20:01                                     ` Junio C Hamano
  0 siblings, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2016-09-30 20:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff King, Johannes Sixt, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Sep 30, 2016 at 10:54 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>>> So IMHO, the best combination is the init_default_abbrev() you posted in
>>> [1], but initialized at the top of find_unique_abbrev(). And cached
>>> there, obviously, in a similar way.
>>
>> That's certainly possible, but I'm really not happy with how the
>> counting function looks.  And nobody actually stood up to say "yeah,
>> that gets alternate loose objects right" or "if you have tons of those
>> alternate loose objects you have other issues anyway". I think
>> somebody would have to "own" that counting function, the advantage of
>> just putting it into disambiguate_state is that we just get the
>> counting for free..
>
> Side note: maybe we can mix the two approaches, and keep the counting
> in the disambiguation state, and just make the counting function do
>
>     init_object_disambiguation();
>     find_short_object_filename(&ds);
>     find_short_packed_object(&ds);
>     finish_object_disambiguation(&ds, sha1);
>
> and then just use "ds.nrobjects". So the counting would still be done
> by the disambiguation code, it just wouldn't be in get_short_sha1().
>
> So here's another version that takes that approach. And if somebody
> (hint hint) wants to do the counting differently, they can perhaps
> send an incremental patch to do that.
>
> (This patch also contains the few setup issues Junio found with the
> new "default_abbrev is negative" model)

Sorry, but I do not quite see the point in the difference between
this one and your original that had a hook in get_short_sha1(), as
it seemed to me that Peff's objection was about the counting done in
find_short_object_filename() and find_short_packed_object(), which
is (understandably) still here.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2016-09-26  4:34   ` Jeff King
  2016-09-26  4:45     ` Junio C Hamano
@ 2019-02-04 16:12     ` Ævar Arnfjörð Bjarmason
  2019-02-04 19:13       ` Junio C Hamano
  2019-02-04 20:04       ` Junio C Hamano
  1 sibling, 2 replies; 111+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-04 16:12 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Linus Torvalds,
	Ævar Arnfjörð Bjarmason

The algorithm we use to pick the default abbreviation length as a
function of the approximate number of objects is described in the
commit message for e6c587c733 ("abbrev: auto size the default
abbreviation", 2016-09-30), as well as in and downthread of [1], but
it hasn't been documented.

Let's do that, and while we're at it explicitly test for when the
current implementation will "roll over" up to values of 2^32-1 (the
maximum portable "unsigned long" value).

1. https://public-inbox.org/git/20160926043442.3pz7ccawdcsn2kzb@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
This is a patch from the middle of a series I'm currently working on
re-rolling. See
https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/

What I'd like to get here is commentary on the phrasing and accuracy
of the doc patch I'm adding here.

This patch assumes that we have an abbrev_length_for_object_count()
function, which I've added in an earlier unpublished patch. It just
exposes the length-picking algorithm found in find_unique_abbrev_r().

 Documentation/config/core.txt       | 17 +++++++
 builtin/rev-parse.c                 |  8 ++++
 t/t1512-rev-parse-disambiguation.sh | 74 +++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 185857a13f..2175761833 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -599,6 +599,23 @@ core.abbrev::
 	abbreviated object names to stay unique for some time.
 	The minimum length is 4.
 +
+The algorithm used to pick the current abbreviation length is
+considered an implementation detail, and might be changed in the
+future. Since Git version 2.11, the length has been configured to
+auto-scale based on the estimated number of objects in the
+repository. We pick a length such that if all objects in the
+repository were abbreviated, we'd have a 50% chance of a *single*
+collision.
++
+For example, 2^14-1 is the last object count at which we'll pick
+a short length of "7", and will roll over to "8" once we have one more
+object at 2^14. Since each hexdigit we add (4 bits) allows us to have
+four times (2 bits) as many objects in the repository, we'll roll over
+to a length of "9" at 2^16 objects, "10" at 2^18 etc. We'll never
+automatically pick a length less than "7", which effectively hardcodes
+2^12 as the minimum number of objects in a repository we'll consider
+when choosing the abbreviation length.
++
 This can also be set to relative values such as `+2` or `-2`, which
 means to add or subtract N characters from the SHA-1 that Git would
 otherwise print, this allows for producing more future-proof SHA-1s
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index d0d751a009..e7bf4375a2 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -773,6 +773,14 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 					return 1;
 				continue;
 			}
+			if (opt_with_value(arg, "--abbrev-len", &arg)) {
+				unsigned long v;
+				if (!git_parse_ulong(arg, &v))
+					return 1;
+				int len = abbrev_length_for_object_count(v);
+				printf("%d\n", len);
+				continue;
+			}
 			if (!strcmp(arg, "--bisect")) {
 				for_each_fullref_in("refs/bisect/bad", show_reference, NULL, 0);
 				for_each_fullref_in("refs/bisect/good", anti_reference, NULL, 0);
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 265a6972fc..0e97888a44 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -450,4 +450,78 @@ test_expect_success C_LOCALE_OUTPUT 'ambiguous commits are printed by type first
 	done
 '
 
+test_expect_success 'abbreviation length at 2^N-1 and 2^N' '
+	pow_2_min=$(git rev-parse --abbrev-len=3) &&
+	pow_2_eql=$(git rev-parse --abbrev-len=4) &&
+	pow_4_min=$(git rev-parse --abbrev-len=15) &&
+	pow_4_eql=$(git rev-parse --abbrev-len=16) &&
+	pow_6_min=$(git rev-parse --abbrev-len=63) &&
+	pow_6_eql=$(git rev-parse --abbrev-len=64) &&
+	pow_8_min=$(git rev-parse --abbrev-len=255) &&
+	pow_8_eql=$(git rev-parse --abbrev-len=256) &&
+	pow_10_min=$(git rev-parse --abbrev-len=1023) &&
+	pow_10_eql=$(git rev-parse --abbrev-len=1024) &&
+	pow_12_min=$(git rev-parse --abbrev-len=4095) &&
+	pow_12_eql=$(git rev-parse --abbrev-len=4096) &&
+	pow_14_min=$(git rev-parse --abbrev-len=16383) &&
+	pow_14_eql=$(git rev-parse --abbrev-len=16384) &&
+	pow_16_min=$(git rev-parse --abbrev-len=65535) &&
+	pow_16_eql=$(git rev-parse --abbrev-len=65536) &&
+	pow_18_min=$(git rev-parse --abbrev-len=262143) &&
+	pow_18_eql=$(git rev-parse --abbrev-len=262144) &&
+	pow_20_min=$(git rev-parse --abbrev-len=1048575) &&
+	pow_20_eql=$(git rev-parse --abbrev-len=1048576) &&
+	pow_22_min=$(git rev-parse --abbrev-len=4194303) &&
+	pow_22_eql=$(git rev-parse --abbrev-len=4194304) &&
+	pow_24_min=$(git rev-parse --abbrev-len=16777215) &&
+	pow_24_eql=$(git rev-parse --abbrev-len=16777216) &&
+	pow_26_min=$(git rev-parse --abbrev-len=67108863) &&
+	pow_26_eql=$(git rev-parse --abbrev-len=67108864) &&
+	pow_28_min=$(git rev-parse --abbrev-len=268435455) &&
+	pow_28_eql=$(git rev-parse --abbrev-len=268435456) &&
+	pow_30_min=$(git rev-parse --abbrev-len=1073741823) &&
+	pow_30_eql=$(git rev-parse --abbrev-len=1073741824) &&
+	pow_32_min=$(git rev-parse --abbrev-len=4294967295) &&
+
+	cat >actual <<-EOF &&
+	2 = $pow_2_min $pow_2_eql
+	4 = $pow_4_min $pow_4_eql
+	6 = $pow_6_min $pow_6_eql
+	8 = $pow_8_min $pow_8_eql
+	10 = $pow_10_min $pow_10_eql
+	12 = $pow_12_min $pow_12_eql
+	14 = $pow_14_min $pow_14_eql
+	16 = $pow_16_min $pow_16_eql
+	18 = $pow_18_min $pow_18_eql
+	20 = $pow_20_min $pow_20_eql
+	22 = $pow_22_min $pow_22_eql
+	24 = $pow_24_min $pow_24_eql
+	26 = $pow_26_min $pow_26_eql
+	28 = $pow_28_min $pow_28_eql
+	30 = $pow_30_min $pow_30_eql
+	32 = $pow_32_min
+	EOF
+
+	cat >expected <<-\EOF &&
+	2 = 7 7
+	4 = 7 7
+	6 = 7 7
+	8 = 7 7
+	10 = 7 7
+	12 = 7 7
+	14 = 7 8
+	16 = 8 9
+	18 = 9 10
+	20 = 10 11
+	22 = 11 12
+	24 = 12 13
+	26 = 13 14
+	28 = 14 15
+	30 = 15 16
+	32 = 16
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.20.1.611.gfbb209baf1


^ permalink raw reply related	[flat|nested] 111+ messages in thread

* Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2019-02-04 16:12     ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Ævar Arnfjörð Bjarmason
@ 2019-02-04 19:13       ` Junio C Hamano
  2019-02-04 20:04       ` Junio C Hamano
  1 sibling, 0 replies; 111+ messages in thread
From: Junio C Hamano @ 2019-02-04 19:13 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Linus Torvalds

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> +The algorithm used to pick the current abbreviation length is
> +considered an implementation detail, and might be changed in the
> +future. Since Git version 2.11, the length has been configured to
> +auto-scale based on the estimated number of objects in the
> +repository. We pick a length such that if all objects in the
> +repository were abbreviated, we'd have a 50% chance of a *single*
> +collision.

Correct and reads well.

> +For example, 2^14-1 is the last object count at which we'll pick
> +a short length of "7", and will roll over to "8" once we have one more
> +object at 2^14. Since each hexdigit we add (4 bits) allows us to have
> +four times (2 bits) as many objects in the repository

Something is missing at this point in the sentence. 

	"without raising the chance of a single collision higher"

or something like that.

> , we'll roll over
> +to a length of "9" at 2^16 objects, "10" at 2^18 etc.

Correct and reads well.
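
The "four times as many objects per extra hex digit" scaling follows from the usual birthday approximation, assuming object names behave like uniformly random strings. With $n$ objects abbreviated to $L$ hex digits ($4L$ bits), the expected number of colliding pairs is roughly

$$
E(n, L) = \binom{n}{2}\, 2^{-4L} \approx \frac{n^2}{2^{4L+1}}
$$

Holding $E$ fixed means $n \propto 2^{2L}$, so one extra hex digit multiplies the admissible object count by $2^2 = 4$:

$$
\frac{n_{L+1}}{n_L} = \frac{2^{2(L+1)}}{2^{2L}} = 4
$$

Just below a roll-over point, $n \approx 2^{2L}$ (e.g. $2^{14}$ for $L = 7$), which gives $E \approx 1/2$.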

> We'll never
> +automatically pick a length less than "7", which effectively hardcodes
> +2^12 as the minimum number of objects in a repository we'll consider
> +when choosing the abbreviation length.

This may be technically correct, but to me, it seems to place stress
on the wrong side of the equation.  Since nobody would find "Ah, so
I can create up to 2^12 objects without fearing that my abbreviated
object name would become longer than 7", I do not see much point in
saying "hardcoded floor for the number of objects".

On the other hand, saying that 7 is the hardcoded floor for the
abbreviation length does make sense, as those adept at math after
reading the paragraph up to this point would wonder why their tiny
repository still uses 7 hexdigits, which is way too many to ensure
the low collision rate for the size of their toy repository.

	We do not use abbreviations shorter than 7 hexdigits by default,
	so a small repository with fewer than 2^12 objects may have an even
	smaller chance than 50% of having a single collision.

may be an improvement.


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2019-02-04 16:12     ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Ævar Arnfjörð Bjarmason
  2019-02-04 19:13       ` Junio C Hamano
@ 2019-02-04 20:04       ` Junio C Hamano
  2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
  2019-02-04 23:32         ` Jeff King
  1 sibling, 2 replies; 111+ messages in thread
From: Junio C Hamano @ 2019-02-04 20:04 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Jeff King, Linus Torvalds

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> @@ -773,6 +773,14 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
>  					return 1;
>  				continue;
>  			}
> +			if (opt_with_value(arg, "--abbrev-len", &arg)) {
> +				unsigned long v;
> +				if (!git_parse_ulong(arg, &v))
> +					return 1;
> +				int len = abbrev_length_for_object_count(v);
> +				printf("%d\n", len);
> +				continue;
> +			}

Instead of exposing this pretty-much "test-only" feature as a new
option to t/helper/test-tool, I think it is OK, if not even better,
to have it in rev-parse proper like this patch does.

I however have a mildly strong suspicion that people would expect
"rev-parse --abbrev-len=<num>" to be a synonym of "--short=<num>".

As this is pretty-much a test-only option, perhaps going longer but
more descriptive would make sense?  

	git rev-parse --compute-abbrev-length-for <object-count>

may be an overkill, but something along those lines.

Oh by the way, the code has decl-after-stmt, and perhaps len needs
to be of type "const int" ;-)


^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2019-02-04 20:04       ` Junio C Hamano
@ 2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
  2019-02-04 23:32         ` Jeff King
  1 sibling, 0 replies; 111+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-04 21:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King, Linus Torvalds


On Mon, Feb 04 2019, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> @@ -773,6 +773,14 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
>>  					return 1;
>>  				continue;
>>  			}
>> +			if (opt_with_value(arg, "--abbrev-len", &arg)) {
>> +				unsigned long v;
>> +				if (!git_parse_ulong(arg, &v))
>> +					return 1;
>> +				int len = abbrev_length_for_object_count(v);
>> +				printf("%d\n", len);
>> +				continue;
>> +			}
>
> Instead of exposing this pretty-much "test-only" feature as a new
> option to t/helper/test-tool, I think it is OK, if not even better,
> to have it in rev-parse proper like this patch does.

While I mainly added this code so I could prove the docs correct with a
test for both myself & others, I think having this exposed is probably
useful.

I've seen more than once some feature of a web frontend for git where
there's both access to aggregate statistics (number of commits or
objects), and SHA-1 shortening going on, but the latter is just done via
substr().

Right now nothing directly answers "what length would git pick?". You
can of course e.g. "log --abbrev" a single commit, but if that commit
happens to be more ambiguous than most you won't get the right
answer.


> I however have a mildly strong suspicion that people would expect
> "rev-parse --abbrev-len=<num>" to be a synonym of "--short=<num>".
>
> As this is pretty-much a test-only option, perhaps going longer but
> more descriptive would make sense?
>
> 	git rev-parse --compute-abbrev-length-for <object-count>
>
> may be an overkill, but something along those lines.

Yeah I think that's better. This is so rare that it's better to be
verbose.

> Oh by the way, the code has decl-after-stmt, and perhaps len needs
> to be of type "const int" ;-)

Thanks. Will fix.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2019-02-04 20:04       ` Junio C Hamano
  2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
@ 2019-02-04 23:32         ` Jeff King
  2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 111+ messages in thread
From: Jeff King @ 2019-02-04 23:32 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Linus Torvalds

On Mon, Feb 04, 2019 at 12:04:21PM -0800, Junio C Hamano wrote:

> Instead of exposing this pretty-much "test-only" feature as a new
> option to t/helper/test-tool, I think it is OK, if not even better,
> to have it in rev-parse proper like this patch does.
> 
> I however have a mildly strong suspicion that people would expect
> "rev-parse --abbrev-len=<num>" to be a synonym of "--short=<num>".
> 
> As this is pretty-much a test-only option, perhaps going longer but
> more descriptive would make sense?  
> 
> 	git rev-parse --compute-abbrev-length-for <object-count>
> 
> may be an overkill, but something along those lines.

You could even default <object-count> to the number of objects in the
repository. Which implies that perhaps the best spot is the command
where we already count the number of objects, git-count-objects.

-Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2019-02-04 23:32         ` Jeff King
@ 2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
  2019-02-06 18:29             ` Jeff King
  0 siblings, 1 reply; 111+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-04 23:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Linus Torvalds


On Tue, Feb 05 2019, Jeff King wrote:

> On Mon, Feb 04, 2019 at 12:04:21PM -0800, Junio C Hamano wrote:
>
>> Instead of exposing this pretty-much "test-only" feature as a new
>> option to t/helper/test-tool, I think it is OK, if not even better,
>> to have it in rev-parse proper like this patch does.
>>
>> I however have a mildly strong suspicion that people would expect
>> "rev-parse --abbrev-len=<num>" to be a synonym of "--short=<num>".
>>
>> As this is pretty-much a test-only option, perhaps going longer but
>> more descriptive would make sense?
>>
>> 	git rev-parse --compute-abbrev-length-for <object-count>
>>
>> may be an overkill, but something along those lines.
>
> You could even default <object-count> to the number of objects in the
> repository. Which implies that perhaps the best spot is the command
> where we already count the number of objects, git-count-objects.

That's documented as reporting loose objects by default, although it has
a full report with -v.

Maybe rev-parse isn't the right place, I just picked it because it seems
to be the general utility belt for stuff that doesn't fit elsewhere.

But putting it in git-count-objects seems like a bit more of a stretch
given the above.

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
@ 2019-02-06 18:29             ` Jeff King
  2019-02-06 18:36               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 111+ messages in thread
From: Jeff King @ 2019-02-06 18:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Linus Torvalds

On Tue, Feb 05, 2019 at 12:50:23AM +0100, Ævar Arnfjörð Bjarmason wrote:

> >> As this is pretty-much a test-only option, perhaps going longer but
> >> more descriptive would make sense?
> >>
> >> 	git rev-parse --compute-abbrev-length-for <object-count>
> >>
> >> may be an overkill, but something along those lines.
> >
> > You could even default <object-count> to the number of objects in the
> > repository. Which implies that perhaps the best spot is the command
> > where we already count the number of objects, git-count-objects.
> 
> That's documented as reporting loose objects by default, although it has
> a full report with -v.

True, though I think that's mostly for historical reasons. It _could_ be
part of the full report, like:

  $ git count-objects -v
  ...
  abbrev-len: 12

but from your test-script usage, I'd expect you'd want to be able to
feed a fake count to it, like:

  git count-objects --compute-abbrev-len=1234

or something (of course you _could_ also make a repository with N
objects, but that's a lot more expensive).

> Maybe rev-parse isn't the right place, I just picked it because it seems
> to be the general utility belt for stuff that doesn't fit elsewhere.
> 
> But putting it in git-count-objects seems like a bit more of a stretch
> given the above.

I dunno. It seems like less of a stretch to me, but it is true that
rev-parse is already a kitchen sink repository. I can live with it
either way.

-Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

* Re: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
  2019-02-06 18:29             ` Jeff King
@ 2019-02-06 18:36               ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 111+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-02-06 18:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Linus Torvalds


On Wed, Feb 06 2019, Jeff King wrote:

> On Tue, Feb 05, 2019 at 12:50:23AM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> >> As this is pretty-much a test-only option, perhaps going longer but
>> >> more descriptive would make sense?
>> >>
>> >> 	git rev-parse --compute-abbrev-length-for <object-count>
>> >>
>> >> may be an overkill, but something along those lines.
>> >
>> > You could even default <object-count> to the number of objects in the
>> > repository. Which implies that perhaps the best spot is the command
>> > where we already count the number of objects, git-count-objects.
>>
>> That's documented as reporting loose objects by default, although it has
>> a full report with -v.
>
> True, though I think that's mostly for historical reasons. It _could_ be
> part of the full report, like:
>
>   $ git count-objects -v
>   ...
>   abbrev-len: 12
>
> but from your test-script usage, I'd expect you'd want to be able to
> feed a fake count to it, like:
>
>   git count-objects --compute-abbrev-len=1234

Yeah for just reporting it count-objects makes more sense. I think I'll
add it there...

> or something (of course you _could_ also make a repository with N
> objects, but that's a lot more expensive).

...but yes, for the test script & to export the info I'd like to have
the "what's the abbrev length for a repo with N objects" option, which
would be for rev-parse.

>> Maybe rev-parse isn't the right place, I just picked it because it seems
>> to be the general utility belt for stuff that doesn't fit elsewhere.
>>
>> But putting it in git-count-objects seems like a bit more of a stretch
>> given the above.
>
> I dunno. It seems like less of a stretch to me, but it is true that
> rev-parse is already a kitchen sink repository. I can live with it
> either way.
>
> -Peff

^ permalink raw reply	[flat|nested] 111+ messages in thread

end of thread, other threads:[~2019-02-06 18:36 UTC | newest]

Thread overview: 111+ messages (download: mbox.gz / follow: Atom feed)
2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
2016-09-26  3:46 ` Junio C Hamano
2016-09-26  4:34   ` Jeff King
2016-09-26  4:45     ` Junio C Hamano
2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
2016-09-26 16:37           ` Junio C Hamano
2016-09-26 17:21             ` Jeff King
2016-09-26 17:50               ` Junio C Hamano
2016-09-26 11:59         ` [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE Jeff King
2016-09-26 11:59         ` [PATCH 03/10] get_sha1: propagate flags to child functions Jeff King
2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
2016-09-26 12:11           ` Jeff King
2016-09-26 16:55           ` Junio C Hamano
2016-09-26 17:23             ` Jeff King
2016-09-26 12:00         ` [PATCH 05/10] get_short_sha1: refactor init of disambiguation code Jeff King
2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
2016-09-26 17:10           ` Junio C Hamano
2016-09-26 17:25             ` Jeff King
2016-09-26 17:36               ` Junio C Hamano
2016-09-26 12:00         ` [PATCH 07/10] get_short_sha1: mark ambiguity error for translation Jeff King
2016-09-26 12:00         ` [PATCH 08/10] sha1_array: let callbacks interrupt iteration Jeff King
2016-09-26 12:00         ` [PATCH 09/10] for_each_abbrev: drop duplicate objects Jeff King
2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
2016-09-26 16:36           ` Linus Torvalds
2016-09-27  5:42             ` Jacob Keller
2016-09-27 12:38             ` Jeff King
2016-09-29 13:01             ` Kyle J. McKay
2016-09-29 13:24               ` Jeff King
2016-09-29 14:36                 ` Kyle J. McKay
2016-09-29 14:55                   ` Jeff King
2016-09-26 17:30           ` Junio C Hamano
2016-09-26 17:34             ` Jeff King
2016-09-26 17:39               ` Junio C Hamano
2016-09-29 11:46           ` Kyle J. McKay
2016-09-29 13:03             ` Jeff King
2016-09-29 17:19               ` Junio C Hamano
2016-09-30  5:51                 ` Jacob Keller
2019-02-04 16:12     ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Ævar Arnfjörð Bjarmason
2019-02-04 19:13       ` Junio C Hamano
2019-02-04 20:04       ` Junio C Hamano
2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
2019-02-04 23:32         ` Jeff King
2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
2019-02-06 18:29             ` Jeff King
2019-02-06 18:36               ` Ævar Arnfjörð Bjarmason
2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
2016-09-26 12:09     ` Jeff King
2016-09-29 13:01   ` Kyle J. McKay
2016-09-26  7:13 ` Christian Couder
2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
2016-09-29  9:53     ` Jakub Narębski
2016-09-29 17:20       ` Junio C Hamano
2016-09-29 17:45         ` Matthieu Moy
2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
2016-09-29  9:01     ` Jeff King
2016-09-29 18:13       ` Junio C Hamano
2016-09-29 18:26         ` Jeff King
2016-09-29 18:57           ` Junio C Hamano
2016-09-29 19:18             ` Jeff King
2016-09-29 19:57               ` Junio C Hamano
2016-09-29 19:06           ` Junio C Hamano
2016-09-29 19:26             ` Jeff King
2016-09-29 21:03               ` Junio C Hamano
2016-09-29 21:08                 ` Jeff King
2016-09-28 23:30   ` [PATCH 3/4] worktree: honor configuration variables Junio C Hamano
2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
2016-09-29  2:44     ` SZEDER Gábor
2016-09-29  5:27       ` Lukas Fleischer
2016-09-29  9:22         ` Jeff King
2016-09-29  9:15       ` Jeff King
2016-09-29 10:03         ` Matthieu Moy
2016-09-29 12:52         ` SZEDER Gábor
2016-09-29  5:58     ` Johannes Sixt
2016-09-29 18:05       ` Junio C Hamano
2016-09-29 18:37         ` Linus Torvalds
2016-09-29 18:55           ` Linus Torvalds
2016-09-29 19:06             ` Linus Torvalds
2016-09-29 19:42               ` Junio C Hamano
2016-09-30  0:56               ` Mike Hommey
2016-09-30  1:01                 ` Linus Torvalds
2016-09-30 19:41                   ` Ævar Arnfjörð Bjarmason
2016-09-29 19:16             ` Jeff King
2016-09-29 19:40               ` Linus Torvalds
2016-09-29 19:45                 ` Junio C Hamano
2016-09-29 21:53                   ` Linus Torvalds
2016-09-29 23:13                     ` Junio C Hamano
2016-09-29 23:20                       ` Junio C Hamano
2016-09-30  0:20                       ` Linus Torvalds
2016-09-30  0:28                         ` Linus Torvalds
2016-09-30  0:57                           ` Linus Torvalds
2016-09-30  1:18                             ` Linus Torvalds
2016-09-30  3:54                               ` Junio C Hamano
2016-09-30  4:10                                 ` Junio C Hamano
2016-09-30  4:18                                   ` Linus Torvalds
2016-09-30  4:29                                     ` Linus Torvalds
2016-09-30  4:27                                   ` Junio C Hamano
2016-09-30  4:35                                     ` Junio C Hamano
2016-09-30 18:40                                     ` Junio C Hamano
2016-09-30 18:51                                       ` Linus Torvalds
2016-09-30 19:00                                         ` Junio C Hamano
2016-09-30  4:11                                 ` Linus Torvalds
2016-09-30  8:06                               ` Jeff King
2016-09-30 17:54                                 ` Linus Torvalds
2016-09-30 18:05                                   ` Jeff King
2016-09-30 18:21                                   ` Linus Torvalds
2016-09-30 20:01                                     ` Junio C Hamano
2016-09-30 17:56                                 ` Junio C Hamano
2016-09-30  7:47                       ` Jeff King
2016-09-29  9:25     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).