From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
"Linus Torvalds" <torvalds@linux-foundation.org>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
Date: Mon, 4 Feb 2019 17:12:17 +0100
Message-ID: <20190204161217.20047-1-avarab@gmail.com>
In-Reply-To: <20160926043442.3pz7ccawdcsn2kzb@sigill.intra.peff.net>
The algorithm we use to pick the default abbreviation length as a
function of the approximate number of objects is described in the
commit message for e6c587c733 ("abbrev: auto size the default
abbreviation", 2016-09-30), as well as in [1] and the replies to it,
but it hasn't been documented.
Let's do that, and while we're at it explicitly test the object counts
at which the current implementation will "roll over" to the next
length, up to 2^32-1 (the maximum portable "unsigned long" value).
1. https://public-inbox.org/git/20160926043442.3pz7ccawdcsn2kzb@sigill.intra.peff.net/
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
This is a patch from the middle of a series I'm currently working on
re-rolling. See
https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/
What I'd like to get here is commentary on the phrasing and accuracy
of the doc patch I'm adding here.
This patch assumes that we have an abbrev_length_for_object_count()
function, which I've added in an earlier unpublished patch. It just
exposes the length picking algorithm found in find_unique_abbrev_r().
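A rough standalone sketch of that length picking follows, for readers who don't have the earlier patch at hand. It is inferred from the documentation and test expectations in this patch, not copied from the series: sketch_msb(), sketch_abbrev_length() and SKETCH_FALLBACK_ABBREV are made-up names, and the real helper may well differ in detail.

```c
/* Assumed minimum length that is ever picked automatically. */
#define SKETCH_FALLBACK_ABBREV 7

/* 0-based index of the highest set bit of v; 0 when v is 0 or 1. */
static unsigned sketch_msb(unsigned long v)
{
	unsigned r = 0;

	while (v >>= 1)
		r++;
	return r;
}

/* Hypothetical stand-in for abbrev_length_for_object_count(). */
static int sketch_abbrev_length(unsigned long count)
{
	/* We have on the order of 2^bits objects. */
	int bits = (int)sketch_msb(count) + 1;
	/*
	 * Birthday bound: about 2^bits objects want a prefix of
	 * 2*bits bits. At 4 bits per hex digit that is bits/2
	 * digits, rounded up.
	 */
	int len = (bits + 1) / 2;

	/* Small repositories stay at the fallback length. */
	if (len < SKETCH_FALLBACK_ABBREV)
		len = SKETCH_FALLBACK_ABBREV;
	return len;
}
```

Under these assumptions sketch_abbrev_length(16383) is 7 and sketch_abbrev_length(16384) is 8, which lines up with the "14 = 7 8" row the new test expects.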
Documentation/config/core.txt | 17 +++++++
builtin/rev-parse.c | 8 ++++
t/t1512-rev-parse-disambiguation.sh | 74 +++++++++++++++++++++++++++++
3 files changed, 99 insertions(+)
diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 185857a13f..2175761833 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -599,6 +599,23 @@ core.abbrev::
abbreviated object names to stay unique for some time.
The minimum length is 4.
+
+The algorithm to pick the current abbreviation length is
+considered an implementation detail, and might be changed in the
+future. Since Git version 2.11, the length has been configured to
+auto-scale based on the estimated number of objects in the
+repository. We pick a length such that if all objects in the
+repository were abbreviated, we'd have a 50% chance of a *single*
+collision.
++
+For example, 2^14-1 is the last object count at which we'll pick
+a short length of "7", and we'll roll over to "8" once we have one more
+object at 2^14. Since each hexdigit we add (4 bits) allows us to have
+four times (2 bits) as many objects in the repository, we'll roll over
+to a length of "9" at 2^16 objects, "10" at 2^18 etc. We'll never
+automatically pick a length less than "7", which effectively hardcodes
+2^12 as the minimum number of objects in a repository we'll consider
+when choosing the abbreviation length.
++
This can also be set to relative values such as `+2` or `-2`, which
means to add or subtract N characters from the SHA-1 that Git would
otherwise print, this allows for producing more future-proof SHA-1s
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index d0d751a009..e7bf4375a2 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -773,6 +773,14 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
return 1;
continue;
}
+ if (opt_with_value(arg, "--abbrev-len", &arg)) {
+ unsigned long v;
+ if (!git_parse_ulong(arg, &v))
+ return 1;
+ int len = abbrev_length_for_object_count(v);
+ printf("%d\n", len);
+ continue;
+ }
if (!strcmp(arg, "--bisect")) {
for_each_fullref_in("refs/bisect/bad", show_reference, NULL, 0);
for_each_fullref_in("refs/bisect/good", anti_reference, NULL, 0);
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 265a6972fc..0e97888a44 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -450,4 +450,78 @@ test_expect_success C_LOCALE_OUTPUT 'ambiguous commits are printed by type first
done
'
+test_expect_success 'abbreviation length at 2^N-1 and 2^N' '
+ pow_2_min=$(git rev-parse --abbrev-len=3) &&
+ pow_2_eql=$(git rev-parse --abbrev-len=4) &&
+ pow_4_min=$(git rev-parse --abbrev-len=15) &&
+ pow_4_eql=$(git rev-parse --abbrev-len=16) &&
+ pow_6_min=$(git rev-parse --abbrev-len=63) &&
+ pow_6_eql=$(git rev-parse --abbrev-len=64) &&
+ pow_8_min=$(git rev-parse --abbrev-len=255) &&
+ pow_8_eql=$(git rev-parse --abbrev-len=256) &&
+ pow_10_min=$(git rev-parse --abbrev-len=1023) &&
+ pow_10_eql=$(git rev-parse --abbrev-len=1024) &&
+ pow_12_min=$(git rev-parse --abbrev-len=4095) &&
+ pow_12_eql=$(git rev-parse --abbrev-len=4096) &&
+ pow_14_min=$(git rev-parse --abbrev-len=16383) &&
+ pow_14_eql=$(git rev-parse --abbrev-len=16384) &&
+ pow_16_min=$(git rev-parse --abbrev-len=65535) &&
+ pow_16_eql=$(git rev-parse --abbrev-len=65536) &&
+ pow_18_min=$(git rev-parse --abbrev-len=262143) &&
+ pow_18_eql=$(git rev-parse --abbrev-len=262144) &&
+ pow_20_min=$(git rev-parse --abbrev-len=1048575) &&
+ pow_20_eql=$(git rev-parse --abbrev-len=1048576) &&
+ pow_22_min=$(git rev-parse --abbrev-len=4194303) &&
+ pow_22_eql=$(git rev-parse --abbrev-len=4194304) &&
+ pow_24_min=$(git rev-parse --abbrev-len=16777215) &&
+ pow_24_eql=$(git rev-parse --abbrev-len=16777216) &&
+ pow_26_min=$(git rev-parse --abbrev-len=67108863) &&
+ pow_26_eql=$(git rev-parse --abbrev-len=67108864) &&
+ pow_28_min=$(git rev-parse --abbrev-len=268435455) &&
+ pow_28_eql=$(git rev-parse --abbrev-len=268435456) &&
+ pow_30_min=$(git rev-parse --abbrev-len=1073741823) &&
+ pow_30_eql=$(git rev-parse --abbrev-len=1073741824) &&
+ pow_32_min=$(git rev-parse --abbrev-len=4294967295) &&
+
+ cat >actual <<-EOF &&
+ 2 = $pow_2_min $pow_2_eql
+ 4 = $pow_4_min $pow_4_eql
+ 6 = $pow_6_min $pow_6_eql
+ 8 = $pow_8_min $pow_8_eql
+ 10 = $pow_10_min $pow_10_eql
+ 12 = $pow_12_min $pow_12_eql
+ 14 = $pow_14_min $pow_14_eql
+ 16 = $pow_16_min $pow_16_eql
+ 18 = $pow_18_min $pow_18_eql
+ 20 = $pow_20_min $pow_20_eql
+ 22 = $pow_22_min $pow_22_eql
+ 24 = $pow_24_min $pow_24_eql
+ 26 = $pow_26_min $pow_26_eql
+ 28 = $pow_28_min $pow_28_eql
+ 30 = $pow_30_min $pow_30_eql
+ 32 = $pow_32_min
+ EOF
+
+ cat >expected <<-\EOF &&
+ 2 = 7 7
+ 4 = 7 7
+ 6 = 7 7
+ 8 = 7 7
+ 10 = 7 7
+ 12 = 7 7
+ 14 = 7 8
+ 16 = 8 9
+ 18 = 9 10
+ 20 = 10 11
+ 22 = 11 12
+ 24 = 12 13
+ 26 = 13 14
+ 28 = 14 15
+ 30 = 15 16
+ 32 = 16
+ EOF
+
+ test_cmp expected actual
+'
+
test_done
--
2.20.1.611.gfbb209baf1