git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [RFC/PATCH] core.abbrev doc: document and test the abbreviation length
Date: Mon,  4 Feb 2019 17:12:17 +0100	[thread overview]
Message-ID: <20190204161217.20047-1-avarab@gmail.com> (raw)
In-Reply-To: <20160926043442.3pz7ccawdcsn2kzb@sigill.intra.peff.net>

The algorithm we use to pick the default abbreviation length as a
function of the approximate number of objects is described in the
commit message for e6c587c733 ("abbrev: auto size the default
abbreviation", 2016-09-30), as well as in and downthread of [1], but
it hasn't been documented.

Let's do that, and while we're at it explicitly test for when the
current implementation will "roll over" up to values of 2^32-1 (the
maximum portable "unsigned long" value).

1. https://public-inbox.org/git/20160926043442.3pz7ccawdcsn2kzb@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
This is a patch from the middle of a series I'm currently working on
re-rolling. See
https://public-inbox.org/git/20180608224136.20220-1-avarab@gmail.com/

What I'd like to get here is commentary on the phrasing and accuracy
of the doc patch I'm adding here.

This patch assumes that we have a abbrev_length_for_object_count()
function, which I've added in an eariler unpublished patch. It just
exposes the length picking algorithm found in find_unique_abbrev_r().

 Documentation/config/core.txt       | 17 +++++++
 builtin/rev-parse.c                 |  8 ++++
 t/t1512-rev-parse-disambiguation.sh | 74 +++++++++++++++++++++++++++++
 3 files changed, 99 insertions(+)

diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
index 185857a13f..2175761833 100644
--- a/Documentation/config/core.txt
+++ b/Documentation/config/core.txt
@@ -599,6 +599,23 @@ core.abbrev::
 	abbreviated object names to stay unique for some time.
 	The minimum length is 4.
 +
+The algorithm to pick the the current abbreviation length is
+considered an implementation detail, and might be changed in the
+future. Since Git version 2.11, the length has been configured to
+auto-scale based on the estimated number of objects in the
+repository. We pick a length such that if all objects in the
+repository were abbreviated, we'd have a 50% chance of a *single*
+collision.
++
+For example, with 2^14-1 is the last object count at which we'll pick
+a short length of "7", and will roll over to "8" once we have one more
+object at 2^14. Since each hexdigit we add (4 bits) allows us to have
+four times (2 bits) as many objects in the repository, we'll roll over
+to a length of "9" at 2^16 objects, "10" at 2^18 etc. We'll never
+automatically pick a length less than "7", which effectively hardcodes
+2^12 as the minimum number of objects in a repository we'll consider
+when choosing the abbreviation length.
++
 This can also be set to relative values such as `+2` or `-2`, which
 means to add or subtract N characters from the SHA-1 that Git would
 otherwise print, this allows for producing more future-proof SHA-1s
diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index d0d751a009..e7bf4375a2 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -773,6 +773,14 @@ int cmd_rev_parse(int argc, const char **argv, const char *prefix)
 					return 1;
 				continue;
 			}
+			if (opt_with_value(arg, "--abbrev-len", &arg)) {
+				unsigned long v;
+				if (!git_parse_ulong(arg, &v))
+					return 1;
+				int len = abbrev_length_for_object_count(v);
+				printf("%d\n", len);
+				continue;
+			}
 			if (!strcmp(arg, "--bisect")) {
 				for_each_fullref_in("refs/bisect/bad", show_reference, NULL, 0);
 				for_each_fullref_in("refs/bisect/good", anti_reference, NULL, 0);
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 265a6972fc..0e97888a44 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -450,4 +450,78 @@ test_expect_success C_LOCALE_OUTPUT 'ambiguous commits are printed by type first
 	done
 '
 
+test_expect_success 'abbreviation length at 2^N-1 and 2^N' '
+	pow_2_min=$(git rev-parse --abbrev-len=3) &&
+	pow_2_eql=$(git rev-parse --abbrev-len=4) &&
+	pow_4_min=$(git rev-parse --abbrev-len=15) &&
+	pow_4_eql=$(git rev-parse --abbrev-len=16) &&
+	pow_6_min=$(git rev-parse --abbrev-len=63) &&
+	pow_6_eql=$(git rev-parse --abbrev-len=64) &&
+	pow_8_min=$(git rev-parse --abbrev-len=255) &&
+	pow_8_eql=$(git rev-parse --abbrev-len=256) &&
+	pow_10_min=$(git rev-parse --abbrev-len=1023) &&
+	pow_10_eql=$(git rev-parse --abbrev-len=1024) &&
+	pow_12_min=$(git rev-parse --abbrev-len=4095) &&
+	pow_12_eql=$(git rev-parse --abbrev-len=4096) &&
+	pow_14_min=$(git rev-parse --abbrev-len=16383) &&
+	pow_14_eql=$(git rev-parse --abbrev-len=16384) &&
+	pow_16_min=$(git rev-parse --abbrev-len=65535) &&
+	pow_16_eql=$(git rev-parse --abbrev-len=65536) &&
+	pow_18_min=$(git rev-parse --abbrev-len=262143) &&
+	pow_18_eql=$(git rev-parse --abbrev-len=262144) &&
+	pow_20_min=$(git rev-parse --abbrev-len=1048575) &&
+	pow_20_eql=$(git rev-parse --abbrev-len=1048576) &&
+	pow_22_min=$(git rev-parse --abbrev-len=4194303) &&
+	pow_22_eql=$(git rev-parse --abbrev-len=4194304) &&
+	pow_24_min=$(git rev-parse --abbrev-len=16777215) &&
+	pow_24_eql=$(git rev-parse --abbrev-len=16777216) &&
+	pow_26_min=$(git rev-parse --abbrev-len=67108863) &&
+	pow_26_eql=$(git rev-parse --abbrev-len=67108864) &&
+	pow_28_min=$(git rev-parse --abbrev-len=268435455) &&
+	pow_28_eql=$(git rev-parse --abbrev-len=268435456) &&
+	pow_30_min=$(git rev-parse --abbrev-len=1073741823) &&
+	pow_30_eql=$(git rev-parse --abbrev-len=1073741824) &&
+	pow_32_min=$(git rev-parse --abbrev-len=4294967295) &&
+
+	cat >actual <<-EOF &&
+	2 = $pow_2_min $pow_2_eql
+	4 = $pow_4_min $pow_4_eql
+	6 = $pow_6_min $pow_6_eql
+	8 = $pow_8_min $pow_8_eql
+	10 = $pow_10_min $pow_10_eql
+	12 = $pow_12_min $pow_12_eql
+	14 = $pow_14_min $pow_14_eql
+	16 = $pow_16_min $pow_16_eql
+	18 = $pow_18_min $pow_18_eql
+	20 = $pow_20_min $pow_20_eql
+	22 = $pow_22_min $pow_22_eql
+	24 = $pow_24_min $pow_24_eql
+	26 = $pow_26_min $pow_26_eql
+	28 = $pow_28_min $pow_28_eql
+	30 = $pow_30_min $pow_30_eql
+	32 = 16
+	EOF
+
+	cat >expected <<-\EOF &&
+	2 = 7 7
+	4 = 7 7
+	6 = 7 7
+	8 = 7 7
+	10 = 7 7
+	12 = 7 7
+	14 = 7 8
+	16 = 8 9
+	18 = 9 10
+	20 = 10 11
+	22 = 11 12
+	24 = 12 13
+	26 = 13 14
+	28 = 14 15
+	30 = 15 16
+	32 = 16
+	EOF
+
+	test_cmp expected actual
+'
+
 test_done
-- 
2.20.1.611.gfbb209baf1


  parent reply	other threads:[~2019-02-04 16:12 UTC|newest]

Thread overview: 111+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-09-26  1:39 Changing the default for "core.abbrev"? Linus Torvalds
2016-09-26  3:46 ` Junio C Hamano
2016-09-26  4:34   ` Jeff King
2016-09-26  4:45     ` Junio C Hamano
2016-09-26 11:57       ` [PATCH 0/10] helping people resolve ambiguous sha1s Jeff King
2016-09-26 11:59         ` [PATCH 01/10] get_sha1: detect buggy calls with multiple disambiguators Jeff King
2016-09-26 16:37           ` Junio C Hamano
2016-09-26 17:21             ` Jeff King
2016-09-26 17:50               ` Junio C Hamano
2016-09-26 11:59         ` [PATCH 02/10] get_sha1: avoid repeating ourselves via ONLY_TO_DIE Jeff King
2016-09-26 11:59         ` [PATCH 03/10] get_sha1: propagate flags to child functions Jeff King
2016-09-26 11:59         ` [PATCH 04/10] get_short_sha1: peel tags when looking for treeish Jeff King
2016-09-26 12:11           ` Jeff King
2016-09-26 16:55           ` Junio C Hamano
2016-09-26 17:23             ` Jeff King
2016-09-26 12:00         ` [PATCH 05/10] get_short_sha1: refactor init of disambiguation code Jeff King
2016-09-26 12:00         ` [PATCH 06/10] get_short_sha1: NUL-terminate hex prefix Jeff King
2016-09-26 17:10           ` Junio C Hamano
2016-09-26 17:25             ` Jeff King
2016-09-26 17:36               ` Junio C Hamano
2016-09-26 12:00         ` [PATCH 07/10] get_short_sha1: mark ambiguity error for translation Jeff King
2016-09-26 12:00         ` [PATCH 08/10] sha1_array: let callbacks interrupt iteration Jeff King
2016-09-26 12:00         ` [PATCH 09/10] for_each_abbrev: drop duplicate objects Jeff King
2016-09-26 12:00         ` [PATCH 10/10] get_short_sha1: list ambiguous objects on error Jeff King
2016-09-26 16:36           ` Linus Torvalds
2016-09-27  5:42             ` Jacob Keller
2016-09-27 12:38             ` Jeff King
2016-09-29 13:01             ` Kyle J. McKay
2016-09-29 13:24               ` Jeff King
2016-09-29 14:36                 ` Kyle J. McKay
2016-09-29 14:55                   ` Jeff King
2016-09-26 17:30           ` Junio C Hamano
2016-09-26 17:34             ` Jeff King
2016-09-26 17:39               ` Junio C Hamano
2016-09-29 11:46           ` Kyle J. McKay
2016-09-29 13:03             ` Jeff King
2016-09-29 17:19               ` Junio C Hamano
2016-09-30  5:51                 ` Jacob Keller
2019-02-04 16:12     ` Ævar Arnfjörð Bjarmason [this message]
2019-02-04 19:13       ` [RFC/PATCH] core.abbrev doc: document and test the abbreviation length Junio C Hamano
2019-02-04 20:04       ` Junio C Hamano
2019-02-04 21:36         ` Ævar Arnfjörð Bjarmason
2019-02-04 23:32         ` Jeff King
2019-02-04 23:50           ` Ævar Arnfjörð Bjarmason
2019-02-06 18:29             ` Jeff King
2019-02-06 18:36               ` Ævar Arnfjörð Bjarmason
2016-09-26  6:33   ` Changing the default for "core.abbrev"? Matthieu Moy
2016-09-26 12:09     ` Jeff King
2016-09-29 13:01   ` Kyle J. McKay
2016-09-26  7:13 ` Christian Couder
2016-09-28 23:30 ` [PATCH 0/4] raising core.abbrev default to 12 hexdigits Junio C Hamano
2016-09-28 23:30   ` [PATCH 1/4] config: allow customizing /etc/gitconfig location Junio C Hamano
2016-09-29  9:53     ` Jakub Narębski
2016-09-29 17:20       ` Junio C Hamano
2016-09-29 17:45         ` Matthieu Moy
2016-09-28 23:30   ` [PATCH 2/4] t13xx: do not assume system config is empty Junio C Hamano
2016-09-29  9:01     ` Jeff King
2016-09-29 18:13       ` Junio C Hamano
2016-09-29 18:26         ` Jeff King
2016-09-29 18:57           ` Junio C Hamano
2016-09-29 19:18             ` Jeff King
2016-09-29 19:57               ` Junio C Hamano
2016-09-29 19:06           ` Junio C Hamano
2016-09-29 19:26             ` Jeff King
2016-09-29 21:03               ` Junio C Hamano
2016-09-29 21:08                 ` Jeff King
2016-09-28 23:30   ` [PATCH 3/4] worktree: honor configuration variables Junio C Hamano
2016-09-28 23:30   ` [PATCH 4/4] core.abbrev: raise the default abbreviation to 12 hexdigits Junio C Hamano
2016-09-29  2:44     ` SZEDER Gábor
2016-09-29  5:27       ` Lukas Fleischer
2016-09-29  9:22         ` Jeff King
2016-09-29  9:15       ` Jeff King
2016-09-29 10:03         ` Matthieu Moy
2016-09-29 12:52         ` SZEDER Gábor
2016-09-29  5:58     ` Johannes Sixt
2016-09-29 18:05       ` Junio C Hamano
2016-09-29 18:37         ` Linus Torvalds
2016-09-29 18:55           ` Linus Torvalds
2016-09-29 19:06             ` Linus Torvalds
2016-09-29 19:42               ` Junio C Hamano
2016-09-30  0:56               ` Mike Hommey
2016-09-30  1:01                 ` Linus Torvalds
2016-09-30 19:41                   ` Ævar Arnfjörð Bjarmason
2016-09-29 19:16             ` Jeff King
2016-09-29 19:40               ` Linus Torvalds
2016-09-29 19:45                 ` Junio C Hamano
2016-09-29 21:53                   ` Linus Torvalds
2016-09-29 23:13                     ` Junio C Hamano
2016-09-29 23:20                       ` Junio C Hamano
2016-09-30  0:20                       ` Linus Torvalds
2016-09-30  0:28                         ` Linus Torvalds
2016-09-30  0:57                           ` Linus Torvalds
2016-09-30  1:18                             ` Linus Torvalds
2016-09-30  3:54                               ` Junio C Hamano
2016-09-30  4:10                                 ` Junio C Hamano
2016-09-30  4:18                                   ` Linus Torvalds
2016-09-30  4:29                                     ` Linus Torvalds
2016-09-30  4:27                                   ` Junio C Hamano
2016-09-30  4:35                                     ` Junio C Hamano
2016-09-30 18:40                                     ` Junio C Hamano
2016-09-30 18:51                                       ` Linus Torvalds
2016-09-30 19:00                                         ` Junio C Hamano
2016-09-30  4:11                                 ` Linus Torvalds
2016-09-30  8:06                               ` Jeff King
2016-09-30 17:54                                 ` Linus Torvalds
2016-09-30 18:05                                   ` Jeff King
2016-09-30 18:21                                   ` Linus Torvalds
2016-09-30 20:01                                     ` Junio C Hamano
2016-09-30 17:56                                 ` Junio C Hamano
2016-09-30  7:47                       ` Jeff King
2016-09-29  9:25     ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190204161217.20047-1-avarab@gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).