[PATCH 00/20] unconditional O(1) SHA-1 abbreviation

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH 00/20] unconditional O(1) SHA-1 abbreviation
@ 2018-06-08 22:41 Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 01/20] t/README: clarify the description of test_line_count Ævar Arnfjörð Bjarmason
                   ` (19 more replies)
  0 siblings, 20 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

This patch series implements an entirely alternate method of achieving
some of the same ends as the MIDX, using the approach suggested by
Jeff King from the RFC thread back in January[1]. You can now do:

    core.abbrev = 20
    core.validateAbbrev = false

Or:

    core.abbrev = +2
    core.validateAbbrev = false

On linux.git `git log --oneline --raw --parents` with 64MB packs gives
this improvement with core.abbrev=15 & core.validateAbbrev=false
(v.s. true):

    Test                                        HEAD~               HEAD
    ----------------------------------------------------------------------------------------
    0014.1: git log --oneline --raw --parents   95.68(95.07+0.53)   42.74(42.33+0.39) -55.3%

While cleaning up the RFC version of this which I sent in [2] I
discovered that almost none of the existing functionality was tested
for, and was very inconsistent since we have 4 different places where
the abbrev config is parsed.

See 19/20 and 20/20 for what this whole thing is building towards, the
rest is all tests, cleanup, and preparatory work.

(There's still other long-standing issues with the existing behavior
which this doesn't change, but I had to stop somewhere to make this
digestible).

1. https://public-inbox.org/git/20180108102029.GA21232@sigill.intra.peff.net/
2. https://public-inbox.org/git/20180606102719.27145-1-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (20):
  t/README: clarify the description of test_line_count
  test library: add a test_byte_count
  blame doc: explicitly note how --abbrev=40 gives 39 chars
  abbrev tests: add tests for core.abbrev and --abbrev
  abbrev tests: test "git-blame" behavior
  blame: fix a bug, core.abbrev should work like --abbrev
  abbrev tests: test "git branch" behavior
  abbrev tests: test for "git-describe" behavior
  abbrev tests: test for "git-log" behavior
  abbrev tests: test for "git-diff" behavior
  abbrev tests: test for plumbing behavior
  abbrev tests: test for --abbrev and core.abbrev=[+-]N
  parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ
  config.c: use braces on multiple conditional arms
  parse-options-cb.c: use braces on multiple conditional arms
  abbrev: unify the handling of non-numeric values
  abbrev: unify the handling of empty values
  abbrev parsing: use braces on multiple conditional arms
  abbrev: support relative abbrev values
  abbrev: add a core.validateAbbrev setting

 Documentation/config.txt            |  49 +++
 Documentation/diff-options.txt      |   3 +
 Documentation/git-blame.txt         |  14 +
 Documentation/git-branch.txt        |   3 +
 Documentation/git-describe.txt      |   3 +
 Documentation/git-ls-files.txt      |   3 +
 Documentation/git-ls-tree.txt       |   3 +
 Documentation/git-show-ref.txt      |   3 +
 builtin/blame.c                     |   2 +
 cache.h                             |   2 +
 config.c                            |  22 +-
 diff.c                              |  24 +-
 environment.c                       |   2 +
 parse-options-cb.c                  |  19 +-
 revision.c                          |  24 +-
 sha1-name.c                         |  15 +
 t/README                            |   6 +-
 t/perf/p0014-abbrev.sh              |  13 +
 t/t0014-abbrev.sh                   | 452 ++++++++++++++++++++++++++++
 t/t1512-rev-parse-disambiguation.sh |  47 +++
 t/t6006-rev-list-format.sh          |   6 +-
 t/test-lib-functions.sh             |  23 ++
 22 files changed, 722 insertions(+), 16 deletions(-)
 create mode 100755 t/perf/p0014-abbrev.sh
 create mode 100755 t/t0014-abbrev.sh

-- 
2.17.0.290.gded63e768a


^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH 01/20] t/README: clarify the description of test_line_count
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 02/20] test library: add a test_byte_count Ævar Arnfjörð Bjarmason
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Referring to the "length" of a file is ambiguous. We mean the number
of lines, so let's say that.

Changes the description added in fb3340a6a7 ("test-lib: introduce
test_line_count to measure files", 2010-10-31).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/README | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/README b/t/README
index 8373a27fea..eede11d649 100644
--- a/t/README
+++ b/t/README
@@ -730,7 +730,7 @@ library for your script to use.
 
  - test_line_count (= | -lt | -ge | ...) <length> <file>
 
-   Check whether a file has the length it is expected to.
+   Check whether a file has the number of lines it is expected to.
 
  - test_path_is_file <path> [<diagnosis>]
    test_path_is_dir <path> [<diagnosis>]
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 02/20] test library: add a test_byte_count
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 01/20] t/README: clarify the description of test_line_count Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars Ævar Arnfjörð Bjarmason
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

This new function is like test_line_count except it uses "wc -c"
instead of "wc -l". Perhaps this should be a parameter, but I don't
see us needing "wc -m" (chars), "wc -w" (words) etc.

Change a couple of existing tests that use this, I expect to use this
in later patches when adding more tests.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/README                   |  4 ++++
 t/t6006-rev-list-format.sh |  6 ++----
 t/test-lib-functions.sh    | 23 +++++++++++++++++++++++
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/t/README b/t/README
index eede11d649..3139c27e4e 100644
--- a/t/README
+++ b/t/README
@@ -728,6 +728,10 @@ library for your script to use.
    Check whether the <expected> rev points to the same commit as the
    <actual> rev.
 
+ - test_byte_count (= | -lt | -ge | ...) <length> <file>
+
+   Check whether a file has the number of bytes it is expected to.
+
  - test_line_count (= | -lt | -ge | ...) <length> <file>
 
    Check whether a file has the number of lines it is expected to.
diff --git a/t/t6006-rev-list-format.sh b/t/t6006-rev-list-format.sh
index ec42c2f779..ec068c55ab 100755
--- a/t/t6006-rev-list-format.sh
+++ b/t/t6006-rev-list-format.sh
@@ -456,14 +456,12 @@ test_expect_success '--abbrev' '
 
 test_expect_success '%H is not affected by --abbrev-commit' '
 	git log -1 --format=%H --abbrev-commit --abbrev=20 HEAD >actual &&
-	len=$(wc -c <actual) &&
-	test $len = 41
+	test_byte_count = 41 actual
 '
 
 test_expect_success '%h is not affected by --abbrev-commit' '
 	git log -1 --format=%h --abbrev-commit --abbrev=20 HEAD >actual &&
-	len=$(wc -c <actual) &&
-	test $len = 21
+	test_byte_count = 21 actual
 '
 
 test_expect_success '"%h %gD: %gs" is same as git-reflog' '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 2b2181dca0..91a566f14e 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -589,6 +589,29 @@ test_path_is_missing () {
 	fi
 }
 
+# test_byte_count checks that a file has the number of bytes it
+# ought to. For example:
+#
+#	test_expect_success 'produce exactly one byte of output' '
+#		do something >output &&
+#		test_byte_count = 1 output
+#	'
+#
+# is like "test $(wc -c <output) = 1" except that it passes the
+# output through when the number of bytes is wrong.
+
+test_byte_count () {
+	if test $# != 3
+	then
+		error "bug in the test script: not 3 parameters to test_byte_count"
+	elif ! test $(wc -c <"$3") "$1" "$2"
+	then
+		echo "test_byte_count: byte count for $3 !$1 $2"
+		cat "$3"
+		return 1
+	fi
+}
+
 # test_line_count checks that a file has the number of lines it
 # ought to. For example:
 #
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 01/20] t/README: clarify the description of test_line_count Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 02/20] test library: add a test_byte_count Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-12 18:10   ` Junio C Hamano
  2018-06-08 22:41 ` [PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev Ævar Arnfjörð Bjarmason
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

In a later change I'm adding stress testing of the commit abbreviation
as it relates to git-blame and others, and initially thought that the
inability to extract full SHA-1s from the non-"--porcelain" output was
a bug.

In hindsight I could have read the existing paragraph more carefully,
but let's make this clearer by explicitly stating this limitation of
--abbrev as it relates to git-blame, it is not shared by any other
command that supports core.abbrev or --abbrev.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-blame.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index 16323eb80e..7b562494ac 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -88,6 +88,11 @@ include::blame-options.txt[]
 	Instead of using the default 7+1 hexadecimal digits as the
 	abbreviated object name, use <n>+1 digits. Note that 1 column
 	is used for a caret to mark the boundary commit.
++
+Because of this UI design, the only way to get the full SHA-1 of the
+boundary commit is to use the `--porcelain` format. With `--abbrev=40`
+only 39 characters of the boundary SHA-1 will be emitted, since one
+will be used for the caret to mark the boundary.
 
 
 THE PORCELAIN FORMAT
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-12 18:31   ` Junio C Hamano
  2018-06-08 22:41 ` [PATCH 05/20] abbrev tests: test "git-blame" behavior Ævar Arnfjörð Bjarmason
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

How hashes are abbreviated is a core feature of git, and should have
its own t0*.sh tests. There's a few tests for core.abbrev and --abbrev
already, but none of those stress the feature itself and its edge
cases much. We should have tests for those in one place.

I don't like some of this behavior of --abbrev being looser about
input values that core.abbrev, But let's start by asserting the
current behavior we have before we change any of it.

That difference in behavior wasn't intentional. The code that does the
command-line parsing was initially added in 0ce865b134 ("Add shortcuts
for very often used options.", 2007-10-14), and it wasn't until much
later in dce9648916 ("Make the default abbrev length configurable",
2010-10-28) when core.abbrev was added with stricter parsing.

That's also only most of the command-line parsing. The diff and log
family of commands have their own parsing for it in diff.c and
revision.c, respectively. Those were added earlier in
47dd0d595d ("diff: --abbrev option", 2005-12-13) and 508d9e372e ("Fix
"--abbrev=xyz" for revision listing", 2006-05-27), although note that
sometimes diff goes via the revision.c path.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 118 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 118 insertions(+)
 create mode 100755 t/t0014-abbrev.sh

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
new file mode 100755
index 0000000000..1c60f5ff93
--- /dev/null
+++ b/t/t0014-abbrev.sh
@@ -0,0 +1,118 @@
+#!/bin/sh
+
+test_description='test core.abbrev and related features'
+
+. ./test-lib.sh
+
+tr_d_n() {
+	tr -d '\n'
+}
+
+cut_tr_d_n_field_n() {
+	cut -d " " -f $1 | tr_d_n
+}
+
+test_expect_success 'setup' '
+	test_commit A &&
+	git tag -a -mannotated A.annotated &&
+	test_commit B &&
+	test_commit C &&
+	mkdir X Y &&
+	touch X/file1 Y/file2
+'
+
+test_expect_success 'the FALLBACK_DEFAULT_ABBREV is 7' '
+	git log -1 --pretty=format:%h >log &&
+	test_byte_count = 7 log
+'
+
+test_expect_success 'abbrev empty value handling differs ' '
+	test_must_fail git -c core.abbrev= log -1 --pretty=format:%h 2>stderr &&
+	test_i18ngrep "bad numeric config value.*invalid unit" stderr &&
+
+	git branch -v --abbrev= | cut_tr_d_n_field_n 3 >branch &&
+	test_byte_count = 40 branch &&
+
+	git log --abbrev= -1 --pretty=format:%h >log &&
+	test_byte_count = 4 log &&
+
+	git diff --raw --abbrev= HEAD~ >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 4 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 4 diff.4 &&
+
+	test_must_fail git diff --raw --abbrev= --no-index X Y >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 4 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 4 diff.4
+'
+
+test_expect_success 'abbrev non-integer value handling differs ' '
+	test_must_fail git -c core.abbrev=XYZ log -1 --pretty=format:%h 2>stderr &&
+	test_i18ngrep "bad numeric config value.*invalid unit" stderr &&
+
+	test_must_fail git branch -v --abbrev=XYZ 2>stderr &&
+	test_i18ngrep "expects a numerical value" stderr &&
+
+	git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr &&
+	! test -s stderr &&
+
+	git diff --raw --abbrev=XYZ HEAD~ 2>stderr &&
+	! test -s stderr &&
+
+	test_must_fail git diff --raw --abbrev=XYZ --no-index X Y 2>stderr &&
+	! test -s stderr
+'
+
+for i in -41 -20 -10 -1 0 1 2 3 41
+do
+	test_expect_success "core.abbrev value $i out of range errors out" "
+		test_must_fail git -c core.abbrev=$i log -1 --pretty=format:%h 2>stderr &&
+		test_i18ngrep 'abbrev length out of range' stderr
+	"
+done
+
+for i in -41 -20 -10 -1
+do
+	test_expect_success "negative --abbrev=$i value out of range means --abbrev=40" "
+		git log --abbrev=$i -1 --pretty=format:%h >log &&
+		test_byte_count = 40 log
+	"
+done
+
+for i in 0 1 2 3 4
+do
+	test_expect_success "non-negative --abbrev=$i value <MINIMUM_ABBREV falls back on MINIMUM_ABBREV" "
+		git log --abbrev=$i -1 --pretty=format:%h >log &&
+		test_byte_count = 4 log
+	"
+done
+
+for i in 41 9001
+do
+	test_expect_success "non-negative --abbrev=$i value >MINIMUM_ABBREV falls back on 40" "
+		git log --abbrev=$i -1 --pretty=format:%h >log &&
+		test_byte_count = 40 log
+	"
+done
+
+for i in $(test_seq 4 40)
+do
+	test_expect_success "core.abbrev=$i and --abbrev=$i in combination within the valid range" "
+		# Both core.abbrev=X and --abbrev=X do the same thing
+		# in isolation
+		git -c core.abbrev=$i log -1 --pretty=format:%h >log &&
+		test_byte_count = $i log &&
+		git log --abbrev=$i -1 --pretty=format:%h >log &&
+		test_byte_count = $i log &&
+
+		# The --abbrev option should take priority over
+		# core.abbrev
+		git -c core.abbrev=20 log --abbrev=$i -1 --pretty=format:%h >log &&
+		test_byte_count = $i log
+	"
+done
+
+test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 05/20] abbrev tests: test "git-blame" behavior
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 06/20] blame: fix a bug, core.abbrev should work like --abbrev Ævar Arnfjörð Bjarmason
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Add tests showing how "git-blame" behaves. As noted in an earlier
change there's a behavior difference between core.abbrev=40 and
--abbrev=40.

Let's also assert that neither way of changing the abbreviation length
modifies the porcelain output.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 48 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 1c60f5ff93..77f15d5b0b 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -12,6 +12,10 @@ cut_tr_d_n_field_n() {
 	cut -d " " -f $1 | tr_d_n
 }
 
+nocaret() {
+	sed 's/\^//'
+}
+
 test_expect_success 'setup' '
 	test_commit A &&
 	git tag -a -mannotated A.annotated &&
@@ -115,4 +119,48 @@ do
 	"
 done
 
+for i in $(test_seq 4 40)
+do
+	for opt in --porcelain --line-porcelain
+	do
+		test_expect_success "blame $opt ignores core.abbrev=$i and --abbrev=$i" "
+			git -c core.abbrev=$i blame $opt A.t | head -n 1 | cut_tr_d_n_field_n 1 >blame &&
+			test_byte_count = 40 blame &&
+			git blame $opt --abbrev=$i A.t | head -n 1 | cut_tr_d_n_field_n 1 >blame &&
+			test_byte_count = 40 blame
+		"
+	done
+
+
+	test_expect_success "blame core.abbrev=$i and --abbrev=$i with boundary" "
+		# See the blame documentation for why this is off-by-one
+		git -c core.abbrev=$i blame A.t | cut_tr_d_n_field_n 1 | nocaret >blame &&
+		test_byte_count = $i blame &&
+		git blame --abbrev=$i A.t | cut_tr_d_n_field_n 1 | nocaret >blame &&
+		if test $i -eq 40
+		then
+			test_byte_count = 39 blame
+		else
+			test_byte_count = $i blame
+		fi
+	"
+
+	test_expect_success "blame core.abbrev=$i and --abbrev=$i without boundary" "
+		git -c core.abbrev=$i blame B.t | cut_tr_d_n_field_n 1 | nocaret >blame &&
+		if test $i -eq 40
+		then
+			test_byte_count = $i blame
+		else
+			test_byte_count = \$(($i + 1)) blame
+		fi &&
+		git blame --abbrev=$i B.t | cut_tr_d_n_field_n 1 | nocaret >blame &&
+		if test $i -eq 40
+		then
+			test_byte_count = $i blame
+		else
+			test_byte_count = \$(($i + 1)) blame
+		fi
+	"
+done
+
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 06/20] blame: fix a bug, core.abbrev should work like --abbrev
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (4 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 05/20] abbrev tests: test "git-blame" behavior Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 07/20] abbrev tests: test "git branch" behavior Ævar Arnfjörð Bjarmason
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Since 84393bfd73 ("blame: add --abbrev command line option and make it
honor core.abbrev", 2011-04-06) first released with v1.7.6 "git blame"
has supported both core.abbrev and --abbrev to change the abbreviation
length.

Initially output wouldn't alter the abbreviation length to account for
the boundary commit. This was changed in 91229834c2 ("blame: fix
alignment with --abbrev=40", 2017-01-05) first released with v2.11.1.

That change had a bug which I'm fixing here. It didn't account for the
abbreviation length also being set via core.abbrev, not just via the
--abbrev command-line option.

So let's handle that. The easiest way to do that is to check if the
global default_abbrev variable (-1 by default) has been set by
git_default_core_config(), and if so pretend we had the --abbrev
option is set, in lieu of making everything that uses the "abbrev"
variable now read that OR default_abbrev).

The reason I'm documenting these past behaviors is that whatever our
desires with --porcelain people do parse the human-readable "git
blame" output, and for any machine use are likely to use
core.abbrev=40 or --abbrev=40. Documenting how the format has changed
over time will help those users avoid nasty surprises.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/git-blame.txt | 8 +++++++-
 builtin/blame.c             | 2 ++
 t/t0014-abbrev.sh           | 7 ++++++-
 3 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index 7b562494ac..d6cddbcb2e 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -92,7 +92,13 @@ include::blame-options.txt[]
 Because of this UI design, the only way to get the full SHA-1 of the
 boundary commit is to use the `--porcelain` format. With `--abbrev=40`
 only 39 characters of the boundary SHA-1 will be emitted, since one
-will be used for the caret to mark the boundary.
+will be used for the caret to mark the boundary. This behavior was
+different before 2.11.1, git would then emit the 40 character if
+requested, resulting in unaligned output.
++
+Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule
+and would cause blame to emit output that was unaligned. This bug has
+since been fixed.
 
 
 THE PORCELAIN FORMAT
diff --git a/builtin/blame.c b/builtin/blame.c
index 4202584f97..6ab08561d1 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -868,6 +868,8 @@ int cmd_blame(int argc, const char **argv, const char *prefix)
 	} else if (show_progress < 0)
 		show_progress = isatty(2);
 
+	if (default_abbrev >= 0)
+		abbrev = default_abbrev;
 	if (0 < abbrev && abbrev < GIT_SHA1_HEXSZ)
 		/* one more abbrev length is needed for the boundary commit */
 		abbrev++;
diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 77f15d5b0b..934c54a96b 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -135,7 +135,12 @@ do
 	test_expect_success "blame core.abbrev=$i and --abbrev=$i with boundary" "
 		# See the blame documentation for why this is off-by-one
 		git -c core.abbrev=$i blame A.t | cut_tr_d_n_field_n 1 | nocaret >blame &&
-		test_byte_count = $i blame &&
+		if test $i -eq 40
+		then
+			test_byte_count = 39 blame
+		else
+			test_byte_count = $i blame
+		fi &&
 		git blame --abbrev=$i A.t | cut_tr_d_n_field_n 1 | nocaret >blame &&
 		if test $i -eq 40
 		then
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 07/20] abbrev tests: test "git branch" behavior
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (5 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 06/20] blame: fix a bug, core.abbrev should work like --abbrev Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 08/20] abbrev tests: test for "git-describe" behavior Ævar Arnfjörð Bjarmason
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 934c54a96b..d8b060d922 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -168,4 +168,14 @@ do
 	"
 done
 
+for i in $(test_seq 4 40)
+do
+	test_expect_success "branch core.abbrev=$i and --abbrev=$i" "
+		git -c core.abbrev=$i branch -v | cut_tr_d_n_field_n 3 >branch &&
+		test_byte_count = $i branch &&
+		git branch --abbrev=$i -v | cut_tr_d_n_field_n 3 >branch &&
+		test_byte_count = $i branch
+	"
+done
+
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 08/20] abbrev tests: test for "git-describe" behavior
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (6 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 07/20] abbrev tests: test "git branch" behavior Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 09/20] abbrev tests: test for "git-log" behavior Ævar Arnfjörð Bjarmason
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

The only thing out of the ordinary with git-describe is the --abbrev=0
special-case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index d8b060d922..645bcca1d1 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -16,6 +16,10 @@ nocaret() {
 	sed 's/\^//'
 }
 
+sed_g_tr_d_n() {
+	sed 's/.*g//' | tr_d_n
+}
+
 test_expect_success 'setup' '
 	test_commit A &&
 	git tag -a -mannotated A.annotated &&
@@ -178,4 +182,25 @@ do
 	"
 done
 
+test_expect_success 'describe core.abbrev and --abbrev special cases' '
+	# core.abbrev=0 behaves as usual...
+	test_must_fail git -c core.abbrev=0 describe &&
+
+	# ...but --abbrev=0 is special-cased to print the nearest tag,
+	# not fall back on "4" like git-log.
+	echo A.annotated >expected &&
+	git describe --abbrev=0 >actual &&
+	test_cmp expected actual
+'
+
+for i in $(test_seq 4 40)
+do
+	test_expect_success "describe core.abbrev=$i and --abbrev=$i" "
+		git -c core.abbrev=$i describe | sed_g_tr_d_n >describe &&
+		test_byte_count = $i describe &&
+		git describe --abbrev=$i | sed_g_tr_d_n >describe &&
+		test_byte_count = $i describe
+	"
+done
+
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 09/20] abbrev tests: test for "git-log" behavior
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (7 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 08/20] abbrev tests: test for "git-describe" behavior Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-09  8:43   ` Martin Ågren
  2018-06-08 22:41 ` [PATCH 10/20] abbrev tests: test for "git-diff" behavior Ævar Arnfjörð Bjarmason
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

The "log" family of commands does its own parsing for --abbrev in
revision.c, so having dedicated tests for it makes sense.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 645bcca1d1..a66051c040 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -203,4 +203,14 @@ do
 	"
 done
 
+for i in $(test_seq 4 40)
+do
+	test_expect_success "log core.abbrev=$i and --abbrev=$i" "
+		git -c core.abbrev=$i log --pretty=format:%h -1 | tr_d_n >log &&
+		test_byte_count = $i log &&
+		git log --abbrev=$i --pretty=format:%h -1 | tr_d_n >log &&
+		test_byte_count = $i log
+	"
+done
+
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 10/20] abbrev tests: test for "git-diff" behavior
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (8 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 09/20] abbrev tests: test for "git-log" behavior Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 11/20] abbrev tests: test for plumbing behavior Ævar Arnfjörð Bjarmason
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

The "diff" family of commands does its own parsing for --abbrev in
diff.c, so having dedicated tests for it makes sense.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index a66051c040..783807475f 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -213,4 +213,35 @@ do
 	"
 done
 
+for i in $(test_seq 4 40)
+do
+	test_expect_success "diff --no-index --raw core.abbrev=$i and --abbrev=$i" "
+		test_must_fail git -c core.abbrev=$i diff --no-index --raw X Y >diff &&
+		cut_tr_d_n_field_n 3 <diff >diff.3 &&
+		test_byte_count = $i diff.3 &&
+		cut_tr_d_n_field_n 4 <diff >diff.4 &&
+		test_byte_count = $i diff.4 &&
+
+		test_must_fail git diff --no-index --raw --abbrev=$i X Y >diff &&
+		cut_tr_d_n_field_n 3 <diff >diff.3 &&
+		test_byte_count = $i diff.3 &&
+		cut_tr_d_n_field_n 4 <diff >diff.4 &&
+		test_byte_count = $i diff.4
+	"
+
+	test_expect_success "diff --raw core.abbrev=$i and --abbrev=$i" "
+		git -c core.abbrev=$i diff --raw HEAD~ >diff &&
+		cut_tr_d_n_field_n 3 <diff >diff.3 &&
+		test_byte_count = $i diff.3 &&
+		cut_tr_d_n_field_n 4 <diff >diff.4 &&
+		test_byte_count = $i diff.4 &&
+
+		git diff --raw --abbrev=$i HEAD~ >diff &&
+		cut_tr_d_n_field_n 3 <diff >diff.3 &&
+		test_byte_count = $i diff.3 &&
+		cut_tr_d_n_field_n 4 <diff >diff.4 &&
+		test_byte_count = $i diff.4
+	"
+done
+
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 11/20] abbrev tests: test for plumbing behavior
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (9 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 10/20] abbrev tests: test for "git-diff" behavior Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 12/20] abbrev tests: test for --abbrev and core.abbrev=[+-]N Ævar Arnfjörð Bjarmason
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

The "git-{ls-files,ls-tree,show-ref}" commands all have in common that
the core.abbrev variable is ignored, and only --abbrev works. This is
intentional since these are all plumbing.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 783807475f..5a99cbe434 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -244,4 +244,36 @@ do
 	"
 done
 
+for i in $(test_seq 4 40)
+do
+	test_expect_success "ls-files core.abbrev=$i and --abbrev=$i" "
+		git -c core.abbrev=$i ls-files --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+		test_byte_count = 40 ls-files &&
+		git ls-files --abbrev=$i --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+		test_byte_count = $i ls-files
+	"
+done
+
+for i in $(test_seq 4 40)
+do
+	test_expect_success "ls-tree core.abbrev=$i and --abbrev=$i" "
+		git -c core.abbrev=$i ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+		test_byte_count = 40 ls-tree &&
+		git ls-tree --abbrev=$i HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+		test_byte_count = $i ls-tree
+	"
+done
+
+for i in $(test_seq 4 40)
+do
+	test_expect_success "show-ref core.abbrev=$i and --abbrev=$i" "
+		git -c core.abbrev=$i show-ref --hash refs/heads/master | tr_d_n >show-ref &&
+		test_byte_count = 40 show-ref &&
+		git show-ref --hash --abbrev=$i refs/heads/master | tr_d_n >show-ref &&
+		test_byte_count = $i show-ref &&
+		git show-ref --hash=$i refs/heads/master | tr_d_n >show-ref &&
+		test_byte_count = $i show-ref
+	"
+done
+
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 12/20] abbrev tests: test for --abbrev and core.abbrev=[+-]N
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (10 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 11/20] abbrev tests: test for plumbing behavior Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 13/20] parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ Ævar Arnfjörð Bjarmason
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

In a later change I mean to make values like -1 and +1 mean something
different, but right now they're implicitly parsed. Let's test for the
current behavior before changing it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t0014-abbrev.sh | 131 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 128 insertions(+), 3 deletions(-)

diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 5a99cbe434..6dee92f35e 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -74,7 +74,7 @@ test_expect_success 'abbrev non-integer value handling differs ' '
 	! test -s stderr
 '
 
-for i in -41 -20 -10 -1 0 1 2 3 41
+for i in -41 -20 -10 -1 -0 +0 0 1 2 3 41
 do
 	test_expect_success "core.abbrev value $i out of range errors out" "
 		test_must_fail git -c core.abbrev=$i log -1 --pretty=format:%h 2>stderr &&
@@ -90,7 +90,7 @@ do
 	"
 done
 
-for i in 0 1 2 3 4
+for i in 0 1 2 3 4 -0 +0 +1 +2 +3 +4
 do
 	test_expect_success "non-negative --abbrev=$i value <MINIMUM_ABBREV falls back on MINIMUM_ABBREV" "
 		git log --abbrev=$i -1 --pretty=format:%h >log &&
@@ -98,7 +98,7 @@ do
 	"
 done
 
-for i in 41 9001
+for i in 41 9001 +41 +9001
 do
 	test_expect_success "non-negative --abbrev=$i value >MINIMUM_ABBREV falls back on 40" "
 		git log --abbrev=$i -1 --pretty=format:%h >log &&
@@ -116,6 +116,10 @@ do
 		git log --abbrev=$i -1 --pretty=format:%h >log &&
 		test_byte_count = $i log &&
 
+		# core.abbrev=+N is the same as core.abbrev=N
+		git -c core.abbrev=+$i log -1 --pretty=format:%h >log &&
+		test_byte_count = $i log &&
+
 		# The --abbrev option should take priority over
 		# core.abbrev
 		git -c core.abbrev=20 log --abbrev=$i -1 --pretty=format:%h >log &&
@@ -172,16 +176,39 @@ do
 	"
 done
 
+test_expect_success 'blame core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 blame A.t | cut_tr_d_n_field_n 1 >blame &&
+	test_must_fail git -c core.abbrev=-1 blame A.t | cut_tr_d_n_field_n 1 >blame &&
+
+	git blame --abbrev=-1 A.t | cut_tr_d_n_field_n 1 >blame &&
+	test_byte_count = 5 blame &&
+
+	git blame --abbrev=+1 A.t | cut_tr_d_n_field_n 1 >blame &&
+	test_byte_count = 5 blame
+'
+
 for i in $(test_seq 4 40)
 do
 	test_expect_success "branch core.abbrev=$i and --abbrev=$i" "
 		git -c core.abbrev=$i branch -v | cut_tr_d_n_field_n 3 >branch &&
 		test_byte_count = $i branch &&
+
 		git branch --abbrev=$i -v | cut_tr_d_n_field_n 3 >branch &&
 		test_byte_count = $i branch
 	"
 done
 
+test_expect_success 'branch core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 branch -v | cut_tr_d_n_field_n 3 >branch &&
+	test_must_fail git -c core.abbrev=-1 branch -v | cut_tr_d_n_field_n 3 >branch &&
+
+	git branch --abbrev=-1 -v | cut_tr_d_n_field_n 3 >branch &&
+	test_byte_count = 4 branch &&
+
+	git branch --abbrev=+1 -v | cut_tr_d_n_field_n 3 >branch &&
+	test_byte_count = 4 branch
+'
+
 test_expect_success 'describe core.abbrev and --abbrev special cases' '
 	# core.abbrev=0 behaves as usual...
 	test_must_fail git -c core.abbrev=0 describe &&
@@ -203,6 +230,17 @@ do
 	"
 done
 
+test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
+	test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
+
+	git describe --abbrev=-1 | sed_g_tr_d_n >describe &&
+	test_byte_count = 4 describe &&
+
+	git describe --abbrev=+1 | sed_g_tr_d_n >describe &&
+	test_byte_count = 4 describe
+'
+
 for i in $(test_seq 4 40)
 do
 	test_expect_success "log core.abbrev=$i and --abbrev=$i" "
@@ -213,6 +251,20 @@ do
 	"
 done
 
+test_expect_success 'log core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 log --pretty=format:%h -1 2>stderr &&
+	test_i18ngrep "abbrev length out of range" stderr &&
+
+	test_must_fail git -c core.abbrev=-1 log --pretty=format:%h -1 2>stderr &&
+	test_i18ngrep "abbrev length out of range" stderr &&
+
+	git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log &&
+	test_byte_count = 4 log &&
+
+	git log --abbrev=-1 --pretty=format:%h -1 | tr_d_n >log &&
+	test_byte_count = 40 log
+'
+
 for i in $(test_seq 4 40)
 do
 	test_expect_success "diff --no-index --raw core.abbrev=$i and --abbrev=$i" "
@@ -244,6 +296,46 @@ do
 	"
 done
 
+test_expect_success 'diff --no-index --raw core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 diff --no-index --raw X Y 2>stderr &&
+	test_i18ngrep "abbrev length out of range" stderr &&
+
+	test_must_fail git -c core.abbrev=-1 diff --no-index --raw X Y 2>stderr &&
+	test_i18ngrep "abbrev length out of range" stderr &&
+
+	test_must_fail git diff --no-index --raw --abbrev=+1 X Y >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 4 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 4 diff.4 &&
+
+	test_must_fail git diff --no-index --raw --abbrev=-1 X Y >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 4 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 4 diff.4
+'
+
+test_expect_success 'diff --raw core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 diff HEAD~ 2>stderr &&
+	test_i18ngrep "abbrev length out of range" stderr &&
+
+	test_must_fail git -c core.abbrev=-1 diff HEAD~ 2>stderr &&
+	test_i18ngrep "abbrev length out of range" stderr &&
+
+	git diff --raw --abbrev=+1 HEAD~ >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 4 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 4 diff.4 &&
+
+	git diff --raw --abbrev=-1 HEAD~ >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 40 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 40 diff.4
+'
+
 for i in $(test_seq 4 40)
 do
 	test_expect_success "ls-files core.abbrev=$i and --abbrev=$i" "
@@ -254,6 +346,17 @@ do
 	"
 done
 
+test_expect_success 'ls-files core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 ls-files --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+	test_must_fail git -c core.abbrev=-1 ls-files --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+
+	git ls-files --abbrev=-1 --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+	test_byte_count = 4 ls-files &&
+
+	git ls-files --abbrev=+1 --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+	test_byte_count = 4 ls-files
+'
+
 for i in $(test_seq 4 40)
 do
 	test_expect_success "ls-tree core.abbrev=$i and --abbrev=$i" "
@@ -264,6 +367,17 @@ do
 	"
 done
 
+test_expect_success 'ls-tree core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+	test_must_fail git -c core.abbrev=-1 ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+
+	git ls-tree --abbrev=-1 HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+	test_byte_count = 4 ls-tree &&
+
+	git ls-tree --abbrev=+1 HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+	test_byte_count = 4 ls-tree
+'
+
 for i in $(test_seq 4 40)
 do
 	test_expect_success "show-ref core.abbrev=$i and --abbrev=$i" "
@@ -276,4 +390,15 @@ do
 	"
 done
 
+test_expect_success 'show-ref core.abbrev=[-+]1 and --abbrev=[-+]1' '
+	test_must_fail git -c core.abbrev=+1 show-ref --hash refs/heads/master | tr_d_n >show-ref &&
+	test_must_fail git -c core.abbrev=-1 show-ref --hash refs/heads/master | tr_d_n >show-ref &&
+
+	git show-ref --abbrev=-1 --hash refs/heads/master | tr_d_n >show-ref &&
+	test_byte_count = 4 show-ref &&
+
+	git show-ref --abbrev=+1 --hash refs/heads/master | tr_d_n >show-ref &&
+	test_byte_count = 4 show-ref
+'
+
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 13/20] parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (11 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 12/20] abbrev tests: test for --abbrev and core.abbrev=[+-]N Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 14/20] config.c: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

In ac53fe8601 ("sha1_name: convert uses of 40 to GIT_SHA1_HEXSZ",
2017-07-13) the code this is validating user input for in
find_unique_abbrev_r() was converted to GIT_SHA1_HEXSZ, but the
corresponding validation codepath wasn't change. Let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 parse-options-cb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index 0f9f311a7a..298b5735c8 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -21,8 +21,8 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 			return opterror(opt, "expects a numerical value", 0);
 		if (v && v < MINIMUM_ABBREV)
 			v = MINIMUM_ABBREV;
-		else if (v > 40)
-			v = 40;
+		else if (v > GIT_SHA1_HEXSZ)
+			v = GIT_SHA1_HEXSZ;
 	}
 	*(int *)(opt->value) = v;
 	return 0;
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 14/20] config.c: use braces on multiple conditional arms
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (12 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 13/20] parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 15/20] parse-options-cb.c: " Ævar Arnfjörð Bjarmason
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Adjust this code that'll be modified in a subsequent change to have
more than one line per branch to use braces per the CodingGuidelines,
this makes the later change easier to understand.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index fbbf0f8e9f..12f762ad92 100644
--- a/config.c
+++ b/config.c
@@ -1149,9 +1149,9 @@ static int git_default_core_config(const char *var, const char *value)
 	if (!strcmp(var, "core.abbrev")) {
 		if (!value)
 			return config_error_nonbool(var);
-		if (!strcasecmp(value, "auto"))
+		if (!strcasecmp(value, "auto")) {
 			default_abbrev = -1;
-		else {
+		} else {
 			int abbrev = git_config_int(var, value);
 			if (abbrev < minimum_abbrev || abbrev > 40)
 				return error("abbrev length out of range: %d", abbrev);
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 15/20] parse-options-cb.c: use braces on multiple conditional arms
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (13 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 14/20] config.c: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 16/20] abbrev: unify the handling of non-numeric values Ævar Arnfjörð Bjarmason
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Adjust this code that'll be modified in a subsequent change to have
more than one line per branch to use braces per the CodingGuidelines,
this makes the later change easier to understand.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 parse-options-cb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/parse-options-cb.c b/parse-options-cb.c
index 298b5735c8..e3cd87fbd6 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -19,10 +19,11 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 		v = strtol(arg, (char **)&arg, 10);
 		if (*arg)
 			return opterror(opt, "expects a numerical value", 0);
-		if (v && v < MINIMUM_ABBREV)
+		if (v && v < MINIMUM_ABBREV) {
 			v = MINIMUM_ABBREV;
-		else if (v > GIT_SHA1_HEXSZ)
+		} else if (v > GIT_SHA1_HEXSZ) {
 			v = GIT_SHA1_HEXSZ;
+		}
 	}
 	*(int *)(opt->value) = v;
 	return 0;
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 16/20] abbrev: unify the handling of non-numeric values
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (14 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 15/20] parse-options-cb.c: " Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 17/20] abbrev: unify the handling of empty values Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Unify how --abbrev=XYZ and core.abbrev=XYZ is handled. 2/4 parsers for
these values would just let these invalid values pass, treating them
as though "0" was provided, which due to other inconsistent fallback
logic (soon to be fixed) would be treated as providing MINIMUM_ABBREV.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c            |  5 ++++-
 revision.c        |  5 ++++-
 t/t0014-abbrev.sh | 10 +++++-----
 3 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index 136d44b455..75935322f1 100644
--- a/diff.c
+++ b/diff.c
@@ -4801,7 +4801,10 @@ int diff_opt_parse(struct diff_options *options,
 	else if (!strcmp(arg, "--abbrev"))
 		options->abbrev = DEFAULT_ABBREV;
 	else if (skip_prefix(arg, "--abbrev=", &arg)) {
-		options->abbrev = strtoul(arg, NULL, 10);
+		char *end;
+		options->abbrev = strtoul(arg, &end, 10);
+		if (*end)
+			die("--abbrev expects a numerical value, got '%s'", arg);
 		if (options->abbrev < MINIMUM_ABBREV)
 			options->abbrev = MINIMUM_ABBREV;
 		else if (the_hash_algo->hexsz < options->abbrev)
diff --git a/revision.c b/revision.c
index 40fd91ff2b..aa87afa77f 100644
--- a/revision.c
+++ b/revision.c
@@ -2047,7 +2047,10 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--abbrev")) {
 		revs->abbrev = DEFAULT_ABBREV;
 	} else if (skip_prefix(arg, "--abbrev=", &optarg)) {
-		revs->abbrev = strtoul(optarg, NULL, 10);
+		char *end;
+		revs->abbrev = strtoul(optarg, &end, 10);
+		if (*end)
+			die("--abbrev expects a numerical value, got '%s'", optarg);
 		if (revs->abbrev < MINIMUM_ABBREV)
 			revs->abbrev = MINIMUM_ABBREV;
 		else if (revs->abbrev > hexsz)
diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 6dee92f35e..203fe316b9 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -64,14 +64,14 @@ test_expect_success 'abbrev non-integer value handling differs ' '
 	test_must_fail git branch -v --abbrev=XYZ 2>stderr &&
 	test_i18ngrep "expects a numerical value" stderr &&
 
-	git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr &&
-	! test -s stderr &&
+	test_must_fail git log --abbrev=XYZ -1 --pretty=format:%h 2>stderr &&
+	test_i18ngrep "expects a numerical value" stderr &&
 
-	git diff --raw --abbrev=XYZ HEAD~ 2>stderr &&
-	! test -s stderr &&
+	test_must_fail git diff --raw --abbrev=XYZ HEAD~ 2>stderr &&
+	test_i18ngrep "expects a numerical value" stderr &&
 
 	test_must_fail git diff --raw --abbrev=XYZ --no-index X Y 2>stderr &&
-	! test -s stderr
+	test_i18ngrep "expects a numerical value" stderr
 '
 
 for i in -41 -20 -10 -1 -0 +0 0 1 2 3 41
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 17/20] abbrev: unify the handling of empty values
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (15 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 16/20] abbrev: unify the handling of non-numeric values Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-09 14:24   ` Martin Ågren
  2018-06-08 22:41 ` [PATCH 18/20] abbrev parsing: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

For no good reason the --abbrev= command-line option was less strict
than the core.abbrev config option, which came down to the latter
using git_config_int() which rejects an empty string, but the rest of
the parsing using strtoul() which will convert it to 0.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c             |  2 ++
 parse-options-cb.c |  2 ++
 revision.c         |  2 ++
 t/t0014-abbrev.sh  | 22 ++++++++--------------
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/diff.c b/diff.c
index 75935322f1..cab79d24ab 100644
--- a/diff.c
+++ b/diff.c
@@ -4802,6 +4802,8 @@ int diff_opt_parse(struct diff_options *options,
 		options->abbrev = DEFAULT_ABBREV;
 	else if (skip_prefix(arg, "--abbrev=", &arg)) {
 		char *end;
+		if (!strcmp(arg, ""))
+			die("--abbrev expects a value, got '%s'", arg);
 		options->abbrev = strtoul(arg, &end, 10);
 		if (*end)
 			die("--abbrev expects a numerical value, got '%s'", arg);
diff --git a/parse-options-cb.c b/parse-options-cb.c
index e3cd87fbd6..aa9984f164 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -16,6 +16,8 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg) {
 		v = unset ? 0 : DEFAULT_ABBREV;
 	} else {
+		if (!strcmp(arg, ""))
+			return opterror(opt, "expects a value", 0);
 		v = strtol(arg, (char **)&arg, 10);
 		if (*arg)
 			return opterror(opt, "expects a numerical value", 0);
diff --git a/revision.c b/revision.c
index aa87afa77f..d39a292895 100644
--- a/revision.c
+++ b/revision.c
@@ -2048,6 +2048,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->abbrev = DEFAULT_ABBREV;
 	} else if (skip_prefix(arg, "--abbrev=", &optarg)) {
 		char *end;
+		if (!strcmp(optarg, ""))
+			die("--abbrev expects a value, got '%s'", optarg);
 		revs->abbrev = strtoul(optarg, &end, 10);
 		if (*end)
 			die("--abbrev expects a numerical value, got '%s'", optarg);
diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 203fe316b9..8448f78560 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -38,23 +38,17 @@ test_expect_success 'abbrev empty value handling differs ' '
 	test_must_fail git -c core.abbrev= log -1 --pretty=format:%h 2>stderr &&
 	test_i18ngrep "bad numeric config value.*invalid unit" stderr &&
 
-	git branch -v --abbrev= | cut_tr_d_n_field_n 3 >branch &&
-	test_byte_count = 40 branch &&
+	test_must_fail git branch -v --abbrev= 2>stderr &&
+	test_i18ngrep "expects a value" stderr &&
 
-	git log --abbrev= -1 --pretty=format:%h >log &&
-	test_byte_count = 4 log &&
+	test_must_fail git log --abbrev= -1 --pretty=format:%h 2>stderr &&
+	test_i18ngrep "expects a value" stderr &&
 
-	git diff --raw --abbrev= HEAD~ >diff &&
-	cut_tr_d_n_field_n 3 <diff >diff.3 &&
-	test_byte_count = 4 diff.3 &&
-	cut_tr_d_n_field_n 4 <diff >diff.4 &&
-	test_byte_count = 4 diff.4 &&
+	test_must_fail git diff --raw --abbrev= HEAD~ 2>stderr &&
+	test_i18ngrep "expects a value" stderr &&
 
-	test_must_fail git diff --raw --abbrev= --no-index X Y >diff &&
-	cut_tr_d_n_field_n 3 <diff >diff.3 &&
-	test_byte_count = 4 diff.3 &&
-	cut_tr_d_n_field_n 4 <diff >diff.4 &&
-	test_byte_count = 4 diff.4
+	test_must_fail git diff --raw --abbrev= --no-index X Y 2>stderr &&
+	test_i18ngrep "expects a value" stderr
 '
 
 test_expect_success 'abbrev non-integer value handling differs ' '
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 18/20] abbrev parsing: use braces on multiple conditional arms
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (16 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 17/20] abbrev: unify the handling of empty values Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 19/20] abbrev: support relative abbrev values Ævar Arnfjörð Bjarmason
  2018-06-08 22:41 ` [PATCH 20/20] abbrev: add a core.validateAbbrev setting Ævar Arnfjörð Bjarmason
  19 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Adjust this code that'll be modified in a subsequent change to have
more than one line per branch to use braces per the CodingGuidelines,
this makes the later change easier to understand.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 diff.c     | 5 +++--
 revision.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index cab79d24ab..e0141cfbc0 100644
--- a/diff.c
+++ b/diff.c
@@ -4807,10 +4807,11 @@ int diff_opt_parse(struct diff_options *options,
 		options->abbrev = strtoul(arg, &end, 10);
 		if (*end)
 			die("--abbrev expects a numerical value, got '%s'", arg);
-		if (options->abbrev < MINIMUM_ABBREV)
+		if (options->abbrev < MINIMUM_ABBREV) {
 			options->abbrev = MINIMUM_ABBREV;
-		else if (the_hash_algo->hexsz < options->abbrev)
+		} else if (the_hash_algo->hexsz < options->abbrev) {
 			options->abbrev = the_hash_algo->hexsz;
+		}
 	}
 	else if ((argcount = parse_long_opt("src-prefix", av, &optarg))) {
 		options->a_prefix = optarg;
diff --git a/revision.c b/revision.c
index d39a292895..2a75fef22d 100644
--- a/revision.c
+++ b/revision.c
@@ -2053,10 +2053,11 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		revs->abbrev = strtoul(optarg, &end, 10);
 		if (*end)
 			die("--abbrev expects a numerical value, got '%s'", optarg);
-		if (revs->abbrev < MINIMUM_ABBREV)
+		if (revs->abbrev < MINIMUM_ABBREV) {
 			revs->abbrev = MINIMUM_ABBREV;
-		else if (revs->abbrev > hexsz)
+		} else if (revs->abbrev > hexsz) {
 			revs->abbrev = hexsz;
+		}
 	} else if (!strcmp(arg, "--abbrev-commit")) {
 		revs->abbrev_commit = 1;
 		revs->abbrev_commit_given = 1;
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (17 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 18/20] abbrev parsing: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-09 15:38   ` Martin Ågren
  2018-06-12 19:16   ` Junio C Hamano
  2018-06-08 22:41 ` [PATCH 20/20] abbrev: add a core.validateAbbrev setting Ævar Arnfjörð Bjarmason
  19 siblings, 2 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Change the core.abbrev config variable and the corresponding --abbrev
command-line option to support relative values such as +1 or -1.

Before Linus's e6c587c733 ("abbrev: auto size the default
abbreviation", 2016-09-30) git would default to abbreviating object
names to 7-hexdigits, and only picking longer SHA-1s as needed if that
was ambiguous.

That change instead set the default length as a function of the
estimated current count of objects:

    Based on the expectation that we would see collision in a
    repository with 2^(2N) objects when using object names shortened
    to first N bits, use sufficient number of hexdigits to cover the
    number of objects in the repository.  Each hexdigit (4-bits) we
    add to the shortened name allows us to have four times (2-bits) as
    many objects in the repository.

By supporting relative values for core.abbrev we can allow users to
consistently opt-in for either a higher or lower probability of
collision, without needing to hardcode a given numeric value like
"10", which would be overkill on some repositories, and far to small
on others.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt       |   6 ++
 Documentation/diff-options.txt |   3 +
 Documentation/git-blame.txt    |   3 +
 Documentation/git-branch.txt   |   3 +
 Documentation/git-describe.txt |   3 +
 Documentation/git-ls-files.txt |   3 +
 Documentation/git-ls-tree.txt  |   3 +
 Documentation/git-show-ref.txt |   3 +
 cache.h                        |   1 +
 config.c                       |  11 +++
 diff.c                         |  18 +++-
 environment.c                  |   1 +
 parse-options-cb.c             |  12 ++-
 revision.c                     |  18 +++-
 sha1-name.c                    |  11 +++
 t/t0014-abbrev.sh              | 170 ++++++++++++++++++++++-----------
 16 files changed, 204 insertions(+), 65 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a9..abf07be7b6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -919,6 +919,12 @@ core.abbrev::
 	in your repository, which hopefully is enough for
 	abbreviated object names to stay unique for some time.
 	The minimum length is 4.
++
+This can also be set to relative values such as `+2` or `-2`, which
+means to add or subtract N characters from the SHA-1 that Git would
+otherwise print, this allows for producing more future-proof SHA-1s
+for use within a given project, while adjusting the value for the
+current approximate number of objects.
 
 add.ignoreErrors::
 add.ignore-errors (deprecated)::
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index f466600972..f1114a7b8d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -384,6 +384,9 @@ endif::git-format-patch[]
 	independent of the `--full-index` option above, which controls
 	the diff-patch output format.  Non default number of
 	digits can be specified with `--abbrev=<n>`.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 -B[<n>][/<m>]::
 --break-rewrites[=[<n>][/<m>]]::
diff --git a/Documentation/git-blame.txt b/Documentation/git-blame.txt
index d6cddbcb2e..8559d3b0c7 100644
--- a/Documentation/git-blame.txt
+++ b/Documentation/git-blame.txt
@@ -99,6 +99,9 @@ requested, resulting in unaligned output.
 Before 2.19, setting `core.abbrev=40` wouldn't apply the above rule
 and would cause blame to emit output that was unaligned. This bug has
 since been fixed.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 
 THE PORCELAIN FORMAT
diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 02eccbb931..0f8032cec6 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -182,6 +182,9 @@ See `--create-reflog` above for details.
 	Alter the sha1's minimum display length in the output listing.
 	The default value is 7 and can be overridden by the `core.abbrev`
 	config option.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --no-abbrev::
 	Display the full sha1s in the output listing rather than abbreviating them.
diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index e027fb8c4b..a3d5c7e817 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -65,6 +65,9 @@ OPTIONS
 	abbreviated object name, use <n> digits, or as many digits
 	as needed to form a unique object name.  An <n> of 0
 	will suppress long format, only showing the closest tag.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --candidates=<n>::
 	Instead of considering only the 10 most recent tags as
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index 5298f1bc30..f9af2b23bf 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -153,6 +153,9 @@ a space) at the start of each line:
 	Instead of showing the full 40-byte hexadecimal object
 	lines, show only a partial prefix.
 	Non default number of digits can be specified with --abbrev=<n>.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --debug::
 	After each line that describes a file, add more data about its
diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index 9dee7bef35..bcba39eed8 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -64,6 +64,9 @@ OPTIONS
 	Instead of showing the full 40-byte hexadecimal object
 	lines, show only a partial prefix.
 	Non default number of digits can be specified with --abbrev=<n>.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 --full-name::
 	Instead of showing the path names relative to the current working
diff --git a/Documentation/git-show-ref.txt b/Documentation/git-show-ref.txt
index d28e6154c6..05c5579075 100644
--- a/Documentation/git-show-ref.txt
+++ b/Documentation/git-show-ref.txt
@@ -66,6 +66,9 @@ OPTIONS
 
 	Abbreviate the object name.  When using `--hash`, you do
 	not have to say `--hash --abbrev`; `--hash=n` would do.
++
+Can also be set to a relative value, see `core.abbrev` in
+linkgit:git-diff[1].
 
 -q::
 --quiet::
diff --git a/cache.h b/cache.h
index 89a107a7f7..0fb4211804 100644
--- a/cache.h
+++ b/cache.h
@@ -772,6 +772,7 @@ extern int check_stat;
 extern int quote_path_fully;
 extern int has_symlinks;
 extern int minimum_abbrev, default_abbrev;
+extern int default_abbrev_relative;
 extern int ignore_case;
 extern int assume_unchanged;
 extern int prefer_symlink_refs;
diff --git a/config.c b/config.c
index 12f762ad92..cd95c6bdfb 100644
--- a/config.c
+++ b/config.c
@@ -1151,6 +1151,17 @@ static int git_default_core_config(const char *var, const char *value)
 			return config_error_nonbool(var);
 		if (!strcasecmp(value, "auto")) {
 			default_abbrev = -1;
+		} else if (*value == '+' || *value == '-') {
+			int relative = git_config_int(var, value);
+			if (relative == 0)
+				die(_("bad core.abbrev value %s. "
+				      "relative values must be non-zero"),
+				    value);
+			if (abs(relative) > GIT_SHA1_HEXSZ)
+				die(_("bad core.abbrev value %s. "
+				      "impossibly out of range"),
+				    value);
+			default_abbrev_relative = relative;
 		} else {
 			int abbrev = git_config_int(var, value);
 			if (abbrev < minimum_abbrev || abbrev > 40)
diff --git a/diff.c b/diff.c
index e0141cfbc0..f7861b8472 100644
--- a/diff.c
+++ b/diff.c
@@ -4801,16 +4801,28 @@ int diff_opt_parse(struct diff_options *options,
 	else if (!strcmp(arg, "--abbrev"))
 		options->abbrev = DEFAULT_ABBREV;
 	else if (skip_prefix(arg, "--abbrev=", &arg)) {
+		int v;
 		char *end;
 		if (!strcmp(arg, ""))
 			die("--abbrev expects a value, got '%s'", arg);
-		options->abbrev = strtoul(arg, &end, 10);
+		v = strtoul(arg, &end, 10);
 		if (*end)
 			die("--abbrev expects a numerical value, got '%s'", arg);
-		if (options->abbrev < MINIMUM_ABBREV) {
+		if (*arg == '+' || *arg == '-') {
+			if (v == 0) {
+				die("relative abbrev must be non-zero");
+			} else if (abs(v) > the_hash_algo->hexsz) {
+				die("relative abbrev impossibly out of range");
+			} else {
+				default_abbrev_relative = v;
+				options->abbrev = DEFAULT_ABBREV;
+			}
+		} else if (v < MINIMUM_ABBREV) {
 			options->abbrev = MINIMUM_ABBREV;
-		} else if (the_hash_algo->hexsz < options->abbrev) {
+		} else if (the_hash_algo->hexsz < v) {
 			options->abbrev = the_hash_algo->hexsz;
+		} else {
+			options->abbrev = v;
 		}
 	}
 	else if ((argcount = parse_long_opt("src-prefix", av, &optarg))) {
diff --git a/environment.c b/environment.c
index 2a6de2330b..0d48d52fba 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,7 @@ int trust_ctime = 1;
 int check_stat = 1;
 int has_symlinks = 1;
 int minimum_abbrev = 4, default_abbrev = -1;
+int default_abbrev_relative = 0;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/parse-options-cb.c b/parse-options-cb.c
index aa9984f164..2a1ab449bf 100644
--- a/parse-options-cb.c
+++ b/parse-options-cb.c
@@ -16,12 +16,22 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset)
 	if (!arg) {
 		v = unset ? 0 : DEFAULT_ABBREV;
 	} else {
+		const char *origarg = arg;
 		if (!strcmp(arg, ""))
 			return opterror(opt, "expects a value", 0);
 		v = strtol(arg, (char **)&arg, 10);
 		if (*arg)
 			return opterror(opt, "expects a numerical value", 0);
-		if (v && v < MINIMUM_ABBREV) {
+		if (*origarg == '+' || *origarg == '-') {
+			if (v == 0) {
+				return opterror(opt, "relative abbrev must be non-zero", 0);
+			} else if (abs(v) > GIT_SHA1_HEXSZ) {
+				return opterror(opt, "relative abbrev impossibly out of range", 0);
+			} else {
+				default_abbrev_relative = v;
+				v = -1;
+			}
+		} else if (v && v < MINIMUM_ABBREV) {
 			v = MINIMUM_ABBREV;
 		} else if (v > GIT_SHA1_HEXSZ) {
 			v = GIT_SHA1_HEXSZ;
diff --git a/revision.c b/revision.c
index 2a75fef22d..c858d32da7 100644
--- a/revision.c
+++ b/revision.c
@@ -2047,16 +2047,28 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 	} else if (!strcmp(arg, "--abbrev")) {
 		revs->abbrev = DEFAULT_ABBREV;
 	} else if (skip_prefix(arg, "--abbrev=", &optarg)) {
+		int v;
 		char *end;
 		if (!strcmp(optarg, ""))
 			die("--abbrev expects a value, got '%s'", optarg);
-		revs->abbrev = strtoul(optarg, &end, 10);
+		v = strtoul(optarg, &end, 10);
 		if (*end)
 			die("--abbrev expects a numerical value, got '%s'", optarg);
-		if (revs->abbrev < MINIMUM_ABBREV) {
+		if (*optarg == '+' || *optarg == '-') {
+			if (v == 0) {
+				die("relative abbrev must be non-zero");
+			} else if (abs(v) > hexsz) {
+				die("relative abbrev impossibly out of range");
+			} else {
+				default_abbrev_relative = v;
+				revs->abbrev = DEFAULT_ABBREV;
+			}
+		} else if (v < MINIMUM_ABBREV) {
 			revs->abbrev = MINIMUM_ABBREV;
-		} else if (revs->abbrev > hexsz) {
+		} else if (v > hexsz) {
 			revs->abbrev = hexsz;
+		} else {
+			revs->abbrev = v;
 		}
 	} else if (!strcmp(arg, "--abbrev-commit")) {
 		revs->abbrev_commit = 1;
diff --git a/sha1-name.c b/sha1-name.c
index 60d9ef3c7e..75f1bef7d1 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -602,6 +602,17 @@ int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
 	if (len == GIT_SHA1_HEXSZ || !len)
 		return GIT_SHA1_HEXSZ;
 
+	if (default_abbrev_relative) {
+		int dar = default_abbrev_relative;
+		if (len + dar > GIT_SHA1_HEXSZ) {
+			return GIT_SHA1_HEXSZ;
+		} else if (len + dar < MINIMUM_ABBREV) {
+			len = MINIMUM_ABBREV;
+			dar = 0;
+		}
+		len += dar;
+	}
+
 	mad.init_len = len;
 	mad.cur_len = len;
 	mad.hex = hex;
diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
index 8448f78560..73d990ecc1 100755
--- a/t/t0014-abbrev.sh
+++ b/t/t0014-abbrev.sh
@@ -68,23 +68,33 @@ test_expect_success 'abbrev non-integer value handling differs ' '
 	test_i18ngrep "expects a numerical value" stderr
 '
 
-for i in -41 -20 -10 -1 -0 +0 0 1 2 3 41
+for i in -41 +41
 do
 	test_expect_success "core.abbrev value $i out of range errors out" "
 		test_must_fail git -c core.abbrev=$i log -1 --pretty=format:%h 2>stderr &&
-		test_i18ngrep 'abbrev length out of range' stderr
+		test_i18ngrep 'impossibly out of range' stderr
 	"
 done
 
-for i in -41 -20 -10 -1
+for i in -39 -20 -10 -3
 do
-	test_expect_success "negative --abbrev=$i value out of range means --abbrev=40" "
+	test_expect_success "negative -31..-3 --abbrev=$i mean --abbrev=4" "
 		git log --abbrev=$i -1 --pretty=format:%h >log &&
-		test_byte_count = 40 log
+		test_byte_count = 4 log
+	"
+done
+
+for i in -9001 -41 +41 +9001
+do
+	test_expect_success "core.abbrev=$i and --abbrev=$i values out of range error out" "
+		test_must_fail git -c core.abbrev=$i branch -v 2>stderr &&
+		test_i18ngrep 'impossibly out of range' stderr &&
+		test_must_fail git branch -v --abbrev=$i 2>stderr &&
+		test_i18ngrep 'impossibly out of range' stderr
 	"
 done
 
-for i in 0 1 2 3 4 -0 +0 +1 +2 +3 +4
+for i in 0 1 2 3 4
 do
 	test_expect_success "non-negative --abbrev=$i value <MINIMUM_ABBREV falls back on MINIMUM_ABBREV" "
 		git log --abbrev=$i -1 --pretty=format:%h >log &&
@@ -92,7 +102,7 @@ do
 	"
 done
 
-for i in 41 9001 +41 +9001
+for i in 41 9001
 do
 	test_expect_success "non-negative --abbrev=$i value >MINIMUM_ABBREV falls back on 40" "
 		git log --abbrev=$i -1 --pretty=format:%h >log &&
@@ -110,9 +120,23 @@ do
 		git log --abbrev=$i -1 --pretty=format:%h >log &&
 		test_byte_count = $i log &&
 
-		# core.abbrev=+N is the same as core.abbrev=N
+		# core.abbrev=+N is the same as core.abbrev=7+N
 		git -c core.abbrev=+$i log -1 --pretty=format:%h >log &&
-		test_byte_count = $i log &&
+		if test \$((7 + $i)) -gt 40
+		then
+			test_byte_count = 40 log
+		else
+			test_byte_count = \$((7 + $i)) log
+		fi &&
+
+		# --abbrev=+N is the same as --abbrev=7+N
+		git log --abbrev=+$i -1 --pretty=format:%h >log &&
+		if test \$((7 + $i)) -gt 40
+		then
+			test_byte_count = 40 log
+		else
+			test_byte_count = \$((7 + $i)) log
+		fi &&
 
 		# The --abbrev option should take priority over
 		# core.abbrev
@@ -171,14 +195,17 @@ do
 done
 
 test_expect_success 'blame core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 blame A.t | cut_tr_d_n_field_n 1 >blame &&
-	test_must_fail git -c core.abbrev=-1 blame A.t | cut_tr_d_n_field_n 1 >blame &&
+	git -c core.abbrev=+1 blame A.t | cut_tr_d_n_field_n 1 >blame &&
+	test_byte_count = 9 blame &&
 
-	git blame --abbrev=-1 A.t | cut_tr_d_n_field_n 1 >blame &&
-	test_byte_count = 5 blame &&
+	git -c core.abbrev=-1 blame A.t | cut_tr_d_n_field_n 1 >blame &&
+	test_byte_count = 7 blame &&
 
 	git blame --abbrev=+1 A.t | cut_tr_d_n_field_n 1 >blame &&
-	test_byte_count = 5 blame
+	test_byte_count = 9 blame &&
+
+	git blame --abbrev=-1 A.t | cut_tr_d_n_field_n 1 >blame &&
+	test_byte_count = 7 blame
 '
 
 for i in $(test_seq 4 40)
@@ -193,14 +220,17 @@ do
 done
 
 test_expect_success 'branch core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 branch -v | cut_tr_d_n_field_n 3 >branch &&
-	test_must_fail git -c core.abbrev=-1 branch -v | cut_tr_d_n_field_n 3 >branch &&
+	git -c core.abbrev=+1 branch -v | cut_tr_d_n_field_n 3 >branch &&
+	test_byte_count = 8 branch &&
 
-	git branch --abbrev=-1 -v | cut_tr_d_n_field_n 3 >branch &&
-	test_byte_count = 4 branch &&
+	git -c core.abbrev=-1 branch -v | cut_tr_d_n_field_n 3 >branch &&
+	test_byte_count = 6 branch &&
 
-	git branch --abbrev=+1 -v | cut_tr_d_n_field_n 3 >branch &&
-	test_byte_count = 4 branch
+	git branch -v --abbrev=+1 | cut_tr_d_n_field_n 3 >branch &&
+	test_byte_count = 8 branch &&
+
+	git branch -v --abbrev=-1 | cut_tr_d_n_field_n 3 >branch &&
+	test_byte_count = 6 branch
 '
 
 test_expect_success 'describe core.abbrev and --abbrev special cases' '
@@ -225,14 +255,17 @@ do
 done
 
 test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
-	test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
+	git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
+	test_byte_count = 6 describe &&
+
+	git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
+	test_byte_count = 8 describe &&
 
 	git describe --abbrev=-1 | sed_g_tr_d_n >describe &&
-	test_byte_count = 4 describe &&
+	test_byte_count = 6 describe &&
 
 	git describe --abbrev=+1 | sed_g_tr_d_n >describe &&
-	test_byte_count = 4 describe
+	test_byte_count = 8 describe
 '
 
 for i in $(test_seq 4 40)
@@ -246,17 +279,17 @@ do
 done
 
 test_expect_success 'log core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 log --pretty=format:%h -1 2>stderr &&
-	test_i18ngrep "abbrev length out of range" stderr &&
+	git -c core.abbrev=+1 log --pretty=format:%h -1 >log &&
+	test_byte_count = 8 log &&
 
-	test_must_fail git -c core.abbrev=-1 log --pretty=format:%h -1 2>stderr &&
-	test_i18ngrep "abbrev length out of range" stderr &&
+	git -c core.abbrev=-1 log --pretty=format:%h -1 >log &&
+	test_byte_count = 6 log &&
 
 	git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log &&
-	test_byte_count = 4 log &&
+	test_byte_count = 8 log &&
 
 	git log --abbrev=-1 --pretty=format:%h -1 | tr_d_n >log &&
-	test_byte_count = 40 log
+	test_byte_count = 6 log
 '
 
 for i in $(test_seq 4 40)
@@ -291,43 +324,55 @@ do
 done
 
 test_expect_success 'diff --no-index --raw core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 diff --no-index --raw X Y 2>stderr &&
-	test_i18ngrep "abbrev length out of range" stderr &&
+	test_must_fail git -c core.abbrev=+1 diff --no-index --raw X Y >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 8 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 8 diff.4 &&
 
-	test_must_fail git -c core.abbrev=-1 diff --no-index --raw X Y 2>stderr &&
-	test_i18ngrep "abbrev length out of range" stderr &&
+	test_must_fail git -c core.abbrev=-1 diff --no-index --raw X Y >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 6 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 6 diff.4 &&
 
 	test_must_fail git diff --no-index --raw --abbrev=+1 X Y >diff &&
 	cut_tr_d_n_field_n 3 <diff >diff.3 &&
-	test_byte_count = 4 diff.3 &&
+	test_byte_count = 8 diff.3 &&
 	cut_tr_d_n_field_n 4 <diff >diff.4 &&
-	test_byte_count = 4 diff.4 &&
+	test_byte_count = 8 diff.4 &&
 
 	test_must_fail git diff --no-index --raw --abbrev=-1 X Y >diff &&
 	cut_tr_d_n_field_n 3 <diff >diff.3 &&
-	test_byte_count = 4 diff.3 &&
+	test_byte_count = 6 diff.3 &&
 	cut_tr_d_n_field_n 4 <diff >diff.4 &&
-	test_byte_count = 4 diff.4
+	test_byte_count = 6 diff.4
 '
 
 test_expect_success 'diff --raw core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 diff HEAD~ 2>stderr &&
-	test_i18ngrep "abbrev length out of range" stderr &&
+	git -c core.abbrev=+1 diff --raw HEAD~ >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 8 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 8 diff.4 &&
 
-	test_must_fail git -c core.abbrev=-1 diff HEAD~ 2>stderr &&
-	test_i18ngrep "abbrev length out of range" stderr &&
+	git -c core.abbrev=-1 diff --raw HEAD~ >diff &&
+	cut_tr_d_n_field_n 3 <diff >diff.3 &&
+	test_byte_count = 6 diff.3 &&
+	cut_tr_d_n_field_n 4 <diff >diff.4 &&
+	test_byte_count = 6 diff.4 &&
 
-	git diff --raw --abbrev=+1 HEAD~ >diff &&
+	git diff --raw --abbrev=+1 --raw HEAD~ >diff &&
 	cut_tr_d_n_field_n 3 <diff >diff.3 &&
-	test_byte_count = 4 diff.3 &&
+	test_byte_count = 8 diff.3 &&
 	cut_tr_d_n_field_n 4 <diff >diff.4 &&
-	test_byte_count = 4 diff.4 &&
+	test_byte_count = 8 diff.4 &&
 
-	git diff --raw --abbrev=-1 HEAD~ >diff &&
+	git diff --raw --abbrev=-1 --raw HEAD~ >diff &&
 	cut_tr_d_n_field_n 3 <diff >diff.3 &&
-	test_byte_count = 40 diff.3 &&
+	test_byte_count = 6 diff.3 &&
 	cut_tr_d_n_field_n 4 <diff >diff.4 &&
-	test_byte_count = 40 diff.4
+	test_byte_count = 6 diff.4
 '
 
 for i in $(test_seq 4 40)
@@ -342,13 +387,16 @@ done
 
 test_expect_success 'ls-files core.abbrev=[-+]1 and --abbrev=[-+]1' '
 	test_must_fail git -c core.abbrev=+1 ls-files --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+	test_byte_count = 40 ls-files &&
+
 	test_must_fail git -c core.abbrev=-1 ls-files --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
+	test_byte_count = 40 ls-files &&
 
 	git ls-files --abbrev=-1 --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
-	test_byte_count = 4 ls-files &&
+	test_byte_count = 6 ls-files &&
 
 	git ls-files --abbrev=+1 --stage A.t | cut_tr_d_n_field_n 2 >ls-files &&
-	test_byte_count = 4 ls-files
+	test_byte_count = 8 ls-files
 '
 
 for i in $(test_seq 4 40)
@@ -362,14 +410,17 @@ do
 done
 
 test_expect_success 'ls-tree core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
-	test_must_fail git -c core.abbrev=-1 ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+	git -c core.abbrev=+1 ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+	test_byte_count = 40 ls-tree &&
+
+	git -c core.abbrev=-1 ls-tree HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
+	test_byte_count = 40 ls-tree &&
 
 	git ls-tree --abbrev=-1 HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
-	test_byte_count = 4 ls-tree &&
+	test_byte_count = 6 ls-tree &&
 
 	git ls-tree --abbrev=+1 HEAD A.t | cut -f 1 | cut_tr_d_n_field_n 3 >ls-tree &&
-	test_byte_count = 4 ls-tree
+	test_byte_count = 8 ls-tree
 '
 
 for i in $(test_seq 4 40)
@@ -385,14 +436,17 @@ do
 done
 
 test_expect_success 'show-ref core.abbrev=[-+]1 and --abbrev=[-+]1' '
-	test_must_fail git -c core.abbrev=+1 show-ref --hash refs/heads/master | tr_d_n >show-ref &&
-	test_must_fail git -c core.abbrev=-1 show-ref --hash refs/heads/master | tr_d_n >show-ref &&
+	git -c core.abbrev=+1 show-ref --hash refs/heads/master | tr_d_n >show-ref &&
+	test_byte_count = 40 show-ref &&
+
+	git -c core.abbrev=-1 show-ref --hash refs/heads/master | tr_d_n >show-ref &&
+	test_byte_count = 40 show-ref &&
 
 	git show-ref --abbrev=-1 --hash refs/heads/master | tr_d_n >show-ref &&
-	test_byte_count = 4 show-ref &&
+	test_byte_count = 6 show-ref &&
 
 	git show-ref --abbrev=+1 --hash refs/heads/master | tr_d_n >show-ref &&
-	test_byte_count = 4 show-ref
+	test_byte_count = 8 show-ref
 '
 
 test_done
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 20/20] abbrev: add a core.validateAbbrev setting
  2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
                   ` (18 preceding siblings ...)
  2018-06-08 22:41 ` [PATCH 19/20] abbrev: support relative abbrev values Ævar Arnfjörð Bjarmason
@ 2018-06-08 22:41 ` Ævar Arnfjörð Bjarmason
  2018-06-09 15:47   ` Martin Ågren
  19 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-08 22:41 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds, Ævar Arnfjörð Bjarmason

Operations that need to abbreviate a lot of SHA-1s such as 'git log
--oneline' experience degraded performance when traversing a lot of
packs. See [1] and [2] for some relevant performance numbers.

One way to address this is something like the MIDX to make looking up
the SHA-1s cheaper.

This change adds an alternate method of achieving some of the same
ends (but possibly not all, see [3] and replies to the original thread
at [1]).

Instead of trying really hard to find an unambiguous SHA-1 we can with
core.validateAbbrev=false, and preferably combined with the new
support for relative core.abbrev values (such as +4) unconditionally
print a short SHA-1 without doing any disambiguation check. I.e. it
allows for picking a trade-off between performance, and the odds that
future or remote (or current and local) short SHA-1 will be ambiguous.

With the included performance test my git.git repacked with with `git
repack -A -d --max-pack-size=X` gives the following results against
git.git itself with X=64M:

    Test                                        HEAD~             HEAD
    ------------------------------------------------------------------------------------
    0014.1: git log --oneline --raw --parents   2.53(2.48+0.05)   2.20(2.14+0.05) -13.0%

With one big pack -7.6%, and with 16M packs -23.8%.

1. https://public-inbox.org/git/20180107181459.222909-1-dstolee@microsoft.com/
2. https://public-inbox.org/git/20180607140338.32440-1-dstolee@microsoft.com/
3. https://public-inbox.org/git/87lgbsz61p.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/config.txt            | 43 ++++++++++++++++++++++++++
 cache.h                             |  1 +
 config.c                            |  7 +++++
 environment.c                       |  1 +
 sha1-name.c                         |  4 +++
 t/perf/p0014-abbrev.sh              | 13 ++++++++
 t/t1512-rev-parse-disambiguation.sh | 47 +++++++++++++++++++++++++++++
 7 files changed, 116 insertions(+)
 create mode 100755 t/perf/p0014-abbrev.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index abf07be7b6..df31d1351f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -925,6 +925,49 @@ means to add or subtract N characters from the SHA-1 that Git would
 otherwise print, this allows for producing more future-proof SHA-1s
 for use within a given project, while adjusting the value for the
 current approximate number of objects.
++
+This is especially useful in combination with the
+`core.validateAbbrev` setting, or to get more future-proof hashes to
+reference in the future in a repository whose number of objects is
+expected to grow.
+
+core.validateAbbrev::
+	If set to false (true by default) don't do any validation when
+	printing abbreviated object names to see if they're really
+	unique. This makes printing objects more performant at the
+	cost of potentially printing object names that aren't unique
+	within the current repository.
++
+When printing abbreviated object names Git needs to look through the
+local object store. This is an `O(log N)` operation assuming all the
+objects are in a single pack file, but `X * O(log N)` given `X` pack
+files, which can get expensive on some larger repositories.
++
+This setting changes that to `O(1)`, but with the trade-off that
+depending on the value of `core.abbrev` we may be printing abbreviated
+hashes that collide. Too see how likely this is, try running:
++
+-----------------------------------------------------------------------------------------------------------
+git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
+-----------------------------------------------------------------------------------------------------------
++
+This shows how many commits were found at each abbreviation length. On
+linux.git in June 2018 this shows a bit more than 750,000 commits,
+with just 4 needing 11 characters to be fully abbreviated, and the
+default heuristic picks a length of 12.
++
+Even without `core.validateAbbrev=false` the results abbreviation
+already a bit of a probability game. They're guaranteed at the moment
+of generation, but as more objects are added, ambiguities may be
+introduced. Likewise, what's unambiguous for you may not be for
+somebody else you're communicating with, if they have their own clone.
++
+Therefore the default of `core.validateAbbrev=true` may not save you
+in practice if you're sharing the SHA-1 or noting it now to use after
+a `git fetch`. You may be better off setting `core.abbrev` to
+e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
+that with `core.validateAbbrev=false` to get a reasonable trade-off
+between safety and performance.
 
 add.ignoreErrors::
 add.ignore-errors (deprecated)::
diff --git a/cache.h b/cache.h
index 0fb4211804..6dc5af9482 100644
--- a/cache.h
+++ b/cache.h
@@ -773,6 +773,7 @@ extern int quote_path_fully;
 extern int has_symlinks;
 extern int minimum_abbrev, default_abbrev;
 extern int default_abbrev_relative;
+extern int validate_abbrev;
 extern int ignore_case;
 extern int assume_unchanged;
 extern int prefer_symlink_refs;
diff --git a/config.c b/config.c
index cd95c6bdfb..fd11e1047d 100644
--- a/config.c
+++ b/config.c
@@ -1146,6 +1146,13 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.validateabbrev")) {
+		if (!value)
+			return config_error_nonbool(var);
+		validate_abbrev =  git_config_bool(var, value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.abbrev")) {
 		if (!value)
 			return config_error_nonbool(var);
diff --git a/environment.c b/environment.c
index 0d48d52fba..4a24d8126b 100644
--- a/environment.c
+++ b/environment.c
@@ -23,6 +23,7 @@ int check_stat = 1;
 int has_symlinks = 1;
 int minimum_abbrev = 4, default_abbrev = -1;
 int default_abbrev_relative = 0;
+int validate_abbrev = 1;
 int ignore_case;
 int assume_unchanged;
 int prefer_symlink_refs;
diff --git a/sha1-name.c b/sha1-name.c
index 75f1bef7d1..57dc782782 100644
--- a/sha1-name.c
+++ b/sha1-name.c
@@ -612,6 +612,10 @@ int find_unique_abbrev_r(char *hex, const struct object_id *oid, int len)
 		}
 		len += dar;
 	}
+	if (!validate_abbrev) {
+		hex[len] = 0;
+		return len;
+	}
 
 	mad.init_len = len;
 	mad.cur_len = len;
diff --git a/t/perf/p0014-abbrev.sh b/t/perf/p0014-abbrev.sh
new file mode 100755
index 0000000000..15c3764265
--- /dev/null
+++ b/t/perf/p0014-abbrev.sh
@@ -0,0 +1,13 @@
+#!/bin/sh
+
+test_description="Tests core.validateAbbrev performance"
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_perf 'git log --oneline --raw --parents' '
+	git -c core.abbrev=15 -c core.validateAbbrev=false log --oneline --raw --parents >/dev/null
+'
+
+test_done
diff --git a/t/t1512-rev-parse-disambiguation.sh b/t/t1512-rev-parse-disambiguation.sh
index 96fe3754c8..89565af508 100755
--- a/t/t1512-rev-parse-disambiguation.sh
+++ b/t/t1512-rev-parse-disambiguation.sh
@@ -232,6 +232,53 @@ test_expect_success 'more history' '
 
 '
 
+test_expect_success 'ambiguous objects and core.abbrev' '
+	cat >expected <<-\EOF &&
+	00000000006
+	0000000005
+	00000000008
+	000000000004
+	0000000000e
+	EOF
+	git -c core.abbrev=4 log --pretty=tformat:%h >actual &&
+	test_cmp expected actual
+'
+
+test_expect_success 'ambiguous objects and core.validateAbbrev' '
+	git -c core.abbrev=4 -c core.validateabbrev=true log --pretty=tformat:%h >actual &&
+	test_cmp expected actual &&
+
+	cat >expected <<-\EOF &&
+	0000
+	0000
+	0000
+	0000
+	0000
+	EOF
+	git -c core.abbrev=4 -c core.validateabbrev=false log --pretty=tformat:%h >actual &&
+	test_cmp expected actual &&
+
+	cat >expected <<-\EOF &&
+	00000000006
+	0000000005b
+	00000000008
+	000000000004
+	0000000000e
+	EOF
+	git -c core.abbrev=+4 log --pretty=tformat:%h >actual &&
+	test_cmp expected actual &&
+
+	cat >expected <<-\EOF &&
+	00000000006
+	0000000005b
+	00000000008
+	00000000000
+	0000000000e
+	EOF
+	git -c core.abbrev=+4 -c core.validateabbrev=false log --pretty=tformat:%h >actual &&
+	test_cmp expected actual
+'
+
 test_expect_failure 'parse describe name taking advantage of generation' '
 	# ambiguous at the object name level, but there is only one
 	# such commit at generation 0
-- 
2.17.0.290.gded63e768a


^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/20] abbrev tests: test for "git-log" behavior
  2018-06-08 22:41 ` [PATCH 09/20] abbrev tests: test for "git-log" behavior Ævar Arnfjörð Bjarmason
@ 2018-06-09  8:43   ` Martin Ågren
  2018-06-09  9:56     ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Ågren @ 2018-06-09  8:43 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff Hostetler,
	Jeff King, Johannes Schindelin, Jonathan Nieder, Linus Torvalds

On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> The "log" family of commands does its own parsing for --abbrev in
> revision.c, so having dedicated tests for it makes sense.

> +for i in $(test_seq 4 40)

I've just been skimming so might have missed something, but I see
several instances of this construct, and I wonder what this brute-force
approach really buys us. An alternative would be, e.g., "for i in 4 23
40". That is, min/max and some arbitrary number in between (odd because
the others are even).

Of course, we might have a bug which magically happens for the number 9,
but I'd expect us to test for that only if we have some reason to
believe that number 9 is indeed magical.

Also, 40 is of course tied to SHA-1. You could perhaps define a variable
at the top of this file to simplify a future generalization. (Same for
39/41 which are related to 40.)

Martin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/20] abbrev tests: test for "git-log" behavior
  2018-06-09  8:43   ` Martin Ågren
@ 2018-06-09  9:56     ` Ævar Arnfjörð Bjarmason
  2018-06-09 13:56       ` Martin Ågren
  0 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-09  9:56 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff Hostetler,
	Jeff King, Johannes Schindelin, Jonathan Nieder, Linus Torvalds

On Sat, Jun 09 2018, Martin Ågren wrote:

> On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> The "log" family of commands does its own parsing for --abbrev in
>> revision.c, so having dedicated tests for it makes sense.
>
>> +for i in $(test_seq 4 40)
>
> I've just been skimming so might have missed something, but I see
> several instances of this construct, and I wonder what this brute-force
> approach really buys us. An alternative would be, e.g., "for i in 4 23
> 40". That is, min/max and some arbitrary number in between (odd because
> the others are even).
>
> Of course, we might have a bug which magically happens for the number 9,
> but I'd expect us to test for that only if we have some reason to
> believe that number 9 is indeed magical.

Good point, I'll change this in v2, or at least guard it with
EXPENSIVE. I hacked it up like this while exhaustively testing things
during development, and discovered some edge cases (e.g. "0" is special
sometimes).

> Also, 40 is of course tied to SHA-1. You could perhaps define a variable
> at the top of this file to simplify a future generalization. (Same for
> 39/41 which are related to 40.)

I forgot to note this in the commit message, but I intentionally didn't
guard this test with the SHA1 prereq, there's nothing per-se specific to
SHA-1 here, it's not a given that whatever our NewHash is that we won't
use 40 characters, and the rest of the magic constants like 4 and 7 is
something we're likely to retain with NewHash.

Although maybe we should expose GIT_SHA1_HEXSZ to the test suite.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 09/20] abbrev tests: test for "git-log" behavior
  2018-06-09  9:56     ` Ævar Arnfjörð Bjarmason
@ 2018-06-09 13:56       ` Martin Ågren
  0 siblings, 0 replies; 37+ messages in thread
From: Martin Ågren @ 2018-06-09 13:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff Hostetler,
	Jeff King, Johannes Schindelin, Jonathan Nieder, Linus Torvalds

On 9 June 2018 at 11:56, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>
> On Sat, Jun 09 2018, Martin Ågren wrote:
>
>> On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>> The "log" family of commands does its own parsing for --abbrev in
>>> revision.c, so having dedicated tests for it makes sense.
>>
>>> +for i in $(test_seq 4 40)
>>
>> I've just been skimming so might have missed something, but I see
>> several instances of this construct, and I wonder what this brute-force
>> approach really buys us. An alternative would be, e.g., "for i in 4 23
>> 40". That is, min/max and some arbitrary number in between (odd because
>> the others are even).
>>
>> Of course, we might have a bug which magically happens for the number 9,
>> but I'd expect us to test for that only if we have some reason to
>> believe that number 9 is indeed magical.
>
> Good point, I'll change this in v2, or at least guard it with
> EXPENSIVE. I hacked it up like this while exhaustively testing things
> during development, and discovered some edge cases (e.g. "0" is special
> sometimes).

Ah, "useful during hacking" explains why you did it like this. Of your
two approaches, I'd probably favour "make it cheaper" over "mark it as
EXPENSIVE". Nothing I feel strongly about.

>> Also, 40 is of course tied to SHA-1. You could perhaps define a variable
>> at the top of this file to simplify a future generalization. (Same for
>> 39/41 which are related to 40.)
>
> I forgot to note this in the commit message, but I intentionally didn't
> guard this test with the SHA1 prereq, there's nothing per-se specific to
> SHA-1 here, it's not a given that whatever our NewHash is that we won't
> use 40 characters, and the rest of the magic constants like 4 and 7 is
> something we're likely to retain with NewHash.

I'd tend to agree about not marking this SHA1.

> Although maybe we should expose GIT_SHA1_HEXSZ to the test suite.

It seems like brian's "test_translate"-approach [1] would be a good
choice of tool for this. That is, you'd just define something at the top
of this file for now, then once that tool is in place, a one-line change
could get "hexsz" from `test_translate` instead.

[1] https://public-inbox.org/git/20180604235229.279814-2-sandals@crustytoothpaste.net/

Martin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 17/20] abbrev: unify the handling of empty values
  2018-06-08 22:41 ` [PATCH 17/20] abbrev: unify the handling of empty values Ævar Arnfjörð Bjarmason
@ 2018-06-09 14:24   ` Martin Ågren
  2018-06-09 14:31     ` Martin Ågren
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Ågren @ 2018-06-09 14:24 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff Hostetler,
	Jeff King, Johannes Schindelin, Jonathan Nieder, Linus Torvalds

On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> For no good reason the --abbrev= command-line option was less strict
> than the core.abbrev config option, which came down to the latter
> using git_config_int() which rejects an empty string, but the rest of
> the parsing using strtoul() which will convert it to 0.

It will still be less strict in that it accepts trailing garbage, e.g.,
`--abbrev=7a`. Probably ok to leave it at that in this series, but
possibly useful to mention here that this only makes these options "less
differently strict".

I also notice that the documentation of `--abbrev` starts with "Instead
of showing the full 40-byte hexadecimal object name" which doesn't seem
right. I get much shorter IDs and I can't see that I'd have any
configuration causing this. Anyway, that might not be anything this
series needs to do anything about.

> +               if (!strcmp(arg, ""))
> +                       die("--abbrev expects a value, got '%s'", arg);

> +               if (!strcmp(arg, ""))
> +                       return opterror(opt, "expects a value", 0);

> +               if (!strcmp(optarg, ""))
> +                       die("--abbrev expects a value, got '%s'", optarg);

> +       test_must_fail git branch -v --abbrev= 2>stderr &&
> +       test_i18ngrep "expects a value" stderr &&

> +       test_must_fail git log --abbrev= -1 --pretty=format:%h 2>stderr &&
> +       test_i18ngrep "expects a value" stderr &&

> +       test_must_fail git diff --raw --abbrev= HEAD~ 2>stderr &&
> +       test_i18ngrep "expects a value" stderr &&

Mismatch re i18n-ed or not between implementations and tests?

Martin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 17/20] abbrev: unify the handling of empty values
  2018-06-09 14:24   ` Martin Ågren
@ 2018-06-09 14:31     ` Martin Ågren
  0 siblings, 0 replies; 37+ messages in thread
From: Martin Ågren @ 2018-06-09 14:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff Hostetler,
	Jeff King, Johannes Schindelin, Jonathan Nieder, Linus Torvalds

On 9 June 2018 at 16:24, Martin Ågren <martin.agren@gmail.com> wrote:
> On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>> For no good reason the --abbrev= command-line option was less strict
>> than the core.abbrev config option, which came down to the latter
>> using git_config_int() which rejects an empty string, but the rest of
>> the parsing using strtoul() which will convert it to 0.
>
> It will still be less strict in that it accepts trailing garbage, e.g.,
> `--abbrev=7a`. Probably ok to leave it at that in this series, but
> possibly useful to mention here that this only makes these options "less
> differently strict".

Hmpf, please ignore. That's what I get for looking at a few patches,
taking a break, picking it up again and completely forgetting what's
going on...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-08 22:41 ` [PATCH 19/20] abbrev: support relative abbrev values Ævar Arnfjörð Bjarmason
@ 2018-06-09 15:38   ` Martin Ågren
  2018-06-12 19:16   ` Junio C Hamano
  1 sibling, 0 replies; 37+ messages in thread
From: Martin Ågren @ 2018-06-09 15:38 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff Hostetler,
	Jeff King, Johannes Schindelin, Jonathan Nieder, Linus Torvalds

On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index ab641bf5a9..abf07be7b6 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -919,6 +919,12 @@ core.abbrev::
>         in your repository, which hopefully is enough for
>         abbreviated object names to stay unique for some time.
>         The minimum length is 4.
> ++
> +This can also be set to relative values such as `+2` or `-2`, which
> +means to add or subtract N characters from the SHA-1 that Git would
> +otherwise print, this allows for producing more future-proof SHA-1s
> +for use within a given project, while adjusting the value for the
> +current approximate number of objects.

How about s/, this/. This/ to break it up a little? Also, you write "+2"
and "-2" but then "N". Unify it?

Also, I'd suggest s/SHA-1/object ID/ to be future-proof.

> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index f466600972..f1114a7b8d 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -384,6 +384,9 @@ endif::git-format-patch[]
>         independent of the `--full-index` option above, which controls
>         the diff-patch output format.  Non default number of
>         digits can be specified with `--abbrev=<n>`.
> ++
> +Can also be set to a relative value, see `core.abbrev` in
> +linkgit:git-diff[1].

Good. You then add this paragraph to lots of other places...

> diff --git a/config.c b/config.c
> index 12f762ad92..cd95c6bdfb 100644
> --- a/config.c
> +++ b/config.c
> @@ -1151,6 +1151,17 @@ static int git_default_core_config(const char *var, const char *value)
>                         return config_error_nonbool(var);
>                 if (!strcasecmp(value, "auto")) {
>                         default_abbrev = -1;
> +               } else if (*value == '+' || *value == '-') {
> +                       int relative = git_config_int(var, value);
> +                       if (relative == 0)
> +                               die(_("bad core.abbrev value %s. "

Trailing period? Same below.

> +                                     "relative values must be non-zero"),
> +                                   value);
> +                       if (abs(relative) > GIT_SHA1_HEXSZ)
> +                               die(_("bad core.abbrev value %s. "
> +                                     "impossibly out of range"),
> +                                   value);
> +                       default_abbrev_relative = relative;
>                 } else {
>                         int abbrev = git_config_int(var, value);
>                         if (abbrev < minimum_abbrev || abbrev > 40)
> diff --git a/diff.c b/diff.c
> index e0141cfbc0..f7861b8472 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -4801,16 +4801,28 @@ int diff_opt_parse(struct diff_options *options,
>         else if (!strcmp(arg, "--abbrev"))
>                 options->abbrev = DEFAULT_ABBREV;
>         else if (skip_prefix(arg, "--abbrev=", &arg)) {
> +               int v;
>                 char *end;
>                 if (!strcmp(arg, ""))
>                         die("--abbrev expects a value, got '%s'", arg);
> -               options->abbrev = strtoul(arg, &end, 10);
> +               v = strtoul(arg, &end, 10);
>                 if (*end)
>                         die("--abbrev expects a numerical value, got '%s'", arg);
> -               if (options->abbrev < MINIMUM_ABBREV) {
> +               if (*arg == '+' || *arg == '-') {
> +                       if (v == 0) {
> +                               die("relative abbrev must be non-zero");
> +                       } else if (abs(v) > the_hash_algo->hexsz) {
> +                               die("relative abbrev impossibly out of range");
> +                       } else {
> +                               default_abbrev_relative = v;
> +                               options->abbrev = DEFAULT_ABBREV;
> +                       }
> +               } else if (v < MINIMUM_ABBREV) {
>                         options->abbrev = MINIMUM_ABBREV;
> -               } else if (the_hash_algo->hexsz < options->abbrev) {
> +               } else if (the_hash_algo->hexsz < v) {
>                         options->abbrev = the_hash_algo->hexsz;
> +               } else {
> +                       options->abbrev = v;

I've cut out a few instances of more-or-less repeated code. Any
possibility of extracting this into a helper? Maybe after you've done
the preparatory work of unifying these sites. Or as part of it, i.e.,
"let's switch this spot to use the helper; that makes it stricter in
this-and-that sense".

These can't all be entirely unified, I guess, but maybe "mostly"?

Martin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 20/20] abbrev: add a core.validateAbbrev setting
  2018-06-08 22:41 ` [PATCH 20/20] abbrev: add a core.validateAbbrev setting Ævar Arnfjörð Bjarmason
@ 2018-06-09 15:47   ` Martin Ågren
  2018-06-12 18:58     ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Martin Ågren @ 2018-06-09 15:47 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, Junio C Hamano, Derrick Stolee, Jeff Hostetler,
	Jeff King, Johannes Schindelin, Jonathan Nieder, Linus Torvalds

On 9 June 2018 at 00:41, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:

> Instead of trying really hard to find an unambiguous SHA-1 we can with
> core.validateAbbrev=false, and preferably combined with the new
> support for relative core.abbrev values (such as +4) unconditionally
> print a short SHA-1 without doing any disambiguation check. I.e. it

This first paragraph read weirdly the first time. On the second attempt
I knew how to structure it and got it right. It might be easier to read
if the part about core.appreb=+4 were in a separate second sentence.

That last "it" is "the combination of these two configs", right?

> allows for picking a trade-off between performance, and the odds that
> future or remote (or current and local) short SHA-1 will be ambiguous.

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index abf07be7b6..df31d1351f 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -925,6 +925,49 @@ means to add or subtract N characters from the SHA-1 that Git would
>  otherwise print, this allows for producing more future-proof SHA-1s
>  for use within a given project, while adjusting the value for the
>  current approximate number of objects.
> ++
> +This is especially useful in combination with the
> +`core.validateAbbrev` setting, or to get more future-proof hashes to
> +reference in the future in a repository whose number of objects is
> +expected to grow.

Maybe s/validateAbbrev/validateAbbrev = false/?

> +
> +core.validateAbbrev::
> +       If set to false (true by default) don't do any validation when
> +       printing abbreviated object names to see if they're really
> +       unique. This makes printing objects more performant at the
> +       cost of potentially printing object names that aren't unique
> +       within the current repository.

Good. I understand why I'd want to use it, and why not.

> ++
> +When printing abbreviated object names Git needs to look through the
> +local object store. This is an `O(log N)` operation assuming all the
> +objects are in a single pack file, but `X * O(log N)` given `X` pack
> +files, which can get expensive on some larger repositories.

This might be very close to too much information.

> ++
> +This setting changes that to `O(1)`, but with the trade-off that
> +depending on the value of `core.abbrev` we may be printing abbreviated
> +hashes that collide. Too see how likely this is, try running:
> ++
> +-----------------------------------------------------------------------------------------------------------
> +git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
> +-----------------------------------------------------------------------------------------------------------
> ++
> +This shows how many commits were found at each abbreviation length. On
> +linux.git in June 2018 this shows a bit more than 750,000 commits,
> +with just 4 needing 11 characters to be fully abbreviated, and the
> +default heuristic picks a length of 12.

These last few paragraphs seem like too much to me.

> ++
> +Even without `core.validateAbbrev=false` the results abbreviation
> +already a bit of a probability game. They're guaranteed at the moment
> +of generation, but as more objects are added, ambiguities may be
> +introduced. Likewise, what's unambiguous for you may not be for
> +somebody else you're communicating with, if they have their own clone.

This seems more useful.

> ++
> +Therefore the default of `core.validateAbbrev=true` may not save you
> +in practice if you're sharing the SHA-1 or noting it now to use after
> +a `git fetch`. You may be better off setting `core.abbrev` to
> +e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
> +that with `core.validateAbbrev=false` to get a reasonable trade-off
> +between safety and performance.

Makes sense. As before, I'd suggest s/SHA-1/object ID/.

I do wonder if this documentation could be a bit less verbose without
sacrificing too much correctness and clarity. No brilliant suggestions
on how to do that, sorry.

Martin

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars
  2018-06-08 22:41 ` [PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars Ævar Arnfjörð Bjarmason
@ 2018-06-12 18:10   ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2018-06-12 18:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> In a later change I'm adding stress testing of the commit abbreviation
> as it relates to git-blame and others, and initially thought that the
> inability to extract full SHA-1s from the non-"--porcelain" output was
> a bug.

... meaning that it is not actually a bug, as the output format
other than porcelain is for human consumption?

> --- a/Documentation/git-blame.txt
> +++ b/Documentation/git-blame.txt
> @@ -88,6 +88,11 @@ include::blame-options.txt[]
>  	Instead of using the default 7+1 hexadecimal digits as the
>  	abbreviated object name, use <n>+1 digits. Note that 1 column
>  	is used for a caret to mark the boundary commit.

This is outside the scope of this patch, but is the above 7+1 still
current or do we need updating it for the (not so) recent change to
auto-scale the default abbreviation width?

> ++
> +Because of this UI design, the only way to get the full SHA-1 of the
> +boundary commit is to use the `--porcelain` format. With `--abbrev=40`
> +only 39 characters of the boundary SHA-1 will be emitted, since one
> +will be used for the caret to mark the boundary.

OK.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev
  2018-06-08 22:41 ` [PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev Ævar Arnfjörð Bjarmason
@ 2018-06-12 18:31   ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2018-06-12 18:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/t/t0014-abbrev.sh b/t/t0014-abbrev.sh
> new file mode 100755
> index 0000000000..1c60f5ff93
> --- /dev/null
> +++ b/t/t0014-abbrev.sh
> @@ -0,0 +1,118 @@
> +#!/bin/sh
> +
> +test_description='test core.abbrev and related features'
> +
> +. ./test-lib.sh
> +
> +tr_d_n() {
> +	tr -d '\n'
> +}

- I personally would prefer to see the reason for having a helper
  function like this to be "make it easier to reason about", rather
  "make it shorter to type".  tr_d_n feels more about the latter; if
  we aimed for the former, this would be called strip_LF or
  something.

- In the existing stcipts, it seems that we prefer to spell tr args
  with 3-octal, e.g.  \012 for LF, \015 for CR.

- Also let's have SP on both sides of (), i.e.

	funcname () {

  for shell function definitions.

> +cut_tr_d_n_field_n() {
> +	cut -d " " -f $1 | tr_d_n
> +}

Likewise.  Name this not after how it does what it does, but after
what it does and why it does so.  In other words, if your answer to
the question: "what does the caller want?" is "it wants to pick the
nth field", then name it "pick_nth_field" or something?

> +test_expect_success 'abbrev empty value handling differs ' '
> +	test_must_fail git -c core.abbrev= log -1 --pretty=format:%h 2>stderr &&
> +	test_i18ngrep "bad numeric config value.*invalid unit" stderr &&
> +
> +	git branch -v --abbrev= | cut_tr_d_n_field_n 3 >branch &&
> +	test_byte_count = 40 branch &&

Sounds like a good thing to unify.  If anything, --options=value
should be stricter than vari.able=value but it is the other way
around.

> +	git log --abbrev= -1 --pretty=format:%h >log &&
> +	test_byte_count = 4 log &&

Makes readers wonder if 4 is about 3 hex plus terminating LF.  The
reason why this works is because --pretty=format:%h (not --format=%h
or --pretty=tformat:%h) uses delimiter semantics and we won't need
any LF to show a single record.

If we use the helper to measure the length of hexadecimal digits, it
may make sense to add a wrapper around test_byte_count that strips
LF; that way, the caller can use --format=%h instead and there will
be one less thing for the reader to worry about.

> +	git diff --raw --abbrev= HEAD~ >diff &&
> +	cut_tr_d_n_field_n 3 <diff >diff.3 &&
> +	test_byte_count = 4 diff.3 &&
> +	cut_tr_d_n_field_n 4 <diff >diff.4 &&
> +	test_byte_count = 4 diff.4 &&

These all depend on the fact that we do not have excessive number of
irrelevant objects to force us to abbreviate using more hexdigits
than the minimum 4, right?  I _think_ that is a reasonable
assumption we can depend on, even across the hash function
transition.  We may want to leave in-code comment, though.

> +	test_must_fail git diff --raw --abbrev= --no-index X Y >diff &&
> +	cut_tr_d_n_field_n 3 <diff >diff.3 &&
> +	test_byte_count = 4 diff.3 &&

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 20/20] abbrev: add a core.validateAbbrev setting
  2018-06-09 15:47   ` Martin Ågren
@ 2018-06-12 18:58     ` Junio C Hamano
  0 siblings, 0 replies; 37+ messages in thread
From: Junio C Hamano @ 2018-06-12 18:58 UTC (permalink / raw)
  To: Martin Ågren
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	Derrick Stolee, Jeff Hostetler, Jeff King, Johannes Schindelin,
	Jonathan Nieder, Linus Torvalds

Martin Ågren <martin.agren@gmail.com> writes:

>> +This is especially useful in combination with the
>> +`core.validateAbbrev` setting, or to get more future-proof hashes to
>> +reference in the future in a repository whose number of objects is
>> +expected to grow.
>
> Maybe s/validateAbbrev/validateAbbrev = false/?

Perhaps, but even with =true it would equally be useful, as the
point of this setting is to future-proofing.

>> ++
>> +When printing abbreviated object names Git needs to look through the
>> +local object store. This is an `O(log N)` operation assuming all the
>> +objects are in a single pack file, but `X * O(log N)` given `X` pack
>> +files, which can get expensive on some larger repositories.
>
> This might be very close to too much information.

Not very close, but just too much information without crucial detail
(i.e. log N times what constant???).  I'd drop it.

>> ++
>> +This setting changes that to `O(1)`, but with the trade-off that
>> +depending on the value of `core.abbrev` we may be printing abbreviated
>> +hashes that collide.

It may not be technically wrong to say "This changes it to O(1)",
but I think to most people it is more understandable to say "This
changes it to zero" ;-)

    Setting this variable to false makes Git not to validate the
    abbreviation it produces uniquely identifies an object among the
    current set of objects in the repository.  Depending on the
    value of `core.abbrev`, we may be printing abbreviated hashes
    that collide.  Note that setting this variable to true (or
    leaving it unset) does not guarantee that an abbreviated hash
    will never collide with future objects in the repository (you
    need to set core.abbrevLength to a larger value for that).

would be sufficient to clarify, and also nuke the following
overly-detailed paragraph.

>> ... Too see how likely this is, try running:
>> ++
>> +-----------------------------------------------------------------------------------------------------------
>> +git log --all --pretty=format:%h --abbrev=4 | perl -nE 'chomp; say length' | sort | uniq -c | sort -nr
>> +-----------------------------------------------------------------------------------------------------------
>> ++
>> +This shows how many commits were found at each abbreviation length. On
>> +linux.git in June 2018 this shows a bit more than 750,000 commits,
>> +with just 4 needing 11 characters to be fully abbreviated, and the
>> +default heuristic picks a length of 12.
>
> These last few paragraphs seem like too much to me.

Yeah, it goes to too low level a detail, especially with the "try
running" part.  I'd remove everything but "On linux.git in June ..."
if I were writing it from the above.

>> ++
>> +Even without `core.validateAbbrev=false` the results abbreviation
>> +already a bit of a probability game. They're guaranteed at the moment
>> +of generation, but as more objects are added, ambiguities may be
>> +introduced. Likewise, what's unambiguous for you may not be for
>> +somebody else you're communicating with, if they have their own clone.
>
> This seems more useful.

Yes, but still overly verbose; I think rolling it in the single
paragraph description like I showed above would be sufficient.

>> ++
>> +Therefore the default of `core.validateAbbrev=true` may not save you
>> +in practice if you're sharing the SHA-1 or noting it now to use after
>> +a `git fetch`. You may be better off setting `core.abbrev` to
>> +e.g. `+2` to add 2 extra characters to the SHA-1, and possibly combine
>> +that with `core.validateAbbrev=false` to get a reasonable trade-off
>> +between safety and performance.
>
> Makes sense. As before, I'd suggest s/SHA-1/object ID/.

Likewise.  If we were to keep it, then s/object ID/object name/.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-08 22:41 ` [PATCH 19/20] abbrev: support relative abbrev values Ævar Arnfjörð Bjarmason
  2018-06-09 15:38   ` Martin Ågren
@ 2018-06-12 19:16   ` Junio C Hamano
  2018-06-13 22:22     ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2018-06-12 19:16 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the core.abbrev config variable and the corresponding --abbrev
> command-line option to support relative values such as +1 or -1.
>
> Before Linus's e6c587c733 ("abbrev: auto size the default
> abbreviation", 2016-09-30) git would default to abbreviating object
> names to 7-hexdigits, and only picking longer SHA-1s as needed if that
> was ambiguous.
>
> That change instead set the default length as a function of the
> estimated current count of objects:
>
>     Based on the expectation that we would see collision in a
>     repository with 2^(2N) objects when using object names shortened
>     to first N bits, use sufficient number of hexdigits to cover the
>     number of objects in the repository.  Each hexdigit (4-bits) we
>     add to the shortened name allows us to have four times (2-bits) as
>     many objects in the repository.
>
> By supporting relative values for core.abbrev we can allow users to
> consistently opt-in for either a higher or lower probability of
> collision, without needing to hardcode a given numeric value like
> "10", which would be overkill on some repositories, and far to small
> on others.

Nicely explained and calculated ;-)

>  test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' '
> -	test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
> -	test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
> +	git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
> +	test_byte_count = 6 describe &&
> +
> +	git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
> +	test_byte_count = 8 describe &&

Even though I see the point of supporting absurdly small absolute
values like 4, I do not quite see the point of supporting negative
relative values here.  What's the expected use case?

>  	git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log &&
> -	test_byte_count = 4 log &&
> +	test_byte_count = 8 log &&

This, together with many many others in the rest of the patch, is
cute but confusing in that the diff shows change from 4 to 8 due to
the redefinition of what abbrev=+1 means.  I have a feeling that it
may not be worth doing it "right", but if we were doing it "right",
we would probably have done it in multiple steps:

    - the earlier patches in this series that demonstrates
      --abbrev=+1 is --abbrev=1 and core.abbrev=+1 is an error.

    - ensure --abbrev=+1 is rejected as syntax error just like
      core.abbrev=+1 was, without introducing relative values

    - introduce relative value.

That way, the last step (which corresponds to this patch) would show
change from "log --abbrev=+1" failing due to syntax error to showing
abbreviated value that is slightly longer than the default.

But a I said, it may not be worth doing so.  "Is it worth supporting
negative relative length?" still stands, though.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-12 19:16   ` Junio C Hamano
@ 2018-06-13 22:22     ` Ævar Arnfjörð Bjarmason
  2018-06-13 22:34       ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-13 22:22 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds


On Tue, Jun 12 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Change the core.abbrev config variable and the corresponding --abbrev
>> command-line option to support relative values such as +1 or -1.
>>
>> Before Linus's e6c587c733 ("abbrev: auto size the default
>> abbreviation", 2016-09-30) git would default to abbreviating object
>> names to 7-hexdigits, and only picking longer SHA-1s as needed if that
>> was ambiguous.
>>
>> That change instead set the default length as a function of the
>> estimated current count of objects:
>>
>>     Based on the expectation that we would see collision in a
>>     repository with 2^(2N) objects when using object names shortened
>>     to first N bits, use sufficient number of hexdigits to cover the
>>     number of objects in the repository.  Each hexdigit (4-bits) we
>>     add to the shortened name allows us to have four times (2-bits) as
>>     many objects in the repository.
>>
>> By supporting relative values for core.abbrev we can allow users to
>> consistently opt-in for either a higher or lower probability of
>> collision, without needing to hardcode a given numeric value like
>> "10", which would be overkill on some repositories, and far to small
>> on others.
>
> Nicely explained and calculated ;-)
>
>>  test_expect_success 'describe core.abbrev=[-+]1 and --abbrev=[-+]1' '
>> -	test_must_fail git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
>> -	test_must_fail git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
>> +	git -c core.abbrev=-1 describe | sed_g_tr_d_n >describe &&
>> +	test_byte_count = 6 describe &&
>> +
>> +	git -c core.abbrev=+1 describe | sed_g_tr_d_n >describe &&
>> +	test_byte_count = 8 describe &&
>
> Even though I see the point of supporting absurdly small absolute
> values like 4, I do not quite see the point of supporting negative
> relative values here.  What's the expected use case?

I'll add a better explanation for this to the commit message.

Initially I did this for consistency, since it was easy to implement,
and there's no reason to have that arbitrary limitation, but thinking
about it again I think I'll use this for some of my projects.

E.g. here's a breakdown of my dotfiles repo:

    $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say length'|sort|uniq -c|sort -nr
        784 4
         59 5
          7 6

I don't have a single commit that needs 7 characters, yet that's our
default. This is a sane trade-off for the kernel, but for something
that's just a toy or something you're playing around with something
shorter can make sense.

SHA-1s aren't just written down, but also e.g. remembered in wetware
short-term memory.

>>  	git log --abbrev=+1 --pretty=format:%h -1 | tr_d_n >log &&
>> -	test_byte_count = 4 log &&
>> +	test_byte_count = 8 log &&
>
> This, together with many many others in the rest of the patch, is
> cute but confusing in that the diff shows change from 4 to 8 due to
> the redefinition of what abbrev=+1 means.  I have a feeling that it
> may not be worth doing it "right", but if we were doing it "right",
> we would probably have done it in multiple steps:
>
>     - the earlier patches in this series that demonstrates
>       --abbrev=+1 is --abbrev=1 and core.abbrev=+1 is an error.
>
>     - ensure --abbrev=+1 is rejected as syntax error just like
>       core.abbrev=+1 was, without introducing relative values
>
>     - introduce relative value.
>
> That way, the last step (which corresponds to this patch) would show
> change from "log --abbrev=+1" failing due to syntax error to showing
> abbreviated value that is slightly longer than the default.
>
> But a I said, it may not be worth doing so.  "Is it worth supporting
> negative relative length?" still stands, though.

I'll see what I can do about this value churn.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-13 22:22     ` Ævar Arnfjörð Bjarmason
@ 2018-06-13 22:34       ` Junio C Hamano
  2018-06-14  7:36         ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2018-06-13 22:34 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> E.g. here's a breakdown of my dotfiles repo:
>
>     $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say length'|sort|uniq -c|sort -nr
>         784 4
>          59 5
>           7 6
>
> I don't have a single commit that needs 7 characters, yet that's our
> default. This is a sane trade-off for the kernel, but for something
> that's just a toy or something you're playing around with something
> shorter can make sense.
>
> SHA-1s aren't just written down, but also e.g. remembered in wetware
> short-term memory.

That's a fine example of what I called "supporting absurdly small
absolute values like 4"; I still do not see why you want "negative
relative values" from that example.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-13 22:34       ` Junio C Hamano
@ 2018-06-14  7:36         ` Ævar Arnfjörð Bjarmason
  2018-06-14 15:50           ` Junio C Hamano
  0 siblings, 1 reply; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-14  7:36 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds


On Wed, Jun 13 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> E.g. here's a breakdown of my dotfiles repo:
>>
>>     $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say length'|sort|uniq -c|sort -nr
>>         784 4
>>          59 5
>>           7 6
>>
>> I don't have a single commit that needs 7 characters, yet that's our
>> default. This is a sane trade-off for the kernel, but for something
>> that's just a toy or something you're playing around with something
>> shorter can make sense.
>>
>> SHA-1s aren't just written down, but also e.g. remembered in wetware
>> short-term memory.
>
> That's a fine example of what I called "supporting absurdly small
> absolute values like 4"; I still do not see why you want "negative
> relative values" from that example.

Because hardcoding -2 is very different than setting it to 5, because
the -2 will scale to the size of the repository, but 5 is just 7-2 where
7 is our default value.

So, in general if you want less future proof hashes by some
probabilistic metric (whether you use core.validateAbbrev or not) you'd
use -2 or -3, just like you might use +2 or +3 if you'd like to have
more future-proof hashes (especially with core.validateAbbrev=true).

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-14  7:36         ` Ævar Arnfjörð Bjarmason
@ 2018-06-14 15:50           ` Junio C Hamano
  2018-06-14 19:07             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 37+ messages in thread
From: Junio C Hamano @ 2018-06-14 15:50 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Wed, Jun 13 2018, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>
>>> E.g. here's a breakdown of my dotfiles repo:
>>>
>>>     $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say length'|sort|uniq -c|sort -nr
>>>         784 4
>>>          59 5
>>>           7 6
>>>
>>> I don't have a single commit that needs 7 characters, yet that's our
>>> default. This is a sane trade-off for the kernel, but for something
>>> that's just a toy or something you're playing around with something
>>> shorter can make sense.
>>>
>>> SHA-1s aren't just written down, but also e.g. remembered in wetware
>>> short-term memory.
>>
>> That's a fine example of what I called "supporting absurdly small
>> absolute values like 4"; I still do not see why you want "negative
>> relative values" from that example.
>
> Because hardcoding -2 is very different than setting it to 5, because
> the -2 will scale to the size of the repository, but 5 is just 7-2 where
> 7 is our default value.
>
> So, in general if you want less future proof hashes by some
> probabilistic metric (whether you use core.validateAbbrev or not) you'd
> use -2 or -3, just like you might use +2 or +3 if you'd like to have
> more future-proof hashes (especially with core.validateAbbrev=true).

That still does not make much sense to me at all.

I do agree that something shorter than the default 7 may be more
appropriate for our wetware short-term memory, and it would make
sense to grow the "riskier to collide than the default heuristics
but more memorable" variant as the project grows, _ONLY_ _IF_ our
wetware short-term memory scales with the project we happen to be
working on.  But our wetware does not scale with the project we work
on; at least mine does not.

So...


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 19/20] abbrev: support relative abbrev values
  2018-06-14 15:50           ` Junio C Hamano
@ 2018-06-14 19:07             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 37+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-06-14 19:07 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: git, dstolee, git, peff, johannes.schindelin, jrnieder,
	Linus Torvalds


On Thu, Jun 14 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> On Wed, Jun 13 2018, Junio C Hamano wrote:
>>
>>> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>>>
>>>> E.g. here's a breakdown of my dotfiles repo:
>>>>
>>>>     $ git -c core.abbrev=4 log  --pretty=format:%h|perl -nE 'chomp;say length'|sort|uniq -c|sort -nr
>>>>         784 4
>>>>          59 5
>>>>           7 6
>>>>
>>>> I don't have a single commit that needs 7 characters, yet that's our
>>>> default. This is a sane trade-off for the kernel, but for something
>>>> that's just a toy or something you're playing around with something
>>>> shorter can make sense.
>>>>
>>>> SHA-1s aren't just written down, but also e.g. remembered in wetware
>>>> short-term memory.
>>>
>>> That's a fine example of what I called "supporting absurdly small
>>> absolute values like 4"; I still do not see why you want "negative
>>> relative values" from that example.
>>
>> Because hardcoding -2 is very different than setting it to 5, because
>> the -2 will scale to the size of the repository, but 5 is just 7-2 where
>> 7 is our default value.
>>
>> So, in general if you want less future proof hashes by some
>> probabilistic metric (whether you use core.validateAbbrev or not) you'd
>> use -2 or -3, just like you might use +2 or +3 if you'd like to have
>> more future-proof hashes (especially with core.validateAbbrev=true).
>
> That still does not make much sense to me at all.
>
> I do agree that something shorter than the default 7 may be more
> appropriate for our wetware short-term memory, and it would make
> sense to grow the "riskier to collide than the default heuristics
> but more memorable" variant as the project grows, _ONLY_ _IF_ our
> wetware short-term memory scales with the project we happen to be
> working on.  But our wetware does not scale with the project we work
> on; at least mine does not.

Yes, it's a trade-off, but just because something is a trade-off doesn't
make it useless.

Aside from the feature I'm proposing, the same thing applies to the
short SHA-1 currently. My ~1k commit dotfiles has 7 characters, but
linux.git has 12. Does that make printing the short SHA-1 at all useless
and we should just fall back to 40 characters if it's say >= 12? No.

12 is still better than 40, but if we could get away with it 10 would be
better than 12. Right now the "get away with it" calculation is a
hardcoded constant, this makes it configurable.

The reason to make it configurable is because you may want more future
proof *or* less future-proof SHA-1s depending on the use-case. Printing
a longer SHA-1 has a cost, including but not limited to:

 * Remembering it for a short time
 * Seeing it on one screen and typing it into another computer manually
 * `git log --oneline` output in a terminal where horizontal space is at
   a premium

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2018-06-14 19:08 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-06-08 22:41 [PATCH 00/20] unconditional O(1) SHA-1 abbreviation Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 01/20] t/README: clarify the description of test_line_count Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 02/20] test library: add a test_byte_count Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 03/20] blame doc: explicitly note how --abbrev=40 gives 39 chars Ævar Arnfjörð Bjarmason
2018-06-12 18:10   ` Junio C Hamano
2018-06-08 22:41 ` [PATCH 04/20] abbrev tests: add tests for core.abbrev and --abbrev Ævar Arnfjörð Bjarmason
2018-06-12 18:31   ` Junio C Hamano
2018-06-08 22:41 ` [PATCH 05/20] abbrev tests: test "git-blame" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 06/20] blame: fix a bug, core.abbrev should work like --abbrev Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 07/20] abbrev tests: test "git branch" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 08/20] abbrev tests: test for "git-describe" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 09/20] abbrev tests: test for "git-log" behavior Ævar Arnfjörð Bjarmason
2018-06-09  8:43   ` Martin Ågren
2018-06-09  9:56     ` Ævar Arnfjörð Bjarmason
2018-06-09 13:56       ` Martin Ågren
2018-06-08 22:41 ` [PATCH 10/20] abbrev tests: test for "git-diff" behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 11/20] abbrev tests: test for plumbing behavior Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 12/20] abbrev tests: test for --abbrev and core.abbrev=[+-]N Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 13/20] parse-options-cb.c: convert uses of 40 to GIT_SHA1_HEXSZ Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 14/20] config.c: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 15/20] parse-options-cb.c: " Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 16/20] abbrev: unify the handling of non-numeric values Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 17/20] abbrev: unify the handling of empty values Ævar Arnfjörð Bjarmason
2018-06-09 14:24   ` Martin Ågren
2018-06-09 14:31     ` Martin Ågren
2018-06-08 22:41 ` [PATCH 18/20] abbrev parsing: use braces on multiple conditional arms Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 19/20] abbrev: support relative abbrev values Ævar Arnfjörð Bjarmason
2018-06-09 15:38   ` Martin Ågren
2018-06-12 19:16   ` Junio C Hamano
2018-06-13 22:22     ` Ævar Arnfjörð Bjarmason
2018-06-13 22:34       ` Junio C Hamano
2018-06-14  7:36         ` Ævar Arnfjörð Bjarmason
2018-06-14 15:50           ` Junio C Hamano
2018-06-14 19:07             ` Ævar Arnfjörð Bjarmason
2018-06-08 22:41 ` [PATCH 20/20] abbrev: add a core.validateAbbrev setting Ævar Arnfjörð Bjarmason
2018-06-09 15:47   ` Martin Ågren
2018-06-12 18:58     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).