git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/18] alternate object database cleanups
@ 2016-10-03 20:33 Jeff King
  2016-10-03 20:33 ` [PATCH 01/18] t5613: drop reachable_via function Jeff King
                   ` (19 more replies)
  0 siblings, 20 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:33 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

This series is the result of René nerd-sniping me with the claim that we
could "easily" teach count-objects to print out the list of alternates
in:

  http://public-inbox.org/git/c27dc1a4-3c7a-2866-d9d8-f5d3eb161650@web.de/

My real goal is just patch 17, which is needed for the quarantine series
in that thread. But along the way there were quite a few opportunities
for cleanups along with a few minor bugfixes (in patches 7 and 18), and
I think the count-objects change in patch 16 is a nice general debugging
tool.

The rest of it is "just" cleanup, but I'll note that it clears up some
hairy allocation code. These were bits that I noticed in my big
allocation-cleanup series last year, but were too nasty to fit any of
the more general fixes. I think the end result is much better.

The number of patches is a little intimidating, but I tried hard to
break the refactoring down into a sequence of obviously-correct steps.
You can be the judge of my success.

  [01/18]: t5613: drop reachable_via function
  [02/18]: t5613: drop test_valid_repo function
  [03/18]: t5613: use test_must_fail
  [04/18]: t5613: whitespace/style cleanups
  [05/18]: t5613: do not chdir in main process
  [06/18]: t5613: clarify "too deep" recursion tests
  [07/18]: link_alt_odb_entry: handle normalize_path errors
  [08/18]: link_alt_odb_entry: refactor string handling
  [09/18]: alternates: provide helper for adding to alternates list
  [10/18]: alternates: provide helper for allocating alternate
  [11/18]: alternates: encapsulate alt->base munging
  [12/18]: alternates: use a separate scratch space
  [13/18]: fill_sha1_file: write "boring" characters
  [14/18]: alternates: store scratch buffer as strbuf
  [15/18]: fill_sha1_file: write into a strbuf
  [16/18]: count-objects: report alternates via verbose mode
  [17/18]: sha1_file: always allow relative paths to alternates
  [18/18]: alternates: use fspathcmp to detect duplicates

 Documentation/git-count-objects.txt |   5 +
 builtin/count-objects.c             |  12 +++
 builtin/fsck.c                      |  10 +-
 builtin/submodule--helper.c         |  11 +-
 cache.h                             |  36 ++++++-
 sha1_file.c                         | 179 ++++++++++++++++++--------------
 sha1_name.c                         |  17 +--
 strbuf.c                            |  20 ++++
 strbuf.h                            |   8 ++
 submodule.c                         |  23 +---
 t/t5613-info-alternate.sh           | 202 ++++++++++++++++++++----------------
 transport.c                         |   4 +-
 12 files changed, 305 insertions(+), 222 deletions(-)


^ permalink raw reply	[flat|nested] 84+ messages in thread

* [PATCH 01/18] t5613: drop reachable_via function
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
@ 2016-10-03 20:33 ` Jeff King
  2016-10-04  5:48   ` Jacob Keller
  2016-10-03 20:33 ` [PATCH 02/18] t5613: drop test_valid_repo function Jeff King
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:33 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

This function was never used since its inception in dd05ea1
(test case for transitive info/alternates, 2006-05-07).
Which is just as well, since it mutates the repo state in a
way that would invalidate further tests, without cleaning up
after itself. Let's get rid of it so that nobody is tempted
to use it.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5613-info-alternate.sh | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 9cd2626..e13f57d 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -6,16 +6,6 @@
 test_description='test transitive info/alternate entries'
 . ./test-lib.sh
 
-# test that a file is not reachable in the current repository
-# but that it is after creating a info/alternate entry
-reachable_via() {
-	alternate="$1"
-	file="$2"
-	if git cat-file -e "HEAD:$file"; then return 1; fi
-	echo "$alternate" >> .git/objects/info/alternate
-	git cat-file -e "HEAD:$file"
-}
-
 test_valid_repo() {
 	git fsck --full > fsck.log &&
 	test_line_count = 0 fsck.log
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 02/18] t5613: drop test_valid_repo function
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
  2016-10-03 20:33 ` [PATCH 01/18] t5613: drop reachable_via function Jeff King
@ 2016-10-03 20:33 ` Jeff King
  2016-10-04  5:50   ` Jacob Keller
  2016-10-03 20:34 ` [PATCH 03/18] t5613: use test_must_fail Jeff King
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:33 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

This function makes sure that "git fsck" does not report any
errors. But "--full" has been the default since f29cd39
(fsck: default to "git fsck --full", 2009-10-20), and we can
use the exit code (instead of counting the lines) since
e2b4f63 (fsck: exit with non-zero status upon errors,
2007-03-05).

So we can just use "git fsck", which is shorter and more
flexible (e.g., we can use "git -C").

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5613-info-alternate.sh | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index e13f57d..4548fb0 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -6,11 +6,6 @@
 test_description='test transitive info/alternate entries'
 . ./test-lib.sh
 
-test_valid_repo() {
-	git fsck --full > fsck.log &&
-	test_line_count = 0 fsck.log
-}
-
 base_dir=$(pwd)
 
 test_expect_success 'preparing first repository' \
@@ -52,7 +47,7 @@ git clone --bare -l -s G H'
 
 test_expect_success 'invalidity of deepest repository' \
 'cd H && {
-	test_valid_repo
+	git fsck
 	test $? -ne 0
 }'
 
@@ -60,41 +55,41 @@ cd "$base_dir"
 
 test_expect_success 'validity of third repository' \
 'cd C &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success 'validity of fourth repository' \
 'cd D &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success 'breaking of loops' \
 'echo "$base_dir"/B/.git/objects >> "$base_dir"/A/.git/objects/info/alternates&&
 cd C &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success 'that info/alternates is necessary' \
 'cd C &&
 rm -f .git/objects/info/alternates &&
-! (test_valid_repo)'
+! (git fsck)'
 
 cd "$base_dir"
 
 test_expect_success 'that relative alternate is possible for current dir' \
 'cd C &&
 echo "../../../B/.git/objects" > .git/objects/info/alternates &&
-test_valid_repo'
+git fsck'
 
 cd "$base_dir"
 
 test_expect_success \
     'that relative alternate is only possible for current dir' '
     cd D &&
-    ! (test_valid_repo)
+    ! (git fsck)
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 03/18] t5613: use test_must_fail
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
  2016-10-03 20:33 ` [PATCH 01/18] t5613: drop reachable_via function Jeff King
  2016-10-03 20:33 ` [PATCH 02/18] t5613: drop test_valid_repo function Jeff King
@ 2016-10-03 20:34 ` Jeff King
  2016-10-04  5:51   ` Jacob Keller
  2016-10-03 20:34 ` [PATCH 04/18] t5613: whitespace/style cleanups Jeff King
                   ` (16 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:34 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

Besides being our normal style, this correctly checks for an
error exit() versus signal death.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5613-info-alternate.sh | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 4548fb0..65074dd 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -46,10 +46,9 @@ git clone -l -s F G &&
 git clone --bare -l -s G H'
 
 test_expect_success 'invalidity of deepest repository' \
-'cd H && {
-	git fsck
-	test $? -ne 0
-}'
+'cd H &&
+test_must_fail git fsck
+'
 
 cd "$base_dir"
 
@@ -75,7 +74,8 @@ cd "$base_dir"
 test_expect_success 'that info/alternates is necessary' \
 'cd C &&
 rm -f .git/objects/info/alternates &&
-! (git fsck)'
+test_must_fail git fsck
+'
 
 cd "$base_dir"
 
@@ -89,7 +89,7 @@ cd "$base_dir"
 test_expect_success \
     'that relative alternate is only possible for current dir' '
     cd D &&
-    ! (git fsck)
+    test_must_fail git fsck
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 04/18] t5613: whitespace/style cleanups
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (2 preceding siblings ...)
  2016-10-03 20:34 ` [PATCH 03/18] t5613: use test_must_fail Jeff King
@ 2016-10-03 20:34 ` Jeff King
  2016-10-04  5:52   ` Jacob Keller
  2016-10-03 20:34 ` [PATCH 05/18] t5613: do not chdir in main process Jeff King
                   ` (15 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:34 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

Our normal test style these days puts the opening quote of
the body on the description line, and indents the body with
a single tab. This ancient test did not follow this.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5613-info-alternate.sh | 114 +++++++++++++++++++++++++---------------------
 1 file changed, 62 insertions(+), 52 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 65074dd..1f283a5 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -8,88 +8,98 @@ test_description='test transitive info/alternate entries'
 
 base_dir=$(pwd)
 
-test_expect_success 'preparing first repository' \
-'test_create_repo A && cd A &&
-echo "Hello World" > file1 &&
-git add file1 &&
-git commit -m "Initial commit" file1 &&
-git repack -a -d &&
-git prune'
+test_expect_success 'preparing first repository' '
+	test_create_repo A &&
+	cd A &&
+	echo "Hello World" > file1 &&
+	git add file1 &&
+	git commit -m "Initial commit" file1 &&
+	git repack -a -d &&
+	git prune
+'
 
 cd "$base_dir"
 
-test_expect_success 'preparing second repository' \
-'git clone -l -s A B && cd B &&
-echo "foo bar" > file2 &&
-git add file2 &&
-git commit -m "next commit" file2 &&
-git repack -a -d -l &&
-git prune'
+test_expect_success 'preparing second repository' '
+	git clone -l -s A B &&
+	cd B &&
+	echo "foo bar" > file2 &&
+	git add file2 &&
+	git commit -m "next commit" file2 &&
+	git repack -a -d -l &&
+	git prune
+'
 
 cd "$base_dir"
 
-test_expect_success 'preparing third repository' \
-'git clone -l -s B C && cd C &&
-echo "Goodbye, cruel world" > file3 &&
-git add file3 &&
-git commit -m "one more" file3 &&
-git repack -a -d -l &&
-git prune'
+test_expect_success 'preparing third repository' '
+	git clone -l -s B C &&
+	cd C &&
+	echo "Goodbye, cruel world" > file3 &&
+	git add file3 &&
+	git commit -m "one more" file3 &&
+	git repack -a -d -l &&
+	git prune
+'
 
 cd "$base_dir"
 
-test_expect_success 'creating too deep nesting' \
-'git clone -l -s C D &&
-git clone -l -s D E &&
-git clone -l -s E F &&
-git clone -l -s F G &&
-git clone --bare -l -s G H'
+test_expect_success 'creating too deep nesting' '
+	git clone -l -s C D &&
+	git clone -l -s D E &&
+	git clone -l -s E F &&
+	git clone -l -s F G &&
+	git clone --bare -l -s G H
+'
 
-test_expect_success 'invalidity of deepest repository' \
-'cd H &&
-test_must_fail git fsck
+test_expect_success 'invalidity of deepest repository' '
+	cd H &&
+	test_must_fail git fsck
 '
 
 cd "$base_dir"
 
-test_expect_success 'validity of third repository' \
-'cd C &&
-git fsck'
+test_expect_success 'validity of third repository' '
+	cd C &&
+	git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success 'validity of fourth repository' \
-'cd D &&
-git fsck'
+test_expect_success 'validity of fourth repository' '
+	cd D &&
+	git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success 'breaking of loops' \
-'echo "$base_dir"/B/.git/objects >> "$base_dir"/A/.git/objects/info/alternates&&
-cd C &&
-git fsck'
+test_expect_success 'breaking of loops' '
+	echo "$base_dir"/B/.git/objects >>"$base_dir"/A/.git/objects/info/alternatesi &&
+	cd C &&
+	git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success 'that info/alternates is necessary' \
-'cd C &&
-rm -f .git/objects/info/alternates &&
-test_must_fail git fsck
+test_expect_success 'that info/alternates is necessary' '
+	cd C &&
+	rm -f .git/objects/info/alternates &&
+	test_must_fail git fsck
 '
 
 cd "$base_dir"
 
-test_expect_success 'that relative alternate is possible for current dir' \
-'cd C &&
-echo "../../../B/.git/objects" > .git/objects/info/alternates &&
-git fsck'
+test_expect_success 'that relative alternate is possible for current dir' '
+	cd C &&
+	echo "../../../B/.git/objects" > .git/objects/info/alternates &&
+	git fsck
+'
 
 cd "$base_dir"
 
-test_expect_success \
-    'that relative alternate is only possible for current dir' '
-    cd D &&
-    test_must_fail git fsck
+test_expect_success 'that relative alternate is only possible for current dir' '
+	cd D &&
+	test_must_fail git fsck
 '
 
 cd "$base_dir"
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 05/18] t5613: do not chdir in main process
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (3 preceding siblings ...)
  2016-10-03 20:34 ` [PATCH 04/18] t5613: whitespace/style cleanups Jeff King
@ 2016-10-03 20:34 ` Jeff King
  2016-10-04  5:54   ` Jacob Keller
  2016-10-04 21:00   ` Junio C Hamano
  2016-10-03 20:34 ` [PATCH 06/18] t5613: clarify "too deep" recursion tests Jeff King
                   ` (14 subsequent siblings)
  19 siblings, 2 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:34 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

Our usual style when working with subdirectories is to chdir
inside a subshell or to use "git -C", which means we do not
have to constantly return to the main test directory. Let's
convert this old test, which does not follow that style.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5613-info-alternate.sh | 92 +++++++++++++++++------------------------------
 1 file changed, 33 insertions(+), 59 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 1f283a5..7bc1c3c 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -6,44 +6,39 @@
 test_description='test transitive info/alternate entries'
 . ./test-lib.sh
 
-base_dir=$(pwd)
-
 test_expect_success 'preparing first repository' '
-	test_create_repo A &&
-	cd A &&
-	echo "Hello World" > file1 &&
-	git add file1 &&
-	git commit -m "Initial commit" file1 &&
-	git repack -a -d &&
-	git prune
+	test_create_repo A && (
+		cd A &&
+		echo "Hello World" > file1 &&
+		git add file1 &&
+		git commit -m "Initial commit" file1 &&
+		git repack -a -d &&
+		git prune
+	)
 '
 
-cd "$base_dir"
-
 test_expect_success 'preparing second repository' '
-	git clone -l -s A B &&
-	cd B &&
-	echo "foo bar" > file2 &&
-	git add file2 &&
-	git commit -m "next commit" file2 &&
-	git repack -a -d -l &&
-	git prune
+	git clone -l -s A B && (
+		cd B &&
+		echo "foo bar" > file2 &&
+		git add file2 &&
+		git commit -m "next commit" file2 &&
+		git repack -a -d -l &&
+		git prune
+	)
 '
 
-cd "$base_dir"
-
 test_expect_success 'preparing third repository' '
-	git clone -l -s B C &&
-	cd C &&
-	echo "Goodbye, cruel world" > file3 &&
-	git add file3 &&
-	git commit -m "one more" file3 &&
-	git repack -a -d -l &&
-	git prune
+	git clone -l -s B C && (
+		cd C &&
+		echo "Goodbye, cruel world" > file3 &&
+		git add file3 &&
+		git commit -m "one more" file3 &&
+		git repack -a -d -l &&
+		git prune
+	)
 '
 
-cd "$base_dir"
-
 test_expect_success 'creating too deep nesting' '
 	git clone -l -s C D &&
 	git clone -l -s D E &&
@@ -53,55 +48,34 @@ test_expect_success 'creating too deep nesting' '
 '
 
 test_expect_success 'invalidity of deepest repository' '
-	cd H &&
-	test_must_fail git fsck
+	test_must_fail git -C H fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'validity of third repository' '
-	cd C &&
-	git fsck
+	git -C C fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'validity of fourth repository' '
-	cd D &&
-	git fsck
+	git -C D fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'breaking of loops' '
-	echo "$base_dir"/B/.git/objects >>"$base_dir"/A/.git/objects/info/alternatesi &&
-	cd C &&
-	git fsck
+	echo "$(pwd)"/B/.git/objects >>A/.git/objects/info/alternates &&
+	git -C C fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'that info/alternates is necessary' '
-	cd C &&
-	rm -f .git/objects/info/alternates &&
-	test_must_fail git fsck
+	rm -f C/.git/objects/info/alternates &&
+	test_must_fail git -C C fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'that relative alternate is possible for current dir' '
-	cd C &&
-	echo "../../../B/.git/objects" > .git/objects/info/alternates &&
+	echo "../../../B/.git/objects" >C/.git/objects/info/alternates &&
 	git fsck
 '
 
-cd "$base_dir"
-
 test_expect_success 'that relative alternate is only possible for current dir' '
-	cd D &&
-	test_must_fail git fsck
+	test_must_fail git -C D fsck
 '
 
-cd "$base_dir"
-
 test_done
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (4 preceding siblings ...)
  2016-10-03 20:34 ` [PATCH 05/18] t5613: do not chdir in main process Jeff King
@ 2016-10-03 20:34 ` Jeff King
  2016-10-04  5:57   ` Jacob Keller
  2016-10-03 20:34 ` [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors Jeff King
                   ` (13 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:34 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

These tests are just trying to show that we allow recursion
up to a certain depth, but not past it. But the counting is
a bit non-intuitive, and rather than test at the edge of the
breakage, we test "OK" cases in the middle of the chain.
Let's explain what's going on, and explicitly test the
switch between "OK" and "too deep".

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5613-info-alternate.sh | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 7bc1c3c..b393613 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
 	)
 '
 
+# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
+# the depth at 0 and count links, not repositories, so in a chain like:
+#
+#   A -> B -> C -> D -> E -> F -> G -> H
+#      0    1    2    3    4    5    6
+#
+# we are OK at "G", but break at "H".
+#
+# Note also that we must use "--bare -l" to make the link to H. The "-l"
+# ensures we do not do a connectivity check, and the "--bare" makes sure
+# we do not try to checkout the result (which needs objects), either of
+# which would cause the clone to fail.
 test_expect_success 'creating too deep nesting' '
 	git clone -l -s C D &&
 	git clone -l -s D E &&
@@ -47,16 +59,12 @@ test_expect_success 'creating too deep nesting' '
 	git clone --bare -l -s G H
 '
 
-test_expect_success 'invalidity of deepest repository' '
-	test_must_fail git -C H fsck
-'
-
-test_expect_success 'validity of third repository' '
-	git -C C fsck
+test_expect_success 'validity of fifth-deep repository' '
+	git -C G fsck
 '
 
-test_expect_success 'validity of fourth repository' '
-	git -C D fsck
+test_expect_success 'invalidity of sixth-deep repository' '
+	test_must_fail git -C H fsck
 '
 
 test_expect_success 'breaking of loops' '
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (5 preceding siblings ...)
  2016-10-03 20:34 ` [PATCH 06/18] t5613: clarify "too deep" recursion tests Jeff King
@ 2016-10-03 20:34 ` Jeff King
  2016-10-04  6:01   ` Jacob Keller
                     ` (3 more replies)
  2016-10-03 20:34 ` [PATCH 08/18] link_alt_odb_entry: refactor string handling Jeff King
                   ` (12 subsequent siblings)
  19 siblings, 4 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:34 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

When we add a new alternate to the list, we try to normalize
out any redundant "..", etc. However, we do not look at the
return value of normalize_path_copy(), and will happily
continue with a path that could not be normalized. Worse,
the normalizing process is done in-place, so we are left
with whatever half-finished working state the normalizing
function was in.

Fortunately, this cannot cause us to read past the end of
our buffer, as that working state will always leave the
NUL from the original path in place. And we do tend to
notice problems when we check is_directory() on the path.
But you can see the nonsense that we feed to is_directory
with an entry like:

  this/../../is/../../way/../../too/../../deep/../../to/../../resolve

in your objects/info/alternates, which yields:

  error: object directory
  /to/e/deep/too/way//ects/this/../../is/../../way/../../too/../../deep/../../to/../../resolve
  does not exist; check .git/objects/info/alternates.

We can easily fix this just by checking the return value.
But that makes it hard to generate a good error message,
since we're normalizing in-place and our input value has
been overwritten by cruft.

Instead, let's provide a strbuf helper that does an in-place
normalize, but restores the original contents on error. This
uses a second buffer under the hood, which is slightly less
efficient, but this is not a performance-critical code path.

The strbuf helper can also properly set the "len" parameter
of the strbuf before returning. Just doing:

  normalize_path_copy(buf.buf, buf.buf);

will shorten the string, but leave buf.len at the original
length. That may be confusing to later code which uses the
strbuf.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 11 +++++++++--
 strbuf.c    | 20 ++++++++++++++++++++
 strbuf.h    |  8 ++++++++
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..68571bd 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -263,7 +263,12 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	}
 	strbuf_addstr(&pathbuf, entry);
 
-	normalize_path_copy(pathbuf.buf, pathbuf.buf);
+	if (strbuf_normalize_path(&pathbuf) < 0) {
+		error("unable to normalize alternate object path: %s",
+		      pathbuf.buf);
+		strbuf_release(&pathbuf);
+		return -1;
+	}
 
 	pfxlen = strlen(pathbuf.buf);
 
@@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
 	}
 
 	strbuf_add_absolute_path(&objdirbuf, get_object_directory());
-	normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
+	if (strbuf_normalize_path(&objdirbuf) < 0)
+		die("unable to normalize object directory: %s",
+		    objdirbuf.buf);
 
 	alt_copy = xmemdupz(alt, len);
 	string_list_split_in_place(&entries, alt_copy, sep, -1);
diff --git a/strbuf.c b/strbuf.c
index b839be4..8fec657 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -870,3 +870,23 @@ void strbuf_stripspace(struct strbuf *sb, int skip_comments)
 
 	strbuf_setlen(sb, j);
 }
+
+int strbuf_normalize_path(struct strbuf *src)
+{
+	struct strbuf dst = STRBUF_INIT;
+
+	strbuf_grow(&dst, src->len);
+	if (normalize_path_copy(dst.buf, src->buf) < 0) {
+		strbuf_release(&dst);
+		return -1;
+	}
+
+	/*
+	 * normalize_path does not tell us the new length, so we have to
+	 * compute it by looking for the new NUL it placed
+	 */
+	strbuf_setlen(&dst, strlen(dst.buf));
+	strbuf_swap(src, &dst);
+	strbuf_release(&dst);
+	return 0;
+}
diff --git a/strbuf.h b/strbuf.h
index ba8d5f1..2262b12 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -443,6 +443,14 @@ extern int strbuf_getcwd(struct strbuf *sb);
  */
 extern void strbuf_add_absolute_path(struct strbuf *sb, const char *path);
 
+
+/**
+ * Normalize in-place the path contained in the strbuf. See
+ * normalize_path_copy() for details. If an error occurs, the contents of "sb"
+ * are left untouched, and -1 is returned.
+ */
+extern int strbuf_normalize_path(struct strbuf *sb);
+
 /**
  * Strip whitespace from a buffer. The second parameter controls if
  * comments are considered contents to be removed or not.
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 08/18] link_alt_odb_entry: refactor string handling
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (6 preceding siblings ...)
  2016-10-03 20:34 ` [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors Jeff King
@ 2016-10-03 20:34 ` Jeff King
  2016-10-04  6:05   ` Jacob Keller
  2016-10-04 21:18   ` Junio C Hamano
  2016-10-03 20:35 ` [PATCH 09/18] alternates: provide helper for adding to alternates list Jeff King
                   ` (11 subsequent siblings)
  19 siblings, 2 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:34 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

The string handling in link_alt_odb_entry() is mostly an
artifact of the original version, which took the path as a
ptr/len combo, and did not have a NUL-terminated string
until we created one in the alternate_object_database
struct.  But since 5bdf0a8 (sha1_file: normalize alt_odb
path before comparing and storing, 2011-09-07), the first
thing we do is put the path into a strbuf, which gives us
some easy opportunities for cleanup.

In particular:

  - we call strlen(pathbuf.buf), which is silly; we can look
    at pathbuf.len.

  - even though we have a strbuf, we don't maintain its
    "len" field when chomping extra slashes from the
    end, and instead keep a separate "pfxlen" variable. We
    can fix this and then drop "pfxlen" entirely.

  - we don't check whether the path is usable until after we
    allocate the new struct, making extra cleanup work for
    ourselves. Since we have a NUL-terminated string, we can
    bump the "is it usable" checks higher in the function.
    While we're at it, we can move that logic to its own
    helper, which makes the flow of link_alt_odb_entry()
    easier to follow.

Signed-off-by: Jeff King <peff@peff.net>
---
And you can probably guess now how I found the issue in the last patch
where pathbuf.len is totally bogus after calling normalize_path_copy. :)

 sha1_file.c | 83 +++++++++++++++++++++++++++++++++----------------------------
 1 file changed, 45 insertions(+), 38 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 68571bd..f396823 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -234,6 +234,36 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
+/*
+ * Return non-zero iff the path is usable as an alternate object database.
+ */
+static int alt_odb_usable(struct strbuf *path, const char *normalized_objdir)
+{
+	struct alternate_object_database *alt;
+
+	/* Detect cases where alternate disappeared */
+	if (!is_directory(path->buf)) {
+		error("object directory %s does not exist; "
+		      "check .git/objects/info/alternates.",
+		      path->buf);
+		return 0;
+	}
+
+	/*
+	 * Prevent the common mistake of listing the same
+	 * thing twice, or object directory itself.
+	 */
+	for (alt = alt_odb_list; alt; alt = alt->next) {
+		if (path->len == alt->name - alt->base - 1 &&
+		    !memcmp(path->buf, alt->base, path->len))
+			return 0;
+	}
+	if (!fspathcmp(path->buf, normalized_objdir))
+		return 0;
+
+	return 1;
+}
+
 /*
  * Prepare alternate object database registry.
  *
@@ -253,8 +283,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	int depth, const char *normalized_objdir)
 {
 	struct alternate_object_database *ent;
-	struct alternate_object_database *alt;
-	size_t pfxlen, entlen;
+	size_t entlen;
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (!is_absolute_path(entry) && relative_base) {
@@ -270,47 +299,26 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 		return -1;
 	}
 
-	pfxlen = strlen(pathbuf.buf);
-
 	/*
 	 * The trailing slash after the directory name is given by
 	 * this function at the end. Remove duplicates.
 	 */
-	while (pfxlen && pathbuf.buf[pfxlen-1] == '/')
-		pfxlen -= 1;
-
-	entlen = st_add(pfxlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
-	ent = xmalloc(st_add(sizeof(*ent), entlen));
-	memcpy(ent->base, pathbuf.buf, pfxlen);
-	strbuf_release(&pathbuf);
-
-	ent->name = ent->base + pfxlen + 1;
-	ent->base[pfxlen + 3] = '/';
-	ent->base[pfxlen] = ent->base[entlen-1] = 0;
+	while (pathbuf.len && pathbuf.buf[pathbuf.len - 1] == '/')
+		strbuf_setlen(&pathbuf, pathbuf.len - 1);
 
-	/* Detect cases where alternate disappeared */
-	if (!is_directory(ent->base)) {
-		error("object directory %s does not exist; "
-		      "check .git/objects/info/alternates.",
-		      ent->base);
-		free(ent);
+	if (!alt_odb_usable(&pathbuf, normalized_objdir)) {
+		strbuf_release(&pathbuf);
 		return -1;
 	}
 
-	/* Prevent the common mistake of listing the same
-	 * thing twice, or object directory itself.
-	 */
-	for (alt = alt_odb_list; alt; alt = alt->next) {
-		if (pfxlen == alt->name - alt->base - 1 &&
-		    !memcmp(ent->base, alt->base, pfxlen)) {
-			free(ent);
-			return -1;
-		}
-	}
-	if (!fspathcmp(ent->base, normalized_objdir)) {
-		free(ent);
-		return -1;
-	}
+	entlen = st_add(pathbuf.len, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
+	ent = xmalloc(st_add(sizeof(*ent), entlen));
+	memcpy(ent->base, pathbuf.buf, pathbuf.len);
+
+	ent->name = ent->base + pathbuf.len + 1;
+	ent->base[pathbuf.len] = '/';
+	ent->base[pathbuf.len + 3] = '/';
+	ent->base[entlen-1] = 0;
 
 	/* add the alternate entry */
 	*alt_odb_tail = ent;
@@ -318,10 +326,9 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	ent->next = NULL;
 
 	/* recursively add alternates */
-	read_info_alternates(ent->base, depth + 1);
-
-	ent->base[pfxlen] = '/';
+	read_info_alternates(pathbuf.buf, depth + 1);
 
+	strbuf_release(&pathbuf);
 	return 0;
 }
 
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 09/18] alternates: provide helper for adding to alternates list
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (7 preceding siblings ...)
  2016-10-03 20:34 ` [PATCH 08/18] link_alt_odb_entry: refactor string handling Jeff King
@ 2016-10-03 20:35 ` Jeff King
  2016-10-04  6:07   ` Jacob Keller
  2016-10-03 20:35 ` [PATCH 10/18] alternates: provide helper for allocating alternate Jeff King
                   ` (10 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

The submodule code wants to temporarily add an alternate
object store to our in-memory alt_odb list, but does it
manually. Let's provide a helper so it can reuse the code in
link_alt_odb_entry().

While we're adding our new add_to_alternates_memory(), let's
document add_to_alternates_file(), as the two are related.

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     | 14 +++++++++++++-
 sha1_file.c | 11 +++++++++++
 submodule.c | 23 +----------------------
 3 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/cache.h b/cache.h
index ed3d5df..9a91378 100644
--- a/cache.h
+++ b/cache.h
@@ -1388,10 +1388,22 @@ extern struct alternate_object_database {
 extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
-extern void add_to_alternates_file(const char *reference);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
 
+/*
+ * Add the directory to the on-disk alternates file; the new entry will also
+ * take effect in the current process.
+ */
+extern void add_to_alternates_file(const char *dir);
+
+/*
+ * Add the directory to the in-memory list of alternates (along with any
+ * recursive alternates it points to), but do not modify the on-disk alternates
+ * file.
+ */
+extern void add_to_alternates_memory(const char *dir);
+
 struct pack_window {
 	struct pack_window *next;
 	unsigned char *base;
diff --git a/sha1_file.c b/sha1_file.c
index f396823..2e41b26 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -440,6 +440,17 @@ void add_to_alternates_file(const char *reference)
 	free(alts);
 }
 
+void add_to_alternates_memory(const char *reference)
+{
+	/*
+	 * Make sure alternates are initialized, or else our entry may be
+	 * overwritten when they are.
+	 */
+	prepare_alt_odb();
+
+	link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+}
+
 /*
  * Compute the exact path an alternate is at and returns it. In case of
  * error NULL is returned and the human readable error is added to `err`
diff --git a/submodule.c b/submodule.c
index 0ef2ff4..8b3274a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -123,9 +123,7 @@ void stage_updated_gitmodules(void)
 static int add_submodule_odb(const char *path)
 {
 	struct strbuf objects_directory = STRBUF_INIT;
-	struct alternate_object_database *alt_odb;
 	int ret = 0;
-	size_t alloc;
 
 	ret = strbuf_git_path_submodule(&objects_directory, path, "objects/");
 	if (ret)
@@ -134,26 +132,7 @@ static int add_submodule_odb(const char *path)
 		ret = -1;
 		goto done;
 	}
-	/* avoid adding it twice */
-	prepare_alt_odb();
-	for (alt_odb = alt_odb_list; alt_odb; alt_odb = alt_odb->next)
-		if (alt_odb->name - alt_odb->base == objects_directory.len &&
-				!strncmp(alt_odb->base, objects_directory.buf,
-					objects_directory.len))
-			goto done;
-
-	alloc = st_add(objects_directory.len, 42); /* for "12/345..." sha1 */
-	alt_odb = xmalloc(st_add(sizeof(*alt_odb), alloc));
-	alt_odb->next = alt_odb_list;
-	xsnprintf(alt_odb->base, alloc, "%s", objects_directory.buf);
-	alt_odb->name = alt_odb->base + objects_directory.len;
-	alt_odb->name[2] = '/';
-	alt_odb->name[40] = '\0';
-	alt_odb->name[41] = '\0';
-	alt_odb_list = alt_odb;
-
-	/* add possible alternates from the submodule */
-	read_info_alternates(objects_directory.buf, 0);
+	add_to_alternates_memory(objects_directory.buf);
 done:
 	strbuf_release(&objects_directory);
 	return ret;
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 10/18] alternates: provide helper for allocating alternate
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (8 preceding siblings ...)
  2016-10-03 20:35 ` [PATCH 09/18] alternates: provide helper for adding to alternates list Jeff King
@ 2016-10-03 20:35 ` Jeff King
  2016-10-04  6:09   ` Jacob Keller
  2016-10-03 20:35 ` [PATCH 11/18] alternates: encapsulate alt->base munging Jeff King
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

Allocating a struct alternate_object_database is tricky, as
we must over-allocate the buffer to provide scratch space,
and then put in particular '/' and NUL markers.

Let's encapsulate this in a function so that the complexity
doesn't leak into callers (and so that we can modify it
later).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     |  6 ++++++
 sha1_file.c | 28 +++++++++++++++++++---------
 sha1_name.c |  7 +------
 3 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/cache.h b/cache.h
index 9a91378..d36b2ad 100644
--- a/cache.h
+++ b/cache.h
@@ -1391,6 +1391,12 @@ extern char *compute_alternate_path(const char *path, struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
 
+/*
+ * Allocate a "struct alternate_object_database" but do _not_ actually
+ * add it to the list of alternates.
+ */
+struct alternate_object_database *alloc_alt_odb(const char *dir);
+
 /*
  * Add the directory to the on-disk alternates file; the new entry will also
  * take effect in the current process.
diff --git a/sha1_file.c b/sha1_file.c
index 2e41b26..549cf1e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -283,7 +283,6 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	int depth, const char *normalized_objdir)
 {
 	struct alternate_object_database *ent;
-	size_t entlen;
 	struct strbuf pathbuf = STRBUF_INIT;
 
 	if (!is_absolute_path(entry) && relative_base) {
@@ -311,14 +310,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 		return -1;
 	}
 
-	entlen = st_add(pathbuf.len, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
-	ent = xmalloc(st_add(sizeof(*ent), entlen));
-	memcpy(ent->base, pathbuf.buf, pathbuf.len);
-
-	ent->name = ent->base + pathbuf.len + 1;
-	ent->base[pathbuf.len] = '/';
-	ent->base[pathbuf.len + 3] = '/';
-	ent->base[entlen-1] = 0;
+	ent = alloc_alt_odb(pathbuf.buf);
 
 	/* add the alternate entry */
 	*alt_odb_tail = ent;
@@ -395,6 +387,24 @@ void read_info_alternates(const char * relative_base, int depth)
 	munmap(map, mapsz);
 }
 
+struct alternate_object_database *alloc_alt_odb(const char *dir)
+{
+	struct alternate_object_database *ent;
+	size_t dirlen = strlen(dir);
+	size_t entlen;
+
+	entlen = st_add(dirlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
+	ent = xmalloc(st_add(sizeof(*ent), entlen));
+	memcpy(ent->base, dir, dirlen);
+
+	ent->name = ent->base + dirlen + 1;
+	ent->base[dirlen] = '/';
+	ent->base[dirlen + 3] = '/';
+	ent->base[entlen-1] = 0;
+
+	return ent;
+}
+
 void add_to_alternates_file(const char *reference)
 {
 	struct lock_file *lock = xcalloc(1, sizeof(struct lock_file));
diff --git a/sha1_name.c b/sha1_name.c
index faf873c..98152a6 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -86,12 +86,7 @@ static void find_short_object_filename(int len, const char *hex_pfx, struct disa
 		 * alt->name/alt->base while iterating over the
 		 * object databases including our own.
 		 */
-		const char *objdir = get_object_directory();
-		size_t objdir_len = strlen(objdir);
-		fakeent = xmalloc(st_add3(sizeof(*fakeent), objdir_len, 43));
-		memcpy(fakeent->base, objdir, objdir_len);
-		fakeent->name = fakeent->base + objdir_len + 1;
-		fakeent->name[-1] = '/';
+		fakeent = alloc_alt_odb(get_object_directory());
 	}
 	fakeent->next = alt_odb_list;
 
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 11/18] alternates: encapsulate alt->base munging
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (9 preceding siblings ...)
  2016-10-03 20:35 ` [PATCH 10/18] alternates: provide helper for allocating alternate Jeff King
@ 2016-10-03 20:35 ` Jeff King
  2016-10-03 20:35 ` [PATCH 12/18] alternates: use a separate scratch space Jeff King
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

The alternate_object_database struct holds a path to the
alternate objects, but we also use that buffer as scratch
space for forming loose object filenames. Let's pull that
logic into a helper function so that we can more easily
modify it.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 549cf1e..ccf59ba 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -204,6 +204,13 @@ const char *sha1_file_name(const unsigned char *sha1)
 	return buf;
 }
 
+static const char *alt_sha1_path(struct alternate_object_database *alt,
+				 const unsigned char *sha1)
+{
+	fill_sha1_path(alt->name, sha1);
+	return alt->base;
+}
+
 /*
  * Return the name of the pack or index file with the specified sha1
  * in its filename.  *base and *name are scratch space that must be
@@ -601,8 +608,8 @@ static int check_and_freshen_nonlocal(const unsigned char *sha1, int freshen)
 	struct alternate_object_database *alt;
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		fill_sha1_path(alt->name, sha1);
-		if (check_and_freshen_file(alt->base, freshen))
+		const char *path = alt_sha1_path(alt, sha1);
+		if (check_and_freshen_file(path, freshen))
 			return 1;
 	}
 	return 0;
@@ -1600,8 +1607,8 @@ static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
 	prepare_alt_odb();
 	errno = ENOENT;
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		fill_sha1_path(alt->name, sha1);
-		if (!lstat(alt->base, st))
+		const char *path = alt_sha1_path(alt, sha1);
+		if (!lstat(path, st))
 			return 0;
 	}
 
@@ -1621,8 +1628,8 @@ static int open_sha1_file(const unsigned char *sha1)
 
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		fill_sha1_path(alt->name, sha1);
-		fd = git_open_noatime(alt->base);
+		const char *path = alt_sha1_path(alt, sha1);
+		fd = git_open_noatime(path);
 		if (fd >= 0)
 			return fd;
 		if (most_interesting_errno == ENOENT)
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 12/18] alternates: use a separate scratch space
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (10 preceding siblings ...)
  2016-10-03 20:35 ` [PATCH 11/18] alternates: encapsulate alt->base munging Jeff King
@ 2016-10-03 20:35 ` Jeff King
  2016-10-04  6:12   ` Jacob Keller
  2016-10-04 21:29   ` Junio C Hamano
  2016-10-03 20:35 ` [PATCH 13/18] fill_sha1_file: write "boring" characters Jeff King
                   ` (7 subsequent siblings)
  19 siblings, 2 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

The alternate_object_database struct uses a single buffer
both for storing the path to the alternate, and as a scratch
buffer for forming object names. This is efficient (since
otherwise we'd end up storing the path twice), but it makes
life hard for callers who just want to know the path to the
alternate. They have to remember to stop reading after
"alt->name - alt->base" bytes, and to subtract one for the
trailing '/'.

It would be much simpler if they could simply access a
NUL-terminated path string. We could encapsulate this in a
function which puts a NUL in the scratch buffer and returns
the string, but that opens up questions about the lifetime
of the result. The first time another caller uses the
alternate, the scratch buffer may get other data tacked onto
it.

Let's instead just store the root path separately from the
scratch buffer. There aren't enough alternates being stored
for the duplicated data to matter for performance, and this
keeps things simple and safe for the callers.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c              | 10 ++--------
 builtin/submodule--helper.c | 11 +++--------
 cache.h                     |  5 ++++-
 sha1_file.c                 | 28 ++++++++++++----------------
 sha1_name.c                 |  3 ++-
 transport.c                 |  4 +---
 6 files changed, 24 insertions(+), 37 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 055dfdc..f01b81e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -644,14 +644,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		fsck_object_dir(get_object_directory());
 
 		prepare_alt_odb();
-		for (alt = alt_odb_list; alt; alt = alt->next) {
-			/* directory name, minus trailing slash */
-			size_t namelen = alt->name - alt->base - 1;
-			struct strbuf name = STRBUF_INIT;
-			strbuf_add(&name, alt->base, namelen);
-			fsck_object_dir(name.buf);
-			strbuf_release(&name);
-		}
+		for (alt = alt_odb_list; alt; alt = alt->next)
+			fsck_object_dir(alt->path);
 	}
 
 	if (check_full) {
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index e3fdc0a..fd72c90 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -492,20 +492,16 @@ static int add_possible_reference_from_superproject(
 {
 	struct submodule_alternate_setup *sas = sas_cb;
 
-	/* directory name, minus trailing slash */
-	size_t namelen = alt->name - alt->base - 1;
-	struct strbuf name = STRBUF_INIT;
-	strbuf_add(&name, alt->base, namelen);
-
 	/*
 	 * If the alternate object store is another repository, try the
 	 * standard layout with .git/modules/<name>/objects
 	 */
-	if (ends_with(name.buf, ".git/objects")) {
+	if (ends_with(alt->path, ".git/objects")) {
 		char *sm_alternate;
 		struct strbuf sb = STRBUF_INIT;
 		struct strbuf err = STRBUF_INIT;
-		strbuf_add(&sb, name.buf, name.len - strlen("objects"));
+		strbuf_add(&sb, alt->path, strlen(alt->path) - strlen("objects"));
+
 		/*
 		 * We need to end the new path with '/' to mark it as a dir,
 		 * otherwise a submodule name containing '/' will be broken
@@ -533,7 +529,6 @@ static int add_possible_reference_from_superproject(
 		strbuf_release(&sb);
 	}
 
-	strbuf_release(&name);
 	return 0;
 }
 
diff --git a/cache.h b/cache.h
index d36b2ad..e1996c5 100644
--- a/cache.h
+++ b/cache.h
@@ -1382,8 +1382,11 @@ extern void remove_scheduled_dirs(void);
 
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
+
 	char *name;
-	char base[FLEX_ARRAY]; /* more */
+	char *scratch;
+
+	char path[FLEX_ARRAY];
 } *alt_odb_list;
 extern void prepare_alt_odb(void);
 extern void read_info_alternates(const char * relative_base, int depth);
diff --git a/sha1_file.c b/sha1_file.c
index ccf59ba..70c3e2f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -208,7 +208,7 @@ static const char *alt_sha1_path(struct alternate_object_database *alt,
 				 const unsigned char *sha1)
 {
 	fill_sha1_path(alt->name, sha1);
-	return alt->base;
+	return alt->scratch;
 }
 
 /*
@@ -261,8 +261,7 @@ static int alt_odb_usable(struct strbuf *path, const char *normalized_objdir)
 	 * thing twice, or object directory itself.
 	 */
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		if (path->len == alt->name - alt->base - 1 &&
-		    !memcmp(path->buf, alt->base, path->len))
+		if (!strcmp(path->buf, alt->path))
 			return 0;
 	}
 	if (!fspathcmp(path->buf, normalized_objdir))
@@ -401,13 +400,14 @@ struct alternate_object_database *alloc_alt_odb(const char *dir)
 	size_t entlen;
 
 	entlen = st_add(dirlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
-	ent = xmalloc(st_add(sizeof(*ent), entlen));
-	memcpy(ent->base, dir, dirlen);
+	FLEX_ALLOC_STR(ent, path, dir);
+	ent->scratch = xmalloc(entlen);
+	xsnprintf(ent->scratch, entlen, "%s/", dir);
 
-	ent->name = ent->base + dirlen + 1;
-	ent->base[dirlen] = '/';
-	ent->base[dirlen + 3] = '/';
-	ent->base[entlen-1] = 0;
+	ent->name = ent->scratch + dirlen + 1;
+	ent->scratch[dirlen] = '/';
+	ent->scratch[dirlen + 3] = '/';
+	ent->scratch[entlen-1] = 0;
 
 	return ent;
 }
@@ -1485,11 +1485,8 @@ void prepare_packed_git(void)
 		return;
 	prepare_packed_git_one(get_object_directory(), 1);
 	prepare_alt_odb();
-	for (alt = alt_odb_list; alt; alt = alt->next) {
-		alt->name[-1] = 0;
-		prepare_packed_git_one(alt->base, 0);
-		alt->name[-1] = '/';
-	}
+	for (alt = alt_odb_list; alt; alt = alt->next)
+		prepare_packed_git_one(alt->path, 0);
 	rearrange_packed_git();
 	prepare_packed_git_mru();
 	prepare_packed_git_run_once = 1;
@@ -3670,8 +3667,7 @@ static int loose_from_alt_odb(struct alternate_object_database *alt,
 	struct strbuf buf = STRBUF_INIT;
 	int r;
 
-	/* copy base not including trailing '/' */
-	strbuf_add(&buf, alt->base, alt->name - alt->base - 1);
+	strbuf_addstr(&buf, alt->path);
 	r = for_each_loose_file_in_objdir_buf(&buf,
 					      data->cb, NULL, NULL,
 					      data->data);
diff --git a/sha1_name.c b/sha1_name.c
index 98152a6..770ea4f 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -94,12 +94,13 @@ static void find_short_object_filename(int len, const char *hex_pfx, struct disa
 	for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) {
 		struct dirent *de;
 		DIR *dir;
+
 		/*
 		 * every alt_odb struct has 42 extra bytes after the base
 		 * for exactly this purpose
 		 */
 		xsnprintf(alt->name, 42, "%.2s/", hex_pfx);
-		dir = opendir(alt->base);
+		dir = opendir(alt->scratch);
 		if (!dir)
 			continue;
 
diff --git a/transport.c b/transport.c
index 94d6dc3..4bc4eea 100644
--- a/transport.c
+++ b/transport.c
@@ -1084,9 +1084,7 @@ static int refs_from_alternate_cb(struct alternate_object_database *e,
 	const struct ref *extra;
 	struct alternate_refs_data *cb = data;
 
-	e->name[-1] = '\0';
-	other = xstrdup(real_path(e->base));
-	e->name[-1] = '/';
+	other = xstrdup(real_path(e->path));
 	len = strlen(other);
 
 	while (other[len-1] == '/')
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 13/18] fill_sha1_file: write "boring" characters
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (11 preceding siblings ...)
  2016-10-03 20:35 ` [PATCH 12/18] alternates: use a separate scratch space Jeff King
@ 2016-10-03 20:35 ` Jeff King
  2016-10-04  6:13   ` Jacob Keller
  2016-10-03 20:36 ` [PATCH 14/18] alternates: store scratch buffer as strbuf Jeff King
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:35 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

This function forms a sha1 as "xx/yyyy...", but skips over
the slot for the slash rather than writing it, leaving it to
the caller to do so. It also does not bother to put in a
trailing NUL, even though every caller would want it (we're
forming a path which by definition is not a directory, so
the only thing to do with it is feed it to a system call).

Let's make the lives of our callers easier by just writing
out the internal "/" and the NUL.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 70c3e2f..c6308c1 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -178,10 +178,12 @@ static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
 	for (i = 0; i < 20; i++) {
 		static char hex[] = "0123456789abcdef";
 		unsigned int val = sha1[i];
-		char *pos = pathbuf + i*2 + (i > 0);
-		*pos++ = hex[val >> 4];
-		*pos = hex[val & 0xf];
+		*pathbuf++ = hex[val >> 4];
+		*pathbuf++ = hex[val & 0xf];
+		if (!i)
+			*pathbuf++ = '/';
 	}
+	*pathbuf = '\0';
 }
 
 const char *sha1_file_name(const unsigned char *sha1)
@@ -198,8 +200,6 @@ const char *sha1_file_name(const unsigned char *sha1)
 		die("insanely long object directory %s", objdir);
 	memcpy(buf, objdir, len);
 	buf[len] = '/';
-	buf[len+3] = '/';
-	buf[len+42] = '\0';
 	fill_sha1_path(buf + len + 1, sha1);
 	return buf;
 }
@@ -406,8 +406,6 @@ struct alternate_object_database *alloc_alt_odb(const char *dir)
 
 	ent->name = ent->scratch + dirlen + 1;
 	ent->scratch[dirlen] = '/';
-	ent->scratch[dirlen + 3] = '/';
-	ent->scratch[entlen-1] = 0;
 
 	return ent;
 }
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 14/18] alternates: store scratch buffer as strbuf
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (12 preceding siblings ...)
  2016-10-03 20:35 ` [PATCH 13/18] fill_sha1_file: write "boring" characters Jeff King
@ 2016-10-03 20:36 ` Jeff King
  2016-10-03 20:36 ` [PATCH 15/18] fill_sha1_file: write into a strbuf Jeff King
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:36 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

We pre-size the scratch buffer to hold a loose object
filename of the form "xx/yyyy...", which leads to allocation
code that is hard to verify. We have to use some magic
numbers during the initial allocation, and then writers must
blindly assume that the buffer is big enough. Using a strbuf
makes it more clear that we cannot overflow.

Unfortunately, we do still need some magic numbers to grow
our strbuf before calling fill_sha1_path(), but the strbuf
growth is much closer to the point of use. This makes it
easier to see that it's correct, and opens the possibility
of pushing it even further down if fill_sha1_path() learns
to work on strbufs.

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     | 13 +++++++++++--
 sha1_file.c | 28 ++++++++++++++++++----------
 sha1_name.c |  9 +++------
 3 files changed, 32 insertions(+), 18 deletions(-)

diff --git a/cache.h b/cache.h
index e1996c5..9866e46 100644
--- a/cache.h
+++ b/cache.h
@@ -1383,8 +1383,9 @@ extern void remove_scheduled_dirs(void);
 extern struct alternate_object_database {
 	struct alternate_object_database *next;
 
-	char *name;
-	char *scratch;
+	/* see alt_scratch_buf() */
+	struct strbuf scratch;
+	size_t base_len;
 
 	char path[FLEX_ARRAY];
 } *alt_odb_list;
@@ -1413,6 +1414,14 @@ extern void add_to_alternates_file(const char *dir);
  */
 extern void add_to_alternates_memory(const char *dir);
 
+/*
+ * Returns a scratch strbuf pre-filled with the alternate object directory,
+ * including a trailing slash, which can be used to access paths in the
+ * alternate. Always use this over direct access to alt->scratch, as it
+ * cleans up any previous use of the scratch buffer.
+ */
+extern struct strbuf *alt_scratch_buf(struct alternate_object_database *alt);
+
 struct pack_window {
 	struct pack_window *next;
 	unsigned char *base;
diff --git a/sha1_file.c b/sha1_file.c
index c6308c1..efc8cee 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -204,11 +204,24 @@ const char *sha1_file_name(const unsigned char *sha1)
 	return buf;
 }
 
+struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
+{
+	strbuf_setlen(&alt->scratch, alt->base_len);
+	return &alt->scratch;
+}
+
 static const char *alt_sha1_path(struct alternate_object_database *alt,
 				 const unsigned char *sha1)
 {
-	fill_sha1_path(alt->name, sha1);
-	return alt->scratch;
+	/* hex sha1 plus internal "/" */
+	size_t len = GIT_SHA1_HEXSZ + 1;
+	struct strbuf *buf = alt_scratch_buf(alt);
+
+	strbuf_grow(buf, len);
+	fill_sha1_path(buf->buf + buf->len, sha1);
+	strbuf_setlen(buf, buf->len + len);
+
+	return buf->buf;
 }
 
 /*
@@ -396,16 +409,11 @@ void read_info_alternates(const char * relative_base, int depth)
 struct alternate_object_database *alloc_alt_odb(const char *dir)
 {
 	struct alternate_object_database *ent;
-	size_t dirlen = strlen(dir);
-	size_t entlen;
 
-	entlen = st_add(dirlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
 	FLEX_ALLOC_STR(ent, path, dir);
-	ent->scratch = xmalloc(entlen);
-	xsnprintf(ent->scratch, entlen, "%s/", dir);
-
-	ent->name = ent->scratch + dirlen + 1;
-	ent->scratch[dirlen] = '/';
+	strbuf_init(&ent->scratch, 0);
+	strbuf_addf(&ent->scratch, "%s/", dir);
+	ent->base_len = ent->scratch.len;
 
 	return ent;
 }
diff --git a/sha1_name.c b/sha1_name.c
index 770ea4f..defbb3e 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -92,15 +92,12 @@ static void find_short_object_filename(int len, const char *hex_pfx, struct disa
 
 	xsnprintf(hex, sizeof(hex), "%.2s", hex_pfx);
 	for (alt = fakeent; alt && !ds->ambiguous; alt = alt->next) {
+		struct strbuf *buf = alt_scratch_buf(alt);
 		struct dirent *de;
 		DIR *dir;
 
-		/*
-		 * every alt_odb struct has 42 extra bytes after the base
-		 * for exactly this purpose
-		 */
-		xsnprintf(alt->name, 42, "%.2s/", hex_pfx);
-		dir = opendir(alt->scratch);
+		strbuf_addf(buf, "%.2s/", hex_pfx);
+		dir = opendir(buf->buf);
 		if (!dir)
 			continue;
 
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 15/18] fill_sha1_file: write into a strbuf
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (13 preceding siblings ...)
  2016-10-03 20:36 ` [PATCH 14/18] alternates: store scratch buffer as strbuf Jeff King
@ 2016-10-03 20:36 ` Jeff King
  2016-10-04  6:44   ` Jacob Keller
  2016-10-03 20:36 ` [PATCH 16/18] count-objects: report alternates via verbose mode Jeff King
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:36 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

It's currently the responsibility of the caller to give
fill_sha1_file() enough bytes to write into, leading them to
manually compute the required lengths. Instead, let's just
write into a strbuf so that it's impossible to get this
wrong.

The alt_odb caller already has a strbuf, so this makes
things strictly simpler. The other caller, sha1_file_name(),
uses a static PATH_MAX buffer and dies when it would
overflow. We can convert this to a static strbuf, which
means our allocation cost is amortized (and as a bonus, we
no longer have to worry about PATH_MAX being too short for
normal use).

This does introduce some small overhead in fill_sha1_file(),
as each strbuf_addchar() will check whether it needs to
grow. However, between the optimization in fec501d
(strbuf_addch: avoid calling strbuf_grow, 2015-04-16) and
the fact that this is not generally called in a tight loop
(after all, the next step is typically to access the file!)
this probably doesn't matter. And even if it did, the right
place to micro-optimize is inside fill_sha1_file(), by
calling a single strbuf_grow() there.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c | 34 ++++++++++------------------------
 1 file changed, 10 insertions(+), 24 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index efc8cee..80a3333 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -172,36 +172,28 @@ enum scld_error safe_create_leading_directories_const(const char *path)
 	return result;
 }
 
-static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
+static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
 {
 	int i;
 	for (i = 0; i < 20; i++) {
 		static char hex[] = "0123456789abcdef";
 		unsigned int val = sha1[i];
-		*pathbuf++ = hex[val >> 4];
-		*pathbuf++ = hex[val & 0xf];
+		strbuf_addch(buf, hex[val >> 4]);
+		strbuf_addch(buf, hex[val & 0xf]);
 		if (!i)
-			*pathbuf++ = '/';
+			strbuf_addch(buf, '/');
 	}
-	*pathbuf = '\0';
 }
 
 const char *sha1_file_name(const unsigned char *sha1)
 {
-	static char buf[PATH_MAX];
-	const char *objdir;
-	int len;
+	static struct strbuf buf = STRBUF_INIT;
 
-	objdir = get_object_directory();
-	len = strlen(objdir);
+	strbuf_reset(&buf);
+	strbuf_addf(&buf, "%s/", get_object_directory());
 
-	/* '/' + sha1(2) + '/' + sha1(38) + '\0' */
-	if (len + 43 > PATH_MAX)
-		die("insanely long object directory %s", objdir);
-	memcpy(buf, objdir, len);
-	buf[len] = '/';
-	fill_sha1_path(buf + len + 1, sha1);
-	return buf;
+	fill_sha1_path(&buf, sha1);
+	return buf.buf;
 }
 
 struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
@@ -213,14 +205,8 @@ struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
 static const char *alt_sha1_path(struct alternate_object_database *alt,
 				 const unsigned char *sha1)
 {
-	/* hex sha1 plus internal "/" */
-	size_t len = GIT_SHA1_HEXSZ + 1;
 	struct strbuf *buf = alt_scratch_buf(alt);
-
-	strbuf_grow(buf, len);
-	fill_sha1_path(buf->buf + buf->len, sha1);
-	strbuf_setlen(buf, buf->len + len);
-
+	fill_sha1_path(buf, sha1);
 	return buf->buf;
 }
 
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 16/18] count-objects: report alternates via verbose mode
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (14 preceding siblings ...)
  2016-10-03 20:36 ` [PATCH 15/18] fill_sha1_file: write into a strbuf Jeff King
@ 2016-10-03 20:36 ` Jeff King
  2016-10-04  6:46   ` Jacob Keller
                     ` (2 more replies)
  2016-10-03 20:36 ` [PATCH 17/18] sha1_file: always allow relative paths to alternates Jeff King
                   ` (3 subsequent siblings)
  19 siblings, 3 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:36 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

There's no way to get the list of alternates that git
computes internally; our tests only infer it based on which
objects are available. In addition to testing, knowing this
list may be helpful for somebody debugging their alternates
setup.

Let's add it to the "count-objects -v" output. We could give
it a separate flag, but there's not really any need.
"count-objects -v" is already a debugging catch-all for the
object database, its output is easily extensible to new data
items, and printing the alternates is not expensive (we
already had to find them to count the objects).

Signed-off-by: Jeff King <peff@peff.net>
---
 Documentation/git-count-objects.txt |  5 +++++
 builtin/count-objects.c             | 10 ++++++++++
 t/t5613-info-alternate.sh           | 10 ++++++++++
 3 files changed, 25 insertions(+)

diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
index 2ff3568..cb9b4d2 100644
--- a/Documentation/git-count-objects.txt
+++ b/Documentation/git-count-objects.txt
@@ -38,6 +38,11 @@ objects nor valid packs
 +
 size-garbage: disk space consumed by garbage files, in KiB (unless -H is
 specified)
++
+alternate: absolute path of alternate object databases; may appear
+multiple times, one line per path. Note that if the path contains
+non-printable characters, it may be surrounded by double-quotes and
+contain C-style backslashed escape sequences.
 
 -H::
 --human-readable::
diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index ba92919..a700409 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -8,6 +8,7 @@
 #include "dir.h"
 #include "builtin.h"
 #include "parse-options.h"
+#include "quote.h"
 
 static unsigned long garbage;
 static off_t size_garbage;
@@ -73,6 +74,14 @@ static int count_cruft(const char *basename, const char *path, void *data)
 	return 0;
 }
 
+static int print_alternate(struct alternate_object_database *alt, void *data)
+{
+	printf("alternate: ");
+	quote_c_style(alt->path, NULL, stdout, 0);
+	putchar('\n');
+	return 0;
+}
+
 static char const * const count_objects_usage[] = {
 	N_("git count-objects [-v] [-H | --human-readable]"),
 	NULL
@@ -140,6 +149,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		printf("prune-packable: %lu\n", packed_loose);
 		printf("garbage: %lu\n", garbage);
 		printf("size-garbage: %s\n", garbage_buf.buf);
+		foreach_alt_odb(print_alternate, NULL);
 		strbuf_release(&loose_buf);
 		strbuf_release(&pack_buf);
 		strbuf_release(&garbage_buf);
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index b393613..74f6770 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -39,6 +39,16 @@ test_expect_success 'preparing third repository' '
 	)
 '
 
+test_expect_success 'count-objects shows the alternates' '
+	cat >expect <<-EOF &&
+	alternate: $(pwd)/B/.git/objects
+	alternate: $(pwd)/A/.git/objects
+	EOF
+	git -C C count-objects -v >actual &&
+	grep ^alternate: actual >actual.alternates &&
+	test_cmp expect actual.alternates
+'
+
 # Note: These tests depend on the hard-coded value of 5 as "too deep". We start
 # the depth at 0 and count links, not repositories, so in a chain like:
 #
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 17/18] sha1_file: always allow relative paths to alternates
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (15 preceding siblings ...)
  2016-10-03 20:36 ` [PATCH 16/18] count-objects: report alternates via verbose mode Jeff King
@ 2016-10-03 20:36 ` Jeff King
  2016-10-04  6:50   ` Jacob Keller
  2016-10-03 20:36 ` [PATCH 18/18] alternates: use fspathcmp to detect duplicates Jeff King
                   ` (2 subsequent siblings)
  19 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:36 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

We recursively expand alternates repositories, so that if A
borrows from B which borrows from C, A can see all objects.

For the root object database, we allow relative paths, so A
can point to B as "../B/objects". However, we currently do
not allow relative paths when recursing, so B must use an
absolute path to reach C.

That is an ancient protection from c2f493a (Transitively
read alternatives, 2006-05-07) that tries to avoid adding
the same alternate through two different paths. Since
5bdf0a8 (sha1_file: normalize alt_odb path before comparing
and storing, 2011-09-07), we use a normalized absolute path
for each alt_odb entry.

This means that in most cases the protection is no longer
necessary; we will detect the duplicate no matter how we got
there (but see below).  And it's a good idea to get rid of
it, as it creates an unnecessary complication when setting
up recursive alternates (B has to know that A is going to
borrow from it and make sure to use an absolute path).

Note that our normalization doesn't actually look at the
filesystem, so it can still be fooled by crossing symbolic
links. But that's also true of absolute paths, so it's not a
good reason to disallow only relative paths (it's
potentially a reason to switch to real_path(), but that's a
separate and non-trivial change).

We adjust the test script here to demonstrate that this now
works, and add new tests to show that the normalization does
indeed suppress duplicates.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c               |  7 +------
 t/t5613-info-alternate.sh | 24 ++++++++++++++++++++++--
 2 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 80a3333..b514167 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -354,12 +354,7 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
 		const char *entry = entries.items[i].string;
 		if (entry[0] == '\0' || entry[0] == '#')
 			continue;
-		if (!is_absolute_path(entry) && depth) {
-			error("%s: ignoring relative alternate object store %s",
-					relative_base, entry);
-		} else {
-			link_alt_odb_entry(entry, relative_base, depth, objdirbuf.buf);
-		}
+		link_alt_odb_entry(entry, relative_base, depth, objdirbuf.buf);
 	}
 	string_list_clear(&entries, 0);
 	free(alt_copy);
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 74f6770..76525a0 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -92,8 +92,28 @@ test_expect_success 'that relative alternate is possible for current dir' '
 	git fsck
 '
 
-test_expect_success 'that relative alternate is only possible for current dir' '
-	test_must_fail git -C D fsck
+test_expect_success 'that relative alternate is recursive' '
+	git -C D fsck
+'
+
+# we can reach "A" from our new repo both directly, and via "C".
+# The deep/subdir is there to make sure we are not doing a stupid
+# pure-text comparison of the alternate names.
+test_expect_success 'relative duplicates are eliminated' '
+	mkdir -p deep/subdir &&
+	git init --bare deep/subdir/duplicate.git &&
+	cat >deep/subdir/duplicate.git/objects/info/alternates <<-\EOF &&
+	../../../../C/.git/objects
+	../../../../A/.git/objects
+	EOF
+	cat >expect <<-EOF &&
+	alternate: $(pwd)/C/.git/objects
+	alternate: $(pwd)/B/.git/objects
+	alternate: $(pwd)/A/.git/objects
+	EOF
+	git -C deep/subdir/duplicate.git count-objects -v >actual &&
+	grep ^alternate: actual >actual.alternates &&
+	test_cmp expect actual.alternates
 '
 
 test_done
-- 
2.10.0.618.g82cc264


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* [PATCH 18/18] alternates: use fspathcmp to detect duplicates
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (16 preceding siblings ...)
  2016-10-03 20:36 ` [PATCH 17/18] sha1_file: always allow relative paths to alternates Jeff King
@ 2016-10-03 20:36 ` Jeff King
  2016-10-04  6:51   ` Jacob Keller
                     ` (2 more replies)
  2016-10-04  5:47 ` [PATCH 0/18] alternate object database cleanups Jacob Keller
  2016-10-05 18:47 ` René Scharfe
  19 siblings, 3 replies; 84+ messages in thread
From: Jeff King @ 2016-10-03 20:36 UTC (permalink / raw)
  To: git; +Cc: René Scharfe

On a case-insensitive filesystem, we should realize that
"a/objects" and "A/objects" are the same path. We already
use fspathcmp() to check against the main object directory,
but until recently we couldn't use it for comparing against
other alternates (because their paths were not
NUL-terminated strings). But now we can, so let's do so.

Note that we also need to adjust count-objects to load the
config, so that it can see the setting of core.ignorecase
(this is required by the test, but is also a general bugfix
for users of count-objects).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/count-objects.c   |  2 ++
 sha1_file.c               |  2 +-
 t/t5613-info-alternate.sh | 17 +++++++++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index a700409..a04b4f2 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -97,6 +97,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
 		OPT_END(),
 	};
 
+	git_config(git_default_config, NULL);
+
 	argc = parse_options(argc, argv, prefix, opts, count_objects_usage, 0);
 	/* we do not take arguments other than flags for now */
 	if (argc)
diff --git a/sha1_file.c b/sha1_file.c
index b514167..b05ec9c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -260,7 +260,7 @@ static int alt_odb_usable(struct strbuf *path, const char *normalized_objdir)
 	 * thing twice, or object directory itself.
 	 */
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		if (!strcmp(path->buf, alt->path))
+		if (!fspathcmp(path->buf, alt->path))
 			return 0;
 	}
 	if (!fspathcmp(path->buf, normalized_objdir))
diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index 76525a0..926fe14 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -116,4 +116,21 @@ test_expect_success 'relative duplicates are eliminated' '
 	test_cmp expect actual.alternates
 '
 
+test_expect_success CASE_INSENSITIVE_FS 'dup finding can be case-insensitive' '
+	git init --bare insensitive.git &&
+	# the previous entry for "A" will have used uppercase
+	cat >insensitive.git/objects/info/alternates <<-\EOF &&
+	../../C/.git/objects
+	../../a/.git/objects
+	EOF
+	cat >expect <<-EOF &&
+	alternate: $(pwd)/C/.git/objects
+	alternate: $(pwd)/B/.git/objects
+	alternate: $(pwd)/A/.git/objects
+	EOF
+	git -C insensitive.git count-objects -v >actual &&
+	grep ^alternate: actual >actual.alternates &&
+	test_cmp expect actual.alternates
+'
+
 test_done
-- 
2.10.0.618.g82cc264

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 0/18] alternate object database cleanups
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (17 preceding siblings ...)
  2016-10-03 20:36 ` [PATCH 18/18] alternates: use fspathcmp to detect duplicates Jeff King
@ 2016-10-04  5:47 ` Jacob Keller
  2016-10-04 13:41   ` Jeff King
  2016-10-05 18:47 ` René Scharfe
  19 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  5:47 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:33 PM, Jeff King <peff@peff.net> wrote:
> This series is the result of René nerd-sniping me with the claim that we
> could "easily" teach count-objects to print out the list of alternates
> in:
>
>   http://public-inbox.org/git/c27dc1a4-3c7a-2866-d9d8-f5d3eb161650@web.de/
>

Hah. Nerd snipes are fun.

> My real goal is just patch 17, which is needed for the quarantine series
> in that thread. But along the way there were quite a few opportunities
> for cleanups along with a few minor bugfixes (in patches 7 and 18), and
> I think the count-objects change in patch 16 is a nice general debugging
> tool.

Yea there are a *lot* of cleanups here.

>
> The rest of it is "just" cleanup, but I'll note that it clears up some
> hairy allocation code. These were bits that I noticed in my big
> allocation-cleanup series last year, but were too nasty to fit any of
> the more general fixes. I think the end result is much better.
>

Definitely agreed. I read through all the patches, and each one seemed
reasonable.

> The number of patches is a little intimidating, but I tried hard to
> break the refactoring down into a sequence of obviously-correct steps.
> You can be the judge of my success.
>

I read through them once. I'm going to re-read through them again and
leave any comments I had.

Regards,
Jake

>   [01/18]: t5613: drop reachable_via function
>   [02/18]: t5613: drop test_valid_repo function
>   [03/18]: t5613: use test_must_fail
>   [04/18]: t5613: whitespace/style cleanups
>   [05/18]: t5613: do not chdir in main process
>   [06/18]: t5613: clarify "too deep" recursion tests
>   [07/18]: link_alt_odb_entry: handle normalize_path errors
>   [08/18]: link_alt_odb_entry: refactor string handling
>   [09/18]: alternates: provide helper for adding to alternates list
>   [10/18]: alternates: provide helper for allocating alternate
>   [11/18]: alternates: encapsulate alt->base munging
>   [12/18]: alternates: use a separate scratch space
>   [13/18]: fill_sha1_file: write "boring" characters
>   [14/18]: alternates: store scratch buffer as strbuf
>   [15/18]: fill_sha1_file: write into a strbuf
>   [16/18]: count-objects: report alternates via verbose mode
>   [17/18]: sha1_file: always allow relative paths to alternates
>   [18/18]: alternates: use fspathcmp to detect duplicates
>
>  Documentation/git-count-objects.txt |   5 +
>  builtin/count-objects.c             |  12 +++
>  builtin/fsck.c                      |  10 +-
>  builtin/submodule--helper.c         |  11 +-
>  cache.h                             |  36 ++++++-
>  sha1_file.c                         | 179 ++++++++++++++++++--------------
>  sha1_name.c                         |  17 +--
>  strbuf.c                            |  20 ++++
>  strbuf.h                            |   8 ++
>  submodule.c                         |  23 +---
>  t/t5613-info-alternate.sh           | 202 ++++++++++++++++++++----------------
>  transport.c                         |   4 +-
>  12 files changed, 305 insertions(+), 222 deletions(-)
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 01/18] t5613: drop reachable_via function
  2016-10-03 20:33 ` [PATCH 01/18] t5613: drop reachable_via function Jeff King
@ 2016-10-04  5:48   ` Jacob Keller
  2016-10-04 13:43     ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  5:48 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:33 PM, Jeff King <peff@peff.net> wrote:
> This function was never used since its inception in dd05ea1
> (test case for transitive info/alternates, 2006-05-07).
> Which is just as well, since it mutates the repo state in a
> way that would invalidate further tests, without cleaning up
> after itself. Let's get rid of it so that nobody is tempted
> to use it.
>

Makes sense. It wouldn't be a good idea to leave this around since it
didn't clean up after itself. Curious why no test actually used it
though..

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 02/18] t5613: drop test_valid_repo function
  2016-10-03 20:33 ` [PATCH 02/18] t5613: drop test_valid_repo function Jeff King
@ 2016-10-04  5:50   ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  5:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:33 PM, Jeff King <peff@peff.net> wrote:
> This function makes sure that "git fsck" does not report any
> errors. But "--full" has been the default since f29cd39
> (fsck: default to "git fsck --full", 2009-10-20), and we can
> use the exit code (instead of counting the lines) since
> e2b4f63 (fsck: exit with non-zero status upon errors,
> 2007-03-05).
>
> So we can just use "git fsck", which is shorter and more
> flexible (e.g., we can use "git -C").

This seems obviously correct. I didn't understand your comment about
the use of "git -C" at first, because I was confused about why "git
-C" doesn't work with "git --full", but then I realized you can't use
"git -C" with the shell test_valid_repo function.

Thanks,
Jake

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5613-info-alternate.sh | 19 +++++++------------
>  1 file changed, 7 insertions(+), 12 deletions(-)
>
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index e13f57d..4548fb0 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -6,11 +6,6 @@
>  test_description='test transitive info/alternate entries'
>  . ./test-lib.sh
>
> -test_valid_repo() {
> -       git fsck --full > fsck.log &&
> -       test_line_count = 0 fsck.log
> -}
> -
>  base_dir=$(pwd)

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 03/18] t5613: use test_must_fail
  2016-10-03 20:34 ` [PATCH 03/18] t5613: use test_must_fail Jeff King
@ 2016-10-04  5:51   ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  5:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> Besides being our normal style, this correctly checks for an
> error exit() versus signal death.
>

Another very simple but obvious improvement.

Regards,
Jake

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5613-info-alternate.sh | 12 ++++++------
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index 4548fb0..65074dd 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -46,10 +46,9 @@ git clone -l -s F G &&
>  git clone --bare -l -s G H'
>
>  test_expect_success 'invalidity of deepest repository' \
> -'cd H && {
> -       git fsck
> -       test $? -ne 0
> -}'
> +'cd H &&
> +test_must_fail git fsck
> +'
>
>  cd "$base_dir"
>
> @@ -75,7 +74,8 @@ cd "$base_dir"
>  test_expect_success 'that info/alternates is necessary' \
>  'cd C &&
>  rm -f .git/objects/info/alternates &&
> -! (git fsck)'
> +test_must_fail git fsck
> +'
>
>  cd "$base_dir"
>
> @@ -89,7 +89,7 @@ cd "$base_dir"
>  test_expect_success \
>      'that relative alternate is only possible for current dir' '
>      cd D &&
> -    ! (git fsck)
> +    test_must_fail git fsck
>  '
>
>  cd "$base_dir"
> --
> 2.10.0.618.g82cc264
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 04/18] t5613: whitespace/style cleanups
  2016-10-03 20:34 ` [PATCH 04/18] t5613: whitespace/style cleanups Jeff King
@ 2016-10-04  5:52   ` Jacob Keller
  2016-10-04 13:47     ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  5:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> Our normal test style these days puts the opening quote of
> the body on the description line, and indents the body with
> a single tab. This ancient test did not follow this.
>

I was surprised you didn't do this first, but it doesn't really make a
difference either way. This is also a pretty straight forward
improvement, and I can see why you'd want to split this out to review
separately.

Regards,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/18] t5613: do not chdir in main process
  2016-10-03 20:34 ` [PATCH 05/18] t5613: do not chdir in main process Jeff King
@ 2016-10-04  5:54   ` Jacob Keller
  2016-10-04 21:00   ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  5:54 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> Our usual style when working with subdirectories is to chdir
> inside a subshell or to use "git -C", which means we do not
> have to constantly return to the main test directory. Let's
> convert this old test, which does not follow that style.
>

More obvious cleanup for this test file. Seems like a lot of
intermediate steps to get this test file into a clean state.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-03 20:34 ` [PATCH 06/18] t5613: clarify "too deep" recursion tests Jeff King
@ 2016-10-04  5:57   ` Jacob Keller
  2016-10-04 13:48     ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  5:57 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> These tests are just trying to show that we allow recursion
> up to a certain depth, but not past it. But the counting is
> a bit non-intuitive, and rather than test at the edge of the
> breakage, we test "OK" cases in the middle of the chain.
> Let's explain what's going on, and explicitly test the
> switch between "OK" and "too deep".
>

Makes sense to actually test the edge case here instead of just in the middle.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5613-info-alternate.sh | 24 ++++++++++++++++--------
>  1 file changed, 16 insertions(+), 8 deletions(-)
>
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index 7bc1c3c..b393613 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
>         )
>  '
>
> +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
> +# the depth at 0 and count links, not repositories, so in a chain like:
> +#
> +#   A -> B -> C -> D -> E -> F -> G -> H
> +#      0    1    2    3    4    5    6
> +#

Ok so we count links, but wouldn't we have 5 links when we hit F, and
not G? Or am I missing something here?

> +# we are OK at "G", but break at "H".
> +#

Seems like from the wording of this comment that we'd break at G and
not H..? Obviously the test below shows G is ok. Aren't the numbers
here off by 1?

Regards,
Jake

> +# Note also that we must use "--bare -l" to make the link to H. The "-l"
> +# ensures we do not do a connectivity check, and the "--bare" makes sure
> +# we do not try to checkout the result (which needs objects), either of
> +# which would cause the clone to fail.



>  test_expect_success 'creating too deep nesting' '
>         git clone -l -s C D &&
>         git clone -l -s D E &&
> @@ -47,16 +59,12 @@ test_expect_success 'creating too deep nesting' '
>         git clone --bare -l -s G H
>  '
>
> -test_expect_success 'invalidity of deepest repository' '
> -       test_must_fail git -C H fsck
> -'
> -
> -test_expect_success 'validity of third repository' '
> -       git -C C fsck
> +test_expect_success 'validity of fifth-deep repository' '
> +       git -C G fsck
>  '
>
> -test_expect_success 'validity of fourth repository' '
> -       git -C D fsck
> +test_expect_success 'invalidity of sixth-deep repository' '
> +       test_must_fail git -C H fsck
>  '
>
>  test_expect_success 'breaking of loops' '
> --
> 2.10.0.618.g82cc264
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-10-03 20:34 ` [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors Jeff King
@ 2016-10-04  6:01   ` Jacob Keller
  2016-10-04 21:08   ` Junio C Hamano
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:01 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> When we add a new alternate to the list, we try to normalize
> out any redundant "..", etc. However, we do not look at the
> return value of normalize_path_copy(), and will happily
> continue with a path that could not be normalized. Worse,
> the normalizing process is done in-place, so we are left
> with whatever half-finished working state the normalizing
> function was in.
>
> Fortunately, this cannot cause us to read past the end of
> our buffer, as that working state will always leave the
> NUL from the original path in place. And we do tend to
> notice problems when we check is_directory() on the path.
> But you can see the nonsense that we feed to is_directory
> with an entry like:
>
>   this/../../is/../../way/../../too/../../deep/../../to/../../resolve
>
> in your objects/info/alternates, which yields:
>
>   error: object directory
>   /to/e/deep/too/way//ects/this/../../is/../../way/../../too/../../deep/../../to/../../resolve
>   does not exist; check .git/objects/info/alternates.
>

Yikes, that doesn't seem helpful.

> We can easily fix this just by checking the return value.
> But that makes it hard to generate a good error message,
> since we're normalizing in-place and our input value has
> been overwritten by cruft.

Right. Definitely want to check the return value here...

>
> Instead, let's provide a strbuf helper that does an in-place
> normalize, but restores the original contents on error. This
> uses a second buffer under the hood, which is slightly less
> efficient, but this is not a performance-critical code path.
>

I agree, I don't think this duplication is really a big deal, since it
helps ensure that the function doesn't modify its arguments on error.

> The strbuf helper can also properly set the "len" parameter
> of the strbuf before returning. Just doing:
>
>   normalize_path_copy(buf.buf, buf.buf);
>
> will shorten the string, but leave buf.len at the original
> length. That may be confusing to later code which uses the
> strbuf.
>

Makes sense here. Properly setting the length will help prevent future issues.

Thanks,
Jake

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_file.c | 11 +++++++++--
>  strbuf.c    | 20 ++++++++++++++++++++
>  strbuf.h    |  8 ++++++++
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index b9c1fa3..68571bd 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -263,7 +263,12 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
>         }
>         strbuf_addstr(&pathbuf, entry);
>
> -       normalize_path_copy(pathbuf.buf, pathbuf.buf);
> +       if (strbuf_normalize_path(&pathbuf) < 0) {
> +               error("unable to normalize alternate object path: %s",
> +                     pathbuf.buf);
> +               strbuf_release(&pathbuf);
> +               return -1;
> +       }
>
>         pfxlen = strlen(pathbuf.buf);
>
> @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
>         }
>
>         strbuf_add_absolute_path(&objdirbuf, get_object_directory());
> -       normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
> +       if (strbuf_normalize_path(&objdirbuf) < 0)
> +               die("unable to normalize object directory: %s",
> +                   objdirbuf.buf);
>
>         alt_copy = xmemdupz(alt, len);
>         string_list_split_in_place(&entries, alt_copy, sep, -1);
> diff --git a/strbuf.c b/strbuf.c
> index b839be4..8fec657 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -870,3 +870,23 @@ void strbuf_stripspace(struct strbuf *sb, int skip_comments)
>
>         strbuf_setlen(sb, j);
>  }
> +
> +int strbuf_normalize_path(struct strbuf *src)
> +{
> +       struct strbuf dst = STRBUF_INIT;
> +
> +       strbuf_grow(&dst, src->len);
> +       if (normalize_path_copy(dst.buf, src->buf) < 0) {
> +               strbuf_release(&dst);
> +               return -1;
> +       }
> +
> +       /*
> +        * normalize_path does not tell us the new length, so we have to
> +        * compute it by looking for the new NUL it placed
> +        */

And we can't correctly set the length inside normalize_path_copy
because it just takes C strings directly and not actually a strbuf. Ok
so it makes sense that we have to set it here.

Thanks,
Jake

> +       strbuf_setlen(&dst, strlen(dst.buf));
> +       strbuf_swap(src, &dst);
> +       strbuf_release(&dst);
> +       return 0;
> +}
> diff --git a/strbuf.h b/strbuf.h
> index ba8d5f1..2262b12 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -443,6 +443,14 @@ extern int strbuf_getcwd(struct strbuf *sb);
>   */
>  extern void strbuf_add_absolute_path(struct strbuf *sb, const char *path);
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 08/18] link_alt_odb_entry: refactor string handling
  2016-10-03 20:34 ` [PATCH 08/18] link_alt_odb_entry: refactor string handling Jeff King
@ 2016-10-04  6:05   ` Jacob Keller
  2016-10-04 13:53     ` Jeff King
  2016-10-04 21:18   ` Junio C Hamano
  1 sibling, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> The string handling in link_alt_odb_entry() is mostly an
> artifact of the original version, which took the path as a
> ptr/len combo, and did not have a NUL-terminated string
> until we created one in the alternate_object_database
> struct.  But since 5bdf0a8 (sha1_file: normalize alt_odb
> path before comparing and storing, 2011-09-07), the first
> thing we do is put the path into a strbuf, which gives us
> some easy opportunities for cleanup.
>
> In particular:
>
>   - we call strlen(pathbuf.buf), which is silly; we can look
>     at pathbuf.len.

Right. This makes obvious sense.

>
>   - even though we have a strbuf, we don't maintain its
>     "len" field when chomping extra slashes from the
>     end, and instead keep a separate "pfxlen" variable. We
>     can fix this and then drop "pfxlen" entirely.
>

Makes sense.

>   - we don't check whether the path is usable until after we
>     allocate the new struct, making extra cleanup work for
>     ourselves. Since we have a NUL-terminated string, we can
>     bump the "is it usable" checks higher in the function.
>     While we're at it, we can move that logic to its own
>     helper, which makes the flow of link_alt_odb_entry()
>     easier to follow.
>

Also makes sense.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
> And you can probably guess now how I found the issue in the last patch
> where pathbuf.len is totally bogus after calling normalize_path_copy. :)
>
>  sha1_file.c | 83 +++++++++++++++++++++++++++++++++----------------------------
>  1 file changed, 45 insertions(+), 38 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 68571bd..f396823 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -234,6 +234,36 @@ char *sha1_pack_index_name(const unsigned char *sha1)
>  struct alternate_object_database *alt_odb_list;
>  static struct alternate_object_database **alt_odb_tail;
>
> +/*
> + * Return non-zero iff the path is usable as an alternate object database.
> + */
> +static int alt_odb_usable(struct strbuf *path, const char *normalized_objdir)
> +{
> +       struct alternate_object_database *alt;
> +
> +       /* Detect cases where alternate disappeared */
> +       if (!is_directory(path->buf)) {
> +               error("object directory %s does not exist; "
> +                     "check .git/objects/info/alternates.",
> +                     path->buf);
> +               return 0;
> +       }
> +
> +       /*
> +        * Prevent the common mistake of listing the same
> +        * thing twice, or object directory itself.
> +        */
> +       for (alt = alt_odb_list; alt; alt = alt->next) {
> +               if (path->len == alt->name - alt->base - 1 &&
> +                   !memcmp(path->buf, alt->base, path->len))
> +                       return 0;
> +       }
> +       if (!fspathcmp(path->buf, normalized_objdir))
> +               return 0;
> +
> +       return 1;
> +}
> +

This definitely makes reading the following function much easier,
though the diff is a bit funky. I think the end result is much
clearer.

Thanks,
Jake

>  /*
>   * Prepare alternate object database registry.
>   *
> @@ -253,8 +283,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
>         int depth, const char *normalized_objdir)
>  {
>         struct alternate_object_database *ent;
> -       struct alternate_object_database *alt;
> -       size_t pfxlen, entlen;
> +       size_t entlen;
>         struct strbuf pathbuf = STRBUF_INIT;
>
>         if (!is_absolute_path(entry) && relative_base) {
> @@ -270,47 +299,26 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
>                 return -1;
>         }
>
> -       pfxlen = strlen(pathbuf.buf);
> -
>         /*
>          * The trailing slash after the directory name is given by
>          * this function at the end. Remove duplicates.
>          */
> -       while (pfxlen && pathbuf.buf[pfxlen-1] == '/')
> -               pfxlen -= 1;
> -
> -       entlen = st_add(pfxlen, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
> -       ent = xmalloc(st_add(sizeof(*ent), entlen));
> -       memcpy(ent->base, pathbuf.buf, pfxlen);
> -       strbuf_release(&pathbuf);
> -
> -       ent->name = ent->base + pfxlen + 1;
> -       ent->base[pfxlen + 3] = '/';
> -       ent->base[pfxlen] = ent->base[entlen-1] = 0;
> +       while (pathbuf.len && pathbuf.buf[pathbuf.len - 1] == '/')
> +               strbuf_setlen(&pathbuf, pathbuf.len - 1);
>
> -       /* Detect cases where alternate disappeared */
> -       if (!is_directory(ent->base)) {
> -               error("object directory %s does not exist; "
> -                     "check .git/objects/info/alternates.",
> -                     ent->base);
> -               free(ent);
> +       if (!alt_odb_usable(&pathbuf, normalized_objdir)) {
> +               strbuf_release(&pathbuf);
>                 return -1;
>         }
>
> -       /* Prevent the common mistake of listing the same
> -        * thing twice, or object directory itself.
> -        */
> -       for (alt = alt_odb_list; alt; alt = alt->next) {
> -               if (pfxlen == alt->name - alt->base - 1 &&
> -                   !memcmp(ent->base, alt->base, pfxlen)) {
> -                       free(ent);
> -                       return -1;
> -               }
> -       }
> -       if (!fspathcmp(ent->base, normalized_objdir)) {
> -               free(ent);
> -               return -1;
> -       }
> +       entlen = st_add(pathbuf.len, 43); /* '/' + 2 hex + '/' + 38 hex + NUL */
> +       ent = xmalloc(st_add(sizeof(*ent), entlen));
> +       memcpy(ent->base, pathbuf.buf, pathbuf.len);
> +
> +       ent->name = ent->base + pathbuf.len + 1;
> +       ent->base[pathbuf.len] = '/';
> +       ent->base[pathbuf.len + 3] = '/';
> +       ent->base[entlen-1] = 0;
>
>         /* add the alternate entry */
>         *alt_odb_tail = ent;
> @@ -318,10 +326,9 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
>         ent->next = NULL;
>
>         /* recursively add alternates */
> -       read_info_alternates(ent->base, depth + 1);
> -
> -       ent->base[pfxlen] = '/';
> +       read_info_alternates(pathbuf.buf, depth + 1);
>
> +       strbuf_release(&pathbuf);
>         return 0;
>  }
>
> --
> 2.10.0.618.g82cc264
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 09/18] alternates: provide helper for adding to alternates list
  2016-10-03 20:35 ` [PATCH 09/18] alternates: provide helper for adding to alternates list Jeff King
@ 2016-10-04  6:07   ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:35 PM, Jeff King <peff@peff.net> wrote:
> The submodule code wants to temporarily add an alternate
> object store to our in-memory alt_odb list, but does it
> manually. Let's provide a helper so it can reuse the code in
> link_alt_odb_entry().
>
> While we're adding our new add_to_alternates_memory(), let's
> document add_to_alternates_file(), as the two are related.
>

Ya the code used in the submodule area always felt a bit wrong to me.
It took me a bit to realize why we can just replace this all with a
call to link_alt_odb_entry, but the resulting code reduction is
definitely nice.


> -       /* avoid adding it twice */
> -       prepare_alt_odb();
> -       for (alt_odb = alt_odb_list; alt_odb; alt_odb = alt_odb->next)
> -               if (alt_odb->name - alt_odb->base == objects_directory.len &&
> -                               !strncmp(alt_odb->base, objects_directory.buf,
> -                                       objects_directory.len))
> -                       goto done;
> -
> -       alloc = st_add(objects_directory.len, 42); /* for "12/345..." sha1 */
> -       alt_odb = xmalloc(st_add(sizeof(*alt_odb), alloc));
> -       alt_odb->next = alt_odb_list;
> -       xsnprintf(alt_odb->base, alloc, "%s", objects_directory.buf);
> -       alt_odb->name = alt_odb->base + objects_directory.len;
> -       alt_odb->name[2] = '/';
> -       alt_odb->name[40] = '\0';
> -       alt_odb->name[41] = '\0';
> -       alt_odb_list = alt_odb;
> -

Getting rid of multiple places for this funky extra allocation is a
nice improvement.

Thanks,
Jake

> -       /* add possible alternates from the submodule */
> -       read_info_alternates(objects_directory.buf, 0);
> +       add_to_alternates_memory(objects_directory.buf);
>  done:
>         strbuf_release(&objects_directory);
>         return ret;
> --
> 2.10.0.618.g82cc264
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 10/18] alternates: provide helper for allocating alternate
  2016-10-03 20:35 ` [PATCH 10/18] alternates: provide helper for allocating alternate Jeff King
@ 2016-10-04  6:09   ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:09 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:35 PM, Jeff King <peff@peff.net> wrote:
> Allocating a struct alternate_object_database is tricky, as
> we must over-allocate the buffer to provide scratch space,
> and then put in particular '/' and NUL markers.
>
> Let's encapsulate this in a function so that the complexity
> doesn't leak into callers (and so that we can modify it
> later).

The overall way this was broken up is definitely a lot of patches to
follow but understanding the end goal this is a huge improvement in
code maintainability here. This original allocation is indeed very
tricky.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/18] alternates: use a separate scratch space
  2016-10-03 20:35 ` [PATCH 12/18] alternates: use a separate scratch space Jeff King
@ 2016-10-04  6:12   ` Jacob Keller
  2016-10-04 21:29   ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:35 PM, Jeff King <peff@peff.net> wrote:
> The alternate_object_database struct uses a single buffer
> both for storing the path to the alternate, and as a scratch
> buffer for forming object names. This is efficient (since
> otherwise we'd end up storing the path twice), but it makes
> life hard for callers who just want to know the path to the
> alternate. They have to remember to stop reading after
> "alt->name - alt->base" bytes, and to subtract one for the
> trailing '/'.
>
> It would be much simpler if they could simply access a
> NUL-terminated path string. We could encapsulate this in a
> function which puts a NUL in the scratch buffer and returns
> the string, but that opens up questions about the lifetime
> of the result. The first time another caller uses the
> alternate, the scratch buffer may get other data tacked onto
> it.
>
> Let's instead just store the root path separately from the
> scratch buffer. There aren't enough alternates being stored
> for the duplicated data to matter for performance, and this
> keeps things simple and safe for the callers.
>

Definitely agree here. The resulting code seems a lot easier to
follow, and making the callers simpler here is a very goo thing.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 13/18] fill_sha1_file: write "boring" characters
  2016-10-03 20:35 ` [PATCH 13/18] fill_sha1_file: write "boring" characters Jeff King
@ 2016-10-04  6:13   ` Jacob Keller
  2016-10-04 21:46     ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:13 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:35 PM, Jeff King <peff@peff.net> wrote:
> This function forms a sha1 as "xx/yyyy...", but skips over
> the slot for the slash rather than writing it, leaving it to
> the caller to do so. It also does not bother to put in a
> trailing NUL, even though every caller would want it (we're
> forming a path which by definition is not a directory, so
> the only thing to do with it is feed it to a system call).
>
> Let's make the lives of our callers easier by just writing
> out the internal "/" and the NUL.
>

Ya this makes sense.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_file.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index 70c3e2f..c6308c1 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -178,10 +178,12 @@ static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
>         for (i = 0; i < 20; i++) {
>                 static char hex[] = "0123456789abcdef";
>                 unsigned int val = sha1[i];
> -               char *pos = pathbuf + i*2 + (i > 0);
> -               *pos++ = hex[val >> 4];
> -               *pos = hex[val & 0xf];
> +               *pathbuf++ = hex[val >> 4];
> +               *pathbuf++ = hex[val & 0xf];
> +               if (!i)
> +                       *pathbuf++ = '/';
>         }
> +       *pathbuf = '\0';

I think this makes a lot more sense than making the callers have to do this.

Thanks,
Jake

>  }
>
>  const char *sha1_file_name(const unsigned char *sha1)
> @@ -198,8 +200,6 @@ const char *sha1_file_name(const unsigned char *sha1)
>                 die("insanely long object directory %s", objdir);
>         memcpy(buf, objdir, len);
>         buf[len] = '/';
> -       buf[len+3] = '/';
> -       buf[len+42] = '\0';
>         fill_sha1_path(buf + len + 1, sha1);
>         return buf;
>  }
> @@ -406,8 +406,6 @@ struct alternate_object_database *alloc_alt_odb(const char *dir)
>
>         ent->name = ent->scratch + dirlen + 1;
>         ent->scratch[dirlen] = '/';
> -       ent->scratch[dirlen + 3] = '/';
> -       ent->scratch[entlen-1] = 0;
>
>         return ent;
>  }
> --
> 2.10.0.618.g82cc264
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 15/18] fill_sha1_file: write into a strbuf
  2016-10-03 20:36 ` [PATCH 15/18] fill_sha1_file: write into a strbuf Jeff King
@ 2016-10-04  6:44   ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:36 PM, Jeff King <peff@peff.net> wrote:
> It's currently the responsibility of the caller to give
> fill_sha1_file() enough bytes to write into, leading them to
> manually compute the required lengths. Instead, let's just
> write into a strbuf so that it's impossible to get this
> wrong.

Yea this makes sense.

>
> The alt_odb caller already has a strbuf, so this makes
> things strictly simpler. The other caller, sha1_file_name(),
> uses a static PATH_MAX buffer and dies when it would
> overflow. We can convert this to a static strbuf, which
> means our allocation cost is amortized (and as a bonus, we
> no longer have to worry about PATH_MAX being too short for
> normal use).
>
> This does introduce some small overhead in fill_sha1_file(),
> as each strbuf_addchar() will check whether it needs to
> grow. However, between the optimization in fec501d
> (strbuf_addch: avoid calling strbuf_grow, 2015-04-16) and
> the fact that this is not generally called in a tight loop
> (after all, the next step is typically to access the file!)
> this probably doesn't matter. And even if it did, the right
> place to micro-optimize is inside fill_sha1_file(), by
> calling a single strbuf_grow() there.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  sha1_file.c | 34 ++++++++++------------------------
>  1 file changed, 10 insertions(+), 24 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index efc8cee..80a3333 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -172,36 +172,28 @@ enum scld_error safe_create_leading_directories_const(const char *path)
>         return result;
>  }
>
> -static void fill_sha1_path(char *pathbuf, const unsigned char *sha1)
> +static void fill_sha1_path(struct strbuf *buf, const unsigned char *sha1)
>  {
>         int i;
>         for (i = 0; i < 20; i++) {
>                 static char hex[] = "0123456789abcdef";
>                 unsigned int val = sha1[i];
> -               *pathbuf++ = hex[val >> 4];
> -               *pathbuf++ = hex[val & 0xf];
> +               strbuf_addch(buf, hex[val >> 4]);
> +               strbuf_addch(buf, hex[val & 0xf]);
>                 if (!i)
> -                       *pathbuf++ = '/';
> +                       strbuf_addch(buf, '/');
>         }
> -       *pathbuf = '\0';
>  }
>
>  const char *sha1_file_name(const unsigned char *sha1)
>  {
> -       static char buf[PATH_MAX];
> -       const char *objdir;
> -       int len;
> +       static struct strbuf buf = STRBUF_INIT;
>
> -       objdir = get_object_directory();
> -       len = strlen(objdir);
> +       strbuf_reset(&buf);
> +       strbuf_addf(&buf, "%s/", get_object_directory());
>
> -       /* '/' + sha1(2) + '/' + sha1(38) + '\0' */
> -       if (len + 43 > PATH_MAX)
> -               die("insanely long object directory %s", objdir);
> -       memcpy(buf, objdir, len);
> -       buf[len] = '/';
> -       fill_sha1_path(buf + len + 1, sha1);
> -       return buf;

I'm definitely a fan of seeing the magic number here go away.

> +       fill_sha1_path(&buf, sha1);
> +       return buf.buf;
>  }
>
>  struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
> @@ -213,14 +205,8 @@ struct strbuf *alt_scratch_buf(struct alternate_object_database *alt)
>  static const char *alt_sha1_path(struct alternate_object_database *alt,
>                                  const unsigned char *sha1)
>  {
> -       /* hex sha1 plus internal "/" */
> -       size_t len = GIT_SHA1_HEXSZ + 1;
>         struct strbuf *buf = alt_scratch_buf(alt);

Funny story.. While reviewing this code on my screen, my monitor has a
nice little bit of gunk just between the lines that made this one look
like it was being deleted. So I was really confused as to what strbuf
you were using and why you removed a call to alt_scratch_buf()..
Obviously this line just isn't being removed.

> -
> -       strbuf_grow(buf, len);
> -       fill_sha1_path(buf->buf + buf->len, sha1);
> -       strbuf_setlen(buf, buf->len + len);
> -
> +       fill_sha1_path(buf, sha1);
>         return buf->buf;
>  }
>
> --
> 2.10.0.618.g82cc264
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 16/18] count-objects: report alternates via verbose mode
  2016-10-03 20:36 ` [PATCH 16/18] count-objects: report alternates via verbose mode Jeff King
@ 2016-10-04  6:46   ` Jacob Keller
  2016-10-04 13:56     ` Jeff King
  2016-10-05 14:23   ` Jakub Narębski
  2016-10-05 18:47   ` René Scharfe
  2 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

\On Mon, Oct 3, 2016 at 1:36 PM, Jeff King <peff@peff.net> wrote:
> There's no way to get the list of alternates that git
> computes internally; our tests only infer it based on which
> objects are available. In addition to testing, knowing this
> list may be helpful for somebody debugging their alternates
> setup.
>
> Let's add it to the "count-objects -v" output. We could give
> it a separate flag, but there's not really any need.
> "count-objects -v" is already a debugging catch-all for the
> object database, its output is easily extensible to new data
> items, and printing the alternates is not expensive (we
> already had to find them to count the objects).
>

Makes sense. Unless there's a compelling reason you'd want to print
out these alternates *without* anything else from -v, but you can just
use grep like the test does so this seems fine to me.

Thanks,
Jake

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  Documentation/git-count-objects.txt |  5 +++++
>  builtin/count-objects.c             | 10 ++++++++++
>  t/t5613-info-alternate.sh           | 10 ++++++++++
>  3 files changed, 25 insertions(+)
>
> diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
> index 2ff3568..cb9b4d2 100644
> --- a/Documentation/git-count-objects.txt
> +++ b/Documentation/git-count-objects.txt
> @@ -38,6 +38,11 @@ objects nor valid packs
>  +
>  size-garbage: disk space consumed by garbage files, in KiB (unless -H is
>  specified)
> ++
> +alternate: absolute path of alternate object databases; may appear
> +multiple times, one line per path. Note that if the path contains
> +non-printable characters, it may be surrounded by double-quotes and
> +contain C-style backslashed escape sequences.
>
>  -H::
>  --human-readable::
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index ba92919..a700409 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -8,6 +8,7 @@
>  #include "dir.h"
>  #include "builtin.h"
>  #include "parse-options.h"
> +#include "quote.h"
>
>  static unsigned long garbage;
>  static off_t size_garbage;
> @@ -73,6 +74,14 @@ static int count_cruft(const char *basename, const char *path, void *data)
>         return 0;
>  }
>
> +static int print_alternate(struct alternate_object_database *alt, void *data)
> +{
> +       printf("alternate: ");
> +       quote_c_style(alt->path, NULL, stdout, 0);
> +       putchar('\n');
> +       return 0;
> +}
> +
>  static char const * const count_objects_usage[] = {
>         N_("git count-objects [-v] [-H | --human-readable]"),
>         NULL
> @@ -140,6 +149,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>                 printf("prune-packable: %lu\n", packed_loose);
>                 printf("garbage: %lu\n", garbage);
>                 printf("size-garbage: %s\n", garbage_buf.buf);
> +               foreach_alt_odb(print_alternate, NULL);
>                 strbuf_release(&loose_buf);
>                 strbuf_release(&pack_buf);
>                 strbuf_release(&garbage_buf);
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index b393613..74f6770 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -39,6 +39,16 @@ test_expect_success 'preparing third repository' '
>         )
>  '
>
> +test_expect_success 'count-objects shows the alternates' '
> +       cat >expect <<-EOF &&
> +       alternate: $(pwd)/B/.git/objects
> +       alternate: $(pwd)/A/.git/objects
> +       EOF
> +       git -C C count-objects -v >actual &&
> +       grep ^alternate: actual >actual.alternates &&
> +       test_cmp expect actual.alternates
> +'
> +
>  # Note: These tests depend on the hard-coded value of 5 as "too deep". We start
>  # the depth at 0 and count links, not repositories, so in a chain like:
>  #
> --
> 2.10.0.618.g82cc264
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 17/18] sha1_file: always allow relative paths to alternates
  2016-10-03 20:36 ` [PATCH 17/18] sha1_file: always allow relative paths to alternates Jeff King
@ 2016-10-04  6:50   ` Jacob Keller
  2016-10-04 14:00     ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:36 PM, Jeff King <peff@peff.net> wrote:
> We recursively expand alternates repositories, so that if A
> borrows from B which borrows from C, A can see all objects.
>
> For the root object database, we allow relative paths, so A
> can point to B as "../B/objects". However, we currently do
> not allow relative paths when recursing, so B must use an
> absolute path to reach C.
>
> That is an ancient protection from c2f493a (Transitively
> read alternatives, 2006-05-07) that tries to avoid adding
> the same alternate through two different paths. Since
> 5bdf0a8 (sha1_file: normalize alt_odb path before comparing
> and storing, 2011-09-07), we use a normalized absolute path
> for each alt_odb entry.
>
> This means that in most cases the protection is no longer
> necessary; we will detect the duplicate no matter how we got
> there (but see below).  And it's a good idea to get rid of
> it, as it creates an unnecessary complication when setting
> up recursive alternates (B has to know that A is going to
> borrow from it and make sure to use an absolute path).
>

I think this makes sense. We already normalize a path, and if the
normalization is too complicated, then we (now) fail nicely so we
should always have an absolute path to the store.

> Note that our normalization doesn't actually look at the
> filesystem, so it can still be fooled by crossing symbolic
> links. But that's also true of absolute paths, so it's not a
> good reason to disallow only relative paths (it's
> potentially a reason to switch to real_path(), but that's a
> separate and non-trivial change).

Hmm, ya using real_path would fix that but I definitely agree that's
not trivial and can be done in the future if we think it is or becomes
necessary.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/18] alternates: use fspathcmp to detect duplicates
  2016-10-03 20:36 ` [PATCH 18/18] alternates: use fspathcmp to detect duplicates Jeff King
@ 2016-10-04  6:51   ` Jacob Keller
  2016-10-04 14:10     ` Jeff King
  2016-10-04 21:42   ` Junio C Hamano
  2016-10-05  2:34   ` Aaron Schrab
  2 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04  6:51 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Mon, Oct 3, 2016 at 1:36 PM, Jeff King <peff@peff.net> wrote:
> On a case-insensitive filesystem, we should realize that
> "a/objects" and "A/objects" are the same path. We already
> use fspathcmp() to check against the main object directory,
> but until recently we couldn't use it for comparing against
> other alternates (because their paths were not
> NUL-terminated strings). But now we can, so let's do so.
>

Yep, makes sense.

> Note that we also need to adjust count-objects to load the
> config, so that it can see the setting of core.ignorecase
> (this is required by the test, but is also a general bugfix
> for users of count-objects).

Also makes sense.

>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/count-objects.c   |  2 ++
>  sha1_file.c               |  2 +-
>  t/t5613-info-alternate.sh | 17 +++++++++++++++++
>  3 files changed, 20 insertions(+), 1 deletion(-)
>
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index a700409..a04b4f2 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -97,6 +97,8 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>                 OPT_END(),
>         };
>
> +       git_config(git_default_config, NULL);
> +
>         argc = parse_options(argc, argv, prefix, opts, count_objects_usage, 0);
>         /* we do not take arguments other than flags for now */
>         if (argc)
> diff --git a/sha1_file.c b/sha1_file.c
> index b514167..b05ec9c 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -260,7 +260,7 @@ static int alt_odb_usable(struct strbuf *path, const char *normalized_objdir)
>          * thing twice, or object directory itself.
>          */
>         for (alt = alt_odb_list; alt; alt = alt->next) {
> -               if (!strcmp(path->buf, alt->path))
> +               if (!fspathcmp(path->buf, alt->path))
>                         return 0;
>         }
>         if (!fspathcmp(path->buf, normalized_objdir))
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index 76525a0..926fe14 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -116,4 +116,21 @@ test_expect_success 'relative duplicates are eliminated' '
>         test_cmp expect actual.alternates
>  '
>
> +test_expect_success CASE_INSENSITIVE_FS 'dup finding can be case-insensitive' '
> +       git init --bare insensitive.git &&
> +       # the previous entry for "A" will have used uppercase
> +       cat >insensitive.git/objects/info/alternates <<-\EOF &&
> +       ../../C/.git/objects
> +       ../../a/.git/objects
> +       EOF
> +       cat >expect <<-EOF &&
> +       alternate: $(pwd)/C/.git/objects
> +       alternate: $(pwd)/B/.git/objects
> +       alternate: $(pwd)/A/.git/objects
> +       EOF
> +       git -C insensitive.git count-objects -v >actual &&
> +       grep ^alternate: actual >actual.alternates &&
> +       test_cmp expect actual.alternates
> +'
> +
>  test_done
> --
> 2.10.0.618.g82cc264

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 0/18] alternate object database cleanups
  2016-10-04  5:47 ` [PATCH 0/18] alternate object database cleanups Jacob Keller
@ 2016-10-04 13:41   ` Jeff King
  2016-10-04 20:40     ` Jacob Keller
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-04 13:41 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 10:47:31PM -0700, Jacob Keller wrote:

> > The number of patches is a little intimidating, but I tried hard to
> > break the refactoring down into a sequence of obviously-correct steps.
> > You can be the judge of my success.
> 
> I read through them once. I'm going to re-read through them again and
> leave any comments I had.

Thanks for having the fortitude to read them all. :) After looking at
your comments, I don't _think_ there's anything that necessitates a
re-roll, but I'll respond to a few of them individually.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 01/18] t5613: drop reachable_via function
  2016-10-04  5:48   ` Jacob Keller
@ 2016-10-04 13:43     ` Jeff King
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 13:43 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 10:48:31PM -0700, Jacob Keller wrote:

> On Mon, Oct 3, 2016 at 1:33 PM, Jeff King <peff@peff.net> wrote:
> > This function was never used since its inception in dd05ea1
> > (test case for transitive info/alternates, 2006-05-07).
> > Which is just as well, since it mutates the repo state in a
> > way that would invalidate further tests, without cleaning up
> > after itself. Let's get rid of it so that nobody is tempted
> > to use it.
> >
> 
> Makes sense. It wouldn't be a good idea to leave this around since it
> didn't clean up after itself. Curious why no test actually used it
> though..

I wondered that, too. Sometimes in cases like this a call got dropped
during a re-roll on the list. But the original email:

  http://public-inbox.org/git/20060507181947.GE23738@admingilde.org/

has the problem, too. So probably it was just leftover cruft that was
made obsolete even before it hit the list.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 04/18] t5613: whitespace/style cleanups
  2016-10-04  5:52   ` Jacob Keller
@ 2016-10-04 13:47     ` Jeff King
  2016-10-04 20:41       ` Jacob Keller
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-04 13:47 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 10:52:39PM -0700, Jacob Keller wrote:

> On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> > Our normal test style these days puts the opening quote of
> > the body on the description line, and indents the body with
> > a single tab. This ancient test did not follow this.
> >
> 
> I was surprised you didn't do this first, but it doesn't really make a
> difference either way. This is also a pretty straight forward
> improvement, and I can see why you'd want to split this out to review
> separately.

I was trying to leave it to the end, to move the substantive changes up
front (and because there _isn't_ a correct style for some of the things
it was doing). But it just got too painful to do the "don't chdir" patch
without updating the style. I agree it might have made more sense at the
very beginning, but I didn't think it mattered enough to go through the
trouble of rebasing the earlier patches (which would essentially be
rewriting them).

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04  5:57   ` Jacob Keller
@ 2016-10-04 13:48     ` Jeff King
  2016-10-04 20:44       ` Jacob Keller
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-04 13:48 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 10:57:48PM -0700, Jacob Keller wrote:

> > diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> > index 7bc1c3c..b393613 100755
> > --- a/t/t5613-info-alternate.sh
> > +++ b/t/t5613-info-alternate.sh
> > @@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
> >         )
> >  '
> >
> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
> > +# the depth at 0 and count links, not repositories, so in a chain like:
> > +#
> > +#   A -> B -> C -> D -> E -> F -> G -> H
> > +#      0    1    2    3    4    5    6
> > +#
> 
> Ok so we count links, but wouldn't we have 5 links when we hit F, and
> not G? Or am I missing something here?

This is what I was trying to get at with the "start the depth at 0". We
disallow a depth greater than 5, but because we start at 0-counting,
it's really six links. I guess saying "5 as too deep" is really the
misleading part. It should be "5 as the maximum depth".

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 08/18] link_alt_odb_entry: refactor string handling
  2016-10-04  6:05   ` Jacob Keller
@ 2016-10-04 13:53     ` Jeff King
  2016-10-04 20:46       ` Jacob Keller
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-04 13:53 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 11:05:42PM -0700, Jacob Keller wrote:

> This definitely makes reading the following function much easier,
> though the diff is a bit funky. I think the end result is much
> clearer.

Yeah, it's really hard to see that all of the "ent" setup is kept,
because it moves _and_ changes its content (from pfxlen to pathbuf.len).

I actually tried to split this into two patches to make the diff easier
to read, but there are two mutually dependent changes: moving to
pathbuf.len everywhere requires not-freeing pathbuf in the early code
path. But if you do that and don't move all of "is it usable" checks up,
then you have to add a bunch of new error-handling code that would just
get ripped out in the next patch.

There's definitely _some_ of that in this series already (e.g., the
counting logic in alt_sha1_path() added by patch 14 that just gets
ripped out in patch 15 when fill_sha1_path() learns to use a strbuf). I
tried to balance "show each individual obvious step" with "don't make
people review a bunch of scaffolding that's not going to be in the final
product".

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 16/18] count-objects: report alternates via verbose mode
  2016-10-04  6:46   ` Jacob Keller
@ 2016-10-04 13:56     ` Jeff King
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 13:56 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 11:46:32PM -0700, Jacob Keller wrote:

> \On Mon, Oct 3, 2016 at 1:36 PM, Jeff King <peff@peff.net> wrote:
> > There's no way to get the list of alternates that git
> > computes internally; our tests only infer it based on which
> > objects are available. In addition to testing, knowing this
> > list may be helpful for somebody debugging their alternates
> > setup.
> >
> > Let's add it to the "count-objects -v" output. We could give
> > it a separate flag, but there's not really any need.
> > "count-objects -v" is already a debugging catch-all for the
> > object database, its output is easily extensible to new data
> > items, and printing the alternates is not expensive (we
> > already had to find them to count the objects).
> >
> 
> Makes sense. Unless there's a compelling reason you'd want to print
> out these alternates *without* anything else from -v, but you can just
> use grep like the test does so this seems fine to me.

Yeah. I could definitely be persuaded otherwise, but I just couldn't see
these being used for anything useful beyond debugging.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 17/18] sha1_file: always allow relative paths to alternates
  2016-10-04  6:50   ` Jacob Keller
@ 2016-10-04 14:00     ` Jeff King
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 14:00 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 11:50:30PM -0700, Jacob Keller wrote:

> > Note that our normalization doesn't actually look at the
> > filesystem, so it can still be fooled by crossing symbolic
> > links. But that's also true of absolute paths, so it's not a
> > good reason to disallow only relative paths (it's
> > potentially a reason to switch to real_path(), but that's a
> > separate and non-trivial change).
> 
> Hmm, ya using real_path would fix that but I definitely agree that's
> not trivial and can be done in the future if we think it is or becomes
> necessary.

I did look into this briefly. The trick is that real_path() assumes
relative paths are relative from the current directory (and does chdir()
trickery to get the filesystem to resolve things for us). So you'd
really need a "real_path_from" that chdirs to the relative base, issues
the real_path() from there, and then chdirs back to the original cwd.

Which I guess is no less gross than what real_path() is doing itself
internally, but it's definitely something for another patch. Given the
fact that we don't check it now and nobody has complained leads me to
believe that nobody really cares.

Actually, given the fact that we didn't allow relative bases in
recursive alternates, I suspect that very few people are using
complicated alternate setups in the first place.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/18] alternates: use fspathcmp to detect duplicates
  2016-10-04  6:51   ` Jacob Keller
@ 2016-10-04 14:10     ` Jeff King
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 14:10 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Mon, Oct 03, 2016 at 11:51:59PM -0700, Jacob Keller wrote:

> On Mon, Oct 3, 2016 at 1:36 PM, Jeff King <peff@peff.net> wrote:
> > On a case-insensitive filesystem, we should realize that
> > "a/objects" and "A/objects" are the same path. We already
> > use fspathcmp() to check against the main object directory,
> > but until recently we couldn't use it for comparing against
> > other alternates (because their paths were not
> > NUL-terminated strings). But now we can, so let's do so.
> >
> 
> Yep, makes sense.
> 
> > Note that we also need to adjust count-objects to load the
> > config, so that it can see the setting of core.ignorecase
> > (this is required by the test, but is also a general bugfix
> > for users of count-objects).
> 
> Also makes sense.

BTW, I tested this on a vfat loopback device, but I was surprised to see
that quite a few other tests failed on that device.

At least one of the problems is that symlinks are not supported, but
lib-httpd.sh wants to use them for its Apache setup. I guess people on
Windows just don't run the httpd tests at all, which is not too
surprising.

Likewise, credential-cache fails because it cannot create a Unix socket
(and the flag for that is in the build, not a run-time filesystem
check).

Some of the other failures seemed to be due to lack of an executable bit
on the filesystem. I'm not sure if we could or should do better run-time
detection of that sort of thing. I think some of the checks are tied to
the build, and that's generally good enough in practice because people
don't use vfat on their Linux machines. So tracking down each of them
may just be pedantic make-work that nobody cares about.

I did wonder if there was another good filesystem to use for
case-insensitive experiments on Linux. At the time I didn't think there
was good support for making HFS+ filesystems, but it looks Debian cares
mkfs.hfs. That's probably a better choice for such experiments.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 0/18] alternate object database cleanups
  2016-10-04 13:41   ` Jeff King
@ 2016-10-04 20:40     ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 20:40 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 6:41 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Oct 03, 2016 at 10:47:31PM -0700, Jacob Keller wrote:
>
>> > The number of patches is a little intimidating, but I tried hard to
>> > break the refactoring down into a sequence of obviously-correct steps.
>> > You can be the judge of my success.
>>
>> I read through them once. I'm going to re-read through them again and
>> leave any comments I had.
>
> Thanks for having the fortitude to read them all. :) After looking at
> your comments, I don't _think_ there's anything that necessitates a
> re-roll, but I'll respond to a few of them individually.
>
> -Peff

Ya, I don't either. Most of my comments were just me trying to make
sure I understood what you were doing.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 04/18] t5613: whitespace/style cleanups
  2016-10-04 13:47     ` Jeff King
@ 2016-10-04 20:41       ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 20:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 6:47 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Oct 03, 2016 at 10:52:39PM -0700, Jacob Keller wrote:
>
>> On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
>> > Our normal test style these days puts the opening quote of
>> > the body on the description line, and indents the body with
>> > a single tab. This ancient test did not follow this.
>> >
>>
>> I was surprised you didn't do this first, but it doesn't really make a
>> difference either way. This is also a pretty straight forward
>> improvement, and I can see why you'd want to split this out to review
>> separately.
>
> I was trying to leave it to the end, to move the substantive changes up
> front (and because there _isn't_ a correct style for some of the things
> it was doing). But it just got too painful to do the "don't chdir" patch
> without updating the style. I agree it might have made more sense at the
> very beginning, but I didn't think it mattered enough to go through the
> trouble of rebasing the earlier patches (which would essentially be
> rewriting them).
>
> -Peff

Right.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 13:48     ` Jeff King
@ 2016-10-04 20:44       ` Jacob Keller
  2016-10-04 20:49         ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 20:44 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 6:48 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Oct 03, 2016 at 10:57:48PM -0700, Jacob Keller wrote:
>
>> > diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
>> > index 7bc1c3c..b393613 100755
>> > --- a/t/t5613-info-alternate.sh
>> > +++ b/t/t5613-info-alternate.sh
>> > @@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
>> >         )
>> >  '
>> >
>> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
>> > +# the depth at 0 and count links, not repositories, so in a chain like:
>> > +#
>> > +#   A -> B -> C -> D -> E -> F -> G -> H
>> > +#      0    1    2    3    4    5    6
>> > +#
>>
>> Ok so we count links, but wouldn't we have 5 links when we hit F, and
>> not G? Or am I missing something here?
>
> This is what I was trying to get at with the "start the depth at 0". We
> disallow a depth greater than 5, but because we start at 0-counting,
> it's really six links. I guess saying "5 as too deep" is really the
> misleading part. It should be "5 as the maximum depth".
>
> -Peff

Right, but if A is 0, then:

B = 1
C = 2
D = 3
E = 4
F = 5
G = 6  (UhOh??)
H = 7

So do you mean that *B* = 0, and C = 1??? That is not clear from this commment.

So either way it still feels like "6" links is what is allowed? Or the
first link has to not count? That's really confusing.

Basically I G is the 7th letter, not the 6th, so even if we're
subtractnig 1 it's still 6 which is 1 too deep? That means we not only
discard 0 (the first repository) but we discount the 2nd one as well?

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 08/18] link_alt_odb_entry: refactor string handling
  2016-10-04 13:53     ` Jeff King
@ 2016-10-04 20:46       ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 20:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 6:53 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Oct 03, 2016 at 11:05:42PM -0700, Jacob Keller wrote:
>
>> This definitely makes reading the following function much easier,
>> though the diff is a bit funky. I think the end result is much
>> clearer.
>
> Yeah, it's really hard to see that all of the "ent" setup is kept,
> because it moves _and_ changes its content (from pfxlen to pathbuf.len).
>
> I actually tried to split this into two patches to make the diff easier
> to read, but there are two mutually dependent changes: moving to
> pathbuf.len everywhere requires not-freeing pathbuf in the early code
> path. But if you do that and don't move all of "is it usable" checks up,
> then you have to add a bunch of new error-handling code that would just
> get ripped out in the next patch.
>
> There's definitely _some_ of that in this series already (e.g., the
> counting logic in alt_sha1_path() added by patch 14 that just gets
> ripped out in patch 15 when fill_sha1_path() learns to use a strbuf). I
> tried to balance "show each individual obvious step" with "don't make
> people review a bunch of scaffolding that's not going to be in the final
> product".
>
> -Peff

Mostly the diff is funky because of how the diff selected which chunks
moved vs how your patch described what chunks moved.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 20:44       ` Jacob Keller
@ 2016-10-04 20:49         ` Jeff King
  2016-10-04 20:52           ` Jacob Keller
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-04 20:49 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Tue, Oct 04, 2016 at 01:44:23PM -0700, Jacob Keller wrote:

> On Tue, Oct 4, 2016 at 6:48 AM, Jeff King <peff@peff.net> wrote:
> > On Mon, Oct 03, 2016 at 10:57:48PM -0700, Jacob Keller wrote:
> >
> >> > diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> >> > index 7bc1c3c..b393613 100755
> >> > --- a/t/t5613-info-alternate.sh
> >> > +++ b/t/t5613-info-alternate.sh
> >> > @@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
> >> >         )
> >> >  '
> >> >
> >> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
> >> > +# the depth at 0 and count links, not repositories, so in a chain like:
> >> > +#
> >> > +#   A -> B -> C -> D -> E -> F -> G -> H
> >> > +#      0    1    2    3    4    5    6
> >> > +#
> >>
> >> Ok so we count links, but wouldn't we have 5 links when we hit F, and
> >> not G? Or am I missing something here?
> >
> > This is what I was trying to get at with the "start the depth at 0". We
> > disallow a depth greater than 5, but because we start at 0-counting,
> > it's really six links. I guess saying "5 as too deep" is really the
> > misleading part. It should be "5 as the maximum depth".
> >
> > -Peff
> 
> Right, but if A is 0, then:
> 
> B = 1
> C = 2
> D = 3
> E = 4
> F = 5
> G = 6  (UhOh??)
> H = 7
> 
> So do you mean that *B* = 0, and C = 1??? That is not clear from this commment.

No, we count links, not repositories. So the "A->B" link is "0", "B->C"
is "1", and so on.

> So either way it still feels like "6" links is what is allowed? Or the
> first link has to not count? That's really confusing.

Right, 6 links _are_ allowed. Because we count links, and because we
start the link-counting at "0" and allow through "5". The link labeled
"6" (which is really the seventh link!) is the one that is forbidden.

> Basically I G is the 7th letter, not the 6th, so even if we're
> subtractnig 1 it's still 6 which is 1 too deep? That means we not only
> discard 0 (the first repository) but we discount the 2nd one as well?

It's basically two off-by-ones from what you might think is correct.  I
agree it's unintuitive, but I'm just documenting what's there. We could
change it; it's not like anybody cares about the exact value except
"deep enough", but _since_ nobody cares, I preferred not to modify the
code.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 20:49         ` Jeff King
@ 2016-10-04 20:52           ` Jacob Keller
  2016-10-04 20:55             ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 20:52 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 1:49 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Oct 04, 2016 at 01:44:23PM -0700, Jacob Keller wrote:
>
>> On Tue, Oct 4, 2016 at 6:48 AM, Jeff King <peff@peff.net> wrote:
>> > On Mon, Oct 03, 2016 at 10:57:48PM -0700, Jacob Keller wrote:
>> >
>> >> > diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
>> >> > index 7bc1c3c..b393613 100755
>> >> > --- a/t/t5613-info-alternate.sh
>> >> > +++ b/t/t5613-info-alternate.sh
>> >> > @@ -39,6 +39,18 @@ test_expect_success 'preparing third repository' '
>> >> >         )
>> >> >  '
>> >> >
>> >> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
>> >> > +# the depth at 0 and count links, not repositories, so in a chain like:
>> >> > +#
>> >> > +#   A -> B -> C -> D -> E -> F -> G -> H
>> >> > +#      0    1    2    3    4    5    6
>> >> > +#
>> >>
>> >> Ok so we count links, but wouldn't we have 5 links when we hit F, and
>> >> not G? Or am I missing something here?
>> >
>> > This is what I was trying to get at with the "start the depth at 0". We
>> > disallow a depth greater than 5, but because we start at 0-counting,
>> > it's really six links. I guess saying "5 as too deep" is really the
>> > misleading part. It should be "5 as the maximum depth".
>> >
>> > -Peff
>>
>> Right, but if A is 0, then:
>>
>> B = 1
>> C = 2
>> D = 3
>> E = 4
>> F = 5
>> G = 6  (UhOh??)
>> H = 7
>>
>> So do you mean that *B* = 0, and C = 1??? That is not clear from this commment.
>
> No, we count links, not repositories. So the "A->B" link is "0", "B->C"
> is "1", and so on.
>

If you need to re-roll for some other reason I would add some spaces
around the numbers so they line up better with the links so that this
becomes more clear.

>> So either way it still feels like "6" links is what is allowed? Or the
>> first link has to not count? That's really confusing.
>
> Right, 6 links _are_ allowed. Because we count links, and because we
> start the link-counting at "0" and allow through "5". The link labeled
> "6" (which is really the seventh link!) is the one that is forbidden.

Right. Ok this makes more sense now.

>
>> Basically I G is the 7th letter, not the 6th, so even if we're
>> subtractnig 1 it's still 6 which is 1 too deep? That means we not only
>> discard 0 (the first repository) but we discount the 2nd one as well?
>
> It's basically two off-by-ones from what you might think is correct.  I
> agree it's unintuitive, but I'm just documenting what's there. We could
> change it; it's not like anybody cares about the exact value except
> "deep enough", but _since_ nobody cares, I preferred not to modify the
> code.
>

I agree I don't think changing code is necessary, I was just confused
by the comment that tried to make it clear.

Thanks,
Jake

> -Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 20:52           ` Jacob Keller
@ 2016-10-04 20:55             ` Jeff King
  2016-10-04 20:58               ` Stefan Beller
  2016-10-04 21:43               ` Jacob Keller
  0 siblings, 2 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 20:55 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Tue, Oct 04, 2016 at 01:52:19PM -0700, Jacob Keller wrote:

> >> >> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
> >> >> > +# the depth at 0 and count links, not repositories, so in a chain like:
> >> >> > +#
> >> >> > +#   A -> B -> C -> D -> E -> F -> G -> H
> >> >> > +#      0    1    2    3    4    5    6
> >> >> > +#
> [...]
> > No, we count links, not repositories. So the "A->B" link is "0", "B->C"
> > is "1", and so on.
> 
> If you need to re-roll for some other reason I would add some spaces
> around the numbers so they line up better with the links so that this
> becomes more clear.

Hmm. Now I am puzzled, because I _did_ line up them specifically to make
this clear. I put the numbers under the ">" of the arrow. Did I screw up
the spacing somehow so that isn't how they look to you? Or are you just
saying you would prefer them under the "-" of the arrow?

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 20:55             ` Jeff King
@ 2016-10-04 20:58               ` Stefan Beller
  2016-10-04 21:00                 ` Jeff King
  2016-10-05 13:58                 ` Jakub Narębski
  2016-10-04 21:43               ` Jacob Keller
  1 sibling, 2 replies; 84+ messages in thread
From: Stefan Beller @ 2016-10-04 20:58 UTC (permalink / raw)
  To: Jeff King; +Cc: Jacob Keller, Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 1:55 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Oct 04, 2016 at 01:52:19PM -0700, Jacob Keller wrote:
>
>> >> >> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
>> >> >> > +# the depth at 0 and count links, not repositories, so in a chain like:
>> >> >> > +#
>> >> >> > +#   A -> B -> C -> D -> E -> F -> G -> H
>> >> >> > +#      0    1    2    3    4    5    6
>> >> >> > +#
>> [...]
>> > No, we count links, not repositories. So the "A->B" link is "0", "B->C"
>> > is "1", and so on.
>>
>> If you need to re-roll for some other reason I would add some spaces
>> around the numbers so they line up better with the links so that this
>> becomes more clear.
>
> Hmm. Now I am puzzled, because I _did_ line up them specifically to make
> this clear. I put the numbers under the ">" of the arrow. Did I screw up
> the spacing somehow so that isn't how they look to you? Or are you just
> saying you would prefer them under the "-" of the arrow?
>
> -Peff

Input from a self-claimed design expert for ASCII art. ;)
What about this?

#   A  -0->  B  -1->  C  -2->  ...

(Double space between letter and arrow, number included in the arrow)

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 20:58               ` Stefan Beller
@ 2016-10-04 21:00                 ` Jeff King
  2016-10-05 13:58                 ` Jakub Narębski
  1 sibling, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 21:00 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Jacob Keller, Git mailing list, René Scharfe

On Tue, Oct 04, 2016 at 01:58:54PM -0700, Stefan Beller wrote:

> On Tue, Oct 4, 2016 at 1:55 PM, Jeff King <peff@peff.net> wrote:
> > On Tue, Oct 04, 2016 at 01:52:19PM -0700, Jacob Keller wrote:
> >
> >> >> >> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
> >> >> >> > +# the depth at 0 and count links, not repositories, so in a chain like:
> >> >> >> > +#
> >> >> >> > +#   A -> B -> C -> D -> E -> F -> G -> H
> >> >> >> > +#      0    1    2    3    4    5    6
> >> >> >> > +#
> >> [...]
> >> > No, we count links, not repositories. So the "A->B" link is "0", "B->C"
> >> > is "1", and so on.
> >>
> >> If you need to re-roll for some other reason I would add some spaces
> >> around the numbers so they line up better with the links so that this
> >> becomes more clear.
> >
> > Hmm. Now I am puzzled, because I _did_ line up them specifically to make
> > this clear. I put the numbers under the ">" of the arrow. Did I screw up
> > the spacing somehow so that isn't how they look to you? Or are you just
> > saying you would prefer them under the "-" of the arrow?
> >
> > -Peff
> 
> Input from a self-claimed design expert for ASCII art. ;)
> What about this?
> 
> #   A  -0->  B  -1->  C  -2->  ...
> 
> (Double space between letter and arrow, number included in the arrow)

I actually find that quite confusing, as it looks like "-1", "-2", etc.

This has got to be my favorite bikeshed discussion of all time, though. :)

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 05/18] t5613: do not chdir in main process
  2016-10-03 20:34 ` [PATCH 05/18] t5613: do not chdir in main process Jeff King
  2016-10-04  5:54   ` Jacob Keller
@ 2016-10-04 21:00   ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2016-10-04 21:00 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Jeff King <peff@peff.net> writes:

> Our usual style when working with subdirectories is to chdir
> inside a subshell or to use "git -C", which means we do not
> have to constantly return to the main test directory. Let's
> convert this old test, which does not follow that style.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5613-info-alternate.sh | 92 +++++++++++++++++------------------------------
>  1 file changed, 33 insertions(+), 59 deletions(-)

Whew.  Quite a lot of cleanups on this ancient script.  Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-10-03 20:34 ` [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors Jeff King
  2016-10-04  6:01   ` Jacob Keller
@ 2016-10-04 21:08   ` Junio C Hamano
  2016-10-05 18:47   ` René Scharfe
  2016-11-07 23:42   ` Bryan Turner
  3 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2016-10-04 21:08 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Jeff King <peff@peff.net> writes:

> +int strbuf_normalize_path(struct strbuf *src)
> +{
> +	struct strbuf dst = STRBUF_INIT;
> +
> +	strbuf_grow(&dst, src->len);
> +	if (normalize_path_copy(dst.buf, src->buf) < 0) {
> +		strbuf_release(&dst);
> +		return -1;
> +	}
> +
> +	/*
> +	 * normalize_path does not tell us the new length, so we have to
> +	 * compute it by looking for the new NUL it placed
> +	 */
> +	strbuf_setlen(&dst, strlen(dst.buf));
> +	strbuf_swap(src, &dst);
> +	strbuf_release(&dst);
> +	return 0;
> +}

Makes sense.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 08/18] link_alt_odb_entry: refactor string handling
  2016-10-03 20:34 ` [PATCH 08/18] link_alt_odb_entry: refactor string handling Jeff King
  2016-10-04  6:05   ` Jacob Keller
@ 2016-10-04 21:18   ` Junio C Hamano
  1 sibling, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2016-10-04 21:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Jeff King <peff@peff.net> writes:

> The string handling in link_alt_odb_entry() is mostly an
> artifact of the original version, which took the path as a
> ptr/len combo, and did not have a NUL-terminated string
> until we created one in the alternate_object_database
> struct.  But since 5bdf0a8 (sha1_file: normalize alt_odb
> path before comparing and storing, 2011-09-07), the first
> thing we do is put the path into a strbuf, which gives us
> some easy opportunities for cleanup.
>
> In particular:
>
>   - we call strlen(pathbuf.buf), which is silly; we can look
>     at pathbuf.len.
>
>   - even though we have a strbuf, we don't maintain its
>     "len" field when chomping extra slashes from the
>     end, and instead keep a separate "pfxlen" variable. We
>     can fix this and then drop "pfxlen" entirely.
>
>   - we don't check whether the path is usable until after we
>     allocate the new struct, making extra cleanup work for
>     ourselves. Since we have a NUL-terminated string, we can
>     bump the "is it usable" checks higher in the function.
>     While we're at it, we can move that logic to its own
>     helper, which makes the flow of link_alt_odb_entry()
>     easier to follow.

Also I find that this bit is a nice touch:

>  	/* recursively add alternates */
> -	read_info_alternates(ent->base, depth + 1);
> -
> -	ent->base[pfxlen] = '/';
> +	read_info_alternates(pathbuf.buf, depth + 1);

We used to leave ent->base[] string terminated with NUL while
calling read_info_alternates() and then added '/' after that, but
because the new code uses a separate pathbuf for the call, ent->base[]
can be prepared into the desired shape upfront.

Much easier to follow.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/18] alternates: use a separate scratch space
  2016-10-03 20:35 ` [PATCH 12/18] alternates: use a separate scratch space Jeff King
  2016-10-04  6:12   ` Jacob Keller
@ 2016-10-04 21:29   ` Junio C Hamano
  2016-10-04 21:32     ` Jeff King
  1 sibling, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2016-10-04 21:29 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Jeff King <peff@peff.net> writes:

>  extern struct alternate_object_database {
>  	struct alternate_object_database *next;
> +
>  	char *name;
> -	char base[FLEX_ARRAY]; /* more */
> +	char *scratch;
> +
> +	char path[FLEX_ARRAY];
>  } *alt_odb_list;

It is not wrong per-se, but I am a bit surprised to see that the
code keeps FLEX_ARRAY _and_ uses a separate malloc'ed area pointed
at by the scratch pointer.

Loss of "compare only up to the location 'name' points at" makes the
users of the struct that want only the directory path certainly a
lot simpler and easier to follow.

Thanks.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/18] alternates: use a separate scratch space
  2016-10-04 21:29   ` Junio C Hamano
@ 2016-10-04 21:32     ` Jeff King
  2016-10-04 21:49       ` Junio C Hamano
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-04 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe

On Tue, Oct 04, 2016 at 02:29:46PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >  extern struct alternate_object_database {
> >  	struct alternate_object_database *next;
> > +
> >  	char *name;
> > -	char base[FLEX_ARRAY]; /* more */
> > +	char *scratch;
> > +
> > +	char path[FLEX_ARRAY];
> >  } *alt_odb_list;
> 
> It is not wrong per-se, but I am a bit surprised to see that the
> code keeps FLEX_ARRAY _and_ uses a separate malloc'ed area pointed
> at by the scratch pointer.

Yeah, there's really no reason "path" could not become a non-flex
buffer. I mostly left it there out of inertia. If you have a preference,
I'm happy to change it.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/18] alternates: use fspathcmp to detect duplicates
  2016-10-03 20:36 ` [PATCH 18/18] alternates: use fspathcmp to detect duplicates Jeff King
  2016-10-04  6:51   ` Jacob Keller
@ 2016-10-04 21:42   ` Junio C Hamano
  2016-10-05  2:34   ` Aaron Schrab
  2 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2016-10-04 21:42 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Jeff King <peff@peff.net> writes:

> ... but until recently we couldn't use it for comparing against
> other alternates (because their paths were not
> NUL-terminated strings).

;-)  

I should have expected this when reading the "let's have
a separate .path field" conversion.  Nice job.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 20:55             ` Jeff King
  2016-10-04 20:58               ` Stefan Beller
@ 2016-10-04 21:43               ` Jacob Keller
  2016-10-04 21:49                 ` Jeff King
  1 sibling, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 21:43 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 1:55 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Oct 04, 2016 at 01:52:19PM -0700, Jacob Keller wrote:
>
>> >> >> > +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
>> >> >> > +# the depth at 0 and count links, not repositories, so in a chain like:
>> >> >> > +#
>> >> >> > +#   A -> B -> C -> D -> E -> F -> G -> H
>> >> >> > +#      0    1    2    3    4    5    6
>> >> >> > +#
>> [...]
>> > No, we count links, not repositories. So the "A->B" link is "0", "B->C"
>> > is "1", and so on.
>>
>> If you need to re-roll for some other reason I would add some spaces
>> around the numbers so they line up better with the links so that this
>> becomes more clear.
>
> Hmm. Now I am puzzled, because I _did_ line up them specifically to make
> this clear. I put the numbers under the ">" of the arrow. Did I screw up
> the spacing somehow so that isn't how they look to you? Or are you just
> saying you would prefer them under the "-" of the arrow?
>
> -Peff

I bet they line up in a monospace font and I just happened to be
viewing this from GMail which isn't showing it in monospace and so it
doesn't line up. Ignore me and carry on

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 13/18] fill_sha1_file: write "boring" characters
  2016-10-04  6:13   ` Jacob Keller
@ 2016-10-04 21:46     ` Junio C Hamano
  2016-10-04 21:48       ` Jeff King
  2016-10-04 21:49       ` Jacob Keller
  0 siblings, 2 replies; 84+ messages in thread
From: Junio C Hamano @ 2016-10-04 21:46 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jeff King, Git mailing list, René Scharfe

Jacob Keller <jacob.keller@gmail.com> writes:

> On Mon, Oct 3, 2016 at 1:35 PM, Jeff King <peff@peff.net> wrote:
>> This function forms a sha1 as "xx/yyyy...", but skips over
>> the slot for the slash rather than writing it, leaving it to
>> the caller to do so. It also does not bother to put in a
>> trailing NUL, even though every caller would want it (we're
>> forming a path which by definition is not a directory, so
>> the only thing to do with it is feed it to a system call).
>>
>> Let's make the lives of our callers easier by just writing
>> out the internal "/" and the NUL.
>> ...
>
> I think this makes a lot more sense than making the callers have to do this.

The cost of fill function having to do the same thing repeatedly is
negligible, so I am OK with the result, but for fairness, this was
not "make the callers do this extra thing", but was "the caller can
prepare these unchanging parts just once, and the fill function that
is repeatedly run does not have to."


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 13/18] fill_sha1_file: write "boring" characters
  2016-10-04 21:46     ` Junio C Hamano
@ 2016-10-04 21:48       ` Jeff King
  2016-10-04 21:49       ` Jacob Keller
  1 sibling, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 21:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jacob Keller, Git mailing list, René Scharfe

On Tue, Oct 04, 2016 at 02:46:44PM -0700, Junio C Hamano wrote:

> Jacob Keller <jacob.keller@gmail.com> writes:
> 
> > On Mon, Oct 3, 2016 at 1:35 PM, Jeff King <peff@peff.net> wrote:
> >> This function forms a sha1 as "xx/yyyy...", but skips over
> >> the slot for the slash rather than writing it, leaving it to
> >> the caller to do so. It also does not bother to put in a
> >> trailing NUL, even though every caller would want it (we're
> >> forming a path which by definition is not a directory, so
> >> the only thing to do with it is feed it to a system call).
> >>
> >> Let's make the lives of our callers easier by just writing
> >> out the internal "/" and the NUL.
> >> ...
> >
> > I think this makes a lot more sense than making the callers have to do this.
> 
> The cost of fill function having to do the same thing repeatedly is
> negligible, so I am OK with the result, but for fairness, this was
> not "make the callers do this extra thing", but was "the caller can
> prepare these unchanging parts just once, and the fill function that
> is repeatedly run does not have to."

Yeah, perhaps "does not bother" in the commit message is not entirely
fair. But it really does feel like quite a premature optimization to
skip the writing of one "/" in the middle of the string, especially as
it impacts the interface.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 13/18] fill_sha1_file: write "boring" characters
  2016-10-04 21:46     ` Junio C Hamano
  2016-10-04 21:48       ` Jeff King
@ 2016-10-04 21:49       ` Jacob Keller
  2016-10-05 19:35         ` Junio C Hamano
  1 sibling, 1 reply; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 21:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 2:46 PM, Junio C Hamano <gitster@pobox.com> wrote:
> Jacob Keller <jacob.keller@gmail.com> writes:
>
>> On Mon, Oct 3, 2016 at 1:35 PM, Jeff King <peff@peff.net> wrote:
>>> This function forms a sha1 as "xx/yyyy...", but skips over
>>> the slot for the slash rather than writing it, leaving it to
>>> the caller to do so. It also does not bother to put in a
>>> trailing NUL, even though every caller would want it (we're
>>> forming a path which by definition is not a directory, so
>>> the only thing to do with it is feed it to a system call).
>>>
>>> Let's make the lives of our callers easier by just writing
>>> out the internal "/" and the NUL.
>>> ...
>>
>> I think this makes a lot more sense than making the callers have to do this.
>
> The cost of fill function having to do the same thing repeatedly is
> negligible, so I am OK with the result, but for fairness, this was
> not "make the callers do this extra thing", but was "the caller can
> prepare these unchanging parts just once, and the fill function that
> is repeatedly run does not have to."
>

Sure, but it's a pretty minor optimization and I think the result is
easier to understand.

Thanks,
Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 21:43               ` Jacob Keller
@ 2016-10-04 21:49                 ` Jeff King
  2016-10-04 21:50                   ` Jacob Keller
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-10-04 21:49 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Git mailing list, René Scharfe

On Tue, Oct 04, 2016 at 02:43:24PM -0700, Jacob Keller wrote:

> > Hmm. Now I am puzzled, because I _did_ line up them specifically to make
> > this clear. I put the numbers under the ">" of the arrow. Did I screw up
> > the spacing somehow so that isn't how they look to you? Or are you just
> > saying you would prefer them under the "-" of the arrow?
>
> I bet they line up in a monospace font and I just happened to be
> viewing this from GMail which isn't showing it in monospace and so it
> doesn't line up. Ignore me and carry on

Oh, good. I was wondering if I was going crazy. :)

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/18] alternates: use a separate scratch space
  2016-10-04 21:32     ` Jeff King
@ 2016-10-04 21:49       ` Junio C Hamano
  2016-10-04 21:51         ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Junio C Hamano @ 2016-10-04 21:49 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

Jeff King <peff@peff.net> writes:

> On Tue, Oct 04, 2016 at 02:29:46PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@peff.net> writes:
>> 
>> >  extern struct alternate_object_database {
>> >  	struct alternate_object_database *next;
>> > +
>> >  	char *name;
>> > -	char base[FLEX_ARRAY]; /* more */
>> > +	char *scratch;
>> > +
>> > +	char path[FLEX_ARRAY];
>> >  } *alt_odb_list;
>> 
>> It is not wrong per-se, but I am a bit surprised to see that the
>> code keeps FLEX_ARRAY _and_ uses a separate malloc'ed area pointed
>> at by the scratch pointer.
>
> Yeah, there's really no reason "path" could not become a non-flex
> buffer. I mostly left it there out of inertia. If you have a preference,
> I'm happy to change it.

My preference, before reaching the end of the series, actually was
to overallocate just once and point with *scratch into path[] beyond
the end of the fixed "where is the object directory?" string.

Of course, that would not mesh very well with the plan this series
had after this step to use strbuf for keeping scratch ;-)  And the
end result looks fine to me.

Thanks.


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 21:49                 ` Jeff King
@ 2016-10-04 21:50                   ` Jacob Keller
  0 siblings, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-04 21:50 UTC (permalink / raw)
  To: Jeff King; +Cc: Git mailing list, René Scharfe

On Tue, Oct 4, 2016 at 2:49 PM, Jeff King <peff@peff.net> wrote:
> On Tue, Oct 04, 2016 at 02:43:24PM -0700, Jacob Keller wrote:
>
>> > Hmm. Now I am puzzled, because I _did_ line up them specifically to make
>> > this clear. I put the numbers under the ">" of the arrow. Did I screw up
>> > the spacing somehow so that isn't how they look to you? Or are you just
>> > saying you would prefer them under the "-" of the arrow?
>>
>> I bet they line up in a monospace font and I just happened to be
>> viewing this from GMail which isn't showing it in monospace and so it
>> doesn't line up. Ignore me and carry on
>
> Oh, good. I was wondering if I was going crazy. :)
>
> -Peff

Only one of us is going crazy, but I'm not sure who ;)

-Jake

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 12/18] alternates: use a separate scratch space
  2016-10-04 21:49       ` Junio C Hamano
@ 2016-10-04 21:51         ` Jeff King
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-04 21:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, René Scharfe

On Tue, Oct 04, 2016 at 02:49:45PM -0700, Junio C Hamano wrote:

> >> It is not wrong per-se, but I am a bit surprised to see that the
> >> code keeps FLEX_ARRAY _and_ uses a separate malloc'ed area pointed
> >> at by the scratch pointer.
> >
> > Yeah, there's really no reason "path" could not become a non-flex
> > buffer. I mostly left it there out of inertia. If you have a preference,
> > I'm happy to change it.
> 
> My preference, before reaching the end of the series, actually was
> to overallocate just once and point with *scratch into path[] beyond
> the end of the fixed "where is the object directory?" string.
> 
> Of course, that would not mesh very well with the plan this series
> had after this step to use strbuf for keeping scratch ;-)  And the
> end result looks fine to me.

Heh, yeah, I did not think of that (because I had the strbuf end-game in
mind the whole time). I agree that would be nicer if we were keeping the
raw buffer, if only because one could free the whole thing in one shot
(OTOH, we do not ever free these structs at all :) ).

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/18] alternates: use fspathcmp to detect duplicates
  2016-10-03 20:36 ` [PATCH 18/18] alternates: use fspathcmp to detect duplicates Jeff King
  2016-10-04  6:51   ` Jacob Keller
  2016-10-04 21:42   ` Junio C Hamano
@ 2016-10-05  2:34   ` Aaron Schrab
  2016-10-05  3:54     ` Jeff King
  2 siblings, 1 reply; 84+ messages in thread
From: Aaron Schrab @ 2016-10-05  2:34 UTC (permalink / raw)
  To: Jeff King; +Cc: git, René Scharfe

At 16:36 -0400 03 Oct 2016, Jeff King <peff@peff.net> wrote:
>On a case-insensitive filesystem, we should realize that
>"a/objects" and "A/objects" are the same path.

The current repository being on a case-insensitive filesystem doesn't 
guarantee that the alternates are as well.

On the other hand, I suspect that people who use a case-insensitive 
filesystem would be less likely to use names which differ only by case.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 18/18] alternates: use fspathcmp to detect duplicates
  2016-10-05  2:34   ` Aaron Schrab
@ 2016-10-05  3:54     ` Jeff King
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-05  3:54 UTC (permalink / raw)
  To: git, René Scharfe

On Tue, Oct 04, 2016 at 10:34:55PM -0400, Aaron Schrab wrote:

> At 16:36 -0400 03 Oct 2016, Jeff King <peff@peff.net> wrote:
> > On a case-insensitive filesystem, we should realize that
> > "a/objects" and "A/objects" are the same path.
> 
> The current repository being on a case-insensitive filesystem doesn't
> guarantee that the alternates are as well.
> 
> On the other hand, I suspect that people who use a case-insensitive
> filesystem would be less likely to use names which differ only by case.

True. I don't think we actually have enough information to make the
correct comparison (not only that, but I think that fspathcmp() can
sometimes be fooled by a path which is only partially case-insensitive
due to a case-insensitive filesystem mounted on a case-sensitive one).

Still, I think in practice this is likely to do more good than harm, as
I'd guess that being on a single filesystem is the common case.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-04 20:58               ` Stefan Beller
  2016-10-04 21:00                 ` Jeff King
@ 2016-10-05 13:58                 ` Jakub Narębski
  2016-10-05 14:40                   ` Jeff King
  1 sibling, 1 reply; 84+ messages in thread
From: Jakub Narębski @ 2016-10-05 13:58 UTC (permalink / raw)
  To: Stefan Beller, Jeff King
  Cc: Jacob Keller, Git mailing list, René Scharfe

W dniu 04.10.2016 o 22:58, Stefan Beller pisze:
> On Tue, Oct 4, 2016 at 1:55 PM, Jeff King <peff@peff.net> wrote:
>> On Tue, Oct 04, 2016 at 01:52:19PM -0700, Jacob Keller wrote:
>>
>>>>>>>> +# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
>>>>>>>> +# the depth at 0 and count links, not repositories, so in a chain like:
>>>>>>>> +#
>>>>>>>> +#   A -> B -> C -> D -> E -> F -> G -> H
>>>>>>>> +#      0    1    2    3    4    5    6
>>>>>>>> +#

> 
> Input from a self-claimed design expert for ASCII art. ;)
> What about this?
> 
> #   A  -0->  B  -1->  C  -2->  ...

I would prefer the following:

#   A --> B --> C --> D --> E --> F --> G --> H
#      0     1     2     3     4     5     6

that is, the number below the middle of the arrow
(which could have been even longer)

#   A ---> B ---> C ---> D ---> E ---> F ---> G ---> H
#      0      1      2      3      4      5      6


Let's paint this bikeshed _plaid_ ;-))))
-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 16/18] count-objects: report alternates via verbose mode
  2016-10-03 20:36 ` [PATCH 16/18] count-objects: report alternates via verbose mode Jeff King
  2016-10-04  6:46   ` Jacob Keller
@ 2016-10-05 14:23   ` Jakub Narębski
  2016-10-05 18:47   ` René Scharfe
  2 siblings, 0 replies; 84+ messages in thread
From: Jakub Narębski @ 2016-10-05 14:23 UTC (permalink / raw)
  To: Jeff King, git; +Cc: René Scharfe

W dniu 03.10.2016 o 22:36, Jeff King pisze:

> +test_expect_success 'count-objects shows the alternates' '
> +	cat >expect <<-EOF &&
> +	alternate: $(pwd)/B/.git/objects
> +	alternate: $(pwd)/A/.git/objects
> +	EOF
> +	git -C C count-objects -v >actual &&
> +	grep ^alternate: actual >actual.alternates &&
> +	test_cmp expect actual.alternates
> +'

This was bit hard to grok for me without quotes around
regular expression in grep (should it be sane_grep, BTW?):

  +	grep "^alternate:" actual >actual.alternates &&

But it might be just me... It's certainly not necessary.

-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-05 13:58                 ` Jakub Narębski
@ 2016-10-05 14:40                   ` Jeff King
  2016-10-05 16:14                     ` Junio C Hamano
  2016-10-05 16:47                     ` Jacob Keller
  0 siblings, 2 replies; 84+ messages in thread
From: Jeff King @ 2016-10-05 14:40 UTC (permalink / raw)
  To: Jakub Narębski
  Cc: Junio C Hamano, Stefan Beller, Jacob Keller, Git mailing list,
	René Scharfe

On Wed, Oct 05, 2016 at 03:58:53PM +0200, Jakub Narębski wrote:

> I would prefer the following:
> 
> #   A --> B --> C --> D --> E --> F --> G --> H
> #      0     1     2     3     4     5     6

Yeah, that is also more visually pleasing.

Here's a squashable update that uses that and clarifies the points in
the discussion with Jacob.

Junio, do you mind squashing this in to jk/alt-odb-cleanup?

diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
index b393613..62170b7 100755
--- a/t/t5613-info-alternate.sh
+++ b/t/t5613-info-alternate.sh
@@ -39,13 +39,16 @@ test_expect_success 'preparing third repository' '
 	)
 '
 
-# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
-# the depth at 0 and count links, not repositories, so in a chain like:
+# Note: These tests depend on the hard-coded value of 5 as the maximum depth
+# we will follow recursion. We start the depth at 0 and count links, not
+# repositories. This means that in a chain like:
 #
-#   A -> B -> C -> D -> E -> F -> G -> H
-#      0    1    2    3    4    5    6
+#   A --> B --> C --> D --> E --> F --> G --> H
+#      0     1     2     3     4     5     6
 #
-# we are OK at "G", but break at "H".
+# we are OK at "G", but break at "H", even though "H" is actually the 8th
+# repository, not the 6th, which you might expect. Counting the links allows
+# N+1 repositories, and counting from 0 to 5 inclusive allows 6 links.
 #
 # Note also that we must use "--bare -l" to make the link to H. The "-l"
 # ensures we do not do a connectivity check, and the "--bare" makes sure
@@ -59,11 +62,11 @@ test_expect_success 'creating too deep nesting' '
 	git clone --bare -l -s G H
 '
 
-test_expect_success 'validity of fifth-deep repository' '
+test_expect_success 'validity of seventh repository' '
 	git -C G fsck
 '
 
-test_expect_success 'invalidity of sixth-deep repository' '
+test_expect_success 'invalidity of eighth repository' '
 	test_must_fail git -C H fsck
 '
 

^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-05 14:40                   ` Jeff King
@ 2016-10-05 16:14                     ` Junio C Hamano
  2016-10-05 16:47                     ` Jacob Keller
  1 sibling, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2016-10-05 16:14 UTC (permalink / raw)
  To: Jeff King
  Cc: Jakub Narębski, Stefan Beller, Jacob Keller,
	Git mailing list, René Scharfe

Jeff King <peff@peff.net> writes:

> On Wed, Oct 05, 2016 at 03:58:53PM +0200, Jakub Narębski wrote:
>
>> I would prefer the following:
>> 
>> #   A --> B --> C --> D --> E --> F --> G --> H
>> #      0     1     2     3     4     5     6
>
> Yeah, that is also more visually pleasing.
>
> Here's a squashable update that uses that and clarifies the points in
> the discussion with Jacob.
>
> Junio, do you mind squashing this in to jk/alt-odb-cleanup?

No, I don't.

> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index b393613..62170b7 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -39,13 +39,16 @@ test_expect_success 'preparing third repository' '
>  	)
>  '
>  
> -# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
> -# the depth at 0 and count links, not repositories, so in a chain like:
> +# Note: These tests depend on the hard-coded value of 5 as the maximum depth
> +# we will follow recursion. We start the depth at 0 and count links, not
> +# repositories. This means that in a chain like:
>  #
> -#   A -> B -> C -> D -> E -> F -> G -> H
> -#      0    1    2    3    4    5    6
> +#   A --> B --> C --> D --> E --> F --> G --> H
> +#      0     1     2     3     4     5     6
>  #
> -# we are OK at "G", but break at "H".
> +# we are OK at "G", but break at "H", even though "H" is actually the 8th
> +# repository, not the 6th, which you might expect. Counting the links allows
> +# N+1 repositories, and counting from 0 to 5 inclusive allows 6 links.
>  #
>  # Note also that we must use "--bare -l" to make the link to H. The "-l"
>  # ensures we do not do a connectivity check, and the "--bare" makes sure
> @@ -59,11 +62,11 @@ test_expect_success 'creating too deep nesting' '
>  	git clone --bare -l -s G H
>  '
>  
> -test_expect_success 'validity of fifth-deep repository' '
> +test_expect_success 'validity of seventh repository' '
>  	git -C G fsck
>  '
>  
> -test_expect_success 'invalidity of sixth-deep repository' '
> +test_expect_success 'invalidity of eighth repository' '
>  	test_must_fail git -C H fsck
>  '
>  

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 06/18] t5613: clarify "too deep" recursion tests
  2016-10-05 14:40                   ` Jeff King
  2016-10-05 16:14                     ` Junio C Hamano
@ 2016-10-05 16:47                     ` Jacob Keller
  1 sibling, 0 replies; 84+ messages in thread
From: Jacob Keller @ 2016-10-05 16:47 UTC (permalink / raw)
  To: Jeff King
  Cc: Jakub Narębski, Junio C Hamano, Stefan Beller,
	Git mailing list, René Scharfe

On Wed, Oct 5, 2016 at 7:40 AM, Jeff King <peff@peff.net> wrote:
> On Wed, Oct 05, 2016 at 03:58:53PM +0200, Jakub Narębski wrote:
>
>> I would prefer the following:
>>
>> #   A --> B --> C --> D --> E --> F --> G --> H
>> #      0     1     2     3     4     5     6
>
> Yeah, that is also more visually pleasing.
>
> Here's a squashable update that uses that and clarifies the points in
> the discussion with Jacob.
>
> Junio, do you mind squashing this in to jk/alt-odb-cleanup?
>
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index b393613..62170b7 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -39,13 +39,16 @@ test_expect_success 'preparing third repository' '
>         )
>  '
>
> -# Note: These tests depend on the hard-coded value of 5 as "too deep". We start
> -# the depth at 0 and count links, not repositories, so in a chain like:
> +# Note: These tests depend on the hard-coded value of 5 as the maximum depth
> +# we will follow recursion. We start the depth at 0 and count links, not
> +# repositories. This means that in a chain like:
>  #
> -#   A -> B -> C -> D -> E -> F -> G -> H
> -#      0    1    2    3    4    5    6
> +#   A --> B --> C --> D --> E --> F --> G --> H
> +#      0     1     2     3     4     5     6

Yea this looks much better (when I view it locally, gmail still looks
aweful here but...)

>  #
> -# we are OK at "G", but break at "H".
> +# we are OK at "G", but break at "H", even though "H" is actually the 8th
> +# repository, not the 6th, which you might expect. Counting the links allows
> +# N+1 repositories, and counting from 0 to 5 inclusive allows 6 links.
>  #

... This is much more clear wording that helps me understand this a
lot more. Thanks!

Regards,
Jake

>  # Note also that we must use "--bare -l" to make the link to H. The "-l"
>  # ensures we do not do a connectivity check, and the "--bare" makes sure
> @@ -59,11 +62,11 @@ test_expect_success 'creating too deep nesting' '
>         git clone --bare -l -s G H
>  '
>
> -test_expect_success 'validity of fifth-deep repository' '
> +test_expect_success 'validity of seventh repository' '
>         git -C G fsck
>  '
>
> -test_expect_success 'invalidity of sixth-deep repository' '
> +test_expect_success 'invalidity of eighth repository' '
>         test_must_fail git -C H fsck
>  '
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 0/18] alternate object database cleanups
  2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
                   ` (18 preceding siblings ...)
  2016-10-04  5:47 ` [PATCH 0/18] alternate object database cleanups Jacob Keller
@ 2016-10-05 18:47 ` René Scharfe
  19 siblings, 0 replies; 84+ messages in thread
From: René Scharfe @ 2016-10-05 18:47 UTC (permalink / raw)
  To: Jeff King, git

Am 03.10.2016 um 22:33 schrieb Jeff King:
> This series is the result of René nerd-sniping me with the claim that we
> could "easily" teach count-objects to print out the list of alternates
> in:
>
>   http://public-inbox.org/git/c27dc1a4-3c7a-2866-d9d8-f5d3eb161650@web.de/

1. Send crappy patch
2. ????
3. PROFIT!!!

Sometimes it works. :)

Thank you!
René

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 16/18] count-objects: report alternates via verbose mode
  2016-10-03 20:36 ` [PATCH 16/18] count-objects: report alternates via verbose mode Jeff King
  2016-10-04  6:46   ` Jacob Keller
  2016-10-05 14:23   ` Jakub Narębski
@ 2016-10-05 18:47   ` René Scharfe
  2 siblings, 0 replies; 84+ messages in thread
From: René Scharfe @ 2016-10-05 18:47 UTC (permalink / raw)
  To: Jeff King, git

Am 03.10.2016 um 22:36 schrieb Jeff King:
> There's no way to get the list of alternates that git
> computes internally; our tests only infer it based on which
> objects are available. In addition to testing, knowing this
> list may be helpful for somebody debugging their alternates
> setup.
>
> Let's add it to the "count-objects -v" output. We could give
> it a separate flag, but there's not really any need.
> "count-objects -v" is already a debugging catch-all for the
> object database, its output is easily extensible to new data
> items, and printing the alternates is not expensive (we
> already had to find them to count the objects).

Good idea.

> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  Documentation/git-count-objects.txt |  5 +++++
>  builtin/count-objects.c             | 10 ++++++++++
>  t/t5613-info-alternate.sh           | 10 ++++++++++
>  3 files changed, 25 insertions(+)
>
> diff --git a/Documentation/git-count-objects.txt b/Documentation/git-count-objects.txt
> index 2ff3568..cb9b4d2 100644
> --- a/Documentation/git-count-objects.txt
> +++ b/Documentation/git-count-objects.txt
> @@ -38,6 +38,11 @@ objects nor valid packs
>  +
>  size-garbage: disk space consumed by garbage files, in KiB (unless -H is
>  specified)
> ++
> +alternate: absolute path of alternate object databases; may appear
> +multiple times, one line per path. Note that if the path contains
> +non-printable characters, it may be surrounded by double-quotes and
> +contain C-style backslashed escape sequences.
>
>  -H::
>  --human-readable::
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index ba92919..a700409 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -8,6 +8,7 @@
>  #include "dir.h"
>  #include "builtin.h"
>  #include "parse-options.h"
> +#include "quote.h"
>
>  static unsigned long garbage;
>  static off_t size_garbage;
> @@ -73,6 +74,14 @@ static int count_cruft(const char *basename, const char *path, void *data)
>  	return 0;
>  }
>
> +static int print_alternate(struct alternate_object_database *alt, void *data)
> +{
> +	printf("alternate: ");
> +	quote_c_style(alt->path, NULL, stdout, 0);
> +	putchar('\n');
> +	return 0;
> +}

Yeah, quoting paths makes sense.

> +
>  static char const * const count_objects_usage[] = {
>  	N_("git count-objects [-v] [-H | --human-readable]"),
>  	NULL
> @@ -140,6 +149,7 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix)
>  		printf("prune-packable: %lu\n", packed_loose);
>  		printf("garbage: %lu\n", garbage);
>  		printf("size-garbage: %s\n", garbage_buf.buf);
> +		foreach_alt_odb(print_alternate, NULL);
>  		strbuf_release(&loose_buf);
>  		strbuf_release(&pack_buf);
>  		strbuf_release(&garbage_buf);
> diff --git a/t/t5613-info-alternate.sh b/t/t5613-info-alternate.sh
> index b393613..74f6770 100755
> --- a/t/t5613-info-alternate.sh
> +++ b/t/t5613-info-alternate.sh
> @@ -39,6 +39,16 @@ test_expect_success 'preparing third repository' '
>  	)
>  '
>
> +test_expect_success 'count-objects shows the alternates' '
> +	cat >expect <<-EOF &&
> +	alternate: $(pwd)/B/.git/objects
> +	alternate: $(pwd)/A/.git/objects
> +	EOF
> +	git -C C count-objects -v >actual &&
> +	grep ^alternate: actual >actual.alternates &&
> +	test_cmp expect actual.alternates
> +'
> +
>  # Note: These tests depend on the hard-coded value of 5 as "too deep". We start
>  # the depth at 0 and count links, not repositories, so in a chain like:
>  #
>

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-10-03 20:34 ` [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors Jeff King
  2016-10-04  6:01   ` Jacob Keller
  2016-10-04 21:08   ` Junio C Hamano
@ 2016-10-05 18:47   ` René Scharfe
  2016-10-05 19:04     ` Jeff King
  2016-11-07 23:42   ` Bryan Turner
  3 siblings, 1 reply; 84+ messages in thread
From: René Scharfe @ 2016-10-05 18:47 UTC (permalink / raw)
  To: Jeff King, git

Am 03.10.2016 um 22:34 schrieb Jeff King:
> When we add a new alternate to the list, we try to normalize
> out any redundant "..", etc. However, we do not look at the
> return value of normalize_path_copy(), and will happily
> continue with a path that could not be normalized. Worse,
> the normalizing process is done in-place, so we are left
> with whatever half-finished working state the normalizing
> function was in.
>
> Fortunately, this cannot cause us to read past the end of
> our buffer, as that working state will always leave the
> NUL from the original path in place. And we do tend to
> notice problems when we check is_directory() on the path.
> But you can see the nonsense that we feed to is_directory
> with an entry like:
>
>   this/../../is/../../way/../../too/../../deep/../../to/../../resolve
>
> in your objects/info/alternates, which yields:
>
>   error: object directory
>   /to/e/deep/too/way//ects/this/../../is/../../way/../../too/../../deep/../../to/../../resolve
>   does not exist; check .git/objects/info/alternates.
>
> We can easily fix this just by checking the return value.
> But that makes it hard to generate a good error message,
> since we're normalizing in-place and our input value has
> been overwritten by cruft.
>
> Instead, let's provide a strbuf helper that does an in-place
> normalize, but restores the original contents on error. This
> uses a second buffer under the hood, which is slightly less
> efficient, but this is not a performance-critical code path.

Hmm, in-place functions are quite rare in the strbuf collection.  It 
looks like a good fit for the two callers and makes sense in general, 
though.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-10-05 18:47   ` René Scharfe
@ 2016-10-05 19:04     ` Jeff King
  0 siblings, 0 replies; 84+ messages in thread
From: Jeff King @ 2016-10-05 19:04 UTC (permalink / raw)
  To: René Scharfe; +Cc: git

On Wed, Oct 05, 2016 at 08:47:29PM +0200, René Scharfe wrote:

> > Instead, let's provide a strbuf helper that does an in-place
> > normalize, but restores the original contents on error. This
> > uses a second buffer under the hood, which is slightly less
> > efficient, but this is not a performance-critical code path.
> 
> Hmm, in-place functions are quite rare in the strbuf collection.  It looks
> like a good fit for the two callers and makes sense in general, though.

Yeah, I almost wrote "strbuf_add_normalized_path()" instead. But then
the callers end up having to do the allocate-and-swap thing themselves.
And I think we're still set in the future to add that if somebody wants
it (and we can then implement the in-place version in terms of it).

Another alternative is to observe that the strbuf is generally used in
the first place to make the path absolute. So another interface is
perhaps something like:

  strbuf_add_path(struct strbuf *sb, const char *path,
                  const char *relative_base)
  {
        struct strbuf scratch = STRBUF_INIT;
        int ret;

        if (is_absolute_path(path))
                strbuf_grow(sb, strlen(path));
        else {
                if (relative_path)
                        strbuf_addstr(&scratch, path);
                else {
                        if (strbuf_getcwd(&scratch))
                                return -1;
                }
                strbuf_addch(&scratch, '/');
                strbuf_addstr(&scratch, path);

                strbuf_grow(sb, scratch.len);
                path = scratch.buf;
        }

        ret = normalize_path_copy(sb.buf + sb.len, path);
        strbuf_release(&scratch);
        return ret;
  }

I don't think its worth the complexity of interface for the spots in
this series, but maybe there are other places that could use it. I'll
leave that to somebody else to explore if the ywant to.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 13/18] fill_sha1_file: write "boring" characters
  2016-10-04 21:49       ` Jacob Keller
@ 2016-10-05 19:35         ` Junio C Hamano
  0 siblings, 0 replies; 84+ messages in thread
From: Junio C Hamano @ 2016-10-05 19:35 UTC (permalink / raw)
  To: Jacob Keller; +Cc: Jeff King, Git mailing list, René Scharfe

Jacob Keller <jacob.keller@gmail.com> writes:

>> The cost of fill function having to do the same thing repeatedly is
>> negligible, so I am OK with the result, but for fairness, this was
>> not "make the callers do this extra thing", but was "the caller can
>> prepare these unchanging parts just once, and the fill function that
>> is repeatedly run does not have to."
>
> Sure, but it's a pretty minor optimization and I think the result is
> easier to understand.

Yes; in case it wasn't clear, my comment was merely for fairness to
the original code.  I do agree that the end result of this series
makes a very pleasant read.

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-10-03 20:34 ` [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors Jeff King
                     ` (2 preceding siblings ...)
  2016-10-05 18:47   ` René Scharfe
@ 2016-11-07 23:42   ` Bryan Turner
  2016-11-08  0:30     ` Jeff King
  3 siblings, 1 reply; 84+ messages in thread
From: Bryan Turner @ 2016-11-07 23:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Users, René Scharfe

On Mon, Oct 3, 2016 at 1:34 PM, Jeff King <peff@peff.net> wrote:
> When we add a new alternate to the list, we try to normalize
> out any redundant "..", etc. However, we do not look at the
> return value of normalize_path_copy(), and will happily
> continue with a path that could not be normalized. Worse,
> the normalizing process is done in-place, so we are left
> with whatever half-finished working state the normalizing
> function was in.
>

<snip>

> @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
>         }
>
>         strbuf_add_absolute_path(&objdirbuf, get_object_directory());
> -       normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
> +       if (strbuf_normalize_path(&objdirbuf) < 0)
> +               die("unable to normalize object directory: %s",
> +                   objdirbuf.buf);

This appears to break the ability to use a relative alternate via an
environment variable, since normalize_path_copy_len is explicitly
documented "Returns failure (non-zero) if a ".." component appears as
first path"

For example, when trying to run a rev-list over commits in two
repositories using GIT_ALTERNATE_OBJECT_DIRECTORIES, in 2.10.x and
prior the following command works. I know the alternate worked
previously because I'm passing a commit that does not exist in the
repository I'm running the command in; it only exists in a repository
linked by alternate, as shown by the "fatal: bad object" when the
alternates are rejected.

Before, using Git 2.7.4 (but I've verified this behavior through to
and including 2.10.2):

bturner@elysoun /tmp/1478561282706-0/shared/data/repositories/3 $
GIT_ALTERNATE_OBJECT_DIRECTORIES=../0/objects:../1/objects git
rev-list --format="%H" 2d8897c9ac29ce42c3442cf80ac977057045e7f6
74de5497dfca9731e455d60552f9a8906e5dc1ac
^6053a1eaa1c009dd11092d09a72f3c41af1b59ad
^017caf31eca7c46eb3d1800fcac431cfa7147a01 --
commit 74de5497dfca9731e455d60552f9a8906e5dc1ac
74de5497dfca9731e455d60552f9a8906e5dc1ac
commit 3528cf690cb37f6adb85b7bd40cc7a6118d4b598
3528cf690cb37f6adb85b7bd40cc7a6118d4b598
commit 2d8897c9ac29ce42c3442cf80ac977057045e7f6
2d8897c9ac29ce42c3442cf80ac977057045e7f6
commit 9c05f43f859375e392d90d23a13717c16d0fdcda
9c05f43f859375e392d90d23a13717c16d0fdcda

Now, using Git 2.11.0-rc0

bturner@elysoun /tmp/1478561282706-0/shared/data/repositories/3 $
GIT_ALTERNATE_OBJECT_DIRECTORIES=../0/objects:../1/objects
/opt/git/2.11.0-rc0/bin/git rev-list --format="%H"
2d8897c9ac29ce42c3442cf80ac977057045e7f6
74de5497dfca9731e455d60552f9a8906e5dc1ac
^6053a1eaa1c009dd11092d09a72f3c41af1b59ad
^017caf31eca7c46eb3d1800fcac431cfa7147a01 --
error: unable to normalize alternate object path: ../0/objects
error: unable to normalize alternate object path: ../1/objects
fatal: bad object 74de5497dfca9731e455d60552f9a8906e5dc1ac

Other commits, like [1], suggest the ability to use relative paths in
alternates is something still actively developed and enhanced. Is it
intentional that this breaks the ability to use relative alternates?
If this is to be the "new normal", is there any other option when
using environment variables besides using absolute paths?

Best regards,
Bryan Turner

[1]: https://github.com/git/git/commit/087b6d584062f5b704356286d6445bcc84d686fb
-- Also newly tagged in 2.11.0-rc0

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-11-07 23:42   ` Bryan Turner
@ 2016-11-08  0:30     ` Jeff King
  2016-11-08  1:12       ` Bryan Turner
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-11-08  0:30 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Git Users, René Scharfe

On Mon, Nov 07, 2016 at 03:42:35PM -0800, Bryan Turner wrote:

> > @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
> >         }
> >
> >         strbuf_add_absolute_path(&objdirbuf, get_object_directory());
> > -       normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
> > +       if (strbuf_normalize_path(&objdirbuf) < 0)
> > +               die("unable to normalize object directory: %s",
> > +                   objdirbuf.buf);
> 
> This appears to break the ability to use a relative alternate via an
> environment variable, since normalize_path_copy_len is explicitly
> documented "Returns failure (non-zero) if a ".." component appears as
> first path"

That shouldn't happen, though, because the path we are normalizing has
been converted to an absolute path via strbuf_add_absolute_path. IOW, if
your relative path is "../../../foo", we should be feeding something
like "/path/to/repo/.git/objects/../../../foo" and normalizing that to
"/path/to/foo".

But in your example, you see:

  error: unable to normalize alternate object path: ../0/objects

which cannot come from the code above, which calls die(). It should be
coming from the call in link_alt_odb_entry().

I think what is happening is that relative paths via environment
variables have always been slightly broken, but happened to mostly work.
In prepare_alt_odb(), we call link_alt_odb_entries() with a NULL
relative_base. That function does two things with it:

  - it may unconditionally dereference it for an error message, which
    would cause a segfault. This is impossible to trigger in practice,
    though, because the error message is related to the depth, which we
    know will always be 0 here.

  - we pass the NULL along to the singular link_alt_odb_entry().
    That function only creates an absolute path if given a non-NULL
    relative_base; otherwise we have always fed the path to
    normalize_path_copy, which is nonsense for a relative path.

    So normalize_path_copy() was _always_ returning an error there, but
    we ignored it and used whatever happened to be left in the buffer
    anyway. And because of the way normalize_path_copy() is implemented,
    that happened to be the untouched original string in most cases. But
    that's mostly an accident. I think it would not be for something
    like "foo/../../bar", which is technically valid (if done from a
    relative base that has at least one path component).

    Moreover, it means we don't have an absolute path to our alternate
    odb. So the path is taken as relative whenever we do an object
    lookup, meaning it will behave differently between a bare repository
    (where we chdir to $GIT_DIR) and one with a working tree (where we
    are generally in the root of the working tree). It can even behave
    differently in the same process if we chdir between object lookups.

So it did happen to work, but I'm not sure it was planned (and obviously
we have no test coverage for it). More on that below.

> Other commits, like [1], suggest the ability to use relative paths in
> alternates is something still actively developed and enhanced. Is it
> intentional that this breaks the ability to use relative alternates?
> If this is to be the "new normal", is there any other option when
> using environment variables besides using absolute paths?

No, I had no intention of disallowing relative alternates (and as you
noticed, a commit from the same series actually expands the use of
relative alternates). My use has been entirely within info/alternates
files, though, not via the environment.

As I said, I'm not sure this was ever meant to work, but as far as I can
tell it mostly _has_ worked, modulo some quirks. So I think we should
consider it a regression for it to stop working in v2.11.

The obvious solution is one of:

  1. Stop calling normalize() at all when we do not have a relative base
     and the path is not absolute. This restores the original quirky
     behavior (plus makes the "foo/../../bar" case work).

     If we want to do the minimum before releasing v2.11, it would be
     that. I'm not sure it leaves things in a very sane state, but at
     least v2.11 does no harm, and anybody who cares can build saner
     semantics for v2.12.

  2. Fix it for real. Pass a real relative_base when linking from the
     environment. The question is: what is the correct relative base? I
     suppose "getcwd() at the time we prepare the alt odb" is
     reasonable, and would behave similarly to older versions ($GIT_DIR
     for bare repos, top of the working tree otherwise).

     If we were designing from scratch, I think saner semantics would
     probably be always relative from $GIT_DIR, or even always relative
     from the object directory (i.e., behave as if the paths were given
     in objects/info/alternates). But that breaks compatibility with
     older versions. If we are treating this as a regression, it is not
     very friendly to say "you are still broken, but you might just need
     to add an extra '..' to your path".

So I dunno. I guess that inclines me towards (1), as it lets us punt on
the harder question.

-Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-11-08  0:30     ` Jeff King
@ 2016-11-08  1:12       ` Bryan Turner
  2016-11-08  5:33         ` Jeff King
  0 siblings, 1 reply; 84+ messages in thread
From: Bryan Turner @ 2016-11-08  1:12 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Users, René Scharfe

On Mon, Nov 7, 2016 at 4:30 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Nov 07, 2016 at 03:42:35PM -0800, Bryan Turner wrote:
>
>> > @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int len, int sep,
>> >         }
>> >
>> >         strbuf_add_absolute_path(&objdirbuf, get_object_directory());
>> > -       normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
>> > +       if (strbuf_normalize_path(&objdirbuf) < 0)
>> > +               die("unable to normalize object directory: %s",
>> > +                   objdirbuf.buf);
>>
>> This appears to break the ability to use a relative alternate via an
>> environment variable, since normalize_path_copy_len is explicitly
>> documented "Returns failure (non-zero) if a ".." component appears as
>> first path"
>
> That shouldn't happen, though, because the path we are normalizing has
> been converted to an absolute path via strbuf_add_absolute_path. IOW, if
> your relative path is "../../../foo", we should be feeding something
> like "/path/to/repo/.git/objects/../../../foo" and normalizing that to
> "/path/to/foo".
>
> But in your example, you see:
>
>   error: unable to normalize alternate object path: ../0/objects
>
> which cannot come from the code above, which calls die(). It should be
> coming from the call in link_alt_odb_entry().

Ah, of course. I should have looked more closely at the call.

<snip>

> No, I had no intention of disallowing relative alternates (and as you
> noticed, a commit from the same series actually expands the use of
> relative alternates). My use has been entirely within info/alternates
> files, though, not via the environment.
>
> As I said, I'm not sure this was ever meant to work, but as far as I can
> tell it mostly _has_ worked, modulo some quirks. So I think we should
> consider it a regression for it to stop working in v2.11.
>
> The obvious solution is one of:
>
>   1. Stop calling normalize() at all when we do not have a relative base
>      and the path is not absolute. This restores the original quirky
>      behavior (plus makes the "foo/../../bar" case work).
>
>      If we want to do the minimum before releasing v2.11, it would be
>      that. I'm not sure it leaves things in a very sane state, but at
>      least v2.11 does no harm, and anybody who cares can build saner
>      semantics for v2.12.
>
>   2. Fix it for real. Pass a real relative_base when linking from the
>      environment. The question is: what is the correct relative base? I
>      suppose "getcwd() at the time we prepare the alt odb" is
>      reasonable, and would behave similarly to older versions ($GIT_DIR
>      for bare repos, top of the working tree otherwise).
>
>      If we were designing from scratch, I think saner semantics would
>      probably be always relative from $GIT_DIR, or even always relative
>      from the object directory (i.e., behave as if the paths were given
>      in objects/info/alternates). But that breaks compatibility with
>      older versions. If we are treating this as a regression, it is not
>      very friendly to say "you are still broken, but you might just need
>      to add an extra '..' to your path".
>
> So I dunno. I guess that inclines me towards (1), as it lets us punt on
> the harder question.

Is there anything I can do to help? I'm happy to test out changes.
I've got a set of ~1,040 tests that verify all sorts of different Git
behaviors (those tests flagged this change, for example, and found a
regression in git diff-tree in 2.0.2/2.0.3, among other things). I run
them on the "newest" patch release for every feature-bearing line of
Git from 1.8.x up to 2.10 (currently 1.8.0.3, 1.8.1.5, 1.8.2.3,
1.8.3.4, 1.8.4.5, 1.8.5.6, 1.9.5, 2.0.5, 2.1.4, 2.2.3, 2.3.10, 2.4.11,
2.5.5, 2.6.6, 2.7.4, 2.8.4, 2.9.3 and 2.10.2), and I add in RCs of new
as soon as they become available. (I also test Git for Windows; at the
moment I've got 1.8.0, 1.8.1.2, 1.8.3, 1.8.4, 1.8.5.2 and 1.9.5.1 from
msysgit and 2.3.7.1, 2.4.6, 2.5.3, 2.6.4, 2.7.4, 2.8.4, 2.9.3 and
2.10.2 from Git for Windows. 2.11.0-rc0 on Windows passes my test
suite; it looks like it's not tagging the same git/git commit as
v2.11.0-rc0 is.) I wish there was an easy way for me to open this up.
At the moment, it's something I can really only run in-house, as it
were.

At the moment I have limited ability to actually try to submit patches
myself. I really need to sit down and setup a working development
environment for Git. (My current dream, if I could get such an
environment running, is to follow up on your git blame-tree work.

>
> -Peff

^ permalink raw reply	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-11-08  1:12       ` Bryan Turner
@ 2016-11-08  5:33         ` Jeff King
  2016-11-08 19:27           ` Bryan Turner
  0 siblings, 1 reply; 84+ messages in thread
From: Jeff King @ 2016-11-08  5:33 UTC (permalink / raw)
  To: Bryan Turner; +Cc: Git Users, René Scharfe

On Mon, Nov 07, 2016 at 05:12:43PM -0800, Bryan Turner wrote:

> > The obvious solution is one of:
> >
> >   1. Stop calling normalize() at all when we do not have a relative base
> >      and the path is not absolute. This restores the original quirky
> >      behavior (plus makes the "foo/../../bar" case work).

Actually, I think we want to keep normalizing, as it is possible for
relative paths to normalize correctly (e.g., "foo/../bar"). We just need
to ignore the error, which is easy.

The patch is below, and is the absolute minimum I think we'd need for
v2.11.

Beyond that, we could go further:

  a. Actually make a real absolute path based on getcwd(), which would
     protect against later chdir() calls, and possibly help with
     duplicate suppression. I'm not sure there are actually any
     triggerable bugs here, so I went for the minimal fix.

  b. Pick a more sane base for the absolute path, like $GIT_DIR. If we
     did so, then people using relative paths in
     GIT_ALTERNATE_OBJECT_DIRECTORIES from a bare repo would continue to
     work, and people in non-bare repositories would have to add an
     extra ".." to most of their paths. So a slight regression, but
     saner overall semantics.

     Making it relative to the object directory ($GIT_DIR/objects, or
     even whatever is in $GIT_OBJECT_DIRECTORY) makes even more sense
     to me, but would regress even the bare case (and would probably be
     "interesting" along with the tmp-objdir stuff, which sets
     $GIT_OBJECT_DIRECTORY on the fly, as that would invalidate
     $GIT_ALTERNATE_OBJECT_DIRECTORIES).

I'm inclined to leave those to anybody interested post-v2.11 (or never,
if nobody cares). But it would be pretty trivial to do (a) as part of
this initial fix if anybody feels strongly.

> Is there anything I can do to help? I'm happy to test out changes.

The patch at the end of his mail obviously passes the newly-added tests
for me, but please confirm that it fixes your test suite.

I gather your suite is about noticing behavior changes between different
versions. For cases where we know there is an obvious right behavior, it
would be nice if you could contribute them as patches to git's test
suite. This case was overlooked because there was no test coverage at
all.

Barring that, running your suite and giving easily-reproducible problem
reports is valuable. The earlier the better. So I am happy to see this
on -rc0, and not on the final release. Periodically running it on
"master" during the development cycle would have caught it even sooner.

> At the moment I have limited ability to actually try to submit patches
> myself. I really need to sit down and setup a working development
> environment for Git. (My current dream, if I could get such an
> environment running, is to follow up on your git blame-tree work.

I would be happy for somebody to pick that up, too. It has been powering
GitHub's tree-view for several years now, but I know there are some
rough edges as well as opportunities to optimize it.

-- >8 --
Subject: [PATCH] alternates: re-allow relative paths from environment

Commit 670c359da (link_alt_odb_entry: handle normalize_path
errors, 2016-10-03) regressed the handling of relative paths
in the GIT_ALTERNATE_OBJECT_DIRECTORIES variable. It's not
entirely clear this was ever meant to work, but it _has_
worked for several years, so this commit restores the
original behavior.

When we get a path in GIT_ALTERNATE_OBJECT_DIRECTORIES, we
add it the path to the list of alternate object directories
as if it were found in objects/info/alternates, but with one
difference: we do not provide the link_alt_odb_entry()
function with a base for relative paths. That function
doesn't turn it into an absolute path, and we end up feeding
the relative path to the strbuf_normalize_path() function.

Most relative paths break out of the top-level directory
(e.g., "../foo.git/objects"), and thus normalizing fails.
Prior to 670c359da, we simply ignored the error, and due to
the way normalize_path_copy() was implemented it happened to
return the original path in this case. We then accessed the
alternate objects using this relative path.

By storing the relative path in the alt_odb list, the path
is relative to wherever we happen to be at the time we do an
object lookup. That means we look from $GIT_DIR in a bare
repository, and from the top of the worktree in a non-bare
repository.

If this were being designed from scratch, it would make
sense to pick a stable location (probably $GIT_DIR, or even
the object directory) and use that as the relative base,
turning the result into an absolute path.  However, given
the history, at this point the minimal fix is to match the
pre-670c359da behavior.

We can do this simply by ignoring the error when we have no
relative base and using the original value (which we now
reliably have, thanks to strbuf_normalize_path()).

That still leaves us with a relative path that foils our
duplicate detection, and may act strangely if we ever
chdir() later in the process. We could solve that by storing
an absolute path based on getcwd(). That may be a good
future direction; for now we'll do just the minimum to fix
the regression.

The new t5615 script demonstrates the fix in its final three
tests. Since we didn't have any tests of the alternates
environment variable at all, it also adds some tests of
absolute paths.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c              |  2 +-
 t/t5615-alternate-env.sh | 71 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 72 insertions(+), 1 deletion(-)
 create mode 100755 t/t5615-alternate-env.sh

diff --git a/sha1_file.c b/sha1_file.c
index 5457314e6..9c86d1924 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -296,7 +296,7 @@ static int link_alt_odb_entry(const char *entry, const char *relative_base,
 	}
 	strbuf_addstr(&pathbuf, entry);
 
-	if (strbuf_normalize_path(&pathbuf) < 0) {
+	if (strbuf_normalize_path(&pathbuf) < 0 && relative_base) {
 		error("unable to normalize alternate object path: %s",
 		      pathbuf.buf);
 		strbuf_release(&pathbuf);
diff --git a/t/t5615-alternate-env.sh b/t/t5615-alternate-env.sh
new file mode 100755
index 000000000..22d9d8178
--- /dev/null
+++ b/t/t5615-alternate-env.sh
@@ -0,0 +1,71 @@
+#!/bin/sh
+
+test_description='handling of alternates in environment variables'
+. ./test-lib.sh
+
+check_obj () {
+	alt=$1; shift
+	while read obj expect
+	do
+		echo "$obj" >&3 &&
+		echo "$obj $expect" >&4
+	done 3>input 4>expect &&
+	GIT_ALTERNATE_OBJECT_DIRECTORIES=$alt \
+		git "$@" cat-file --batch-check='%(objectname) %(objecttype)' \
+		<input >actual &&
+	test_cmp expect actual
+}
+
+test_expect_success 'create alternate repositories' '
+	git init --bare one.git &&
+	one=$(echo one | git -C one.git hash-object -w --stdin) &&
+	git init --bare two.git &&
+	two=$(echo two | git -C two.git hash-object -w --stdin)
+'
+
+test_expect_success 'objects inaccessible without alternates' '
+	check_obj "" <<-EOF
+	$one missing
+	$two missing
+	EOF
+'
+
+test_expect_success 'access alternate via absolute path' '
+	check_obj "$(pwd)/one.git/objects" <<-EOF
+	$one blob
+	$two missing
+	EOF
+'
+
+test_expect_success 'access multiple alternates' '
+	check_obj "$(pwd)/one.git/objects:$(pwd)/two.git/objects" <<-EOF
+	$one blob
+	$two blob
+	EOF
+'
+
+# bare paths are relative from $GIT_DIR
+test_expect_success 'access alternate via relative path (bare)' '
+	git init --bare bare.git &&
+	check_obj "../one.git/objects" -C bare.git <<-EOF
+	$one blob
+	EOF
+'
+
+# non-bare paths are relative to top of worktree
+test_expect_success 'access alternate via relative path (worktree)' '
+	git init worktree &&
+	check_obj "../one.git/objects" -C worktree <<-EOF
+	$one blob
+	EOF
+'
+
+# path is computed after moving to top-level of worktree
+test_expect_success 'access alternate via relative path (subdir)' '
+	mkdir subdir &&
+	check_obj "one.git/objects" -C subdir <<-EOF
+	$one blob
+	EOF
+'
+
+test_done
-- 
2.11.0.rc0.263.g6f44bc3


^ permalink raw reply related	[flat|nested] 84+ messages in thread

* Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors
  2016-11-08  5:33         ` Jeff King
@ 2016-11-08 19:27           ` Bryan Turner
  0 siblings, 0 replies; 84+ messages in thread
From: Bryan Turner @ 2016-11-08 19:27 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Users, René Scharfe

>
>> Is there anything I can do to help? I'm happy to test out changes.
>
> The patch at the end of his mail obviously passes the newly-added tests
> for me, but please confirm that it fixes your test suite.
>
> I gather your suite is about noticing behavior changes between different
> versions. For cases where we know there is an obvious right behavior, it
> would be nice if you could contribute them as patches to git's test
> suite. This case was overlooked because there was no test coverage at
> all.
>
> Barring that, running your suite and giving easily-reproducible problem
> reports is valuable. The earlier the better. So I am happy to see this
> on -rc0, and not on the final release. Periodically running it on
> "master" during the development cycle would have caught it even sooner.

I've applied your patch to the tip of the 2.11.0-rc0 tag (just to make
sure I don't accidentally pick up anything else on master; I'll test
that separately) and my full test suite passes without issue.

I'm going to investigate whether I can setup a version of this build
that runs "periodically" (I'm not sure what that period will be)
against git/git master. I've got a lot of the infrastructure in place,
but I'm going to need to automate a few things to make it really
viable.

As for contributing extensions to the test suite, that's a good idea.
I need to fast track getting a development environment setup.

^ permalink raw reply	[flat|nested] 84+ messages in thread

end of thread, other threads:[~2016-11-08 19:27 UTC | newest]

Thread overview: 84+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-03 20:33 [PATCH 0/18] alternate object database cleanups Jeff King
2016-10-03 20:33 ` [PATCH 01/18] t5613: drop reachable_via function Jeff King
2016-10-04  5:48   ` Jacob Keller
2016-10-04 13:43     ` Jeff King
2016-10-03 20:33 ` [PATCH 02/18] t5613: drop test_valid_repo function Jeff King
2016-10-04  5:50   ` Jacob Keller
2016-10-03 20:34 ` [PATCH 03/18] t5613: use test_must_fail Jeff King
2016-10-04  5:51   ` Jacob Keller
2016-10-03 20:34 ` [PATCH 04/18] t5613: whitespace/style cleanups Jeff King
2016-10-04  5:52   ` Jacob Keller
2016-10-04 13:47     ` Jeff King
2016-10-04 20:41       ` Jacob Keller
2016-10-03 20:34 ` [PATCH 05/18] t5613: do not chdir in main process Jeff King
2016-10-04  5:54   ` Jacob Keller
2016-10-04 21:00   ` Junio C Hamano
2016-10-03 20:34 ` [PATCH 06/18] t5613: clarify "too deep" recursion tests Jeff King
2016-10-04  5:57   ` Jacob Keller
2016-10-04 13:48     ` Jeff King
2016-10-04 20:44       ` Jacob Keller
2016-10-04 20:49         ` Jeff King
2016-10-04 20:52           ` Jacob Keller
2016-10-04 20:55             ` Jeff King
2016-10-04 20:58               ` Stefan Beller
2016-10-04 21:00                 ` Jeff King
2016-10-05 13:58                 ` Jakub Narębski
2016-10-05 14:40                   ` Jeff King
2016-10-05 16:14                     ` Junio C Hamano
2016-10-05 16:47                     ` Jacob Keller
2016-10-04 21:43               ` Jacob Keller
2016-10-04 21:49                 ` Jeff King
2016-10-04 21:50                   ` Jacob Keller
2016-10-03 20:34 ` [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors Jeff King
2016-10-04  6:01   ` Jacob Keller
2016-10-04 21:08   ` Junio C Hamano
2016-10-05 18:47   ` René Scharfe
2016-10-05 19:04     ` Jeff King
2016-11-07 23:42   ` Bryan Turner
2016-11-08  0:30     ` Jeff King
2016-11-08  1:12       ` Bryan Turner
2016-11-08  5:33         ` Jeff King
2016-11-08 19:27           ` Bryan Turner
2016-10-03 20:34 ` [PATCH 08/18] link_alt_odb_entry: refactor string handling Jeff King
2016-10-04  6:05   ` Jacob Keller
2016-10-04 13:53     ` Jeff King
2016-10-04 20:46       ` Jacob Keller
2016-10-04 21:18   ` Junio C Hamano
2016-10-03 20:35 ` [PATCH 09/18] alternates: provide helper for adding to alternates list Jeff King
2016-10-04  6:07   ` Jacob Keller
2016-10-03 20:35 ` [PATCH 10/18] alternates: provide helper for allocating alternate Jeff King
2016-10-04  6:09   ` Jacob Keller
2016-10-03 20:35 ` [PATCH 11/18] alternates: encapsulate alt->base munging Jeff King
2016-10-03 20:35 ` [PATCH 12/18] alternates: use a separate scratch space Jeff King
2016-10-04  6:12   ` Jacob Keller
2016-10-04 21:29   ` Junio C Hamano
2016-10-04 21:32     ` Jeff King
2016-10-04 21:49       ` Junio C Hamano
2016-10-04 21:51         ` Jeff King
2016-10-03 20:35 ` [PATCH 13/18] fill_sha1_file: write "boring" characters Jeff King
2016-10-04  6:13   ` Jacob Keller
2016-10-04 21:46     ` Junio C Hamano
2016-10-04 21:48       ` Jeff King
2016-10-04 21:49       ` Jacob Keller
2016-10-05 19:35         ` Junio C Hamano
2016-10-03 20:36 ` [PATCH 14/18] alternates: store scratch buffer as strbuf Jeff King
2016-10-03 20:36 ` [PATCH 15/18] fill_sha1_file: write into a strbuf Jeff King
2016-10-04  6:44   ` Jacob Keller
2016-10-03 20:36 ` [PATCH 16/18] count-objects: report alternates via verbose mode Jeff King
2016-10-04  6:46   ` Jacob Keller
2016-10-04 13:56     ` Jeff King
2016-10-05 14:23   ` Jakub Narębski
2016-10-05 18:47   ` René Scharfe
2016-10-03 20:36 ` [PATCH 17/18] sha1_file: always allow relative paths to alternates Jeff King
2016-10-04  6:50   ` Jacob Keller
2016-10-04 14:00     ` Jeff King
2016-10-03 20:36 ` [PATCH 18/18] alternates: use fspathcmp to detect duplicates Jeff King
2016-10-04  6:51   ` Jacob Keller
2016-10-04 14:10     ` Jeff King
2016-10-04 21:42   ` Junio C Hamano
2016-10-05  2:34   ` Aaron Schrab
2016-10-05  3:54     ` Jeff King
2016-10-04  5:47 ` [PATCH 0/18] alternate object database cleanups Jacob Keller
2016-10-04 13:41   ` Jeff King
2016-10-04 20:40     ` Jacob Keller
2016-10-05 18:47 ` René Scharfe

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).