git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/5] pack-objects: better handling of negative options
@ 2021-05-01 14:02 Jeff King
  2021-05-01 14:02 ` [PATCH 1/5] t5300: modernize basic tests Jeff King
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Jeff King @ 2021-05-01 14:02 UTC (permalink / raw)
  To: git; +Cc: Yiyuan guo

This series fixes a few problems discussed on the security list, but I
don't think the security implications are that interesting (an attacker
would need to control config or command-line options, which are already
pretty dangerous, and can mostly just cause the process to abort). But
still worth addressing.

Patches 3 and 5 are the interesting ones. The rest are just cleaning up
or improving test coverage.

  [1/5]: t5300: modernize basic tests
  [2/5]: t5300: check that we produced expected number of deltas
  [3/5]: pack-objects: clamp negative window size to 0
  [4/5]: t5316: check behavior of pack-objects --depth=0
  [5/5]: pack-objects: clamp negative depth to 0

 builtin/pack-objects.c      |   4 +
 t/t5300-pack-object.sh      | 265 +++++++++++++++---------------------
 t/t5316-pack-delta-depth.sh |  15 ++
 3 files changed, 126 insertions(+), 158 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] t5300: modernize basic tests
  2021-05-01 14:02 [PATCH 0/5] pack-objects: better handling of negative options Jeff King
@ 2021-05-01 14:02 ` Jeff King
  2021-05-03  5:27   ` Junio C Hamano
  2021-05-01 14:03 ` [PATCH 2/5] t5300: check that we produced expected number of deltas Jeff King
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-05-01 14:02 UTC (permalink / raw)
  To: git; +Cc: Yiyuan guo

The first set of tests in t5300 goes back to 2005, and doesn't use some
of our customary style and tools these days. In preparation for touching
them, let's modernize a few things:

  - titles go on the line with test_expect_success, with a hanging
    open-quote to start the test body

  - test bodies should be indented with tabs

  - opening braces for shell blocks in &&-chains go on their own line

  - no space between redirect operators and files (">foo", not "> foo")

  - avoid doing work outside of test blocks; in this case, we can stick
    the setup of ".git2" into the appropriate blocks

  - avoid modifying and then cleaning up the environment or current
    directory by using subshells and "git -C"

  - this test does a curious thing when testing the unpacking: it sets
    GIT_OBJECT_DIRECTORY, and then does a "git init" in the _original_
    directory, creating a weird mixed situation. Instead, it's much
    simpler to just "git init --bare" a new repository to unpack into,
    and check the results there. I renamed this "git2" instead of
    ".git2" to make it more clear it's a separate repo.

  - we can observe that the bodies of the no-delta, ref_delta, and
    ofs_delta cases are all virtually identical except for the pack
    creation, and factor out shared helper functions. I collapsed "do
    the unpack" and "check the results of the unpack" into a single
    test, since that makes the expected lifetime of the "git2" temporary
    directory more clear (that also lets us use test_when_finished to
    clean it up). This does make the "-v" output slightly less useful,
    but the improvement in reading the actual test code makes it worth
    it.

  - I dropped the "pwd" calls from some tests. These don't do anything
    functional, and I suspect may have been an aid for debugging when
    the script was more cavalier about leaving the working directory
    changed between tests.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5300-pack-object.sh | 243 ++++++++++++++---------------------------
 1 file changed, 85 insertions(+), 158 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 2fc5e68250..1e10c832a6 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -8,125 +8,74 @@ test_description='git pack-object
 '
 . ./test-lib.sh
 
-TRASH=$(pwd)
-
-test_expect_success \
-    'setup' \
-    'rm -f .git/index* &&
-     perl -e "print \"a\" x 4096;" > a &&
-     perl -e "print \"b\" x 4096;" > b &&
-     perl -e "print \"c\" x 4096;" > c &&
-     test-tool genrandom "seed a" 2097152 > a_big &&
-     test-tool genrandom "seed b" 2097152 > b_big &&
-     git update-index --add a a_big b b_big c &&
-     cat c >d && echo foo >>d && git update-index --add d &&
-     tree=$(git write-tree) &&
-     commit=$(git commit-tree $tree </dev/null) && {
-	 echo $tree &&
-	 echo $commit &&
-	 git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
-     } >obj-list && {
-	 git diff-tree --root -p $commit &&
-	 while read object
-	 do
-	    t=$(git cat-file -t $object) &&
-	    git cat-file $t $object || return 1
-	 done <obj-list
-     } >expect'
-
-test_expect_success \
-    'pack without delta' \
-    'packname_1=$(git pack-objects --window=0 test-1 <obj-list)'
-
-test_expect_success \
-    'pack-objects with bogus arguments' \
-    'test_must_fail git pack-objects --window=0 test-1 blah blah <obj-list'
-
-rm -fr .git2
-mkdir .git2
-
-test_expect_success \
-    'unpack without delta' \
-    "GIT_OBJECT_DIRECTORY=.git2/objects &&
-     export GIT_OBJECT_DIRECTORY &&
-     git init &&
-     git unpack-objects -n <test-1-${packname_1}.pack &&
-     git unpack-objects <test-1-${packname_1}.pack"
-
-unset GIT_OBJECT_DIRECTORY
-cd "$TRASH/.git2"
+test_expect_success 'setup' '
+	rm -f .git/index* &&
+	perl -e "print \"a\" x 4096;" >a &&
+	perl -e "print \"b\" x 4096;" >b &&
+	perl -e "print \"c\" x 4096;" >c &&
+	test-tool genrandom "seed a" 2097152 >a_big &&
+	test-tool genrandom "seed b" 2097152 >b_big &&
+	git update-index --add a a_big b b_big c &&
+	cat c >d && echo foo >>d && git update-index --add d &&
+	tree=$(git write-tree) &&
+	commit=$(git commit-tree $tree </dev/null) &&
+	{
+		echo $tree &&
+		echo $commit &&
+		git ls-tree $tree | sed -e "s/.* \\([0-9a-f]*\\)	.*/\\1/"
+	} >obj-list &&
+	{
+		git diff-tree --root -p $commit &&
+		while read object
+		do
+			t=$(git cat-file -t $object) &&
+			git cat-file $t $object || return 1
+		done <obj-list
+	} >expect
+'
 
-test_expect_success \
-    'check unpack without delta' \
-    '(cd ../.git && find objects -type f -print) |
-     while read path
-     do
-         cmp $path ../.git/$path || {
-	     echo $path differs.
-	     return 1
-	 }
-     done'
-cd "$TRASH"
+test_expect_success 'pack without delta' '
+	packname_1=$(git pack-objects --window=0 test-1 <obj-list)
+'
 
-test_expect_success \
-    'pack with REF_DELTA' \
-    'pwd &&
-     packname_2=$(git pack-objects test-2 <obj-list)'
+test_expect_success 'pack-objects with bogus arguments' '
+	test_must_fail git pack-objects --window=0 test-1 blah blah <obj-list
+'
 
-rm -fr .git2
-mkdir .git2
+check_unpack () {
+	test_when_finished "rm -rf git2" &&
+	git init --bare git2 &&
+	git -C git2 unpack-objects -n <"$1".pack &&
+	git -C git2 unpack-objects <"$1".pack &&
+	(cd .git && find objects -type f -print) |
+	while read path
+	do
+		cmp git2/$path .git/$path || {
+			echo $path differs.
+			return 1
+		}
+	done
+}
+
+test_expect_success 'unpack without delta' '
+	check_unpack test-1-${packname_1}
+'
 
-test_expect_success \
-    'unpack with REF_DELTA' \
-    'GIT_OBJECT_DIRECTORY=.git2/objects &&
-     export GIT_OBJECT_DIRECTORY &&
-     git init &&
-     git unpack-objects -n <test-2-${packname_2}.pack &&
-     git unpack-objects <test-2-${packname_2}.pack'
-
-unset GIT_OBJECT_DIRECTORY
-cd "$TRASH/.git2"
-test_expect_success \
-    'check unpack with REF_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
-     while read path
-     do
-         cmp $path ../.git/$path || {
-	     echo $path differs.
-	     return 1
-	 }
-     done'
-cd "$TRASH"
+test_expect_success 'pack with REF_DELTA' '
+	packname_2=$(git pack-objects test-2 <obj-list)
+'
 
-test_expect_success \
-    'pack with OFS_DELTA' \
-    'pwd &&
-     packname_3=$(git pack-objects --delta-base-offset test-3 <obj-list)'
+test_expect_success 'unpack with REF_DELTA' '
+	check_unpack test-2-${packname_2}
+'
 
-rm -fr .git2
-mkdir .git2
+test_expect_success 'pack with OFS_DELTA' '
+	packname_3=$(git pack-objects --delta-base-offset test-3 <obj-list)
+'
 
-test_expect_success \
-    'unpack with OFS_DELTA' \
-    'GIT_OBJECT_DIRECTORY=.git2/objects &&
-     export GIT_OBJECT_DIRECTORY &&
-     git init &&
-     git unpack-objects -n <test-3-${packname_3}.pack &&
-     git unpack-objects <test-3-${packname_3}.pack'
-
-unset GIT_OBJECT_DIRECTORY
-cd "$TRASH/.git2"
-test_expect_success \
-    'check unpack with OFS_DELTA' \
-    '(cd ../.git && find objects -type f -print) |
-     while read path
-     do
-         cmp $path ../.git/$path || {
-	     echo $path differs.
-	     return 1
-	 }
-     done'
-cd "$TRASH"
+test_expect_success 'unpack with OFS_DELTA' '
+	check_unpack test-3-${packname_3}
+'
 
 test_expect_success 'compare delta flavors' '
 	perl -e '\''
@@ -135,55 +84,33 @@ test_expect_success 'compare delta flavors' '
 	'\'' test-2-$packname_2.pack test-3-$packname_3.pack
 '
 
-rm -fr .git2
-mkdir .git2
+check_use_objects () {
+	test_when_finished "rm -rf git2" &&
+	git init --bare git2 &&
+	cp "$1".pack "$1".idx git2/objects/pack &&
+	(
+		cd git2 &&
+		git diff-tree --root -p $commit &&
+		while read object
+		do
+			t=$(git cat-file -t $object) &&
+			git cat-file $t $object || exit 1
+		done
+	) <obj-list >current &&
+	cmp expect current
+}
 
-test_expect_success \
-    'use packed objects' \
-    'GIT_OBJECT_DIRECTORY=.git2/objects &&
-     export GIT_OBJECT_DIRECTORY &&
-     git init &&
-     cp test-1-${packname_1}.pack test-1-${packname_1}.idx .git2/objects/pack && {
-	 git diff-tree --root -p $commit &&
-	 while read object
-	 do
-	    t=$(git cat-file -t $object) &&
-	    git cat-file $t $object || return 1
-	 done <obj-list
-    } >current &&
-    cmp expect current'
+test_expect_success 'use packed objects' '
+	check_use_objects test-1-${packname_1}
+'
 
-test_expect_success \
-    'use packed deltified (REF_DELTA) objects' \
-    'GIT_OBJECT_DIRECTORY=.git2/objects &&
-     export GIT_OBJECT_DIRECTORY &&
-     rm -f .git2/objects/pack/test-* &&
-     cp test-2-${packname_2}.pack test-2-${packname_2}.idx .git2/objects/pack && {
-	 git diff-tree --root -p $commit &&
-	 while read object
-	 do
-	    t=$(git cat-file -t $object) &&
-	    git cat-file $t $object || return 1
-	 done <obj-list
-    } >current &&
-    cmp expect current'
+test_expect_success 'use packed deltified (REF_DELTA) objects' '
+	check_use_objects test-2-${packname_2}
+'
 
-test_expect_success \
-    'use packed deltified (OFS_DELTA) objects' \
-    'GIT_OBJECT_DIRECTORY=.git2/objects &&
-     export GIT_OBJECT_DIRECTORY &&
-     rm -f .git2/objects/pack/test-* &&
-     cp test-3-${packname_3}.pack test-3-${packname_3}.idx .git2/objects/pack && {
-	 git diff-tree --root -p $commit &&
-	 while read object
-	 do
-	    t=$(git cat-file -t $object) &&
-	    git cat-file $t $object || return 1
-	 done <obj-list
-    } >current &&
-    cmp expect current'
-
-unset GIT_OBJECT_DIRECTORY
+test_expect_success 'use packed deltified (OFS_DELTA) objects' '
+	check_use_objects test-3-${packname_3}
+'
 
 test_expect_success 'survive missing objects/pack directory' '
 	(
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/5] t5300: check that we produced expected number of deltas
  2021-05-01 14:02 [PATCH 0/5] pack-objects: better handling of negative options Jeff King
  2021-05-01 14:02 ` [PATCH 1/5] t5300: modernize basic tests Jeff King
@ 2021-05-01 14:03 ` Jeff King
  2021-05-01 14:03 ` [PATCH 3/5] pack-objects: clamp negative window size to 0 Jeff King
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-05-01 14:03 UTC (permalink / raw)
  To: git; +Cc: Yiyuan guo

We pack a set of objects both with and without --window=0, assuming that
the 0-length window will cause us not to produce any deltas. Let's
confirm that this is the case.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5300-pack-object.sh | 23 ++++++++++++++++++++---
 1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 1e10c832a6..887e2d8d88 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -34,8 +34,22 @@ test_expect_success 'setup' '
 	} >expect
 '
 
+# usage: check_deltas <stderr_from_pack_objects> <cmp_op> <nr_deltas>
+# e.g.: check_deltas stderr -gt 0
+check_deltas() {
+	deltas=$(perl -lne '/delta (\d+)/ and print $1' "$1") &&
+	shift &&
+	if ! test "$deltas" "$@"
+	then
+		echo >&2 "unexpected number of deltas (compared $delta $*)"
+		return 1
+	fi
+}
+
 test_expect_success 'pack without delta' '
-	packname_1=$(git pack-objects --window=0 test-1 <obj-list)
+	packname_1=$(git pack-objects --progress --window=0 test-1 \
+			<obj-list 2>stderr) &&
+	check_deltas stderr = 0
 '
 
 test_expect_success 'pack-objects with bogus arguments' '
@@ -62,15 +76,18 @@ test_expect_success 'unpack without delta' '
 '
 
 test_expect_success 'pack with REF_DELTA' '
-	packname_2=$(git pack-objects test-2 <obj-list)
+	packname_2=$(git pack-objects --progress test-2 <obj-list 2>stderr) &&
+	check_deltas stderr -gt 0
 '
 
 test_expect_success 'unpack with REF_DELTA' '
 	check_unpack test-2-${packname_2}
 '
 
 test_expect_success 'pack with OFS_DELTA' '
-	packname_3=$(git pack-objects --delta-base-offset test-3 <obj-list)
+	packname_3=$(git pack-objects --progress --delta-base-offset test-3 \
+			<obj-list 2>stderr) &&
+	check_deltas stderr -gt 0
 '
 
 test_expect_success 'unpack with OFS_DELTA' '
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/5] pack-objects: clamp negative window size to 0
  2021-05-01 14:02 [PATCH 0/5] pack-objects: better handling of negative options Jeff King
  2021-05-01 14:02 ` [PATCH 1/5] t5300: modernize basic tests Jeff King
  2021-05-01 14:03 ` [PATCH 2/5] t5300: check that we produced expected number of deltas Jeff King
@ 2021-05-01 14:03 ` Jeff King
  2021-05-03 12:10   ` Derrick Stolee
  2021-05-01 14:04 ` [PATCH 4/5] t5316: check behavior of pack-objects --depth=0 Jeff King
  2021-05-01 14:04 ` [PATCH 5/5] pack-objects: clamp negative depth to 0 Jeff King
  4 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-05-01 14:03 UTC (permalink / raw)
  To: git; +Cc: Yiyuan guo

A negative window size makes no sense, and the code in find_deltas() is
not prepared to handle it. If you pass "-1", for example, we end up
generate a 0-length array of "struct unpacked", but our loop assumes it
has at least one entry in it (and we end up reading garbage memory).

We could complain to the user about this, but it's more forgiving to
just clamp it to 0, which means "do not find any deltas at all". The
0-case is already tested earlier in the script, so we'll make sure this
does the same thing.

Reported-by: Yiyuan guo <yguoaz@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c | 2 ++
 t/t5300-pack-object.sh | 5 +++++
 2 files changed, 7 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 6d13cd3e1a..ea7a5b3ba5 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3871,6 +3871,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 			(1U << OE_Z_DELTA_BITS) - 1);
 		cache_max_small_delta_size = (1U << OE_Z_DELTA_BITS) - 1;
 	}
+	if (window < 0)
+		window = 0;
 
 	strvec_push(&rp, "pack-objects");
 	if (thin) {
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 887e2d8d88..5c5e53f0be 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -613,4 +613,9 @@ test_expect_success '--stdin-packs with broken links' '
 	)
 '
 
+test_expect_success 'negative window clamps to 0' '
+	git pack-objects --progress --window=-1 neg-window <obj-list 2>stderr &&
+	check_deltas stderr = 0
+'
+
 test_done
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/5] t5316: check behavior of pack-objects --depth=0
  2021-05-01 14:02 [PATCH 0/5] pack-objects: better handling of negative options Jeff King
                   ` (2 preceding siblings ...)
  2021-05-01 14:03 ` [PATCH 3/5] pack-objects: clamp negative window size to 0 Jeff King
@ 2021-05-01 14:04 ` Jeff King
  2021-05-01 14:04 ` [PATCH 5/5] pack-objects: clamp negative depth to 0 Jeff King
  4 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-05-01 14:04 UTC (permalink / raw)
  To: git; +Cc: Yiyuan guo

We'd expect this to cleanly produce no deltas at all (as opposed to
getting confused by an out-of-bounds value), and it does.

Note we have to adjust our max_chain test helper, which expected to find
at least one delta.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t5316-pack-delta-depth.sh | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/t/t5316-pack-delta-depth.sh b/t/t5316-pack-delta-depth.sh
index a8c1bc0f66..3e84df4137 100755
--- a/t/t5316-pack-delta-depth.sh
+++ b/t/t5316-pack-delta-depth.sh
@@ -69,6 +69,7 @@ test_expect_success 'create series of packs' '
 max_chain() {
 	git index-pack --verify-stat-only "$1" >output &&
 	perl -lne '
+	  BEGIN { $len = 0 }
 	  /chain length = (\d+)/ and $len = $1;
 	  END { print $len }
 	' output
@@ -94,4 +95,11 @@ test_expect_success '--depth limits depth' '
 	test_cmp expect actual
 '
 
+test_expect_success '--depth=0 disables deltas' '
+	pack=$(git pack-objects --all --depth=0 </dev/null pack) &&
+	echo 0 >expect &&
+	max_chain pack-$pack.pack >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.31.1.875.g5dccece0aa


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] pack-objects: clamp negative depth to 0
  2021-05-01 14:02 [PATCH 0/5] pack-objects: better handling of negative options Jeff King
                   ` (3 preceding siblings ...)
  2021-05-01 14:04 ` [PATCH 4/5] t5316: check behavior of pack-objects --depth=0 Jeff King
@ 2021-05-01 14:04 ` Jeff King
  2021-05-03 12:11   ` Derrick Stolee
  4 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-05-01 14:04 UTC (permalink / raw)
  To: git; +Cc: Yiyuan guo

A negative delta depth makes no sense, and the code is not prepared to
handle it. If passed "--depth=-1" on the command line, then this line
from break_delta_chains():

	cur->depth = (total_depth--) % (depth + 1);

triggers a divide-by-zero. This is undefined behavior according to the C
standard, but on POSIX systems results in SIGFPE killing the process.
This is certainly one way to inform the use that the command was
invalid, but it's a bit friendlier to just treat it as "don't allow any
deltas", which we already do for --depth=0.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/pack-objects.c      | 2 ++
 t/t5316-pack-delta-depth.sh | 7 +++++++
 2 files changed, 9 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ea7a5b3ba5..da5e0700f9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3861,6 +3861,8 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix)
 	if (pack_to_stdout != !base_name || argc)
 		usage_with_options(pack_usage, pack_objects_options);
 
+	if (depth < 0)
+		depth = 0;
 	if (depth >= (1 << OE_DEPTH_BITS)) {
 		warning(_("delta chain depth %d is too deep, forcing %d"),
 			depth, (1 << OE_DEPTH_BITS) - 1);
diff --git a/t/t5316-pack-delta-depth.sh b/t/t5316-pack-delta-depth.sh
index 3e84df4137..759169d074 100755
--- a/t/t5316-pack-delta-depth.sh
+++ b/t/t5316-pack-delta-depth.sh
@@ -102,4 +102,11 @@ test_expect_success '--depth=0 disables deltas' '
 	test_cmp expect actual
 '
 
+test_expect_success 'negative depth disables deltas' '
+	pack=$(git pack-objects --all --depth=-1 </dev/null pack) &&
+	echo 0 >expect &&
+	max_chain pack-$pack.pack >actual &&
+	test_cmp expect actual
+'
+
 test_done
-- 
2.31.1.875.g5dccece0aa

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] t5300: modernize basic tests
  2021-05-01 14:02 ` [PATCH 1/5] t5300: modernize basic tests Jeff King
@ 2021-05-03  5:27   ` Junio C Hamano
  2021-05-03 14:53     ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2021-05-03  5:27 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Yiyuan guo

Jeff King <peff@peff.net> writes:

> The first set of tests in t5300 goes back to 2005, and doesn't use some
> of our customary style and tools these days. In preparation for touching
> them, let's modernize a few things:
> ...
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t5300-pack-object.sh | 243 ++++++++++++++---------------------------
>  1 file changed, 85 insertions(+), 158 deletions(-)

Thanks.  That was an impressive list of a "few" things ;-)  The
result certainly is much easier to follow.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] pack-objects: clamp negative window size to 0
  2021-05-01 14:03 ` [PATCH 3/5] pack-objects: clamp negative window size to 0 Jeff King
@ 2021-05-03 12:10   ` Derrick Stolee
  2021-05-03 14:55     ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Derrick Stolee @ 2021-05-03 12:10 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Yiyuan guo

On 5/1/2021 10:03 AM, Jeff King wrote:
> A negative window size makes no sense, and the code in find_deltas() is
> not prepared to handle it. If you pass "-1", for example, we end up
> generate a 0-length array of "struct unpacked", but our loop assumes it
> has at least one entry in it (and we end up reading garbage memory).
> 
> We could complain to the user about this, but it's more forgiving to
> just clamp it to 0, which means "do not find any deltas at all". The
> 0-case is already tested earlier in the script, so we'll make sure this
> does the same thing.

This seems like a reasonable approach. It takes existing "undefined"
behavior and turns it into well-understood, "defined" behavior.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/5] pack-objects: clamp negative depth to 0
  2021-05-01 14:04 ` [PATCH 5/5] pack-objects: clamp negative depth to 0 Jeff King
@ 2021-05-03 12:11   ` Derrick Stolee
  0 siblings, 0 replies; 15+ messages in thread
From: Derrick Stolee @ 2021-05-03 12:11 UTC (permalink / raw)
  To: Jeff King, git; +Cc: Yiyuan guo

On 5/1/2021 10:04 AM, Jeff King wrote:
> A negative delta depth makes no sense, and the code is not prepared to
> handle it. If passed "--depth=-1" on the command line, then this line
> from break_delta_chains():
> 
> 	cur->depth = (total_depth--) % (depth + 1);
> 
> triggers a divide-by-zero. This is undefined behavior according to the C
> standard, but on POSIX systems results in SIGFPE killing the process.
> This is certainly one way to inform the use that the command was
> invalid, but it's a bit friendlier to just treat it as "don't allow any
> deltas", which we already do for --depth=0.

Makes sense to me. This whole series LGTM.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] t5300: modernize basic tests
  2021-05-03  5:27   ` Junio C Hamano
@ 2021-05-03 14:53     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-05-03 14:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Yiyuan guo

On Mon, May 03, 2021 at 02:27:18PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The first set of tests in t5300 goes back to 2005, and doesn't use some
> > of our customary style and tools these days. In preparation for touching
> > them, let's modernize a few things:
> > ...
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  t/t5300-pack-object.sh | 243 ++++++++++++++---------------------------
> >  1 file changed, 85 insertions(+), 158 deletions(-)
> 
> Thanks.  That was an impressive list of a "few" things ;-)  The
> result certainly is much easier to follow.

Sometimes these things get bigger than you expected. :)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] pack-objects: clamp negative window size to 0
  2021-05-03 12:10   ` Derrick Stolee
@ 2021-05-03 14:55     ` Jeff King
  2021-05-04 18:47       ` René Scharfe
  0 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2021-05-03 14:55 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: git, Yiyuan guo

On Mon, May 03, 2021 at 08:10:24AM -0400, Derrick Stolee wrote:

> On 5/1/2021 10:03 AM, Jeff King wrote:
> > A negative window size makes no sense, and the code in find_deltas() is
> > not prepared to handle it. If you pass "-1", for example, we end up
> > generate a 0-length array of "struct unpacked", but our loop assumes it
> > has at least one entry in it (and we end up reading garbage memory).
> > 
> > We could complain to the user about this, but it's more forgiving to
> > just clamp it to 0, which means "do not find any deltas at all". The
> > 0-case is already tested earlier in the script, so we'll make sure this
> > does the same thing.
> 
> This seems like a reasonable approach. It takes existing "undefined"
> behavior and turns it into well-understood, "defined" behavior.

I was on the fence on doing that, or just:

  if (window < 0)
	die("sorry dude, negative windows are nonsense");

So if anybody has a strong preference, I could be easily persuaded. Part
of what led me to being forgiving was that we similarly clamp too-large
depth values (with a warning; I didn't think it was really necessary
here, though).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] pack-objects: clamp negative window size to 0
  2021-05-03 14:55     ` Jeff King
@ 2021-05-04 18:47       ` René Scharfe
  2021-05-04 21:38         ` Jeff King
  2021-05-05 11:53         ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 15+ messages in thread
From: René Scharfe @ 2021-05-04 18:47 UTC (permalink / raw)
  To: Jeff King, Derrick Stolee; +Cc: git, Yiyuan guo

Am 03.05.21 um 16:55 schrieb Jeff King:
> On Mon, May 03, 2021 at 08:10:24AM -0400, Derrick Stolee wrote:
>
>> On 5/1/2021 10:03 AM, Jeff King wrote:
>>> A negative window size makes no sense, and the code in find_deltas() is
>>> not prepared to handle it. If you pass "-1", for example, we end up
>>> generate a 0-length array of "struct unpacked", but our loop assumes it
>>> has at least one entry in it (and we end up reading garbage memory).
>>>
>>> We could complain to the user about this, but it's more forgiving to
>>> just clamp it to 0, which means "do not find any deltas at all". The
>>> 0-case is already tested earlier in the script, so we'll make sure this
>>> does the same thing.
>>
>> This seems like a reasonable approach. It takes existing "undefined"
>> behavior and turns it into well-understood, "defined" behavior.
>
> I was on the fence on doing that, or just:
>
>   if (window < 0)
> 	die("sorry dude, negative windows are nonsense");
>
> So if anybody has a strong preference, I could be easily persuaded. Part
> of what led me to being forgiving was that we similarly clamp too-large
> depth values (with a warning; I didn't think it was really necessary
> here, though).

There's another option: Mapping -1 or all negative values to the
maximum:

	if (window < 0)
		window = INT_MAX;
	if (depth < 0)
		depth = (1 << OE_DEPTH_BITS) - 1;

That's allows saying "gimme all you got" without knowing or being
willing to type out the exact maximum value, which may change between
versions.  Not all that useful for --window, I guess.  That convention
has been used elsewhere I'm sure, but can't point out a concrete
example.  $arr[-1] get the last item of the array $arr in PowerShell,
though, which is kind of similar.

Sure, you get the same effect in both cases by typing 2147483647, but
-1 is more convenient.

Not a strong preference, but I thought it was at least worth
mentioning that particular bike shed color. :)

René

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] pack-objects: clamp negative window size to 0
  2021-05-04 18:47       ` René Scharfe
@ 2021-05-04 21:38         ` Jeff King
  2021-05-05 11:53         ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 15+ messages in thread
From: Jeff King @ 2021-05-04 21:38 UTC (permalink / raw)
  To: René Scharfe; +Cc: Derrick Stolee, git, Yiyuan guo

On Tue, May 04, 2021 at 08:47:56PM +0200, René Scharfe wrote:

> >> This seems like a reasonable approach. It takes existing "undefined"
> >> behavior and turns it into well-understood, "defined" behavior.
> >
> > I was on the fence on doing that, or just:
> >
> >   if (window < 0)
> > 	die("sorry dude, negative windows are nonsense");
> >
> > So if anybody has a strong preference, I could be easily persuaded. Part
> > of what led me to being forgiving was that we similarly clamp too-large
> > depth values (with a warning; I didn't think it was really necessary
> > here, though).
> 
> There's another option: Mapping -1 or all negative values to the
> maximum:
> 
> 	if (window < 0)
> 		window = INT_MAX;
> 	if (depth < 0)
> 		depth = (1 << OE_DEPTH_BITS) - 1;
> 
> That's allows saying "gimme all you got" without knowing or being
> willing to type out the exact maximum value, which may change between
> versions.  Not all that useful for --window, I guess.  That convention
> has been used elsewhere I'm sure, but can't point out a concrete
> example.  $arr[-1] get the last item of the array $arr in PowerShell,
> though, which is kind of similar.
> 
> Sure, you get the same effect in both cases by typing 2147483647, but
> -1 is more convenient.
> 
> Not a strong preference, but I thought it was at least worth
> mentioning that particular bike shed color. :)

Thanks for thinking about this. In general, yeah, allowing "-1" as
"unlimited" is a sensible thing for many options. But for these two
variables, I don't think "unlimited" is ever a good idea:

  - for --window, it makes the amount of work quadratic in the number of
    objects in the repository (and likewise, memory usage equivalent to
    the uncompressed contents of the repo). Going beyond about 250 or so
    gives diminishing returns.

  - for --depth, going beyond about 50 provides diminishing space
    returns, and makes the run-time performance of accessing the deltas
    horrible.

So certainly INT_MAX or the max allowable by OE_DEPTH_BITS is a very bad
idea in both cases, and the use would probably be happier if we just hit
a die() instead. :) We _could_ make "-1" mean "the maximum sensible
valuable", but I think there's a lot of wiggle room for "sensible"
there. I much prefer using and advertising the actual values there (as
we do for "gc --aggressive").

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] pack-objects: clamp negative window size to 0
  2021-05-04 18:47       ` René Scharfe
  2021-05-04 21:38         ` Jeff King
@ 2021-05-05 11:53         ` Ævar Arnfjörð Bjarmason
  2021-05-05 16:19           ` René Scharfe
  1 sibling, 1 reply; 15+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-05 11:53 UTC (permalink / raw)
  To: René Scharfe; +Cc: Jeff King, Derrick Stolee, git, Yiyuan guo


On Tue, May 04 2021, René Scharfe wrote:

> Am 03.05.21 um 16:55 schrieb Jeff King:
>> On Mon, May 03, 2021 at 08:10:24AM -0400, Derrick Stolee wrote:
>>
>>> On 5/1/2021 10:03 AM, Jeff King wrote:
>>>> A negative window size makes no sense, and the code in find_deltas() is
>>>> not prepared to handle it. If you pass "-1", for example, we end up
>>>> generate a 0-length array of "struct unpacked", but our loop assumes it
>>>> has at least one entry in it (and we end up reading garbage memory).
>>>>
>>>> We could complain to the user about this, but it's more forgiving to
>>>> just clamp it to 0, which means "do not find any deltas at all". The
>>>> 0-case is already tested earlier in the script, so we'll make sure this
>>>> does the same thing.
>>>
>>> This seems like a reasonable approach. It takes existing "undefined"
>>> behavior and turns it into well-understood, "defined" behavior.
>>
>> I was on the fence on doing that, or just:
>>
>>   if (window < 0)
>> 	die("sorry dude, negative windows are nonsense");
>>
>> So if anybody has a strong preference, I could be easily persuaded. Part
>> of what led me to being forgiving was that we similarly clamp too-large
>> depth values (with a warning; I didn't think it was really necessary
>> here, though).
>
> There's another option: Mapping -1 or all negative values to the
> maximum:
>
> 	if (window < 0)
> 		window = INT_MAX;
> 	if (depth < 0)
> 		depth = (1 << OE_DEPTH_BITS) - 1;
>
> That's allows saying "gimme all you got" without knowing or being
> willing to type out the exact maximum value, which may change between
> versions.  Not all that useful for --window, I guess.  That convention
> has been used elsewhere I'm sure, but can't point out a concrete
> example.  $arr[-1] get the last item of the array $arr in PowerShell,
> though, which is kind of similar.
>
> Sure, you get the same effect in both cases by typing 2147483647, but
> -1 is more convenient.
>
> Not a strong preference, but I thought it was at least worth
> mentioning that particular bike shed color. :)

That seems sensible to expose, but I think should really be
--window=max, not --window=-1. The latter feels way too much like
assuming that a user might know about C's "set all bits" semantics.

The one example of such a variable I could think of is core.abbrev=no,
which could arguably benefit from a core.abbrev=max synonym.

Another one is *.threads, e.g. grep.threads, index.threads. We currently
say that "auto" is like "max, but I can see how we'd eventually benefit
from splitting those up. I sometimes run git on machines where that
"auto" is 48 or whatever (I haven't benchmarked, but that's surely
counter-productive). Having max != auto in that case (but still having a
"max") would be nice.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] pack-objects: clamp negative window size to 0
  2021-05-05 11:53         ` Ævar Arnfjörð Bjarmason
@ 2021-05-05 16:19           ` René Scharfe
  0 siblings, 0 replies; 15+ messages in thread
From: René Scharfe @ 2021-05-05 16:19 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, Derrick Stolee, git, Yiyuan guo

Am 05.05.21 um 13:53 schrieb Ævar Arnfjörð Bjarmason:
>
> On Tue, May 04 2021, René Scharfe wrote:
>
>> There's another option: Mapping -1 or all negative values to the
>> maximum:

> That seems sensible to expose, but I think should really be
> --window=max, not --window=-1. The latter feels way too much like
> assuming that a user might know about C's "set all bits" semantics.

Nitpicking: Setting all bits for -1 is done by two's complement, which
is just one of the signed number representations supported by C.

But yeah: A non-numeric value would probably be better in general.  As
Peff explained it's not a particularly good idea to specify the maximum
values of --depth and --window, though, so no need to make it easier.

> The one example of such a variable I could think of is core.abbrev=no,
> which could arguably benefit from a core.abbrev=max synonym.

core.abbrev=no turns off abbreviation, i.e. you get the full hash
size (false and off work as well).

Following that logic, core.abbrev=max would ask for a maximum of
abbreviation, i.e. for the shortest unambiguous hash.  That would have
a length of at least 4.  This value is stored in a constant called
minimum_abbrev -- which seems backwards.  The implied negation of abbrev
(the more you abbreviate, the shorter the length) is a bit confusing.

> Another one is *.threads, e.g. grep.threads, index.threads. We currently
> say that "auto" is like "max, but I can see how we'd eventually benefit
> from splitting those up. I sometimes run git on machines where that
> "auto" is 48 or whatever (I haven't benchmarked, but that's surely
> counter-productive). Having max != auto in that case (but still having a
> "max") would be nice.

Good thinking.  What is the maximum number of threads?  Certainly higher
than the number of CPUs.  Would that be useful?  Maybe -- on a
single-core VM with an SSD queue length of 32 we can probably benefit
from running more than one thread.

Are our threads CPU-bound or I/O-bound?  I guess the answer is "yes". :)
How do we even find out the disk queue length in a portable way?  And
how would we calculate the optimal number of threads?  Are these even
the right questions to ask?

An "auto" option might help with that.  I imagine it starting out with
some default value and then experimentally decreasing and and increasing
the number of threads to find out which one works best.  Downside: It
would need to have comparable workloads for that.  And these benchmarks
need an otherwise quiet system.  Similar battery level if running on a
laptop.  Same system temperature.  Impractical during normal use.

Perhaps a "git benchmark" command that runs some synthetic speed tests
to tune grep.threads etc. would be possible?

René

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-05-05 16:19 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-01 14:02 [PATCH 0/5] pack-objects: better handling of negative options Jeff King
2021-05-01 14:02 ` [PATCH 1/5] t5300: modernize basic tests Jeff King
2021-05-03  5:27   ` Junio C Hamano
2021-05-03 14:53     ` Jeff King
2021-05-01 14:03 ` [PATCH 2/5] t5300: check that we produced expected number of deltas Jeff King
2021-05-01 14:03 ` [PATCH 3/5] pack-objects: clamp negative window size to 0 Jeff King
2021-05-03 12:10   ` Derrick Stolee
2021-05-03 14:55     ` Jeff King
2021-05-04 18:47       ` René Scharfe
2021-05-04 21:38         ` Jeff King
2021-05-05 11:53         ` Ævar Arnfjörð Bjarmason
2021-05-05 16:19           ` René Scharfe
2021-05-01 14:04 ` [PATCH 4/5] t5316: check behavior of pack-objects --depth=0 Jeff King
2021-05-01 14:04 ` [PATCH 5/5] pack-objects: clamp negative depth to 0 Jeff King
2021-05-03 12:11   ` Derrick Stolee

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).