git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/7] harden unexpected object types checks
@ 2019-04-05  3:37 Taylor Blau
  2019-04-05  3:37 ` [PATCH 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
                   ` (7 more replies)
  0 siblings, 8 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

Hi everybody,

Peff pointed out to me a couple of weeks ago that we could reproducibly
crash Git when doing the following:

  $ git rev-list --objects <blob> <broken-tree>

Where <blob> is a normal blob, and <broken-tree> is a tree which
contains an entry that refers to <blob> but gives it a type other than
'blob'. (This is described in detail in 2/7 and fixed in 3/7.)

We decided to continue, trying to come up with more tests that exercise
similar object corruption, and the tests
't6102-rev-list-unexpected-objects.sh' are what we came up with.

The series goes as follows:

  1. Prepare ourselves by moving a helper in 't' into
     test-lib-functions.sh so that we can use it in a new location.

  2. Write out a handful of tests that exercises cases similar to the
     above, and mark the ones with bugs as 'test_expect_failure'.

  3. Fix (most) of them in each subsequent commit.

The exception we make for step (3) is that don't provide a complete fix,
only restore behavior to before the commit at which it regressed.

I'll be brief here, since most of the detail is described at length in
the patches themselves. This said, please do ask questions where I
wasn't clear, or could have been clearer. (This series grew larger than
I originally expected it to, so perhaps there is detail that I
accumulated and didn't devote enough time to).

Thanks as always in advance for your review.


Jeff King (3):
  get_commit_tree(): return NULL for broken tree
  rev-list: let traversal die when --missing is not in use
  rev-list: detect broken root trees

Taylor Blau (4):
  t: move 'hex2oct' into test-lib-functions.sh
  t: introduce tests for unexpected object types
  list-objects.c: handle unexpected non-blob entries
  list-objects.c: handle unexpected non-tree entries

 builtin/rev-list.c                     |   4 +-
 commit.c                               |   6 +-
 list-objects.c                         |  13 +++
 t/t1007-hash-object.sh                 |   4 -
 t/t1450-fsck.sh                        |   4 -
 t/t5601-clone.sh                       |   4 -
 t/t6102-rev-list-unexpected-objects.sh | 127 +++++++++++++++++++++++++
 t/test-lib-functions.sh                |   6 ++
 8 files changed, 152 insertions(+), 16 deletions(-)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

--
2.21.0.203.g358da99528

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH 1/7] t: move 'hex2oct' into test-lib-functions.sh
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
@ 2019-04-05  3:37 ` Taylor Blau
  2019-04-05  3:37 ` [PATCH 2/7] t: introduce tests for unexpected object types Taylor Blau
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

The helper 'hex2oct' is used to convert base-16 encoded data into a
base-8 binary form, and is useful for preparing data for commands that
accept input in a binary format, such as 'git hash-object', via
'printf'.

This helper is defined identically in three separate places throughout
't'. Move the definition to test-lib-function.sh, so that it can be used
in new test suites, and its definition is not redundant.

This will likewise make our job easier in the subsequent commit, which
also uses 'hex2oct'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t1007-hash-object.sh  | 4 ----
 t/t1450-fsck.sh         | 4 ----
 t/t5601-clone.sh        | 4 ----
 t/test-lib-functions.sh | 6 ++++++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index a37753047e..7099d33508 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -199,10 +199,6 @@ test_expect_success 'too-short tree' '
 	test_i18ngrep "too-short tree object" err
 '
 
-hex2oct() {
-    perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'malformed mode in tree' '
 	hex_sha1=$(echo foo | git hash-object --stdin -w) &&
 	bin_sha1=$(echo $hex_sha1 | hex2oct) &&
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 49f08d5b9c..0f268a3664 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -256,10 +256,6 @@ test_expect_success 'unparseable tree object' '
 	test_i18ngrep ! "fatal: empty filename in tree entry" out
 '
 
-hex2oct() {
-	perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'tree entry with type mismatch' '
 	test_when_finished "remove_object \$blob" &&
 	test_when_finished "remove_object \$tree" &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index d6948cbdab..3f49943010 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,10 +611,6 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable pack' '
 	git -C replay.git index-pack -v --stdin <tmp.pack
 '
 
-hex2oct () {
-	perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'clone on case-insensitive fs' '
 	git init icasefs &&
 	(
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 80402a428f..349eabe851 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1202,6 +1202,12 @@ depacketize () {
 	'
 }
 
+# Converts base-16 data into base-8. The output is given as a sequence of
+# escaped octals, suitable for consumption by 'printf'.
+hex2oct () {
+	perl -ne 'printf "\\%03o", hex for /../g'
+}
+
 # Set the hash algorithm in use to $1.  Only useful when testing the testsuite.
 test_set_hash () {
 	test_hash_algo="$1"
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
  2019-04-05  3:37 ` [PATCH 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
@ 2019-04-05  3:37 ` Taylor Blau
  2019-04-05 10:50   ` SZEDER Gábor
  2019-04-05 18:31   ` Jeff King
  2019-04-05  3:37 ` [PATCH 3/7] list-objects.c: handle unexpected non-blob entries Taylor Blau
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.

This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.

Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.

Let A be the object referenced with an unexpected type, and B be the
object doing the referencing. Do the following:

  - test 'git rev-list --objects A B'. This causes A to be "cached", and
    presents the above scenario.

Likewise, if we have a tree entry that claims to be a tree (for example)
but points to another object type (say, a blob), there are two ways we
might find out:

  - when we call lookup_tree(), we might find that we've already seen
    the object referenced as another type, in which case we'd get NULL

  - we call lookup_tree() successfully, but when we try to read the
    object, we find out it's something else.

We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t6102-rev-list-unexpected-objects.sh | 123 +++++++++++++++++++++++++
 1 file changed, 123 insertions(+)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
new file mode 100755
index 0000000000..472b08528a
--- /dev/null
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -0,0 +1,123 @@
+#!/bin/sh
+
+test_description='git rev-list should handle unexpected object types'
+
+. ./test-lib.sh
+
+test_expect_success 'setup well-formed objects' '
+	blob="$(printf "foo" | git hash-object -w --stdin)" &&
+	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
+	commit="$(git commit-tree $tree -m "first commit")"
+'
+
+test_expect_success 'setup unexpected non-blob entry' '
+	printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
+	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (lone)' '
+	test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (seen)' '
+	test_must_fail git rev-list --objects $tree $broken_tree
+'
+
+test_expect_success 'setup unexpected non-tree entry' '
+	printf "40000 foo\0$(echo $blob | hex2oct)" >broken-tree &&
+	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+	test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
+'
+
+test_expect_success 'setup unexpected non-commit parent' '
+	git cat-file commit $commit |
+		perl -lpe "/^author/ && print q(parent $blob)" \
+		>broken-commit &&
+	broken_commit="$(git hash-object -w --literally -t commit \
+		broken-commit)"
+'
+
+test_expect_success 'traverse unexpected non-commit parent (lone)' '
+	test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'traverse unexpected non-commit parent (seen)' '
+	test_must_fail git rev-list --objects $commit $broken_commit \
+		>output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree root' '
+	git cat-file commit $commit |
+	sed -e "s/$tree/$blob/" >broken-commit &&
+	broken_commit="$(git hash-object -w --literally -t commit \
+		broken-commit)"
+'
+
+test_expect_failure 'traverse unexpected non-tree root (lone)' '
+	test_must_fail git rev-list --objects $broken_commit
+'
+
+test_expect_failure 'traverse unexpected non-tree root (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_commit
+'
+
+test_expect_success 'setup unexpected non-commit tag' '
+	git tag -a -m "tagged commit" tag $commit &&
+	test_when_finished "git tag -d tag" &&
+	git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_success 'traverse unexpected non-commit tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-commit tag (seen)' '
+	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree tag' '
+	git tag -a -m "tagged tree" tag $tree &&
+	test_when_finished "git tag -d tag" &&
+	git cat-file -p tag |
+	sed -e "s/$tree/$blob/" >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_success 'traverse unexpected non-tree tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-tree tag (seen)' '
+	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
+	test_i18ngrep "not a tree" output
+'
+
+test_expect_success 'setup unexpected non-blob tag' '
+	git tag -a -m "tagged blob" tag $blob &&
+	test_when_finished "git tag -d tag" &&
+	git cat-file -p tag |
+	sed -e "s/$blob/$commit/" >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_failure 'traverse unexpected non-blob tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-blob tag (seen)' '
+	test_must_fail git rev-list --objects $commit $tag >output 2>&1 &&
+	test_i18ngrep "not a blob" output
+'
+
+test_done
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 3/7] list-objects.c: handle unexpected non-blob entries
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
  2019-04-05  3:37 ` [PATCH 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
  2019-04-05  3:37 ` [PATCH 2/7] t: introduce tests for unexpected object types Taylor Blau
@ 2019-04-05  3:37 ` Taylor Blau
  2019-04-05  3:37 ` [PATCH 4/7] list-objects.c: handle unexpected non-tree entries Taylor Blau
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

Fix one of the cases described in the previous commit where a tree-entry
that is promised to a blob is in fact a non-blob.

When 'lookup_blob()' returns NULL, it is because Git has cached the
requested object as a non-blob. In this case, prevent a SIGSEGV by
'die()'-ing immediately before attempting to dereference the result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 list-objects.c                         | 5 +++++
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index dc77361e11..ea04bbdee6 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -133,6 +133,11 @@ static void process_tree_contents(struct traversal_context *ctx,
 					base, entry.path);
 		else {
 			struct blob *b = lookup_blob(ctx->revs->repo, &entry.oid);
+			if (!b) {
+				die(_("entry '%s' in tree %s has blob mode, "
+				      "but is not a blob"),
+				    entry.path, oid_to_hex(&tree->object.oid));
+			}
 			b->object.flags |= NOT_USER_GIVEN;
 			process_blob(ctx, b, base, entry.path);
 		}
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 472b08528a..76fe9be30f 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -19,8 +19,9 @@ test_expect_failure 'traverse unexpected non-blob entry (lone)' '
 	test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-blob entry (seen)' '
-	test_must_fail git rev-list --objects $tree $broken_tree
+test_expect_success 'traverse unexpected non-blob entry (seen)' '
+	test_must_fail git rev-list --objects $tree $broken_tree >output 2>&1 &&
+	test_i18ngrep "is not a blob" output
 '
 
 test_expect_success 'setup unexpected non-tree entry' '
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 4/7] list-objects.c: handle unexpected non-tree entries
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
                   ` (2 preceding siblings ...)
  2019-04-05  3:37 ` [PATCH 3/7] list-objects.c: handle unexpected non-blob entries Taylor Blau
@ 2019-04-05  3:37 ` Taylor Blau
  2019-04-05  3:37 ` [PATCH 5/7] get_commit_tree(): return NULL for broken tree Taylor Blau
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

Apply similar treatment as the previous commit for non-tree entries,
too.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 list-objects.c                         | 5 +++++
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ea04bbdee6..bb7e61ef4b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -125,6 +125,11 @@ static void process_tree_contents(struct traversal_context *ctx,
 
 		if (S_ISDIR(entry.mode)) {
 			struct tree *t = lookup_tree(ctx->revs->repo, &entry.oid);
+			if (!t) {
+				die(_("entry '%s' in tree %s has tree mode, "
+				      "but is not a tree"),
+				    entry.path, oid_to_hex(&tree->object.oid));
+			}
 			t->object.flags |= NOT_USER_GIVEN;
 			process_tree(ctx, t, base, entry.path);
 		}
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 76fe9be30f..30976385a8 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -33,8 +33,9 @@ test_expect_failure 'traverse unexpected non-tree entry (lone)' '
 	test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (seen)' '
-	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
+test_expect_success 'traverse unexpected non-tree entry (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1 &&
+	test_i18ngrep "is not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit parent' '
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 5/7] get_commit_tree(): return NULL for broken tree
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
                   ` (3 preceding siblings ...)
  2019-04-05  3:37 ` [PATCH 4/7] list-objects.c: handle unexpected non-tree entries Taylor Blau
@ 2019-04-05  3:37 ` Taylor Blau
  2019-04-05  3:37 ` [PATCH 6/7] rev-list: let traversal die when --missing is not in use Taylor Blau
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

From: Jeff King <peff@peff.net>

Return NULL from 'get_commit_tree()' when a commit's root tree is
corrupt, doesn't exist, or points to an object which is not a tree.

In [1], this situation became a BUG(), but it can certainly occur in
cases which are not a bug in Git, for e.g., if a caller manually crafts
a commit whose tree is corrupt in any of the above ways.

Note that the expect_failure test in t6102 triggers this BUG(), but we
can't flip it to expect_success yet. Solving this problem actually
reveals a second bug.

[1]: 7b8a21dba1 (commit-graph: lazy-load trees for commits, 2018-04-06)

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index a5333c7ac6..e2cde566a9 100644
--- a/commit.c
+++ b/commit.c
@@ -345,10 +345,10 @@ struct tree *get_commit_tree(const struct commit *commit)
 	if (commit->maybe_tree || !commit->object.parsed)
 		return commit->maybe_tree;
 
-	if (commit->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		BUG("commit has NULL tree, but was not loaded from commit-graph");
+	if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+		return get_commit_tree_in_graph(the_repository, commit);
 
-	return get_commit_tree_in_graph(the_repository, commit);
+	return NULL;
 }
 
 struct object_id *get_commit_tree_oid(const struct commit *commit)
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 6/7] rev-list: let traversal die when --missing is not in use
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
                   ` (4 preceding siblings ...)
  2019-04-05  3:37 ` [PATCH 5/7] get_commit_tree(): return NULL for broken tree Taylor Blau
@ 2019-04-05  3:37 ` Taylor Blau
  2019-04-05 18:41   ` Jeff King
  2019-04-05  3:37 ` [PATCH 7/7] rev-list: detect broken root trees Taylor Blau
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
  7 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

From: Jeff King <peff@peff.net>

Commit 7c0fe330d5 (rev-list: handle missing tree objects properly,
2018-10-05) taught the traversal machinery used by git-rev-list to
ignore missing trees, so that rev-list could handle them itself.

However, it does so only by checking via oid_object_info_extended() that
the object exists at all. This can miss several classes of errors that
were previously detected by rev-list:

  - type mismatches (e.g., we expected a tree but got a blob)

  - failure to read the object data (e.g., due to bitrot on disk)

This is especially important because we use "rev-list --objects" as our
connectivity check to admit new objects to the repository, and it will
now miss these cases (though the bitrot one is less important here,
because we'd typically have just hashed and stored the object).

There are a few options to fix this:

 1. we could check these properties in rev-list when we do the existence
    check. This is probably too expensive in practice (perhaps even for
    a type check, but definitely for checking the whole content again,
    which implies loading each object into memory twice).

 2. teach the traversal machinery to differentiate between a missing
    object, and one that could not be loaded as expected. This probably
    wouldn't be too hard to detect type mismatches, but detecting bitrot
    versus a truly missing object would require deep changes to the
    object-loading code.

 3. have the traversal machinery communicate the failure to the caller,
    so that it can decide how to proceed without re-evaluting the object
    itself.

Of those, I think (3) is probably the best path forward. However, this
patch does none of them. In the name of expediently fixing the
regression to a normal "rev-list --objects" that we use for connectivity
checks, this simply restores the pre-7c0fe330d5 behavior of having the
traversal die as soon as it fails to load a tree (when --missing is set
to MA_ERROR, which is the default).

Note that we can't get rid of the object-existence check in
finish_object(), because this also handles blobs (which are not
otherwise checked at all by the traversal code).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c                     | 4 +++-
 t/t6102-rev-list-unexpected-objects.sh | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 425a5774db..9f31837d30 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -379,7 +379,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	repo_init_revisions(the_repository, &revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
-	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
@@ -409,6 +408,9 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (arg_missing_action)
+		revs.do_not_die_on_missing_tree = 1;
+
 	argc = setup_revisions(argc, argv, &revs, &s_r_opt);
 
 	memset(&info, 0, sizeof(info));
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 30976385a8..c8d4b31f8f 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -29,7 +29,7 @@ test_expect_success 'setup unexpected non-tree entry' '
 	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+test_expect_success 'traverse unexpected non-tree entry (lone)' '
 	test_must_fail git rev-list --objects $broken_tree
 '
 
@@ -64,7 +64,7 @@ test_expect_success 'setup unexpected non-tree root' '
 		broken-commit)"
 '
 
-test_expect_failure 'traverse unexpected non-tree root (lone)' '
+test_expect_success 'traverse unexpected non-tree root (lone)' '
 	test_must_fail git rev-list --objects $broken_commit
 '
 
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH 7/7] rev-list: detect broken root trees
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
                   ` (5 preceding siblings ...)
  2019-04-05  3:37 ` [PATCH 6/7] rev-list: let traversal die when --missing is not in use Taylor Blau
@ 2019-04-05  3:37 ` Taylor Blau
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
  7 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-05  3:37 UTC (permalink / raw)
  To: git; +Cc: peff, gitster

From: Jeff King <peff@peff.net>

When the traversal machinery sees a commit without a root tree, it
assumes that the tree was part of a BOUNDARY commit, and quietly ignores
the tree. But it could also be caused by a commit whose root tree is
broken or missing.

Instead, let's die() when we see a NULL root tree. We can differentiate
it from the BOUNDARY case by seeing if the commit was actually parsed.
This covers that case, plus future-proofs us against any others where we
might try to show an unparsed commit.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects.c                         | 3 +++
 t/t6102-rev-list-unexpected-objects.sh | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index bb7e61ef4b..b5651ddd5b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -374,6 +374,9 @@ static void do_traverse(struct traversal_context *ctx)
 			struct tree *tree = get_commit_tree(commit);
 			tree->object.flags |= NOT_USER_GIVEN;
 			add_pending_tree(ctx->revs, tree);
+		} else if (commit->object.parsed) {
+			die(_("unable to load root tree for commit %s"),
+			      oid_to_hex(&commit->object.oid));
 		}
 		ctx->show_commit(commit, ctx->show_data);
 
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index c8d4b31f8f..98f5cffbb6 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -68,8 +68,10 @@ test_expect_success 'traverse unexpected non-tree root (lone)' '
 	test_must_fail git rev-list --objects $broken_commit
 '
 
-test_expect_failure 'traverse unexpected non-tree root (seen)' '
-	test_must_fail git rev-list --objects $blob $broken_commit
+test_expect_success 'traverse unexpected non-tree root (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_commit \
+		>output 2>&1 &&
+	test_i18ngrep "not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit tag' '
-- 
2.21.0.203.g358da99528

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05  3:37 ` [PATCH 2/7] t: introduce tests for unexpected object types Taylor Blau
@ 2019-04-05 10:50   ` SZEDER Gábor
  2019-04-05 18:24     ` Jeff King
  2019-04-05 18:31   ` Jeff King
  1 sibling, 1 reply; 41+ messages in thread
From: SZEDER Gábor @ 2019-04-05 10:50 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, peff, gitster

On Thu, Apr 04, 2019 at 08:37:44PM -0700, Taylor Blau wrote:
> diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
> new file mode 100755
> index 0000000000..472b08528a
> --- /dev/null
> +++ b/t/t6102-rev-list-unexpected-objects.sh
> @@ -0,0 +1,123 @@
> +#!/bin/sh
> +
> +test_description='git rev-list should handle unexpected object types'
> +
> +. ./test-lib.sh
> +
> +test_expect_success 'setup well-formed objects' '
> +	blob="$(printf "foo" | git hash-object -w --stdin)" &&
> +	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
> +	commit="$(git commit-tree $tree -m "first commit")"
> +'
> +
> +test_expect_success 'setup unexpected non-blob entry' '
> +	printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
> +	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
> +'
> +
> +test_expect_failure 'traverse unexpected non-blob entry (lone)' '
> +	test_must_fail git rev-list --objects $broken_tree
> +'
> +
> +test_expect_failure 'traverse unexpected non-blob entry (seen)' '
> +	test_must_fail git rev-list --objects $tree $broken_tree
> +'
> +
> +test_expect_success 'setup unexpected non-tree entry' '
> +	printf "40000 foo\0$(echo $blob | hex2oct)" >broken-tree &&
> +	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree entry (lone)' '
> +	test_must_fail git rev-list --objects $broken_tree
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1

This test saves standard output and error, but doesn't look at them.

> +'
> +
> +test_expect_success 'setup unexpected non-commit parent' '
> +	git cat-file commit $commit |
> +		perl -lpe "/^author/ && print q(parent $blob)" \
> +		>broken-commit &&

Don't run git commands upstream of a pipe, because the pipe hides
their exit code.  This applies to several other tests below as well.

Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
dependency?

> +	broken_commit="$(git hash-object -w --literally -t commit \
> +		broken-commit)"
> +'
> +
> +test_expect_success 'traverse unexpected non-commit parent (lone)' '
> +	test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
> +	test_i18ngrep "not a commit" output

Please make sure that this "not a commit" message goes to the file
descriptor it is supposed to, i.e., assuming it's part of an error
message:

  test_must_fail git rev-list .... 2>err &&
  test_i18ngrep "..." err

This applies to several other tests below and in other patches as
well.

> +'
> +
> +test_expect_success 'traverse unexpected non-commit parent (seen)' '
> +	test_must_fail git rev-list --objects $commit $broken_commit \
> +		>output 2>&1 &&
> +	test_i18ngrep "not a commit" output
> +'
> +
> +test_expect_success 'setup unexpected non-tree root' '
> +	git cat-file commit $commit |
> +	sed -e "s/$tree/$blob/" >broken-commit &&
> +	broken_commit="$(git hash-object -w --literally -t commit \
> +		broken-commit)"
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree root (lone)' '
> +	test_must_fail git rev-list --objects $broken_commit
> +'
> +
> +test_expect_failure 'traverse unexpected non-tree root (seen)' '
> +	test_must_fail git rev-list --objects $blob $broken_commit
> +'
> +
> +test_expect_success 'setup unexpected non-commit tag' '
> +	git tag -a -m "tagged commit" tag $commit &&
> +	test_when_finished "git tag -d tag" &&
> +	git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
> +	tag=$(git hash-object -w --literally -t tag broken-tag)
> +'
> +
> +test_expect_success 'traverse unexpected non-commit tag (lone)' '
> +	test_must_fail git rev-list --objects $tag
> +'
> +
> +test_expect_success 'traverse unexpected non-commit tag (seen)' '
> +	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
> +	test_i18ngrep "not a commit" output
> +'
> +
> +test_expect_success 'setup unexpected non-tree tag' '
> +	git tag -a -m "tagged tree" tag $tree &&
> +	test_when_finished "git tag -d tag" &&
> +	git cat-file -p tag |
> +	sed -e "s/$tree/$blob/" >broken-tag &&
> +	tag=$(git hash-object -w --literally -t tag broken-tag)
> +'
> +
> +test_expect_success 'traverse unexpected non-tree tag (lone)' '
> +	test_must_fail git rev-list --objects $tag
> +'
> +
> +test_expect_success 'traverse unexpected non-tree tag (seen)' '
> +	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
> +	test_i18ngrep "not a tree" output
> +'
> +
> +test_expect_success 'setup unexpected non-blob tag' '
> +	git tag -a -m "tagged blob" tag $blob &&
> +	test_when_finished "git tag -d tag" &&
> +	git cat-file -p tag |
> +	sed -e "s/$blob/$commit/" >broken-tag &&
> +	tag=$(git hash-object -w --literally -t tag broken-tag)
> +'
> +
> +test_expect_failure 'traverse unexpected non-blob tag (lone)' '
> +	test_must_fail git rev-list --objects $tag
> +'
> +
> +test_expect_success 'traverse unexpected non-blob tag (seen)' '
> +	test_must_fail git rev-list --objects $commit $tag >output 2>&1 &&
> +	test_i18ngrep "not a blob" output
> +'
> +
> +test_done
> -- 
> 2.21.0.203.g358da99528
> 

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 10:50   ` SZEDER Gábor
@ 2019-04-05 18:24     ` Jeff King
  2019-04-05 18:42       ` SZEDER Gábor
                         ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Jeff King @ 2019-04-05 18:24 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git, gitster

On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:

> > +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> > +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
> 
> This test saves standard output and error, but doesn't look at them.

I think we want to be checking for "not a tree" here, which is later
added with the fix. But either we should have the test_i18ngrep here
initially, or we should add both the redirect and the grep with the fix.

> > +test_expect_success 'setup unexpected non-commit parent' '
> > +	git cat-file commit $commit |
> > +		perl -lpe "/^author/ && print q(parent $blob)" \
> > +		>broken-commit &&
> 
> Don't run git commands upstream of a pipe, because the pipe hides
> their exit code.  This applies to several other tests below as well.

I disagree with that rule here. We're not testing "cat-file" in any
meaningful way, but just getting some stock output from a known-good
commit.

> Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> dependency?

Heh, this was actually the subject of much discussion before the patches
hit the list. If you can write such a one-liner that is both readable
and portable, please share it. I got disgusted with sed and suggested
this perl.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05  3:37 ` [PATCH 2/7] t: introduce tests for unexpected object types Taylor Blau
  2019-04-05 10:50   ` SZEDER Gábor
@ 2019-04-05 18:31   ` Jeff King
  2019-04-06  5:23     ` Taylor Blau
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-05 18:31 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster

On Thu, Apr 04, 2019 at 08:37:44PM -0700, Taylor Blau wrote:

> Let A be the object referenced with an unexpected type, and B be the
> object doing the referencing. Do the following:
> 
>   - test 'git rev-list --objects A B'. This causes A to be "cached", and
>     presents the above scenario.
> 
> Likewise, if we have a tree entry that claims to be a tree (for example)
> but points to another object type (say, a blob), there are two ways we
> might find out:
> 
>   - when we call lookup_tree(), we might find that we've already seen
>     the object referenced as another type, in which case we'd get NULL
> 
>   - we call lookup_tree() successfully, but when we try to read the
>     object, we find out it's something else.
> 
> We should check that we behave sensibly in both cases (especially
> because it is easy for a malicious actor to provoke one case or the
> other).

I think our pasting together of multiple commits adding the lone/seen
cases ended up in some redundancy in the description. In particular, I'm
not sure what the first paragraph/bullet quoted above is trying to say,
as it corresponds to the second bullet in the later list. Maybe collapse
them together like:

  We might hit an unexpected type in two different ways (imagine we have
  a tree entry that claims to be a tree but actually points to a blob):

    - when we call lookup_tree(), we might find that we've already seen
      the object referenced as a blob, in which case we'd get NULL. We
      can exercise this with "git rev-list --objects $blob $tree", which
      guarantees that the blob will have been parsed before we look in
      the tree. These tests are marked as "seen" in the test script.

    - we call lookup_tree() successfully, but when we try to read the
      object, we find out it's something else. We construct our tests
      such that $blob is not otherwise mentioned in $tree. These tests
      are marked as "lone" in the script.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/7] rev-list: let traversal die when --missing is not in use
  2019-04-05  3:37 ` [PATCH 6/7] rev-list: let traversal die when --missing is not in use Taylor Blau
@ 2019-04-05 18:41   ` Jeff King
  2019-04-06  5:36     ` Taylor Blau
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-05 18:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster

On Thu, Apr 04, 2019 at 08:37:54PM -0700, Taylor Blau wrote:

>  3. have the traversal machinery communicate the failure to the caller,
>     so that it can decide how to proceed without re-evaluting the object
>     itself.
> 
> Of those, I think (3) is probably the best path forward. However, this
> patch does none of them. In the name of expediently fixing the
> regression to a normal "rev-list --objects" that we use for connectivity
> checks, this simply restores the pre-7c0fe330d5 behavior of having the
> traversal die as soon as it fails to load a tree (when --missing is set
> to MA_ERROR, which is the default).

I think this is worth doing, as it restores the earlier behavior. But a
few general thoughts (which I've shared already with you, but for the
benefit of the list):

 - actually doing the "communicate failure to the caller" would probably
   not be too bad as a single-bit PARSE_FAILED flag in obj->flags. But
   it does require the caller understanding which objects the traversal
   would try to parse (i.e., rev-list would have to understand that it
   is on its own to check blobs, even if they don't have a PARSE_FAILED
   flag).

 - speaking of blobs, this series does not help rev-list find a
   mis-typed or bit-rotted blob at all, because it never opens the
   blobs. Does that mean my expectations for rev-list are simply too
   high, and that we should be expecting fsck-like checks to catch
   these? I dunno.

   It would not be too expensive to convert the existing "do we have the
   blob" check in rev-list to "do we have it, and is its type correct?".
   But obviously finding bitrot would be super-expensive. Which leads me
   to...

 - there actually _is_ a --verify-objects option, which would check even
   blobs for bitrot. It was added long ago in 5a48d24012 (rev-list
   --verify-object, 2011-09-01) for use with check_connected(). But it
   was deemed too slow for normal use, and ripped out in d21c463d55
   (fetch/receive: remove over-pessimistic connectivity check,
   2012-03-15).

That last one implies that we're OK relying on the incoming index-pack
to catch these cases (which is going to do a sha1 over each object).

It does seem like we should bother to notice failures when it's _free_
to do so, which is the case with these tree-loading failures. Which is
basically what this patch is doing.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 18:24     ` Jeff King
@ 2019-04-05 18:42       ` SZEDER Gábor
  2019-04-05 18:52         ` Jeff King
  2019-04-05 19:25       ` Eric Sunshine
  2019-04-06  5:31       ` Taylor Blau
  2 siblings, 1 reply; 41+ messages in thread
From: SZEDER Gábor @ 2019-04-05 18:42 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, gitster

On Fri, Apr 05, 2019 at 02:24:12PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> 
> > > +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> > > +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
> > 
> > This test saves standard output and error, but doesn't look at them.
> 
> I think we want to be checking for "not a tree" here, which is later
> added with the fix. But either we should have the test_i18ngrep here
> initially, or we should add both the redirect and the grep with the fix.
> 
> > > +test_expect_success 'setup unexpected non-commit parent' '
> > > +	git cat-file commit $commit |
> > > +		perl -lpe "/^author/ && print q(parent $blob)" \
> > > +		>broken-commit &&
> > 
> > Don't run git commands upstream of a pipe, because the pipe hides
> > their exit code.  This applies to several other tests below as well.
> 
> I disagree with that rule here. We're not testing "cat-file" in any
> meaningful way, but just getting some stock output from a known-good
> commit.

It makes auditing harder and sets bad example.
 
> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > dependency?
> 
> Heh, this was actually the subject of much discussion before the patches
> hit the list. If you can write such a one-liner that is both readable
> and portable, please share it. I got disgusted with sed and suggested
> this perl.

Hm, so the trivial s/// with '\n' in the replacement part is not
portable, then?  Oh, well.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 18:42       ` SZEDER Gábor
@ 2019-04-05 18:52         ` Jeff King
  2019-04-07 21:00           ` Ævar Arnfjörð Bjarmason
  2019-04-08  5:27           ` Junio C Hamano
  0 siblings, 2 replies; 41+ messages in thread
From: Jeff King @ 2019-04-05 18:52 UTC (permalink / raw)
  To: SZEDER Gábor; +Cc: Taylor Blau, git, gitster

On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:

> > > Don't run git commands upstream of a pipe, because the pipe hides
> > > their exit code.  This applies to several other tests below as well.
> > 
> > I disagree with that rule here. We're not testing "cat-file" in any
> > meaningful way, but just getting some stock output from a known-good
> > commit.
> 
> It makes auditing harder and sets bad example.

I disagree that it's a bad example. Just because a reader might not
realize that it is sometimes OK and sometimes not, does not make it bad
to do so in the OK case. It means the reader needs to understand the
rule in order to correctly apply it.

I am sympathetic to the auditing issue, though.

I dunno. In this case it is not too bad to do:

  git cat-file commit $commit >good-commit &&
  perl ... <good-commit >broken-commit

but I am not sure I am on board with a blanket rule of "git must never
be on the left-hand side of a pipe".

> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > > dependency?
> > 
> > Heh, this was actually the subject of much discussion before the patches
> > hit the list. If you can write such a one-liner that is both readable
> > and portable, please share it. I got disgusted with sed and suggested
> > this perl.
> 
> Hm, so the trivial s/// with '\n' in the replacement part is not
> portable, then?  Oh, well.

I don't think it is, but I could be wrong. POSIX does say that "\n"
matches a newline in the pattern space, but nothing about it on the RHS
of a substitution. I have a vague feeling of running into problems in
the past, but I could just be misremembering.

We also tried matching /^tree/ and using "a\" to append a line, but it
was both hard to read and hit portability issues with bsd sed.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 18:24     ` Jeff King
  2019-04-05 18:42       ` SZEDER Gábor
@ 2019-04-05 19:25       ` Eric Sunshine
  2019-04-05 20:53         ` Jeff King
  2019-04-08  6:44         ` Junio C Hamano
  2019-04-06  5:31       ` Taylor Blau
  2 siblings, 2 replies; 41+ messages in thread
From: Eric Sunshine @ 2019-04-05 19:25 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Taylor Blau, Git List, Junio C Hamano

On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> > > +   git cat-file commit $commit |
> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > +           >broken-commit &&
>
> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > dependency?
>
> Heh, this was actually the subject of much discussion before the patches
> hit the list. If you can write such a one-liner that is both readable
> and portable, please share it. I got disgusted with sed and suggested
> this perl.

Trivial and portable 'sed' equivalent:

git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 19:25       ` Eric Sunshine
@ 2019-04-05 20:53         ` Jeff King
  2019-04-06  5:33           ` Taylor Blau
  2019-04-08  6:44         ` Junio C Hamano
  1 sibling, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-05 20:53 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: SZEDER Gábor, Taylor Blau, Git List, Junio C Hamano

On Fri, Apr 05, 2019 at 03:25:43PM -0400, Eric Sunshine wrote:

> On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> > On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> > > > +   git cat-file commit $commit |
> > > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > > +           >broken-commit &&
> >
> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > > dependency?
> >
> > Heh, this was actually the subject of much discussion before the patches
> > hit the list. If you can write such a one-liner that is both readable
> > and portable, please share it. I got disgusted with sed and suggested
> > this perl.
> 
> Trivial and portable 'sed' equivalent:
> 
> git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"

I always forget about the hold space. That's pretty readable (though
being sed, it's terse enough that I actually think the perl is more
readable; that may be personal taste, though).

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 18:31   ` Jeff King
@ 2019-04-06  5:23     ` Taylor Blau
  0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-06  5:23 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, gitster

Hi Peff,

On Fri, Apr 05, 2019 at 02:31:42PM -0400, Jeff King wrote:
> On Thu, Apr 04, 2019 at 08:37:44PM -0700, Taylor Blau wrote:
>
> > Let A be the object referenced with an unexpected type, and B be the
> > object doing the referencing. Do the following:
> >
> >   - test 'git rev-list --objects A B'. This causes A to be "cached", and
> >     presents the above scenario.
> >
> > Likewise, if we have a tree entry that claims to be a tree (for example)
> > but points to another object type (say, a blob), there are two ways we
> > might find out:
> >
> >   - when we call lookup_tree(), we might find that we've already seen
> >     the object referenced as another type, in which case we'd get NULL
> >
> >   - we call lookup_tree() successfully, but when we try to read the
> >     object, we find out it's something else.
> >
> > We should check that we behave sensibly in both cases (especially
> > because it is easy for a malicious actor to provoke one case or the
> > other).
>
> I think our pasting together of multiple commits adding the lone/seen
> cases ended up in some redundancy in the description. In particular, I'm
> not sure what the first paragraph/bullet quoted above is trying to say,
> as it corresponds to the second bullet in the later list.

I agree that this is at the very least redundant, and at the most,
confusing. I think that you're right that this is the result of
squashing the commits often enough that some of this cruft stuck around
and got combined in ways that it shouldn't have.

> Maybe collapse them together like:
>
>   We might hit an unexpected type in two different ways (imagine we have
>   a tree entry that claims to be a tree but actually points to a blob):
>
>     - when we call lookup_tree(), we might find that we've already seen
>       the object referenced as a blob, in which case we'd get NULL. We
>       can exercise this with "git rev-list --objects $blob $tree", which
>       guarantees that the blob will have been parsed before we look in
>       the tree. These tests are marked as "seen" in the test script.
>
>     - we call lookup_tree() successfully, but when we try to read the
>       object, we find out it's something else. We construct our tests
>       such that $blob is not otherwise mentioned in $tree. These tests
>       are marked as "lone" in the script.

Indeed, this is much cleaner (and thanks for writing it here, and saving
me the work). I think that I will take this as-is for 2/7 in v2.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 18:24     ` Jeff King
  2019-04-05 18:42       ` SZEDER Gábor
  2019-04-05 19:25       ` Eric Sunshine
@ 2019-04-06  5:31       ` Taylor Blau
  2 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-06  5:31 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Taylor Blau, git, gitster

On Fri, Apr 05, 2019 at 02:24:12PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
>
> > > +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
> > > +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
> >
> > This test saves standard output and error, but doesn't look at them.
>
> I think we want to be checking for "not a tree" here, which is later
> added with the fix. But either we should have the test_i18ngrep here
> initially, or we should add both the redirect and the grep with the fix.

Right, pointing out that saving the standard output and error of 'git
rev-list' and then doing nothing with it as being redundant is certainly
right.

I think that the 'fix' here is to write instead:

  +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
  +	test_must_fail git rev-list --objects $blob $broken_tree
  +'

And _then_ add '>output 2>&1 &&' to the end, capturing the output, as
well as the appropriate test_i18ngrep. This matches the pattern that
we've been otherwise following in the series so far.

(FWIW, I think that this is also the result of squashing the series down
a few times...)

> > > +test_expect_success 'setup unexpected non-commit parent' '
> > > +	git cat-file commit $commit |
> > > +		perl -lpe "/^author/ && print q(parent $blob)" \
> > > +		>broken-commit &&
> >
> > Don't run git commands upstream of a pipe, because the pipe hides
> > their exit code.  This applies to several other tests below as well.
>
> I disagree with that rule here. We're not testing "cat-file" in any
> meaningful way, but just getting some stock output from a known-good
> commit.
>
> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > dependency?
>
> Heh, this was actually the subject of much discussion before the patches
> hit the list. If you can write such a one-liner that is both readable
> and portable, please share it. I got disgusted with sed and suggested
> this perl.

I admit that this gave me a chuckle, too. When preparing this series to
send it, I did something like:

  $ git rebase -x 'make && cd t && ./t6102-*.sh

but found that it was broken on the BSD sed that ships with macOS
10.14.2. I didn't notice until preparing the series for upstream because
I wrote it on my Debian 9 VM, with GNU sed (where it is not broken ;-)).

It was originally written as:

	test_expect_success 'setup unexpected non-commit parent' '
		git cat-file commit $commit | sed "/^tree/a\
		parent $blob" >broken-commit &&
		broken_commit="$(git hash-object -w --literally -t commit \
			broken-commit)"
	'

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 20:53         ` Jeff King
@ 2019-04-06  5:33           ` Taylor Blau
  0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-06  5:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Eric Sunshine, SZEDER Gábor, Taylor Blau, Git List,
	Junio C Hamano

On Fri, Apr 05, 2019 at 04:53:45PM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 03:25:43PM -0400, Eric Sunshine wrote:
>
> > On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> > > On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> > > > > +   git cat-file commit $commit |
> > > > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > > > +           >broken-commit &&
> > >
> > > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> > > > dependency?
> > >
> > > Heh, this was actually the subject of much discussion before the patches
> > > hit the list. If you can write such a one-liner that is both readable
> > > and portable, please share it. I got disgusted with sed and suggested
> > > this perl.
> >
> > Trivial and portable 'sed' equivalent:
> >
> > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
>
> I always forget about the hold space. That's pretty readable (though
> being sed, it's terse enough that I actually think the perl is more
> readable; that may be personal taste, though).

Ah, very nice. Thanks Eric, your sed-fu is much stronger than mine ;-).

I share Peff's view that this might be less readable than its Perl
counterpart, but am similarly sympathetic to the notion that more Perl
is a bad example in 't'.

I think that I'll take this for v2 and get rid of the Perl.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/7] rev-list: let traversal die when --missing is not in use
  2019-04-05 18:41   ` Jeff King
@ 2019-04-06  5:36     ` Taylor Blau
  2019-04-07 13:41       ` Jeff King
  0 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2019-04-06  5:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, gitster

On Fri, Apr 05, 2019 at 02:41:11PM -0400, Jeff King wrote:
> On Thu, Apr 04, 2019 at 08:37:54PM -0700, Taylor Blau wrote:
>
> >  3. have the traversal machinery communicate the failure to the caller,
> >     so that it can decide how to proceed without re-evaluting the object
> >     itself.
> >
> > Of those, I think (3) is probably the best path forward. However, this
> > patch does none of them. In the name of expediently fixing the
> > regression to a normal "rev-list --objects" that we use for connectivity
> > checks, this simply restores the pre-7c0fe330d5 behavior of having the
> > traversal die as soon as it fails to load a tree (when --missing is set
> > to MA_ERROR, which is the default).
>
> I think this is worth doing, as it restores the earlier behavior. But a
> few general thoughts (which I've shared already with you, but for the
> benefit of the list):

I agree that it's worth doing. One question that I have is _when_ you
feel it's good to do. I'm happy to write it and include the change in
v2, but if others would be happy not to grow the series too much between
re-rolls, I'd be just as pleased to send it in a new series after this
one.

>  - actually doing the "communicate failure to the caller" would probably
>    not be too bad as a single-bit PARSE_FAILED flag in obj->flags. But
>    it does require the caller understanding which objects the traversal
>    would try to parse (i.e., rev-list would have to understand that it
>    is on its own to check blobs, even if they don't have a PARSE_FAILED
>    flag).
>
>  - speaking of blobs, this series does not help rev-list find a
>    mis-typed or bit-rotted blob at all, because it never opens the
>    blobs. Does that mean my expectations for rev-list are simply too
>    high, and that we should be expecting fsck-like checks to catch
>    these? I dunno.
>
>    It would not be too expensive to convert the existing "do we have the
>    blob" check in rev-list to "do we have it, and is its type correct?".
>    But obviously finding bitrot would be super-expensive. Which leads me
>    to...
>
>  - there actually _is_ a --verify-objects option, which would check even
>    blobs for bitrot. It was added long ago in 5a48d24012 (rev-list
>    --verify-object, 2011-09-01) for use with check_connected(). But it
>    was deemed too slow for normal use, and ripped out in d21c463d55
>    (fetch/receive: remove over-pessimistic connectivity check,
>    2012-03-15).
>
> That last one implies that we're OK relying on the incoming index-pack
> to catch these cases (which is going to do a sha1 over each object).
>
> It does seem like we should bother to notice failures when it's _free_
> to do so, which is the case with these tree-loading failures. Which is
> basically what this patch is doing.
>
> -Peff
Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/7] rev-list: let traversal die when --missing is not in use
  2019-04-06  5:36     ` Taylor Blau
@ 2019-04-07 13:41       ` Jeff King
  2019-04-09  2:11         ` Taylor Blau
  0 siblings, 1 reply; 41+ messages in thread
From: Jeff King @ 2019-04-07 13:41 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, gitster

On Fri, Apr 05, 2019 at 10:36:48PM -0700, Taylor Blau wrote:

> > > Of those, I think (3) is probably the best path forward. However, this
> > > patch does none of them. In the name of expediently fixing the
> > > regression to a normal "rev-list --objects" that we use for connectivity
> > > checks, this simply restores the pre-7c0fe330d5 behavior of having the
> > > traversal die as soon as it fails to load a tree (when --missing is set
> > > to MA_ERROR, which is the default).
> >
> > I think this is worth doing, as it restores the earlier behavior. But a
> > few general thoughts (which I've shared already with you, but for the
> > benefit of the list):
> 
> I agree that it's worth doing. One question that I have is _when_ you
> feel it's good to do. I'm happy to write it and include the change in
> v2, but if others would be happy not to grow the series too much between
> re-rolls, I'd be just as pleased to send it in a new series after this
> one.

I'm not sure what "it" is here. My earlier message was admittedly
rambling, but I think I'm arguing that it's OK to continue to include
this patch that you already have, and punt further changes to make
"rev-list --objects" detect blob problems down the road. I.e., leave the
two expect_failures in place that your v1 series ends with.

-Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 18:52         ` Jeff King
@ 2019-04-07 21:00           ` Ævar Arnfjörð Bjarmason
  2019-04-09  2:29             ` Taylor Blau
  2019-04-08  5:27           ` Junio C Hamano
  1 sibling, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-07 21:00 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Taylor Blau, git, gitster


On Fri, Apr 05 2019, Jeff King wrote:

> On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
>
>> > > Don't run git commands upstream of a pipe, because the pipe hides
>> > > their exit code.  This applies to several other tests below as well.
>> >
>> > I disagree with that rule here. We're not testing "cat-file" in any
>> > meaningful way, but just getting some stock output from a known-good
>> > commit.
>>
>> It makes auditing harder and sets bad example.
>
> I disagree that it's a bad example. Just because a reader might not
> realize that it is sometimes OK and sometimes not, does not make it bad
> to do so in the OK case. It means the reader needs to understand the
> rule in order to correctly apply it.

FWIW I thought the rule was something to the effect of "we're hacking on
git, any change might segfault in some unexpected test, let's make sure
that fails right away", hence the blanket rule in t/README against "!
git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
output of a git command to a pipe" documented right after that.

> I am sympathetic to the auditing issue, though.
>
> I dunno. In this case it is not too bad to do:
>
>   git cat-file commit $commit >good-commit &&
>   perl ... <good-commit >broken-commit
>
> but I am not sure I am on board with a blanket rule of "git must never
> be on the left-hand side of a pipe".
>
>> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
>> > > dependency?
>> >
>> > Heh, this was actually the subject of much discussion before the patches
>> > hit the list. If you can write such a one-liner that is both readable
>> > and portable, please share it. I got disgusted with sed and suggested
>> > this perl.
>>
>> Hm, so the trivial s/// with '\n' in the replacement part is not
>> portable, then?  Oh, well.
>
> I don't think it is, but I could be wrong. POSIX does say that "\n"
> matches a newline in the pattern space, but nothing about it on the RHS
> of a substitution. I have a vague feeling of running into problems in
> the past, but I could just be misremembering.
>
> We also tried matching /^tree/ and using "a\" to append a line, but it
> was both hard to read and hit portability issues with bsd sed.
>
> -Peff

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 18:52         ` Jeff King
  2019-04-07 21:00           ` Ævar Arnfjörð Bjarmason
@ 2019-04-08  5:27           ` Junio C Hamano
  1 sibling, 0 replies; 41+ messages in thread
From: Junio C Hamano @ 2019-04-08  5:27 UTC (permalink / raw)
  To: Jeff King; +Cc: SZEDER Gábor, Taylor Blau, git

Jeff King <peff@peff.net> writes:

> I don't think it is, but I could be wrong. POSIX does say that "\n"
> matches a newline in the pattern space, but nothing about it on the RHS
> of a substitution. I have a vague feeling of running into problems in
> the past, but I could just be misremembering.

Yes, it was quite bad on minority platforms like AIX when I had to
touch it the last time.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-05 19:25       ` Eric Sunshine
  2019-04-05 20:53         ` Jeff King
@ 2019-04-08  6:44         ` Junio C Hamano
  2019-04-09  2:30           ` Taylor Blau
  1 sibling, 1 reply; 41+ messages in thread
From: Junio C Hamano @ 2019-04-08  6:44 UTC (permalink / raw)
  To: Eric Sunshine; +Cc: Jeff King, SZEDER Gábor, Taylor Blau, Git List

Eric Sunshine <sunshine@sunshineco.com> writes:

> On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
>> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
>> > > +   git cat-file commit $commit |
>> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
>> > > +           >broken-commit &&
>>
>> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
>> > dependency?
>>
>> Heh, this was actually the subject of much discussion before the patches
>> hit the list. If you can write such a one-liner that is both readable
>> and portable, please share it. I got disgusted with sed and suggested
>> this perl.
>
> Trivial and portable 'sed' equivalent:
>
> git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"

Looks good.  I had a bit of head scratching moment when I saw that
"perl -lpe" one-liner; this sed expression may not be crystal clear
to those who are not used to, but it is not so bad, either.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 6/7] rev-list: let traversal die when --missing is not in use
  2019-04-07 13:41       ` Jeff King
@ 2019-04-09  2:11         ` Taylor Blau
  0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-09  2:11 UTC (permalink / raw)
  To: Jeff King; +Cc: Taylor Blau, git, gitster

Hi Peff,

On Sun, Apr 07, 2019 at 09:41:13AM -0400, Jeff King wrote:
> On Fri, Apr 05, 2019 at 10:36:48PM -0700, Taylor Blau wrote:
>
> > > > Of those, I think (3) is probably the best path forward. However, this
> > > > patch does none of them. In the name of expediently fixing the
> > > > regression to a normal "rev-list --objects" that we use for connectivity
> > > > checks, this simply restores the pre-7c0fe330d5 behavior of having the
> > > > traversal die as soon as it fails to load a tree (when --missing is set
> > > > to MA_ERROR, which is the default).
> > >
> > > I think this is worth doing, as it restores the earlier behavior. But a
> > > few general thoughts (which I've shared already with you, but for the
> > > benefit of the list):
> >
> > I agree that it's worth doing. One question that I have is _when_ you
> > feel it's good to do. I'm happy to write it and include the change in
> > v2, but if others would be happy not to grow the series too much between
> > re-rolls, I'd be just as pleased to send it in a new series after this
> > one.
>
> I'm not sure what "it" is here.

Yes... as I read this email again after the weekend had passed, I found
myself a little confused, too.

> My earlier message was admittedly rambling, but I think I'm arguing
> that it's OK to continue to include this patch that you already have,
> and punt further changes to make "rev-list --objects" detect blob
> problems down the road. I.e., leave the two expect_failures in place
> that your v1 series ends with.

I believe that that was the "it" that I was talking about it. To be
explicit, I think I was suggesting that we should not change this patch
much or add more to the series, and rather address the blob checking in
a new series after this one.

> -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-07 21:00           ` Ævar Arnfjörð Bjarmason
@ 2019-04-09  2:29             ` Taylor Blau
  2019-04-09  9:14               ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2019-04-09  2:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jeff King, SZEDER Gábor, Taylor Blau, git, gitster

Hi Ævar,

On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Fri, Apr 05 2019, Jeff King wrote:
>
> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
> >
> >> > > Don't run git commands upstream of a pipe, because the pipe hides
> >> > > their exit code.  This applies to several other tests below as well.
> >> >
> >> > I disagree with that rule here. We're not testing "cat-file" in any
> >> > meaningful way, but just getting some stock output from a known-good
> >> > commit.
> >>
> >> It makes auditing harder and sets bad example.
> >
> > I disagree that it's a bad example. Just because a reader might not
> > realize that it is sometimes OK and sometimes not, does not make it bad
> > to do so in the OK case. It means the reader needs to understand the
> > rule in order to correctly apply it.
>
> FWIW I thought the rule was something to the effect of "we're hacking on
> git, any change might segfault in some unexpected test, let's make sure
> that fails right away", hence the blanket rule in t/README against "!
> git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
> output of a git command to a pipe" documented right after that.

I have some loosely-held opinions on this. Certainly knowing if a change
to git caused a segfault in some test is something that we would want to
know about, though I'm not sure we're loosing anything by putting 'git'
on the left-hand side of a pipe here.

  - If someone wrote a change to git that introduced a segfault in 'git
    cat-file', I'm sure that this would not be the only place that in
    the suite that would break because of it. Presumably, at least one
    of those places uses 'test_must_fail git ...' instead

  - At the very least, 'broken-commit' doesn't contain what it needs to,
    the test breaks in another way (albeit further from the actual
    defect), and the developer finds out about their bug that way.

In any case, these above two might be moot anyways, because I'm almost
certain that it will be a rarity for someone to _only_ run t6102, unless
it is included in a blanket 'make' from within 't'.

> > I am sympathetic to the auditing issue, though.

Just to satisfy my curiosity, I put git on the left-hand side of a pipe
to see how many such examples exist today:

  ~/g/git (master) $ git grep 'git cat-file .* |' -- t/t*.sh | wc -l
      179

I'm not going to claim that changing 179 -> 180 is neutral or bad, but
it's interesting nonetheless.

> > I dunno. In this case it is not too bad to do:
> >
> >   git cat-file commit $commit >good-commit &&
> >   perl ... <good-commit >broken-commit
> >
> > but I am not sure I am on board with a blanket rule of "git must never
> > be on the left-hand side of a pipe".

I think that I mostly agree with Peff here for the reasons above.

That all said, I don't really want to hold up the series for this alone.
Since there don't seem to be many other comments or issues, I would be
quite happy to do whatever people are most in favor of.

I basically don't really feel strongly about writing:

  git cat-file commit $commit | sed -e ... >broken-commit

as opposed to:

  git cat-file commit $commit >good-commit &&
  sed -e '' <commit >bad-commit

> >> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> > > dependency?
> >> >
> >> > Heh, this was actually the subject of much discussion before the patches
> >> > hit the list. If you can write such a one-liner that is both readable
> >> > and portable, please share it. I got disgusted with sed and suggested
> >> > this perl.
> >>
> >> Hm, so the trivial s/// with '\n' in the replacement part is not
> >> portable, then?  Oh, well.
> >
> > I don't think it is, but I could be wrong. POSIX does say that "\n"
> > matches a newline in the pattern space, but nothing about it on the RHS
> > of a substitution. I have a vague feeling of running into problems in
> > the past, but I could just be misremembering.
> >
> > We also tried matching /^tree/ and using "a\" to append a line, but it
> > was both hard to read and hit portability issues with bsd sed.

I think that all of this discussion is addressed elsewhere in thread, but
the gist is that Eric provided a suitable sed invocation that I am going
to use instead of Peff's Perl.

> > -Peff

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-08  6:44         ` Junio C Hamano
@ 2019-04-09  2:30           ` Taylor Blau
  2019-04-09  3:28             ` Eric Sunshine
  0 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2019-04-09  2:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Eric Sunshine, Jeff King, SZEDER Gábor, Taylor Blau,
	Git List

Hi Junio,


On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> Eric Sunshine <sunshine@sunshineco.com> writes:
>
> > On Fri, Apr 5, 2019 at 2:24 PM Jeff King <peff@peff.net> wrote:
> >> On Fri, Apr 05, 2019 at 12:50:33PM +0200, SZEDER Gábor wrote:
> >> > > +   git cat-file commit $commit |
> >> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> >> > > +           >broken-commit &&
> >>
> >> > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> > dependency?
> >>
> >> Heh, this was actually the subject of much discussion before the patches
> >> hit the list. If you can write such a one-liner that is both readable
> >> and portable, please share it. I got disgusted with sed and suggested
> >> this perl.
> >
> > Trivial and portable 'sed' equivalent:
> >
> > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
>
> Looks good.  I had a bit of head scratching moment when I saw that
> "perl -lpe" one-liner; this sed expression may not be crystal clear
> to those who are not used to, but it is not so bad, either.

Should I take this as your endorsement of putting 'git' on the left-hand
side of a pipe? ;-).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-09  2:30           ` Taylor Blau
@ 2019-04-09  3:28             ` Eric Sunshine
  2019-04-09  5:08               ` Taylor Blau
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2019-04-09  3:28 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, Jeff King, SZEDER Gábor, Git List

On Mon, Apr 8, 2019 at 10:31 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> > Eric Sunshine <sunshine@sunshineco.com> writes:
> > >> > > +   git cat-file commit $commit |
> > >> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > >> > > +           >broken-commit &&
> > >
> > > Trivial and portable 'sed' equivalent:
> > >
> > > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
> >
> > Looks good.  I had a bit of head scratching moment when I saw that
> > "perl -lpe" one-liner; this sed expression may not be crystal clear
> > to those who are not used to, but it is not so bad, either.
>
> Should I take this as your endorsement of putting 'git' on the left-hand
> side of a pipe? ;-).

I suspect that Junio's "Looks good" was referring to the 'sed expression.

With all the recent work of moving away from having Git upstream of a
pipe, let's not intentionally introduce a new instance. I wrote the
example 'sed' expression that way merely to mirror how the original
'perl' version was written to make it easier to see the equivalence
(not because it was intended as an endorsement of having Git upstream
of a pipe).

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-09  3:28             ` Eric Sunshine
@ 2019-04-09  5:08               ` Taylor Blau
  2019-04-09  8:02                 ` Eric Sunshine
  0 siblings, 1 reply; 41+ messages in thread
From: Taylor Blau @ 2019-04-09  5:08 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Taylor Blau, Junio C Hamano, Jeff King, SZEDER Gábor,
	Git List

Hi Eric,

On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> On Mon, Apr 8, 2019 at 10:31 PM Taylor Blau <me@ttaylorr.com> wrote:
> > On Mon, Apr 08, 2019 at 03:44:25PM +0900, Junio C Hamano wrote:
> > > Eric Sunshine <sunshine@sunshineco.com> writes:
> > > >> > > +   git cat-file commit $commit |
> > > >> > > +           perl -lpe "/^author/ && print q(parent $blob)" \
> > > >> > > +           >broken-commit &&
> > > >
> > > > Trivial and portable 'sed' equivalent:
> > > >
> > > > git cat-file commit $commit | sed "/^author/ { h; s/.*/parent $blob/; G; }"
> > >
> > > Looks good.  I had a bit of head scratching moment when I saw that
> > > "perl -lpe" one-liner; this sed expression may not be crystal clear
> > > to those who are not used to, but it is not so bad, either.
> >
> > Should I take this as your endorsement of putting 'git' on the left-hand
> > side of a pipe? ;-).
>
> I suspect that Junio's "Looks good" was referring to the 'sed expression.

I think that you are right -- and I'll happily _not_ introduce more Git
on the left-hand-side of a pipe instances.

I noticed a few more instances in t6102 where we do something similar
to:

  git cat-file -p <something> | sed ... >broken-<something> &&

I wrote the following patch, which I've folded into my local copy (and
will send with v2):

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index b9d82f9542..15072ecce3 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -7,7 +7,8 @@ test_description='git rev-list should handle unexpected object types'
 test_expect_success 'setup well-formed objects' '
 	blob="$(printf "foo" | git hash-object -w --stdin)" &&
 	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
-	commit="$(git commit-tree $tree -m "first commit")"
+	commit="$(git commit-tree $tree -m "first commit")" &&
+	git cat-file commit $commit >good-commit
 '

 test_expect_success 'setup unexpected non-blob entry' '
@@ -37,8 +38,8 @@ test_expect_failure 'traverse unexpected non-tree entry (seen)' '
 '

 test_expect_success 'setup unexpected non-commit parent' '
-	git cat-file commit $commit | \
-		sed "/^author/ { h; s/.*/parent $blob/; G; }" >broken-commit &&
+	sed "/^author/ { h; s/.*/parent $blob/; G; }" <good-commit \
+		>broken-commit &&
 	broken_commit="$(git hash-object -w --literally -t commit \
 		broken-commit)"
 '
@@ -55,8 +56,7 @@ test_expect_success 'traverse unexpected non-commit parent (seen)' '
 '

 test_expect_success 'setup unexpected non-tree root' '
-	git cat-file commit $commit |
-	sed -e "s/$tree/$blob/" >broken-commit &&
+	sed -e "s/$tree/$blob/" <good-commit >broken-commit &&
 	broken_commit="$(git hash-object -w --literally -t commit \
 		broken-commit)"
 '
@@ -71,8 +71,9 @@ test_expect_failure 'traverse unexpected non-tree root (seen)' '

 test_expect_success 'setup unexpected non-commit tag' '
 	git tag -a -m "tagged commit" tag $commit &&
+	git cat-file tag tag >good-tag &&
 	test_when_finished "git tag -d tag" &&
-	git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
+	sed -e "s/$commit/$blob/" <good-tag >broken-tag &&
 	tag=$(git hash-object -w --literally -t tag broken-tag)
 '

@@ -87,9 +88,9 @@ test_expect_success 'traverse unexpected non-commit tag (seen)' '

 test_expect_success 'setup unexpected non-tree tag' '
 	git tag -a -m "tagged tree" tag $tree &&
+	git cat-file tag tag >good-tag &&
 	test_when_finished "git tag -d tag" &&
-	git cat-file -p tag |
-	sed -e "s/$tree/$blob/" >broken-tag &&
+	sed -e "s/$tree/$blob/" <good-tag >broken-tag &&
 	tag=$(git hash-object -w --literally -t tag broken-tag)
 '

@@ -104,9 +105,9 @@ test_expect_success 'traverse unexpected non-tree tag (seen)' '

 test_expect_success 'setup unexpected non-blob tag' '
 	git tag -a -m "tagged blob" tag $blob &&
+	git cat-file tag tag >good-tag &&
 	test_when_finished "git tag -d tag" &&
-	git cat-file -p tag |
-	sed -e "s/$blob/$commit/" >broken-tag &&
+	sed -e "s/$blob/$commit/" <good-tag >broken-tag &&
 	tag=$(git hash-object -w --literally -t tag broken-tag)
 '

> With all the recent work of moving away from having Git upstream of a
> pipe, let's not intentionally introduce a new instance. I wrote the
> example 'sed' expression that way merely to mirror how the original
> 'perl' version was written to make it easier to see the equivalence
> (not because it was intended as an endorsement of having Git upstream
> of a pipe).

I see, and thank you for the clarification. Let me know if you like the
patch above.

Thanks,
Taylor

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-09  5:08               ` Taylor Blau
@ 2019-04-09  8:02                 ` Eric Sunshine
  2019-04-10  1:54                   ` Taylor Blau
  0 siblings, 1 reply; 41+ messages in thread
From: Eric Sunshine @ 2019-04-09  8:02 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Junio C Hamano, Jeff King, SZEDER Gábor, Git List

On Tue, Apr 9, 2019 at 1:09 AM Taylor Blau <me@ttaylorr.com> wrote:
> On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> > I suspect that Junio's "Looks good" was referring to the 'sed expression.
>
> I think that you are right -- and I'll happily _not_ introduce more Git
> on the left-hand-side of a pipe instances.
>
> I noticed a few more instances in t6102 [...]

Indeed, SZEDER mentioned those in [1]:

    Don't run git commands upstream of a pipe, because the pipe
    hides their exit code.  This applies to several other tests
    below as well.

[1]: http://public-inbox.org/git/20190405105033.GT32732@szeder.dev/

> I wrote the following patch, which I've folded into my local copy (and
> will send with v2):
>
> > With all the recent work of moving away from having Git upstream of a
> > pipe, let's not intentionally introduce a new instance. I wrote the
> > example 'sed' expression that way merely to mirror how the original
> > 'perl' version was written to make it easier to see the equivalence
> > (not because it was intended as an endorsement of having Git upstream
> > of a pipe).
>
> I see, and thank you for the clarification. Let me know if you like the
> patch above.

Looks fine. Thanks.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-09  2:29             ` Taylor Blau
@ 2019-04-09  9:14               ` Ævar Arnfjörð Bjarmason
  2019-04-10  1:59                 ` Taylor Blau
  0 siblings, 1 reply; 41+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2019-04-09  9:14 UTC (permalink / raw)
  To: Taylor Blau; +Cc: Jeff King, SZEDER Gábor, git, gitster


On Tue, Apr 09 2019, Taylor Blau wrote:

> Hi Ævar,
>
> On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Fri, Apr 05 2019, Jeff King wrote:
>>
>> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
>> >
>> >> > > Don't run git commands upstream of a pipe, because the pipe hides
>> >> > > their exit code.  This applies to several other tests below as well.
>> >> >
>> >> > I disagree with that rule here. We're not testing "cat-file" in any
>> >> > meaningful way, but just getting some stock output from a known-good
>> >> > commit.
>> >>
>> >> It makes auditing harder and sets bad example.
>> >
>> > I disagree that it's a bad example. Just because a reader might not
>> > realize that it is sometimes OK and sometimes not, does not make it bad
>> > to do so in the OK case. It means the reader needs to understand the
>> > rule in order to correctly apply it.
>>
>> FWIW I thought the rule was something to the effect of "we're hacking on
>> git, any change might segfault in some unexpected test, let's make sure
>> that fails right away", hence the blanket rule in t/README against "!
>> git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
>> output of a git command to a pipe" documented right after that.
>
> I have some loosely-held opinions on this. Certainly knowing if a change
> to git caused a segfault in some test is something that we would want to
> know about, though I'm not sure we're loosing anything by putting 'git'
> on the left-hand side of a pipe here.
>
>   - If someone wrote a change to git that introduced a segfault in 'git
>     cat-file', I'm sure that this would not be the only place that in
>     the suite that would break because of it. Presumably, at least one
>     of those places uses 'test_must_fail git ...' instead
>
>   - At the very least, 'broken-commit' doesn't contain what it needs to,
>     the test breaks in another way (albeit further from the actual
>     defect), and the developer finds out about their bug that way.
>
> In any case, these above two might be moot anyways, because I'm almost
> certain that it will be a rarity for someone to _only_ run t6102, unless
> it is included in a blanket 'make' from within 't'.

First. I realize we're talking about the light fixtures in the bike shed
at this point, but with that in mind...

I just think it's useful as a general rule, particularly since with the
"special setups" in the test mode we've found that all sorts of odd
tests we didn't expect to stress test some features turned out to cover
edge cases of them, e.g. when "GIT_TEST_COMMIT_GRAPH" was added we found
that a bunch of random stuff segfaulted all over the place.

So it's hard to say with any confidence in advance that something isn't
going to stress git in some unusual way and isn't useful to guard for
segfaults.

Of course if and when it segfaults it's likely to be seen by subsequent
tests. In this case I'll note that if git fails we'll happily run not
only perl/sed, but then hash-object the subsequent empty file, and then
(presumably) fail in the next test.

>> > I am sympathetic to the auditing issue, though.
>
> Just to satisfy my curiosity, I put git on the left-hand side of a pipe
> to see how many such examples exist today:
>
>   ~/g/git (master) $ git grep 'git cat-file .* |' -- t/t*.sh | wc -l
>       179
>
> I'm not going to claim that changing 179 -> 180 is neutral or bad, but
> it's interesting nonetheless.

Separate from the question of if we generally agree that some value of
"Y" makes for good coding style or not, we're always going to be in some
flux state where a bunch of examples in our existing codebase contradict
that, particularly in the test suite.

I think that's a bit unfortunate in some ways. It's the result of the
default "policy" that refactoring for refactoring's sakes is generally
frowned upon, so we don't tend to go back and mass-fix "X" to "Y" once
we agree "Y" is better than "X" for new code, just do it as we go when
new code is written, or existing tests are updated for other reasons
"while we're at it".

>> > I dunno. In this case it is not too bad to do:
>> >
>> >   git cat-file commit $commit >good-commit &&
>> >   perl ... <good-commit >broken-commit
>> >
>> > but I am not sure I am on board with a blanket rule of "git must never
>> > be on the left-hand side of a pipe".
>
> I think that I mostly agree with Peff here for the reasons above.
>
> That all said, I don't really want to hold up the series for this alone.
> Since there don't seem to be many other comments or issues, I would be
> quite happy to do whatever people are most in favor of.

FWIW this series LGTM as a whole. I think it says something about the
general quality of it that we're all in some giant nitpick thread about
perl v.s. sed and git on the LHS of a pipe or not :) I'm happy to have
it queued as-is. These test issues are minor...

> I basically don't really feel strongly about writing:
>
>   git cat-file commit $commit | sed -e ... >broken-commit
>
> as opposed to:
>
>   git cat-file commit $commit >good-commit &&
>   sed -e '' <commit >bad-commit
>
>> >> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
>> >> > > dependency?
>> >> >
>> >> > Heh, this was actually the subject of much discussion before the patches
>> >> > hit the list. If you can write such a one-liner that is both readable
>> >> > and portable, please share it. I got disgusted with sed and suggested
>> >> > this perl.
>> >>
>> >> Hm, so the trivial s/// with '\n' in the replacement part is not
>> >> portable, then?  Oh, well.
>> >
>> > I don't think it is, but I could be wrong. POSIX does say that "\n"
>> > matches a newline in the pattern space, but nothing about it on the RHS
>> > of a substitution. I have a vague feeling of running into problems in
>> > the past, but I could just be misremembering.
>> >
>> > We also tried matching /^tree/ and using "a\" to append a line, but it
>> > was both hard to read and hit portability issues with bsd sed.
>
> I think that all of this discussion is addressed elsewhere in thread, but
> the gist is that Eric provided a suitable sed invocation that I am going
> to use instead of Peff's Perl.
>
>> > -Peff
>
> Thanks,
> Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-09  8:02                 ` Eric Sunshine
@ 2019-04-10  1:54                   ` Taylor Blau
  0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  1:54 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Taylor Blau, Junio C Hamano, Jeff King, SZEDER Gábor,
	Git List

Hi Eric,

On Tue, Apr 09, 2019 at 04:02:23AM -0400, Eric Sunshine wrote:
> On Tue, Apr 9, 2019 at 1:09 AM Taylor Blau <me@ttaylorr.com> wrote:
> > On Mon, Apr 08, 2019 at 11:28:19PM -0400, Eric Sunshine wrote:
> > > I suspect that Junio's "Looks good" was referring to the 'sed expression.
> >
> > I think that you are right -- and I'll happily _not_ introduce more Git
> > on the left-hand-side of a pipe instances.
> >
> > I noticed a few more instances in t6102 [...]
>
> Indeed, SZEDER mentioned those in [1]:

Aha, thanks -- I'm not sure how I missed that in [1]. Credit is given
where it is due :-).

>     Don't run git commands upstream of a pipe, because the pipe
>     hides their exit code.  This applies to several other tests
>     below as well.
>
> [1]: http://public-inbox.org/git/20190405105033.GT32732@szeder.dev/
>
> > I wrote the following patch, which I've folded into my local copy (and
> > will send with v2):
> >
> > > With all the recent work of moving away from having Git upstream of a
> > > pipe, let's not intentionally introduce a new instance. I wrote the
> > > example 'sed' expression that way merely to mirror how the original
> > > 'perl' version was written to make it easier to see the equivalence
> > > (not because it was intended as an endorsement of having Git upstream
> > > of a pipe).
> >
> > I see, and thank you for the clarification. Let me know if you like the
> > patch above.
>
> Looks fine. Thanks.

Great -- I appreciate your review. I'll send v2 shortly with these
changes in it.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH 2/7] t: introduce tests for unexpected object types
  2019-04-09  9:14               ` Ævar Arnfjörð Bjarmason
@ 2019-04-10  1:59                 ` Taylor Blau
  0 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  1:59 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Taylor Blau, Jeff King, SZEDER Gábor, git, gitster

Hi Ævar,

On Tue, Apr 09, 2019 at 11:14:48AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Tue, Apr 09 2019, Taylor Blau wrote:
>
> > Hi Ævar,
> >
> > On Sun, Apr 07, 2019 at 11:00:19PM +0200, Ævar Arnfjörð Bjarmason wrote:
> >>
> >> On Fri, Apr 05 2019, Jeff King wrote:
> >>
> >> > On Fri, Apr 05, 2019 at 08:42:29PM +0200, SZEDER Gábor wrote:
> >> >
> >> >> > > Don't run git commands upstream of a pipe, because the pipe hides
> >> >> > > their exit code.  This applies to several other tests below as well.
> >> >> >
> >> >> > I disagree with that rule here. We're not testing "cat-file" in any
> >> >> > meaningful way, but just getting some stock output from a known-good
> >> >> > commit.
> >> >>
> >> >> It makes auditing harder and sets bad example.
> >> >
> >> > I disagree that it's a bad example. Just because a reader might not
> >> > realize that it is sometimes OK and sometimes not, does not make it bad
> >> > to do so in the OK case. It means the reader needs to understand the
> >> > rule in order to correctly apply it.
> >>
> >> FWIW I thought the rule was something to the effect of "we're hacking on
> >> git, any change might segfault in some unexpected test, let's make sure
> >> that fails right away", hence the blanket rule in t/README against "!
> >> git <cmd>" in favor of "test_must_fail git <cmd>", and "don't feed the
> >> output of a git command to a pipe" documented right after that.
> >
> > I have some loosely-held opinions on this. Certainly knowing if a change
> > to git caused a segfault in some test is something that we would want to
> > know about, though I'm not sure we're loosing anything by putting 'git'
> > on the left-hand side of a pipe here.
> >
> >   - If someone wrote a change to git that introduced a segfault in 'git
> >     cat-file', I'm sure that this would not be the only place that in
> >     the suite that would break because of it. Presumably, at least one
> >     of those places uses 'test_must_fail git ...' instead
> >
> >   - At the very least, 'broken-commit' doesn't contain what it needs to,
> >     the test breaks in another way (albeit further from the actual
> >     defect), and the developer finds out about their bug that way.
> >
> > In any case, these above two might be moot anyways, because I'm almost
> > certain that it will be a rarity for someone to _only_ run t6102, unless
> > it is included in a blanket 'make' from within 't'.
>
> First. I realize we're talking about the light fixtures in the bike shed
> at this point, but with that in mind...

I don't mind some bike-shedding ;-).

> I just think it's useful as a general rule, particularly since with the
> "special setups" in the test mode we've found that all sorts of odd
> tests we didn't expect to stress test some features turned out to cover
> edge cases of them, e.g. when "GIT_TEST_COMMIT_GRAPH" was added we found
> that a bunch of random stuff segfaulted all over the place.
>
> So it's hard to say with any confidence in advance that something isn't
> going to stress git in some unusual way and isn't useful to guard for
> segfaults.

Yes, I think that's certainly true. There's something to be said for my
argument, too, (i.e., that it's ``probably'' just fine), but as I think
about it more, I'm less convinced that mine is a position worth holding.

Even if we catch only a bug or two on an off chance, there aren't too
many things I'd sacrifice to not make it so. In other words, I'm willing
to do quite a lot in order to get even a little indication of when
things are broken (including replacing 'git ... | sed' with 'git ... >foo &&
sed ... <foo').

> Of course if and when it segfaults it's likely to be seen by subsequent
> tests. In this case I'll note that if git fails we'll happily run not
> only perl/sed, but then hash-object the subsequent empty file, and then
> (presumably) fail in the next test.

Sure, we'll probably find out about it anyway, but still...

> >> > I am sympathetic to the auditing issue, though.
> >
> > Just to satisfy my curiosity, I put git on the left-hand side of a pipe
> > to see how many such examples exist today:
> >
> >   ~/g/git (master) $ git grep 'git cat-file .* |' -- t/t*.sh | wc -l
> >       179
> >
> > I'm not going to claim that changing 179 -> 180 is neutral or bad, but
> > it's interesting nonetheless.
>
> Separate from the question of if we generally agree that some value of
> "Y" makes for good coding style or not, we're always going to be in some
> flux state where a bunch of examples in our existing codebase contradict
> that, particularly in the test suite.
>
> I think that's a bit unfortunate in some ways. It's the result of the
> default "policy" that refactoring for refactoring's sakes is generally
> frowned upon, so we don't tend to go back and mass-fix "X" to "Y" once
> we agree "Y" is better than "X" for new code, just do it as we go when
> new code is written, or existing tests are updated for other reasons
> "while we're at it".

Is that refactoring for its own sake, though? I would personally be
happy to write a series that either (1) identifies spots in 't' where
`git` is found on the left-hand side of a pipe, (2) fixes the bad-style
invocation in those spots, or (3) both.

> >> > I dunno. In this case it is not too bad to do:
> >> >
> >> >   git cat-file commit $commit >good-commit &&
> >> >   perl ... <good-commit >broken-commit
> >> >
> >> > but I am not sure I am on board with a blanket rule of "git must never
> >> > be on the left-hand side of a pipe".
> >
> > I think that I mostly agree with Peff here for the reasons above.
> >
> > That all said, I don't really want to hold up the series for this alone.
> > Since there don't seem to be many other comments or issues, I would be
> > quite happy to do whatever people are most in favor of.
>
> FWIW this series LGTM as a whole. I think it says something about the
> general quality of it that we're all in some giant nitpick thread about
> perl v.s. sed and git on the LHS of a pipe or not :) I'm happy to have
> it queued as-is. These test issues are minor...

Thanks, I appreciate that you think the series is in good shape. Since
there haven't been too many comments other than this thread, I'm going
to send v2 now.

I decided to ban t6102 of all such 'git ... | sed' invocations, and
described the change earlier in this thread.

> > I basically don't really feel strongly about writing:
> >
> >   git cat-file commit $commit | sed -e ... >broken-commit
> >
> > as opposed to:
> >
> >   git cat-file commit $commit >good-commit &&
> >   sed -e '' <commit >bad-commit
> >
> >> >> > > Wouldn't a 'sed' one-liner suffice, so we won't have yet another perl
> >> >> > > dependency?
> >> >> >
> >> >> > Heh, this was actually the subject of much discussion before the patches
> >> >> > hit the list. If you can write such a one-liner that is both readable
> >> >> > and portable, please share it. I got disgusted with sed and suggested
> >> >> > this perl.
> >> >>
> >> >> Hm, so the trivial s/// with '\n' in the replacement part is not
> >> >> portable, then?  Oh, well.
> >> >
> >> > I don't think it is, but I could be wrong. POSIX does say that "\n"
> >> > matches a newline in the pattern space, but nothing about it on the RHS
> >> > of a substitution. I have a vague feeling of running into problems in
> >> > the past, but I could just be misremembering.
> >> >
> >> > We also tried matching /^tree/ and using "a\" to append a line, but it
> >> > was both hard to read and hit portability issues with bsd sed.
> >
> > I think that all of this discussion is addressed elsewhere in thread, but
> > the gist is that Eric provided a suitable sed invocation that I am going
> > to use instead of Peff's Perl.
> >
> >> > -Peff
> >
> > Thanks,
> > Taylor

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 0/7] harden unexpected object types checks
  2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
                   ` (6 preceding siblings ...)
  2019-04-05  3:37 ` [PATCH 7/7] rev-list: detect broken root trees Taylor Blau
@ 2019-04-10  2:13 ` Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
                     ` (6 more replies)
  7 siblings, 7 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

Hi everyone,

Here is 'v2' of mine and Peff's series to add tests for and fix some
cases when object connections are malformed in various ways.

There aren't many major changes from the initial version, but what has
changed is described briefly here, and in complete detail in the
range-diff at the bottom of this message:

  - [2/7] was amended with Peff's suggestion to be a little clearer about
    what 'lone' and 'seen' mean

  - [2/7] was changed to replace all instances of:

      git cat-file ... | sed -e ... >broken-obj

    with:

      git cat-file ... >good-obj &&
      sed -e ... <good-obj >broken-obj

  - [4/7] was cleaned up to include the right test after a faulty
    rebase.

Thanks as always for your review :-).


Thanks,
Taylor

Jeff King (3):
  get_commit_tree(): return NULL for broken tree
  rev-list: let traversal die when --missing is not in use
  rev-list: detect broken root trees

Taylor Blau (4):
  t: move 'hex2oct' into test-lib-functions.sh
  t: introduce tests for unexpected object types
  list-objects.c: handle unexpected non-blob entries
  list-objects.c: handle unexpected non-tree entries

 builtin/rev-list.c                     |   4 +-
 commit.c                               |   6 +-
 list-objects.c                         |  13 +++
 t/t1007-hash-object.sh                 |   4 -
 t/t1450-fsck.sh                        |   4 -
 t/t5601-clone.sh                       |   4 -
 t/t6102-rev-list-unexpected-objects.sh | 127 +++++++++++++++++++++++++
 t/test-lib-functions.sh                |   6 ++
 8 files changed, 152 insertions(+), 16 deletions(-)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

Range-diff against v1:
1:  2104508e42 = 1:  f09c374557 t: move 'hex2oct' into test-lib-functions.sh
2:  909cbcd4a1 ! 2:  137e2df24d t: introduce tests for unexpected object types
    @@ -20,21 +20,19 @@
         cases that are not broken (i.e., they have NULL-ness checks or similar),
         mark these as expecting success.

    -    Let A be the object referenced with an unexpected type, and B be the
    -    object doing the referencing. Do the following:
    -
    -      - test 'git rev-list --objects A B'. This causes A to be "cached", and
    -        presents the above scenario.
    -
    -    Likewise, if we have a tree entry that claims to be a tree (for example)
    -    but points to another object type (say, a blob), there are two ways we
    -    might find out:
    +    We might hit an unexpected type in two different ways (imagine we have a
    +    tree entry that claims to be a tree but actually points to a blob):

           - when we call lookup_tree(), we might find that we've already seen
    -        the object referenced as another type, in which case we'd get NULL
    +        the object referenced as a blob, in which case we'd get NULL. We
    +        can exercise this with "git rev-list --objects $blob $tree", which
    +        guarantees that the blob will have been parsed before we look in
    +        the tree. These tests are marked as "seen" in the test script.

           - we call lookup_tree() successfully, but when we try to read the
    -        object, we find out it's something else.
    +        object, we find out it's something else. We construct our tests
    +        such that $blob is not otherwise mentioned in $tree. These tests
    +        are marked as "lone" in the script.

         We should check that we behave sensibly in both cases (especially
         because it is easy for a malicious actor to provoke one case or the
    @@ -58,7 +56,8 @@
     +test_expect_success 'setup well-formed objects' '
     +	blob="$(printf "foo" | git hash-object -w --stdin)" &&
     +	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
    -+	commit="$(git commit-tree $tree -m "first commit")"
    ++	commit="$(git commit-tree $tree -m "first commit")" &&
    ++	git cat-file commit $commit >good-commit
     +'
     +
     +test_expect_success 'setup unexpected non-blob entry' '
    @@ -84,12 +83,11 @@
     +'
     +
     +test_expect_failure 'traverse unexpected non-tree entry (seen)' '
    -+	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
    ++	test_must_fail git rev-list --objects $blob $broken_tree
     +'
     +
     +test_expect_success 'setup unexpected non-commit parent' '
    -+	git cat-file commit $commit |
    -+		perl -lpe "/^author/ && print q(parent $blob)" \
    ++	sed "/^author/ { h; s/.*/parent $blob/; G; }" <good-commit \
     +		>broken-commit &&
     +	broken_commit="$(git hash-object -w --literally -t commit \
     +		broken-commit)"
    @@ -107,8 +105,7 @@
     +'
     +
     +test_expect_success 'setup unexpected non-tree root' '
    -+	git cat-file commit $commit |
    -+	sed -e "s/$tree/$blob/" >broken-commit &&
    ++	sed -e "s/$tree/$blob/" <good-commit >broken-commit &&
     +	broken_commit="$(git hash-object -w --literally -t commit \
     +		broken-commit)"
     +'
    @@ -123,8 +120,9 @@
     +
     +test_expect_success 'setup unexpected non-commit tag' '
     +	git tag -a -m "tagged commit" tag $commit &&
    ++	git cat-file tag tag >good-tag &&
     +	test_when_finished "git tag -d tag" &&
    -+	git cat-file -p tag | sed -e "s/$commit/$blob/" >broken-tag &&
    ++	sed -e "s/$commit/$blob/" <good-tag >broken-tag &&
     +	tag=$(git hash-object -w --literally -t tag broken-tag)
     +'
     +
    @@ -139,9 +137,9 @@
     +
     +test_expect_success 'setup unexpected non-tree tag' '
     +	git tag -a -m "tagged tree" tag $tree &&
    ++	git cat-file tag tag >good-tag &&
     +	test_when_finished "git tag -d tag" &&
    -+	git cat-file -p tag |
    -+	sed -e "s/$tree/$blob/" >broken-tag &&
    ++	sed -e "s/$tree/$blob/" <good-tag >broken-tag &&
     +	tag=$(git hash-object -w --literally -t tag broken-tag)
     +'
     +
    @@ -156,9 +154,9 @@
     +
     +test_expect_success 'setup unexpected non-blob tag' '
     +	git tag -a -m "tagged blob" tag $blob &&
    ++	git cat-file tag tag >good-tag &&
     +	test_when_finished "git tag -d tag" &&
    -+	git cat-file -p tag |
    -+	sed -e "s/$blob/$commit/" >broken-tag &&
    ++	sed -e "s/$blob/$commit/" <good-tag >broken-tag &&
     +	tag=$(git hash-object -w --literally -t tag broken-tag)
     +'
     +
3:  42f1b5d5fd = 3:  1d1ac9b7a7 list-objects.c: handle unexpected non-blob entries
4:  62c9a6f2e0 ! 4:  97c7e23ec9 list-objects.c: handle unexpected non-tree entries
    @@ -31,7 +31,7 @@
      '

     -test_expect_failure 'traverse unexpected non-tree entry (seen)' '
    --	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1
    +-	test_must_fail git rev-list --objects $blob $broken_tree
     +test_expect_success 'traverse unexpected non-tree entry (seen)' '
     +	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1 &&
     +	test_i18ngrep "is not a tree" output
5:  d0ae70542d = 5:  e9400a9f77 get_commit_tree(): return NULL for broken tree
6:  a0c23c8c16 = 6:  88ca5dfe68 rev-list: let traversal die when --missing is not in use
7:  2a25796147 = 7:  e0bd479e82 rev-list: detect broken root trees
--
2.21.0.203.g358da99528

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v2 1/7] t: move 'hex2oct' into test-lib-functions.sh
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
@ 2019-04-10  2:13   ` Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 2/7] t: introduce tests for unexpected object types Taylor Blau
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

The helper 'hex2oct' is used to convert base-16 encoded data into a
base-8 binary form, and is useful for preparing data for commands that
accept input in a binary format, such as 'git hash-object', via
'printf'.

This helper is defined identically in three separate places throughout
't'. Move the definition to test-lib-function.sh, so that it can be used
in new test suites, and its definition is not redundant.

This will likewise make our job easier in the subsequent commit, which
also uses 'hex2oct'.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t1007-hash-object.sh  | 4 ----
 t/t1450-fsck.sh         | 4 ----
 t/t5601-clone.sh        | 4 ----
 t/test-lib-functions.sh | 6 ++++++
 4 files changed, 6 insertions(+), 12 deletions(-)

diff --git a/t/t1007-hash-object.sh b/t/t1007-hash-object.sh
index a37753047e..7099d33508 100755
--- a/t/t1007-hash-object.sh
+++ b/t/t1007-hash-object.sh
@@ -199,10 +199,6 @@ test_expect_success 'too-short tree' '
 	test_i18ngrep "too-short tree object" err
 '
 
-hex2oct() {
-    perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'malformed mode in tree' '
 	hex_sha1=$(echo foo | git hash-object --stdin -w) &&
 	bin_sha1=$(echo $hex_sha1 | hex2oct) &&
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 49f08d5b9c..0f268a3664 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -256,10 +256,6 @@ test_expect_success 'unparseable tree object' '
 	test_i18ngrep ! "fatal: empty filename in tree entry" out
 '
 
-hex2oct() {
-	perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'tree entry with type mismatch' '
 	test_when_finished "remove_object \$blob" &&
 	test_when_finished "remove_object \$tree" &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index d6948cbdab..3f49943010 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -611,10 +611,6 @@ test_expect_success 'GIT_TRACE_PACKFILE produces a usable pack' '
 	git -C replay.git index-pack -v --stdin <tmp.pack
 '
 
-hex2oct () {
-	perl -ne 'printf "\\%03o", hex for /../g'
-}
-
 test_expect_success 'clone on case-insensitive fs' '
 	git init icasefs &&
 	(
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 80402a428f..349eabe851 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1202,6 +1202,12 @@ depacketize () {
 	'
 }
 
+# Converts base-16 data into base-8. The output is given as a sequence of
+# escaped octals, suitable for consumption by 'printf'.
+hex2oct () {
+	perl -ne 'printf "\\%03o", hex for /../g'
+}
+
 # Set the hash algorithm in use to $1.  Only useful when testing the testsuite.
 test_set_hash () {
 	test_hash_algo="$1"
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 2/7] t: introduce tests for unexpected object types
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
@ 2019-04-10  2:13   ` Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 3/7] list-objects.c: handle unexpected non-blob entries Taylor Blau
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

Call an object's type "unexpected" when the actual type of an object
does not match Git's contextual expectation. For example, a tree entry
whose mode differs from the object's actual type, or a commit's parent
which is not another commit, and so on.

This can manifest itself in various unfortunate ways, including Git
SIGSEGV-ing under specific conditions. Consider the following example:
Git traverses a blob (say, via `git rev-list`), and then tries to read
out a tree-entry which lists that object as something other than a blob.
In this case, `lookup_blob()` will return NULL, and the subsequent
dereference will result in a SIGSEGV.

Introduce tests that present objects of "unexpected" type in the above
fashion to 'git rev-list'. Mark as failures the combinations that are
already broken (i.e., they exhibit the segfault described above). In the
cases that are not broken (i.e., they have NULL-ness checks or similar),
mark these as expecting success.

We might hit an unexpected type in two different ways (imagine we have a
tree entry that claims to be a tree but actually points to a blob):

  - when we call lookup_tree(), we might find that we've already seen
    the object referenced as a blob, in which case we'd get NULL. We
    can exercise this with "git rev-list --objects $blob $tree", which
    guarantees that the blob will have been parsed before we look in
    the tree. These tests are marked as "seen" in the test script.

  - we call lookup_tree() successfully, but when we try to read the
    object, we find out it's something else. We construct our tests
    such that $blob is not otherwise mentioned in $tree. These tests
    are marked as "lone" in the script.

We should check that we behave sensibly in both cases (especially
because it is easy for a malicious actor to provoke one case or the
other).

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 t/t6102-rev-list-unexpected-objects.sh | 123 +++++++++++++++++++++++++
 1 file changed, 123 insertions(+)
 create mode 100755 t/t6102-rev-list-unexpected-objects.sh

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
new file mode 100755
index 0000000000..15072ecce3
--- /dev/null
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -0,0 +1,123 @@
+#!/bin/sh
+
+test_description='git rev-list should handle unexpected object types'
+
+. ./test-lib.sh
+
+test_expect_success 'setup well-formed objects' '
+	blob="$(printf "foo" | git hash-object -w --stdin)" &&
+	tree="$(printf "100644 blob $blob\tfoo" | git mktree)" &&
+	commit="$(git commit-tree $tree -m "first commit")" &&
+	git cat-file commit $commit >good-commit
+'
+
+test_expect_success 'setup unexpected non-blob entry' '
+	printf "100644 foo\0$(echo $tree | hex2oct)" >broken-tree &&
+	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (lone)' '
+	test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-blob entry (seen)' '
+	test_must_fail git rev-list --objects $tree $broken_tree
+'
+
+test_expect_success 'setup unexpected non-tree entry' '
+	printf "40000 foo\0$(echo $blob | hex2oct)" >broken-tree &&
+	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+	test_must_fail git rev-list --objects $broken_tree
+'
+
+test_expect_failure 'traverse unexpected non-tree entry (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_tree
+'
+
+test_expect_success 'setup unexpected non-commit parent' '
+	sed "/^author/ { h; s/.*/parent $blob/; G; }" <good-commit \
+		>broken-commit &&
+	broken_commit="$(git hash-object -w --literally -t commit \
+		broken-commit)"
+'
+
+test_expect_success 'traverse unexpected non-commit parent (lone)' '
+	test_must_fail git rev-list --objects $broken_commit >output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'traverse unexpected non-commit parent (seen)' '
+	test_must_fail git rev-list --objects $commit $broken_commit \
+		>output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree root' '
+	sed -e "s/$tree/$blob/" <good-commit >broken-commit &&
+	broken_commit="$(git hash-object -w --literally -t commit \
+		broken-commit)"
+'
+
+test_expect_failure 'traverse unexpected non-tree root (lone)' '
+	test_must_fail git rev-list --objects $broken_commit
+'
+
+test_expect_failure 'traverse unexpected non-tree root (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_commit
+'
+
+test_expect_success 'setup unexpected non-commit tag' '
+	git tag -a -m "tagged commit" tag $commit &&
+	git cat-file tag tag >good-tag &&
+	test_when_finished "git tag -d tag" &&
+	sed -e "s/$commit/$blob/" <good-tag >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_success 'traverse unexpected non-commit tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-commit tag (seen)' '
+	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
+	test_i18ngrep "not a commit" output
+'
+
+test_expect_success 'setup unexpected non-tree tag' '
+	git tag -a -m "tagged tree" tag $tree &&
+	git cat-file tag tag >good-tag &&
+	test_when_finished "git tag -d tag" &&
+	sed -e "s/$tree/$blob/" <good-tag >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_success 'traverse unexpected non-tree tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-tree tag (seen)' '
+	test_must_fail git rev-list --objects $blob $tag >output 2>&1 &&
+	test_i18ngrep "not a tree" output
+'
+
+test_expect_success 'setup unexpected non-blob tag' '
+	git tag -a -m "tagged blob" tag $blob &&
+	git cat-file tag tag >good-tag &&
+	test_when_finished "git tag -d tag" &&
+	sed -e "s/$blob/$commit/" <good-tag >broken-tag &&
+	tag=$(git hash-object -w --literally -t tag broken-tag)
+'
+
+test_expect_failure 'traverse unexpected non-blob tag (lone)' '
+	test_must_fail git rev-list --objects $tag
+'
+
+test_expect_success 'traverse unexpected non-blob tag (seen)' '
+	test_must_fail git rev-list --objects $commit $tag >output 2>&1 &&
+	test_i18ngrep "not a blob" output
+'
+
+test_done
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 3/7] list-objects.c: handle unexpected non-blob entries
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 2/7] t: introduce tests for unexpected object types Taylor Blau
@ 2019-04-10  2:13   ` Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 4/7] list-objects.c: handle unexpected non-tree entries Taylor Blau
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

Fix one of the cases described in the previous commit where a tree-entry
that is promised to a blob is in fact a non-blob.

When 'lookup_blob()' returns NULL, it is because Git has cached the
requested object as a non-blob. In this case, prevent a SIGSEGV by
'die()'-ing immediately before attempting to dereference the result.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 list-objects.c                         | 5 +++++
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index dc77361e11..ea04bbdee6 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -133,6 +133,11 @@ static void process_tree_contents(struct traversal_context *ctx,
 					base, entry.path);
 		else {
 			struct blob *b = lookup_blob(ctx->revs->repo, &entry.oid);
+			if (!b) {
+				die(_("entry '%s' in tree %s has blob mode, "
+				      "but is not a blob"),
+				    entry.path, oid_to_hex(&tree->object.oid));
+			}
 			b->object.flags |= NOT_USER_GIVEN;
 			process_blob(ctx, b, base, entry.path);
 		}
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 15072ecce3..1377c60378 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -20,8 +20,9 @@ test_expect_failure 'traverse unexpected non-blob entry (lone)' '
 	test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-blob entry (seen)' '
-	test_must_fail git rev-list --objects $tree $broken_tree
+test_expect_success 'traverse unexpected non-blob entry (seen)' '
+	test_must_fail git rev-list --objects $tree $broken_tree >output 2>&1 &&
+	test_i18ngrep "is not a blob" output
 '
 
 test_expect_success 'setup unexpected non-tree entry' '
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 4/7] list-objects.c: handle unexpected non-tree entries
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
                     ` (2 preceding siblings ...)
  2019-04-10  2:13   ` [PATCH v2 3/7] list-objects.c: handle unexpected non-blob entries Taylor Blau
@ 2019-04-10  2:13   ` Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 5/7] get_commit_tree(): return NULL for broken tree Taylor Blau
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

Apply similar treatment as the previous commit for non-tree entries,
too.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 list-objects.c                         | 5 +++++
 t/t6102-rev-list-unexpected-objects.sh | 5 +++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index ea04bbdee6..bb7e61ef4b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -125,6 +125,11 @@ static void process_tree_contents(struct traversal_context *ctx,
 
 		if (S_ISDIR(entry.mode)) {
 			struct tree *t = lookup_tree(ctx->revs->repo, &entry.oid);
+			if (!t) {
+				die(_("entry '%s' in tree %s has tree mode, "
+				      "but is not a tree"),
+				    entry.path, oid_to_hex(&tree->object.oid));
+			}
 			t->object.flags |= NOT_USER_GIVEN;
 			process_tree(ctx, t, base, entry.path);
 		}
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 1377c60378..e3ec195532 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -34,8 +34,9 @@ test_expect_failure 'traverse unexpected non-tree entry (lone)' '
 	test_must_fail git rev-list --objects $broken_tree
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (seen)' '
-	test_must_fail git rev-list --objects $blob $broken_tree
+test_expect_success 'traverse unexpected non-tree entry (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_tree >output 2>&1 &&
+	test_i18ngrep "is not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit parent' '
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 5/7] get_commit_tree(): return NULL for broken tree
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
                     ` (3 preceding siblings ...)
  2019-04-10  2:13   ` [PATCH v2 4/7] list-objects.c: handle unexpected non-tree entries Taylor Blau
@ 2019-04-10  2:13   ` Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 6/7] rev-list: let traversal die when --missing is not in use Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 7/7] rev-list: detect broken root trees Taylor Blau
  6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

From: Jeff King <peff@peff.net>

Return NULL from 'get_commit_tree()' when a commit's root tree is
corrupt, doesn't exist, or points to an object which is not a tree.

In [1], this situation became a BUG(), but it can certainly occur in
cases which are not a bug in Git, for e.g., if a caller manually crafts
a commit whose tree is corrupt in any of the above ways.

Note that the expect_failure test in t6102 triggers this BUG(), but we
can't flip it to expect_success yet. Solving this problem actually
reveals a second bug.

[1]: 7b8a21dba1 (commit-graph: lazy-load trees for commits, 2018-04-06)

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 commit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index a5333c7ac6..e2cde566a9 100644
--- a/commit.c
+++ b/commit.c
@@ -345,10 +345,10 @@ struct tree *get_commit_tree(const struct commit *commit)
 	if (commit->maybe_tree || !commit->object.parsed)
 		return commit->maybe_tree;
 
-	if (commit->graph_pos == COMMIT_NOT_FROM_GRAPH)
-		BUG("commit has NULL tree, but was not loaded from commit-graph");
+	if (commit->graph_pos != COMMIT_NOT_FROM_GRAPH)
+		return get_commit_tree_in_graph(the_repository, commit);
 
-	return get_commit_tree_in_graph(the_repository, commit);
+	return NULL;
 }
 
 struct object_id *get_commit_tree_oid(const struct commit *commit)
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 6/7] rev-list: let traversal die when --missing is not in use
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
                     ` (4 preceding siblings ...)
  2019-04-10  2:13   ` [PATCH v2 5/7] get_commit_tree(): return NULL for broken tree Taylor Blau
@ 2019-04-10  2:13   ` Taylor Blau
  2019-04-10  2:13   ` [PATCH v2 7/7] rev-list: detect broken root trees Taylor Blau
  6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

From: Jeff King <peff@peff.net>

Commit 7c0fe330d5 (rev-list: handle missing tree objects properly,
2018-10-05) taught the traversal machinery used by git-rev-list to
ignore missing trees, so that rev-list could handle them itself.

However, it does so only by checking via oid_object_info_extended() that
the object exists at all. This can miss several classes of errors that
were previously detected by rev-list:

  - type mismatches (e.g., we expected a tree but got a blob)

  - failure to read the object data (e.g., due to bitrot on disk)

This is especially important because we use "rev-list --objects" as our
connectivity check to admit new objects to the repository, and it will
now miss these cases (though the bitrot one is less important here,
because we'd typically have just hashed and stored the object).

There are a few options to fix this:

 1. we could check these properties in rev-list when we do the existence
    check. This is probably too expensive in practice (perhaps even for
    a type check, but definitely for checking the whole content again,
    which implies loading each object into memory twice).

 2. teach the traversal machinery to differentiate between a missing
    object, and one that could not be loaded as expected. This probably
    wouldn't be too hard to detect type mismatches, but detecting bitrot
    versus a truly missing object would require deep changes to the
    object-loading code.

 3. have the traversal machinery communicate the failure to the caller,
    so that it can decide how to proceed without re-evaluting the object
    itself.

Of those, I think (3) is probably the best path forward. However, this
patch does none of them. In the name of expediently fixing the
regression to a normal "rev-list --objects" that we use for connectivity
checks, this simply restores the pre-7c0fe330d5 behavior of having the
traversal die as soon as it fails to load a tree (when --missing is set
to MA_ERROR, which is the default).

Note that we can't get rid of the object-existence check in
finish_object(), because this also handles blobs (which are not
otherwise checked at all by the traversal code).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/rev-list.c                     | 4 +++-
 t/t6102-rev-list-unexpected-objects.sh | 4 ++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 425a5774db..9f31837d30 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -379,7 +379,6 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 	repo_init_revisions(the_repository, &revs, prefix);
 	revs.abbrev = DEFAULT_ABBREV;
 	revs.commit_format = CMIT_FMT_UNSPECIFIED;
-	revs.do_not_die_on_missing_tree = 1;
 
 	/*
 	 * Scan the argument list before invoking setup_revisions(), so that we
@@ -409,6 +408,9 @@ int cmd_rev_list(int argc, const char **argv, const char *prefix)
 		}
 	}
 
+	if (arg_missing_action)
+		revs.do_not_die_on_missing_tree = 1;
+
 	argc = setup_revisions(argc, argv, &revs, &s_r_opt);
 
 	memset(&info, 0, sizeof(info));
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index e3ec195532..28ee1bcb07 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -30,7 +30,7 @@ test_expect_success 'setup unexpected non-tree entry' '
 	broken_tree="$(git hash-object -w --literally -t tree broken-tree)"
 '
 
-test_expect_failure 'traverse unexpected non-tree entry (lone)' '
+test_expect_success 'traverse unexpected non-tree entry (lone)' '
 	test_must_fail git rev-list --objects $broken_tree
 '
 
@@ -63,7 +63,7 @@ test_expect_success 'setup unexpected non-tree root' '
 		broken-commit)"
 '
 
-test_expect_failure 'traverse unexpected non-tree root (lone)' '
+test_expect_success 'traverse unexpected non-tree root (lone)' '
 	test_must_fail git rev-list --objects $broken_commit
 '
 
-- 
2.21.0.203.g358da99528


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v2 7/7] rev-list: detect broken root trees
  2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
                     ` (5 preceding siblings ...)
  2019-04-10  2:13   ` [PATCH v2 6/7] rev-list: let traversal die when --missing is not in use Taylor Blau
@ 2019-04-10  2:13   ` Taylor Blau
  6 siblings, 0 replies; 41+ messages in thread
From: Taylor Blau @ 2019-04-10  2:13 UTC (permalink / raw)
  To: git; +Cc: avarab, gitster, peff, sunshine, szeder.dev

From: Jeff King <peff@peff.net>

When the traversal machinery sees a commit without a root tree, it
assumes that the tree was part of a BOUNDARY commit, and quietly ignores
the tree. But it could also be caused by a commit whose root tree is
broken or missing.

Instead, let's die() when we see a NULL root tree. We can differentiate
it from the BOUNDARY case by seeing if the commit was actually parsed.
This covers that case, plus future-proofs us against any others where we
might try to show an unparsed commit.

Signed-off-by: Jeff King <peff@peff.net>
---
 list-objects.c                         | 3 +++
 t/t6102-rev-list-unexpected-objects.sh | 6 ++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/list-objects.c b/list-objects.c
index bb7e61ef4b..b5651ddd5b 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -374,6 +374,9 @@ static void do_traverse(struct traversal_context *ctx)
 			struct tree *tree = get_commit_tree(commit);
 			tree->object.flags |= NOT_USER_GIVEN;
 			add_pending_tree(ctx->revs, tree);
+		} else if (commit->object.parsed) {
+			die(_("unable to load root tree for commit %s"),
+			      oid_to_hex(&commit->object.oid));
 		}
 		ctx->show_commit(commit, ctx->show_data);
 
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 28ee1bcb07..28611c978e 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -67,8 +67,10 @@ test_expect_success 'traverse unexpected non-tree root (lone)' '
 	test_must_fail git rev-list --objects $broken_commit
 '
 
-test_expect_failure 'traverse unexpected non-tree root (seen)' '
-	test_must_fail git rev-list --objects $blob $broken_commit
+test_expect_success 'traverse unexpected non-tree root (seen)' '
+	test_must_fail git rev-list --objects $blob $broken_commit \
+		>output 2>&1 &&
+	test_i18ngrep "not a tree" output
 '
 
 test_expect_success 'setup unexpected non-commit tag' '
-- 
2.21.0.203.g358da99528

^ permalink raw reply related	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2019-04-10  2:13 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-05  3:37 [PATCH 0/7] harden unexpected object types checks Taylor Blau
2019-04-05  3:37 ` [PATCH 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
2019-04-05  3:37 ` [PATCH 2/7] t: introduce tests for unexpected object types Taylor Blau
2019-04-05 10:50   ` SZEDER Gábor
2019-04-05 18:24     ` Jeff King
2019-04-05 18:42       ` SZEDER Gábor
2019-04-05 18:52         ` Jeff King
2019-04-07 21:00           ` Ævar Arnfjörð Bjarmason
2019-04-09  2:29             ` Taylor Blau
2019-04-09  9:14               ` Ævar Arnfjörð Bjarmason
2019-04-10  1:59                 ` Taylor Blau
2019-04-08  5:27           ` Junio C Hamano
2019-04-05 19:25       ` Eric Sunshine
2019-04-05 20:53         ` Jeff King
2019-04-06  5:33           ` Taylor Blau
2019-04-08  6:44         ` Junio C Hamano
2019-04-09  2:30           ` Taylor Blau
2019-04-09  3:28             ` Eric Sunshine
2019-04-09  5:08               ` Taylor Blau
2019-04-09  8:02                 ` Eric Sunshine
2019-04-10  1:54                   ` Taylor Blau
2019-04-06  5:31       ` Taylor Blau
2019-04-05 18:31   ` Jeff King
2019-04-06  5:23     ` Taylor Blau
2019-04-05  3:37 ` [PATCH 3/7] list-objects.c: handle unexpected non-blob entries Taylor Blau
2019-04-05  3:37 ` [PATCH 4/7] list-objects.c: handle unexpected non-tree entries Taylor Blau
2019-04-05  3:37 ` [PATCH 5/7] get_commit_tree(): return NULL for broken tree Taylor Blau
2019-04-05  3:37 ` [PATCH 6/7] rev-list: let traversal die when --missing is not in use Taylor Blau
2019-04-05 18:41   ` Jeff King
2019-04-06  5:36     ` Taylor Blau
2019-04-07 13:41       ` Jeff King
2019-04-09  2:11         ` Taylor Blau
2019-04-05  3:37 ` [PATCH 7/7] rev-list: detect broken root trees Taylor Blau
2019-04-10  2:13 ` [PATCH v2 0/7] harden unexpected object types checks Taylor Blau
2019-04-10  2:13   ` [PATCH v2 1/7] t: move 'hex2oct' into test-lib-functions.sh Taylor Blau
2019-04-10  2:13   ` [PATCH v2 2/7] t: introduce tests for unexpected object types Taylor Blau
2019-04-10  2:13   ` [PATCH v2 3/7] list-objects.c: handle unexpected non-blob entries Taylor Blau
2019-04-10  2:13   ` [PATCH v2 4/7] list-objects.c: handle unexpected non-tree entries Taylor Blau
2019-04-10  2:13   ` [PATCH v2 5/7] get_commit_tree(): return NULL for broken tree Taylor Blau
2019-04-10  2:13   ` [PATCH v2 6/7] rev-list: let traversal die when --missing is not in use Taylor Blau
2019-04-10  2:13   ` [PATCH v2 7/7] rev-list: detect broken root trees Taylor Blau

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).