git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] fsck --connectivity-check misses some corruption
@ 2017-01-16 21:22 Jeff King
  2017-01-16 21:24 ` [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test Jeff King
                   ` (6 more replies)
  0 siblings, 7 replies; 15+ messages in thread
From: Jeff King @ 2017-01-16 21:22 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

I came across a repository today that was missing an object, and for
which "git fsck" reported the error but "git fsck --connectivity-check"
did not. It turns out that the shortcut taken by --connectivity-check
violates some assumptions made by the rest of fsck (namely, that every
object in the repo has a "struct object" loaded).

And fsck being a generally neglected tool, I couldn't help but find
several more bugs on the way. :)

  [1/6]: t1450: clean up sub-objects in duplicate-entry test
  [2/6]: fsck: report trees as dangling
  [3/6]: fsck: prepare dummy objects for --connectivity-check
  [4/6]: fsck: tighten error-checks of "git fsck <head>"
  [5/6]: fsck: do not fallback "git fsck <bogus>" to "git fsck"
  [6/6]: fsck: check HAS_OBJ more consistently

 builtin/fsck.c  | 131 ++++++++++++++++++++++++++++++++++++++++++++------------
 t/t1450-fsck.sh |  70 ++++++++++++++++++++++++++++--
 2 files changed, 171 insertions(+), 30 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test
  2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
@ 2017-01-16 21:24 ` Jeff King
  2017-01-17 20:52   ` Junio C Hamano
  2017-01-16 21:25 ` [PATCH 2/6] fsck: report trees as dangling Jeff King
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-01-16 21:24 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

This test creates a multi-level set of trees, but its
cleanup routine only removes the top-level tree. After the
test finishes, the inner tree and the blob it points to
remain, making the inner tree dangling.

A later test ("cleaned up") verifies that we've removed any
cruft and "git fsck" output is clean. This passes only
because of a bug in git-fsck which fails to notice dangling
trees.

In preparation for fixing the bug, let's teach this earlier
test to clean up after itself correctly. We have to remove
the inner tree (and therefore the blob, too, which becomes
dangling after removing that tree).

Since the setup code happens inside a subshell, we can't
just set a variable for each object. However, we can stuff
all of the sha1s into the $T output variable, which is not
used for anything except cleanup.

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1450-fsck.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index ee7d4736d..6eef8b28e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -189,14 +189,16 @@ test_expect_success 'commit with NUL in header' '
 '
 
 test_expect_success 'tree object with duplicate entries' '
-	test_when_finished "remove_object \$T" &&
+	test_when_finished "for i in \$T; do remove_object \$i; done" &&
 	T=$(
 		GIT_INDEX_FILE=test-index &&
 		export GIT_INDEX_FILE &&
 		rm -f test-index &&
 		>x &&
 		git add x &&
+		git rev-parse :x &&
 		T=$(git write-tree) &&
+		echo $T &&
 		(
 			git cat-file tree $T &&
 			git cat-file tree $T
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/6] fsck: report trees as dangling
  2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
  2017-01-16 21:24 ` [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test Jeff King
@ 2017-01-16 21:25 ` Jeff King
  2017-01-17 20:57   ` Junio C Hamano
  2017-01-16 21:32 ` [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check Jeff King
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-01-16 21:25 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

After checking connectivity, fsck looks through the list of
any objects we've seen mentioned, and reports unreachable
and un-"used" ones as dangling. However, it skips any object
which is not marked as "parsed", as that is an object that
we _don't_ have (but that somebody mentioned).

Since 6e454b9a3 (clear parsed flag when we free tree
buffers, 2013-06-05), that flag can't be relied on, and the
correct method is to check the HAS_OBJ flag. The cleanup in
that commit missed this callsite, though. As a result, we
would generally fail to report dangling trees.

We never noticed because there were no tests in this area
(for trees or otherwise). Let's add some.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  |  2 +-
 t/t1450-fsck.sh | 27 +++++++++++++++++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f01b81eeb..3e67203f9 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -225,7 +225,7 @@ static void check_unreachable_object(struct object *obj)
 	 * to complain about it being unreachable (since it does
 	 * not exist).
 	 */
-	if (!obj->parsed)
+	if (!(obj->flags & HAS_OBJ))
 		return;
 
 	/*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 6eef8b28e..e88ec7747 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -559,4 +559,31 @@ test_expect_success 'fsck --name-objects' '
 	)
 '
 
+# for each of type, we have one version which is referenced by another object
+# (and so while unreachable, not dangling), and another variant which really is
+# dangling.
+test_expect_success 'fsck notices dangling objects' '
+	git init dangling &&
+	(
+		cd dangling &&
+		blob=$(echo not-dangling | git hash-object -w --stdin) &&
+		dblob=$(echo dangling | git hash-object -w --stdin) &&
+		tree=$(printf "100644 blob %s\t%s\n" $blob one | git mktree) &&
+		dtree=$(printf "100644 blob %s\t%s\n" $blob two | git mktree) &&
+		commit=$(git commit-tree $tree) &&
+		dcommit=$(git commit-tree -p $commit $tree) &&
+
+		cat >expect <<-EOF &&
+		dangling blob $dblob
+		dangling commit $dcommit
+		dangling tree $dtree
+		EOF
+
+		git fsck >actual &&
+		# the output order is non-deterministic, as it comes from a hash
+		sort <actual >actual.sorted &&
+		test_cmp expect actual.sorted
+	)
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check
  2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
  2017-01-16 21:24 ` [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test Jeff King
  2017-01-16 21:25 ` [PATCH 2/6] fsck: report trees as dangling Jeff King
@ 2017-01-16 21:32 ` Jeff King
  2017-01-17 21:15   ` Junio C Hamano
  2017-01-17 21:16   ` Junio C Hamano
  2017-01-16 21:33 ` [PATCH 4/6] fsck: tighten error-checks of "git fsck <head>" Jeff King
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 15+ messages in thread
From: Jeff King @ 2017-01-16 21:32 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.

Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.

However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:

  1. mark_object() will not queue objects for examination,
     so recursively following links from commits to trees,
     etc, did nothing. I.e., we were checking the
     reachability of hardly anything at all.

  2. When a set of heads is given on the command-line, we
     use lookup_object() to see if they exist. But without
     the initial pass, we assume nothing exists.

  3. When loading reflog entries, we do a similar
     lookup_object() check, and complain that the reflog is
     broken if the object doesn't exist in our hash.

So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.

One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).

Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.

And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.

One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.

Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.

That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).

Signed-off-by: Jeff King <peff@peff.net>
---
The cmd_fsck() part of the diff is pretty nasty without
"-b".

I had originally planned to pull the "stop calling
verify_pack()" bit out into its own patch. But doing this
without treating loose and packed objects the same raises a
lot of questions about why one and not the other. It seemed
simpler to just explain it all together.

 builtin/fsck.c  | 118 +++++++++++++++++++++++++++++++++++++++++++++-----------
 t/t1450-fsck.sh |  23 ++++++++++-
 2 files changed, 117 insertions(+), 24 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3e67203f9..f527d8a02 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -205,8 +205,6 @@ static void check_reachable_object(struct object *obj)
 	if (!(obj->flags & HAS_OBJ)) {
 		if (has_sha1_pack(obj->oid.hash))
 			return; /* it is in pack - forget about it */
-		if (connectivity_only && has_object_file(&obj->oid))
-			return;
 		printf("missing %s %s\n", typename(obj->type),
 			describe_object(obj));
 		errors_found |= ERROR_REACHABLE;
@@ -584,6 +582,79 @@ static int fsck_cache_tree(struct cache_tree *it)
 	return err;
 }
 
+static void mark_object_for_connectivity(const unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+
+	/*
+	 * Setting the object type here isn't strictly necessary here for a
+	 * connectivity check. In most cases, our walk will expect a certain
+	 * type (e.g., a tree referencing a blob) and will use lookup_blob() to
+	 * assign the type. But doing it here has two advantages:
+	 *
+	 *   1. When the fsck_walk code looks at objects that _don't_ come from
+	 *      links (e.g., the tip of a ref), it may complain about the
+	 *      "unknown object type".
+	 *
+	 *   2. This serves as a nice cross-check that the graph links are
+	 *      sane. So --connectivity-only does not check that the bits of
+	 *      blobs are not corrupted, but it _does_ check that 100644 tree
+	 *      entries point to blobs, and so forth.
+	 *
+	 * Unfortunately we can't just use parse_object() here, because the
+	 * whole point of --connectivity-only is to avoid reading the object
+	 * data more than necessary.
+	 */
+	if (!obj || obj->type == OBJ_NONE) {
+		enum object_type type = sha1_object_info(sha1, NULL);
+		switch (type) {
+		case OBJ_BAD:
+			error("%s: unable to read object type",
+			      sha1_to_hex(sha1));
+			break;
+		case OBJ_COMMIT:
+			obj = (struct object *)lookup_commit(sha1);
+			break;
+		case OBJ_TREE:
+			obj = (struct object *)lookup_tree(sha1);
+			break;
+		case OBJ_BLOB:
+			obj = (struct object *)lookup_blob(sha1);
+			break;
+		case OBJ_TAG:
+			obj = (struct object *)lookup_tag(sha1);
+			break;
+		default:
+			error("%s: unknown object type %d",
+			      sha1_to_hex(sha1), type);
+		}
+
+		if (!obj || obj->type == OBJ_NONE) {
+			errors_found |= ERROR_OBJECT;
+			return;
+		}
+	}
+
+	obj->flags |= HAS_OBJ;
+}
+
+static int mark_loose_for_connectivity(const unsigned char *sha1,
+				       const char *path,
+				       void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
+static int mark_packed_for_connectivity(const unsigned char *sha1,
+					struct packed_git *pack,
+					uint32_t pos,
+					void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
 static char const * const fsck_usage[] = {
 	N_("git fsck [<options>] [<object>...]"),
 	NULL
@@ -646,32 +717,35 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		prepare_alt_odb();
 		for (alt = alt_odb_list; alt; alt = alt->next)
 			fsck_object_dir(alt->path);
-	}
 
-	if (check_full) {
-		struct packed_git *p;
-		uint32_t total = 0, count = 0;
-		struct progress *progress = NULL;
+		if (check_full) {
+			struct packed_git *p;
+			uint32_t total = 0, count = 0;
+			struct progress *progress = NULL;
+
+			prepare_packed_git();
 
-		prepare_packed_git();
+			if (show_progress) {
+				for (p = packed_git; p; p = p->next) {
+					if (open_pack_index(p))
+						continue;
+					total += p->num_objects;
+				}
 
-		if (show_progress) {
+				progress = start_progress(_("Checking objects"), total);
+			}
 			for (p = packed_git; p; p = p->next) {
-				if (open_pack_index(p))
-					continue;
-				total += p->num_objects;
+				/* verify gives error messages itself */
+				if (verify_pack(p, fsck_obj_buffer,
+						progress, count))
+					errors_found |= ERROR_PACK;
+				count += p->num_objects;
 			}
-
-			progress = start_progress(_("Checking objects"), total);
-		}
-		for (p = packed_git; p; p = p->next) {
-			/* verify gives error messages itself */
-			if (verify_pack(p, fsck_obj_buffer,
-					progress, count))
-				errors_found |= ERROR_PACK;
-			count += p->num_objects;
+			stop_progress(&progress);
 		}
-		stop_progress(&progress);
+	} else {
+		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
+		for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
 	}
 
 	heads = 0;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e88ec7747..c1b2dda33 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' '
 		touch empty &&
 		git add empty &&
 		test_commit empty &&
+
+		# Drop the index now; we want to be sure that we
+		# recursively notice that we notice the broken objects
+		# because they are reachable from refs, not because
+		# they are in the index.
+		rm -f .git/index &&
+
+		# corrupt the blob, but in a way that we can still identify
+		# its type. That lets us see that --connectivity-only is
+		# not actually looking at the contents, but leaves it
+		# free to examine the type if it chooses.
 		empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 &&
-		rm -f $empty &&
-		echo invalid >$empty &&
+		blob=$(echo unrelated | git hash-object -w --stdin) &&
+		mv $(sha1_file $blob) $empty &&
+
 		test_must_fail git fsck --strict &&
 		git fsck --strict --connectivity-only &&
 		tree=$(git rev-parse HEAD:) &&
@@ -537,6 +549,13 @@ test_expect_success 'fsck --connectivity-only' '
 	)
 '
 
+test_expect_success 'fsck --connectivity-only with explicit head' '
+	(
+		cd connectivity-only &&
+		test_must_fail git fsck --connectivity-only $tree
+	)
+'
+
 remove_loose_object () {
 	sha1="$(git rev-parse "$1")" &&
 	remainder=${sha1#??} &&
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/6] fsck: tighten error-checks of "git fsck <head>"
  2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
                   ` (2 preceding siblings ...)
  2017-01-16 21:32 ` [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check Jeff King
@ 2017-01-16 21:33 ` Jeff King
  2017-01-17 21:17   ` Junio C Hamano
  2017-01-16 21:34 ` [PATCH 5/6] fsck: do not fallback "git fsck <bogus>" to "git fsck" Jeff King
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 15+ messages in thread
From: Jeff King @ 2017-01-16 21:33 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

Instead of checking reachability from the refs, you can ask
fsck to check from a particular set of heads. However, the
error checking here is quite lax. In particular:

  1. It claims lookup_object() will report an error, which
     is not true. It only does a hash lookup, and the user
     has no clue that their argument was skipped.

  2. When either the name or sha1 cannot be resolved, we
     continue to exit with a successful error code, even
     though we didn't check what the user asked us to.

This patch fixes both of these cases.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 7 +++++--
 t/t1450-fsck.sh | 5 +++++
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f527d8a02..c7d0590e5 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -755,9 +755,11 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		if (!get_sha1(arg, sha1)) {
 			struct object *obj = lookup_object(sha1);
 
-			/* Error is printed by lookup_object(). */
-			if (!obj)
+			if (!obj) {
+				error("%s: object missing", sha1_to_hex(sha1));
+				errors_found |= ERROR_OBJECT;
 				continue;
+			}
 
 			obj->used = 1;
 			if (name_objects)
@@ -768,6 +770,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 			continue;
 		}
 		error("invalid parameter: expected sha1, got '%s'", arg);
+		errors_found |= ERROR_OBJECT;
 	}
 
 	/*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index c1b2dda33..2f3b05276 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -605,4 +605,9 @@ test_expect_success 'fsck notices dangling objects' '
 	)
 '
 
+test_expect_success 'fsck $name notices bogus $name' '
+	test_must_fail git fsck bogus &&
+	test_must_fail git fsck $_z40
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/6] fsck: do not fallback "git fsck <bogus>" to "git fsck"
  2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
                   ` (3 preceding siblings ...)
  2017-01-16 21:33 ` [PATCH 4/6] fsck: tighten error-checks of "git fsck <head>" Jeff King
@ 2017-01-16 21:34 ` Jeff King
  2017-01-16 21:34 ` [PATCH 6/6] fsck: check HAS_OBJ more consistently Jeff King
  2017-01-16 23:26 ` [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
  6 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-01-16 21:34 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

Since fsck tries to continue as much as it can after seeing
an error, we still do the reachability check even if some
heads we were given on the command-line are bogus. But if
_none_ of the heads is is valid, we fallback to checking all
refs and the index, which is not what the user asked for at
all.

Instead of checking "heads", the number of successful heads
we got, check "argc" (which we know only has non-options in
it, because parse_options removed the others).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  |  2 +-
 t/t1450-fsck.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index c7d0590e5..8ae065b2d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -778,7 +778,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	 * default ones from .git/refs. We also consider the index file
 	 * in this case (ie this implies --cache).
 	 */
-	if (!heads) {
+	if (!argc) {
 		get_default_heads();
 		keep_cache_objects = 1;
 	}
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 2f3b05276..96b74dc9a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -610,4 +610,15 @@ test_expect_success 'fsck $name notices bogus $name' '
 	test_must_fail git fsck $_z40
 '
 
+test_expect_success 'bogus head does not fallback to all heads' '
+	# set up a case that will cause a reachability complaint
+	echo to-be-deleted >foo &&
+	git add foo &&
+	blob=$(git rev-parse :foo) &&
+	test_when_finished "git rm --cached foo" &&
+	remove_object $blob &&
+	test_must_fail git fsck $_z40 >out 2>&1 &&
+	! grep $blob out
+'
+
 test_done
-- 
2.11.0.642.gd6f8cda6c


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 6/6] fsck: check HAS_OBJ more consistently
  2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
                   ` (4 preceding siblings ...)
  2017-01-16 21:34 ` [PATCH 5/6] fsck: do not fallback "git fsck <bogus>" to "git fsck" Jeff King
@ 2017-01-16 21:34 ` Jeff King
  2017-01-16 23:26 ` [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
  6 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-01-16 21:34 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

There are two spots that call lookup_object() and assume
that a non-NULL result means we have the object:

  1. When we're checking the objects given to us by the user
     on the command line.

  2. When we're checking if a reflog entry is valid.

This generally follows fsck's mental model that we will have
looked at and loaded a "struct object" for each object in
the repository. But it misses one case: if another object
_mentioned_ an object, but we didn't actually parse it or
verify that it exists, it will still have a struct.

It's not clear if this is a triggerable bug or not.
Certainly the later parts of the reachability check need to
be careful of this, and do so by checking the HAS_OBJ flag.
But both of these steps happen before we start traversing,
so probably we won't have followed any links yet. Still,
it's easy enough to be defensive here.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 8ae065b2d..28d3cbb14 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -398,7 +398,7 @@ static void fsck_handle_reflog_sha1(const char *refname, unsigned char *sha1,
 
 	if (!is_null_sha1(sha1)) {
 		obj = lookup_object(sha1);
-		if (obj) {
+		if (obj && (obj->flags & HAS_OBJ)) {
 			if (timestamp && name_objects)
 				add_decoration(fsck_walk_options.object_names,
 					obj,
@@ -755,7 +755,7 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 		if (!get_sha1(arg, sha1)) {
 			struct object *obj = lookup_object(sha1);
 
-			if (!obj) {
+			if (!obj || !(obj->flags & HAS_OBJ)) {
 				error("%s: object missing", sha1_to_hex(sha1));
 				errors_found |= ERROR_OBJECT;
 				continue;
-- 
2.11.0.642.gd6f8cda6c

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/6] fsck --connectivity-check misses some corruption
  2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
                   ` (5 preceding siblings ...)
  2017-01-16 21:34 ` [PATCH 6/6] fsck: check HAS_OBJ more consistently Jeff King
@ 2017-01-16 23:26 ` Jeff King
  6 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-01-16 23:26 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Johannes Schindelin

On Mon, Jan 16, 2017 at 04:22:31PM -0500, Jeff King wrote:

> I came across a repository today that was missing an object, and for
> which "git fsck" reported the error but "git fsck --connectivity-check"
> did not. It turns out that the shortcut taken by --connectivity-check
> violates some assumptions made by the rest of fsck (namely, that every
> object in the repo has a "struct object" loaded).
> 
> And fsck being a generally neglected tool, I couldn't help but find
> several more bugs on the way. :)
> 
>   [1/6]: t1450: clean up sub-objects in duplicate-entry test
>   [2/6]: fsck: report trees as dangling
>   [3/6]: fsck: prepare dummy objects for --connectivity-check
>   [4/6]: fsck: tighten error-checks of "git fsck <head>"
>   [5/6]: fsck: do not fallback "git fsck <bogus>" to "git fsck"
>   [6/6]: fsck: check HAS_OBJ more consistently
> 
>  builtin/fsck.c  | 131 ++++++++++++++++++++++++++++++++++++++++++++------------
>  t/t1450-fsck.sh |  70 ++++++++++++++++++++++++++++--
>  2 files changed, 171 insertions(+), 30 deletions(-)

Oh, one thing I forgot to mention: I tried test-merging this with my
other recent topic in the area, jk/loose-object-fsck. There are a few
conflicts in the test script, but nothing hard to resolve (just both of
them adding tests at the end, but both are careful to clean up after
themselves).

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test
  2017-01-16 21:24 ` [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test Jeff King
@ 2017-01-17 20:52   ` Junio C Hamano
  2017-01-17 20:53     ` Jeff King
  0 siblings, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-01-17 20:52 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> This test creates a multi-level set of trees, but its
> cleanup routine only removes the top-level tree. After the
> test finishes, the inner tree and the blob it points to
> remain, making the inner tree dangling.
>
> A later test ("cleaned up") verifies that we've removed any
> cruft and "git fsck" output is clean. This passes only
> because of a bug in git-fsck which fails to notice dangling
> trees.
>
> In preparation for fixing the bug, let's teach this earlier
> test to clean up after itself correctly. We have to remove
> the inner tree (and therefore the blob, too, which becomes
> dangling after removing that tree).
>
> Since the setup code happens inside a subshell, we can't
> just set a variable for each object. However, we can stuff
> all of the sha1s into the $T output variable, which is not
> used for anything except cleanup.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  t/t1450-fsck.sh | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Thanks.  

It is tempting to move this loop to remove_object, but that is not
necessary while the user is only this one.

>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index ee7d4736d..6eef8b28e 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -189,14 +189,16 @@ test_expect_success 'commit with NUL in header' '
>  '
>  
>  test_expect_success 'tree object with duplicate entries' '
> -	test_when_finished "remove_object \$T" &&
> +	test_when_finished "for i in \$T; do remove_object \$i; done" &&
>  	T=$(
>  		GIT_INDEX_FILE=test-index &&
>  		export GIT_INDEX_FILE &&
>  		rm -f test-index &&
>  		>x &&
>  		git add x &&
> +		git rev-parse :x &&
>  		T=$(git write-tree) &&
> +		echo $T &&
>  		(
>  			git cat-file tree $T &&
>  			git cat-file tree $T

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test
  2017-01-17 20:52   ` Junio C Hamano
@ 2017-01-17 20:53     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-01-17 20:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Michael Haggerty, Johannes Schindelin

On Tue, Jan 17, 2017 at 12:52:43PM -0800, Junio C Hamano wrote:

> > Since the setup code happens inside a subshell, we can't
> > just set a variable for each object. However, we can stuff
> > all of the sha1s into the $T output variable, which is not
> > used for anything except cleanup.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  t/t1450-fsck.sh | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Thanks.  
> 
> It is tempting to move this loop to remove_object, but that is not
> necessary while the user is only this one.

I agree it would be less gross. I avoided it because I knew that I
hacked up remove_object() in the other topic.

-Peff

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 2/6] fsck: report trees as dangling
  2017-01-16 21:25 ` [PATCH 2/6] fsck: report trees as dangling Jeff King
@ 2017-01-17 20:57   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-01-17 20:57 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> After checking connectivity, fsck looks through the list of
> any objects we've seen mentioned, and reports unreachable
> and un-"used" ones as dangling. However, it skips any object
> which is not marked as "parsed", as that is an object that
> we _don't_ have (but that somebody mentioned).
>
> Since 6e454b9a3 (clear parsed flag when we free tree
> buffers, 2013-06-05), that flag can't be relied on, and the
> correct method is to check the HAS_OBJ flag. The cleanup in
> that commit missed this callsite, though. As a result, we
> would generally fail to report dangling trees.
>
> We never noticed because there were no tests in this area
> (for trees or otherwise). Let's add some.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Makes sense, and the new test is very easy to read, too.

Queued; thanks.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check
  2017-01-16 21:32 ` [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check Jeff King
@ 2017-01-17 21:15   ` Junio C Hamano
  2017-01-17 21:32     ` Jeff King
  2017-01-17 21:16   ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Junio C Hamano @ 2017-01-17 21:15 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> +static void mark_object_for_connectivity(const unsigned char *sha1)
> +{
> +	struct object *obj = lookup_object(sha1);
> +
> +	/*
> +	 * Setting the object type here isn't strictly necessary here for a
> +	 * connectivity check.

Drop one of these "here"s?


> The cmd_fsck() part of the diff is pretty nasty without
> "-b".

True.  I also wonder if swapping the if/else arms make the end
result and the patch easier to read. i.e.

+	if (connectivity_only) {
+		mark loose for connectivity;
+		mark packed for connectivity;
+	} else {
		... existing code comes here reindented ...
	}                

But the patch makes sense.

> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index e88ec7747..c1b2dda33 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' '
>  		touch empty &&
>  		git add empty &&
>  		test_commit empty &&
> +
> +		# Drop the index now; we want to be sure that we
> +		# recursively notice that we notice the broken objects
> +		# because they are reachable from refs, not because
> +		# they are in the index.
> +		rm -f .git/index &&
> +
> +		# corrupt the blob, but in a way that we can still identify
> +		# its type. That lets us see that --connectivity-only is
> +		# not actually looking at the contents, but leaves it
> +		# free to examine the type if it chooses.
>  		empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 &&
> -		rm -f $empty &&
> -		echo invalid >$empty &&
> +		blob=$(echo unrelated | git hash-object -w --stdin) &&
> +		mv $(sha1_file $blob) $empty &&
> +
>  		test_must_fail git fsck --strict &&
>  		git fsck --strict --connectivity-only &&
>  		tree=$(git rev-parse HEAD:) &&
> @@ -537,6 +549,13 @@ test_expect_success 'fsck --connectivity-only' '
>  	)
>  '
>  
> +test_expect_success 'fsck --connectivity-only with explicit head' '
> +	(
> +		cd connectivity-only &&
> +		test_must_fail git fsck --connectivity-only $tree
> +	)
> +'

Most of the earlier "tree=..." assignments are done in subshells,
and it is not clear which tree this refers to.  Is this the one that
was written in 'rev-list --verify-objects with bad sha1' that has
been removed in its when-finished handler?

>  remove_loose_object () {
>  	sha1="$(git rev-parse "$1")" &&
>  	remainder=${sha1#??} &&

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check
  2017-01-16 21:32 ` [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check Jeff King
  2017-01-17 21:15   ` Junio C Hamano
@ 2017-01-17 21:16   ` Junio C Hamano
  1 sibling, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-01-17 21:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> +		# Drop the index now; we want to be sure that we
> +		# recursively notice that we notice the broken objects
> +		# because they are reachable from refs, not because
> +		# they are in the index.

Rephrase to reduce redundunt "notice"s?

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 4/6] fsck: tighten error-checks of "git fsck <head>"
  2017-01-16 21:33 ` [PATCH 4/6] fsck: tighten error-checks of "git fsck <head>" Jeff King
@ 2017-01-17 21:17   ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2017-01-17 21:17 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty, Johannes Schindelin

Jeff King <peff@peff.net> writes:

> Instead of checking reachability from the refs, you can ask
> fsck to check from a particular set of heads. However, the
> error checking here is quite lax. In particular:
>
>   1. It claims lookup_object() will report an error, which
>      is not true. It only does a hash lookup, and the user
>      has no clue that their argument was skipped.
>
>   2. When either the name or sha1 cannot be resolved, we
>      continue to exit with a successful error code, even
>      though we didn't check what the user asked us to.
>
> This patch fixes both of these cases.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

Makes sense, too.  Thanks.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check
  2017-01-17 21:15   ` Junio C Hamano
@ 2017-01-17 21:32     ` Jeff King
  0 siblings, 0 replies; 15+ messages in thread
From: Jeff King @ 2017-01-17 21:32 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Michael Haggerty, Johannes Schindelin

On Tue, Jan 17, 2017 at 01:15:57PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > +static void mark_object_for_connectivity(const unsigned char *sha1)
> > +{
> > +	struct object *obj = lookup_object(sha1);
> > +
> > +	/*
> > +	 * Setting the object type here isn't strictly necessary here for a
> > +	 * connectivity check.
> 
> Drop one of these "here"s?

Yeah.

> > The cmd_fsck() part of the diff is pretty nasty without
> > "-b".
> 
> True.  I also wonder if swapping the if/else arms make the end
> result and the patch easier to read. i.e.
> 
> +	if (connectivity_only) {
> +		mark loose for connectivity;
> +		mark packed for connectivity;
> +	} else {
> 		... existing code comes here reindented ...
> 	}                
> 
> But the patch makes sense.

Yeah. It doesn't help the diff, but the end result is more readable. And
the conditional lacks a negation, which is nice. Will change.

> > +test_expect_success 'fsck --connectivity-only with explicit head' '
> > +	(
> > +		cd connectivity-only &&
> > +		test_must_fail git fsck --connectivity-only $tree
> > +	)
> > +'
> 
> Most of the earlier "tree=..." assignments are done in subshells,
> and it is not clear which tree this refers to.  Is this the one that
> was written in 'rev-list --verify-objects with bad sha1' that has
> been removed in its when-finished handler?

Doh. It was meant to be the one from the earlier --connectivity-only
check. But that is doubly silly, even, because the tree variable in that
case is holding the filename, not the sha1. The right thing is to call
rev-parse again, but of course that doesn't work because the repo has
been corrupted. :-/

Here's a re-roll with the two changes above (and the notice thing you
mentioned elsewhere), plus a test that actually checks the right thing
(fails before this commit and passes after).

If everything else looks good, you can queue it along with the rest.
Otherwise if I need a re-roll, it will be in the next version.

-- >8 --
Subject: fsck: prepare dummy objects for --connectivity-check

Normally fsck makes a pass over all objects to check their
integrity, and then follows up with a reachability check to
make sure we have all of the referenced objects (and to know
which ones are dangling). The latter checks for the HAS_OBJ
flag in obj->flags to see if we found the object in the
first pass.

Commit 02976bf85 (fsck: introduce `git fsck --connectivity-only`,
2015-06-22) taught fsck to skip the initial pass, and to
fallback to has_sha1_file() instead of the HAS_OBJ check.

However, it converted only one HAS_OBJ check to use
has_sha1_file(). But there are many other places in
builtin/fsck.c that assume that the flag is set (or that
lookup_object() will return an object at all). This leads to
several bugs with --connectivity-only:

  1. mark_object() will not queue objects for examination,
     so recursively following links from commits to trees,
     etc, did nothing. I.e., we were checking the
     reachability of hardly anything at all.

  2. When a set of heads is given on the command-line, we
     use lookup_object() to see if they exist. But without
     the initial pass, we assume nothing exists.

  3. When loading reflog entries, we do a similar
     lookup_object() check, and complain that the reflog is
     broken if the object doesn't exist in our hash.

So in short, --connectivity-only is broken pretty badly, and
will claim that your repository is fine when it's not.
Presumably nobody noticed for a few reasons.

One is that the embedded test does not actually test the
recursive nature of the reachability check. All of the
missing objects are still in the index, and we directly
check items from the index. This patch modifies the test to
delete the index, which shows off breakage (1).

Another is that --connectivity-only just skips the initial
pass for loose objects. So on a real repository, the packed
objects were still checked correctly. But on the flipside,
it means that "git fsck --connectivity-only" still checks
the sha1 of all of the packed objects, nullifying its
original purpose of being a faster git-fsck.

And of course the final problem is that the bug only shows
up when there _is_ corruption, which is rare. So anybody
running "git fsck --connectivity-only" proactively would
assume it was being thorough, when it was not.

One possibility for fixing this is to find all of the spots
that rely on HAS_OBJ and tweak them for the connectivity-only
case. But besides the risk that we might miss a spot (and I
found three already, corresponding to the three bugs above),
there are other parts of fsck that _can't_ work without a
full list of objects. E.g., the list of dangling objects.

Instead, let's make the connectivity-only case look more
like the normal case. Rather than skip the initial pass
completely, we'll do an abbreviated one that sets up the
HAS_OBJ flag for each object, without actually loading the
object data.

That's simple and fast, and we don't have to care about the
connectivity_only flag in the rest of the code at all.
While we're at it, let's make sure we treat loose and packed
objects the same (i.e., setting up dummy objects for both
and skipping the actual sha1 check). That makes the
connectivity-only check actually fast on a real repo (40
seconds versus 180 seconds on my copy of linux.git).

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 120 +++++++++++++++++++++++++++++++++++++++++++++-----------
 t/t1450-fsck.sh |  29 +++++++++++++-
 2 files changed, 124 insertions(+), 25 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 3e67203f9..75e836e2f 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -205,8 +205,6 @@ static void check_reachable_object(struct object *obj)
 	if (!(obj->flags & HAS_OBJ)) {
 		if (has_sha1_pack(obj->oid.hash))
 			return; /* it is in pack - forget about it */
-		if (connectivity_only && has_object_file(&obj->oid))
-			return;
 		printf("missing %s %s\n", typename(obj->type),
 			describe_object(obj));
 		errors_found |= ERROR_REACHABLE;
@@ -584,6 +582,79 @@ static int fsck_cache_tree(struct cache_tree *it)
 	return err;
 }
 
+static void mark_object_for_connectivity(const unsigned char *sha1)
+{
+	struct object *obj = lookup_object(sha1);
+
+	/*
+	 * Setting the object type here isn't strictly necessary for a
+	 * connectivity check. In most cases, our walk will expect a certain
+	 * type (e.g., a tree referencing a blob) and will use lookup_blob() to
+	 * assign the type. But doing it here has two advantages:
+	 *
+	 *   1. When the fsck_walk code looks at objects that _don't_ come from
+	 *      links (e.g., the tip of a ref), it may complain about the
+	 *      "unknown object type".
+	 *
+	 *   2. This serves as a nice cross-check that the graph links are
+	 *      sane. So --connectivity-only does not check that the bits of
+	 *      blobs are not corrupted, but it _does_ check that 100644 tree
+	 *      entries point to blobs, and so forth.
+	 *
+	 * Unfortunately we can't just use parse_object() here, because the
+	 * whole point of --connectivity-only is to avoid reading the object
+	 * data more than necessary.
+	 */
+	if (!obj || obj->type == OBJ_NONE) {
+		enum object_type type = sha1_object_info(sha1, NULL);
+		switch (type) {
+		case OBJ_BAD:
+			error("%s: unable to read object type",
+			      sha1_to_hex(sha1));
+			break;
+		case OBJ_COMMIT:
+			obj = (struct object *)lookup_commit(sha1);
+			break;
+		case OBJ_TREE:
+			obj = (struct object *)lookup_tree(sha1);
+			break;
+		case OBJ_BLOB:
+			obj = (struct object *)lookup_blob(sha1);
+			break;
+		case OBJ_TAG:
+			obj = (struct object *)lookup_tag(sha1);
+			break;
+		default:
+			error("%s: unknown object type %d",
+			      sha1_to_hex(sha1), type);
+		}
+
+		if (!obj || obj->type == OBJ_NONE) {
+			errors_found |= ERROR_OBJECT;
+			return;
+		}
+	}
+
+	obj->flags |= HAS_OBJ;
+}
+
+static int mark_loose_for_connectivity(const unsigned char *sha1,
+				       const char *path,
+				       void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
+static int mark_packed_for_connectivity(const unsigned char *sha1,
+					struct packed_git *pack,
+					uint32_t pos,
+					void *data)
+{
+	mark_object_for_connectivity(sha1);
+	return 0;
+}
+
 static char const * const fsck_usage[] = {
 	N_("git fsck [<options>] [<object>...]"),
 	NULL
@@ -640,38 +711,41 @@ int cmd_fsck(int argc, const char **argv, const char *prefix)
 	git_config(fsck_config, NULL);
 
 	fsck_head_link();
-	if (!connectivity_only) {
+	if (connectivity_only) {
+		for_each_loose_object(mark_loose_for_connectivity, NULL, 0);
+		for_each_packed_object(mark_packed_for_connectivity, NULL, 0);
+	} else {
 		fsck_object_dir(get_object_directory());
 
 		prepare_alt_odb();
 		for (alt = alt_odb_list; alt; alt = alt->next)
 			fsck_object_dir(alt->path);
-	}
 
-	if (check_full) {
-		struct packed_git *p;
-		uint32_t total = 0, count = 0;
-		struct progress *progress = NULL;
+		if (check_full) {
+			struct packed_git *p;
+			uint32_t total = 0, count = 0;
+			struct progress *progress = NULL;
+
+			prepare_packed_git();
 
-		prepare_packed_git();
+			if (show_progress) {
+				for (p = packed_git; p; p = p->next) {
+					if (open_pack_index(p))
+						continue;
+					total += p->num_objects;
+				}
 
-		if (show_progress) {
+				progress = start_progress(_("Checking objects"), total);
+			}
 			for (p = packed_git; p; p = p->next) {
-				if (open_pack_index(p))
-					continue;
-				total += p->num_objects;
+				/* verify gives error messages itself */
+				if (verify_pack(p, fsck_obj_buffer,
+						progress, count))
+					errors_found |= ERROR_PACK;
+				count += p->num_objects;
 			}
-
-			progress = start_progress(_("Checking objects"), total);
-		}
-		for (p = packed_git; p; p = p->next) {
-			/* verify gives error messages itself */
-			if (verify_pack(p, fsck_obj_buffer,
-					progress, count))
-				errors_found |= ERROR_PACK;
-			count += p->num_objects;
+			stop_progress(&progress);
 		}
-		stop_progress(&progress);
 	}
 
 	heads = 0;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e88ec7747..4d1c3ba66 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -523,9 +523,21 @@ test_expect_success 'fsck --connectivity-only' '
 		touch empty &&
 		git add empty &&
 		test_commit empty &&
+
+		# Drop the index now; we want to be sure that we
+		# recursively notice the broken objects
+		# because they are reachable from refs, not because
+		# they are in the index.
+		rm -f .git/index &&
+
+		# corrupt the blob, but in a way that we can still identify
+		# its type. That lets us see that --connectivity-only is
+		# not actually looking at the contents, but leaves it
+		# free to examine the type if it chooses.
 		empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 &&
-		rm -f $empty &&
-		echo invalid >$empty &&
+		blob=$(echo unrelated | git hash-object -w --stdin) &&
+		mv $(sha1_file $blob) $empty &&
+
 		test_must_fail git fsck --strict &&
 		git fsck --strict --connectivity-only &&
 		tree=$(git rev-parse HEAD:) &&
@@ -537,6 +549,19 @@ test_expect_success 'fsck --connectivity-only' '
 	)
 '
 
+test_expect_success 'fsck --connectivity-only with explicit head' '
+	rm -rf connectivity-only &&
+	git init connectivity-only &&
+	(
+		cd connectivity-only &&
+		test_commit foo &&
+		rm -f .git/index &&
+		tree=$(git rev-parse HEAD^{tree}) &&
+		remove_object $(git rev-parse HEAD:foo.t) &&
+		test_must_fail git fsck --connectivity-only $tree
+	)
+'
+
 remove_loose_object () {
 	sha1="$(git rev-parse "$1")" &&
 	remainder=${sha1#??} &&
-- 
2.11.0.651.g41f4a5c4e


^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-01-17 21:34 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-16 21:22 [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King
2017-01-16 21:24 ` [PATCH 1/6] t1450: clean up sub-objects in duplicate-entry test Jeff King
2017-01-17 20:52   ` Junio C Hamano
2017-01-17 20:53     ` Jeff King
2017-01-16 21:25 ` [PATCH 2/6] fsck: report trees as dangling Jeff King
2017-01-17 20:57   ` Junio C Hamano
2017-01-16 21:32 ` [PATCH 3/6] fsck: prepare dummy objects for --connectivity-check Jeff King
2017-01-17 21:15   ` Junio C Hamano
2017-01-17 21:32     ` Jeff King
2017-01-17 21:16   ` Junio C Hamano
2017-01-16 21:33 ` [PATCH 4/6] fsck: tighten error-checks of "git fsck <head>" Jeff King
2017-01-17 21:17   ` Junio C Hamano
2017-01-16 21:34 ` [PATCH 5/6] fsck: do not fallback "git fsck <bogus>" to "git fsck" Jeff King
2017-01-16 21:34 ` [PATCH 6/6] fsck: check HAS_OBJ more consistently Jeff King
2017-01-16 23:26 ` [PATCH 0/6] fsck --connectivity-check misses some corruption Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).