"git fsck" not detecting garbage at the end of blob object files...

git@vger.kernel.org mailing list mirror (one of many)

* "git fsck" not detecting garbage at the end of blob object files...
@ 2017-01-07 12:50 John Szakmeister
  2017-01-07 21:47 ` Dennis Kaarsemaker
  0 siblings, 1 reply; 39+ messages in thread
From: John Szakmeister @ 2017-01-07 12:50 UTC (permalink / raw)
  To: git

I was perusing StackOverflow this morning and ran across this
question: http://stackoverflow.com/questions/41521143/git-fsck-full-only-checking-directories/

It was a simple question about why "checking objects" was not
appearing, but in it was another issue.  The user purposefully
corrupted a blob object file to see if `git fsck` would catch it by
tacking extra data on at the end.  `git fsck` happily said everything
was okay, but when I played with things locally I found out that `git
gc` does not like that extra garbage.  I'm not sure what the trade-off
needs to be here, but my expectation is that if `git fsck` says
everything is okay, then all operations using that object (file)
should work too.
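
The scenario described above can be reproduced with a short script. This is a minimal sketch, not taken from the thread: the temporary repository, the blob content, and the variable names are all assumptions. Whether `git fsck` complains depends on the git version in use, so the script only reports what it sees.

```shell
#!/bin/sh
# Sketch: append bytes to a loose blob and observe how fsck reacts.
# Everything here (temp repo, content, variable names) is illustrative.
repo=$(mktemp -d) &&
git init -q "$repo" &&
cd "$repo" &&
blob=$(echo "hello" | git hash-object -w --stdin) &&
rest=${blob#??} &&
firsttwo=${blob%"$rest"} &&
file=".git/objects/$firsttwo/$rest" &&
size_before=$(($(wc -c <"$file"))) &&
chmod u+w "$file" &&   # loose object files are written read-only
echo garbage >>"$file" &&
size_after=$(($(wc -c <"$file")))

# Report the outcome without assuming it; it varies by git version.
if git fsck >fsck.out 2>&1
then
	echo "fsck reported no problems"
else
	echo "fsck complained:"
	cat fsck.out
fi
```

Running `git gc` in the same repository afterwards shows whether other commands reject the object that fsck accepted.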

Is that unreasonable?  What would be the impact of fixing this issue?

-John


* Re: "git fsck" not detecting garbage at the end of blob object files...
  2017-01-07 12:50 "git fsck" not detecting garbage at the end of blob object files John Szakmeister
@ 2017-01-07 21:47 ` Dennis Kaarsemaker
  2017-01-08  5:26   ` Jeff King
  2017-01-13  9:16   ` "git fsck" not detecting garbage at the end of blob object files John Szakmeister
  0 siblings, 2 replies; 39+ messages in thread
From: Dennis Kaarsemaker @ 2017-01-07 21:47 UTC (permalink / raw)
  To: John Szakmeister, git

On Sat, 2017-01-07 at 07:50 -0500, John Szakmeister wrote:
> I was perusing StackOverflow this morning and ran across this
> question: http://stackoverflow.com/questions/41521143/git-fsck-full-only-checking-directories/
> 
> It was a simple question about why "checking objects" was not
> appearing, but in it was another issue.  The user purposefully
> corrupted a blob object file to see if `git fsck` would catch it by
> tacking extra data on at the end.  `git fsck` happily said everything
> was okay, but when I played with things locally I found out that `git
> gc` does not like that extra garbage.  I'm not sure what the trade-off
> needs to be here, but my expectation is that if `git fsck` says
> everything is okay, then all operations using that object (file)
> should work too.
> 
> Is that unreasonable?  What would be the impact of fixing this issue?

If you do this with a commit object or tree object, fsck does complain.
I think it's sensible to do so for blob objects as well.

Editing blob object:

hurricane:/tmp/moo (master)$ hexer .git/objects/a1/b3ebb97f10ff8d85a9472bcba50cb575dbd485 
hurricane:/tmp/moo (master)$ git status
On branch master
nothing to commit, working tree clean
hurricane:/tmp/moo (master)$ git fsck
Checking object directories: 100% (256/256), done.
hurricane:/tmp/moo (master)$ git gc
Counting objects: 3, done.
error: garbage at end of loose object 'a1b3ebb97f10ff8d85a9472bcba50cb575dbd485'
fatal: loose object a1b3ebb97f10ff8d85a9472bcba50cb575dbd485 (stored in .git/objects/a1/b3ebb97f10ff8d85a9472bcba50cb575dbd485) is corrupt
error: failed to run repack

Editing tree object:

hurricane:/tmp/moo (master)$ hexer .git/objects/d4/eda486f02e3e862e23f6eb3739a25a2ca43f20
hurricane:/tmp/moo (master +)$ git status
error: garbage at end of loose object 'd4eda486f02e3e862e23f6eb3739a25a2ca43f20'
fatal: loose object d4eda486f02e3e862e23f6eb3739a25a2ca43f20 (stored in .git/objects/d4/eda486f02e3e862e23f6eb3739a25a2ca43f20) is corrupt
error: garbage at end of loose object 'd4eda486f02e3e862e23f6eb3739a25a2ca43f20'
fatal: loose object d4eda486f02e3e862e23f6eb3739a25a2ca43f20 (stored in .git/objects/d4/eda486f02e3e862e23f6eb3739a25a2ca43f20) is corrupt
hurricane:/tmp/moo (master +)$ git fsck
error: garbage at end of loose object 'd4eda486f02e3e862e23f6eb3739a25a2ca43f20'
fatal: loose object d4eda486f02e3e862e23f6eb3739a25a2ca43f20 (stored in .git/objects/d4/eda486f02e3e862e23f6eb3739a25a2ca43f20) is corrupt
error: garbage at end of loose object 'd4eda486f02e3e862e23f6eb3739a25a2ca43f20'
fatal: loose object d4eda486f02e3e862e23f6eb3739a25a2ca43f20 (stored in .git/objects/d4/eda486f02e3e862e23f6eb3739a25a2ca43f20) is corrupt

Editing commit object:

hurricane:/tmp/moo (master)$ echo test >> .git/objects/47/59a693f7e8362c724d3365fe6df398083fafa0 
hurricane:/tmp/moo (master +)$ git status
error: garbage at end of loose object '4759a693f7e8362c724d3365fe6df398083fafa0'
fatal: loose object 4759a693f7e8362c724d3365fe6df398083fafa0 (stored in .git/objects/47/59a693f7e8362c724d3365fe6df398083fafa0) is corrupt
error: garbage at end of loose object '4759a693f7e8362c724d3365fe6df398083fafa0'
fatal: loose object 4759a693f7e8362c724d3365fe6df398083fafa0 (stored in .git/objects/47/59a693f7e8362c724d3365fe6df398083fafa0) is corrupt
!(128) hurricane:/tmp/moo (master +)$ git fsck
error: garbage at end of loose object '4759a693f7e8362c724d3365fe6df398083fafa0'
fatal: loose object 4759a693f7e8362c724d3365fe6df398083fafa0 (stored in .git/objects/47/59a693f7e8362c724d3365fe6df398083fafa0) is corrupt
error: garbage at end of loose object '4759a693f7e8362c724d3365fe6df398083fafa0'
fatal: loose object 4759a693f7e8362c724d3365fe6df398083fafa0 (stored in .git/objects/47/59a693f7e8362c724d3365fe6df398083fafa0) is corrupt

D.


* Re: "git fsck" not detecting garbage at the end of blob object files...
  2017-01-07 21:47 ` Dennis Kaarsemaker
@ 2017-01-08  5:26   ` Jeff King
  2017-01-13  9:15     ` John Szakmeister
  2017-01-13  9:16   ` "git fsck" not detecting garbage at the end of blob object files John Szakmeister
  1 sibling, 1 reply; 39+ messages in thread
From: Jeff King @ 2017-01-08  5:26 UTC (permalink / raw)
  To: Dennis Kaarsemaker; +Cc: John Szakmeister, git

On Sat, Jan 07, 2017 at 10:47:03PM +0100, Dennis Kaarsemaker wrote:

> On Sat, 2017-01-07 at 07:50 -0500, John Szakmeister wrote:
> > I was perusing StackOverflow this morning and ran across this
> > question: http://stackoverflow.com/questions/41521143/git-fsck-full-only-checking-directories/
> > 
> > It was a simple question about why "checking objects" was not
> > appearing, but in it was another issue.  The user purposefully
> > corrupted a blob object file to see if `git fsck` would catch it by
> > tacking extra data on at the end.  `git fsck` happily said everything
> > was okay, but when I played with things locally I found out that `git
> > gc` does not like that extra garbage.  I'm not sure what the trade-off
> > needs to be here, but my expectation is that if `git fsck` says
> > everything is okay, then all operations using that object (file)
> > should work too.
> > 
> > Is that unreasonable?  What would be the impact of fixing this issue?
> 
> If you do this with a commit object or tree object, fsck does complain.
> I think it's sensible to do so for blob objects as well.

The existing extra-garbage check is in unpack_sha1_rest(), which is
called as part of read_sha1_file(). And that's what we hit for commits
and trees. However, we check the sha1 of blobs using the streaming
interface (in case they're large). I think you'd want to put a similar
check into read_istream_loose(). But note if you are grepping for it, it
is hidden behind a macro; look for read_method_decl(loose).

I'm actually not sure if this should be downgraded to a warning. It's
true that it's a form of corruption, but it doesn't actually prohibit us
from getting the data we need to complete the operation. Arguably fsck
should be more picky, but it is just relying on the same parse_object()
code path that the rest of git uses.

I doubt anybody cares too much either way, though. It's not like this is
a common thing.

I did notice another interesting case when looking at this. Fsck ends up
in fsck_loose(), which has the sha1 and path of the loose object. It
passes the sha1 to fsck_sha1(), and ignores the path entirely!

So if you have a duplicate copy of the object in a pack, we'd actually
find and check the duplicate. This can happen, e.g., if you had a loose
object and fetched a thin-pack which made a copy of the loose object to
complete the pack.

Probably fsck_loose() should be more picky about making sure we are
reading the data from the loose version we found.

-Peff


* Re: "git fsck" not detecting garbage at the end of blob object files...
  2017-01-08  5:26   ` Jeff King
@ 2017-01-13  9:15     ` John Szakmeister
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: John Szakmeister @ 2017-01-13  9:15 UTC (permalink / raw)
  To: Jeff King; +Cc: Dennis Kaarsemaker, git

On Sun, Jan 8, 2017 at 12:26 AM, Jeff King <peff@peff.net> wrote:
> On Sat, Jan 07, 2017 at 10:47:03PM +0100, Dennis Kaarsemaker wrote:
>> On Sat, 2017-01-07 at 07:50 -0500, John Szakmeister wrote:
>> > I was perusing StackOverflow this morning and ran across this
>> > question: http://stackoverflow.com/questions/41521143/git-fsck-full-only-checking-directories/
>> >
>> > It was a simple question about why "checking objects" was not
>> > appearing, but in it was another issue.  The user purposefully
>> > corrupted a blob object file to see if `git fsck` would catch it by
>> > tacking extra data on at the end.  `git fsck` happily said everything
>> > was okay, but when I played with things locally I found out that `git
>> > gc` does not like that extra garbage.  I'm not sure what the trade-off
>> > needs to be here, but my expectation is that if `git fsck` says
>> > everything is okay, then all operations using that object (file)
>> > should work too.
>> >
>> > Is that unreasonable?  What would be the impact of fixing this issue?
>>
>> If you do this with a commit object or tree object, fsck does complain.
>> I think it's sensible to do so for blob objects as well.
>
> The existing extra-garbage check is in unpack_sha1_rest(), which is
> called as part of read_sha1_file(). And that's what we hit for commits
> and trees. However, we check the sha1 of blobs using the streaming
> interface (in case they're large). I think you'd want to put a similar
> check into read_istream_loose(). But note if you are grepping for it, it
> is hidden behind a macro; look for read_method_decl(loose).

Thanks for the pointer.

> I'm actually not sure if this should be downgraded to a warning. It's
> true that it's a form of corruption, but it doesn't actually prohibit us
> from getting the data we need to complete the operation. Arguably fsck
> should be more picky, but it is just relying on the same parse_object()
> code path that the rest of git uses.
>
> I doubt anybody cares too much either way, though. It's not like this is
> a common thing.

I kind of wonder about that myself too, and I'm not sure what to
think about it.  On the one hand, I'd like to know about
*anything* that has changed in an adverse way--it could indicate
a failure somewhere else that needs to be handled.  On the other
hand, scaring the user isn't all that advantageous.  I guess I'm
in the former camp.

As to whether this is common, yeah, it's probably not.  However,
I was surprised by the number of results that turned up when I
searched for "garbage at end of loose object".

> I did notice another interesting case when looking at this. Fsck ends up
> in fsck_loose(), which has the sha1 and path of the loose object. It
> passes the sha1 to fsck_sha1(), and ignores the path entirely!
>
> So if you have a duplicate copy of the object in a pack, we'd actually
> find and check the duplicate. This can happen, e.g., if you had a loose
> object and fetched a thin-pack which made a copy of the loose object to
> complete the pack.
>
> Probably fsck_loose() should be more picky about making sure we are
> reading the data from the loose version we found.

Interesting find!  Thanks for the information Peff!

-John


* Re: "git fsck" not detecting garbage at the end of blob object files...
  2017-01-07 21:47 ` Dennis Kaarsemaker
  2017-01-08  5:26   ` Jeff King
@ 2017-01-13  9:16   ` John Szakmeister
  1 sibling, 0 replies; 39+ messages in thread
From: John Szakmeister @ 2017-01-13  9:16 UTC (permalink / raw)
  To: Dennis Kaarsemaker; +Cc: git

On Sat, Jan 7, 2017 at 4:47 PM, Dennis Kaarsemaker
<dennis@kaarsemaker.net> wrote:
> On Sat, 2017-01-07 at 07:50 -0500, John Szakmeister wrote:
>> I was perusing StackOverflow this morning and ran across this
>> question: http://stackoverflow.com/questions/41521143/git-fsck-full-only-checking-directories/
>>
>> It was a simple question about why "checking objects" was not
>> appearing, but in it was another issue.  The user purposefully
>> corrupted a blob object file to see if `git fsck` would catch it by
>> tacking extra data on at the end.  `git fsck` happily said everything
>> was okay, but when I played with things locally I found out that `git
>> gc` does not like that extra garbage.  I'm not sure what the trade-off
>> needs to be here, but my expectation is that if `git fsck` says
>> everything is okay, then all operations using that object (file)
>> should work too.
>>
>> Is that unreasonable?  What would be the impact of fixing this issue?
>
> If you do this with a commit object or tree object, fsck does complain.
> I think it's sensible to do so for blob objects as well.

Also very good information.  Thanks Dennis!

-John


* [PATCH 0/6] loose-object fsck fixes/tightening
  2017-01-13  9:15     ` John Szakmeister
@ 2017-01-13 17:52       ` Jeff King
  2017-01-13 17:54         ` [PATCH 1/6] t1450: refactor loose-object removal Jeff King
                           ` (6 more replies)
  0 siblings, 7 replies; 39+ messages in thread
From: Jeff King @ 2017-01-13 17:52 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git

On Fri, Jan 13, 2017 at 04:15:42AM -0500, John Szakmeister wrote:

> > I did notice another interesting case when looking at this. Fsck ends up
> > in fsck_loose(), which has the sha1 and path of the loose object. It
> > passes the sha1 to fsck_sha1(), and ignores the path entirely!
> >
> > So if you have a duplicate copy of the object in a pack, we'd actually
> > find and check the duplicate. This can happen, e.g., if you had a loose
> > object and fetched a thin-pack which made a copy of the loose object to
> > complete the pack.
> >
> > Probably fsck_loose() should be more picky about making sure we are
> > reading the data from the loose version we found.
> 
> Interesting find!  Thanks for the information Peff!

So I figured I would knock this out as a fun morning exercise. But
sheesh, it turned out to be a slog, because most of the functions rely
on map_sha1_file() to convert the sha1 to an object path at the lowest
level.

But I finally got something working, so here it is. I found another bug
on the way, along with a few cleanups. And then I did the trailing
garbage detection at the end, because by that point I knew right where
it needed to go. :)

  [1/6]: t1450: refactor loose-object removal
  [2/6]: sha1_file: fix error message for alternate objects
  [3/6]: t1450: test fsck of packed objects
  [4/6]: sha1_file: add read_loose_object() function
  [5/6]: fsck: parse loose object paths directly
  [6/6]: fsck: detect trailing garbage in all object types

 builtin/fsck.c  |  46 +++++++++++----
 cache.h         |  13 ++++
 sha1_file.c     | 180 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 t/t1450-fsck.sh |  86 +++++++++++++++++++++++----
 4 files changed, 284 insertions(+), 41 deletions(-)

-Peff


* [PATCH 1/6] t1450: refactor loose-object removal
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
@ 2017-01-13 17:54         ` Jeff King
  2017-01-13 17:54         ` [PATCH 2/6] sha1_file: fix error message for alternate objects Jeff King
                           ` (5 subsequent siblings)
  6 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2017-01-13 17:54 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git

Commit 90cf590f5 (fsck: optionally show more helpful info
for broken links, 2016-07-17) added a remove_loose_object()
helper, but we already had a remove_object() helper that did
the same thing. Let's combine these into one.

The implementations had a few subtle differences, so I've
tried to take the best of both:

  - the original used "sed", but the newer version avoids
    spawning an extra process

  - the original processed "$*", which was nonsense, as it
    assumed only a single sha1. Use "$1" to make that more
    clear.

  - the newer version ran an extra rev-parse, but it was not
    necessary; its sole caller already converted the
    argument into a raw sha1

  - the original used "rm -f", whereas the new one uses
    "rm". The latter is better because it may notice a bug
    or other unexpected failure in the test. (The original
    does check that the object exists before we remove it,
    which is good, but that's a subset of the possible
    unexpected conditions).
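
The difference between the two path helpers described in the list above can be sketched in plain shell. The function names `old_style_path` and `loose_object_path` are hypothetical and used only for illustration; `remainder` and `firsttwo` follow the names in the patch below.

```shell
#!/bin/sh
# Older approach: pipe through sed, which costs an extra process.
# "$*" joins all arguments with spaces, although only one id is expected.
old_style_path() {
	echo "$*" | sed "s#..#.git/objects/&/#"
}

# Newer approach: split the id with parameter expansion only.
loose_object_path() {
	remainder=${1#??} &&            # strip the first two characters
	firsttwo=${1%"$remainder"} &&   # strip everything after them
	echo ".git/objects/$firsttwo/$remainder"
}

loose_object_path a1b3ebb97f10ff8d85a9472bcba50cb575dbd485
```

For a single object id both helpers print the same path; the second one does so without leaving the shell.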

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1450-fsck.sh | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index ee7d4736d..3297d4cb2 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -43,13 +43,13 @@ test_expect_success 'HEAD is part of refs, valid objects appear valid' '
 
 test_expect_success 'setup: helpers for corruption tests' '
 	sha1_file() {
-		echo "$*" | sed "s#..#.git/objects/&/#"
+		remainder=${1#??} &&
+		firsttwo=${1%$remainder} &&
+		echo ".git/objects/$firsttwo/$remainder"
 	} &&
 
 	remove_object() {
-		file=$(sha1_file "$*") &&
-		test -e "$file" &&
-		rm -f "$file"
+		rm "$(sha1_file "$1")"
 	}
 '
 
@@ -535,13 +535,6 @@ test_expect_success 'fsck --connectivity-only' '
 	)
 '
 
-remove_loose_object () {
-	sha1="$(git rev-parse "$1")" &&
-	remainder=${sha1#??} &&
-	firsttwo=${sha1%$remainder} &&
-	rm .git/objects/$firsttwo/$remainder
-}
-
 test_expect_success 'fsck --name-objects' '
 	rm -rf name-objects &&
 	git init name-objects &&
@@ -550,7 +543,7 @@ test_expect_success 'fsck --name-objects' '
 		test_commit julius caesar.t &&
 		test_commit augustus &&
 		test_commit caesar &&
-		remove_loose_object $(git rev-parse julius:caesar.t) &&
+		remove_object $(git rev-parse julius:caesar.t) &&
 		test_must_fail git fsck --name-objects >out &&
 		tree=$(git rev-parse --verify julius:) &&
 		grep "$tree (\(refs/heads/master\|HEAD\)@{[0-9]*}:" out
-- 
2.11.0.629.g10075098c



* [PATCH 2/6] sha1_file: fix error message for alternate objects
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
  2017-01-13 17:54         ` [PATCH 1/6] t1450: refactor loose-object removal Jeff King
@ 2017-01-13 17:54         ` Jeff King
  2017-01-13 17:55         ` [PATCH 3/6] t1450: test fsck of packed objects Jeff King
                           ` (4 subsequent siblings)
  6 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2017-01-13 17:54 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git

When we fail to open a corrupt loose object, we report an
error and mention the filename via sha1_file_name().
However, that function will always give us a path in the
local repository, whereas the corrupt object may have come
from an alternate. The result is a very misleading error
message.

Teach the open_sha1_file() and stat_sha1_file() helpers to
pass back the path they found, so that we can report it
correctly.

Note that the pointers we return go to static storage (e.g.,
from sha1_file_name()), which is slightly dangerous.
However, these helpers are static local helpers, and the
names are used for immediately generating error messages.
The simplicity is an acceptable tradeoff for the danger.
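
The lookup order that makes the old message misleading can be modelled in shell. This is only a sketch of the idea, not the C code: `find_loose_object` is a hypothetical helper, and the directory layout is invented for the demonstration. It tries the local object directory first, then each alternate, and reports the path that actually exists rather than always naming the local one.

```shell
#!/bin/sh
# Hypothetical model: print the first existing loose-object path among
# the given object directories (local first, then alternates).
find_loose_object() {
	hex=$1
	shift
	rest=${hex#??}
	firsttwo=${hex%"$rest"}
	for objdir in "$@"
	do
		candidate="$objdir/$firsttwo/$rest"
		if test -e "$candidate"
		then
			echo "$candidate"
			return 0
		fi
	done
	return 1
}

# Demonstration: the object exists only in the alternate.
tmp=$(mktemp -d) &&
mkdir -p "$tmp/local/objects" "$tmp/alt/objects/12" &&
: >"$tmp/alt/objects/12/34567890123456789012345678901234567890" &&
found=$(find_loose_object 1234567890123456789012345678901234567890 \
	"$tmp/local/objects" "$tmp/alt/objects") &&
echo "found at: $found"
```

An error message built from `$found` names the alternate, which is the behavior the patch aims for.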

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c     | 46 +++++++++++++++++++++++++++++++---------------
 t/t1450-fsck.sh | 10 ++++++++++
 2 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 1eb47f611..c6b990f41 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1630,39 +1630,54 @@ int git_open_cloexec(const char *name, int flags)
 	return fd;
 }
 
-static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
+/*
+ * Find "sha1" as a loose object in the local repository or in an alternate.
+ * Returns 0 on success, negative on failure.
+ *
+ * The "path" out-parameter will give the path of the object we found (if any).
+ * Note that it may point to static storage and is only valid until another
+ * call to sha1_file_name(), etc.
+ */
+static int stat_sha1_file(const unsigned char *sha1, struct stat *st,
+			  const char **path)
 {
 	struct alternate_object_database *alt;
 
-	if (!lstat(sha1_file_name(sha1), st))
+	*path = sha1_file_name(sha1);
+	if (!lstat(*path, st))
 		return 0;
 
 	prepare_alt_odb();
 	errno = ENOENT;
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		const char *path = alt_sha1_path(alt, sha1);
-		if (!lstat(path, st))
+		*path = alt_sha1_path(alt, sha1);
+		if (!lstat(*path, st))
 			return 0;
 	}
 
 	return -1;
 }
 
-static int open_sha1_file(const unsigned char *sha1)
+/*
+ * Like stat_sha1_file(), but actually open the object and return the
+ * descriptor. See the caveats on the "path" parameter above.
+ */
+static int open_sha1_file(const unsigned char *sha1, const char **path)
 {
 	int fd;
 	struct alternate_object_database *alt;
 	int most_interesting_errno;
 
-	fd = git_open(sha1_file_name(sha1));
+	*path = sha1_file_name(sha1);
+	fd = git_open(*path);
 	if (fd >= 0)
 		return fd;
 	most_interesting_errno = errno;
 
 	prepare_alt_odb();
 	for (alt = alt_odb_list; alt; alt = alt->next) {
-		const char *path = alt_sha1_path(alt, sha1);
-		fd = git_open(path);
+		*path = alt_sha1_path(alt, sha1);
+		fd = git_open(*path);
 		if (fd >= 0)
 			return fd;
 		if (most_interesting_errno == ENOENT)
@@ -1674,10 +1689,11 @@ static int open_sha1_file(const unsigned char *sha1)
 
 void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
 {
+	const char *path;
 	void *map;
 	int fd;
 
-	fd = open_sha1_file(sha1);
+	fd = open_sha1_file(sha1, &path);
 	map = NULL;
 	if (fd >= 0) {
 		struct stat st;
@@ -1686,7 +1702,7 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
 			*size = xsize_t(st.st_size);
 			if (!*size) {
 				/* mmap() is forbidden on empty files */
-				error("object file %s is empty", sha1_file_name(sha1));
+				error("object file %s is empty", path);
 				return NULL;
 			}
 			map = xmmap(NULL, *size, PROT_READ, MAP_PRIVATE, fd, 0);
@@ -2806,8 +2822,9 @@ static int sha1_loose_object_info(const unsigned char *sha1,
 	 * object even exists.
 	 */
 	if (!oi->typep && !oi->typename && !oi->sizep) {
+		const char *path;
 		struct stat st;
-		if (stat_sha1_file(sha1, &st) < 0)
+		if (stat_sha1_file(sha1, &st, &path) < 0)
 			return -1;
 		if (oi->disk_sizep)
 			*oi->disk_sizep = st.st_size;
@@ -3003,6 +3020,8 @@ void *read_sha1_file_extended(const unsigned char *sha1,
 {
 	void *data;
 	const struct packed_git *p;
+	const char *path;
+	struct stat st;
 	const unsigned char *repl = lookup_replace_object_extended(sha1, flag);
 
 	errno = 0;
@@ -3018,12 +3037,9 @@ void *read_sha1_file_extended(const unsigned char *sha1,
 		die("replacement %s not found for %s",
 		    sha1_to_hex(repl), sha1_to_hex(sha1));
 
-	if (has_loose_object(repl)) {
-		const char *path = sha1_file_name(sha1);
-
+	if (!stat_sha1_file(repl, &st, &path))
 		die("loose object %s (stored in %s) is corrupt",
 		    sha1_to_hex(repl), path);
-	}
 
 	if ((p = has_packed_and_bad(repl)) != NULL)
 		die("packed object %s (stored in %s) is corrupt",
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 3297d4cb2..f95174c9d 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -550,4 +550,14 @@ test_expect_success 'fsck --name-objects' '
 	)
 '
 
+test_expect_success 'alternate objects are correctly blamed' '
+	test_when_finished "rm -rf alt.git .git/objects/info/alternates" &&
+	git init --bare alt.git &&
+	echo "../../alt.git/objects" >.git/objects/info/alternates &&
+	mkdir alt.git/objects/12 &&
+	>alt.git/objects/12/34567890123456789012345678901234567890 &&
+	test_must_fail git fsck >out 2>&1 &&
+	grep alt.git out
+'
+
 test_done
-- 
2.11.0.629.g10075098c



* [PATCH 3/6] t1450: test fsck of packed objects
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
  2017-01-13 17:54         ` [PATCH 1/6] t1450: refactor loose-object removal Jeff King
  2017-01-13 17:54         ` [PATCH 2/6] sha1_file: fix error message for alternate objects Jeff King
@ 2017-01-13 17:55         ` Jeff King
  2017-01-13 17:58         ` [PATCH 4/6] sha1_file: add read_loose_object() function Jeff King
                           ` (3 subsequent siblings)
  6 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2017-01-13 17:55 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git

The code paths in fsck for packed and loose objects are
quite different, and it is not immediately obvious that the
packed case behaves well. In particular:

  1. The fsck_loose() function always returns "0" to tell the
     iterator to keep checking more objects. Whereas
     fsck_obj_buffer() (which handles packed objects)
     returns -1. This is OK, because the callback machinery
     for verify_pack() does not stop when it sees a non-zero
     return.

  2. The fsck_loose() function sets the ERROR_OBJECT bit
     when fsck_obj() fails, whereas fsck_obj_buffer() sets it
     only when it sees a corrupt object. This turns out not
     to matter. We don't actually do anything with this bit
     except exit the program with a non-zero code, and that
     is handled already by the non-zero return from the
     function.

So there are no bugs here, but it was certainly confusing to
me. And we do not test either of the properties in t1450
(neither that a non-corruption error will cause a non-zero
exit for a packed object, nor that we keep going after
seeing the first error). Let's test both of those
conditions, so that we'll notice if any of those assumptions
becomes invalid.
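
The two properties being tested (keep checking after the first error, and still exit non-zero) can be illustrated with a small loop. This is a toy model under stated assumptions: `check_object` and the object names are invented, and nothing here calls into git.

```shell
#!/bin/sh
# Hypothetical checker: any name containing "bad" counts as an error.
check_object() {
	case "$1" in
	*bad*) return 1 ;;
	esac
	return 0
}

errors_found=0
checked=0
for obj in good1 bad1 good2 bad2
do
	checked=$((checked + 1))
	# Record the failure, but do not stop iterating.
	check_object "$obj" || errors_found=1
done

echo "checked=$checked errors_found=$errors_found"
```

A real caller would end with `exit "$errors_found"` so that one bad object yields a non-zero exit code after all objects have been visited.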

Signed-off-by: Jeff King <peff@peff.net>
---
 t/t1450-fsck.sh | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f95174c9d..c39d42120 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -560,4 +560,25 @@ test_expect_success 'alternate objects are correctly blamed' '
 	grep alt.git out
 '
 
+test_expect_success 'fsck errors in packed objects' '
+	git cat-file commit HEAD >basis &&
+	sed "s/</one/" basis >one &&
+	sed "s/</foo/" basis >two &&
+	one=$(git hash-object -t commit -w one) &&
+	two=$(git hash-object -t commit -w two) &&
+	pack=$(
+		{
+			echo $one &&
+			echo $two
+		} | git pack-objects .git/objects/pack/pack
+	) &&
+	test_when_finished "rm -f .git/objects/pack/pack-$pack.*" &&
+	remove_object $one &&
+	remove_object $two &&
+	test_must_fail git fsck 2>out &&
+	grep "error in commit $one.* - bad name" out &&
+	grep "error in commit $two.* - bad name" out &&
+	! grep corrupt out
+'
+
 test_done
-- 
2.11.0.629.g10075098c



* [PATCH 4/6] sha1_file: add read_loose_object() function
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
                           ` (2 preceding siblings ...)
  2017-01-13 17:55         ` [PATCH 3/6] t1450: test fsck of packed objects Jeff King
@ 2017-01-13 17:58         ` Jeff King
  2017-01-13 17:59         ` [PATCH 5/6] fsck: parse loose object paths directly Jeff King
                           ` (2 subsequent siblings)
  6 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2017-01-13 17:58 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git

It's surprisingly hard to ask the sha1_file code to open a
_specific_ incarnation of a loose object. Most of the
functions take a sha1, and loop over the various object
types (packed versus loose) and locations (local versus
alternates) at a low level.

However, some tools like fsck need to look at a specific
file. This patch gives them a function they can use to open
the loose object at a given path.

The implementation unfortunately ends up repeating bits of
related functions, but there's not a good way around it
without some major refactoring of the whole sha1_file stack.
We need to mmap the specific file, then partially read the
zlib stream to know whether we're streaming or not, and then
finally either stream it or copy the data to a buffer.

We can do that by assembling some of the more arcane
internal sha1_file functions, but we end up having to
essentially reimplement unpack_sha1_file(), along with the
streaming bits of check_sha1_signature().

Still, most of the ugliness is contained in the new
function, and the interface is clean enough that it may be
reusable (though it seems unlikely anything but git-fsck
would care about opening a specific file).

Signed-off-by: Jeff King <peff@peff.net>
---
 cache.h     |  13 ++++++
 sha1_file.c | 133 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 143 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 1b67f078d..33f1c2fa7 100644
--- a/cache.h
+++ b/cache.h
@@ -1140,6 +1140,19 @@ extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 
+/*
+ * Open the loose object at path, check its sha1, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const unsigned char *expected_sha1,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents);
+
 /*
  * Return true iff we have an object named sha1, whether local or in
  * an alternate object database, and whether packed or loose.  This
diff --git a/sha1_file.c b/sha1_file.c
index c6b990f41..c0fccb73c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1687,13 +1687,21 @@ static int open_sha1_file(const unsigned char *sha1, const char **path)
 	return -1;
 }
 
-void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+/*
+ * Map the loose object at "path" if it is not NULL, or the path found by
+ * searching for a loose object named "sha1".
+ */
+static void *map_sha1_file_1(const char *path,
+			     const unsigned char *sha1,
+			     unsigned long *size)
 {
-	const char *path;
 	void *map;
 	int fd;
 
-	fd = open_sha1_file(sha1, &path);
+	if (path)
+		fd = git_open(path);
+	else
+		fd = open_sha1_file(sha1, &path);
 	map = NULL;
 	if (fd >= 0) {
 		struct stat st;
@@ -1712,6 +1720,11 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
 	return map;
 }
 
+void *map_sha1_file(const unsigned char *sha1, unsigned long *size)
+{
+	return map_sha1_file_1(NULL, sha1, size);
+}
+
 unsigned long unpack_object_header_buffer(const unsigned char *buf,
 		unsigned long len, enum object_type *type, unsigned long *sizep)
 {
@@ -3809,3 +3822,117 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags)
 	}
 	return r ? r : pack_errors;
 }
+
+static int check_stream_sha1(git_zstream *stream,
+			     const char *hdr,
+			     unsigned long size,
+			     const char *path,
+			     const unsigned char *expected_sha1)
+{
+	git_SHA_CTX c;
+	unsigned char real_sha1[GIT_SHA1_RAWSZ];
+	unsigned char buf[4096];
+	unsigned long total_read;
+	int status = Z_OK;
+
+	git_SHA1_Init(&c);
+	git_SHA1_Update(&c, hdr, stream->total_out);
+
+	/*
+	 * We already read some bytes into hdr, but the ones up to the NUL
+	 * do not count against the object's content size.
+	 */
+	total_read = stream->total_out - strlen(hdr) - 1;
+
+	/*
+	 * This size comparison must be "<=" to read the final zlib packets;
+	 * see the comment in unpack_sha1_rest for details.
+	 */
+	while (total_read <= size &&
+	       (status == Z_OK || status == Z_BUF_ERROR)) {
+		stream->next_out = buf;
+		stream->avail_out = sizeof(buf);
+		if (size - total_read < stream->avail_out)
+			stream->avail_out = size - total_read;
+		status = git_inflate(stream, Z_FINISH);
+		git_SHA1_Update(&c, buf, stream->next_out - buf);
+		total_read += stream->next_out - buf;
+	}
+	git_inflate_end(stream);
+
+	if (status != Z_STREAM_END) {
+		error("corrupt loose object '%s'", sha1_to_hex(expected_sha1));
+		return -1;
+	}
+
+	git_SHA1_Final(real_sha1, &c);
+	if (hashcmp(expected_sha1, real_sha1)) {
+		error("sha1 mismatch for %s (expected %s)", path,
+		      sha1_to_hex(expected_sha1));
+		return -1;
+	}
+
+	return 0;
+}
+
+int read_loose_object(const char *path,
+		      const unsigned char *expected_sha1,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents)
+{
+	int ret = -1;
+	int fd = -1;
+	void *map = NULL;
+	unsigned long mapsize;
+	git_zstream stream;
+	char hdr[32];
+
+	*contents = NULL;
+
+	map = map_sha1_file_1(path, NULL, &mapsize);
+	if (!map) {
+		error_errno("unable to mmap %s", path);
+		goto out;
+	}
+
+	if (unpack_sha1_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+		error("unable to unpack header of %s", path);
+		goto out;
+	}
+
+	*type = parse_sha1_header(hdr, size);
+	if (*type < 0) {
+		error("unable to parse header of %s", path);
+		git_inflate_end(&stream);
+		goto out;
+	}
+
+	if (*type == OBJ_BLOB) {
+		if (check_stream_sha1(&stream, hdr, *size, path, expected_sha1) < 0)
+			goto out;
+	} else {
+		*contents = unpack_sha1_rest(&stream, hdr, *size, expected_sha1);
+		if (!*contents) {
+			error("unable to unpack contents of %s", path);
+			git_inflate_end(&stream);
+			goto out;
+		}
+		if (check_sha1_signature(expected_sha1, *contents,
+					 *size, typename(*type))) {
+			error("sha1 mismatch for %s (expected %s)", path,
+			      sha1_to_hex(expected_sha1));
+			free(*contents);
+			goto out;
+		}
+	}
+
+	ret = 0; /* everything checks out */
+
+out:
+	if (map)
+		munmap(map, mapsize);
+	if (fd >= 0)
+		close(fd);
+	return ret;
+}
-- 
2.11.0.629.g10075098c
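
For readers following the total_read arithmetic in check_stream_sha1(), here is a minimal sketch of the loose-object header layout it relies on: the inflated data starts with "<type> <size>", then a NUL, then the content. The helper names parse_loose_header() and content_bytes_read() are made up for illustration; they are not git functions, and git's own parse_sha1_header() does more validation than this.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not git's parse_sha1_header): split "<type> <size>"
 * into its parts. Returns 0 on success, -1 on a malformed header.
 */
int parse_loose_header(const char *hdr, char *type, size_t type_len,
		       unsigned long *size)
{
	const char *space = strchr(hdr, ' ');
	char *end;
	size_t n;

	if (!space || space == hdr)
		return -1;
	n = (size_t)(space - hdr);
	if (n >= type_len)
		return -1;
	memcpy(type, hdr, n);
	type[n] = '\0';

	/* the size must start with a decimal digit and run up to the NUL */
	if (space[1] < '0' || space[1] > '9')
		return -1;
	*size = strtoul(space + 1, &end, 10);
	return *end ? -1 : 0;
}

/*
 * Content bytes that were inflated together with the header: everything
 * after the header text and its terminating NUL.
 */
unsigned long content_bytes_read(unsigned long total_out, const char *hdr)
{
	return total_out - strlen(hdr) - 1;
}
```

For a 9-byte blob whose whole inflated stream (16 bytes) fit into the first read, the header is "blob 9" and content_bytes_read(16, "blob 9") comes out as 9, i.e. all of the content has already been seen.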



* [PATCH 5/6] fsck: parse loose object paths directly
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
                           ` (3 preceding siblings ...)
  2017-01-13 17:58         ` [PATCH 4/6] sha1_file: add read_loose_object() function Jeff King
@ 2017-01-13 17:59         ` Jeff King
  2018-10-30 20:03           ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
  2017-01-13 18:00         ` [PATCH 6/6] fsck: detect trailing garbage in all object types Jeff King
  2017-01-19 11:18         ` [PATCH 0/6] loose-object fsck fixes/tightening John Szakmeister
  6 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2017-01-13 17:59 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git

When we iterate over the list of loose objects to check, we
get the actual path of each object. But we then throw it
away and pass just the sha1 to fsck_sha1(), which will do a
fresh lookup. Usually it would find the same object, but it
may not if an object exists both as a loose and a packed
object. We may end up checking the packed object twice, and
never look at the loose one.

In practice this isn't too terrible, because if fsck doesn't
complain, it means you have at least one good copy. But
since the point of fsck is to look for corruption, we should
be thorough.

The new read_loose_object() interface can help us get the
data from disk, and then we replace parse_object() with
parse_object_buffer(). As a bonus, our error messages now
mention the path to a corrupted object, which should make it
easier to track down errors when they do happen.

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/fsck.c  | 46 +++++++++++++++++++++++++++++++++-------------
 t/t1450-fsck.sh | 16 ++++++++++++++++
 2 files changed, 49 insertions(+), 13 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index f01b81eeb..4b91ee95e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -362,18 +362,6 @@ static int fsck_obj(struct object *obj)
 	return 0;
 }
 
-static int fsck_sha1(const unsigned char *sha1)
-{
-	struct object *obj = parse_object(sha1);
-	if (!obj) {
-		errors_found |= ERROR_OBJECT;
-		return error("%s: object corrupt or missing",
-			     sha1_to_hex(sha1));
-	}
-	obj->flags |= HAS_OBJ;
-	return fsck_obj(obj);
-}
-
 static int fsck_obj_buffer(const unsigned char *sha1, enum object_type type,
 			   unsigned long size, void *buffer, int *eaten)
 {
@@ -488,9 +476,41 @@ static void get_default_heads(void)
 	}
 }
 
+static struct object *parse_loose_object(const unsigned char *sha1,
+					 const char *path)
+{
+	struct object *obj;
+	void *contents;
+	enum object_type type;
+	unsigned long size;
+	int eaten;
+
+	if (read_loose_object(path, sha1, &type, &size, &contents) < 0)
+		return NULL;
+
+	if (!contents && type != OBJ_BLOB)
+		die("BUG: read_loose_object streamed a non-blob");
+
+	obj = parse_object_buffer(sha1, type, size, contents, &eaten);
+
+	if (!eaten)
+		free(contents);
+	return obj;
+}
+
 static int fsck_loose(const unsigned char *sha1, const char *path, void *data)
 {
-	if (fsck_sha1(sha1))
+	struct object *obj = parse_loose_object(sha1, path);
+
+	if (!obj) {
+		errors_found |= ERROR_OBJECT;
+		error("%s: object corrupt or missing: %s",
+		      sha1_to_hex(sha1), path);
+		return 0; /* keep checking other objects */
+	}
+
+	obj->flags = HAS_OBJ;
+	if (fsck_obj(obj))
 		errors_found |= ERROR_OBJECT;
 	return 0;
 }
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index c39d42120..455c186fe 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -581,4 +581,20 @@ test_expect_success 'fsck errors in packed objects' '
 	! grep corrupt out
 '
 
+test_expect_success 'fsck finds problems in duplicate loose objects' '
+	rm -rf broken-duplicate &&
+	git init broken-duplicate &&
+	(
+		cd broken-duplicate &&
+		test_commit duplicate &&
+		# no "-d" here, so we end up with duplicates
+		git repack &&
+		# now corrupt the loose copy
+		file=$(sha1_file "$(git rev-parse HEAD)") &&
+		rm "$file" &&
+		echo broken >"$file" &&
+		test_must_fail git fsck
+	)
+'
+
 test_done
-- 
2.11.0.629.g10075098c
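
The distinction this patch cares about is which file on disk gets read. As a rough sketch of how a loose object's path follows from its name (the first two hex digits form a directory, the remaining 38 the file name), assuming a 40-character SHA-1 hex string; loose_object_path() is a hypothetical helper, not something in git:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write "<objdir>/xx/yyyy..." for a 40-character hex
 * object name. Returns 0 on success, -1 if the name has the wrong length
 * or the output buffer is too small.
 */
int loose_object_path(char *out, size_t out_len,
		      const char *objdir, const char *hex)
{
	int n;

	if (strlen(hex) != 40)
		return -1;
	n = snprintf(out, out_len, "%s/%.2s/%s", objdir, hex, hex + 2);
	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

Going from a sha1 to a path like this only tells you where a loose copy would live; a lookup by sha1 is free to return a packed copy instead, which is why fsck needs to read the path it was actually handed.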



* [PATCH 6/6] fsck: detect trailing garbage in all object types
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
                           ` (4 preceding siblings ...)
  2017-01-13 17:59         ` [PATCH 5/6] fsck: parse loose object paths directly Jeff King
@ 2017-01-13 18:00         ` Jeff King
  2017-01-19 11:18         ` [PATCH 0/6] loose-object fsck fixes/tightening John Szakmeister
  6 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2017-01-13 18:00 UTC (permalink / raw)
  To: John Szakmeister; +Cc: Dennis Kaarsemaker, git

When a loose tree or commit is read by fsck (or any git
program), unpack_sha1_rest() checks whether there is extra
cruft at the end of the object file, after the zlib data.
Blobs that are streamed, however, do not have this check.

For normal git operations, it's not a big deal. We know the
sha1 and size checked out, so we have the object bytes we
wanted.  The trailing garbage doesn't affect what we're
trying to do.

But since the point of fsck is to find corruption or other
problems, it should be more thorough. This patch teaches its
loose-sha1 reader to detect extra bytes after the zlib
stream and complain.

Signed-off-by: Jeff King <peff@peff.net>
---
 sha1_file.c     |  5 +++++
 t/t1450-fsck.sh | 22 ++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/sha1_file.c b/sha1_file.c
index c0fccb73c..b77ab6d5c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3864,6 +3864,11 @@ static int check_stream_sha1(git_zstream *stream,
 		error("corrupt loose object '%s'", sha1_to_hex(expected_sha1));
 		return -1;
 	}
+	if (stream->avail_in) {
+		error("garbage at end of loose object '%s'",
+		      sha1_to_hex(expected_sha1));
+		return -1;
+	}
 
 	git_SHA1_Final(real_sha1, &c);
 	if (hashcmp(expected_sha1, real_sha1)) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 455c186fe..8975b4d1b 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -597,4 +597,26 @@ test_expect_success 'fsck finds problems in duplicate loose objects' '
 	)
 '
 
+test_expect_success 'fsck detects trailing loose garbage (commit)' '
+	git cat-file commit HEAD >basis &&
+	echo bump-commit-sha1 >>basis &&
+	commit=$(git hash-object -w -t commit basis) &&
+	file=$(sha1_file $commit) &&
+	test_when_finished "remove_object $commit" &&
+	chmod +w "$file" &&
+	echo garbage >>"$file" &&
+	test_must_fail git fsck 2>out &&
+	test_i18ngrep "garbage.*$commit" out
+'
+
+test_expect_success 'fsck detects trailing loose garbage (blob)' '
+	blob=$(echo trailing | git hash-object -w --stdin) &&
+	file=$(sha1_file $blob) &&
+	test_when_finished "remove_object $blob" &&
+	chmod +w "$file" &&
+	echo garbage >>"$file" &&
+	test_must_fail git fsck 2>out &&
+	test_i18ngrep "garbage.*$blob" out
+'
+
 test_done
-- 
2.11.0.629.g10075098c


* Re: [PATCH 0/6] loose-object fsck fixes/tightening
  2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
                           ` (5 preceding siblings ...)
  2017-01-13 18:00         ` [PATCH 6/6] fsck: detect trailing garbage in all object types Jeff King
@ 2017-01-19 11:18         ` John Szakmeister
  6 siblings, 0 replies; 39+ messages in thread
From: John Szakmeister @ 2017-01-19 11:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Dennis Kaarsemaker, git

On Fri, Jan 13, 2017 at 12:52 PM, Jeff King <peff@peff.net> wrote:
> On Fri, Jan 13, 2017 at 04:15:42AM -0500, John Szakmeister wrote:
>
>> > I did notice another interesting case when looking at this. Fsck ends up
>> > in fsck_loose(), which has the sha1 and path of the loose object. It
>> > passes the sha1 to fsck_sha1(), and ignores the path entirely!
>> >
>> > So if you have a duplicate copy of the object in a pack, we'd actually
>> > find and check the duplicate. This can happen, e.g., if you had a loose
>> > object and fetched a thin-pack which made a copy of the loose object to
>> > complete the pack.
>> >
>> > Probably fsck_loose() should be more picky about making sure we are
>> > reading the data from the loose version we found.
>>
>> Interesting find!  Thanks for the information Peff!
>
> So I figured I would knock this out as a fun morning exercise. But
> sheesh, it turned out to be a slog, because most of the functions rely
> on map_sha1_file() to convert the sha1 to an object path at the lowest
> level.

Yeah, I discovered the same thing when I took a look at it a week or so ago. :-(

> But I finally got something working, so here it is. I found another bug
> on the way, along with a few cleanups. And then I did the trailing
> garbage detection at the end, because by that point I knew right where
> it needed to go. :)

I don't know if my opinion counts for much, but the changes look good to me.

-John


* Infinite loop regression in git-fsck in v2.12.0
  2017-01-13 17:59         ` [PATCH 5/6] fsck: parse loose object paths directly Jeff King
@ 2018-10-30 20:03           ` Ævar Arnfjörð Bjarmason
  2018-10-30 21:35             ` Jeff King
  2018-10-30 21:56             ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-30 20:03 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, John Szakmeister, Dennis Kaarsemaker

While playing around with having a GIT_TEST_FSCK=true as I suggested in
https://public-inbox.org/git/20181030184331.27264-3-avarab@gmail.com/ I
found that we've had an infinite loop in git-fsck since c68b489e56
("fsck: parse loose object paths directly", 2017-01-13)

In particular in the while() loop added by f6371f9210 ("sha1_file: add
read_loose_object() function", 2017-01-13) in the check_stream_sha1()
function.

To reproduce just:

    (
        cd t &&
        ./t5000-tar-tree.sh -d &&
        git -C trash\ directory.t5000-tar-tree/ fsck
    )

Before we'd print:

    error: sha1 mismatch 19f9c8273ec45a8938e6999cb59b3ff66739902a
    error: 19f9c8273ec45a8938e6999cb59b3ff66739902a: object corrupt or missing
    Checking object directories: 100% (256/256), done.
    missing blob 19f9c8273ec45a8938e6999cb59b3ff66739902a

Now we just hang on:

    Checking object directories:   9% (24/256)

I have no idea if this makes sense, but this fixes it and we pass all
the fsck tests with it:

    diff --git a/sha1-file.c b/sha1-file.c
    index dd0b6aa873..fffc31458e 100644
    --- a/sha1-file.c
    +++ b/sha1-file.c
    @@ -2182,7 +2182,7 @@ static int check_stream_sha1(git_zstream *stream,
     	git_hash_ctx c;
     	unsigned char real_sha1[GIT_MAX_RAWSZ];
     	unsigned char buf[4096];
    -	unsigned long total_read;
    +	unsigned long total_read, last_total_read;
     	int status = Z_OK;

     	the_hash_algo->init_fn(&c);
    @@ -2193,6 +2193,7 @@ static int check_stream_sha1(git_zstream *stream,
     	 * do not count against the object's content size.
     	 */
     	total_read = stream->total_out - strlen(hdr) - 1;
    +	last_total_read = total_read;

     	/*
     	 * This size comparison must be "<=" to read the final zlib packets;
    @@ -2207,6 +2208,9 @@ static int check_stream_sha1(git_zstream *stream,
     		status = git_inflate(stream, Z_FINISH);
     		the_hash_algo->update_fn(&c, buf, stream->next_out - buf);
     		total_read += stream->next_out - buf;
    +		if (last_total_read == total_read)
    +			return -1;
    +		last_total_read = total_read;
     	}
     	git_inflate_end(stream);

I.e. we get into a loop where total_read isn't increasing. We no longer
print "sha1 mismatch" but maybe that's an emergent effect of something
else. Haven't checked.

The test is easy, just add a 'git fsck' at the end of t5000-tar-tree.sh,
but more generally it seems having something like GIT_TEST_FSCK=true is
a good idea. We do a bunch of stress testing of the object store in the
test suite that we're unlikely to encounter in the wild.

Of course my idea of how to do that in my
<20181030184331.27264-3-avarab@gmail.com> would be counterproductive,
i.e. it seems we want to catch all the cases where there's a bad fsck,
just that it returns in a certain way.

So maybe a good approach would be that we'd annotate all those tests
whose fsck fails with "this is how it should fail", and run those tests
under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting
that no tests other than those marked as failing the fsck check at the
end fail it.


* Re: Infinite loop regression in git-fsck in v2.12.0
  2018-10-30 20:03           ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
@ 2018-10-30 21:35             ` Jeff King
  2018-10-30 22:28               ` Junio C Hamano
  2018-10-30 21:56             ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-10-30 21:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, John Szakmeister, Dennis Kaarsemaker

On Tue, Oct 30, 2018 at 09:03:24PM +0100, Ævar Arnfjörð Bjarmason wrote:

> While playing around with having a GIT_TEST_FSCK=true as I suggested in
> https://public-inbox.org/git/20181030184331.27264-3-avarab@gmail.com/ I
> found that we've had an infinite loop in git-fsck since c68b489e56
> ("fsck: parse loose object paths directly", 2017-01-13)
> 
> In particular in the while() loop added by f6371f9210 ("sha1_file: add
> read_loose_object() function", 2017-01-13) in the check_stream_sha1()
> function.
> 
> To reproduce just:
> 
>     (
>         cd t &&
>         ./t5000-tar-tree.sh -d &&
>         git -C trash\ directory.t5000-tar-tree/ fsck
>     )

Thanks, I was easily able to reproduce.

> Before we'd print:
> 
>     error: sha1 mismatch 19f9c8273ec45a8938e6999cb59b3ff66739902a
>     error: 19f9c8273ec45a8938e6999cb59b3ff66739902a: object corrupt or missing
>     Checking object directories: 100% (256/256), done.
>     missing blob 19f9c8273ec45a8938e6999cb59b3ff66739902a

The problem isn't actually a sha1 mismatch, though that's what
parse_object() will report. The issue is actually that the file is
truncated. So zlib does not say "this is corrupt", but rather "I need
more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both
for "I need more bytes" (in which case we know we are truncated, because we
fed the whole mmap'd file in the first place) as well as "I need more
output buffer space" (which just means we should keep looping!).

So we need to distinguish those cases. I think this is the simplest fix:

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..a7ff5fe25d 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream,
 	 * see the comment in unpack_sha1_rest for details.
 	 */
 	while (total_read <= size &&
+	       stream->avail_in > 0 &&
 	       (status == Z_OK || status == Z_BUF_ERROR)) {
 		stream->next_out = buf;
 		stream->avail_out = sizeof(buf);

> I have no idea if this makes sense, but this fixes it and we pass all
> the fsck tests with it:
> 
>     diff --git a/sha1-file.c b/sha1-file.c
>     index dd0b6aa873..fffc31458e 100644
>     --- a/sha1-file.c
>     +++ b/sha1-file.c
>     @@ -2182,7 +2182,7 @@ static int check_stream_sha1(git_zstream *stream,
>      	git_hash_ctx c;
>      	unsigned char real_sha1[GIT_MAX_RAWSZ];
>      	unsigned char buf[4096];
>     -	unsigned long total_read;
>     +	unsigned long total_read, last_total_read;
>      	int status = Z_OK;
> 
>      	the_hash_algo->init_fn(&c);
>     @@ -2193,6 +2193,7 @@ static int check_stream_sha1(git_zstream *stream,
>      	 * do not count against the object's content size.
>      	 */
>      	total_read = stream->total_out - strlen(hdr) - 1;
>     +	last_total_read = total_read;

This works just by checking that we are making forward progress in the
output buffer. I think that would _probably_ be OK for this case, since
we know we have all of the input available. But in a case where we're
feeding the input in a stream, it would not be. It's possible there that
we would not create any output in one round, but would do so after
feeding more input bytes.

I think the patch I showed above addresses the root cause more directly.
I'll wrap that up in a real commit, but I think there may be some
related work:

  - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't
    strictly correct, but is probably good enough). However, "git
    cat-file blob 19f9c827" exits non-zero without printing anything. It
    probably should complain more loudly.

  - the offending loop comes from f6371f9210. But that commit was mostly
    cargo-culting other parts of sha1-file.c. I'm worried that this bug
    exists elsewhere, too. I'll dig around to see if I can find other
    instances.

-Peff


* Re: Infinite loop regression in git-fsck in v2.12.0
  2018-10-30 20:03           ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
  2018-10-30 21:35             ` Jeff King
@ 2018-10-30 21:56             ` Ævar Arnfjörð Bjarmason
  2018-10-30 23:08               ` Jeff King
  1 sibling, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-30 21:56 UTC (permalink / raw)
  To: Jeff King; +Cc: Git Mailing List, John Szakmeister, Dennis Kaarsemaker


On Tue, Oct 30 2018, Ævar Arnfjörð Bjarmason wrote:

> The test is easy, just add a 'git fsck' at the end of t5000-tar-tree.sh,
> but more generally it seems having something like GIT_TEST_FSCK=true is
> a good idea. We do a bunch of stress testing of the object store in the
> test suite that we're unlikely to encounter in the wild.
>
> Of course my idea of how to do that in my
> <20181030184331.27264-3-avarab@gmail.com> would be counterproductive,
> i.e. it seems we want to catch all the cases where there's a bad fsck,
> just that it returns in a certain way.
>
> So maybe a good approach would be that we'd annotate all those tests
> whose fsck fails with "this is how it should fail", and run those tests
> under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting
> that no tests other than those marked as failing the fsck check at the
> end fail it.

WIP patch for doing that:

    diff --git a/Makefile b/Makefile
    index b08d5ea258..ca624c381f 100644
    --- a/Makefile
    +++ b/Makefile
    @@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
     TEST_BUILTINS_OBJS += test-dump-split-index.o
     TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
     TEST_BUILTINS_OBJS += test-example-decorate.o
    +TEST_BUILTINS_OBJS += test-env-bool.o
     TEST_BUILTINS_OBJS += test-genrandom.o
     TEST_BUILTINS_OBJS += test-hashmap.o
     TEST_BUILTINS_OBJS += test-index-version.o
    diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
    index 5df8b682aa..c4481085c4 100644
    --- a/t/helper/test-tool.c
    +++ b/t/helper/test-tool.c
    @@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
     	{ "dump-fsmonitor", cmd__dump_fsmonitor },
     	{ "dump-split-index", cmd__dump_split_index },
     	{ "dump-untracked-cache", cmd__dump_untracked_cache },
    +	{ "env-bool", cmd__env_bool },
     	{ "example-decorate", cmd__example_decorate },
     	{ "genrandom", cmd__genrandom },
     	{ "hashmap", cmd__hashmap },
    diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
    index 71f470b871..f7845fbc56 100644
    --- a/t/helper/test-tool.h
    +++ b/t/helper/test-tool.h
    @@ -13,6 +13,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
     int cmd__dump_fsmonitor(int argc, const char **argv);
     int cmd__dump_split_index(int argc, const char **argv);
     int cmd__dump_untracked_cache(int argc, const char **argv);
    +int cmd__env_bool(int argc, const char **argv);
     int cmd__example_decorate(int argc, const char **argv);
     int cmd__genrandom(int argc, const char **argv);
     int cmd__hashmap(int argc, const char **argv);
    diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
    index 635918505d..92fbce2920 100755
    --- a/t/t1305-config-include.sh
    +++ b/t/t1305-config-include.sh
    @@ -313,4 +313,8 @@ test_expect_success 'include cycles are detected' '
     	test_i18ngrep "exceeded maximum include depth" stderr
     '

    +GIT_FSCK_FAILS=true
    +GIT_FSCK_FAILS_TEST='
    +	test_i18ngrep "exceeded maximum include depth" fsck.err
    +'
     test_done
    diff --git a/t/t3103-ls-tree-misc.sh b/t/t3103-ls-tree-misc.sh
    index 14520913af..06abf84ef4 100755
    --- a/t/t3103-ls-tree-misc.sh
    +++ b/t/t3103-ls-tree-misc.sh
    @@ -22,4 +22,10 @@ test_expect_success 'ls-tree fails with non-zero exit code on broken tree' '
     	test_must_fail git ls-tree -r HEAD
     '

    +GIT_FSCK_FAILS=true
    +GIT_FSCK_FAILS_TEST='
    +	test_i18ngrep "invalid sha1 pointer in cache-tree" fsck.err &&
    +	test_i18ngrep "broken link from" fsck.out &&
    +	test_i18ngrep "missing tree" fsck.out
    +'
     test_done
    diff --git a/t/test-lib.sh b/t/test-lib.sh
    index 897e6fcc94..d4ebb94998 100644
    --- a/t/test-lib.sh
    +++ b/t/test-lib.sh
    @@ -454,6 +454,8 @@ GIT_EXIT_OK=
     trap 'die' EXIT
     trap 'exit $?' INT

    +GIT_FSCK_FAILS=
    +
     # The user-facing functions are loaded from a separate file so that
     # test_perf subshells can have them too
     . "$TEST_DIRECTORY/test-lib-functions.sh"
    @@ -790,6 +792,25 @@ test_at_end_hook_ () {
     }

     test_done () {
    +	if test_have_prereq TEST_FSCK
    +	then
    +		desc='git fsck at end (due to GIT_TEST_FSCK)'
    +		if test -n "$GIT_FSCK_FAILS"
    +		then
    +			test_expect_success "$desc (expected to fail)" '
    +				test_must_fail git fsck 2>fsck.err >fsck.out
    +			'
    +			test_expect_success "$desc (expected to fail) -- assert failure mode" "
    +				test_path_exists fsck.err &&
    +				test_path_exists fsck.out &&
    +				$GIT_FSCK_FAILS_TEST
    +			"
    +		else
    +			test_expect_success "$desc" '
    +				git fsck
    +			'
    +		fi
    +	fi
     	GIT_EXIT_OK=t

     	if test -z "$HARNESS_ACTIVE"
    @@ -1268,3 +1289,5 @@ test_lazy_prereq CURL '
     test_lazy_prereq SHA1 '
     	test $(git hash-object /dev/null) = e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
     '
    +
    +test_lazy_prereq TEST_FSCK 'test-tool env-bool GIT_TEST_FSCK'

Could be made prettier by turning that work in test_done() into a
utility function, but is (I think) worth the effort to do.

Jeff: Gotta turn in for the night, but maybe this is something you're
interested in carrying forward for this fix? It's not that much work to
mark up the failing tests, there's 10-20 of them from some quick
eyeballing.
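
The WIP patch above wires up an "env-bool" test helper but does not include its source. Purely as an illustration of what such a check might do, here is a small sketch that interprets an environment variable as a boolean. The names parse_bool_value() and env_bool() are assumptions, and the accepted spellings are only loosely modeled on the usual true/yes/on/1 and false/no/off/0 set; this is not the helper from the patch.

```c
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive string equality, to avoid relying on strcasecmp(). */
static int same_word(const char *a, const char *b)
{
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Hypothetical parser: 1 for a true spelling, 0 for a false spelling,
 * and "def" when the value is NULL (unset) or not recognized.
 */
int parse_bool_value(const char *value, int def)
{
	static const char *yes[] = { "true", "yes", "on", "1" };
	static const char *no[] = { "false", "no", "off", "0", "" };
	size_t i;

	if (!value)
		return def;
	for (i = 0; i < sizeof(yes) / sizeof(*yes); i++)
		if (same_word(value, yes[i]))
			return 1;
	for (i = 0; i < sizeof(no) / sizeof(*no); i++)
		if (same_word(value, no[i]))
			return 0;
	return def;
}

/* Look the variable up in the environment and interpret it. */
int env_bool(const char *name, int def)
{
	return parse_bool_value(getenv(name), def);
}
```

A prerequisite like TEST_FSCK could then boil down to whether env_bool("GIT_TEST_FSCK", 0) returns 1.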


* Re: Infinite loop regression in git-fsck in v2.12.0
  2018-10-30 21:35             ` Jeff King
@ 2018-10-30 22:28               ` Junio C Hamano
  2018-10-30 22:56                 ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-10-30 22:28 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

Jeff King <peff@peff.net> writes:

> The problem isn't actually a sha1 mismatch, though that's what
> parse_object() will report. The issue is actually that the file is
> truncated. So zlib does not say "this is corrupt", but rather "I need
> more bytes to keep going". And unfortunately it returns Z_BUF_ERROR both
> for "I need more bytes" (in which case we know we are truncated, because we
> fed the whole mmap'd file in the first place) as well as "I need more
> output buffer space" (which just means we should keep looping!).
>
> So we need to distinguish those cases. I think this is the simplest fix:
>
> diff --git a/sha1-file.c b/sha1-file.c
> index dd0b6aa873..a7ff5fe25d 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream,
>  	 * see the comment in unpack_sha1_rest for details.
>  	 */
>  	while (total_read <= size &&
> +	       stream->avail_in > 0 &&
>  	       (status == Z_OK || status == Z_BUF_ERROR)) {
>  		stream->next_out = buf;
>  		stream->avail_out = sizeof(buf);

Hmph.  If the last round consumed the final input byte and needed
output space of N bytes, but only M (< N) bytes of the output space
was available, then it would have reduced both avail_in and
avail_out down to zero and yielded Z_BUF_ERROR, no?  Or would zlib
refrain from consuming that final byte (leaving avail_in to at least
one) and give us Z_BUF_ERROR in such a case?

> This works just by checking that we are making forward progress in the
> output buffer. I think that would _probably_ be OK for this case, since
> we know we have all of the input available. But in a case where we're
> feeding the input in a stream, it would not be. It's possible there that
> we would not create any output in one round, but would do so after
> feeding more input bytes.

Yes, exactly.

> I think the patch I showed above addresses the root cause more directly.
> I'll wrap that up in a real commit, but I think there may be some
> related work:
>
>   - "git show 19f9c827" does complain with "sha1 mismatch" (which isn't
>     strictly correct, but is probably good enough). However, "git
>     cat-file blob 19f9c827" exits non-zero without printing anything. It
>     probably should complain more loudly.
>
>   - the offending loop comes from f6371f9210. But that commit was mostly
>     cargo-culting other parts of sha1-file.c. I'm worried that this bug
>     exists elsewhere, too. I'll dig around to see if I can find other
>     instances.

Thanks.


* Re: Infinite loop regression in git-fsck in v2.12.0
  2018-10-30 22:28               ` Junio C Hamano
@ 2018-10-30 22:56                 ` Jeff King
  2018-10-30 23:12                   ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-10-30 22:56 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

On Wed, Oct 31, 2018 at 07:28:00AM +0900, Junio C Hamano wrote:

> > So we need to distinguish those cases. I think this is the simplest fix:
> >
> > diff --git a/sha1-file.c b/sha1-file.c
> > index dd0b6aa873..a7ff5fe25d 100644
> > --- a/sha1-file.c
> > +++ b/sha1-file.c
> > @@ -2199,6 +2199,7 @@ static int check_stream_sha1(git_zstream *stream,
> >  	 * see the comment in unpack_sha1_rest for details.
> >  	 */
> >  	while (total_read <= size &&
> > +	       stream->avail_in > 0 &&
> >  	       (status == Z_OK || status == Z_BUF_ERROR)) {
> >  		stream->next_out = buf;
> >  		stream->avail_out = sizeof(buf);
> 
> Hmph.  If the last round consumed the final input byte and needed
> output space of N bytes, but only M (< N) bytes of the output space
> was available, then it would have reduced both avail_in and
> avail_out down to zero and yielded Z_BUF_ERROR, no?  Or would zlib
> refrain from consuming that final byte (leaving avail_in to at least
> one) and give us Z_BUF_ERROR in such a case?

Hmm, yeah, good thinking. I think zlib could consume that final byte
into its internal buffer.

As part of my digging, I looked at how the loose streaming code handles
this. It checks that when we see Z_BUF_ERROR, we actually did run out of
output bytes (so if we didn't, then we know it's not the case we
expected to be looping on).

I have some patches almost ready to send; I'll use that technique.

-Peff
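
To make the "did we actually run out of output bytes" check concrete, here is a toy model of the loop. It does not use zlib: toy_inflate() is a made-up stand-in that copies bytes until a zero byte marks the end of the stream, and, like inflate() as described in this thread, it reports the same "buffer error" whether it ran out of input or out of output space. Everything named toy_* is an assumption for illustration only.

```c
#include <stddef.h>

enum toy_status { TOY_STREAM_END, TOY_BUF_ERROR };

struct toy_stream {
	const unsigned char *next_in;
	size_t avail_in;
	unsigned char *next_out;
	size_t avail_out;
};

/*
 * Made-up stand-in for inflate(): copy input bytes to the output until a
 * zero byte, which marks the end of the stream. Whenever it stops before
 * that marker it reports TOY_BUF_ERROR, no matter whether it ran out of
 * input or out of output space.
 */
enum toy_status toy_inflate(struct toy_stream *s)
{
	while (s->avail_in) {
		if (*s->next_in == 0) {
			s->next_in++;
			s->avail_in--;
			return TOY_STREAM_END;
		}
		if (!s->avail_out)
			break;
		*s->next_out++ = *s->next_in++;
		s->avail_in--;
		s->avail_out--;
	}
	return TOY_BUF_ERROR;
}

/* Returns 0 if the stream ended cleanly, -1 if the input was truncated. */
int toy_check(const unsigned char *in, size_t len)
{
	unsigned char buf[4];
	struct toy_stream s = { in, len, NULL, 0 };
	enum toy_status status;

	do {
		s.next_out = buf;
		s.avail_out = sizeof(buf);
		status = toy_inflate(&s);
		/* keep looping only if the output buffer really filled up */
	} while (status == TOY_BUF_ERROR && !s.avail_out);

	return status == TOY_STREAM_END ? 0 : -1;
}
```

With a truncated input, the round that finds no input left returns the buffer error with output space still free, so the loop stops instead of spinning. The case where the last input byte and the last output byte are used up in the same round costs one extra iteration and then terminates the same way.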


* Re: Infinite loop regression in git-fsck in v2.12.0
  2018-10-30 21:56             ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
@ 2018-10-30 23:08               ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-10-30 23:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git Mailing List, John Szakmeister, Dennis Kaarsemaker

On Tue, Oct 30, 2018 at 10:56:22PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > So maybe a good approach would be that we'd annotate all those tests
> > whose fsck fails with "this is how it should fail", and run those tests
> > under GIT_TEST_FSCK=true, and GIT_TEST_FSCK=true would also be asserting
> > that no tests other than those marked as failing the fsck check at the
> > end fail it.
> [...]
> Jeff: Gotta turn in for the night, but maybe this is something you're
> interested in carrying forward for this fix? It's not that much work to
> mark up the failing tests, there's 10-20 of them from some quick
> eyeballing.

For this fix, I'd much rather add a specific test to the existing fsck
tests. Otherwise, we're relying on what a bunch of other tests happen to
be doing now, but there's little hope that they won't get refactored in
a way that puts a gap in our test coverage.

IOW, I think of things like GIT_TEST_FSCK as a kind of shotgun approach.
They may find things, and we should fix them and make sure it runs
clean. But ultimately, specific cases of interest should get their own
tests.

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: Infinite loop regression in git-fsck in v2.12.0
  2018-10-30 22:56                 ` Jeff King
@ 2018-10-30 23:12                   ` Jeff King
  2018-10-30 23:18                     ` [PATCH 1/3] t1450: check large blob in trailing-garbage test Jeff King
                                       ` (2 more replies)
  0 siblings, 3 replies; 39+ messages in thread
From: Jeff King @ 2018-10-30 23:12 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

On Tue, Oct 30, 2018 at 06:56:03PM -0400, Jeff King wrote:

> > >  	while (total_read <= size &&
> > > +	       stream->avail_in > 0 &&
> > >  	       (status == Z_OK || status == Z_BUF_ERROR)) {
> > >  		stream->next_out = buf;
> > >  		stream->avail_out = sizeof(buf);
> > 
> > Hmph.  If the last round consumed the final input byte and needed
> > output space of N bytes, but only M (< N) bytes of the output space
> > was available, then it would have reduced both avail_in and
> > avail_out down to zero and yielded Z_BUF_ERROR, no?  Or would zlib
> > refrain from consuming that final byte (leaving avail_in to at least
> > one) and give us Z_BUF_ERROR in such a case?
> 
> Hmm, yeah, good thinking. I think zlib could consume that final byte
> into its internal buffer.
> 
> As part of my digging, I looked at how the loose streaming code handles
> this. It checks that when we see Z_BUF_ERROR, we actually did run out of
> output bytes (so if we didn't, then we know it's not the case we
> expected to be looping on).
> 
> I have some patches almost ready to send; I'll use that technique.

And here they are.

  [1/3]: t1450: check large blob in trailing-garbage test
  [2/3]: check_stream_sha1(): handle input underflow
  [3/3]: cat-file: handle streaming failures consistently

 builtin/cat-file.c | 16 ++++++++++++----
 sha1-file.c        |  3 ++-
 t/t1450-fsck.sh    | 23 +++++++++++++++++++++--
 3 files changed, 35 insertions(+), 7 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/3] t1450: check large blob in trailing-garbage test
  2018-10-30 23:12                   ` Jeff King
@ 2018-10-30 23:18                     ` Jeff King
  2018-10-30 23:23                     ` [PATCH 2/3] check_stream_sha1(): handle input underflow Jeff King
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
  2 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-10-30 23:18 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

Commit cce044df7f (fsck: detect trailing garbage in all
object types, 2017-01-13) added two tests of trailing
garbage in a loose object file: one with a commit and one
with a blob. The point of having two is that blobs would
follow a different code path that streamed the contents,
instead of loading it into a buffer as usual.

At the time, merely being a blob was enough to trigger the
streaming code path. But since 7ac4f3a007 (fsck: actually
fsck blob data, 2018-05-02), we now only stream blobs that
are actually large. So since then, the streaming code path
is not tested at all for this case.

We can restore the original intent of the test by tweaking
core.bigFileThreshold to make our small blob seem large.
There's no easy way to externally verify that we followed
the streaming code path, but I did check before/after using
a temporary debug statement.
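
As a rough illustration of why lowering the threshold re-enables the
streaming path: the decision can be thought of as a simple size comparison
for blobs. The sketch below is an assumption-laden model, not the code in
sha1-file.c; the function name and macro are hypothetical, and the strict
"greater than" comparison is how I understand the check to work.

```c
#include <assert.h>

/* The documented default for core.bigFileThreshold is 512 MiB. */
#define TOY_DEFAULT_THRESHOLD (512UL * 1024 * 1024)

/*
 * Hypothetical model of the choice between the two code paths: only
 * blobs strictly larger than the threshold are streamed.
 */
static int toy_takes_streaming_path(int is_blob, unsigned long size,
				    unsigned long threshold)
{
	assert(is_blob == 0 || is_blob == 1);
	return is_blob && size > threshold;
}
```

Under this model the 9-byte "trailing\n" blob from the test is streamed
with a threshold of 5, but not with the default threshold.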

Signed-off-by: Jeff King <peff@peff.net>
---
I prepared this series on master, but it occurs to me you may want to
apply patch 2 on top of f6371f9210 or thereabouts, which introduced the
bug it fixes. If so, then obviously this one doesn't make sense back
then, and should go on top of 7ac4f3a007. It should be semantically
independent, though there may be a minor text conflict.

 t/t1450-fsck.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 0f2dd26f74..3421f12e8a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -673,13 +673,13 @@ test_expect_success 'fsck detects trailing loose garbage (commit)' '
 	test_i18ngrep "garbage.*$commit" out
 '

-test_expect_success 'fsck detects trailing loose garbage (blob)' '
+test_expect_success 'fsck detects trailing loose garbage (large blob)' '
 	blob=$(echo trailing | git hash-object -w --stdin) &&
 	file=$(sha1_file $blob) &&
 	test_when_finished "remove_object $blob" &&
 	chmod +w "$file" &&
 	echo garbage >>"$file" &&
-	test_must_fail git fsck 2>out &&
+	test_must_fail git -c core.bigfilethreshold=5 fsck 2>out &&
 	test_i18ngrep "garbage.*$blob" out
 '

-- 
2.19.1.1235.g6b27db57c2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/3] check_stream_sha1(): handle input underflow
  2018-10-30 23:12                   ` Jeff King
  2018-10-30 23:18                     ` [PATCH 1/3] t1450: check large blob in trailing-garbage test Jeff King
@ 2018-10-30 23:23                     ` Jeff King
  2018-10-31  4:23                       ` Junio C Hamano
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
  2 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-10-30 23:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

This commit fixes an infinite loop when fscking large
truncated loose objects.

The check_stream_sha1() function takes an mmap'd loose
object buffer and streams 4k of output at a time, checking
its sha1. The loop quits when we've output enough bytes (we
know the size from the object header), or when zlib tells us
anything except Z_OK or Z_BUF_ERROR.

The latter is expected because zlib may run out of room in
our 4k buffer, and that is how it tells us to process the
output and loop again.

But Z_BUF_ERROR also covers another case: one in which zlib
cannot make forward progress because it needs more _input_.
This should never happen in this loop, because though we're
streaming the output, we have the entire deflated input
available in the mmap'd buffer. But since we don't check
this case, we'll just loop infinitely if we do see a
truncated object, thinking that zlib is asking for more
output space.

It's tempting to fix this by checking stream->avail_in as
part of the loop condition (and quitting if all of our bytes
have been consumed). But that assumes that once zlib has
consumed the input, there is nothing left to do.  That's not
necessarily the case: it may have read our input into its
internal state, but still have bytes to output.

Instead, let's continue on Z_BUF_ERROR only when we see the
case we're expecting: the previous round filled our output
buffer completely. If it didn't (and we still saw
Z_BUF_ERROR), we know something is wrong and should break
out of the loop.
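
To make the difference between the two loop conditions concrete, here is a
minimal sketch that uses a toy stand-in for zlib. Everything named toy_* is
hypothetical and only counts bytes; it is not zlib and not the code in
check_stream_sha1(). It assumes the behavior described above: a "buffer
error" status is returned whenever no progress can be made, whether for lack
of input or lack of output space.

```c
#include <assert.h>
#include <stddef.h>

/* Status names mirror zlib's; the values are local to this sketch. */
enum toy_status { TOY_OK, TOY_STREAM_END, TOY_BUF_ERROR };

/* Hypothetical stand-in for a z_stream: counts bytes, holds no data. */
struct toy_stream {
	size_t pending;   /* output bytes the available input can still yield */
	int complete;     /* nonzero if the input includes the end of stream */
	size_t avail_out; /* free space left in the caller's buffer */
};

static enum toy_status toy_inflate(struct toy_stream *s)
{
	size_t n = s->pending < s->avail_out ? s->pending : s->avail_out;

	s->pending -= n;
	s->avail_out -= n;
	if (!s->pending && s->complete)
		return TOY_STREAM_END;
	if (n)
		return TOY_OK;
	/* No progress: either out of input or out of output space. */
	return TOY_BUF_ERROR;
}

/*
 * Drain the stream through a buffer of bufsize bytes and return the
 * number of rounds executed. With check_avail_out set, a buffer error
 * only continues the loop if the output buffer was completely filled.
 * max_rounds exists only so the unchecked variant terminates here.
 */
static int drain(struct toy_stream *s, size_t bufsize,
		 int check_avail_out, int max_rounds)
{
	enum toy_status status = TOY_OK;
	int rounds = 0;

	assert(bufsize > 0);
	while (rounds < max_rounds &&
	       (status == TOY_OK ||
		(status == TOY_BUF_ERROR &&
		 (!check_avail_out || !s->avail_out)))) {
		s->avail_out = bufsize;
		status = toy_inflate(s);
		rounds++;
	}
	return rounds;
}
```

With a truncated toy stream (pending = 10, complete = 0) and a 4-byte
buffer, the checked loop stops on the first buffer error that leaves output
space unused, while the unchecked loop only stops at max_rounds.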

The bug comes from commit f6371f9210 (sha1_file: add
read_loose_object() function, 2017-01-13), which
reimplemented some of the existing loose object functions.
So it's worth checking if this bug was inherited from any of
those. The answer seems to be no. The two obvious
candidates are both OK:

  1. unpack_sha1_rest(); this doesn't need to loop on
     Z_BUF_ERROR at all, since it allocates the expected
     output buffer in advance (which we can't do since we're
     explicitly streaming here)

  2. check_object_signature(); the streaming path relies on
     the istream interface, which uses read_istream_loose()
     for this case. That function uses a similar "is our
     output buffer full" check with Z_BUF_ERROR (which is
     where I stole it from for this patch!)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
---
 sha1-file.c     |  3 ++-
 t/t1450-fsck.sh | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/sha1-file.c b/sha1-file.c
index dd0b6aa873..2daf7d9935 100644
--- a/sha1-file.c
+++ b/sha1-file.c
@@ -2199,7 +2199,8 @@ static int check_stream_sha1(git_zstream *stream,
 	 * see the comment in unpack_sha1_rest for details.
 	 */
 	while (total_read <= size &&
-	       (status == Z_OK || status == Z_BUF_ERROR)) {
+	       (status == Z_OK ||
+		(status == Z_BUF_ERROR && !stream->avail_out))) {
 		stream->next_out = buf;
 		stream->avail_out = sizeof(buf);
 		if (size - total_read < stream->avail_out)
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 3421f12e8a..b5677d26a4 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -683,6 +683,25 @@ test_expect_success 'fsck detects trailing loose garbage (large blob)' '
 	test_i18ngrep "garbage.*$blob" out
 '

+test_expect_success 'fsck detects truncated loose object' '
+	# make it big enough that we know we will truncate in the data
+	# portion, not the header
+	test-tool genrandom truncate 4096 >file &&
+	blob=$(git hash-object -w file) &&
+	file=$(sha1_file $blob) &&
+	test_when_finished "remove_object $blob" &&
+	test_copy_bytes 1024 <"$file" >tmp &&
+	rm "$file" &&
+	mv -f tmp "$file" &&
+
+	# check both regular and streaming code paths
+	test_must_fail git fsck 2>out &&
+	test_i18ngrep corrupt.*$blob out &&
+
+	test_must_fail git -c core.bigfilethreshold=128 fsck 2>out &&
+	test_i18ngrep corrupt.*$blob out
+'
+
 # for each of type, we have one version which is referenced by another object
 # (and so while unreachable, not dangling), and another variant which really is
 # dangling.
-- 
2.19.1.1235.g6b27db57c2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/3] cat-file: handle streaming failures consistently
  2018-10-30 23:12                   ` Jeff King
  2018-10-30 23:18                     ` [PATCH 1/3] t1450: check large blob in trailing-garbage test Jeff King
  2018-10-30 23:23                     ` [PATCH 2/3] check_stream_sha1(): handle input underflow Jeff King
@ 2018-10-30 23:23                     ` Jeff King
  2018-10-31 12:42                       ` [PATCH 0/3] Add a GIT_TEST_FSCK test mode Ævar Arnfjörð Bjarmason
                                         ` (5 more replies)
  2 siblings, 6 replies; 39+ messages in thread
From: Jeff King @ 2018-10-30 23:23 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

There are three ways to convince cat-file to stream a blob:

  - cat-file -p $blob

  - cat-file blob $blob

  - echo $batch | cat-file --batch

In the first two, we simply exit with the error code of
stream_blob_to_fd(). That means that an error will cause us
to exit with "-1" (which we try to avoid) without printing
any kind of error message (which is confusing to the user).

Instead, let's match the third case, which calls die() on an
error. Unfortunately we cannot be more specific, as
stream_blob_to_fd() does not tell us whether the problem was
on reading (e.g., a corrupt object) or on writing (e.g.,
ENOSPC). That might be an opportunity for future work, but
for now we will at least exit with a sane message and exit
code.
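
For readers wondering why exiting with "-1" is worth avoiding: only the low
8 bits of the exit value reach the parent process, so the shell reports 255.
The sketch below illustrates that arithmetic and the mapping this patch
moves to. Both function names are hypothetical; the value 128 is the exit
code die() uses.

```c
#include <assert.h>

/*
 * Only the low 8 bits of an exit value are visible to the parent, so
 * returning -1 from main() shows up as 255.
 */
static int status_seen_by_parent(int exit_value)
{
	int code = exit_value & 0xff;

	assert(code >= 0 && code <= 255);
	return code;
}

/*
 * Hypothetical model of the behavior after this patch: any streaming
 * failure ends in die(), which exits with 128; success exits with 0.
 */
static int exit_code_for_stream_result(int result)
{
	return result ? 128 : 0;
}
```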

Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/cat-file.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 8d97c84725..0d403eb77d 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -50,6 +50,13 @@ static int filter_object(const char *path, unsigned mode,
 	return 0;
 }
 
+static int stream_blob(const struct object_id *oid)
+{
+	if (stream_blob_to_fd(1, oid, NULL, 0))
+		die("unable to stream %s to stdout", oid_to_hex(oid));
+	return 0;
+}
+
 static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 			int unknown_type)
 {
@@ -132,7 +139,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 		}
 
 		if (type == OBJ_BLOB)
-			return stream_blob_to_fd(1, &oid, NULL, 0);
+			return stream_blob(&oid);
 		buf = read_object_file(&oid, &type, &size);
 		if (!buf)
 			die("Cannot read object %s", obj_name);
@@ -155,7 +162,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 				oidcpy(&blob_oid, &oid);
 
 			if (oid_object_info(the_repository, &blob_oid, NULL) == OBJ_BLOB)
-				return stream_blob_to_fd(1, &blob_oid, NULL, 0);
+				return stream_blob(&blob_oid);
 			/*
 			 * we attempted to dereference a tag to a blob
 			 * and failed; there may be new dereference
@@ -319,8 +326,9 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d
 				BUG("invalid cmdmode: %c", opt->cmdmode);
 			batch_write(opt, contents, size);
 			free(contents);
-		} else if (stream_blob_to_fd(1, oid, NULL, 0) < 0)
-			die("unable to stream %s to stdout", oid_to_hex(oid));
+		} else {
+			stream_blob(oid);
+		}
 	}
 	else {
 		enum object_type type;
-- 
2.19.1.1235.g6b27db57c2

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
  2018-10-30 23:23                     ` [PATCH 2/3] check_stream_sha1(): handle input underflow Jeff King
@ 2018-10-31  4:23                       ` Junio C Hamano
  2018-10-31  4:30                         ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-10-31  4:23 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

Jeff King <peff@peff.net> writes:

> The bug comes from commit f6371f9210 (sha1_file: add
> read_loose_object() function, 2017-01-13), which
> reimplemented some of the existing loose object functions.
> So it's worth checking if this bug was inherited from any of
> those. The answer seems to be no. The two obvious
> candidates are both OK:
>
>   1. unpack_sha1_rest(); this doesn't need to loop on
>      Z_BUF_ERROR at all, since it allocates the expected
>      output buffer in advance (which we can't do since we're
>      explicitly streaming here)
>
>   2. check_object_signature(); the streaming path relies on
>      the istream interface, which uses read_istream_loose()
>      for this case. That function uses a similar "is our
>      output buffer full" check with Z_BUF_ERROR (which is
>      where I stole it from for this patch!)

See 692f0bc7 to find who did the fix you stole from, and for what
kind of breakage the original fix was made.

By the way, a very similar loop for pack_non_delta istream iterates
while total_read is smaller than sz, but it does not have the same
check upon BUF_ERROR to see if we've read everything.



^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
  2018-10-31  4:23                       ` Junio C Hamano
@ 2018-10-31  4:30                         ` Jeff King
  2018-10-31  4:44                           ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-10-31  4:30 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

On Wed, Oct 31, 2018 at 01:23:54PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The bug comes from commit f6371f9210 (sha1_file: add
> > read_loose_object() function, 2017-01-13), which
> > reimplemented some of the existing loose object functions.
> > So it's worth checking if this bug was inherited from any of
> > those. The answer seems to be no. The two obvious
> > candidates are both OK:
> >
> >   1. unpack_sha1_rest(); this doesn't need to loop on
> >      Z_BUF_ERROR at all, since it allocates the expected
> >      output buffer in advance (which we can't do since we're
> >      explicitly streaming here)
> >
> >   2. check_object_signature(); the streaming path relies on
> >      the istream interface, which uses read_istream_loose()
> >      for this case. That function uses a similar "is our
> >      output buffer full" check with Z_BUF_ERROR (which is
> >      where I stole it from for this patch!)
> 
> See 692f0bc7 to find who did the fix you stole from, and for what
> kind of breakage the original fix was made.

Heh. I did not dig into it, but actually thought "I'll bet Junio had to
get this right when he wrote the streaming code. No wonder he spotted my
mistake so quickly!".

> By the way, a very similar loop for pack_non_delta istream iterates
> while total_read is smaller than sz, but it does not have the same
> check upon BUF_ERROR to see if we've read everything.

Indeed. Did you find that one by inspection, or did you peek at:

  https://public-inbox.org/git/20130325202114.GD16019@sigill.intra.peff.net/

? :)

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
  2018-10-31  4:30                         ` Jeff King
@ 2018-10-31  4:44                           ` Junio C Hamano
  2018-10-31  5:03                             ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-10-31  4:44 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

Jeff King <peff@peff.net> writes:

>> See 692f0bc7 to find who did the fix you stole from, and for what
>> kind of breakage the original fix was made.
>
> Heh. I did not dig into it, but actually thought "I'll bet Junio had to
> get this right when he wrote the streaming code. No wonder he spotted my
> mistake so quickly!".
>
>> By the way, a very similar loop for pack_non_delta istream iterates
>> while total_read is smaller than sz, but it does not have the same
>> check upon BUF_ERROR to see if we've read everything.
>
> Indeed. Did you find that one by inspection, or did you peek at:
>
>   https://public-inbox.org/git/20130325202114.GD16019@sigill.intra.peff.net/

I looked for BUF_ERROR in the streaming.c and found two instances in
a very similar looking loop with a subtle difference, and the
difference was due to one of them getting fixed in the past while
the other one was left intact as written at its inception.

I should have looked for that message to read the part below
three-dash mark.  Or we may want to transplant that comment somehow
to the function so the next person will not be puzzled like I was?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
  2018-10-31  4:44                           ` Junio C Hamano
@ 2018-10-31  5:03                             ` Jeff King
  2018-10-31  5:13                               ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-10-31  5:03 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

On Wed, Oct 31, 2018 at 01:44:25PM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> >> See 692f0bc7 to find who did the fix you stole from, and for what
> >> kind of breakage the original fix was made.
> >
> > Heh. I did not dig into it, but actually thought "I'll bet Junio had to
> > get this right when he wrote the streaming code. No wonder he spotted my
> > mistake so quickly!".
> >
> >> By the way, a very similar loop for pack_non_delta istream iterates
> >> while total_read is smaller than sz, but it does not have the same
> >> check upon BUF_ERROR to see if we've read everything.
> >
> > Indeed. Did you find that one by inspection, or did you peek at:
> >
> >   https://public-inbox.org/git/20130325202114.GD16019@sigill.intra.peff.net/
> 
> I looked for BUF_ERROR in the streaming.c and found two instances in
> a very similar looking loop with a subtle difference, and the
> difference was due to one of them getting fixed in the past while
> the other one was left intact as written at its inception.
> 
> I should have looked for that message to read the part below
> three-dash mark.  Or we may want to transplant that comment somehow
> to the function so the next person will not be puzzled like I was?

Hmm. Reading that function, I am not sure if it actually might need
fixing.

Might we actually get Z_BUF_ERROR asking for more input if zlib reads to
the end of the pack window? That is probably quite unlikely in practice,
but in theory you could feed a very large buffer for the output and use
a very small pack window.

So I do not think we can use the same logic in that loop. But at the
same time, what prevents use_pack() from getting to the very end of the
pack and saying "I have no bytes left for you"? And then we'd loop
infinitely, feeding zlib nothing.

I'm not sure what the solution is. I do not think this works:

diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..a92a85ed10 100644
--- a/streaming.c
+++ b/streaming.c
@@ -394,6 +394,9 @@ static read_method_decl(pack_non_delta)
 		mapped = use_pack(st->u.in_pack.pack, &window,
 				  st->u.in_pack.pos, &st->z.avail_in);

+		if (!st->z.avail_in)
+			break;
+
 		st->z.next_out = (unsigned char *)buf + total_read;
 		st->z.avail_out = sz - total_read;
 		st->z.next_in = mapped;

because we may have read to the very end but still have bytes to output.

Though hrm. I think use_pack() will always tell us about the trailing
20-byte hash in the "avail" window. Which means we should never
legitimately get to 0 there, because it means that either:

  1. We're reading the trailing hash, which cannot possibly be right (in
     most cases I'd expect zlib to barf at that point anyway, but of
     course it's possible to have a hash that is valid zlib data ;) ).

  2. We're truncated _before_ the hash, so we really did read to EOF,
     and there are no more bytes. I suspect we may actually detect this
     case upon opening the pack (since we do peek at the trailer then),
     but again that could be fooled by coincidence.

I guess that's not the whole story, though. use_pack() tries to promise
at least 20 bytes (to simplify some of the other parsing routines). So
we shouldn't actually ever get "0" here. If we really are that close to
the end of the pack, we'd hit this logic in use_pack:

  if (offset > (p->pack_size - the_hash_algo->rawsz))
	die("offset beyond end of packfile (truncated pack?)");

So actually, I think this code is OK as-is. We will always have at least
20 bytes of input, or use_pack() will die.
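
The invariant can be sketched as follows. This is a simplified model that
ignores pack windows entirely and assumes a 20-byte hash; toy_avail_from()
and TOY_RAWSZ are hypothetical names, and a return value of 0 stands in for
the point where use_pack() would die().

```c
#include <assert.h>
#include <stddef.h>

#define TOY_RAWSZ 20 /* raw hash size assumed in this sketch */

/*
 * Return the number of bytes from offset to the end of the pack, or 0
 * where the real code would die() instead of returning. Any nonzero
 * result is therefore at least TOY_RAWSZ.
 */
static size_t toy_avail_from(size_t pack_size, size_t offset)
{
	assert(pack_size >= TOY_RAWSZ);
	if (offset > pack_size - TOY_RAWSZ)
		return 0; /* real code: die("offset beyond end of packfile") */
	return pack_size - offset;
}
```

For a 100-byte pack, offset 80 still yields 20 bytes, while offset 81 hits
the die() case; a caller never sees a window of 1 to 19 bytes.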

Phew. I almost just deleted all of the above, because now I think I'm
ready to write that comment you asked for. ;) But I left it since maybe
it makes sense to follow my thought process.

-Peff

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
  2018-10-31  5:03                             ` Jeff King
@ 2018-10-31  5:13                               ` Jeff King
  2018-10-31  5:31                                 ` Junio C Hamano
  0 siblings, 1 reply; 39+ messages in thread
From: Jeff King @ 2018-10-31  5:13 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

On Wed, Oct 31, 2018 at 01:03:39AM -0400, Jeff King wrote:

> Phew. I almost just deleted all of the above, because now I think I'm
> ready to write that comment you asked for. ;) But I left it since maybe
> it makes sense to follow my thought process.

So here it is in a more succinct form.

-Peff

-- >8 --
Subject: [PATCH] read_istream_pack_non_delta(): document input handling

Twice now we have scratched our heads about why the loose streaming code
needs the protection added by 692f0bc7ae (avoid infinite loop in
read_istream_loose, 2013-03-25), but the similar code in its pack
counterpart does not.

The short answer is that use_pack() will die before it lets us run out
of bytes. Note that this could mean reading garbage (including the
trailing hash) from the packfile in some cases of corruption, but that's
OK. zlib will notice and complain (and if not, certainly the end result
will not match the object hash we expect).

Let's leave a comment this time to document our findings.

Signed-off-by: Jeff King <peff@peff.net>
---
 streaming.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/streaming.c b/streaming.c
index d1e6b2dce6..ac7c7a22f9 100644
--- a/streaming.c
+++ b/streaming.c
@@ -408,6 +408,15 @@ static read_method_decl(pack_non_delta)
 			st->z_state = z_done;
 			break;
 		}
+
+		/*
+		 * Unlike the loose object case, we do not have to worry here
+		 * about running out of input bytes and spinning infinitely. If
+		 * we get Z_BUF_ERROR due to too few input bytes, then we'll
+		 * replenish them in the next use_pack() call when we loop. If
+		 * we truly hit the end of the pack (i.e., because it's corrupt
+		 * or truncated), then use_pack() catches that and will die().
+		 */
 		if (status != Z_OK && status != Z_BUF_ERROR) {
 			git_inflate_end(&st->z);
 			st->z_state = z_error;
-- 
2.19.1.1298.g19f18f2a22


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/3] check_stream_sha1(): handle input underflow
  2018-10-31  5:13                               ` Jeff King
@ 2018-10-31  5:31                                 ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-10-31  5:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List,
	John Szakmeister, Dennis Kaarsemaker

Jeff King <peff@peff.net> writes:

> On Wed, Oct 31, 2018 at 01:03:39AM -0400, Jeff King wrote:
>
>> Phew. I almost just deleted all of the above, because now I think I'm
>> ready to write that comment you asked for. ;) But I left it since maybe
>> it makes sense to follow my thought process.
>
> So here it is in a more succinct form.

Thanks.

> +
> +		/*
> +		 * Unlike the loose object case, we do not have to worry here
> +		 * about running out of input bytes and spinning infinitely. If
> +		 * we get Z_BUF_ERROR due to too few input bytes, then we'll
> +		 * replenish them in the next use_pack() call when we loop. If
> +		 * we truly hit the end of the pack (i.e., because it's corrupt
> +		 * or truncated), then use_pack() catches that and will die().
> +		 */
>  		if (status != Z_OK && status != Z_BUF_ERROR) {
>  			git_inflate_end(&st->z);
>  			st->z_state = z_error;

Reads well.  Will apply.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 0/3] Add a GIT_TEST_FSCK test mode
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
@ 2018-10-31 12:42                       ` Ævar Arnfjörð Bjarmason
  2018-10-31 12:42                       ` [PATCH 1/3] tests: add a "env-bool" helper to test-tool Ævar Arnfjörð Bjarmason
                                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-31 12:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, John Szakmeister, Dennis Kaarsemaker,
	Christian Couder, Ævar Arnfjörð Bjarmason

This goes on top of Jeff's "cat-file: handle streaming failures
consistently" and implements the test mode I suggested in
https://public-inbox.org/git/877ehzksjd.fsf@evledraar.gmail.com/

In the process I didn't find any other bugs than the 2.12..2.19
regression which is already fixed, but as noted in 3/3 I think it's
worth it to stress test fsck like this. I'll be adding this to my
regular build.

Ævar Arnfjörð Bjarmason (3):
  tests: add a "env-bool" helper to test-tool
  tests: mark those tests where "git fsck" fails at the end
  tests: add a special test setup that runs "git fsck" before exiting

 Makefile                                |  1 +
 t/README                                |  5 ++++
 t/helper/test-env-bool.c                |  9 +++++++
 t/helper/test-tool.c                    |  1 +
 t/helper/test-tool.h                    |  1 +
 t/t0000-basic.sh                        | 26 +++++++++++++++++++
 t/t1006-cat-file.sh                     |  5 ++++
 t/t1305-config-include.sh               |  4 +++
 t/t1404-update-ref-errors.sh            |  4 +++
 t/t1410-reflog.sh                       |  4 +++
 t/t1515-rev-parse-outside-repo.sh       |  4 +++
 t/t3008-ls-files-lazy-init-name-hash.sh |  4 +++
 t/t3103-ls-tree-misc.sh                 |  6 +++++
 t/t3430-rebase-merges.sh                |  6 +++++
 t/t4046-diff-unmerged.sh                |  4 +++
 t/t4058-diff-duplicates.sh              |  5 ++++
 t/t4212-log-corrupt.sh                  |  6 +++++
 t/t5000-tar-tree.sh                     |  5 ++++
 t/t5300-pack-object.sh                  |  5 ++++
 t/t5303-pack-corruption-resilience.sh   |  8 ++++++
 t/t5307-pack-missing-commit.sh          |  7 ++++++
 t/t5312-prune-corruption.sh             |  4 +++
 t/t5504-fetch-receive-strict.sh         |  4 +++
 t/t5601-clone.sh                        |  8 ++++++
 t/t6007-rev-list-cherry-pick-file.sh    |  4 +++
 t/t6011-rev-list-with-bad-commit.sh     |  7 ++++++
 t/t6030-bisect-porcelain.sh             |  6 +++++
 t/t7007-show.sh                         |  6 +++++
 t/t7106-reset-unborn-branch.sh          |  4 +++
 t/t7415-submodule-names.sh              |  4 +++
 t/t7416-submodule-dash-url.sh           |  4 +++
 t/t7417-submodule-path-url.sh           |  4 +++
 t/t7509-commit-authorship.sh            |  4 +++
 t/t8003-blame-corner-cases.sh           |  4 +++
 t/t9130-git-svn-authors-file.sh         |  7 ++++++
 t/test-lib-functions.sh                 |  2 ++
 t/test-lib.sh                           | 33 +++++++++++++++++++++++++
 37 files changed, 225 insertions(+)
 create mode 100644 t/helper/test-env-bool.c

-- 
2.19.1.899.g0250525e69


^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/3] tests: add a "env-bool" helper to test-tool
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
  2018-10-31 12:42                       ` [PATCH 0/3] Add a GIT_TEST_FSCK test mode Ævar Arnfjörð Bjarmason
@ 2018-10-31 12:42                       ` Ævar Arnfjörð Bjarmason
  2018-10-31 12:42                       ` [PATCH 2/3] tests: mark those tests where "git fsck" fails at the end Ævar Arnfjörð Bjarmason
                                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-31 12:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, John Szakmeister, Dennis Kaarsemaker,
	Christian Couder, Ævar Arnfjörð Bjarmason

This new helper is a wrapper around the git_env_bool() function. There
are various GIT_TEST_* variables described in "Running tests with
special setups" in t/README that use git_env_bool().

A GIT_TEST_* variable implemented in shellscript won't have access to
the same semantics (historically we've used "test -n" for many of
these).

So let's add this helper so we can expose the same environment
variable behavior without exposing the implementation detail of
whether that variable happens to be checked in C or shellscript.
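
To illustrate the semantic gap, here is a rough sketch contrasting the two
interpretations of an environment value. The toy_* functions are
hypothetical and take the value as a string (NULL meaning "unset"); the
boolean parser is a simplified approximation for illustration, not
git_env_bool() itself. In particular it is case-sensitive and quietly falls
back to the default on unrecognized input, both of which are assumptions
made for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "test -n" semantics: any non-empty value counts as enabled. */
static int toy_nonempty(const char *value)
{
	return value && *value;
}

/*
 * Simplified boolean semantics: unset yields def, the empty string is
 * false, and a few well-known spellings map to true or false.
 */
static int toy_bool_value(const char *value, int def)
{
	assert(def == 0 || def == 1);
	if (!value)
		return def;
	if (!*value)
		return 0;
	if (!strcmp(value, "true") || !strcmp(value, "yes") ||
	    !strcmp(value, "on") || !strcmp(value, "1"))
		return 1;
	if (!strcmp(value, "false") || !strcmp(value, "no") ||
	    !strcmp(value, "off") || !strcmp(value, "0"))
		return 0;
	return def; /* assumption: unrecognized text uses the default */
}
```

A value such as "false" or "0" is enabled under the "test -n" reading but
disabled under the boolean reading, which is the mismatch the helper is
meant to avoid.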

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile                 | 1 +
 t/helper/test-env-bool.c | 9 +++++++++
 t/helper/test-tool.c     | 1 +
 t/helper/test-tool.h     | 1 +
 4 files changed, 12 insertions(+)
 create mode 100644 t/helper/test-env-bool.c

diff --git a/Makefile b/Makefile
index b08d5ea258..ca624c381f 100644
--- a/Makefile
+++ b/Makefile
@@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
 TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
+TEST_BUILTINS_OBJS += test-env-bool.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
diff --git a/t/helper/test-env-bool.c b/t/helper/test-env-bool.c
new file mode 100644
index 0000000000..956b0aa88e
--- /dev/null
+++ b/t/helper/test-env-bool.c
@@ -0,0 +1,9 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "config.h"
+
+int cmd__env_bool(int argc, const char **argv)
+{
+	assert(argc == 2);
+	return !git_env_bool(argv[1], 0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 5df8b682aa..c4481085c4 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
 	{ "dump-fsmonitor", cmd__dump_fsmonitor },
 	{ "dump-split-index", cmd__dump_split_index },
 	{ "dump-untracked-cache", cmd__dump_untracked_cache },
+	{ "env-bool", cmd__env_bool },
 	{ "example-decorate", cmd__example_decorate },
 	{ "genrandom", cmd__genrandom },
 	{ "hashmap", cmd__hashmap },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 71f470b871..f7845fbc56 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -13,6 +13,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
 int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
+int cmd__env_bool(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
-- 
2.19.1.899.g0250525e69


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 2/3] tests: mark those tests where "git fsck" fails at the end
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
  2018-10-31 12:42                       ` [PATCH 0/3] Add a GIT_TEST_FSCK test mode Ævar Arnfjörð Bjarmason
  2018-10-31 12:42                       ` [PATCH 1/3] tests: add a "env-bool" helper to test-tool Ævar Arnfjörð Bjarmason
@ 2018-10-31 12:42                       ` Ævar Arnfjörð Bjarmason
  2018-11-01  3:37                         ` Junio C Hamano
  2018-10-31 12:42                       ` [PATCH 3/3] tests: add a special test setup that runs "git fsck" before exiting Ævar Arnfjörð Bjarmason
                                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-31 12:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, John Szakmeister, Dennis Kaarsemaker,
	Christian Couder, Ævar Arnfjörð Bjarmason

Mark the tests where "git fsck" fails at the end with extra test code
to check the fsck output. By the time that code runs, fsck.{err,out}
have been created for us.

A later change will add the support for GIT_TEST_FSCK_TESTS. The
annotations are added first to ensure the test suite will never fail
with GIT_TEST_FSCK=true during bisect.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh                     | 5 +++++
 t/t1305-config-include.sh               | 4 ++++
 t/t1404-update-ref-errors.sh            | 4 ++++
 t/t1410-reflog.sh                       | 4 ++++
 t/t1515-rev-parse-outside-repo.sh       | 4 ++++
 t/t3008-ls-files-lazy-init-name-hash.sh | 4 ++++
 t/t3103-ls-tree-misc.sh                 | 6 ++++++
 t/t3430-rebase-merges.sh                | 6 ++++++
 t/t4046-diff-unmerged.sh                | 4 ++++
 t/t4058-diff-duplicates.sh              | 5 +++++
 t/t4212-log-corrupt.sh                  | 6 ++++++
 t/t5000-tar-tree.sh                     | 5 +++++
 t/t5300-pack-object.sh                  | 5 +++++
 t/t5303-pack-corruption-resilience.sh   | 8 ++++++++
 t/t5307-pack-missing-commit.sh          | 7 +++++++
 t/t5312-prune-corruption.sh             | 4 ++++
 t/t5504-fetch-receive-strict.sh         | 4 ++++
 t/t5601-clone.sh                        | 8 ++++++++
 t/t6007-rev-list-cherry-pick-file.sh    | 4 ++++
 t/t6011-rev-list-with-bad-commit.sh     | 7 +++++++
 t/t6030-bisect-porcelain.sh             | 6 ++++++
 t/t7007-show.sh                         | 6 ++++++
 t/t7106-reset-unborn-branch.sh          | 4 ++++
 t/t7415-submodule-names.sh              | 4 ++++
 t/t7416-submodule-dash-url.sh           | 4 ++++
 t/t7417-submodule-path-url.sh           | 4 ++++
 t/t7509-commit-authorship.sh            | 4 ++++
 t/t8003-blame-corner-cases.sh           | 4 ++++
 t/t9130-git-svn-authors-file.sh         | 7 +++++++
 29 files changed, 147 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 43c4be1e5e..12b69e6fbe 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -588,4 +588,9 @@ test_expect_success 'cat-file --unordered works' '
 	test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "unable to unpack header of" fsck.err &&
+	test_i18ngrep "object corrupt or missing" fsck.err
+'
+
 test_done
diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
index 635918505d..890d307d4e 100755
--- a/t/t1305-config-include.sh
+++ b/t/t1305-config-include.sh
@@ -313,4 +313,8 @@ test_expect_success 'include cycles are detected' '
 	test_i18ngrep "exceeded maximum include depth" stderr
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "exceeded maximum include depth" fsck.err
+'
+
 test_done
diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index 51a4f4c0ac..6095b2d4b9 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -618,4 +618,8 @@ test_expect_success 'delete fails cleanly if packed-refs file is locked' '
 	test_cmp unchanged actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "invalid sha1 pointer" fsck.err
+'
+
 test_done
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 388b0611d8..43b8e0c9c5 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -368,4 +368,8 @@ test_expect_success 'continue walking past root commits' '
 	)
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "invalid reflog entry" fsck.err
+'
+
 test_done
diff --git a/t/t1515-rev-parse-outside-repo.sh b/t/t1515-rev-parse-outside-repo.sh
index 3ec2971ee5..1d8fc3ad70 100755
--- a/t/t1515-rev-parse-outside-repo.sh
+++ b/t/t1515-rev-parse-outside-repo.sh
@@ -42,4 +42,8 @@ test_expect_success 'rev-parse --resolve-git-dir' '
 	test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "not a git repository" fsck.err
+'
+
 test_done
diff --git a/t/t3008-ls-files-lazy-init-name-hash.sh b/t/t3008-ls-files-lazy-init-name-hash.sh
index 64f047332b..7fb2e5c177 100755
--- a/t/t3008-ls-files-lazy-init-name-hash.sh
+++ b/t/t3008-ls-files-lazy-init-name-hash.sh
@@ -24,4 +24,8 @@ test_expect_success 'no buffer overflow in lazy_init_name_hash' '
 	test-tool lazy-init-name-hash -m
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "notice: No default references" fsck.err
+'
+
 test_done
diff --git a/t/t3103-ls-tree-misc.sh b/t/t3103-ls-tree-misc.sh
index 14520913af..b7d8ae2e81 100755
--- a/t/t3103-ls-tree-misc.sh
+++ b/t/t3103-ls-tree-misc.sh
@@ -22,4 +22,10 @@ test_expect_success 'ls-tree fails with non-zero exit code on broken tree' '
 	test_must_fail git ls-tree -r HEAD
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "invalid sha1 pointer in cache-tree" fsck.err &&
+	test_i18ngrep "broken link from.*tree" fsck.out &&
+	test_i18ngrep "missing tree" fsck.out
+'
+
 test_done
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index aa7bfc88ec..efac3a792b 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -396,4 +396,10 @@ test_expect_success 'with --autosquash and --exec' '
 	grep "G: +G" actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "broken link from.*commit" fsck.out &&
+	test_i18ngrep "to.*tree" fsck.out &&
+	test_i18ngrep "missing tree" fsck.out
+'
+
 test_done
diff --git a/t/t4046-diff-unmerged.sh b/t/t4046-diff-unmerged.sh
index ff7cfd884a..d868bc44a9 100755
--- a/t/t4046-diff-unmerged.sh
+++ b/t/t4046-diff-unmerged.sh
@@ -84,4 +84,8 @@ test_expect_success 'diff-files -3' '
 	test_cmp diff-files-3.expect diff-files-3.actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "notice: No default references" fsck.err
+'
+
 test_done
diff --git a/t/t4058-diff-duplicates.sh b/t/t4058-diff-duplicates.sh
index c24ee175ef..9c79410dc0 100755
--- a/t/t4058-diff-duplicates.sh
+++ b/t/t4058-diff-duplicates.sh
@@ -76,4 +76,9 @@ test_expect_success 'diff-tree with renames' '
 	test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "zeroPaddedFilemode" fsck.err &&
+	test_i18ngrep "duplicateEntries" fsck.err
+'
+
 test_done
diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 03b952c90d..5f36c58a61 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -85,4 +85,10 @@ test_expect_success 'absurdly far-in-future date' '
 	git log -1 --format=%ad $commit
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "badDate" fsck.err &&
+	test_i18ngrep "badDateOverflow" fsck.err &&
+	test_i18ngrep "missingSpaceBeforeDate" fsck.err
+'
+
 test_done
diff --git a/t/t5000-tar-tree.sh b/t/t5000-tar-tree.sh
index 2a97b27b0a..88c768f232 100755
--- a/t/t5000-tar-tree.sh
+++ b/t/t5000-tar-tree.sh
@@ -408,4 +408,9 @@ test_expect_success TAR_HUGE,TIME_IS_64BIT,TIME_T_IS_64BIT 'system tar can read
 	test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "corrupt loose object" fsck.err &&
+	test_i18ngrep "object corrupt or missing" fsck.err
+'
+
 test_done
diff --git a/t/t5300-pack-object.sh b/t/t5300-pack-object.sh
index 6c620cd540..b4ef9a447a 100755
--- a/t/t5300-pack-object.sh
+++ b/t/t5300-pack-object.sh
@@ -493,4 +493,9 @@ test_expect_success \
     'test_must_fail git -c core.bigfilethreshold=1 index-pack -o bad.idx test-3.pack 2>msg &&
      test_i18ngrep "SHA1 COLLISION FOUND" msg'
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "sha1 mismatch for" fsck.err &&
+	test_i18ngrep "object corrupt or missing" fsck.err
+'
+
 test_done
diff --git a/t/t5303-pack-corruption-resilience.sh b/t/t5303-pack-corruption-resilience.sh
index 41e6dc4dcf..79c9b307d0 100755
--- a/t/t5303-pack-corruption-resilience.sh
+++ b/t/t5303-pack-corruption-resilience.sh
@@ -400,4 +400,12 @@ test_expect_success \
     'printf "\0\1\1X\0" > tail_garbage_opcode &&
      test_must_fail test-tool delta -p /dev/null tail_garbage_opcode /dev/null'
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "pack checksum mismatch" fsck.err &&
+	test_i18ngrep "index CRC mismatch.*at offset 12" fsck.err &&
+	test_i18ngrep "cannot unpack.*at offset 12" fsck.err &&
+	test_i18ngrep "failed to read delta base object.*at offset 12" fsck.err &&
+	test_i18ngrep "failed to read delta base object.*at offset 2032" fsck.err
+'
+
 test_done
diff --git a/t/t5307-pack-missing-commit.sh b/t/t5307-pack-missing-commit.sh
index dacb440b27..2780f4ceeb 100755
--- a/t/t5307-pack-missing-commit.sh
+++ b/t/t5307-pack-missing-commit.sh
@@ -36,4 +36,11 @@ test_expect_success 'pack-objects notices corruption' '
 	test_must_fail git pack-objects --revs pack
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "invalid sha1 pointer" fsck.err &&
+	test_i18ngrep "broken link from.*commit" fsck.out &&
+	test_i18ngrep "to.*commit" fsck.out &&
+	test_i18ngrep "missing commit" fsck.out
+'
+
 test_done
diff --git a/t/t5312-prune-corruption.sh b/t/t5312-prune-corruption.sh
index da9d59940d..898d8906bc 100755
--- a/t/t5312-prune-corruption.sh
+++ b/t/t5312-prune-corruption.sh
@@ -111,4 +111,8 @@ test_expect_success 'pack-refs does not drop broken refs during deletion' '
 	test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "invalid sha1 pointer" fsck.err
+'
+
 test_done
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 7bc706873c..945a060992 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -348,4 +348,8 @@ test_expect_success \
 	grep "Cannot demote unterminatedheader" act
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "missingEmail" fsck.err
+'
+
 test_done
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index f1a49e94f5..35eca5881b 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -726,6 +726,14 @@ test_expect_success 'batch missing blob request does not inadvertently try to fe
 	git clone --filter=blob:limit=0 "file://$(pwd)/server" client
 '
 
+# We might have "test_done" through lib-httpd.sh. Need to set
+# GIT_TEST_FSCK_TESTS here.
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "object.*is a tree, not a blob" fsck.err &&
+	test_i18ngrep "object.*is a commit, not a blob" fsck.err &&
+	test_i18ngrep "error in tree.*: broken links" fsck.err
+'
+
 . "$TEST_DIRECTORY"/lib-httpd.sh
 start_httpd
 
diff --git a/t/t6007-rev-list-cherry-pick-file.sh b/t/t6007-rev-list-cherry-pick-file.sh
index f0268372d2..a86ba0900b 100755
--- a/t/t6007-rev-list-cherry-pick-file.sh
+++ b/t/t6007-rev-list-cherry-pick-file.sh
@@ -266,4 +266,8 @@ test_expect_success '--cherry-pick avoids looking at full diffs' '
 	git rev-list --cherry-pick ...shy-diff
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "missing blob" fsck.out
+'
+
 test_done
diff --git a/t/t6011-rev-list-with-bad-commit.sh b/t/t6011-rev-list-with-bad-commit.sh
index 545b461e51..30d39ce925 100755
--- a/t/t6011-rev-list-with-bad-commit.sh
+++ b/t/t6011-rev-list-with-bad-commit.sh
@@ -55,5 +55,12 @@ test_expect_success 'first commit is still available' \
    git log $first_commit
    '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "pack checksum mismatch" fsck.err &&
+	test_i18ngrep "index CRC mismatch for object.*at offset 487" fsck.err &&
+	test_i18ngrep "inflate: data stream error.*incorrect data check" fsck.err &&
+	test_i18ngrep "cannot unpack.*at offset 487" fsck.err
+'
+
 test_done
 
diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index f84ff941c3..5da668ed06 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -911,4 +911,10 @@ test_expect_success 'git bisect reset cleans bisection state properly' '
 	test_path_is_missing "$GIT_DIR/BISECT_START"
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "broken link from.*tree" fsck.out &&
+	test_i18ngrep "to.*tree" fsck.out &&
+	test_i18ngrep "missing tree" fsck.out
+'
+
 test_done
diff --git a/t/t7007-show.sh b/t/t7007-show.sh
index 42d3db6246..d25cee8a72 100755
--- a/t/t7007-show.sh
+++ b/t/t7007-show.sh
@@ -128,4 +128,10 @@ test_expect_success 'show --graph is forbidden' '
   test_must_fail git show --graph HEAD
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "broken link from.*tag" fsck.out &&
+	test_i18ngrep "to.*blob" fsck.out &&
+	test_i18ngrep "missing blob" fsck.out
+'
+
 test_done
diff --git a/t/t7106-reset-unborn-branch.sh b/t/t7106-reset-unborn-branch.sh
index ecb85c3b82..0719261fe7 100755
--- a/t/t7106-reset-unborn-branch.sh
+++ b/t/t7106-reset-unborn-branch.sh
@@ -64,4 +64,8 @@ test_expect_success 'reset --hard' '
 	test_path_is_missing a
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "missing tree" fsck.out
+'
+
 test_done
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index 293e2e1963..3d8fa7831f 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -191,4 +191,8 @@ test_expect_success 'fsck detects corrupt .gitmodules' '
 	)
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "gitmodulesName" fsck.err
+'
+
 test_done
diff --git a/t/t7416-submodule-dash-url.sh b/t/t7416-submodule-dash-url.sh
index 1cd2c1c1ea..e6c885784e 100755
--- a/t/t7416-submodule-dash-url.sh
+++ b/t/t7416-submodule-dash-url.sh
@@ -46,4 +46,8 @@ test_expect_success 'fsck rejects unprotected dash' '
 	grep gitmodulesUrl err
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "gitmodulesUrl" fsck.err
+'
+
 test_done
diff --git a/t/t7417-submodule-path-url.sh b/t/t7417-submodule-path-url.sh
index 756af8c4d6..8362442908 100755
--- a/t/t7417-submodule-path-url.sh
+++ b/t/t7417-submodule-path-url.sh
@@ -25,4 +25,8 @@ test_expect_success 'fsck rejects unprotected dash' '
 	grep gitmodulesPath err
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "gitmodulesPath" fsck.err
+'
+
 test_done
diff --git a/t/t7509-commit-authorship.sh b/t/t7509-commit-authorship.sh
index 500ab2fe72..cdcbfed61a 100755
--- a/t/t7509-commit-authorship.sh
+++ b/t/t7509-commit-authorship.sh
@@ -174,4 +174,8 @@ test_expect_success '--reset-author with CHERRY_PICK_HEAD' '
 	test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "invalid reflog entry" fsck.err
+'
+
 test_done
diff --git a/t/t8003-blame-corner-cases.sh b/t/t8003-blame-corner-cases.sh
index c92a47b6d5..3a8affc3e7 100755
--- a/t/t8003-blame-corner-cases.sh
+++ b/t/t8003-blame-corner-cases.sh
@@ -275,4 +275,8 @@ test_expect_success 'blame file with CRLF core.autocrlf=true' '
 	grep "A U Thor" actual
 '
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "missingNameBeforeEmail" fsck.err
+'
+
 test_done
diff --git a/t/t9130-git-svn-authors-file.sh b/t/t9130-git-svn-authors-file.sh
index cb764bcadc..0c0c42c72c 100755
--- a/t/t9130-git-svn-authors-file.sh
+++ b/t/t9130-git-svn-authors-file.sh
@@ -128,4 +128,11 @@ test_expect_success 'authors-file imported user without email' '
 
 test_debug 'GIT_DIR=gitconfig.clone/.git git log'
 
+GIT_TEST_FSCK_TESTS='
+	test_i18ngrep "object.*is a tree, not a blob" fsck.err &&
+	test_i18ngrep "object.*is a commit, not a blob" fsck.err &&
+	test_i18ngrep "tree.*: broken links" fsck.err &&
+	test_i18ngrep "missingTaggerEntry" fsck.err
+'
+
 test_done
-- 
2.19.1.899.g0250525e69


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [PATCH 3/3] tests: add a special test setup that runs "git fsck" before exiting
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
                                         ` (2 preceding siblings ...)
  2018-10-31 12:42                       ` [PATCH 2/3] tests: mark those tests where "git fsck" fails at the end Ævar Arnfjörð Bjarmason
@ 2018-10-31 12:42                       ` Ævar Arnfjörð Bjarmason
  2018-10-31 13:33                       ` [PATCH 3/3] cat-file: handle streaming failures consistently Torsten Bögershausen
  2018-10-31 17:38                       ` Eric Sunshine
  5 siblings, 0 replies; 39+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2018-10-31 12:42 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, John Szakmeister, Dennis Kaarsemaker,
	Christian Couder, Ævar Arnfjörð Bjarmason

Add the ability to run the tests with GIT_TEST_FSCK=true in the
environment. If set we'll run "git fsck" at the end of every test, and
those tests that fail need to annotate what their failure was.

The goal is to detect regressions in fsck that our tests might
otherwise miss. We had one such regression in c68b489e56 ("fsck: parse
loose object paths directly", 2017-01-13) released with Git 2.12.0,
which wasn't spotted more than a year and a half later during the
2.20.0 window.

As it turns out, there was already a test for what triggered that bug
all along in the form of t5000-tar-tree.sh; we just weren't running
"git fsck" at the end[1].

That specific bug has been fixed in ("check_stream_sha1(): handle
input underflow", 2018-10-30)[2], but we have a demonstrable history
of failing to anticipate which tests that make "git fsck" fail ought
to be part of the "git fsck" test suite, so let's add this test mode
to cover potential blind spots. "git fsck" is also a command where we
can expect that users aren't actively testing on already corrupt
repositories during our RC windows, so "in the wild" test coverage
will be spotty and we need all the help we can get.

1. https://public-inbox.org/git/878t2fkxrn.fsf@evledraar.gmail.com/
2. https://public-inbox.org/git/20181030232312.GB32038@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
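
To make the two end-of-script modes concrete, here is a self-contained
sketch of the decision described above. It is not the code in the
patch: fake_fsck stands in for "git fsck" so that it runs without a
repository, check_at_end is a made-up reduction of the real check, and
plain "command must fail" is only an approximation of test_must_fail.

```shell
# Stand-in for "git fsck": prints a diagnostic and fails.
# Hypothetical; used only so the sketch needs no repository.
fake_fsck () {
	echo "error: object corrupt or missing" >&2
	return 1
}

# Hypothetical reduction of the end-of-script check.
# $1 is the checker command, $2 the annotation fragment; an empty
# fragment means the checker is expected to succeed.
check_at_end () {
	if test -n "$2"
	then
		# Annotated script: the checker must fail...
		if "$1" 2>fsck.err >fsck.out
		then
			return 1
		fi
		# ...and the fragment must recognise the failure.
		eval "$2"
	else
		"$1"
	fi
}

fragment='grep -q "object corrupt or missing" fsck.err'
workdir=$(mktemp -d) && cd "$workdir" &&
check_at_end fake_fsck "$fragment" &&
echo "annotated failure accepted"
```

With an empty fragment the same failing checker makes check_at_end
fail, which corresponds to an unannotated script whose repository is
unexpectedly corrupt.
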
 t/README                |  5 +++++
 t/t0000-basic.sh        | 26 ++++++++++++++++++++++++++
 t/test-lib-functions.sh |  2 ++
 t/test-lib.sh           | 33 +++++++++++++++++++++++++++++++++
 4 files changed, 66 insertions(+)

diff --git a/t/README b/t/README
index 8847489640..092f78b3d7 100644
--- a/t/README
+++ b/t/README
@@ -343,6 +343,11 @@ of the index for the whole test suite by bypassing the default number of
 cache entries and thread minimums. Setting this to 1 will make the
 index loading single threaded.
 
+GIT_TEST_FSCK=<boolean> if true arranges for "git fsck" to be run at
+the end of the test scripts. Those tests that fail will need to set a
+"GIT_TEST_FSCK_TESTS" variable before we enter "test_done" with a test
+fragment to test that fsck.{out,err} is the expected failure.
+
 Naming Tests
 ------------
 
diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh
index 4d23373526..8e667e6691 100755
--- a/t/t0000-basic.sh
+++ b/t/t0000-basic.sh
@@ -19,6 +19,7 @@ modification *should* take notice and update the test vectors here.
 '
 
 . ./test-lib.sh
+unset GIT_TEST_FSCK
 
 try_local_x () {
 	local x="local" &&
@@ -393,6 +394,31 @@ test_expect_success 'GIT_SKIP_TESTS sh pattern' "
 	)
 "
 
+test_expect_success 'GIT_TEST_FSCK=true' "
+	test_when_finished 'sane_unset GIT_TEST_FSCK' &&
+	GIT_TEST_FSCK=true &&
+	export GIT_TEST_FSCK &&
+	run_sub_test_lib_test run-git-fsck-test \
+		'--run basic' --run='1 3 5' <<-\\EOF &&
+	for i in 1 2 3 4 5 6
+	do
+		test_expect_success \"passing test #\$i\" 'true'
+	done
+	GIT_TEST_FSCK=true test_done
+	EOF
+	check_sub_test_lib_test run-git-fsck-test <<-\\EOF
+	> ok 1 - passing test #1
+	> ok 2 # skip passing test #2 (--run)
+	> ok 3 - passing test #3
+	> ok 4 # skip passing test #4 (--run)
+	> ok 5 - passing test #5
+	> ok 6 # skip passing test #6 (--run)
+	> ok 7 # skip git fsck at end (due to GIT_TEST_FSCK) (expected to succeed) (--run)
+	> # passed all 7 test(s)
+	> 1..7
+	EOF
+"
+
 test_expect_success '--run basic' "
 	run_sub_test_lib_test run-basic \
 		'--run basic' --run='1 3 5' <<-\\EOF &&
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 78d8c3783b..7d002ff5aa 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -470,6 +470,7 @@ test_expect_success () {
 # Usage: test_external description command arguments...
 # Example: test_external 'Perl API' perl ../path/to/test.pl
 test_external () {
+	unset GIT_TEST_FSCK
 	test "$#" = 4 && { test_prereq=$1; shift; } || test_prereq=
 	test "$#" = 3 ||
 	error >&5 "bug in the test script: not 3 or 4 parameters to test_external"
@@ -511,6 +512,7 @@ test_external () {
 # Like test_external, but in addition tests that the command generated
 # no output on stderr.
 test_external_without_stderr () {
+	unset GIT_TEST_FSCK
 	# The temporary file has no (and must have no) security
 	# implications.
 	tmp=${TMPDIR:-/tmp}
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 897e6fcc94..5f7f5595e3 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -454,6 +454,8 @@ GIT_EXIT_OK=
 trap 'die' EXIT
 trap 'exit $?' INT
 
+GIT_TEST_FSCK_TESTS=
+
 # The user-facing functions are loaded from a separate file so that
 # test_perf subshells can have them too
 . "$TEST_DIRECTORY/test-lib-functions.sh"
@@ -789,7 +791,36 @@ test_at_end_hook_ () {
 	:
 }
 
+_test_done_fsck() {
+	desc='git fsck at end (due to GIT_TEST_FSCK)'
+	if test -n "$GIT_TEST_FSCK_TESTS"
+	then
+		test_expect_success "$desc (expected to fail)" '
+			test_must_fail git fsck 2>fsck.err >fsck.out
+		'
+		test_expect_success "$desc (expected to fail) -- assert failure mode" "
+			test_path_exists fsck.err &&
+			test_path_exists fsck.out &&
+			$GIT_TEST_FSCK_TESTS
+		"
+	else
+		test_expect_success "$desc (expected to succeed)" '
+			git fsck
+		'
+	fi
+}
+
 test_done () {
+	# Don't want to run this under TEST_NO_CREATE_REPO, otherwise
+	# we end up sloowly running "git fsck" against git.git
+	if test -z "$TEST_NO_CREATE_REPO" &&
+		    # test -n first so all --verbose output isn't
+		    # polluted with this check
+		    test -n "$GIT_TEST_FSCK" &&
+		    test_have_prereq TEST_FSCK
+	then
+		_test_done_fsck
+	fi
 	GIT_EXIT_OK=t
 
 	if test -z "$HARNESS_ACTIVE"
@@ -1268,3 +1299,5 @@ test_lazy_prereq CURL '
 test_lazy_prereq SHA1 '
 	test $(git hash-object /dev/null) = e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
 '
+
+test_lazy_prereq TEST_FSCK 'test-tool env-bool GIT_TEST_FSCK'
-- 
2.19.1.899.g0250525e69


^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] cat-file: handle streaming failures consistently
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
                                         ` (3 preceding siblings ...)
  2018-10-31 12:42                       ` [PATCH 3/3] tests: add a special test setup that runs "git fsck" before exiting Ævar Arnfjörð Bjarmason
@ 2018-10-31 13:33                       ` Torsten Bögershausen
  2018-10-31 14:23                         ` Junio C Hamano
  2018-10-31 17:38                       ` Eric Sunshine
  5 siblings, 1 reply; 39+ messages in thread
From: Torsten Bögershausen @ 2018-10-31 13:33 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git Mailing List, John Szakmeister, Dennis Kaarsemaker

On Tue, Oct 30, 2018 at 07:23:38PM -0400, Jeff King wrote:
> There are three ways to convince cat-file to stream a blob:
> 
>   - cat-file -p $blob
> 
>   - cat-file blob $blob
> 
>   - echo $batch | cat-file --batch
> 
> In the first two, we simply exit with the error code of
> streaw_blob_to_fd(). That means that an error will cause us
> to exit with "-1" (which we try to avoid) without printing
> any kind of error message (which is confusing to the user).
> 
> Instead, let's match the third case, which calls die() on an
> error. Unfortunately we cannot be more specific, as
> stream_blob_to_fd() does not tell us whether the problem was
> on reading (e.g., a corrupt object) or on writing (e.g.,
> ENOSPC). That might be an opportunity for future work, but
> for now we will at least exit with a sane message and exit
> code.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  builtin/cat-file.c | 16 ++++++++++++----
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 8d97c84725..0d403eb77d 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -50,6 +50,13 @@ static int filter_object(const char *path, unsigned mode,
>  	return 0;
>  }
>  
> +static int stream_blob(const struct object_id *oid)

Sorry for nit-picking:
could this be renamed into stream_blob_to_stdout() ?

> +{
> +	if (stream_blob_to_fd(1, oid, NULL, 0))

And I wonder if we could make things clearer:
 s/1/STDOUT_FILENO/
 
 (Stolen from fast-import.c)

> +		die("unable to stream %s to stdout", oid_to_hex(oid));
> +	return 0;
> +}
> +
[]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] cat-file: handle streaming failures consistently
  2018-10-31 13:33                       ` [PATCH 3/3] cat-file: handle streaming failures consistently Torsten Bögershausen
@ 2018-10-31 14:23                         ` Junio C Hamano
  2018-10-31 14:37                           ` Jeff King
  0 siblings, 1 reply; 39+ messages in thread
From: Junio C Hamano @ 2018-10-31 14:23 UTC (permalink / raw)
  To: Torsten Bögershausen
  Cc: Jeff King, Ævar Arnfjörð Bjarmason,
	Git Mailing List, John Szakmeister, Dennis Kaarsemaker

Torsten Bögershausen <tboegi@web.de> writes:

>> +static int stream_blob(const struct object_id *oid)
>
> Sorry for nit-picking:
> could this be renamed into stream_blob_to_stdout() ?

I think that name makes sense, even though stream_blob() is just
fine for a function that takes a single parameter oid, as there is no
other sane choice than streaming to the standard output stream the
blob data.

>> +{
>> +	if (stream_blob_to_fd(1, oid, NULL, 0))
>
> And I wonder if we could make things clearer:
>  s/1/STDOUT_FILENO/

What would benefit from symbolic constant more in that function call
may be CAN_SEEK thing, but s/1/STDOUT_FILENO/ adds negative value to
that line, I would think.  The name of the function already makes it
clear this is sending the output to a file descriptor, and an
integer 1 that specifies a file descriptor cannot mean anything
other than the standard output stream.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] cat-file: handle streaming failures consistently
  2018-10-31 14:23                         ` Junio C Hamano
@ 2018-10-31 14:37                           ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-10-31 14:37 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Torsten Bögershausen, Ævar Arnfjörð Bjarmason,
	Git Mailing List, John Szakmeister, Dennis Kaarsemaker

On Wed, Oct 31, 2018 at 11:23:48PM +0900, Junio C Hamano wrote:

> Torsten Bögershausen <tboegi@web.de> writes:
> 
> >> +static int stream_blob(const struct object_id *oid)
> >
> > Sorry for nit-picking:
> > could this be renamed into stream_blob_to_stdout() ?
> 
> I think that name makes sense, even though stream_blob() is just
> fine for a function that takes a single parameter oid, as there is no
> other sane choice than streaming to the standard output stream the
> blob data.

I was trying to keep the name small since it is a static-local
convenience helper. I'd rather write it as:

  stream_blob(1, oid);

than change the name. ;)

> >> +{
> >> +	if (stream_blob_to_fd(1, oid, NULL, 0))
> >
> > And I wonder if we could make things clearer:
> >  s/1/STDOUT_FILENO/
> 
> What would benefit from symbolic constant more in that function call
> may be CAN_SEEK thing, but s/1/STDOUT_FILENO/ adds negative value to
> that line, I would think.  The name of the function already makes it
> clear this is sending the output to a file descriptor, and an
> integer 1 that specifies a file descriptor cannot mean anything
> other than the standard output stream.

Yes, I'd agree (there are very few cases where I think STDOUT_FILENO
actually increases the readability, since it is usually pretty clear
from the context when something is a descriptor).

-Peff

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] cat-file: handle streaming failures consistently
  2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
                                         ` (4 preceding siblings ...)
  2018-10-31 13:33                       ` [PATCH 3/3] cat-file: handle streaming failures consistently Torsten Bögershausen
@ 2018-10-31 17:38                       ` Eric Sunshine
  2018-10-31 20:29                         ` Jeff King
  5 siblings, 1 reply; 39+ messages in thread
From: Eric Sunshine @ 2018-10-31 17:38 UTC (permalink / raw)
  To: Jeff King
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git Mailing List, John Szakmeister, Dennis Kaarsemaker

On Tue, Oct 30, 2018 at 07:23:38PM -0400, Jeff King wrote:
> There are three ways to convince cat-file to stream a blob:
> 
>   - cat-file -p $blob
> 
>   - cat-file blob $blob
> 
>   - echo $batch | cat-file --batch
> 
> In the first two, we simply exit with the error code of
> streaw_blob_to_fd(). That means that an error will cause us

Your "m" got confused and ended up upside-down.

> to exit with "-1" (which we try to avoid) without printing
> any kind of error message (which is confusing to the user).
> 
> Instead, let's match the third case, which calls die() on an
> error. Unfortunately we cannot be more specific, as
> stream_blob_to_fd() does not tell us whether the problem was
> on reading (e.g., a corrupt object) or on writing (e.g.,
> ENOSPC). That might be an opportunity for future work, but
> for now we will at least exit with a sane message and exit
> code.
> 
> Signed-off-by: Jeff King <peff@peff.net>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/3] cat-file: handle streaming failures consistently
  2018-10-31 17:38                       ` Eric Sunshine
@ 2018-10-31 20:29                         ` Jeff King
  0 siblings, 0 replies; 39+ messages in thread
From: Jeff King @ 2018-10-31 20:29 UTC (permalink / raw)
  To: Eric Sunshine
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git Mailing List, John Szakmeister, Dennis Kaarsemaker

On Wed, Oct 31, 2018 at 01:38:59PM -0400, Eric Sunshine wrote:

> On Tue, Oct 30, 2018 at 07:23:38PM -0400, Jeff King wrote:
> > There are three ways to convince cat-file to stream a blob:
> > 
> >   - cat-file -p $blob
> > 
> >   - cat-file blob $blob
> > 
> >   - echo $batch | cat-file --batch
> > 
> > In the first two, we simply exit with the error code of
> > streaw_blob_to_fd(). That means that an error will cause us
> 
> Your "m" got confused and ended up upside-down.

Heh. I'm not sure how I managed that. They're not exactly next to each
other on a qwerty keyboard.

-Peff


* Re: [PATCH 2/3] tests: mark those tests where "git fsck" fails at the end
  2018-10-31 12:42                       ` [PATCH 2/3] tests: mark those tests where "git fsck" fails at the end Ævar Arnfjörð Bjarmason
@ 2018-11-01  3:37                         ` Junio C Hamano
  0 siblings, 0 replies; 39+ messages in thread
From: Junio C Hamano @ 2018-11-01  3:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff King, John Szakmeister, Dennis Kaarsemaker,
	Christian Couder

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Mark the tests where "git fsck" fails at the end with extra test code
> to check the fsck output. There fsck.{err,out} has been created for
> us.
>
> A later change will add the support for GIT_TEST_FSCK_TESTS. They're
> being added first to ensure the test suite will never fail with
> GIT_TEST_FSCK=true during bisect.

I am sympathetic to what step 3/3 (eh, rather, an earlier "let's not
leave the repository in corrupt state, as that would make it
inconvenient for us to later append new tests") wants to do, but not
this one---these markings at the end make it inconvenient for us to
later add new tests to these scripts before them.




Thread overview: 39+ messages
2017-01-07 12:50 "git fsck" not detecting garbage at the end of blob object files John Szakmeister
2017-01-07 21:47 ` Dennis Kaarsemaker
2017-01-08  5:26   ` Jeff King
2017-01-13  9:15     ` John Szakmeister
2017-01-13 17:52       ` [PATCH 0/6] loose-object fsck fixes/tightening Jeff King
2017-01-13 17:54         ` [PATCH 1/6] t1450: refactor loose-object removal Jeff King
2017-01-13 17:54         ` [PATCH 2/6] sha1_file: fix error message for alternate objects Jeff King
2017-01-13 17:55         ` [PATCH 3/6] t1450: test fsck of packed objects Jeff King
2017-01-13 17:58         ` [PATCH 4/6] sha1_file: add read_loose_object() function Jeff King
2017-01-13 17:59         ` [PATCH 5/6] fsck: parse loose object paths directly Jeff King
2018-10-30 20:03           ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
2018-10-30 21:35             ` Jeff King
2018-10-30 22:28               ` Junio C Hamano
2018-10-30 22:56                 ` Jeff King
2018-10-30 23:12                   ` Jeff King
2018-10-30 23:18                     ` [PATCH 1/3] t1450: check large blob in trailing-garbage test Jeff King
2018-10-30 23:23                     ` [PATCH 2/3] check_stream_sha1(): handle input underflow Jeff King
2018-10-31  4:23                       ` Junio C Hamano
2018-10-31  4:30                         ` Jeff King
2018-10-31  4:44                           ` Junio C Hamano
2018-10-31  5:03                             ` Jeff King
2018-10-31  5:13                               ` Jeff King
2018-10-31  5:31                                 ` Junio C Hamano
2018-10-30 23:23                     ` [PATCH 3/3] cat-file: handle streaming failures consistently Jeff King
2018-10-31 12:42                       ` [PATCH 0/3] Add a GIT_TEST_FSCK test mode Ævar Arnfjörð Bjarmason
2018-10-31 12:42                       ` [PATCH 1/3] tests: add a "env-bool" helper to test-tool Ævar Arnfjörð Bjarmason
2018-10-31 12:42                       ` [PATCH 2/3] tests: mark those tests where "git fsck" fails at the end Ævar Arnfjörð Bjarmason
2018-11-01  3:37                         ` Junio C Hamano
2018-10-31 12:42                       ` [PATCH 3/3] tests: add a special test setup that runs "git fsck" before exiting Ævar Arnfjörð Bjarmason
2018-10-31 13:33                       ` [PATCH 3/3] cat-file: handle streaming failures consistently Torsten Bögershausen
2018-10-31 14:23                         ` Junio C Hamano
2018-10-31 14:37                           ` Jeff King
2018-10-31 17:38                       ` Eric Sunshine
2018-10-31 20:29                         ` Jeff King
2018-10-30 21:56             ` Infinite loop regression in git-fsck in v2.12.0 Ævar Arnfjörð Bjarmason
2018-10-30 23:08               ` Jeff King
2017-01-13 18:00         ` [PATCH 6/6] fsck: detect trailing garbage in all object types Jeff King
2017-01-19 11:18         ` [PATCH 0/6] loose-object fsck fixes/tightening John Szakmeister
2017-01-13  9:16   ` "git fsck" not detecting garbage at the end of blob object files John Szakmeister
