git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git fsck exit code?
@ 2014-08-27 22:10 David Turner
  2014-08-29 18:53 ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: David Turner @ 2014-08-27 22:10 UTC (permalink / raw)
  To: git mailing list

It looks like git fsck exits with 0 status even if there are some
errors. The only case where there's a non-zero exit code is if
verify_pack reports errors -- but not e.g. fsck_object_dir.

Is that really the intended behavior?  I think it would be nice to at
least support --exit-code (but probably a sensible exit code ought to be
the default).

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-27 22:10 git fsck exit code? David Turner
@ 2014-08-29 18:53 ` Jeff King
  2014-08-29 19:21   ` Junio C Hamano
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2014-08-29 18:53 UTC (permalink / raw)
  To: David Turner; +Cc: git mailing list

On Wed, Aug 27, 2014 at 06:10:12PM -0400, David Turner wrote:

> It looks like git fsck exits with 0 status even if there are some
> errors. The only case where there's a non-zero exit code is if
> verify_pack reports errors -- but not e.g. fsck_object_dir.

It will also bail non-zero with _certain_ tree errors that cause git to
die() rather than fscking more completely.

> Is that really the intended behavior?  I think it would be nice to at
> least support --exit-code (but probably a sensible exit code ought to be
> the default).

I don't know that the current behavior is really intended. I agree that
I would expect a non-zero exit code if there are errors. Fsck also has
some warnings. I do not know what those should produce (I'd think a zero
exit normally, and some option like -Werror to turn them into errors).

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-29 18:53 ` Jeff King
@ 2014-08-29 19:21   ` Junio C Hamano
  2014-08-29 20:18     ` David Turner
  0 siblings, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2014-08-29 19:21 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list

Jeff King <peff@peff.net> writes:

> On Wed, Aug 27, 2014 at 06:10:12PM -0400, David Turner wrote:
>
>> It looks like git fsck exits with 0 status even if there are some
>> errors. The only case where there's a non-zero exit code is if
>> verify_pack reports errors -- but not e.g. fsck_object_dir.
>
> It will also bail non-zero with _certain_ tree errors that cause git to
> die() rather than fscking more completely.

Even if git does not die, whenever it says broken link, missing
object, or object corrupt, we set errors_found and that variable
affects the exit status of fsck.  What does "some errors" exactly
mean in the original report?  Dangling objects are *not* errors and
should not cause fsck to report an error with its exit status.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-29 19:21   ` Junio C Hamano
@ 2014-08-29 20:18     ` David Turner
  2014-08-29 20:31       ` Jeff King
  2014-08-31 18:54       ` git fsck exit code? Øyvind A. Holm
  0 siblings, 2 replies; 21+ messages in thread
From: David Turner @ 2014-08-29 20:18 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git mailing list

On Fri, 2014-08-29 at 12:21 -0700, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
> > On Wed, Aug 27, 2014 at 06:10:12PM -0400, David Turner wrote:
> >
> >> It looks like git fsck exits with 0 status even if there are some
> >> errors. The only case where there's a non-zero exit code is if
> >> verify_pack reports errors -- but not e.g. fsck_object_dir.
> >
> > It will also bail non-zero with _certain_ tree errors that cause git to
> > die() rather than fscking more completely.
> 
> Even if git does not die, whenever it says broken link, missing
> object, or object corrupt, we set errors_found and that variable
> affects the exit status of fsck.  What does "some errors" exactly
> mean in the original report?  Dangling objects are *not* errors and
> should not cause fsck to report an error with its exit status.

error in tree 9f50addba2b4e9e928d9c6a7056bdf71b36fba90: contains
duplicate file entries

(at least -- there might be more, but that's the one that bit me)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-29 20:18     ` David Turner
@ 2014-08-29 20:31       ` Jeff King
  2014-08-29 20:47         ` Junio C Hamano
  2014-09-09 22:03         ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
  2014-08-31 18:54       ` git fsck exit code? Øyvind A. Holm
  1 sibling, 2 replies; 21+ messages in thread
From: Jeff King @ 2014-08-29 20:31 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, git mailing list

On Fri, Aug 29, 2014 at 04:18:00PM -0400, David Turner wrote:

> > Even if git does not die, whenever it says broken link, missing
> > object, or object corrupt, we set errors_found and that variable
> > affects the exit status of fsck.  What does "some errors" exactly
> > mean in the original report?  Dangling objects are *not* errors and
> > should not cause fsck to report an error with its exit status.
> 
> error in tree 9f50addba2b4e9e928d9c6a7056bdf71b36fba90: contains
> duplicate file entries
> 
> (at least -- there might be more, but that's the one that bit me)

I think that we just don't set "errors_found" in fsck_obj (nor do we in
fsck_obj_buffer, but in that case its caller is verify-pack, which
propagates the return code). Maybe (completely untested):

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d42a27d..29de901 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -388,7 +388,8 @@ static void fsck_sha1_list(void)
 		unsigned char *sha1 = entry->sha1;
 
 		sha1_list.entry[i] = NULL;
-		fsck_sha1(sha1);
+		if (fsck_sha1(sha1))
+			errors_found |= ERROR_OBJECT;
 		free(entry);
 	}
 	sha1_list.nr = 0;

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-29 20:31       ` Jeff King
@ 2014-08-29 20:47         ` Junio C Hamano
  2014-09-09 22:03         ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
  1 sibling, 0 replies; 21+ messages in thread
From: Junio C Hamano @ 2014-08-29 20:47 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list

Jeff King <peff@peff.net> writes:

> On Fri, Aug 29, 2014 at 04:18:00PM -0400, David Turner wrote:
>
>> > Even if git does not die, whenever it says broken link, missing
>> > object, or object corrupt, we set errors_found and that variable
>> > affects the exit status of fsck.  What does "some errors" exactly
>> > mean in the original report?  Dangling objects are *not* errors and
>> > should not cause fsck to report an error with its exit status.
>> 
>> error in tree 9f50addba2b4e9e928d9c6a7056bdf71b36fba90: contains
>> duplicate file entries
>> 
>> (at least -- there might be more, but that's the one that bit me)
>
> I think that we just don't set "errors_found" in fsck_obj (nor do we in
> fsck_obj_buffer, but in that case its caller is verify-pack, which
> propagates the return code). Maybe (completely untested):

Sounds about right.  David may have more or there may be not.  Let's
not forget to collect them and roll the fixes into a single update.

Thanks.

> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index d42a27d..29de901 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -388,7 +388,8 @@ static void fsck_sha1_list(void)
>  		unsigned char *sha1 = entry->sha1;
>  
>  		sha1_list.entry[i] = NULL;
> -		fsck_sha1(sha1);
> +		if (fsck_sha1(sha1))
> +			errors_found |= ERROR_OBJECT;
>  		free(entry);
>  	}
>  	sha1_list.nr = 0;

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-29 20:18     ` David Turner
  2014-08-29 20:31       ` Jeff King
@ 2014-08-31 18:54       ` Øyvind A. Holm
  2014-09-01 11:54         ` Øyvind A. Holm
  2014-09-01 18:17         ` David Turner
  1 sibling, 2 replies; 21+ messages in thread
From: Øyvind A. Holm @ 2014-08-31 18:54 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Jeff King, git mailing list

On 29 August 2014 22:18, David Turner <dturner@twopensource.com> wrote:
> On Fri, 2014-08-29 at 12:21 -0700, Junio C Hamano wrote:
> > Jeff King <peff@peff.net> writes:
> > > On Wed, Aug 27, 2014 at 06:10:12PM -0400, David Turner wrote:
> > > > It looks like git fsck exits with 0 status even if there are
> > > > some errors. The only case where there's a non-zero exit code is
> > > > if verify_pack reports errors -- but not e.g. fsck_object_dir.
> > >
> > > It will also bail non-zero with _certain_ tree errors that cause
> > > git to die() rather than fscking more completely.
> >
> > Even if git does not die, whenever it says broken link, missing
> > object, or object corrupt, we set errors_found and that variable
> > affects the exit status of fsck.  What does "some errors" exactly
> > mean in the original report?  Dangling objects are *not* errors and
> > should not cause fsck to report an error with its exit status.
>
> error in tree 9f50addba2b4e9e928d9c6a7056bdf71b36fba90: contains
> duplicate file entries

I don't think git fsck should return !0 in this case. Yes, it's an
inconsistency in the repo, but it's sometimes due to erroneous
conversions from another SCM or some other (non-standard) implementation
of the git client. I've seen things like this (and other inconsistencies
in repos, like wrong date formats, non-standard Author fields, unsorted
trees, zero-padded file modes and so on), and the thing is, owners of
public repos with these errors tend to avoid fixing it because it
changes the commit SHAs. If git fsck starts to return !0 on these
errors, it will always return error on that repo, which in practise
means that the error code is rendered useless. IMHO git fsck should only
return !0 on errors that can be fixed without changing the commit
history, for example missing or invalid objects.

Øyvind

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-31 18:54       ` git fsck exit code? Øyvind A. Holm
@ 2014-09-01 11:54         ` Øyvind A. Holm
  2014-09-01 18:17         ` David Turner
  1 sibling, 0 replies; 21+ messages in thread
From: Øyvind A. Holm @ 2014-09-01 11:54 UTC (permalink / raw)
  To: David Turner; +Cc: Junio C Hamano, Jeff King, git mailing list

On 31 August 2014 20:54, Øyvind A. Holm <sunny@sunbase.org> wrote:
> On 29 August 2014 22:18, David Turner wrote:
> > error in tree 9f50addba2b4e9e928d9c6a7056bdf71b36fba90: contains
> > duplicate file entries
>
> I don't think git fsck should return !0 in this case. Yes, it's an
> inconsistency in the repo, but it's sometimes due to erroneous
> conversions from another SCM or some other (non-standard)
> implementation of the git client. I've seen things like this (and
> other inconsistencies in repos, like wrong date formats, non-standard
> Author fields, unsorted trees, zero-padded file modes and so on), and
> the thing is, owners of public repos with these errors tend to avoid
> fixing it because it changes the commit SHAs. If git fsck starts to
> return !0 on these errors, it will always return error on that repo,
> which in practise means that the error code is rendered useless. IMHO
> git fsck should only return !0 on errors that can be fixed without
> changing the commit history, for example missing or invalid objects.

To elaborate on what I wrote: Of course, if the error is grave enough
that Git isn't able to work around it and it severely limits the
functionality of the repo, action needs to be taken regardless of
whether the history has to be changed or not. In that case it's OK to
return an error value too.

Øyvind

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-08-31 18:54       ` git fsck exit code? Øyvind A. Holm
  2014-09-01 11:54         ` Øyvind A. Holm
@ 2014-09-01 18:17         ` David Turner
  2014-09-09 22:09           ` Jeff King
  1 sibling, 1 reply; 21+ messages in thread
From: David Turner @ 2014-09-01 18:17 UTC (permalink / raw)
  To: Øyvind A. Holm; +Cc: Junio C Hamano, Jeff King, git mailing list

On Sun, 2014-08-31 at 20:54 +0200, Øyvind A. Holm wrote:
> On 29 August 2014 22:18, David Turner <dturner@twopensource.com> wrote:
> > On Fri, 2014-08-29 at 12:21 -0700, Junio C Hamano wrote:
> > > Jeff King <peff@peff.net> writes:
> > > > On Wed, Aug 27, 2014 at 06:10:12PM -0400, David Turner wrote:
> > > > > It looks like git fsck exits with 0 status even if there are
> > > > > some errors. The only case where there's a non-zero exit code is
> > > > > if verify_pack reports errors -- but not e.g. fsck_object_dir.
> > > >
> > > > It will also bail non-zero with _certain_ tree errors that cause
> > > > git to die() rather than fscking more completely.
> > >
> > > Even if git does not die, whenever it says broken link, missing
> > > object, or object corrupt, we set errors_found and that variable
> > > affects the exit status of fsck.  What does "some errors" exactly
> > > mean in the original report?  Dangling objects are *not* errors and
> > > should not cause fsck to report an error with its exit status.
> >
> > error in tree 9f50addba2b4e9e928d9c6a7056bdf71b36fba90: contains
> > duplicate file entries
> 
> I don't think git fsck should return !0 in this case. Yes, it's an
> inconsistency in the repo, but it's sometimes due to erroneous
> conversions from another SCM or some other (non-standard) implementation
> of the git client. I've seen things like this (and other inconsistencies
> in repos, like wrong date formats, non-standard Author fields, unsorted
> trees, zero-padded file modes and so on), and the thing is, owners of
> public repos with these errors tend to avoid fixing it because it
> changes the commit SHAs. If git fsck starts to return !0 on these
> errors, it will always return error on that repo, which in practise
> means that the error code is rendered useless. IMHO git fsck should only
> return !0 on errors that can be fixed without changing the commit
> history, for example missing or invalid objects.

We could have one exit code for errors which can be fixed without
rewriting history, and another for errors that can't.  Or different
command-line arguments to suppress errors of this sort.

In my case, I actually could fix the issue, because it was in a
newly-created branch; I just rewrote the script that created the branch
to be a little smarter.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] fsck: exit with non-zero status upon error from fsck_obj()
  2014-08-29 20:31       ` Jeff King
  2014-08-29 20:47         ` Junio C Hamano
@ 2014-09-09 22:03         ` Junio C Hamano
  2014-09-09 22:07           ` Jeff King
  2014-09-09 22:21           ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
  1 sibling, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2014-09-09 22:03 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list

From: Jeff King <peff@peff.net>
Date: Fri, 29 Aug 2014 16:31:46 -0400

Upon finding a corrupt loose object, we forgot to note the error to
signal it with the exit status of the entire process.

[jc: adjusted t1450 and added another test]

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * I think your fix is a right one that catches all the "we can
   parse minimally for the purpose of 'struct object' class system,
   but the object is semantically broken" cases, as fsck_obj() is
   where such a validation should all happen.

   I can haz a sign off?  Thanks.

 builtin/fsck.c  |  3 ++-
 t/t1450-fsck.sh | 30 +++++++++++++++++++++++++-----
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 1affdd5..8abe644 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -389,7 +389,8 @@ static void fsck_sha1_list(void)
 		unsigned char *sha1 = entry->sha1;
 
 		sha1_list.entry[i] = NULL;
-		fsck_sha1(sha1);
+		if (fsck_sha1(sha1))
+			errors_found |= ERROR_OBJECT;
 		free(entry);
 	}
 	sha1_list.nr = 0;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8c739c9..c8dff9c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -101,7 +101,7 @@ test_expect_success 'email with embedded > is not okay' '
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
-	git fsck 2>out &&
+	test_must_fail git fsck 2>out &&
 	cat out &&
 	grep "error in commit $new" out
 '
@@ -113,7 +113,7 @@ test_expect_success 'missing < email delimiter is reported nicely' '
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
-	git fsck 2>out &&
+	test_must_fail git fsck 2>out &&
 	cat out &&
 	grep "error in commit $new.* - bad name" out
 '
@@ -125,7 +125,7 @@ test_expect_success 'missing email is reported nicely' '
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
-	git fsck 2>out &&
+	test_must_fail git fsck 2>out &&
 	cat out &&
 	grep "error in commit $new.* - missing email" out
 '
@@ -137,7 +137,7 @@ test_expect_success '> in name is reported' '
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
-	git fsck 2>out &&
+	test_must_fail git fsck 2>out &&
 	cat out &&
 	grep "error in commit $new" out
 '
@@ -151,11 +151,31 @@ test_expect_success 'integer overflow in timestamps is reported' '
 	test_when_finished "remove_object $new" &&
 	git update-ref refs/heads/bogus "$new" &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
-	git fsck 2>out &&
+	test_must_fail git fsck 2>out &&
 	cat out &&
 	grep "error in commit $new.*integer overflow" out
 '
 
+test_expect_success 'malformatted tree object' '
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	test_when_finished "remove_object \$T" &&
+	T=$(
+		GIT_INDEX_FILE=test-index &&
+		export GIT_INDEX_FILE &&
+		rm -f test-index &&
+		>x &&
+		git add x &&
+		T=$(git write-tree) &&
+		(
+			git cat-file tree $T &&
+			git cat-file tree $T
+		) |
+		git hash-object -w -t tree --stdin
+	) &&
+	test_must_fail git fsck 2>out &&
+	grep "error in tree .*contains duplicate file entries" out
+'
+
 test_expect_success 'tag pointing to nonexistent' '
 	cat >invalid-tag <<-\EOF &&
 	object ffffffffffffffffffffffffffffffffffffffff
-- 
2.1.0-449-g93bbe5b

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: exit with non-zero status upon error from fsck_obj()
  2014-09-09 22:03         ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
@ 2014-09-09 22:07           ` Jeff King
  2014-09-12  3:38             ` [PATCH] fsck: return non-zero status on missing ref tips Jeff King
  2014-09-09 22:21           ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
  1 sibling, 1 reply; 21+ messages in thread
From: Jeff King @ 2014-09-09 22:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git mailing list

On Tue, Sep 09, 2014 at 03:03:33PM -0700, Junio C Hamano wrote:

> From: Jeff King <peff@peff.net>
> Date: Fri, 29 Aug 2014 16:31:46 -0400
> 
> Upon finding a corrupt loose object, we forgot to note the error to
> signal it with the exit status of the entire process.
> 
> [jc: adjusted t1450 and added another test]
> 
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
> 
>  * I think your fix is a right one that catches all the "we can
>    parse minimally for the purpose of 'struct object' class system,
>    but the object is semantically broken" cases, as fsck_obj() is
>    where such a validation should all happen.
> 
>    I can haz a sign off?  Thanks.

Thanks, I think this is a step forward regardless of other conversation
on the exit code, as it is just harmonizing loose and packed object
behavior.

Signed-off-by: Jeff King <peff@peff.net>

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: git fsck exit code?
  2014-09-01 18:17         ` David Turner
@ 2014-09-09 22:09           ` Jeff King
  0 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2014-09-09 22:09 UTC (permalink / raw)
  To: David Turner; +Cc: Øyvind A. Holm, Junio C Hamano, git mailing list

On Mon, Sep 01, 2014 at 02:17:43PM -0400, David Turner wrote:

> > I don't think git fsck should return !0 in this case. Yes, it's an
> > inconsistency in the repo, but it's sometimes due to erroneous
> > conversions from another SCM or some other (non-standard) implementation
> > of the git client. I've seen things like this (and other inconsistencies
> > in repos, like wrong date formats, non-standard Author fields, unsorted
> > trees, zero-padded file modes and so on), and the thing is, owners of
> > public repos with these errors tend to avoid fixing it because it
> > changes the commit SHAs. If git fsck starts to return !0 on these
> > errors, it will always return error on that repo, which in practise
> > means that the error code is rendered useless. IMHO git fsck should only
> > return !0 on errors that can be fixed without changing the commit
> > history, for example missing or invalid objects.
> 
> We could have one exit code for errors which can be fixed without
> rewriting history, and another for errors that can't.  Or different
> command-line arguments to suppress errors of this sort.
> 
> In my case, I actually could fix the issue, because it was in a
> newly-created branch; I just rewrote the script that created the branch
> to be a little smarter.

I think there's obviously some disagreement or room for interpretation
on the exit code. Perhaps a better path forward is to have a
machine-readable output from fsck in the first place, and then we can
annotate each warning/error with extra information that a caller can
use.

As it is now, you have to scrape fsck's stderr stream to figure out what
happened (which is a thing I have done, and it felt dirty and wrong).

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: exit with non-zero status upon error from fsck_obj()
  2014-09-09 22:03         ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
  2014-09-09 22:07           ` Jeff King
@ 2014-09-09 22:21           ` Junio C Hamano
  2014-09-09 22:29             ` Jonathan Nieder
  1 sibling, 1 reply; 21+ messages in thread
From: Junio C Hamano @ 2014-09-09 22:21 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list, Jonathan Nieder

Junio C Hamano <gitster@pobox.com> writes:

> From: Jeff King <peff@peff.net>
> Date: Fri, 29 Aug 2014 16:31:46 -0400
>
> Upon finding a corrupt loose object, we forgot to note the error to
> signal it with the exit status of the entire process.
>
> [jc: adjusted t1450 and added another test]

Spoke too soon.  If found another instance where we expected fsck to
notice a corruption.  We'd need to squash this in.

By the way, Jonathan, with dbedf8bf (t1450 (fsck): remove dangling
objects, 2010-09-06) you added a 'test_might_fail git fsck' to the
1450 test that catches an object corruption.  Do you remember if
there was some flakiness in this test that necessitated it, or is it
merely "I think this should fail, but it does not, and we may fix it
some day but I am not doing that in this patch?"  Assuming that it
is the latter, I am including an update to 1450 as well.

There is another test_might_fail that runs "git rev-list" in the
same test, but that is not part of the said patch and unrelated to
the current topic, so I didn't dig further; we may want to audit the
hits from "git grep test_might_fail t/" as a separate topic (hint,
hint).


 t/t1450-fsck.sh        | 2 +-
 t/t4212-log-corrupt.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index c8dff9c..0de755c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -69,7 +69,7 @@ test_expect_success 'object with bad sha1' '
 	git update-ref refs/heads/bogus $cmt &&
 	test_when_finished "git update-ref -d refs/heads/bogus" &&
 
-	test_might_fail git fsck 2>out &&
+	test_must_fail git fsck 2>out &&
 	cat out &&
 	grep "$sha.*corrupt" out
 '
diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 58b792b..67bd8ec 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -14,7 +14,7 @@ test_expect_success 'setup' '
 '
 
 test_expect_success 'fsck notices broken commit' '
-	git fsck 2>actual &&
+	test_must_fail git fsck 2>actual &&
 	test_i18ngrep invalid.author actual
 '
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: exit with non-zero status upon error from fsck_obj()
  2014-09-09 22:21           ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
@ 2014-09-09 22:29             ` Jonathan Nieder
  0 siblings, 0 replies; 21+ messages in thread
From: Jonathan Nieder @ 2014-09-09 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, David Turner, git mailing list, Thomas Rast

Junio C Hamano wrote:

> By the way, Jonathan, with dbedf8bf (t1450 (fsck): remove dangling
> objects, 2010-09-06) you added a 'test_might_fail git fsck' to the
> 1450 test that catches an object corruption.  Do you remember if
> there was some flakiness in this test that necessitated it, or is it
> merely "I think this should fail, but it does not, and we may fix it
> some day but I am not doing that in this patch?"

Thomas is the person to ask. :)  See v1.6.3-rc0~176^2~3 (Test fsck a
bit harder, 2009-02-19):

> +	(git fsck 2>out; true) &&

which that cleanup tightened to test_might_fail.

But yes, I'm pretty sure it was for futureproofing, not for hiding
flakiness.  I think your patch does the right thing in changing it to
test_must_fail now that fsck exits nonzero.

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH] fsck: return non-zero status on missing ref tips
  2014-09-09 22:07           ` Jeff King
@ 2014-09-12  3:38             ` Jeff King
  2014-09-12  4:29               ` Jeff King
  0 siblings, 1 reply; 21+ messages in thread
From: Jeff King @ 2014-09-12  3:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git mailing list

> On Tue, Sep 09, 2014 at 03:03:33PM -0700, Junio C Hamano wrote:
> 
> > Upon finding a corrupt loose object, we forgot to note the error to
> > signal it with the exit status of the entire process.
> > 
> > [jc: adjusted t1450 and added another test]
> > 
> > Signed-off-by: Junio C Hamano <gitster@pobox.com>
> > ---
> > 
> >  * I think your fix is a right one that catches all the "we can
> >    parse minimally for the purpose of 'struct object' class system,
> >    but the object is semantically broken" cases, as fsck_obj() is
> >    where such a validation should all happen.

Here's another structural case that we should catch but do not:

-- >8 --
Subject: fsck: return non-zero status on missing ref tips

Fsck tries hard to detect missing objects, and will complain
(and exit non-zero) about any inter-object links that are
missing. However, it will not exit non-zero for any missing
ref tips, meaning that a severely broken repository may
still pass "git fsck && echo ok".

The problem is that we use for_each_ref to iterate over the
ref tips, which hides broken tips. It does at least print an
error from the refs.c code, but fsck does not ever see the
ref and cannot note the problem in its exit code. We can solve
this by using for_each_rawref and noting the error ourselves.

In addition to adding tests for this case, we add tests for
all types of missing-object links (all of which worked, but
which we were not testing).

Signed-off-by: Jeff King <peff@peff.net>
---
Just below here we check that refs/heads/* points only to commit
objects. That's also sort-of-structural, but is pretty easy to recover
from without data loss, so I don't think it is as obvious a candidate
for a non-zero exit.

 builtin/fsck.c  |  3 ++-
 t/t1450-fsck.sh | 56 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+), 1 deletion(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 29de901..0928a98 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -489,6 +489,7 @@ static int fsck_handle_ref(const char *refname, const unsigned char *sha1, int f
 	obj = parse_object(sha1);
 	if (!obj) {
 		error("%s: invalid sha1 pointer %s", refname, sha1_to_hex(sha1));
+		errors_found |= ERROR_REACHABLE;
 		/* We'll continue with the rest despite the error.. */
 		return 0;
 	}
@@ -505,7 +506,7 @@ static void get_default_heads(void)
 {
 	if (head_points_at && !is_null_sha1(head_sha1))
 		fsck_handle_ref("HEAD", head_sha1, 0, NULL);
-	for_each_ref(fsck_handle_ref, NULL);
+	for_each_rawref(fsck_handle_ref, NULL);
 	if (include_reflogs)
 		for_each_reflog(fsck_handle_reflog, NULL);
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 0de755c..b52397a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -302,4 +302,60 @@ test_expect_success 'fsck notices ".git" in trees' '
 	)
 '
 
+# create a static test repo which is broken by omitting
+# one particular object ($1, which is looked up via rev-parse
+# in the new repository).
+create_repo_missing () {
+	rm -rf missing &&
+	git init missing &&
+	(
+		cd missing &&
+		git commit -m one --allow-empty &&
+		mkdir subdir &&
+		echo content >subdir/file &&
+		git add subdir/file &&
+		git commit -m two &&
+		unrelated=$(echo unrelated | git hash-object --stdin -w) &&
+		git tag -m foo tag $unrelated &&
+		sha1=$(git rev-parse --verify "$1") &&
+		path=$(echo $sha1 | sed 's|..|&/|') &&
+		rm .git/objects/$path
+	)
+}
+
+test_expect_success 'fsck notices missing blob' '
+	create_repo_missing HEAD:subdir/file &&
+	test_must_fail git -C missing fsck
+'
+
+test_expect_success 'fsck notices missing subtree' '
+	create_repo_missing HEAD:subdir &&
+	test_must_fail git -C missing fsck
+'
+
+test_expect_success 'fsck notices missing root tree' '
+	create_repo_missing HEAD^{tree} &&
+	test_must_fail git -C missing fsck
+'
+
+test_expect_success 'fsck notices missing parent' '
+	create_repo_missing HEAD^ &&
+	test_must_fail git -C missing fsck
+'
+
+test_expect_success 'fsck notices missing tagged object' '
+	create_repo_missing tag^{blob} &&
+	test_must_fail git -C missing fsck
+'
+
+test_expect_success 'fsck notices ref pointing to missing commit' '
+	create_repo_missing HEAD &&
+	test_must_fail git -C missing fsck
+'
+
+test_expect_success 'fsck notices ref pointing to missing tag' '
+	create_repo_missing tag &&
+	test_must_fail git -C missing fsck
+'
+
 test_done
-- 
2.1.0.373.g91ca799

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: return non-zero status on missing ref tips
  2014-09-12  3:38             ` [PATCH] fsck: return non-zero status on missing ref tips Jeff King
@ 2014-09-12  4:29               ` Jeff King
  2014-09-12  4:38                 ` Jeff King
                                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Jeff King @ 2014-09-12  4:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git mailing list, Michael Haggerty

[+cc mhagger for packed-refs wisdom]

On Thu, Sep 11, 2014 at 11:38:30PM -0400, Jeff King wrote:

> Fsck tries hard to detect missing objects, and will complain
> (and exit non-zero) about any inter-object links that are
> missing. However, it will not exit non-zero for any missing
> ref tips, meaning that a severely broken repository may
> still pass "git fsck && echo ok".
> 
> The problem is that we use for_each_ref to iterate over the
> ref tips, which hides broken tips. It does at least print an
> error from the refs.c code, but fsck does not ever see the
> ref and cannot note the problem in its exit code. We can solve
> this by using for_each_rawref and noting the error ourselves.

There's a possibly related problem with packed-refs that I noticed while
looking at this.

When we call pack-refs, it will refuse to pack any broken loose refs,
and leave them loose. Which is sane. But when we delete a ref, we need
to rewrite the packed-refs file, and we omit any broken packed refs. We
wouldn't have written a broken entry, but we may get broken later (i.e.,
the tip object may go missing after the packed-refs file is written).

If we only have a packed copy of "refs/heads/master" and it is broken,
then deleting any _other_ unrelated ref will cause refs/heads/master to
be dropped from the packed-refs file entirely. We get an error message,
but that's easy to miss, and the pointer to master's sha1 is lost
forever.

This test (on top of the patch I just sent) demonstrates:

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index b52397a..b0f4545 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -317,6 +317,7 @@ create_repo_missing () {
 		git commit -m two &&
 		unrelated=$(echo unrelated | git hash-object --stdin -w) &&
 		git tag -m foo tag $unrelated &&
+		git pack-refs --all --prune &&
 		sha1=$(git rev-parse --verify "$1") &&
 		path=$(echo $sha1 | sed 's|..|&/|') &&
 		rm .git/objects/$path
@@ -358,4 +359,10 @@ test_expect_success 'fsck notices ref pointing to missing tag' '
 	test_must_fail git -C missing fsck
 '
 
+test_expect_success 'ref deletion does not eat broken refs' '
+	create_repo_missing HEAD &&
+	git -C missing update-ref -d refs/tags/tag &&
+	test_must_fail git -C missing fsck
+'
+
 test_done


The fsck will succeed even though "master" should be broken, because
we appear to have no refs at all.

The fault is in curate_packed_ref_fn, which drops crufty from
packed-refs as we repack. That in turn is representing an older:

         if (!ref_resolves_to_object(entry))
                 return 0; /* Skip broken refs */

which comes from 624cac3 (refs: change the internal reference-iteration
API, 2013-04-22). But that was just maintaining the existing behavior,
which was using a do_for_each_ref iteration without INCLUDE_BROKEN. So I
think this problem has a similar root, though the fix is now slightly
different.

I am tempted to say that we do not need to do curate_each_ref_fn at all.
Any entry with a broken sha1 is either:

  1. A truly broken ref, in which case we should make sure to keep it
     (i.e., it is not cruft at all).

  2. A crufty entry that has been replaced by a loose reference that has
     not yet been packed. Such a crufty entry may point to broken
     objects, and that is OK.

In case 2, we _could_ delete the cruft. But I do not think we need to.
The loose ref will take precedence to anybody who actually does a ref
lookup, so the cruft is not hurting anybody.

Dropping curate_packed_ref_fn (as below) fixes the test above. And
miraculously does not even seem to conflict with ref patches in pu. :)

Am I missing any case that it is actually helping?

diff --git a/refs.c b/refs.c
index a7853cc..83c2bf7 100644
--- a/refs.c
+++ b/refs.c
@@ -2484,70 +2484,11 @@ int pack_refs(unsigned int flags)
 }
 
 /*
- * If entry is no longer needed in packed-refs, add it to the string
- * list pointed to by cb_data.  Reasons for deleting entries:
- *
- * - Entry is broken.
- * - Entry is overridden by a loose ref.
- * - Entry does not point at a valid object.
- *
- * In the first and third cases, also emit an error message because these
- * are indications of repository corruption.
- */
-static int curate_packed_ref_fn(struct ref_entry *entry, void *cb_data)
-{
-	struct string_list *refs_to_delete = cb_data;
-
-	if (entry->flag & REF_ISBROKEN) {
-		/* This shouldn't happen to packed refs. */
-		error("%s is broken!", entry->name);
-		string_list_append(refs_to_delete, entry->name);
-		return 0;
-	}
-	if (!has_sha1_file(entry->u.value.sha1)) {
-		unsigned char sha1[20];
-		int flags;
-
-		if (read_ref_full(entry->name, sha1, 0, &flags))
-			/* We should at least have found the packed ref. */
-			die("Internal error");
-		if ((flags & REF_ISSYMREF) || !(flags & REF_ISPACKED)) {
-			/*
-			 * This packed reference is overridden by a
-			 * loose reference, so it is OK that its value
-			 * is no longer valid; for example, it might
-			 * refer to an object that has been garbage
-			 * collected.  For this purpose we don't even
-			 * care whether the loose reference itself is
-			 * invalid, broken, symbolic, etc.  Silently
-			 * remove the packed reference.
-			 */
-			string_list_append(refs_to_delete, entry->name);
-			return 0;
-		}
-		/*
-		 * There is no overriding loose reference, so the fact
-		 * that this reference doesn't refer to a valid object
-		 * indicates some kind of repository corruption.
-		 * Report the problem, then omit the reference from
-		 * the output.
-		 */
-		error("%s does not point to a valid object!", entry->name);
-		string_list_append(refs_to_delete, entry->name);
-		return 0;
-	}
-
-	return 0;
-}
-
-/*
  * Must be called with packed refs already locked (and sorted)
  */
 static int repack_without_refs(const char **refnames, int n, struct strbuf *err)
 {
 	struct ref_dir *packed;
-	struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
-	struct string_list_item *ref_to_delete;
 	int i, ret;
 
 	/* Look for a packed ref */
@@ -2561,15 +2502,6 @@ static int repack_without_refs(const char **refnames, int n, struct strbuf *err)
 	for (i = 0; i < n; i++)
 		remove_entry(packed, refnames[i]);
 
-	/* Remove any other accumulated cruft */
-	do_for_each_entry_in_dir(packed, 0, curate_packed_ref_fn, &refs_to_delete);
-	for_each_string_list_item(ref_to_delete, &refs_to_delete) {
-		if (remove_entry(packed, ref_to_delete->string) == -1) {
-			rollback_packed_refs();
-			die("internal error");
-		}
-	}
-
 	/* Write what remains */
 	ret = commit_packed_refs();
 	if (ret && err)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: return non-zero status on missing ref tips
  2014-09-12  4:29               ` Jeff King
@ 2014-09-12  4:38                 ` Jeff King
  2014-09-12  4:58                 ` Junio C Hamano
  2014-09-15 14:57                 ` Michael Haggerty
  2 siblings, 0 replies; 21+ messages in thread
From: Jeff King @ 2014-09-12  4:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git mailing list, Michael Haggerty

On Fri, Sep 12, 2014 at 12:29:39AM -0400, Jeff King wrote:

> Dropping curate_packed_ref_fn (as below) fixes the test above. And
> miraculously does not even seem to conflict with ref patches in pu. :)

Of course I spoke too soon. The patch I sent is actually based on pu. It
is easy to make the equivalent change in either "master" or "pu" (they
are both just deletions of the same blocks), but the code mutated a
little in between, and there are purely textual conflicts going from one
to the other.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: return non-zero status on missing ref tips
  2014-09-12  4:29               ` Jeff King
  2014-09-12  4:38                 ` Jeff King
@ 2014-09-12  4:58                 ` Junio C Hamano
  2014-09-12  5:12                   ` Jeff King
  2014-09-15 14:42                   ` Michael Haggerty
  2014-09-15 14:57                 ` Michael Haggerty
  2 siblings, 2 replies; 21+ messages in thread
From: Junio C Hamano @ 2014-09-12  4:58 UTC (permalink / raw)
  To: Jeff King; +Cc: David Turner, git mailing list, Michael Haggerty

On Thu, Sep 11, 2014 at 9:29 PM, Jeff King <peff@peff.net> wrote:
> [+cc mhagger for packed-refs wisdom]
>
> If we only have a packed copy of "refs/heads/master" and it is broken,
> then deleting any _other_ unrelated ref will cause refs/heads/master to
> be dropped from the packed-refs file entirely. We get an error message,
> but that's easy to miss, and the pointer to master's sha1 is lost
> forever.

Hmph, and the significance of losing a random 20-byte object name that
is useless in your repository is? You could of course ask around other
repositories (i.e. your origin, others that fork from the same origin,
etc.), and having the name might make it easier to locate the exact
object.

But in such a case, either they have it at the tip (in which case you
can just fetch the branch you lost), or they have it reachable from
one of their tips of branches you had shown interest in (that is why
you had that lost object in the first place). Either way, you would be
running "git fetch" or asking them to send "git bundle" output to be
unbundled at your end, and the way you ask would be by refname, not
the object name, so I am not sure if the loss is that grave.

Perhaps I am missing something, of course, though.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: return non-zero status on missing ref tips
  2014-09-12  4:58                 ` Junio C Hamano
@ 2014-09-12  5:12                   ` Jeff King
  2014-09-15 14:42                   ` Michael Haggerty
  1 sibling, 0 replies; 21+ messages in thread
From: Jeff King @ 2014-09-12  5:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: David Turner, git mailing list, Michael Haggerty

On Thu, Sep 11, 2014 at 09:58:45PM -0700, Junio C Hamano wrote:

> On Thu, Sep 11, 2014 at 9:29 PM, Jeff King <peff@peff.net> wrote:
> > [+cc mhagger for packed-refs wisdom]
> >
> > If we only have a packed copy of "refs/heads/master" and it is broken,
> > then deleting any _other_ unrelated ref will cause refs/heads/master to
> > be dropped from the packed-refs file entirely. We get an error message,
> > but that's easy to miss, and the pointer to master's sha1 is lost
> > forever.
> 
> Hmph, and the significance of losing a random 20-byte object name that
> is useless in your repository is? You could of course ask around other
> repositories (i.e. your origin, others that fork from the same origin,
> etc.), and having the name might make it easier to locate the exact
> object.

Because your repository is corrupted and broken, and we then forget that
fact. I.e., it is not a random 20-byte object name. It is the thing that
your branch is pointing at, and that is valuable in itself. If you can
restore the object (e.g., from another repository), you need to know
which object to restore.

But I also think corrupting a repository and not noticing is quite bad
in itself.

> But in such a case, either they have it at the tip (in which case you
> can just fetch the branch you lost), or they have it reachable from
> one of their tips of branches you had shown interest in (that is why
> you had that lost object in the first place). Either way, you would be
> running "git fetch" or asking them to send "git bundle" output to be
> unbundled at your end, and the way you ask would be by refname, not
> the object name, so I am not sure if the loss is that grave.

Yes, but after you get the objects from the other person, what do you
set your ref to? If I know my tip was at commit X, I can get any set of
objects from another untrusted person that includes X, verify what they
sent me cryptographically, and restore my tip to X.

If I do not know X, they can send me any random set of objects. I can
verify that the sha1s are OK and the graph is complete, but I cannot
verify that the contents are sane. I am effectively just picking their
"master" as my new starting point, and trusting that it has not been
tampered with.

-Peff

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: return non-zero status on missing ref tips
  2014-09-12  4:58                 ` Junio C Hamano
  2014-09-12  5:12                   ` Jeff King
@ 2014-09-15 14:42                   ` Michael Haggerty
  1 sibling, 0 replies; 21+ messages in thread
From: Michael Haggerty @ 2014-09-15 14:42 UTC (permalink / raw)
  To: Junio C Hamano, Jeff King; +Cc: David Turner, git mailing list

On 09/12/2014 06:58 AM, Junio C Hamano wrote:
> On Thu, Sep 11, 2014 at 9:29 PM, Jeff King <peff@peff.net> wrote:
>> [+cc mhagger for packed-refs wisdom]
>>
>> If we only have a packed copy of "refs/heads/master" and it is broken,
>> then deleting any _other_ unrelated ref will cause refs/heads/master to
>> be dropped from the packed-refs file entirely. We get an error message,
>> but that's easy to miss, and the pointer to master's sha1 is lost
>> forever.
> 
> Hmph, and the significance of losing a random 20-byte object name that
> is useless in your repository is? You could of course ask around other
> repositories (i.e. your origin, others that fork from the same origin,
> etc.), and having the name might make it easier to locate the exact
> object.
> 
> But in such a case, either they have it at the tip (in which case you
> can just fetch the branch you lost), or they have it reachable from
> one of their tips of branches you had shown interest in (that is why
> you had that lost object in the first place). Either way, you would be
> running "git fetch" or asking them to send "git bundle" output to be
> unbundled at your end, and the way you ask would be by refname, not
> the object name, so I am not sure if the loss is that grave.
> 
> Perhaps I am missing something, of course, though.

I don't understand your argument.

First, you would not just lose the SHA-1 of the object. You would also
lose the name of the reference that was previously pointing at it.

Second, the discarded information *is* useful. The more information you
have, the more likely you can restore it and/or diagnose the original
cause of the corruption.

Third, even if the discarded information were not useful, the fact that
*information has gone missing* is of overwhelming importance, and that
fact would be forgotten as soon as the warning message scrolls off of
your terminal. The reference deletion that triggered the warning might
even have been done in the background by some other process (e.g., a
GUI) and the output discarded or shunted into some "debug" window that
the user would have no reason to look at.

So I agree with Peff that it would be prudent to preserve the corrupt
reference at least until the next "git fsck", which (a) is run by the
user specifically to look for corruption, and (b) can return an error
result to make the failure obvious.

The only thing that is unclear to me is whether the user would be able
to get rid of the broken reference once it is discovered (short of
opening packed-refs in an editor).

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH] fsck: return non-zero status on missing ref tips
  2014-09-12  4:29               ` Jeff King
  2014-09-12  4:38                 ` Jeff King
  2014-09-12  4:58                 ` Junio C Hamano
@ 2014-09-15 14:57                 ` Michael Haggerty
  2 siblings, 0 replies; 21+ messages in thread
From: Michael Haggerty @ 2014-09-15 14:57 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano; +Cc: David Turner, git mailing list

On 09/12/2014 06:29 AM, Jeff King wrote:
> [+cc mhagger for packed-refs wisdom]
> 
> On Thu, Sep 11, 2014 at 11:38:30PM -0400, Jeff King wrote:
> 
>> Fsck tries hard to detect missing objects, and will complain
>> (and exit non-zero) about any inter-object links that are
>> missing. However, it will not exit non-zero for any missing
>> ref tips, meaning that a severely broken repository may
>> still pass "git fsck && echo ok".
>>
>> The problem is that we use for_each_ref to iterate over the
>> ref tips, which hides broken tips. It does at least print an
>> error from the refs.c code, but fsck does not ever see the
>> ref and cannot note the problem in its exit code. We can solve
>> this by using for_each_rawref and noting the error ourselves.
> 
> There's a possibly related problem with packed-refs that I noticed while
> looking at this.
> 
> When we call pack-refs, it will refuse to pack any broken loose refs,
> and leave them loose. Which is sane. But when we delete a ref, we need
> to rewrite the packed-refs file, and we omit any broken packed refs. We
> wouldn't have written a broken entry, but we may get broken later (i.e.,
> the tip object may go missing after the packed-refs file is written).
> 
> If we only have a packed copy of "refs/heads/master" and it is broken,
> then deleting any _other_ unrelated ref will cause refs/heads/master to
> be dropped from the packed-refs file entirely. We get an error message,
> but that's easy to miss, and the pointer to master's sha1 is lost
> forever.

I was confused for a while by your observation, because the curate
function has

	if (read_ref_full(entry->name, sha1, 0, &flags))
		/* We should at least have found the packed ref. */
		die("Internal error");

, which looks like more than "emit an error message and continue". But
in fact the flow never gets this far, because iterating without
DO_FOR_EACH_INCLUDE_BROKEN doesn't just skip references for which
REF_ISBROKEN is set, but also (do to a test in do_one_ref()) references
for which ref_resolves_to_object() fails. The ultimate source of my
confusion is that the word BROKEN has two different meanings in the two
constants' names.

> [...]
> I am tempted to say that we do not need to do curate_each_ref_fn at all.
> Any entry with a broken sha1 is either:
> 
>   1. A truly broken ref, in which case we should make sure to keep it
>      (i.e., it is not cruft at all).
> 
>   2. A crufty entry that has been replaced by a loose reference that has
>      not yet been packed. Such a crufty entry may point to broken
>      objects, and that is OK.
> 
> In case 2, we _could_ delete the cruft. But I do not think we need to.
> The loose ref will take precedence to anybody who actually does a ref
> lookup, so the cruft is not hurting anybody.
> 
> Dropping curate_packed_ref_fn (as below) fixes the test above. And
> miraculously does not even seem to conflict with ref patches in pu. :)
> 
> Am I missing any case that it is actually helping?

Something inside me screams out in horror that we would pass up an
opportunity to delete unneeded cruft from the packed-refs file. But I
can't think of a rational reason to disagree with you, so as far as I'm
concerned your suggestion seems OK.

Michael

-- 
Michael Haggerty
mhagger@alum.mit.edu

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2014-09-15 15:04 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-27 22:10 git fsck exit code? David Turner
2014-08-29 18:53 ` Jeff King
2014-08-29 19:21   ` Junio C Hamano
2014-08-29 20:18     ` David Turner
2014-08-29 20:31       ` Jeff King
2014-08-29 20:47         ` Junio C Hamano
2014-09-09 22:03         ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
2014-09-09 22:07           ` Jeff King
2014-09-12  3:38             ` [PATCH] fsck: return non-zero status on missing ref tips Jeff King
2014-09-12  4:29               ` Jeff King
2014-09-12  4:38                 ` Jeff King
2014-09-12  4:58                 ` Junio C Hamano
2014-09-12  5:12                   ` Jeff King
2014-09-15 14:42                   ` Michael Haggerty
2014-09-15 14:57                 ` Michael Haggerty
2014-09-09 22:21           ` [PATCH] fsck: exit with non-zero status upon error from fsck_obj() Junio C Hamano
2014-09-09 22:29             ` Jonathan Nieder
2014-08-31 18:54       ` git fsck exit code? Øyvind A. Holm
2014-09-01 11:54         ` Øyvind A. Holm
2014-09-01 18:17         ` David Turner
2014-09-09 22:09           ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).