git@vger.kernel.org list mirror (unofficial, one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes
@ 2021-03-28  2:25 Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
                   ` (4 more replies)
  0 siblings, 5 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:25 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

Fix up some of the technical docs around error handling and add a
bug() function, as explained in 4/4 this is for use in some
fsck-related code.

This series does not make use of it, but I'll follow-up with one that
does. I wanted to peel of this small cleanup series from that.

I noticed some semi-related bugs in these error tracing functions to
do with the trace2 integration, noted in
https://lore.kernel.org/git/87mtuoo4ym.fsf@evledraar.gmail.com/ this
doesn't attempt to fix those.

Ævar Arnfjörð Bjarmason (4):
  usage.c: don't copy/paste the same comment three times
  api docs: document BUG() in api-error-handling.txt
  api docs: document that BUG() emits a trace2 error event
  usage.c: add a non-fatal bug() function to go with BUG()

 .../technical/api-error-handling.txt          | 16 ++++++-
 Documentation/technical/api-trace2.txt        |  4 +-
 git-compat-util.h                             |  3 ++
 run-command.c                                 | 11 +++++
 t/helper/test-trace2.c                        | 14 +++++-
 t/t0210-trace2-normal.sh                      | 19 ++++++++
 usage.c                                       | 46 ++++++++++++++-----
 7 files changed, 95 insertions(+), 18 deletions(-)

-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 1/4] usage.c: don't copy/paste the same comment three times
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:32   ` Eric Sunshine
  2021-03-28  2:26 ` [PATCH 2/4] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

In gee4512ed481 (trace2: create new combined trace facility,
2019-02-22) we started with two copies of this comment,
0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
a third. Let's instead add an earlier comment that applies to all
these mostly-the-same functions.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 usage.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/usage.c b/usage.c
index 1b206de36d6..c7d233b0de9 100644
--- a/usage.c
+++ b/usage.c
@@ -55,12 +55,13 @@ static NORETURN void usage_builtin(const char *err, va_list params)
 	exit(129);
 }
 
+/*
+ * We call trace2_cmd_error_va() in the below functions first and
+ * expect it to va_copy 'params' before using it (because an 'ap' can
+ * only be walked once).
+ */
 static NORETURN void die_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("fatal: ", err, params);
@@ -70,10 +71,6 @@ static NORETURN void die_builtin(const char *err, va_list params)
 
 static void error_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("error: ", err, params);
@@ -81,10 +78,6 @@ static void error_builtin(const char *err, va_list params)
 
 static void warn_builtin(const char *warn, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(warn, params);
 
 	vreportf("warning: ", warn, params);
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 2/4] api docs: document BUG() in api-error-handling.txt
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-29  5:37   ` Bagas Sanjaya
  2021-03-28  2:26 ` [PATCH 3/4] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

When the BUG() function was added in d8193743e08 (usage.c: add BUG()
function, 2017-05-12) these docs added in 1f23cfe0ef5 (doc: document
error handling functions and conventions, 2014-12-03) were not
updated. Let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index ceeedd485c9..71486abb2f0 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,8 +1,11 @@
 Error reporting in git
 ======================
 
-`die`, `usage`, `error`, and `warning` report errors of various
-kinds.
+`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+various kinds.
+
+- `BUG` is for failed internal assertions that should never happen,
+  i.e. a bug in git itself.
 
 - `die` is for fatal application errors.  It prints a message to
   the user and exits with status 128.
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 3/4] api docs: document that BUG() emits a trace2 error event
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 2/4] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  4 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

Correct documentation added in e544221d97a (trace2:
Documentation/technical/api-trace2.txt, 2019-02-22) to state that
calling BUG() also emits an "error" event. See ee4512ed481 (trace2:
create new combined trace facility, 2019-02-22) for the initial
implementation.

The BUG() function did not emit an event then however, that was only
changed later in 0a9dde4a04c (usage: trace2 BUG() invocations,
2021-02-05), that commit changed the code, but didn't update any of
the docs.

Let's also add a cross-reference from api-error-handling.txt.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 3 +++
 Documentation/technical/api-trace2.txt         | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 71486abb2f0..8be4f4d0d6a 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -23,6 +23,9 @@ various kinds.
   without running into too many problems.  Like `error`, it
   returns -1 after reporting the situation to the caller.
 
+These reports will be logged via the trace2 facility. See the "error"
+event in link:api-trace2.txt[trace2 API].
+
 Customizable error handlers
 ---------------------------
 
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index c65ffafc485..3f52f981a2d 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -465,7 +465,7 @@ completed.)
 ------------
 
 `"error"`::
-	This event is emitted when one of the `error()`, `die()`,
+	This event is emitted when one of the `BUG()`, `error()`, `die()`,
 	`warning()`, or `usage()` functions are called.
 +
 ------------
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
                   ` (2 preceding siblings ...)
  2021-03-28  2:26 ` [PATCH 3/4] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:26 ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
  2021-03-28  6:12   ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Junio C Hamano
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  4 siblings, 2 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:26 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Ævar Arnfjörð Bjarmason

Add a bug() function that works like error() except the message is
prefixed with "bug:" instead of "error:".

The reason this is needed is for e.g. the fsck code. If we encounter
what we'd consider a BUG() in the middle of fsck traversal we'd still
like to try as hard as possible to go past that object and complete
the fsck, instead of hard dying. A follow-up commit will introduce
such a use in object-file.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 .../technical/api-error-handling.txt          |  8 ++++-
 Documentation/technical/api-trace2.txt        |  4 +--
 git-compat-util.h                             |  3 ++
 run-command.c                                 | 11 +++++++
 t/helper/test-trace2.c                        | 14 +++++++--
 t/t0210-trace2-normal.sh                      | 19 ++++++++++++
 usage.c                                       | 29 +++++++++++++++++++
 7 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 8be4f4d0d6a..9d6ac6f6649 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,7 +1,7 @@
 Error reporting in git
 ======================
 
-`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+`BUG`, `bug`, `die`, `usage`, `error`, and `warning` report errors of
 various kinds.
 
 - `BUG` is for failed internal assertions that should never happen,
@@ -18,6 +18,12 @@ various kinds.
   to the user and returns -1 for convenience in signaling the error
   to the caller.
 
+- `bug` (lower-case, not `BUG`) is supposed to be used like `BUG` but
+  has the same non-fatal semantics as `error`. It's meant to signal an
+  internal bug in a library whose caller might still want to attempt
+  some amount of graceful recovery, or to append other error output of
+  their own.
+
 - `warning` is for reporting situations that probably should not
   occur but which the user (and Git) can continue to work around
   without running into too many problems.  Like `error`, it
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index 3f52f981a2d..cafe373f405 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -465,8 +465,8 @@ completed.)
 ------------
 
 `"error"`::
-	This event is emitted when one of the `BUG()`, `error()`, `die()`,
-	`warning()`, or `usage()` functions are called.
+	This event is emitted when one of the `BUG()`, `bug()`, `error()`,
+	`die()`, `warning()`, or `usage()` functions are called.
 +
 ------------
 {
diff --git a/git-compat-util.h b/git-compat-util.h
index 9ddf9d7044b..13c1dcf9dcc 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -463,6 +463,7 @@ NORETURN void usage(const char *err);
 NORETURN void usagef(const char *err, ...) __attribute__((format (printf, 1, 2)));
 NORETURN void die(const char *err, ...) __attribute__((format (printf, 1, 2)));
 NORETURN void die_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
+int bug(const char *err, ...) __attribute__((format (printf, 1, 2)));
 int error(const char *err, ...) __attribute__((format (printf, 1, 2)));
 int error_errno(const char *err, ...) __attribute__((format (printf, 1, 2)));
 void warning(const char *err, ...) __attribute__((format (printf, 1, 2)));
@@ -497,6 +498,8 @@ static inline int const_error(void)
 typedef void (*report_fn)(const char *, va_list params);
 
 void set_die_routine(NORETURN_PTR report_fn routine);
+void set_bug_routine(report_fn routine);
+report_fn get_bug_routine(void);
 void set_error_routine(report_fn routine);
 report_fn get_error_routine(void);
 void set_warn_routine(report_fn routine);
diff --git a/run-command.c b/run-command.c
index be6bc128cd9..8b818b063ff 100644
--- a/run-command.c
+++ b/run-command.c
@@ -348,6 +348,12 @@ static void fake_fatal(const char *err, va_list params)
 	vreportf("fatal: ", err, params);
 }
 
+static void child_bug_fn(const char *err, va_list params)
+{
+	const char msg[] = "bug() should not be called in child\n";
+	xwrite(2, msg, sizeof(msg) - 1);
+}
+
 static void child_error_fn(const char *err, va_list params)
 {
 	const char msg[] = "error() should not be called in child\n";
@@ -371,9 +377,12 @@ static void NORETURN child_die_fn(const char *err, va_list params)
 static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
 {
 	static void (*old_errfn)(const char *err, va_list params);
+	static void (*old_bugfn)(const char *err, va_list params);
 
 	old_errfn = get_error_routine();
 	set_error_routine(fake_fatal);
+	old_bugfn = get_bug_routine();
+	set_bug_routine(fake_fatal);
 	errno = cerr->syserr;
 
 	switch (cerr->err) {
@@ -399,6 +408,7 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
 		error_errno("cannot exec '%s'", cmd->argv[0]);
 		break;
 	}
+	set_bug_routine(old_bugfn);
 	set_error_routine(old_errfn);
 }
 
@@ -789,6 +799,7 @@ int start_command(struct child_process *cmd)
 		 * called, they can take stdio locks and malloc.
 		 */
 		set_die_routine(child_die_fn);
+		set_error_routine(child_bug_fn);
 		set_error_routine(child_error_fn);
 		set_warn_routine(child_warn_fn);
 
diff --git a/t/helper/test-trace2.c b/t/helper/test-trace2.c
index f93633f895a..6248427e4bf 100644
--- a/t/helper/test-trace2.c
+++ b/t/helper/test-trace2.c
@@ -198,7 +198,7 @@ static int ut_006data(int argc, const char **argv)
 	return 0;
 }
 
-static int ut_007bug(int argc, const char **argv)
+static int ut_007BUG(int argc, const char **argv)
 {
 	/*
 	 * Exercise BUG() to ensure that the message is printed to trace2.
@@ -206,6 +206,15 @@ static int ut_007bug(int argc, const char **argv)
 	BUG("the bug message");
 }
 
+static int ut_008bug(int argc, const char **argv)
+{
+	/*
+	 * Exercise BUG() to ensure that the message is printed to trace2.
+	 */
+	bug("the bug message");
+	return 0;
+}
+
 /*
  * Usage:
  *     test-tool trace2 <ut_name_1> <ut_usage_1>
@@ -222,7 +231,8 @@ static struct unit_test ut_table[] = {
 	{ ut_004child,    "004child",  "[<child_command_line>]" },
 	{ ut_005exec,     "005exec",   "<git_command_args>" },
 	{ ut_006data,     "006data",   "[<category> <key> <value>]+" },
-	{ ut_007bug,      "007bug",    "" },
+	{ ut_007BUG,      "007bug",    "" },
+	{ ut_008bug,      "008bug",    "" },
 };
 /* clang-format on */
 
diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
index 0cf3a63b75b..9c866af971f 100755
--- a/t/t0210-trace2-normal.sh
+++ b/t/t0210-trace2-normal.sh
@@ -166,6 +166,25 @@ test_expect_success 'BUG messages are written to trace2' '
 	test_cmp expect actual
 '
 
+# Verb 008bug
+#
+# Check that BUG writes to trace2
+
+test_expect_success 'bug messages are written to trace2' '
+	test_when_finished "rm trace.normal actual expect" &&
+	GIT_TRACE2="$(pwd)/trace.normal" test-tool trace2 008bug &&
+	perl "$TEST_DIRECTORY/t0210/scrub_normal.perl" <trace.normal >actual &&
+	cat >expect <<-EOF &&
+		version $V
+		start _EXE_ trace2 008bug
+		cmd_name trace2 (trace2)
+		error the bug message
+		exit elapsed:_TIME_ code:0
+		atexit elapsed:_TIME_ code:0
+	EOF
+	test_cmp expect actual
+'
+
 sane_unset GIT_TRACE2_BRIEF
 
 # Now test without environment variables and get all Trace2 settings
diff --git a/usage.c b/usage.c
index c7d233b0de9..34bd3abf048 100644
--- a/usage.c
+++ b/usage.c
@@ -69,6 +69,13 @@ static NORETURN void die_builtin(const char *err, va_list params)
 	exit(128);
 }
 
+static void bug_builtin(const char *err, va_list params)
+{
+	trace2_cmd_error_va(err, params);
+
+	vreportf("bug: ", err, params);
+}
+
 static void error_builtin(const char *err, va_list params)
 {
 	trace2_cmd_error_va(err, params);
@@ -109,6 +116,7 @@ static int die_is_recursing_builtin(void)
  * (ugh), so keep things static. */
 static NORETURN_PTR report_fn usage_routine = usage_builtin;
 static NORETURN_PTR report_fn die_routine = die_builtin;
+static report_fn bug_routine = bug_builtin;
 static report_fn error_routine = error_builtin;
 static report_fn warn_routine = warn_builtin;
 static int (*die_is_recursing)(void) = die_is_recursing_builtin;
@@ -118,11 +126,22 @@ void set_die_routine(NORETURN_PTR report_fn routine)
 	die_routine = routine;
 }
 
+
+void set_bug_routine(report_fn routine)
+{
+	bug_routine = routine;
+}
+
 void set_error_routine(report_fn routine)
 {
 	error_routine = routine;
 }
 
+report_fn get_bug_routine(void)
+{
+	return bug_routine;
+}
+
 report_fn get_error_routine(void)
 {
 	return error_routine;
@@ -223,6 +242,16 @@ int error_errno(const char *fmt, ...)
 	return -1;
 }
 
+int bug(const char *err, ...)
+{
+	va_list params;
+
+	va_start(params, err);
+	bug_routine(err, params);
+	va_end(params);
+	return -1;
+}
+
 #undef error
 int error(const char *err, ...)
 {
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH 1/4] usage.c: don't copy/paste the same comment three times
  2021-03-28  2:26 ` [PATCH 1/4] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:32   ` Eric Sunshine
  0 siblings, 0 replies; 155+ messages in thread
From: Eric Sunshine @ 2021-03-28  2:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Git List, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King

On Sat, Mar 27, 2021 at 10:28 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> In gee4512ed481 (trace2: create new combined trace facility,

s/gee/ee/

> 2019-02-22) we started with two copies of this comment,
> 0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
> a third. Let's instead add an earlier comment that applies to all
> these mostly-the-same functions.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>

^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 0/5] fsck: improve error reporting
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58   ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 1/5] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                       ` (6 more replies)
  2021-03-28  6:12   ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Junio C Hamano
  1 sibling, 7 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

This improves fsck error reporting in a rather obscure edge case, and
fixes up some object APIs along the way as needed.

This is based on my series to add a bug() function:
https://lore.kernel.org/git/cover-0.5-00000000000-20210328T022343Z-avarab@gmail.com/

The use of the new bug() function is in 5/5.

Ævar Arnfjörð Bjarmason (5):
  cache.h: move object functions to object-store.h
  fsck tests: refactor one test to use a sub-repo
  fsck: don't hard die on invalid object types
  fsck: improve the error on invalid object types
  fsck: improve error on loose object hash mismatch

 builtin/cat-file.c    |  7 +++--
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 28 ++++++++++++++---
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 +-
 cache.h               | 10 ------
 object-file.c         | 73 +++++++++++++++++++++++--------------------
 object-store.h        | 19 +++++++++--
 object.c              |  4 +--
 pack-check.c          |  3 +-
 streaming.c           |  5 ++-
 t/t1450-fsck.sh       | 64 +++++++++++++++++++++++++++----------
 12 files changed, 143 insertions(+), 77 deletions(-)

-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 1/5] cache.h: move object functions to object-store.h
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 2/5] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 6fda8091f11..0e387f84f67 100644
--- a/cache.h
+++ b/cache.h
@@ -1279,16 +1279,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 2/5] fsck tests: refactor one test to use a sub-repo
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 1/5] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 3/5] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose, and makes things easier to debug as a
failing test can have their state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test, I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..1563b35f88c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 3/5] fsck: don't hard die on invalid object types
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 1/5] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 2/5] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  2:58     ` [PATCH 4/5] fsck: improve the error " Ævar Arnfjörð Bjarmason
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Change builtin/fsck.c to pass down a
OBJECT_INFO_ALLOW_UNKNOWN_TYPE. This changes this very ungraceful
error:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>
    $ git fsck
    fatal: invalid object type
    $

Into:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

But why are we complaining about a "hash mismatch" for an object of a
type we don't know about? We shouldn't. This is the bare minimal
change needed to not make fsck hard die on a repository that's been
corrupted in this manner. In subsequent commits we'll teach fsck to
recognize this particular type of corruption and emit a better error
message.

The parse_loose_header() function being changed here is only used in
builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for its introduction.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 24 ++++++++++--------------
 object-store.h  |  6 ++++--
 streaming.c     |  5 ++++-
 t/t1450-fsck.sh |  9 +++++++++
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 821e7798c70..d92c530863d 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -602,7 +602,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 624af408cdc..26560a6281c 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1294,8 +1294,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1355,14 +1356,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1417,10 +1410,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2497,13 +2490,16 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2518,7 +2514,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, oi_flags);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..ab86c8bf32c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
@@ -480,7 +481,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 800f07a52cc..e5d4dd2f654 100644
--- a/streaming.c
+++ b/streaming.c
@@ -341,6 +341,9 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
+	struct object_info oi2 = OBJECT_INFO_INIT;
+	oi2.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -349,7 +352,7 @@ static open_method_decl(loose)
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi2, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88c..025dd1b491a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,13 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck error and recovery on invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
+'
+
 test_done
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 4/5] fsck: improve the error on invalid object types
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (2 preceding siblings ...)
  2021-03-28  2:58     ` [PATCH 3/5] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-03-28  8:56       ` Johannes Sixt
  2021-03-28  2:58     ` [PATCH 5/5] fsck: improve error on loose object hash mismatch Ævar Arnfjörð Bjarmason
                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall though from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c |  7 +++++--
 builtin/fsck.c     | 22 ++++++++++++++++++----
 object-file.c      | 31 +++++++++++++++----------------
 object-store.h     |  4 ++--
 t/t1450-fsck.sh    | 23 ++++++++++++++++++++++-
 5 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..1063576a982 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -74,6 +74,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
 	const char *path = force_path;
+	int ret;
 
 	if (unknown_type)
 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
@@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	switch (opt) {
 	case 't':
 		oi.type_name = &sb;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		if (sb.len) {
 			printf("%s\n", sb.buf);
@@ -103,7 +105,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 	case 's':
 		oi.sizep = &size;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		printf("%"PRIuMAX"\n", (uintmax_t)size);
 		return 0;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index d92c530863d..c8ab14d1545 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,12 +601,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 26560a6281c..e744a06637b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1323,9 +1323,7 @@ int parse_loose_header(const char *hdr,
 	 * we're obtaining the type using '--allow-unknown-type'
 	 * option.
 	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
 		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
@@ -1407,14 +1405,17 @@ static int loose_object_info(struct repository *r,
 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
+	if (status < 0) {
 		; /* Do nothing */
-	else if (hdrbuf.len) {
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	} else {
+		status = parse_loose_header(hdr, oi, flags);
+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
+			error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -2488,9 +2489,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2498,8 +2498,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2514,9 +2514,9 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, oi_flags);
-	if (*type < 0) {
-		error(_("unable to parse header of %s"), path);
+	*type = parse_loose_header(hdr, oi, oi_flags);
+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
+		error(_("unable to parse header %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
@@ -2532,8 +2532,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index ab86c8bf32c..786c5c34704 100644
--- a/object-store.h
+++ b/object-store.h
@@ -241,11 +241,11 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
  *
  * Returns 0 on success, negative on error (details may be written to stderr).
  */
+struct object_info;
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 025dd1b491a..214278e134a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -868,7 +887,9 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+	grep error: err >err.errors &&
+	test_line_count = 1 err.errors &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH 5/5] fsck: improve error on loose object hash mismatch
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (3 preceding siblings ...)
  2021-03-28  2:58     ` [PATCH 4/5] fsck: improve the error " Ævar Arnfjörð Bjarmason
@ 2021-03-28  2:58     ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  6 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-28  2:58 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Jeff King, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently, in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse
misreport whatever issue we encountered. So let's use the new bug()
function to ferry and return code up to fsck_loose() in that case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 28 +++++++++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1450-fsck.sh       |  8 +++++---
 9 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85a76e0ef8b..bf0e266d83a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index c8ab14d1545..365b9124bdc 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -604,20 +604,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = null_oid;
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 21899687e2c..93044e9e618 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1420,7 +1420,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index 41a399a69e4..cfecbcd664e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -65,7 +65,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e744a06637b..7aa80701aa7 100644
--- a/object-file.c
+++ b/object-file.c
@@ -993,9 +993,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1003,8 +1005,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1029,9 +1031,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_fn(real_oid.hash, &c);
+	r->hash_algo->final_fn(real_oid->hash, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2489,6 +2491,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2532,9 +2535,16 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, &null_oid))
+				/*
+				 * Not a plain BUG() because if it
+				 * does happen we're in the middle of
+				 * an fsck we'd like to see to the
+				 * end.
+				 */
+				bug("BUG trying to compute hash for object at %s (expected %s)",
+				    path, oid_to_hex(expected_oid));
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 786c5c34704..340b0f51f08 100644
--- a/object-store.h
+++ b/object-store.h
@@ -244,6 +244,7 @@ int force_object_loose(const struct object_id *oid, time_t mtime);
 struct object_info;
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -484,7 +485,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 78343781ae7..1cb4b30acd7 100644
--- a/object.c
+++ b/object.c
@@ -262,7 +262,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -273,7 +273,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 214278e134a..c7b084364b7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.31.1.445.g91d8e479b0a


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
@ 2021-03-28  6:12   ` Junio C Hamano
  2021-03-28  7:17     ` Jeff King
  1 sibling, 1 reply; 155+ messages in thread
From: Junio C Hamano @ 2021-03-28  6:12 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Jeff Hostetler, Jonathan Tan, Jeff King

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Add a bug() function that works like error() except the message is
> prefixed with "bug:" instead of "error:".
>
> The reason this is needed is for e.g. the fsck code. If we encounter
> what we'd consider a BUG() in the middle of fsck traversal we'd still
> like to try as hard as possible to go past that object and complete
> the fsck, instead of hard dying. A follow-up commit will introduce
> such a use in object-file.c.

Reading the description above, i.e. "to go past that object", the
assumed use case seems to be to deal with a data error, not a
program bug (which is where we use BUG()---e.g. one helper function
in the fsck code detected that the caller wasn't careful enough to
vet the data it has and called it with incoherent data).  If we find
a tree entry whose mode bits implies that the object recorded in the
entry ought to be a blob, and later find out that the object turns
out to be a tree, that is a corrupt repository and the code that
detected is not buggy (and we shouldn't use BUG(), of course).

So, ... I am skeptical.  If the code is prepared to handle breakage,
we would not want to die, but then I am not sure why it has to be
different from error().

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  6:12   ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Junio C Hamano
@ 2021-03-28  7:17     ` Jeff King
  2021-03-29 13:25       ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 155+ messages in thread
From: Jeff King @ 2021-03-28  7:17 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, git, Jeff Hostetler,
	Jonathan Tan

On Sat, Mar 27, 2021 at 11:12:40PM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
> 
> > Add a bug() function that works like error() except the message is
> > prefixed with "bug:" instead of "error:".
> >
> > The reason this is needed is for e.g. the fsck code. If we encounter
> > what we'd consider a BUG() in the middle of fsck traversal we'd still
> > like to try as hard as possible to go past that object and complete
> > the fsck, instead of hard dying. A follow-up commit will introduce
> > such a use in object-file.c.
> 
> Reading the description above, i.e. "to go past that object", the
> assumed use case seems to be to deal with a data error, not a
> program bug (which is where we use BUG()---e.g. one helper function
> in the fsck code detected that the caller wasn't careful enough to
> vet the data it has and called it with incoherent data).  If we find
> a tree entry whose mode bits implies that the object recorded in the
> entry ought to be a blob, and later find out that the object turns
> out to be a tree, that is a corrupt repository and the code that
> detected is not buggy (and we shouldn't use BUG(), of course).
> 
> So, ... I am skeptical.  If the code is prepared to handle breakage,
> we would not want to die, but then I am not sure why it has to be
> different from error().

Yeah, this seems like it is missing the point of BUG() completely.  I
took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
like it should really be an error() return or similar. The root cause
would be open_istream() on a loose object failing (which might be
corruption, or might even be a transient OS error!).

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH 4/5] fsck: improve the error on invalid object types
  2021-03-28  2:58     ` [PATCH 4/5] fsck: improve the error " Ævar Arnfjörð Bjarmason
@ 2021-03-28  8:56       ` Johannes Sixt
  0 siblings, 0 replies; 155+ messages in thread
From: Johannes Sixt @ 2021-03-28  8:56 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Jeff King

A note on the subject line: we are all pretty sure that you want to
improve the code, not disimprove it. Therefore, the word "improve"
carries not meaning and only takes away space that could be used better.
Perhaps

   fsck: report invalid types recorded in objects

or something. Ditto for 5/5.

-- Hannes

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH 2/4] api docs: document BUG() in api-error-handling.txt
  2021-03-28  2:26 ` [PATCH 2/4] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-03-29  5:37   ` Bagas Sanjaya
  0 siblings, 0 replies; 155+ messages in thread
From: Bagas Sanjaya @ 2021-03-29  5:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King, git

On 28/03/21 09.26, Ævar Arnfjörð Bjarmason wrote:
> -`die`, `usage`, `error`, and `warning` report errors of various
> -kinds.
> +`BUG`, `die`, `usage`, `error`, and `warning` report errors of
> +various kinds.
> +
> +- `BUG` is for failed internal assertions that should never happen,
> +  i.e. a bug in git itself.
>   
>   - `die` is for fatal application errors.  It prints a message to
>     the user and exits with status 128.
> 
Documentation looks OK.

-- 
An old man doll... just what I always wanted! - Clara

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-28  7:17     ` Jeff King
@ 2021-03-29 13:25       ` Ævar Arnfjörð Bjarmason
  2021-03-31 11:06         ` Jeff King
  0 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-03-29 13:25 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Jeff Hostetler, Jonathan Tan


On Sun, Mar 28 2021, Jeff King wrote:

> On Sat, Mar 27, 2021 at 11:12:40PM -0700, Junio C Hamano wrote:
>
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>> 
>> > Add a bug() function that works like error() except the message is
>> > prefixed with "bug:" instead of "error:".
>> >
>> > The reason this is needed is for e.g. the fsck code. If we encounter
>> > what we'd consider a BUG() in the middle of fsck traversal we'd still
>> > like to try as hard as possible to go past that object and complete
>> > the fsck, instead of hard dying. A follow-up commit will introduce
>> > such a use in object-file.c.
>> 
>> Reading the description above, i.e. "to go past that object", the
>> assumed use case seems to be to deal with a data error, not a
>> program bug (which is where we use BUG()---e.g. one helper function
>> in the fsck code detected that the caller wasn't careful enough to
>> vet the data it has and called it with incoherent data).  If we find
>> a tree entry whose mode bits implies that the object recorded in the
>> entry ought to be a blob, and later find out that the object turns
>> out to be a tree, that is a corrupt repository and the code that
>> detected is not buggy (and we shouldn't use BUG(), of course).
>> 
>> So, ... I am skeptical.  If the code is prepared to handle breakage,
>> we would not want to die, but then I am not sure why it has to be
>> different from error().
>
> Yeah, this seems like it is missing the point of BUG() completely.  I
> took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> like it should really be an error() return or similar. The root cause
> would be open_istream() on a loose object failing (which might be
> corruption, or might even be a transient OS error!).

I don't feel strongly about this bug() thing, I'll drop it if you two
don't like it.

But that's not why I added it, yes you can now carefully read the code
and reason that this code is unreachable now, as I think it is.

But it may not stay that way, refactoring how we handle I/O errors
etc. further down the stack is the sort of thing that if this bug()
wasn't there would cause us to otherwise silently lose the
error. I.e. does check_object_signature() always promise to return
non-zero *only* if the signature isn't OK?

So maybe we are happy to just make that promise, in which case yes, this
should/could be an error() in this case.

But isn't this also useful for multi-threaded code? E.g. let's say fsck
learns to map-reduce its fsck-ing of objects across threads. One of them
encounters a BUG(). Do we want to hard kill the whole thing or try to
limp ahead and report partial results from the other thread(s)?

We have than now with pack-objects/grep, but I'm struggling to find a
use-case for a partial grep result if e.g. PCRE fails with a BUG(...)
...


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG()
  2021-03-29 13:25       ` Ævar Arnfjörð Bjarmason
@ 2021-03-31 11:06         ` Jeff King
  0 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-03-31 11:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Junio C Hamano, git, Jeff Hostetler, Jonathan Tan

On Mon, Mar 29, 2021 at 03:25:09PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, this seems like it is missing the point of BUG() completely.  I
> > took a peek at patch 5/5 of the follow-on, which uses bug(). It looks
> > like it should really be an error() return or similar. The root cause
> > would be open_istream() on a loose object failing (which might be
> > corruption, or might even be a transient OS error!).
> 
> I don't feel strongly about this bug() thing, I'll drop it if you two
> don't like it.
> 
> But that's not why I added it, yes you can now carefully read the code
> and reason that this code is unreachable now, as I think it is.
> 
> But it may not stay that way, refactoring how we handle I/O errors
> etc. further down the stack is the sort of thing that if this bug()
> wasn't there would cause us to otherwise silently lose the
> error. I.e. does check_object_signature() always promise to return
> non-zero *only* if the signature isn't OK?
> 
> So maybe we are happy to just make that promise, in which case yes, this
> should/could be an error() in this case.

I didn't dig into what check_object_signature() promises, but I don't
think it matters for my argument. If the case you are looking at can be
triggered by bad on-disk data, transient OS errors, etc, then it should
be an error() or a die(), or whatever is appropriate for the code. If it
is meant to be an invariant of the code that it should never trigger,
then it should be a BUG(), so that we loudly inform people that the
code's assumption has been violated.

But I do not see any point in a bug() that does not abort(). The point
of BUG() is that nobody is supposed to see it, and we should be as loud
as possible if we do.

And if there is a call site that is in doubt about what it may be fed,
then it should just be an error() or die().

> But isn't this also useful for multi-threaded code? E.g. let's say fsck
> learns to map-reduce its fsck-ing of objects across threads. One of them
> encounters a BUG(). Do we want to hard kill the whole thing or try to
> limp ahead and report partial results from the other thread(s)?

Yes, we want to hard kill. The point of BUG() is that it is not supposed
to happen, and there is no point in limping further.

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event
  2021-03-28  2:25 [PATCH 0/4] usage.c: add a non-fatal bug() + misc doc fixes Ævar Arnfjörð Bjarmason
                   ` (3 preceding siblings ...)
  2021-03-28  2:26 ` [PATCH 4/4] usage.c: add a non-fatal bug() function to go with BUG() Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08 ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
                     ` (3 more replies)
  4 siblings, 4 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

A trivial update to the trace2 docs to fix an omission
with "BUG()" not being listed alongside error(), die() etc.

v1 of this[1] added a non-fatal-but-logging bug() function, per the
discussion on v1 that's now gone.

1. http://lore.kernel.org/git/cover-0.6-00000000000-20210328T025618Z-avarab@gmail.com

Ævar Arnfjörð Bjarmason (3):
  usage.c: don't copy/paste the same comment three times
  api docs: document BUG() in api-error-handling.txt
  api docs: document that BUG() emits a trace2 error event

 Documentation/technical/api-error-handling.txt | 10 ++++++++--
 Documentation/technical/api-trace2.txt         |  2 +-
 usage.c                                        | 17 +++++------------
 3 files changed, 14 insertions(+), 15 deletions(-)

Range-diff against v1:
1:  a7b329c21cf ! 1:  2e4665b625b usage.c: don't copy/paste the same comment three times
    @@ Metadata
      ## Commit message ##
         usage.c: don't copy/paste the same comment three times
     
    -    In gee4512ed481 (trace2: create new combined trace facility,
    +    In ee4512ed481 (trace2: create new combined trace facility,
         2019-02-22) we started with two copies of this comment,
         0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
         a third. Let's instead add an earlier comment that applies to all
2:  8c8b1dfd184 = 2:  ce78c79c9ac api docs: document BUG() in api-error-handling.txt
3:  f0e0d0daa6e = 3:  982f72345f1 api docs: document that BUG() emits a trace2 error event
4:  515d146cac8 < -:  ----------- usage.c: add a non-fatal bug() function to go with BUG()
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08   ` Ævar Arnfjörð Bjarmason
  2021-04-15 10:09     ` Jeff King
  2021-04-13  9:08   ` [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

In ee4512ed481 (trace2: create new combined trace facility,
2019-02-22) we started with two copies of this comment,
0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
a third. Let's instead add an earlier comment that applies to all
these mostly-the-same functions.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 usage.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/usage.c b/usage.c
index 1b206de36d6..c7d233b0de9 100644
--- a/usage.c
+++ b/usage.c
@@ -55,12 +55,13 @@ static NORETURN void usage_builtin(const char *err, va_list params)
 	exit(129);
 }
 
+/*
+ * We call trace2_cmd_error_va() in the below functions first and
+ * expect it to va_copy 'params' before using it (because an 'ap' can
+ * only be walked once).
+ */
 static NORETURN void die_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("fatal: ", err, params);
@@ -70,10 +71,6 @@ static NORETURN void die_builtin(const char *err, va_list params)
 
 static void error_builtin(const char *err, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(err, params);
 
 	vreportf("error: ", err, params);
@@ -81,10 +78,6 @@ static void error_builtin(const char *err, va_list params)
 
 static void warn_builtin(const char *warn, va_list params)
 {
-	/*
-	 * We call this trace2 function first and expect it to va_copy 'params'
-	 * before using it (because an 'ap' can only be walked once).
-	 */
 	trace2_cmd_error_va(warn, params);
 
 	vreportf("warning: ", warn, params);
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08   ` Ævar Arnfjörð Bjarmason
  2021-04-15 10:00     ` Jeff King
  2021-04-13  9:08   ` [PATCH v2 3/3] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
  2021-04-15 10:10   ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Jeff King
  3 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

When the BUG() function was added in d8193743e08 (usage.c: add BUG()
function, 2017-05-12) these docs added in 1f23cfe0ef5 (doc: document
error handling functions and conventions, 2014-12-03) were not
updated. Let's do that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index ceeedd485c9..71486abb2f0 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -1,8 +1,11 @@
 Error reporting in git
 ======================
 
-`die`, `usage`, `error`, and `warning` report errors of various
-kinds.
+`BUG`, `die`, `usage`, `error`, and `warning` report errors of
+various kinds.
+
+- `BUG` is for failed internal assertions that should never happen,
+  i.e. a bug in git itself.
 
 - `die` is for fatal application errors.  It prints a message to
   the user and exits with status 128.
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 3/3] api docs: document that BUG() emits a trace2 error event
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
  2021-04-13  9:08   ` [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:08   ` Ævar Arnfjörð Bjarmason
  2021-04-15 10:10   ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Jeff King
  3 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:08 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff Hostetler, Jonathan Tan, Jeff King,
	Eric Sunshine, Bagas Sanjaya,
	Ævar Arnfjörð Bjarmason

Correct documentation added in e544221d97a (trace2:
Documentation/technical/api-trace2.txt, 2019-02-22) to state that
calling BUG() also emits an "error" event. See ee4512ed481 (trace2:
create new combined trace facility, 2019-02-22) for the initial
implementation.

The BUG() function did not emit an event then however, that was only
changed later in 0a9dde4a04c (usage: trace2 BUG() invocations,
2021-02-05), that commit changed the code, but didn't update any of
the docs.

Let's also add a cross-reference from api-error-handling.txt.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Documentation/technical/api-error-handling.txt | 3 +++
 Documentation/technical/api-trace2.txt         | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
index 71486abb2f0..8be4f4d0d6a 100644
--- a/Documentation/technical/api-error-handling.txt
+++ b/Documentation/technical/api-error-handling.txt
@@ -23,6 +23,9 @@ various kinds.
   without running into too many problems.  Like `error`, it
   returns -1 after reporting the situation to the caller.
 
+These reports will be logged via the trace2 facility. See the "error"
+event in link:api-trace2.txt[trace2 API].
+
 Customizable error handlers
 ---------------------------
 
diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt
index c65ffafc485..3f52f981a2d 100644
--- a/Documentation/technical/api-trace2.txt
+++ b/Documentation/technical/api-trace2.txt
@@ -465,7 +465,7 @@ completed.)
 ------------
 
 `"error"`::
-	This event is emitted when one of the `error()`, `die()`,
+	This event is emitted when one of the `BUG()`, `error()`, `die()`,
 	`warning()`, or `usage()` functions are called.
 +
 ------------
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 0/6] fsck: better "invalid object" error reporting
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (4 preceding siblings ...)
  2021-03-28  2:58     ` [PATCH 5/5] fsck: improve error on loose object hash mismatch Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43     ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 1/6] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                         ` (5 more replies)
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  6 siblings, 6 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

A re-roll of improved error reporting for fsck-ing bad loose
objects. See [1] for v1.

This is no longer based on the series to add a bug() function, since
as noted in the re-roll of that[2] that function is gone. This version
uses a plain BUG() for that condition.

Other than that the only change is improved commit messages, and I
added a trivial patch to move read_loose_object() around in
object-store.h so I wouldn't need a forward declaration, and updated
the comment for that function.

1. https://lore.kernel.org/git/cover-0.6-00000000000-20210328T025618Z-avarab@gmail.com/
2. https://lore.kernel.org/git/cover-0.3-00000000000-20210413T090603Z-avarab@gmail.com


Ævar Arnfjörð Bjarmason (6):
  cache.h: move object functions to object-store.h
  fsck tests: refactor one test to use a sub-repo
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/cat-file.c    |  7 +++--
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 28 +++++++++++++++---
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 +-
 cache.h               | 10 -------
 object-file.c         | 66 +++++++++++++++++++++----------------------
 object-store.h        | 39 ++++++++++++++++---------
 object.c              |  4 +--
 pack-check.c          |  3 +-
 streaming.c           |  5 +++-
 t/t1450-fsck.sh       | 64 ++++++++++++++++++++++++++++++-----------
 12 files changed, 146 insertions(+), 87 deletions(-)

Range-diff against v1:
1:  f8f00db8d31 = 1:  37c323a2410 cache.h: move object functions to object-store.h
2:  3e547289408 = 2:  5a2cd6cca9c fsck tests: refactor one test to use a sub-repo
3:  74654a01ba3 = 3:  d0d9cb33315 fsck: don't hard die on invalid object types
-:  ----------- > 4:  81fffefcf99 object-store.h: move read_loose_object() below 'struct object_info'
4:  d23fb5cd039 ! 5:  5fb6ac4faee fsck: improve the error on invalid object types
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck: improve the error on invalid object types
    +    fsck: report invalid types recorded in objects
     
         Continue the work in the preceding commit and improve the error on:
     
    @@ object-file.c: int read_loose_object(const char *path,
      			free(*contents);
     
      ## object-store.h ##
    -@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
    +@@ object-store.h: int oid_object_info_extended(struct repository *r,
    + 
    + /*
    +  * Open the loose object at path, check its hash, and return the contents,
    ++ * use the "oi" argument to assert things about the object, or e.g. populate its
    +  * type, and size. If the object is a blob, then "contents" may return NULL,
    +  * to allow streaming of large blobs.
       *
    -  * Returns 0 on success, negative on error (details may be written to stderr).
    +@@ object-store.h: int oid_object_info_extended(struct repository *r,
       */
    -+struct object_info;
      int read_loose_object(const char *path,
      		      const struct object_id *expected_oid,
     -		      enum object_type *type,
    @@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime
     +		      struct object_info *oi,
      		      unsigned int oi_flags);
      
    - /* Retry packed storage after checking packed and loose storage */
    + /*
     
      ## t/t1450-fsck.sh ##
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
5:  bcec536b0f6 ! 6:  226d2031bcf fsck: improve error on loose object hash mismatch
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    +    fsck: report invalid object type-path combinations
    +
         fsck: improve error on loose object hash mismatch
     
         Improve the error that's emitted in cases where we find a loose object
    @@ object-file.c: int read_loose_object(const char *path,
     -			      oid_to_hex(expected_oid));
     +					   *contents, *size, oi->type_name->buf, real_oid)) {
     +			if (oideq(real_oid, &null_oid))
    -+				/*
    -+				 * Not a plain BUG() because if it
    -+				 * does happen we're in the middle of
    -+				 * an fsck we'd like to see to the
    -+				 * end.
    -+				 */
    -+				bug("BUG trying to compute hash for object at %s (expected %s)",
    -+				    path, oid_to_hex(expected_oid));
    ++				BUG("should only get OID mismatch errors with mapped contents");
      			free(*contents);
      			goto out;
      		}
     
      ## object-store.h ##
    -@@ object-store.h: int force_object_loose(const struct object_id *oid, time_t mtime);
    - struct object_info;
    +@@ object-store.h: int oid_object_info_extended(struct repository *r,
    +  */
      int read_loose_object(const char *path,
      		      const struct object_id *expected_oid,
     +		      struct object_id *real_oid,
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 1/6] cache.h: move object functions to object-store.h
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 2/6] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  5 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 148d9ab5f18..f2e7fc615ba 100644
--- a/cache.h
+++ b/cache.h
@@ -1279,16 +1279,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 2/6] fsck tests: refactor one test to use a sub-repo
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 1/6] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 3/6] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  5 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose, and makes things easier to debug as a
failing test can have their state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test, I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..1563b35f88c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 3/6] fsck: don't hard die on invalid object types
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 1/6] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 2/6] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-23 14:26         ` Jeff King
  2021-04-13  9:43       ` [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Change builtin/fsck.c to pass down a
OBJECT_INFO_ALLOW_UNKNOWN_TYPE. This changes this very ungraceful
error:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>
    $ git fsck
    fatal: invalid object type
    $

Into:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

But why are we complaining about a "hash mismatch" for an object of a
type we don't know about? We shouldn't. This is the bare minimal
change needed to not make fsck hard die on a repository that's been
corrupted in this manner. In subsequent commits we'll teach fsck to
recognize this particular type of corruption and emit a better error
message.

The parse_loose_header() function being changed here is only used in
builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for its introduction.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 24 ++++++++++--------------
 object-store.h  |  6 ++++--
 streaming.c     |  5 ++++-
 t/t1450-fsck.sh |  9 +++++++++
 5 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 70ff95837ae..686f7cdfea0 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,7 +600,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 624af408cdc..26560a6281c 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1294,8 +1294,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1355,14 +1356,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1417,10 +1410,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2497,13 +2490,16 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2518,7 +2514,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, oi_flags);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..ab86c8bf32c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
@@ -480,7 +481,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 800f07a52cc..e5d4dd2f654 100644
--- a/streaming.c
+++ b/streaming.c
@@ -341,6 +341,9 @@ static struct stream_vtbl loose_vtbl = {
 
 static open_method_decl(loose)
 {
+	struct object_info oi2 = OBJECT_INFO_INIT;
+	oi2.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -349,7 +352,7 @@ static open_method_decl(loose)
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi2, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88c..025dd1b491a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,13 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck error and recovery on invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
+'
+
 test_done
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info'
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (2 preceding siblings ...)
  2021-04-13  9:43       ` [PATCH v2 3/6] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-23 14:27         ` Jeff King
  2021-04-13  9:43       ` [PATCH v2 5/6] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
  2021-04-13  9:43       ` [PATCH v2 6/6] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the definition of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it, moving it will avoid a forward declaration of the
struct.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index ab86c8bf32c..4680dc68ee4 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (3 preceding siblings ...)
  2021-04-13  9:43       ` [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  2021-04-23 14:37         ` Jeff King
  2021-04-13  9:43       ` [PATCH v2 6/6] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  5 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall though from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/cat-file.c |  7 +++++--
 builtin/fsck.c     | 22 ++++++++++++++++++----
 object-file.c      | 31 +++++++++++++++----------------
 object-store.h     |  4 ++--
 t/t1450-fsck.sh    | 23 ++++++++++++++++++++++-
 5 files changed, 62 insertions(+), 25 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 5ebf13359e8..1063576a982 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -74,6 +74,7 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	struct strbuf sb = STRBUF_INIT;
 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
 	const char *path = force_path;
+	int ret;
 
 	if (unknown_type)
 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
@@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 	switch (opt) {
 	case 't':
 		oi.type_name = &sb;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		if (sb.len) {
 			printf("%s\n", sb.buf);
@@ -103,7 +105,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
 
 	case 's':
 		oi.sizep = &size;
-		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
+		if (!unknown_type && ret < 0)
 			die("git cat-file: could not get object info");
 		printf("%"PRIuMAX"\n", (uintmax_t)size);
 		return 0;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 686f7cdfea0..878191e53ca 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -599,12 +599,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 26560a6281c..e744a06637b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1323,9 +1323,7 @@ int parse_loose_header(const char *hdr,
 	 * we're obtaining the type using '--allow-unknown-type'
 	 * option.
 	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
 		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
@@ -1407,14 +1405,17 @@ static int loose_object_info(struct repository *r,
 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
+	if (status < 0) {
 		; /* Do nothing */
-	else if (hdrbuf.len) {
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	} else {
+		status = parse_loose_header(hdr, oi, flags);
+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
+			error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -2488,9 +2489,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2498,8 +2498,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2514,9 +2514,9 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, oi_flags);
-	if (*type < 0) {
-		error(_("unable to parse header of %s"), path);
+	*type = parse_loose_header(hdr, oi, oi_flags);
+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
+		error(_("unable to parse header %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
@@ -2532,8 +2532,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 4680dc68ee4..3d88b8a7cd3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 025dd1b491a..214278e134a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -868,7 +887,9 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
+	grep error: err >err.errors &&
+	test_line_count = 1 err.errors &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v2 6/6] fsck: report invalid object type-path combinations
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (4 preceding siblings ...)
  2021-04-13  9:43       ` [PATCH v2 5/6] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-04-13  9:43       ` Ævar Arnfjörð Bjarmason
  5 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-13  9:43 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

fsck: improve error on loose object hash mismatch

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently, in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse
misreport whatever issue we encountered. So let's use the new bug()
function to ferry and return code up to fsck_loose() in that case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1450-fsck.sh       |  8 +++++---
 9 files changed, 37 insertions(+), 23 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 85a76e0ef8b..bf0e266d83a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 878191e53ca..7713d992960 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -602,20 +602,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = null_oid;
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 15507b5cff0..d5fd81ebf39 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e744a06637b..ca54e76fda2 100644
--- a/object-file.c
+++ b/object-file.c
@@ -993,9 +993,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1003,8 +1005,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1029,9 +1031,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_fn(real_oid.hash, &c);
+	r->hash_algo->final_fn(real_oid->hash, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2489,6 +2491,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2532,9 +2535,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, &null_oid))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 3d88b8a7cd3..c9c4d211de3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -484,7 +485,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 78343781ae7..1cb4b30acd7 100644
--- a/object.c
+++ b/object.c
@@ -262,7 +262,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -273,7 +273,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 214278e134a..c7b084364b7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.31.1.645.g989d83ea6a6


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt
  2021-04-13  9:08   ` [PATCH v2 2/3] api docs: document BUG() in api-error-handling.txt Ævar Arnfjörð Bjarmason
@ 2021-04-15 10:00     ` Jeff King
  0 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-04-15 10:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Eric Sunshine,
	Bagas Sanjaya

On Tue, Apr 13, 2021 at 11:08:20AM +0200, Ævar Arnfjörð Bjarmason wrote:

> When the BUG() function was added in d8193743e08 (usage.c: add BUG()
> function, 2017-05-12) these docs added in 1f23cfe0ef5 (doc: document
> error handling functions and conventions, 2014-12-03) were not
> updated. Let's do that.

Wow, I had no idea this file even existed (most of the time I looked at
technical/api-* the contents were along the lines of "somebody should
write this").

IMHO this is more evidence that this stuff should just go into header
files, where people are more likely to see and update it.

> diff --git a/Documentation/technical/api-error-handling.txt b/Documentation/technical/api-error-handling.txt
> index ceeedd485c9..71486abb2f0 100644
> --- a/Documentation/technical/api-error-handling.txt
> +++ b/Documentation/technical/api-error-handling.txt
> @@ -1,8 +1,11 @@
>  Error reporting in git
>  ======================
>  
> -`die`, `usage`, `error`, and `warning` report errors of various
> -kinds.
> +`BUG`, `die`, `usage`, `error`, and `warning` report errors of
> +various kinds.
> +
> +- `BUG` is for failed internal assertions that should never happen,
> +  i.e. a bug in git itself.

Your change looks obviously correct, of course.

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times
  2021-04-13  9:08   ` [PATCH v2 1/3] usage.c: don't copy/paste the same comment three times Ævar Arnfjörð Bjarmason
@ 2021-04-15 10:09     ` Jeff King
  0 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-04-15 10:09 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Eric Sunshine,
	Bagas Sanjaya

On Tue, Apr 13, 2021 at 11:08:19AM +0200, Ævar Arnfjörð Bjarmason wrote:

> In ee4512ed481 (trace2: create new combined trace facility,
> 2019-02-22) we started with two copies of this comment,
> 0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
> a third. Let's instead add an earlier comment that applies to all
> these mostly-the-same functions.

I'm sometimes wary of this aggregating comments like this, because
somebody who is just reading the third function may not think to look
further up to find the comment.

But this comment in particular does not seem dangerous if somebody
misses it (unlike comments that are warning people about bad things
happening).

> + */
>  static NORETURN void die_builtin(const char *err, va_list params)
>  {
> -	/*
> -	 * We call this trace2 function first and expect it to va_copy 'params'
> -	 * before using it (because an 'ap' can only be walked once).
> -	 */
>  	trace2_cmd_error_va(err, params);
>  
>  	vreportf("fatal: ", err, params);

TBH, I am not sure it adds all that much value in the first place. It is
only telling the reader that the code is not broken. But I kind of
wonder if it should simply be doing the defensive thing anyway:

  va_list cp;
  va_copy(cp, params);
  trace2_cmd_error_va(err, params);
  va_end(cp);

We are relying on a subtle contract with the trace2 code, and there is
no compile-time check that it will be upheld (and indeed, on many
platforms we might not even notice if it isn't, depending on how va_list
is implemented). Looking at the trace2 code, we do not even enforce this
centrally. It is up to each target to remember to va_copy()!

Anyway, that is somewhat outside the scope of your series (though I
would not be sad to see this comment go away entirely in favor of the
more defensive code above).

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event
  2021-04-13  9:08 ` [PATCH v2 0/3] trace2 docs: note that BUG() sends an "error" event Ævar Arnfjörð Bjarmason
                     ` (2 preceding siblings ...)
  2021-04-13  9:08   ` [PATCH v2 3/3] api docs: document that BUG() emits a trace2 error event Ævar Arnfjörð Bjarmason
@ 2021-04-15 10:10   ` Jeff King
  3 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-04-15 10:10 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff Hostetler, Jonathan Tan, Eric Sunshine,
	Bagas Sanjaya

On Tue, Apr 13, 2021 at 11:08:18AM +0200, Ævar Arnfjörð Bjarmason wrote:

> A trivial update to the trace2 docs to fix an omission
> with "BUG()" not being listed alongside error(), die() etc.
> 
> v1 of this[1] added a non-fatal-but-logging bug() function, per the
> discussion on v1 that's now gone.

All three look reasonable to me. I left a few comments, but you may want
to just ignore them. ;)

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 3/6] fsck: don't hard die on invalid object types
  2021-04-13  9:43       ` [PATCH v2 3/6] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-04-23 14:26         ` Jeff King
  0 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-04-23 14:26 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Tue, Apr 13, 2021 at 11:43:06AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Change builtin/fsck.c to pass down a
> OBJECT_INFO_ALLOW_UNKNOWN_TYPE. This changes this very ungraceful
> error:
> 
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     <OID>
>     $ git fsck
>     fatal: invalid object type
>     $
> 
> Into:
> 
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ the rest of the fsck output here, i.e. it didn't hard die ]
> 
> We'll still exit with non-zero, but now we'll finish the rest of the
> traversal. The tests that's being added here asserts that we'll still
> complain about other fsck issues (e.g. an unrelated dangling blob).
> 
> But why are we complaining about a "hash mismatch" for an object of a
> type we don't know about? We shouldn't. This is the bare minimal
> change needed to not make fsck hard die on a repository that's been
> corrupted in this manner. In subsequent commits we'll teach fsck to
> recognize this particular type of corruption and emit a better error
> message.

OK. The overall goal makes sense.

> The parse_loose_header() function being changed here is only used in
> builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
> function, 2017-01-13) for its introduction.

This left me scratching my head for a long time. Did you mean
read_loose_object() in the beginning of the sentence?

> -static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
> -				       unsigned int flags)
> +int parse_loose_header(const char *hdr,
> +		       struct object_info *oi,
> +		       unsigned int flags)

So we are getting rid of the "extended" form and just making the
non-extended way take an OI. That seems kind of orthogonal...

> --- a/streaming.c
> +++ b/streaming.c
> @@ -341,6 +341,9 @@ static struct stream_vtbl loose_vtbl = {
>  
>  static open_method_decl(loose)
>  {
> +	struct object_info oi2 = OBJECT_INFO_INIT;
> +	oi2.sizep = &st->size;
> +

...and is what leads us to having to touch this otherwise unrelated
function.

I don't mind _too_ much getting rid of a helper function that would have
only one caller remaining (though "oi2" is a bit mysterious here). But
it seems like the patch would have been a lot easier to understand if
that were separately done (and explained). AFAICT the functional change
here is just passing the flag to read_loose_object(), which could be
calling parse_loose_header_extended(). I guess that would have to become
public, but that seems reasonable.

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info'
  2021-04-13  9:43       ` [PATCH v2 4/6] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-04-23 14:27         ` Jeff King
  0 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-04-23 14:27 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Tue, Apr 13, 2021 at 11:43:07AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Move the definition of read_loose_object() below "struct
> object_info". In the next commit we'll add a "struct object_info *"
> parameter to it, moving it will avoid a forward declaration of the
> struct.

This is a declaration, not a definition, no?

Not a huge deal, I just expected to see the function body moving when I
read the patch, but didn't.

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-13  9:43       ` [PATCH v2 5/6] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-04-23 14:37         ` Jeff King
  2021-04-26 14:28           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 155+ messages in thread
From: Jeff King @ 2021-04-23 14:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Tue, Apr 13, 2021 at 11:43:08AM +0200, Ævar Arnfjörð Bjarmason wrote:

> Continue the work in the preceding commit and improve the error on:
> 
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ other fsck output ]
> 
> To instead emit:
> 
>     $ git fsck
>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>     [ other fsck output ]

Sounds good.

> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>  	switch (opt) {
>  	case 't':
>  		oi.type_name = &sb;
> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
> +		if (!unknown_type && ret < 0)
>  			die("git cat-file: could not get object info");
>  		if (sb.len) {
>  			printf("%s\n", sb.buf);

Surprised to see changes to cat-file here, since the commit message is
all about fsck. Did the semantics of oid_object_info_extended() change?
I.e., this hunk implies to me that it is now returning -1 when we said
unknown types were OK, and we got one. But in that case, how do we
distinguish that from a real error?

Or more concretely, this patch causes this:

  $ git cat-file -t 1234567890123456789012345678901234567890
  fatal: git cat-file: could not get object info

  $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
  fatal: git cat-file 1234567890123456789012345678901234567890: bad file

Or much worse, from the next hunk:

  $ git cat-file -s 1234567890123456789012345678901234567890
  fatal: git cat-file: could not get object info

  $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
  140732113568960

That seems wrong (so I think my "this hunk implies" is not true, but
then I am left with: what is the point of this hunk?).

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-23 14:37         ` Jeff King
@ 2021-04-26 14:28           ` Ævar Arnfjörð Bjarmason
  2021-04-26 15:45             ` Jeff King
  0 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-04-26 14:28 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Junio C Hamano, Johannes Sixt


On Fri, Apr 23 2021, Jeff King wrote:

> On Tue, Apr 13, 2021 at 11:43:08AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> Continue the work in the preceding commit and improve the error on:
>> 
>>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>>     $ git fsck
>>     error: hash mismatch for <OID_PATH> (expected <OID>)
>>     error: <OID>: object corrupt or missing: <OID_PATH>
>>     [ other fsck output ]
>> 
>> To instead emit:
>> 
>>     $ git fsck
>>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>>     [ other fsck output ]
>
> Sounds good.
>
>> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
>>  	switch (opt) {
>>  	case 't':
>>  		oi.type_name = &sb;
>> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
>> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
>> +		if (!unknown_type && ret < 0)
>>  			die("git cat-file: could not get object info");
>>  		if (sb.len) {
>>  			printf("%s\n", sb.buf);
>
> Surprised to see changes to cat-file here, since the commit message is
> all about fsck. Did the semantics of oid_object_info_extended() change?
> I.e., this hunk implies to me that it is now returning -1 when we said
> unknown types were OK, and we got one. But in that case, how do we
> distinguish that from a real error?
>
> Or more concretely, this patch causes this:
>
>   $ git cat-file -t 1234567890123456789012345678901234567890
>   fatal: git cat-file: could not get object info
>
>   $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
>   fatal: git cat-file 1234567890123456789012345678901234567890: bad file
>
> Or much worse, from the next hunk:
>
>   $ git cat-file -s 1234567890123456789012345678901234567890
>   fatal: git cat-file: could not get object info
>
>   $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
>   140732113568960
>
> That seems wrong (so I think my "this hunk implies" is not true, but
> then I am left with: what is the point of this hunk?).

That's very well spotted.

I started re-rolling this today but ran out of time. For what it's worth
the combination of this and 6/6 "makes sense" in the sense that all
tests pass at the end of this series.

But the cases you're pointing out are ones we don't have tests for,
i.e. the combination of "allow unknown" and a non-existing object, as
opposed to a garbage one.

Hence the bug with passing up an invalid (uninitialized) size in that
case. It's fallout from other partial lib-ification changes of these
APIs, i.e. making them return bad values upstream instead of dying right
away.

I'll sort that out in some sensible way. Starting with adding meaningful
test coverage for the existing behavior.


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v2 5/6] fsck: report invalid types recorded in objects
  2021-04-26 14:28           ` Ævar Arnfjörð Bjarmason
@ 2021-04-26 15:45             ` Jeff King
  0 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-04-26 15:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git, Junio C Hamano, Johannes Sixt

On Mon, Apr 26, 2021 at 04:28:30PM +0200, Ævar Arnfjörð Bjarmason wrote:

> >> @@ -92,7 +93,8 @@ static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
> >>  	switch (opt) {
> >>  	case 't':
> >>  		oi.type_name = &sb;
> >> -		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
> >> +		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
> >> +		if (!unknown_type && ret < 0)
> >>  			die("git cat-file: could not get object info");
> >>  		if (sb.len) {
> >>  			printf("%s\n", sb.buf);
> >
> > Surprised to see changes to cat-file here, since the commit message is
> > all about fsck. Did the semantics of oid_object_info_extended() change?
> > I.e., this hunk implies to me that it is now returning -1 when we said
> > unknown types were OK, and we got one. But in that case, how do we
> > distinguish that from a real error?
> >
> > Or more concretely, this patch causes this:
> >
> >   $ git cat-file -t 1234567890123456789012345678901234567890
> >   fatal: git cat-file: could not get object info
> >
> >   $ git.compile cat-file --allow-unknown-type -t 1234567890123456789012345678901234567890
> >   fatal: git cat-file 1234567890123456789012345678901234567890: bad file
> >
> > Or much worse, from the next hunk:
> >
> >   $ git cat-file -s 1234567890123456789012345678901234567890
> >   fatal: git cat-file: could not get object info
> >
> >   $ git cat-file --allow-unknown-type -s 1234567890123456789012345678901234567890
> >   140732113568960
> >
> > That seems wrong (so I think my "this hunk implies" is not true, but
> > then I am left with: what is the point of this hunk?).
> 
> That's very well spotted.
> 
> I started re-rolling this today but ran out of time. For what it's worth
> the combination of this and 6/6 "makes sense" in the sense that all
> tests pass at the end of this series.
> 
> But the cases you're pointing out are ones we don't have tests for,
> i.e. the combination of "allow unknown" and a non-existing object, as
> opposed to a garbage one.
> 
> Hence the bug with passing up an invalid (uninitialized) size in that
> case. It's fallout from other partial lib-ification changes of these
> APIs, i.e. making them return bad values upstream instead of dying right
> away.

I'm not sure I understand. The problem seems solely in the hunk above.
Before, if we got an error from oid_object_info_extended(), we stopped
immediately. But after, we look at the results even though it told us
there was an error. In general, I would think that a "-1" return value
from oid_object_info_extended() is "all bets are off" (remember that
unlike oid_object_info(), this is a strict error return, and not trying
to force the object type into the return value).

And that's independent of what the other patches in the series are
doing, I think.

> I'll sort that out in some sensible way. Starting with adding meaningful
> test coverage for the existing behavior.

Yeah, that sounds fine. I think the current behavior there is perfectly
reasonable (fail with "could not get object info").

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-03-28  2:58   ` [PATCH 0/5] fsck: improve error reporting Ævar Arnfjörð Bjarmason
                       ` (5 preceding siblings ...)
  2021-04-13  9:43     ` [PATCH v2 0/6] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22     ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                         ` (18 more replies)
  6 siblings, 19 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

A re-roll of v2's 6 patch series[1], has turned into 17 now. Less
scary than one might think though, it's mostly added test coverage +
splitting existing commits into more incremental chunks, but see the
range-diff. This should address all the feedback on v2 + more.

A brief recap summary of what this is about: We now gracefully recover
instead of dying when fsck encounters types that aren't
blob/commit/tree/tag. Those types don't exist in the wild, but you can
manually create them with "git hash-object -t garbage --literally".

So in some senes this matters to nobody, but I'm doing this as part of
general changes I've been pushing to make fsck/gc error reporting more
graceful, and errors more recoverable. We now have a few more places
in object-file.c where we don't just die(), but properly return
API-like return codes/data to the caller instead.

This does not contain any changes to how --allow-unknown-type
hash-object's --literally etc. work, as I suggested we could do in
[2]. Any such changes will need the API changes here, but these are
just the narrow fsck fixes.

1. https://lore.kernel.org/git/cover-0.6-00000000000-20210413T093734Z-avarab@gmail.com
2. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason (17):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  object-file.c: return -1, not "status" from unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 156 ++++++++++++++++++-------------------
 object-store.h         |  60 +++++++++++----
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 +++++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++-----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 387 insertions(+), 135 deletions(-)

Range-diff against v2:
 2:  5a2cd6cca9 =  1:  aa38b2bf9e fsck tests: refactor one test to use a sub-repo
 -:  ---------- >  2:  82b64abd25 fsck tests: add test for fsck-ing an unknown type
 -:  ---------- >  3:  7c3c2fe25d cat-file tests: test for missing object with -t and -s
 -:  ---------- >  4:  871b820003 cat-file tests: test that --allow-unknown-type isn't on by default
 -:  ---------- >  5:  b98da9cc89 rev-list tests: test for behavior with invalid object types
 -:  ---------- >  6:  04cc1d20f6 cat-file tests: add corrupt loose object test
 -:  ---------- >  7:  9217320888 cat-file tests: test for current --allow-unknown-type behavior
 1:  37c323a241 =  8:  12dd453879 cache.h: move object functions to object-store.h
 3:  d0d9cb3331 !  9:  6a5b78dcad fsck: don't hard die on invalid object types
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    fsck: don't hard die on invalid object types
    +    object-file.c: make parse_loose_header_extended() public
     
    -    Change builtin/fsck.c to pass down a
    -    OBJECT_INFO_ALLOW_UNKNOWN_TYPE. This changes this very ungraceful
    -    error:
    +    Make the parse_loose_header_extended() function public and remove the
    +    parse_loose_header() wrapper. The only direct user of it outside of
    +    object-file.c itself was in streaming.c, that caller can simply pass
    +    the required "struct object-info *" instead.
     
    -        $ git hash-object --stdin -w -t garbage --literally </dev/null
    -        <OID>
    -        $ git fsck
    -        fatal: invalid object type
    -        $
    -
    -    Into:
    -
    -        $ git fsck
    -        error: hash mismatch for <OID_PATH> (expected <OID>)
    -        error: <OID>: object corrupt or missing: <OID_PATH>
    -        [ the rest of the fsck output here, i.e. it didn't hard die ]
    -
    -    We'll still exit with non-zero, but now we'll finish the rest of the
    -    traversal. The tests that's being added here asserts that we'll still
    -    complain about other fsck issues (e.g. an unrelated dangling blob).
    -
    -    But why are we complaining about a "hash mismatch" for an object of a
    -    type we don't know about? We shouldn't. This is the bare minimal
    -    change needed to not make fsck hard die on a repository that's been
    -    corrupted in this manner. In subsequent commits we'll teach fsck to
    -    recognize this particular type of corruption and emit a better error
    -    message.
    -
    -    The parse_loose_header() function being changed here is only used in
    -    builtin/fsck.c, see f6371f92104 (sha1_file: add read_loose_object()
    -    function, 2017-01-13) for its introduction.
    +    This change is being done in preparation for teaching
    +    read_loose_object() to accept a flag to pass to
    +    parse_loose_header(). It isn't strictly necessary for that change, we
    +    could simply use parse_loose_header_extended() there, but will leave
    +    the API in a better end state.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## builtin/fsck.c ##
    -@@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
    - 	void *contents;
    - 	int eaten;
    - 
    --	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
    -+	if (read_loose_object(path, oid, &type, &size, &contents,
    -+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
    - 		errors_found |= ERROR_OBJECT;
    - 		error(_("%s: object corrupt or missing: %s"),
    - 		      oid_to_hex(oid), path);
    -
      ## object-file.c ##
     @@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
       * too permissive for what we want to check. So do an anal
    @@ object-file.c: static int loose_object_info(struct repository *r,
      
      	if (status >= 0 && oi->contentp) {
     @@ object-file.c: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents)
    -+		      void **contents,
    -+		      unsigned int oi_flags)
    - {
    - 	int ret = -1;
    - 	void *map = NULL;
      	unsigned long mapsize;
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
    @@ object-file.c: int read_loose_object(const char *path,
      	}
      
     -	*type = parse_loose_header(hdr, size);
    -+	*type = parse_loose_header(hdr, &oi, oi_flags);
    ++	*type = parse_loose_header(hdr, &oi, 0);
      	if (*type < 0) {
      		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
     
      ## object-store.h ##
    -@@ object-store.h: int read_loose_object(const char *path,
    - 		      const struct object_id *expected_oid,
    - 		      enum object_type *type,
    - 		      unsigned long *size,
    --		      void **contents);
    -+		      void **contents,
    -+		      unsigned int oi_flags);
    - 
    - /* Retry packed storage after checking packed and loose storage */
    - #define HAS_OBJECT_RECHECK_PACKED 1
     @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
      int unpack_loose_header(git_zstream *stream, unsigned char *map,
      			unsigned long mapsize, void *buffer,
    @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
      int finalize_object_file(const char *tmpfile, const char *filename);
     
      ## streaming.c ##
    -@@ streaming.c: static struct stream_vtbl loose_vtbl = {
    - 
    - static open_method_decl(loose)
    +@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    + 			      const struct object_id *oid,
    + 			      enum object_type *type)
      {
    -+	struct object_info oi2 = OBJECT_INFO_INIT;
    -+	oi2.sizep = &st->size;
    ++	struct object_info oi = OBJECT_INFO_INIT;
    ++	oi.sizep = &st->size;
     +
      	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
      	if (!st->u.loose.mapped)
      		return -1;
    -@@ streaming.c: static open_method_decl(loose)
    +@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      				 st->u.loose.mapsize,
      				 st->u.loose.hdr,
      				 sizeof(st->u.loose.hdr)) < 0) ||
     -	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
    -+	    (parse_loose_header(st->u.loose.hdr, &oi2, 0) < 0)) {
    ++	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
      		git_inflate_end(&st->z);
      		munmap(st->u.loose.mapped, st->u.loose.mapsize);
      		return -1;
    -
    - ## t/t1450-fsck.sh ##
    -@@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
    - 	test_i18ngrep "bad index file" errors
    - '
    - 
    -+test_expect_success 'fsck error and recovery on invalid object type' '
    -+	test_create_repo garbage-type &&
    -+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
    -+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
    -+	test_must_fail git -C garbage-type fsck >out 2>err &&
    -+	grep "$garbage_blob: object corrupt or missing:" err &&
    -+	grep "dangling blob $empty_blob" out
    -+'
    -+
    - test_done
 -:  ---------- > 10:  5d31d7e1a5 object-file.c: add missing braces to loose_object_info()
 -:  ---------- > 11:  ee28089219 object-file.c: stop dying in parse_loose_header()
 -:  ---------- > 12:  77f2cd439c object-file.c: return -2 on "header too long" in unpack_loose_header()
 -:  ---------- > 13:  d22d5b8b85 object-file.c: return -1, not "status" from unpack_loose_header()
 -:  ---------- > 14:  260e9888a3 fsck: don't hard die on invalid object types
 4:  81fffefcf9 ! 15:  e2afb813b2 object-store.h: move read_loose_object() below 'struct object_info'
    @@ Metadata
      ## Commit message ##
         object-store.h: move read_loose_object() below 'struct object_info'
     
    -    Move the definition of read_loose_object() below "struct
    +    Move the declaration of read_loose_object() below "struct
         object_info". In the next commit we'll add a "struct object_info *"
         parameter to it, moving it will avoid a forward declaration of the
         struct.
 5:  5fb6ac4fae ! 16:  328f05c51b fsck: report invalid types recorded in objects
    @@ Commit message
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    - ## builtin/cat-file.c ##
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 	struct strbuf sb = STRBUF_INIT;
    - 	unsigned flags = OBJECT_INFO_LOOKUP_REPLACE;
    - 	const char *path = force_path;
    -+	int ret;
    - 
    - 	if (unknown_type)
    - 		flags |= OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 	switch (opt) {
    - 	case 't':
    - 		oi.type_name = &sb;
    --		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
    -+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
    -+		if (!unknown_type && ret < 0)
    - 			die("git cat-file: could not get object info");
    - 		if (sb.len) {
    - 			printf("%s\n", sb.buf);
    -@@ builtin/cat-file.c: static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
    - 
    - 	case 's':
    - 		oi.sizep = &size;
    --		if (oid_object_info_extended(the_repository, &oid, &oi, flags) < 0)
    -+		ret = oid_object_info_extended(the_repository, &oid, &oi, flags);
    -+		if (!unknown_type && ret < 0)
    - 			die("git cat-file: could not get object info");
    - 		printf("%"PRIuMAX"\n", (uintmax_t)size);
    - 		return 0;
    -
      ## builtin/fsck.c ##
     @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *path, void *data)
      	unsigned long size;
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      
     
      ## object-file.c ##
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	 * we're obtaining the type using '--allow-unknown-type'
    - 	 * option.
    - 	 */
    --	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
    --		type = 0;
    --	else if (type < 0)
    -+	if (type < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
    - 		die(_("invalid object type"));
    - 	if (oi->typep)
    - 		*oi->typep = type;
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    - 	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
    - 		status = error(_("unable to unpack %s header"),
    - 			       oid_to_hex(oid));
    --	if (status < 0)
    -+	if (status < 0) {
    - 		; /* Do nothing */
    --	else if (hdrbuf.len) {
    -+	} else if (hdrbuf.len) {
    - 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
    - 			status = error(_("unable to parse %s header with --allow-unknown-type"),
    - 				       oid_to_hex(oid));
    --	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
    --		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    -+	} else {
    -+		status = parse_loose_header(hdr, oi, flags);
    -+		if (status < 0 && !(flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE))
    -+			error(_("unable to parse %s header"), oid_to_hex(oid));
    -+	}
    - 
    - 	if (status >= 0 && oi->contentp) {
    - 		*oi->contentp = unpack_loose_rest(&stream, hdr,
     @@ object-file.c: static int check_stream_oid(git_zstream *stream,
      
      int read_loose_object(const char *path,
    @@ object-file.c: int read_loose_object(const char *path,
      	git_zstream stream;
      	char hdr[MAX_HEADER_LEN];
     -	struct object_info oi = OBJECT_INFO_INIT;
    + 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
    +-	oi.typep = type;
     -	oi.sizep = size;
     +	enum object_type *type = oi->typep;
     +	unsigned long *size = oi->sizep;
    @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
      
    --	*type = parse_loose_header(hdr, &oi, oi_flags);
    --	if (*type < 0) {
    --		error(_("unable to parse header of %s"), path);
    -+	*type = parse_loose_header(hdr, oi, oi_flags);
    -+	if (*type < 0 && !(oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
    -+		error(_("unable to parse header %s"), path);
    +-	if (parse_loose_header(hdr, &oi) < 0) {
    ++	if (parse_loose_header(hdr, oi) < 0) {
    + 		error(_("unable to parse header of %s"), path);
      		git_inflate_end(&stream);
      		goto out;
    - 	}
     @@ object-file.c: int read_loose_object(const char *path,
      			goto out;
      		}
    @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
      	test_when_finished "git update-ref -d refs/heads/invalid" &&
     @@ t/t1450-fsck.sh: test_expect_success 'fsck error and recovery on invalid object type' '
    - 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
      	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
      	test_must_fail git -C garbage-type fsck >out 2>err &&
    + 	grep -e "^error" -e "^fatal" err >errors &&
    +-	test_line_count = 2 errors &&
    +-	grep "error: hash mismatch for" err &&
     -	grep "$garbage_blob: object corrupt or missing:" err &&
    ++	test_line_count = 1 errors &&
     +	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
    -+	grep error: err >err.errors &&
    -+	test_line_count = 1 err.errors &&
      	grep "dangling blob $empty_blob" out
      '
      
 6:  226d2031bc ! 17:  c5e6686765 fsck: report invalid object type-path combinations
    @@ Metadata
      ## Commit message ##
         fsck: report invalid object type-path combinations
     
    -    fsck: improve error on loose object hash mismatch
    -
         Improve the error that's emitted in cases where we find a loose object
         we parse, but which isn't at the location we expect it to be.
     
    @@ builtin/fsck.c: static int fsck_loose(const struct object_id *oid, const char *p
      	struct strbuf sb = STRBUF_INIT;
      	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
      	struct object_info oi;
    -+	struct object_id real_oid = null_oid;
    ++	struct object_id real_oid = *null_oid();
      	int found = 0;
      	oi.type_name = &sb;
      	oi.sizep = &size;
    @@ object-file.c: int check_object_signature(struct repository *r, const struct obj
      			break;
      		r->hash_algo->update_fn(&c, buf, readlen);
      	}
    --	r->hash_algo->final_fn(real_oid.hash, &c);
    -+	r->hash_algo->final_fn(real_oid->hash, &c);
    +-	r->hash_algo->final_oid_fn(&real_oid, &c);
    ++	r->hash_algo->final_oid_fn(real_oid, &c);
      	close_istream(st);
     -	return !oideq(oid, &real_oid) ? -1 : 0;
     +	return !oideq(oid, real_oid) ? -1 : 0;
    @@ object-file.c: int read_loose_object(const char *path,
     -			error(_("hash mismatch for %s (expected %s)"), path,
     -			      oid_to_hex(expected_oid));
     +					   *contents, *size, oi->type_name->buf, real_oid)) {
    -+			if (oideq(real_oid, &null_oid))
    ++			if (oideq(real_oid, null_oid()))
     +				BUG("should only get OID mismatch errors with mapped contents");
      			free(*contents);
      			goto out;
    @@ object-store.h: int oid_object_info_extended(struct repository *r,
      		      struct object_info *oi,
      		      unsigned int oi_flags);
     @@ object-store.h: int unpack_loose_header(git_zstream *stream, unsigned char *map,
    - int parse_loose_header(const char *hdr, struct object_info *oi,
    - 		       unsigned int flags);
    + int parse_loose_header(const char *hdr, struct object_info *oi);
    + 
      int check_object_signature(struct repository *r, const struct object_id *oid,
     -			   void *buf, unsigned long size, const char *type);
     +			   void *buf, unsigned long size, const char *type,
    @@ pack-check.c: static int verify_packfile(struct repository *r,
      				    oid_to_hex(&oid), p->pack_name);
      		else if (fn) {
     
    + ## t/t1006-cat-file.sh ##
    +@@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose object' '
    + 		# Swap the two to corrupt the repository
    + 		mv -v "$other_path" "$empty_path" &&
    + 		test_must_fail git fsck 2>err.fsck &&
    +-		grep "hash mismatch" err.fsck &&
    ++		grep "hash-path mismatch" err.fsck &&
    + 
    + 		# confirm that cat-file is reading the new swapped-in
    + 		# blob...
    +
      ## t/t1450-fsck.sh ##
     @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      	(
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 02/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                         ` (17 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose, and makes things easier to debug as a
failing test can have their state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test, I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5..1563b35f88 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 02/17] fsck tests: add test for fsck-ing an unknown type
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 03/17] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                         ` (16 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88..f36ec1e2f4 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 03/17] cat-file tests: test for missing object with -t and -s
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 01/17] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 02/17] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:22       ` [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                         ` (15 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Test for what happens when the -t and -s flags are asked to operate on
a missing object, this extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74..b71ef94329 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (2 preceding siblings ...)
  2021-05-20 11:22       ` [PATCH v3 03/17] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-27 21:17         ` Jonathan Nieder
  2021-05-20 11:22       ` [PATCH v3 05/17] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                         ` (14 subsequent siblings)
  18 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests added in the tests for the
--allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
cat-file a '--allow-unknown-type' option, 2015-05-03).

Before this change all the tests would succeed if --allow-unknown-type
was on by default, let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index b71ef94329..dc01d7c4a9 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 05/17] rev-list tests: test for behavior with invalid object types
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (3 preceding siblings ...)
  2021-05-20 11:22       ` [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:22       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 06/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                         ` (13 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:22 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b71..edb2ed5584 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 06/17] cat-file tests: add corrupt loose object test
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (4 preceding siblings ...)
  2021-05-20 11:22       ` [PATCH v3 05/17] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 07/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                         ` (12 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index dc01d7c4a9..4a76ff024d 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -v "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -v other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 07/17] cat-file tests: test for current --allow-unknown-type behavior
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (5 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 06/17] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 08/17] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                         ` (11 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 4a76ff024d..d3d3fd733a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 08/17] cache.h: move object functions to object-store.h
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (6 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 07/17] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 09/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                         ` (10 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd3..32ea1ea047 100644
--- a/cache.h
+++ b/cache.h
@@ -1302,16 +1302,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb..9117115a50 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 09/17] object-file.c: make parse_loose_header_extended() public
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (7 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 08/17] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 10/17] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                         ` (9 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c, that caller can simply pass
the required "struct object-info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, we
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index f233b440b2..527f435381 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1340,8 +1340,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1401,14 +1402,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1463,10 +1456,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2549,6 +2542,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2563,7 +2558,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50..d443964447 100644
--- a/object-store.h
+++ b/object-store.h
@@ -480,7 +480,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c..8beac62cbb 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 10/17] object-file.c: add missing braces to loose_object_info()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (8 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 09/17] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
                         ` (8 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 527f435381..115054389c 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1450,17 +1450,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1469,8 +1472,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (status && oi->typep)
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (9 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 10/17] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 17:50         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                         ` (7 subsequent siblings)
  18 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

The reason this makes sense is that with the refactoring of
parse_loose_header_extended() in an earlier commit the public
interface for parse_loose_header() no longer just accepts a "unsigned
long *sizep". Rather it accepts a "struct object_info *", that
structure will be populated with information about the object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead rely on its
callers to check the populated "oi->typep".

This also allows us to simplify away the
unpack_loose_header_to_strbuf() function added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03). Its code was mostly copy/pasted between it and both
of unpack_loose_header() and unpack_loose_short_header(). We now have
a single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified, we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead, but let's
leave this in place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 105 ++++++++++++++++++++-----------------------------
 object-store.h |  25 ++++++++++--
 streaming.c    |   7 +++-
 3 files changed, 70 insertions(+), 67 deletions(-)

diff --git a/object-file.c b/object-file.c
index 115054389c..d4bdf86657 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1210,11 +1210,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1225,44 +1226,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. We abort early
+	 * unless under we're running as e.g. "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1340,9 +1322,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1364,15 +1344,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1399,7 +1370,14 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1410,9 +1388,12 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1443,27 +1424,23 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
+	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && *oi->typep < 0)
+		die(_("invalid object type"));
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1481,7 +1458,8 @@ static int loose_object_info(struct repository *r,
 		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
-	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2547,6 +2525,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2557,17 +2536,19 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index d443964447..740edcac30 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,11 +477,30 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown.
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 8beac62cbb..c3dc241d6a 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -233,8 +234,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (10 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 17:54         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 13/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                         ` (6 subsequent siblings)
  18 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 15 ++++++++++-----
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 15 insertions(+), 8 deletions(-)

diff --git a/object-file.c b/object-file.c
index d4bdf86657..7623ada1aa 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1240,10 +1240,10 @@ int unpack_loose_header(git_zstream *stream,
 	/*
 	 * We have a header longer than MAX_HEADER_LEN. We abort early
 	 * unless under we're running as e.g. "cat-file
-	 * --allow-unknown-type".
+	 * --allow-unknown-type". A -2 is "header too long"
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1264,7 +1264,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1433,9 +1433,14 @@ static int loose_object_info(struct repository *r,
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
 	if (hdr_ret < 0) {
-		status = error(_("unable to unpack %s header"),
-			       oid_to_hex(oid));
+		if (hdr_ret == -2)
+			status = error(_("header for %s too long, exceeds %d bytes"),
+				       oid_to_hex(oid), MAX_HEADER_LEN);
+		else
+			status = error(_("unable to unpack %s header"),
+				       oid_to_hex(oid));
 	}
+
 	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
diff --git a/object-store.h b/object-store.h
index 740edcac30..9accb614fc 100644
--- a/object-store.h
+++ b/object-store.h
@@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index d3d3fd733a..f12b06150e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 13/17] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (11 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 14/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                         ` (5 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later rename to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 7623ada1aa..0de699de98 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1229,7 +1229,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 14/17] fsck: don't hard die on invalid object types
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (12 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 13/17] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 18:18         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 15/17] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                         ` (4 subsequent siblings)
  18 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimal change
needed to not make fsck hard die on a repository that's been corrupted
in this manner. In subsequent commits we'll teach fsck to recognize
this particular type of corruption and emit a better error message.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 87a99b0108..38b515deb6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,7 +600,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 0de699de98..0e8a024eb3 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2522,7 +2522,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2530,6 +2531,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2552,8 +2554,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 9accb614fc..790e8b1798 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f36ec1e2f4..e7e8decebb 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	test_create_repo garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 15/17] object-store.h: move read_loose_object() below 'struct object_info'
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (13 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 14/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-20 11:23       ` [PATCH v3 16/17] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                         ` (3 subsequent siblings)
  18 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it, moving it will avoid a forward declaration of the
struct.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index 790e8b1798..698a701d70 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 16/17] fsck: report invalid types recorded in objects
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (14 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 15/17] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 18:24         ` Jonathan Tan
  2021-05-20 11:23       ` [PATCH v3 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
                         ` (2 subsequent siblings)
  18 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall though from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 38b515deb6..32f11dc1fe 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -599,12 +599,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 0e8a024eb3..f24fa2fe4a 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2520,9 +2520,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2530,10 +2529,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2549,7 +2547,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2571,8 +2569,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 698a701d70..ba6e5d76c0 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e7e8decebb..bc541af2cf 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v3 17/17] fsck: report invalid object type-path combinations
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (15 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 16/17] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-05-20 11:23       ` Ævar Arnfjörð Bjarmason
  2021-05-27 18:28         ` Jonathan Tan
  2021-05-27 17:08       ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Jonathan Tan
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  18 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-20 11:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt,
	Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently, in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse
misreport whatever issue we encountered. So let's use the new bug()
function to ferry and return code up to fsck_loose() in that case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0..48a3b6a7f8 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 32f11dc1fe..96df1aadbf 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -602,20 +602,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3fbc5d7077..bf860b6555 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd36..3b2dbbb37e 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index f24fa2fe4a..bf9ac38d73 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1039,9 +1039,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1049,8 +1051,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1075,9 +1077,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2520,6 +2522,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2569,9 +2572,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index ba6e5d76c0..60b566a63b 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -505,7 +506,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 14188453c5..5467ead328 100644
--- a/object.c
+++ b/object.c
@@ -261,7 +261,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -272,7 +272,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec..e6aa4442c9 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index f12b06150e..0bb5ee97de 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -v "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bc541af2cf..d76293c495 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.32.0.rc0.406.g73369325f8d


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (16 preceding siblings ...)
  2021-05-20 11:23       ` [PATCH v3 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-05-27 17:08       ` Jonathan Tan
  2021-05-28  0:18         ` Junio C Hamano
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  18 siblings, 1 reply; 155+ messages in thread
From: Jonathan Tan @ 2021-05-27 17:08 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> So in some senes this matters to nobody, but I'm doing this as part of
> general changes I've been pushing to make fsck/gc error reporting more
> graceful, and errors more recoverable. We now have a few more places
> in object-file.c where we don't just die(), but properly return
> API-like return codes/data to the caller instead.

Well, I guess it's useful if somehow your repository got corrupted, and
you want to pinpoint where it occurred.

> Ævar Arnfjörð Bjarmason (17):
>   fsck tests: refactor one test to use a sub-repo
>   fsck tests: add test for fsck-ing an unknown type
>   cat-file tests: test for missing object with -t and -s
>   cat-file tests: test that --allow-unknown-type isn't on by default
>   rev-list tests: test for behavior with invalid object types
>   cat-file tests: add corrupt loose object test
>   cat-file tests: test for current --allow-unknown-type behavior
>   cache.h: move object functions to object-store.h
>   object-file.c: make parse_loose_header_extended() public
>   object-file.c: add missing braces to loose_object_info()
>   object-file.c: stop dying in parse_loose_header()
>   object-file.c: return -2 on "header too long" in unpack_loose_header()
>   object-file.c: return -1, not "status" from unpack_loose_header()
>   fsck: don't hard die on invalid object types
>   object-store.h: move read_loose_object() below 'struct object_info'
>   fsck: report invalid types recorded in objects
>   fsck: report invalid object type-path combinations

My main comment as a reviewer is I think that there are a lot of
unrelated changes in this patch set - in particular, the first 7 tests
(1 fsck test that refactors something unrelated, 1 fsck test that I
presume will be overridden later, and 5 tests for other commands
unrelated to fsck). I'll also give comments on individual patches.

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header()
  2021-05-20 11:23       ` [PATCH v3 11/17] object-file.c: stop dying in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-27 17:50         ` Jonathan Tan
  0 siblings, 0 replies; 155+ messages in thread
From: Jonathan Tan @ 2021-05-27 17:50 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> Start the libification of parse_loose_header() by making it return
> error codes and data instead of invoking die() by itself. For now
> we'll move the relevant die() call to loose_object_info() and
> read_loose_object() to keep this change smaller, but in subsequent
> commits we'll also libify those.
> 
> The reason this makes sense is that with the refactoring of
> parse_loose_header_extended() in an earlier commit the public
> interface for parse_loose_header() no longer just accepts a "unsigned
> long *sizep". Rather it accepts a "struct object_info *", that
> structure will be populated with information about the object.
> 
> It thus makes sense to further libify the interface so that it stops
> calling die() when it encounters OBJ_BAD, and instead rely on its
> callers to check the populated "oi->typep".
> 
> This also allows us to simplify away the
> unpack_loose_header_to_strbuf() function added in
> 46f034483eb (sha1_file: support reading from a loose object of unknown
> type, 2015-05-03). Its code was mostly copy/pasted between it and both
> of unpack_loose_header() and unpack_loose_short_header(). We now have
> a single unpack_loose_header() function which accepts an optional
> "struct strbuf *" instead.
> 
> I think the remaining unpack_loose_header() function could be further
> simplified, we're carrying some complexity just to be able to emit a
> garbage type longer than MAX_HEADER_LEN, we could alternatively just
> say "we found a garbage type <first 32 bytes>..." instead, but let's
> leave this in place for now.

Looking at the patch itself, this patch:
 1. Combines unpack_loose_header(), unpack_loose_short_header(), and
    unpack_loose_header_to_strbuf() into one function. It does different
    things depending on whether the struct strbuf * is provided.
 2. Updates parse_loose_header() to:
    a. never die upon invalid object type
    b. not accept a flags argument (which was used solely to control
       whether it died or not upon invalid object type)
 3. Updates the callers of these functions.

I think 2b should have been mentioned in the commit message. (And
overall I think that the commit message could be shorter, but what
information to include and exclude is a subjective matter, so I won't
concern myself so much about that.)

Also, I think that 1 should be split into its own commit. As it is, I'm
not even sure if it's a good idea - for me, it is confusing for a
function to consume or not consume more of the stream depending on
whether an argument is NULL or not.

>  	/*
>  	 * Check if entire header is unpacked in the first iteration.
>  	 */
>  	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
>  		return 0;
>  
> +	/*
> +	 * We have a header longer than MAX_HEADER_LEN. We abort early
> +	 * unless under we're running as e.g. "cat-file
> +	 * --allow-unknown-type".
> +	 */
> +	if (!header)
> +		return -1;

What do you mean by "unless under we're running as"? And how would we
know at this point that we're running as "cat-file --allow-unknown-type"
just by checking "header"?

> -	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
> -		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
> -			status = error(_("unable to unpack %s header with --allow-unknown-type"),
> -				       oid_to_hex(oid));
> -	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
> +
> +	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
> +				      allow_unknown ? &hdrbuf : NULL);
> +	if (hdr_ret < 0) {
>  		status = error(_("unable to unpack %s header"),
>  			       oid_to_hex(oid));
>  	}

This hunk would go into the split commit (for the unpack_loose_header()
refactoring) I suggested above.

> -
> -	if (status < 0) {
> -		/* Do nothing */
> -	} else if (hdrbuf.len) {
> -		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
> -			status = error(_("unable to parse %s header with --allow-unknown-type"),
> -				       oid_to_hex(oid));
> -	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
> +	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
>  		status = error(_("unable to parse %s header"), oid_to_hex(oid));
>  	}
> +	if (!allow_unknown && *oi->typep < 0)
> +		die(_("invalid object type"));

I think this change doesn't need to be so big? The oi->typep check could
just go in the "else if" branch that happens if --allow-unknown-type is
not set.

> diff --git a/object-store.h b/object-store.h
> index d443964447..740edcac30 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -477,11 +477,30 @@ int for_each_object_in_pack(struct packed_git *p,
>  int for_each_packed_object(each_packed_object_fn, void *,
>  			   enum for_each_object_flags flags);
>  
> +/**
> + * unpack_loose_header() initializes the data stream needed to unpack
> + * a loose object header.
> + *
> + * Returns 0 on success. Returns negative values on error.
> + *
> + * It will only parse up to MAX_HEADER_LEN bytes unless an optional
> + * "hdrbuf" argument is non-NULL. This is intended for use with
> + * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
> + * reporting. The full header will be extracted to "hdrbuf" for use
> + * with parse_loose_header().
> + */
>  int unpack_loose_header(git_zstream *stream, unsigned char *map,
>  			unsigned long mapsize, void *buffer,
> +			unsigned long bufsiz, struct strbuf *hdrbuf);

Parsing up to MAX_HEADER_LEN only occasionally is confusing. Could we
always make it parse up to MAX_HEADER_LEN, and then put a truncated
header in hdrbuf if it is too big (and therefore invalid)?

> +/**
> + * parse_loose_header() parses the starting "<type> <len>\0" of an
> + * object. If it doesn't follow that format -1 is returned. To check
> + * the validity of the <type> populate the "typep" in the "struct
> + * object_info". It will be OBJ_BAD if the object type is unknown.
> + */
> +int parse_loose_header(const char *hdr, struct object_info *oi);

Also mention what happens if the size is invalid.

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-05-20 11:23       ` [PATCH v3 12/17] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-05-27 17:54         ` Jonathan Tan
  0 siblings, 0 replies; 155+ messages in thread
From: Jonathan Tan @ 2021-05-27 17:54 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> diff --git a/object-store.h b/object-store.h
> index 740edcac30..9accb614fc 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
>   * unpack_loose_header() initializes the data stream needed to unpack
>   * a loose object header.
>   *
> - * Returns 0 on success. Returns negative values on error.
> + * Returns 0 on success. Returns negative values on error. If the
> + * header exceeds MAX_HEADER_LEN -2 will be returned.
>   *
>   * It will only parse up to MAX_HEADER_LEN bytes unless an optional
>   * "hdrbuf" argument is non-NULL. This is intended for use with
>   * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
>   * reporting. The full header will be extracted to "hdrbuf" for use
> - * with parse_loose_header().
> + * with parse_loose_header(), -2 will still be returned from this
> + * function to indicate that the header was too long.
>   */
>  int unpack_loose_header(git_zstream *stream, unsigned char *map,
>  			unsigned long mapsize, void *buffer,

Can the return type be an enum in this case?

> diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
> index d3d3fd733a..f12b06150e 100755
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
>  
>  test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
>  	cat >err.expect <<-EOF &&
> -	error: unable to unpack $bogus_sha1 header
> +	error: header for $bogus_sha1 too long, exceeds 32 bytes
>  	fatal: git cat-file: could not get object info
>  	EOF

Ah, the error message is much more informative - thanks.

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 14/17] fsck: don't hard die on invalid object types
  2021-05-20 11:23       ` [PATCH v3 14/17] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-05-27 18:18         ` Jonathan Tan
  0 siblings, 0 replies; 155+ messages in thread
From: Jonathan Tan @ 2021-05-27 18:18 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> diff --git a/object-file.c b/object-file.c
> index 0de699de98..0e8a024eb3 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2522,7 +2522,8 @@ int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
>  		      enum object_type *type,
>  		      unsigned long *size,
> -		      void **contents)
> +		      void **contents,
> +		      unsigned int oi_flags)
>  {
>  	int ret = -1;
>  	void *map = NULL;
> @@ -2530,6 +2531,7 @@ int read_loose_object(const char *path,
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
>  	oi.typep = type;
>  	oi.sizep = size;
>  
> @@ -2552,8 +2554,11 @@ int read_loose_object(const char *path,
>  		git_inflate_end(&stream);
>  		goto out;
>  	}
> -	if (*type < 0)
> -		die(_("invalid object type"));
> +	if (!allow_unknown && *type < 0) {
> +		error(_("header for %s declares an unknown type"), path);
> +		git_inflate_end(&stream);
> +		goto out;
> +	}
>  
>  	if (*type == OBJ_BLOB && *size > big_file_threshold) {
>  		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)

So instead of dying, we print an error and behave as if the object was
invalid for some other reason. Makes sense.

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 16/17] fsck: report invalid types recorded in objects
  2021-05-20 11:23       ` [PATCH v3 16/17] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-05-27 18:24         ` Jonathan Tan
  0 siblings, 0 replies; 155+ messages in thread
From: Jonathan Tan @ 2021-05-27 18:24 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> diff --git a/object-store.h b/object-store.h
> index 698a701d70..ba6e5d76c0 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
>  
>  /*
>   * Open the loose object at path, check its hash, and return the contents,
> + * use the "oi" argument to assert things about the object, or e.g. populate its
>   * type, and size. If the object is a blob, then "contents" may return NULL,
>   * to allow streaming of large blobs.
>   *
> @@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
>   */
>  int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
> -		      enum object_type *type,
> -		      unsigned long *size,
>  		      void **contents,
> +		      struct object_info *oi,
>  		      unsigned int oi_flags);
>  
>  /*

What do you mean by using "oi" to assert things? As far as I know, it's
just a way to specify which information you want (by assigning pointers
to the appropriate fields).

> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index e7e8decebb..bc541af2cf 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
>  	)
>  '
>  
> +test_expect_success 'object with hash and type mismatch' '
> +	test_create_repo hash-type-mismatch &&
> +	(
> +		cd hash-type-mismatch &&
> +		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
> +		old=$(test_oid_to_path "$oid") &&
> +		new=$(dirname $old)/$(test_oid ff_2) &&
> +		oid="$(dirname $new)$(basename $new)" &&
> +		mv .git/objects/$old .git/objects/$new &&
> +		git update-index --add --cacheinfo 100644 $oid foo &&
> +		tree=$(git write-tree) &&
> +		cmt=$(echo bogus | git commit-tree $tree) &&
> +		git update-ref refs/heads/bogus $cmt &&
> +		test_must_fail git fsck 2>out &&
> +		grep "^error: hash mismatch for " out &&
> +		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out

I don't think we need to know the precise quote used - it's simpler to
just put "." where we need the single quote. Same for the other case
below.

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 17/17] fsck: report invalid object type-path combinations
  2021-05-20 11:23       ` [PATCH v3 17/17] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-05-27 18:28         ` Jonathan Tan
  0 siblings, 0 replies; 155+ messages in thread
From: Jonathan Tan @ 2021-05-27 18:28 UTC (permalink / raw)
  To: avarab; +Cc: git, gitster, peff, j6t, Jonathan Tan

> Improve the error that's emitted in cases where we find a loose object
> we parse, but which isn't at the location we expect it to be.

I'll hold off reviewing this for 2 reasons - firstly, I don't think it's
related to the other patches that are about reporting a wrong object
type, and secondly, in the case of repository corruption, I don't see
how useful computing (and reporting) the hash of a corrupt object is.

If someone else wants to take a look at this, that would be great, but
otherwise I would suggest splitting this into its own patch set.

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-05-20 11:22       ` [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-05-27 21:17         ` Jonathan Nieder
  2021-05-28  3:10           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 155+ messages in thread
From: Jonathan Nieder @ 2021-05-27 21:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Johannes Sixt

Hi,

Ævar Arnfjörð Bjarmason wrote:

> Fix a blindspot in the tests added in the tests for the
> --allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
> cat-file a '--allow-unknown-type' option, 2015-05-03).
>
> Before this change all the tests would succeed if --allow-unknown-type
> was on by default, let's fix that by asserting that -t and -s die on a
> "garbage" type without --allow-unknown-type.

nit: "tests added in the tests" seems oddly repetitive.

More importantly, I'm curious about the desired behavior here.  The
idea behind cat-file --allow-unknown-type is that I can use it to
inspect an invalid object, for example after it has been reported by
git fsck.  The commit that introduced it (39e4ae3880, "cat-file: teach
cat-file a '--allow-unknown-type' option", 2015-05-03) gives the hint
"query broken/corrupt objects" in the documentation, so I figure
that's what it's for, and I'm sympathetic.

But: why is that an option instead of something that we always do?

In other words, is there some situation where I would not want the
more permissive behavior from cat-file against a bad object?

Thanks,
Jonathan

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-05-27 17:08       ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Jonathan Tan
@ 2021-05-28  0:18         ` Junio C Hamano
  2021-05-28  5:41           ` Felipe Contreras
  0 siblings, 1 reply; 155+ messages in thread
From: Junio C Hamano @ 2021-05-28  0:18 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: avarab, git, peff, j6t

Jonathan Tan <jonathantanmy@google.com> writes:

> My main comment as a reviewer is I think that there are a lot of
> unrelated changes in this patch set ...

Thanks for reviewing.  I share the same feeling, not specifically
about this series, but I find that "doing too many while-at-it
changes" is shared among the topics by the same author, and I often
wish each topic were more focused.


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 04/17] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-05-27 21:17         ` Jonathan Nieder
@ 2021-05-28  3:10           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-05-28  3:10 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git, Junio C Hamano, Jeff King, Johannes Sixt


On Thu, May 27 2021, Jonathan Nieder wrote:

> Hi,
>
> Ævar Arnfjörð Bjarmason wrote:
>
>> Fix a blindspot in the tests added in the tests for the
>> --allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
>> cat-file a '--allow-unknown-type' option, 2015-05-03).
>>
>> Before this change all the tests would succeed if --allow-unknown-type
>> was on by default, let's fix that by asserting that -t and -s die on a
>> "garbage" type without --allow-unknown-type.
>
> nit: "tests added in the tests" seems oddly repetitive.
>
> More importantly, I'm curious about the desired behavior here.  The
> idea behind cat-file --allow-unknown-type is that I can use it to
> inspect an invalid object, for example after it has been reported by
> git fsck.  The commit that introduced it (39e4ae3880, "cat-file: teach
> cat-file a '--allow-unknown-type' option", 2015-05-03) gives the hint
> "query broken/corrupt objects" in the documentation, so I figure
> that's what it's for, and I'm sympathetic.
>
> But: why is that an option instead of something that we always do?
>
> In other words, is there some situation where I would not want the
> more permissive behavior from cat-file against a bad object?

Yes. I suggested as much in
https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

For this series though I'm sticking to testing for the existing behavior
+ fixing the immediate fsck issues. I've got some local patches queued
up for after this topic lands (after I re-roll it, re-submit etc.) that
do that.

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v3 00/17] fsck: better "invalid object" error reporting
  2021-05-28  0:18         ` Junio C Hamano
@ 2021-05-28  5:41           ` Felipe Contreras
  0 siblings, 0 replies; 155+ messages in thread
From: Felipe Contreras @ 2021-05-28  5:41 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan; +Cc: avarab, git, peff, j6t

Junio C Hamano wrote:
> Jonathan Tan <jonathantanmy@google.com> writes:
> 
> > My main comment as a reviewer is I think that there are a lot of
> > unrelated changes in this patch set ...
> 
> Thanks for reviewing.  I share the same feeling, not specifically
> about this series, but I find that "doing too many while-at-it
> changes" is shared among the topics by the same author, and I often
> wish each topic were more focused.

The author has a name.

I understand why as a reviewer you want a small patch series, but as a
patch-writer you want your code to land on master.

Perhaps if there was an actual incentive to split a patch series more
people would do so, but in my personal experience that has not been the
case.

If I had to name the reason why some of my patches have landed on master I
would say it's *arbitrary*. Maybe you catch reviewers on a good mood, or
the maintainer in-between release candidates. But regardless of the
actual reason, patch-series' size doesn't seem to be a huge factor.

As exhibit I can five two patch-series of mine:

  1. https://lore.kernel.org/git/20201223144845.143039-1-felipe.contreras@gmail.com/
  2. https://lore.kernel.org/git/20210426161458.49860-1-felipe.contreras@gmail.com/

The first one is 4 patches. The second one is 43.

The second one receved feedback from the maintainer. The first one was
complerely ignored. Neither were acceped.

This is not intended to point fingers at anyone, merely to state the
a mathematical fact.


Splitting a patch series is usually more work. If there's no real
incentive for a submitter to do so, why would she/he?

Cheers.

-- 
Felipe Contreras

^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 00/21] fsck: better "invalid object" error reporting
  2021-05-20 11:22     ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Ævar Arnfjörð Bjarmason
                         ` (17 preceding siblings ...)
  2021-05-27 17:08       ` [PATCH v3 00/17] fsck: better "invalid object" error reporting Jonathan Tan
@ 2021-06-24 19:23       ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                           ` (21 more replies)
  18 siblings, 22 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

A late re-roll of the v3[1] that's in "seen" and causing a couple of
CI errors, both are solved with this version. For a recap of what this
is about see v3's summary[1].

One was a "mv -v" issue on OSX, it asks interactively, and defaults to
no, now moved to "mv -f" (as in other tests that move .git/objects/*
files directly).

The other happened on one of the docker32 Linux boxes, and turned out
to be an uninitialized variable bug, which happened to be initialized
to a negative value there, and zero (or positive) everywhere else.

I also went through all the commentary on the v3 (particularly good
feedback from Jonathan Tan) and hopefully addressed everything in one
way or another. In particular the 12/21 is new here, split off from
what's now 14/21. That change was the most complex one in the previous
series.

1. https://lore.kernel.org/git/cover-00.17-0000000000-20210520T111610Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (21):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 178 +++++++++++++++++++++--------------------
 object-store.h         |  62 +++++++++++---
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 ++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 407 insertions(+), 139 deletions(-)

Range-diff against v3:
 1:  aa38b2bf9e7 =  1:  2e37971c016 fsck tests: refactor one test to use a sub-repo
 2:  82b64abd250 =  2:  79630a99433 fsck tests: add test for fsck-ing an unknown type
 3:  7c3c2fe25d9 =  3:  2b5366bfb9d cat-file tests: test for missing object with -t and -s
 4:  871b8200035 !  4:  ea9a5ef0920 cat-file tests: test that --allow-unknown-type isn't on by default
    @@ Metadata
      ## Commit message ##
         cat-file tests: test that --allow-unknown-type isn't on by default
     
    -    Fix a blindspot in the tests added in the tests for the
    -    --allow-unknown-type feature, added in 39e4ae38804 (cat-file: teach
    -    cat-file a '--allow-unknown-type' option, 2015-05-03).
    +    Fix a blindspot in the tests for the --allow-unknown-type feature
    +    added in 39e4ae38804 (cat-file: teach cat-file a
    +    '--allow-unknown-type' option, 2015-05-03). We should check that
    +    --allow-unknown-type isn't on by default.
     
         Before this change all the tests would succeed if --allow-unknown-type
         was on by default, let's fix that by asserting that -t and -s die on a
 5:  b98da9cc89e =  5:  8eaf0e6ddda rev-list tests: test for behavior with invalid object types
 6:  04cc1d20f62 !  6:  f0e9d92414e cat-file tests: add corrupt loose object test
    @@ t/t1006-cat-file.sh: test_expect_success "Size of large broken object is correct
     +		test_cmp out.expect out.actual &&
     +
     +		# Swap the two to corrupt the repository
    -+		mv -v "$other_path" "$empty_path" &&
    ++		mv -f "$other_path" "$empty_path" &&
     +		test_must_fail git fsck 2>err.fsck &&
     +		grep "hash mismatch" err.fsck &&
     +
    @@ t/t1006-cat-file.sh: test_expect_success "Size of large broken object is correct
     +
     +		# So far "cat-file" has been happy to spew the found
     +		# content out as-is. Try to make it zlib-invalid.
    -+		mv -v other.blob "$empty_path" &&
    ++		mv -f other.blob "$empty_path" &&
     +		test_must_fail git fsck 2>err.fsck &&
     +		grep "^error: inflate: data stream error (" err.fsck
     +	)
 7:  9217320888f =  7:  d797d2e8e9d cat-file tests: test for current --allow-unknown-type behavior
 8:  12dd4538794 =  8:  96310a0bb59 cache.h: move object functions to object-store.h
 -:  ----------- >  9:  54fb9189408 object-file.c: don't set "typep" when returning non-zero
 9:  6a5b78dcad8 = 10:  9d36fcbc44a object-file.c: make parse_loose_header_extended() public
10:  5d31d7e1a54 ! 11:  74c308adc19 object-file.c: add missing braces to loose_object_info()
    @@ object-file.c: static int loose_object_info(struct repository *r,
     +	}
      
      	munmap(map, mapsize);
    - 	if (status && oi->typep)
    + 	if (oi->sizep == &size_scratch)
11:  ee28089219f ! 12:  3f52149bfde object-file.c: stop dying in parse_loose_header()
    @@ Metadata
     Author: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## Commit message ##
    -    object-file.c: stop dying in parse_loose_header()
    +    object-file.c: simplify unpack_loose_short_header()
     
    -    Start the libification of parse_loose_header() by making it return
    -    error codes and data instead of invoking die() by itself. For now
    -    we'll move the relevant die() call to loose_object_info() and
    -    read_loose_object() to keep this change smaller, but in subsequent
    -    commits we'll also libify those.
    +    Combine the unpack_loose_short_header(),
    +    unpack_loose_header_to_strbuf() and unpack_loose_header() functions
    +    into one.
     
    -    The reason this makes sense is that with the refactoring of
    -    parse_loose_header_extended() in an earlier commit the public
    -    interface for parse_loose_header() no longer just accepts a "unsigned
    -    long *sizep". Rather it accepts a "struct object_info *", that
    -    structure will be populated with information about the object.
    -
    -    It thus makes sense to further libify the interface so that it stops
    -    calling die() when it encounters OBJ_BAD, and instead rely on its
    -    callers to check the populated "oi->typep".
    -
    -    This also allows us to simplify away the
    -    unpack_loose_header_to_strbuf() function added in
    +    The unpack_loose_header_to_strbuf() function was added in
         46f034483eb (sha1_file: support reading from a loose object of unknown
    -    type, 2015-05-03). Its code was mostly copy/pasted between it and both
    -    of unpack_loose_header() and unpack_loose_short_header(). We now have
    -    a single unpack_loose_header() function which accepts an optional
    +    type, 2015-05-03).
    +
    +    Its code was mostly copy/pasted between it and both of
    +    unpack_loose_header() and unpack_loose_short_header(). We now have a
    +    single unpack_loose_header() function which accepts an optional
         "struct strbuf *" instead.
     
         I think the remaining unpack_loose_header() function could be further
         simplified, we're carrying some complexity just to be able to emit a
         garbage type longer than MAX_HEADER_LEN, we could alternatively just
    -    say "we found a garbage type <first 32 bytes>..." instead, but let's
    -    leave this in place for now.
    +    say "we found a garbage type <first 32 bytes>..." instead. But let's
    +    leave the current behavior in place for now.
     
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
      		return 0;
      
     +	/*
    -+	 * We have a header longer than MAX_HEADER_LEN. We abort early
    -+	 * unless under we're running as e.g. "cat-file
    ++	 * We have a header longer than MAX_HEADER_LEN. The "header"
    ++	 * here is only non-NULL when we run "cat-file
     +	 * --allow-unknown-type".
     +	 */
     +	if (!header)
    @@ object-file.c: static int unpack_loose_short_header(git_zstream *stream,
      	/*
      	 * buffer[0..bufsiz] was not large enough.  Copy the partial
      	 * result out to header, and then append the result of further
    -@@ object-file.c: static void *unpack_loose_rest(git_zstream *stream,
    -  * too permissive for what we want to check. So do an anal
    -  * object header parse by hand.
    -  */
    --int parse_loose_header(const char *hdr,
    --		       struct object_info *oi,
    --		       unsigned int flags)
    -+int parse_loose_header(const char *hdr, struct object_info *oi)
    - {
    - 	const char *type_buf = hdr;
    - 	unsigned long size;
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	type = type_from_string_gently(type_buf, type_len, 1);
    - 	if (oi->type_name)
    - 		strbuf_add(oi->type_name, type_buf, type_len);
    --	/*
    --	 * Set type to 0 if its an unknown object and
    --	 * we're obtaining the type using '--allow-unknown-type'
    --	 * option.
    --	 */
    --	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
    --		type = 0;
    --	else if (type < 0)
    --		die(_("invalid object type"));
    - 	if (oi->typep)
    - 		*oi->typep = type;
    - 
    -@@ object-file.c: int parse_loose_header(const char *hdr,
    - 	/*
    - 	 * The length must be followed by a zero byte
    - 	 */
    --	return *hdr ? -1 : type;
    -+	if (*hdr)
    -+		return -1;
    -+
    -+	/*
    -+	 * The format is valid, but the type may still be bogus. The
    -+	 * Caller needs to check its oi->typep.
    -+	 */
    -+	return 0;
    - }
    - 
    - static int loose_object_info(struct repository *r,
     @@ object-file.c: static int loose_object_info(struct repository *r,
      	unsigned long mapsize;
      	void *map;
    @@ object-file.c: static int loose_object_info(struct repository *r,
      	char hdr[MAX_HEADER_LEN];
      	struct strbuf hdrbuf = STRBUF_INIT;
      	unsigned long size_scratch;
    -+	enum object_type type_scratch;
     +	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
      
      	if (oi->delta_base_oid)
      		oidclr(oi->delta_base_oid);
     @@ object-file.c: static int loose_object_info(struct repository *r,
      
    - 	if (!oi->sizep)
    - 		oi->sizep = &size_scratch;
    -+	if (!oi->typep)
    -+		oi->typep = &type_scratch;
    - 
      	if (oi->disk_sizep)
      		*oi->disk_sizep = mapsize;
     -	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
    @@ object-file.c: static int loose_object_info(struct repository *r,
      		status = error(_("unable to unpack %s header"),
      			       oid_to_hex(oid));
      	}
    --
    --	if (status < 0) {
    --		/* Do nothing */
    --	} else if (hdrbuf.len) {
    --		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
    --			status = error(_("unable to parse %s header with --allow-unknown-type"),
    --				       oid_to_hex(oid));
    --	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
    -+	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
    - 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    - 	}
    -+	if (!allow_unknown && *oi->typep < 0)
    -+		die(_("invalid object type"));
    - 
    - 	if (status >= 0 && oi->contentp) {
    - 		*oi->contentp = unpack_loose_rest(&stream, hdr,
    -@@ object-file.c: static int loose_object_info(struct repository *r,
    - 		*oi->typep = status;
    - 	if (oi->sizep == &size_scratch)
    - 		oi->sizep = NULL;
    --	strbuf_release(&hdrbuf);
    -+	if (oi->typep == &type_scratch)
    -+		oi->typep = NULL;
    - 	oi->whence = OI_LOOSE;
    - 	return (status < 0) ? status : 0;
    - }
    -@@ object-file.c: int read_loose_object(const char *path,
    - 	git_zstream stream;
    - 	char hdr[MAX_HEADER_LEN];
    - 	struct object_info oi = OBJECT_INFO_INIT;
    -+	oi.typep = type;
    - 	oi.sizep = size;
    - 
    - 	*contents = NULL;
     @@ object-file.c: int read_loose_object(const char *path,
      		goto out;
      	}
    @@ object-file.c: int read_loose_object(const char *path,
      		error(_("unable to unpack header of %s"), path);
      		goto out;
      	}
    - 
    --	*type = parse_loose_header(hdr, &oi, 0);
    --	if (*type < 0) {
    -+	if (parse_loose_header(hdr, &oi) < 0) {
    - 		error(_("unable to parse header of %s"), path);
    - 		git_inflate_end(&stream);
    - 		goto out;
    - 	}
    -+	if (*type < 0)
    -+		die(_("invalid object type"));
    - 
    - 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
    - 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
     
      ## object-store.h ##
     @@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
    @@ object-store.h: int for_each_object_in_pack(struct packed_git *p,
      int unpack_loose_header(git_zstream *stream, unsigned char *map,
      			unsigned long mapsize, void *buffer,
     -			unsigned long bufsiz);
    --int parse_loose_header(const char *hdr, struct object_info *oi,
    --		       unsigned int flags);
     +			unsigned long bufsiz, struct strbuf *hdrbuf);
    -+
    -+/**
    -+ * parse_loose_header() parses the starting "<type> <len>\0" of an
    -+ * object. If it doesn't follow that format -1 is returned. To check
    -+ * the validity of the <type> populate the "typep" in the "struct
    -+ * object_info". It will be OBJ_BAD if the object type is unknown.
    -+ */
    -+int parse_loose_header(const char *hdr, struct object_info *oi);
    -+
    + int parse_loose_header(const char *hdr, struct object_info *oi,
    + 		       unsigned int flags);
      int check_object_signature(struct repository *r, const struct object_id *oid,
    - 			   void *buf, unsigned long size, const char *type);
    - int finalize_object_file(const char *tmpfile, const char *filename);
     
      ## streaming.c ##
    -@@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
    - {
    - 	struct object_info oi = OBJECT_INFO_INIT;
    - 	oi.sizep = &st->size;
    -+	oi.typep = type;
    - 
    - 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
    - 	if (!st->u.loose.mapped)
     @@ streaming.c: static int open_istream_loose(struct git_istream *st, struct repository *r,
      				 st->u.loose.mapped,
      				 st->u.loose.mapsize,
      				 st->u.loose.hdr,
     -				 sizeof(st->u.loose.hdr)) < 0) ||
    --	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
     +				 sizeof(st->u.loose.hdr),
     +				 NULL) < 0) ||
    -+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
    -+	    *type < 0) {
    + 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
      		git_inflate_end(&st->z);
      		munmap(st->u.loose.mapped, st->u.loose.mapsize);
    - 		return -1;
 -:  ----------- > 13:  ba632be1520 object-file.c: split up ternary in parse_loose_header()
 -:  ----------- > 14:  ea4f446f5b1 object-file.c: stop dying in parse_loose_header()
 -:  ----------- > 15:  aacef784eab object-file.c: guard against future bugs in loose_object_info()
13:  d22d5b8b85e = 16:  050cfc7808c object-file.c: return -1, not "status" from unpack_loose_header()
12:  77f2cd439c6 ! 17:  78e3152fd94 object-file.c: return -2 on "header too long" in unpack_loose_header()
    @@ Commit message
         MAX_HEADER_LEN limit, or other negative values for "unable to unpack
         <OID> header".
     
    +    I tried setting up an enum just for these three return values, but I
    +    think the result was less readable. Let's consider doing that if we
    +    gain even more return values. For now let's do the next best thing and
    +    enumerate our known return values, and BUG() if we encounter one we
    +    don't know about.
    +
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object-file.c ##
     @@ object-file.c: int unpack_loose_header(git_zstream *stream,
    - 	/*
    - 	 * We have a header longer than MAX_HEADER_LEN. We abort early
    - 	 * unless under we're running as e.g. "cat-file
    --	 * --allow-unknown-type".
    -+	 * --allow-unknown-type". A -2 is "header too long"
    + 	 * --allow-unknown-type".
      	 */
      	if (!header)
     -		return -1;
    @@ object-file.c: int unpack_loose_header(git_zstream *stream,
      
      static void *unpack_loose_rest(git_zstream *stream,
     @@ object-file.c: static int loose_object_info(struct repository *r,
    + 
      	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
      				      allow_unknown ? &hdrbuf : NULL);
    - 	if (hdr_ret < 0) {
    --		status = error(_("unable to unpack %s header"),
    --			       oid_to_hex(oid));
    -+		if (hdr_ret == -2)
    -+			status = error(_("header for %s too long, exceeds %d bytes"),
    -+				       oid_to_hex(oid), MAX_HEADER_LEN);
    -+		else
    -+			status = error(_("unable to unpack %s header"),
    -+				       oid_to_hex(oid));
    - 	}
    -+
    - 	if (!status && parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi) < 0) {
    - 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
    +-	if (hdr_ret < 0) {
    ++	switch (hdr_ret) {
    ++	case 0:
    ++		break;
    ++	case -1:
    + 		status = error(_("unable to unpack %s header"),
    + 			       oid_to_hex(oid));
    ++		break;
    ++	case -2:
    ++		status = error(_("header for %s too long, exceeds %d bytes"),
    ++			       oid_to_hex(oid), MAX_HEADER_LEN);
    ++		break;
    ++	default:
    ++		BUG("unknown hdr_ret value %d", hdr_ret);
      	}
    + 	if (!status) {
    + 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
     
      ## object-store.h ##
     @@ object-store.h: int for_each_packed_object(each_packed_object_fn, void *,
14:  260e9888a3e = 18:  f9bb1b799ac fsck: don't hard die on invalid object types
15:  e2afb813b28 = 19:  acbea7e2a2a object-store.h: move read_loose_object() below 'struct object_info'
16:  328f05c51b3 = 20:  edc28de229d fsck: report invalid types recorded in objects
17:  c5e6686765d ! 21:  e588c05f461 fsck: report invalid object type-path combinations
    @@ pack-check.c: static int verify_packfile(struct repository *r,
      ## t/t1006-cat-file.sh ##
     @@ t/t1006-cat-file.sh: test_expect_success 'cat-file -t and -s on corrupt loose object' '
      		# Swap the two to corrupt the repository
    - 		mv -v "$other_path" "$empty_path" &&
    + 		mv -f "$other_path" "$empty_path" &&
      		test_must_fail git fsck 2>err.fsck &&
     -		grep "hash mismatch" err.fsck &&
     +		grep "hash-path mismatch" err.fsck &&
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 22:00           ` Andrei Rybak
  2021-06-24 19:23         ` [PATCH v4 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                           ` (20 subsequent siblings)
  21 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose, and makes things easier to debug as a
failing test can have their state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test, I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..1563b35f88c 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	test_create_repo hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 02/21] fsck tests: add test for fsck-ing an unknown type
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                           ` (19 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1563b35f88c..f36ec1e2f4a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	test_create_repo garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 03/21] cat-file tests: test for missing object with -t and -s
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                           ` (18 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Test for what happens when the -t and -s flags are asked to operate on
a missing object, this extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74a..b71ef94329e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 04/21] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (2 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                           ` (17 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the --allow-unknown-type feature
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.

Before this change all the tests would succeed if --allow-unknown-type
was on by default, let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index b71ef94329e..dc01d7c4a9a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 05/21] rev-list tests: test for behavior with invalid object types
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (3 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                           ` (16 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..edb2ed55846 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 06/21] cat-file tests: add corrupt loose object test
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (4 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                           ` (15 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index dc01d7c4a9a..7f10a92f0e4 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 07/21] cat-file tests: test for current --allow-unknown-type behavior
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (5 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                           ` (14 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f10a92f0e4..86fd2a90ca7 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 08/21] cache.h: move object functions to object-store.h
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (6 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                           ` (13 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd36..32ea1ea0474 100644
--- a/cache.h
+++ b/cache.h
@@ -1302,16 +1302,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 09/21] object-file.c: don't set "typep" when returning non-zero
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (7 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                           ` (12 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index f233b440b22..9210e2e6fe4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1480,8 +1480,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 10/21] object-file.c: make parse_loose_header_extended() public
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (8 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                           ` (11 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c, that caller can simply pass
the required "struct object-info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, we
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index 9210e2e6fe4..e0ba1842272 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1340,8 +1340,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1401,14 +1402,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1463,10 +1456,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2547,6 +2540,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2561,7 +2556,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..d443964447c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -480,7 +480,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 11/21] object-file.c: add missing braces to loose_object_info()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (9 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                           ` (10 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index e0ba1842272..646ca7f85d6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1450,17 +1450,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1469,8 +1472,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 12/21] object-file.c: simplify unpack_loose_short_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (10 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
                           ` (9 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified, we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 60 ++++++++++++++++++--------------------------------
 object-store.h | 14 +++++++++++-
 streaming.c    |  3 ++-
 3 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/object-file.c b/object-file.c
index 646ca7f85d6..ef3a1517fed 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1210,11 +1210,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1225,44 +1226,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1410,9 +1392,11 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1446,11 +1430,10 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
@@ -2555,7 +2538,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/object-store.h b/object-store.h
index d443964447c..31327a7f6c3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,9 +477,21 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 13/21] object-file.c: split up ternary in parse_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (11 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
                           ` (8 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

This minor formatting change serves to make a subsequent patch easier
to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index ef3a1517fed..e51cf2ca33e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1381,7 +1381,10 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	return type;
 }
 
 static int loose_object_info(struct repository *r,
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 14/21] object-file.c: stop dying in parse_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (12 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
                           ` (7 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

Since the refactoring of parse_loose_header_extended() into
parse_loose_header() in an earlier commit, its interface accepts a
"unsigned long *sizep". Rather it accepts a "struct object_info *",
that structure will be populated with information about the object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead rely on its
callers to check the populated "oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

In another case added in c84a1f3ed4d (sha1_file: refactor read_object,
2017-06-21) (but the behavior pre-dated that) we did checks of "status
>= 0", because at that point "status" had become the return value of
parse_loose_header(). I.e. a non-negative "enum object_type" (unless
we -1, aka. OBJ_BAD).

Now that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 53 ++++++++++++++++++++++++++------------------------
 object-store.h | 13 +++++++++++--
 streaming.c    |  4 +++-
 3 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index e51cf2ca33e..31263335af9 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1322,9 +1322,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1346,15 +1344,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1384,7 +1373,11 @@ int parse_loose_header(const char *hdr,
 	if (*hdr)
 		return -1;
 
-	return type;
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1399,6 +1392,8 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int parsed_header = 0;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1430,6 +1425,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1440,18 +1437,20 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	if (!status) {
+		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+			/*
+			 * oi->{sizep,typep} are meaningless unless
+			 * parse_loose_header() returns >= 0.
+			 */
+			parsed_header = 1;
+		else
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && parsed_header && *oi->typep < 0)
+		die(_("invalid object type"));
 
-	if (status >= 0 && oi->contentp) {
+	if (parsed_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
 						  *oi->sizep, oid);
 		if (!*oi->contentp) {
@@ -1466,6 +1465,8 @@ static int loose_object_info(struct repository *r,
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2531,6 +2532,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2547,12 +2549,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 31327a7f6c3..65a8e4dc6a8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -492,8 +492,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz, struct strbuf *hdrbuf);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..c3dc241d6a5 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -235,7 +236,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr),
 				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 15/21] object-file.c: guard against future bugs in loose_object_info()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (13 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                           ` (6 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

An earlier version of the preceding commit had a subtle bug where our
"type_scratch" (later assigned to "oi->typep") would be uninitialized
and used in the "!allow_unknown" case, at which point it would contain
a nonsensical value if we'd failed to call parse_loose_header().

The preceding commit introduced "parsed_header" variable to check for
this case, but I think we can do better, let's carry a "oi_header"
variable initially set to NULL, and only set it to "oi" once we're
past parse_loose_header().

This is functionally the same thing, but hopefully makes it even more
obvious in the future that we must not access the "typep" and
"sizep" (or "type_name") unless parse_loose_header() succeeds, but
that accessing other fields set earlier (such as the "disk_sizep" set
earlier) is OK.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 31263335af9..d41f444e6cc 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1393,7 +1393,7 @@ static int loose_object_info(struct repository *r,
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
 	enum object_type type_scratch;
-	int parsed_header = 0;
+	struct object_info *oi_header = NULL;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1441,18 +1441,20 @@ static int loose_object_info(struct repository *r,
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
 			/*
 			 * oi->{sizep,typep} are meaningless unless
-			 * parse_loose_header() returns >= 0.
+			 * parse_loose_header() returns >= 0. Let's
+			 * access them as "oi_header" (just an alias
+			 * for "oi") below to make that intent clear.
 			 */
-			parsed_header = 1;
+			oi_header = oi;
 		else
 			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
-	if (!allow_unknown && parsed_header && *oi->typep < 0)
+	if (!allow_unknown && oi_header && *oi_header->typep < 0)
 		die(_("invalid object type"));
 
-	if (parsed_header && oi->contentp) {
+	if (oi_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
+						  *oi_header->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 16/21] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (14 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                           ` (5 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later rename to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index d41f444e6cc..956ca260518 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1229,7 +1229,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (15 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                           ` (4 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

I tried setting up an enum just for these three return values, but I
think the result was less readable. Let's consider doing that if we
gain even more return values. For now let's do the next best thing and
enumerate our known return values, and BUG() if we encounter one we
don't know about.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 16 +++++++++++++---
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 956ca260518..1866115a1c5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1243,7 +1243,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1264,7 +1264,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1433,9 +1433,19 @@ static int loose_object_info(struct repository *r,
 
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
-	if (hdr_ret < 0) {
+	switch (hdr_ret) {
+	case 0:
+		break;
+	case -1:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
+		break;
+	case -2:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
+	default:
+		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index 65a8e4dc6a8..1151ce8e820 100644
--- a/object-store.h
+++ b/object-store.h
@@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 86fd2a90ca7..06d38e1fae6 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 18/21] fsck: don't hard die on invalid object types
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (16 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                           ` (3 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimal change
needed to not make fsck hard die on a repository that's been corrupted
in this manner. In subsequent commits we'll teach fsck to recognize
this particular type of corruption and emit a better error message.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..082dadd5629 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 1866115a1c5..8fb55fc6f58 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2536,7 +2536,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2544,6 +2545,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2566,8 +2568,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 1151ce8e820..94ff03072c1 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f36ec1e2f4a..e7e8decebbd 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	test_create_repo garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 19/21] object-store.h: move read_loose_object() below 'struct object_info'
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (17 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it, moving it will avoid a forward declaration of the
struct.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index 94ff03072c1..72d668b1674 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 20/21] fsck: report invalid types recorded in objects
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (18 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-06-24 19:23         ` [PATCH v4 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall though from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 082dadd5629..07af0434db6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 8fb55fc6f58..e550ea0c7cf 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2534,9 +2534,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2544,10 +2543,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2563,7 +2561,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2585,8 +2583,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 72d668b1674..96a5970f314 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index e7e8decebbd..bc541af2cfc 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	test_create_repo hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv .git/objects/$old .git/objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v4 21/21] fsck: report invalid object type-path combinations
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (19 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-06-24 19:23         ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 19:23 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently, in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse
misreport whatever issue we encountered. So let's use the new bug()
function to ferry and return code up to fsck_loose() in that case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0f..48a3b6a7f8f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 07af0434db6..158b9dac9b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3fbc5d70777..bf860b6555e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e550ea0c7cf..923ff759e19 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1039,9 +1039,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1049,8 +1051,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1075,9 +1077,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2534,6 +2536,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2583,9 +2586,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 96a5970f314..9fc69016361 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -507,7 +508,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 14188453c56..5467ead3285 100644
--- a/object.c
+++ b/object.c
@@ -261,7 +261,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -272,7 +272,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 06d38e1fae6..72386cfec0e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index bc541af2cfc..d76293c495a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.32.0.606.g2e440ee2c94


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo
  2021-06-24 19:23         ` [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-06-24 22:00           ` Andrei Rybak
  2021-06-24 22:34             ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 155+ messages in thread
From: Andrei Rybak @ 2021-06-24 22:00 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason, git
  Cc: Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan, Felipe Contreras

On 24/06/2021 21:23, Ævar Arnfjörð Bjarmason wrote:
> Refactor one of the fsck tests to use a throwaway repository. It's a
> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
> teardown of a tests so we're not leaving corrupt content for the next
> test.
> 
> We should instead simply use something like this test_create_repo
> pattern. It's both less verbose, and makes things easier to debug as a
> failing test can have their state left behind under -d without
> damaging the state for other tests.
> 
> But let's punt on that general refactoring and just change this one
> test, I'm going to change it further in subsequent commits.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>   t/t1450-fsck.sh | 34 ++++++++++++++++------------------
>   1 file changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 5071ac63a5b..1563b35f88c 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -48,24 +48,22 @@ remove_object () {
>   	rm "$(sha1_file "$1")"
>   }
>   
> -test_expect_success 'object with bad sha1' '
> -	sha=$(echo blob | git hash-object -w --stdin) &&
> -	old=$(test_oid_to_path "$sha") &&
> -	new=$(dirname $old)/$(test_oid ff_2) &&
> -	sha="$(dirname $new)$(basename $new)" &&
> -	mv .git/objects/$old .git/objects/$new &&
> -	test_when_finished "remove_object $sha" &&
> -	git update-index --add --cacheinfo 100644 $sha foo &&
> -	test_when_finished "git read-tree -u --reset HEAD" &&
> -	tree=$(git write-tree) &&
> -	test_when_finished "remove_object $tree" &&
> -	cmt=$(echo bogus | git commit-tree $tree) &&
> -	test_when_finished "remove_object $cmt" &&
> -	git update-ref refs/heads/bogus $cmt &&
> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
> -
> -	test_must_fail git fsck 2>out &&
> -	test_i18ngrep "$sha.*corrupt" out
> +test_expect_success 'object with hash mismatch' '
> +	test_create_repo hash-mismatch &&

This patch was originally sent to ML on 2021-03-28:
	https://lore.kernel.org/git/patch-2.6-3e547289408-20210328T025618Z-avarab@gmail.com/

since then, however, commit f0d4d398e2 (test-lib: split up and deprecate
test_create_repo(), 2021-05-10) has been merged ;-) so this line should
be:

	git init hash-mismatch &&


> +	(
> +		cd hash-mismatch &&
> +		oid=$(echo blob | git hash-object -w --stdin) &&
> +		old=$(test_oid_to_path "$oid") &&
> +		new=$(dirname $old)/$(test_oid ff_2) &&
> +		oid="$(dirname $new)$(basename $new)" &&
> +		mv .git/objects/$old .git/objects/$new &&
> +		git update-index --add --cacheinfo 100644 $oid foo &&
> +		tree=$(git write-tree) &&
> +		cmt=$(echo bogus | git commit-tree $tree) &&
> +		git update-ref refs/heads/bogus $cmt &&
> +		test_must_fail git fsck 2>out &&
> +		test_i18ngrep "$oid.*corrupt" out
> +	)
>   '
>   
>   test_expect_success 'branch pointing to non-commit' '
> 


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v4 01/21] fsck tests: refactor one test to use a sub-repo
  2021-06-24 22:00           ` Andrei Rybak
@ 2021-06-24 22:34             ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-06-24 22:34 UTC (permalink / raw)
  To: Andrei Rybak
  Cc: git, Junio C Hamano, Jeff King, Johannes Sixt, Jonathan Tan,
	Felipe Contreras


On Fri, Jun 25 2021, Andrei Rybak wrote:

> On 24/06/2021 21:23, Ævar Arnfjörð Bjarmason wrote:
>> Refactor one of the fsck tests to use a throwaway repository. It's a
>> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
>> teardown of a tests so we're not leaving corrupt content for the next
>> test.
>> We should instead simply use something like this test_create_repo
>> pattern. It's both less verbose, and makes things easier to debug as a
>> failing test can have their state left behind under -d without
>> damaging the state for other tests.
>> But let's punt on that general refactoring and just change this one
>> test, I'm going to change it further in subsequent commits.
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>   t/t1450-fsck.sh | 34 ++++++++++++++++------------------
>>   1 file changed, 16 insertions(+), 18 deletions(-)
>> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
>> index 5071ac63a5b..1563b35f88c 100755
>> --- a/t/t1450-fsck.sh
>> +++ b/t/t1450-fsck.sh
>> @@ -48,24 +48,22 @@ remove_object () {
>>   	rm "$(sha1_file "$1")"
>>   }
>>   -test_expect_success 'object with bad sha1' '
>> -	sha=$(echo blob | git hash-object -w --stdin) &&
>> -	old=$(test_oid_to_path "$sha") &&
>> -	new=$(dirname $old)/$(test_oid ff_2) &&
>> -	sha="$(dirname $new)$(basename $new)" &&
>> -	mv .git/objects/$old .git/objects/$new &&
>> -	test_when_finished "remove_object $sha" &&
>> -	git update-index --add --cacheinfo 100644 $sha foo &&
>> -	test_when_finished "git read-tree -u --reset HEAD" &&
>> -	tree=$(git write-tree) &&
>> -	test_when_finished "remove_object $tree" &&
>> -	cmt=$(echo bogus | git commit-tree $tree) &&
>> -	test_when_finished "remove_object $cmt" &&
>> -	git update-ref refs/heads/bogus $cmt &&
>> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
>> -
>> -	test_must_fail git fsck 2>out &&
>> -	test_i18ngrep "$sha.*corrupt" out
>> +test_expect_success 'object with hash mismatch' '
>> +	test_create_repo hash-mismatch &&
>
> This patch was originally sent to ML on 2021-03-28:
> 	https://lore.kernel.org/git/patch-2.6-3e547289408-20210328T025618Z-avarab@gmail.com/
>
> since then, however, commit f0d4d398e2 (test-lib: split up and deprecate
> test_create_repo(), 2021-05-10) has been merged ;-) so this line should
> be:
>
> 	git init hash-mismatch &&

Thanks, you'd think that the author of that code would be in a better
position than most to get this right, but apparently not :)

This series originally pre-dates that work, I see I have a couple of
more test_create_repo() in other patches here, will fix for a v5 pending
more discussion on v4, thanks!

^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-06-24 19:23       ` [PATCH v4 00/21] " Ævar Arnfjörð Bjarmason
                           ` (20 preceding siblings ...)
  2021-06-24 19:23         ` [PATCH v4 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37         ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                             ` (21 more replies)
  21 siblings, 22 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

This improves fsck error reporting, see the examples in the commit
messages of 18/21, 20/21 and 21/21, to get there I've lib-ified more
thigs in object-file.c and the general object APIs, i.e. now we'll
return error codes instead of calling die() in these cases.

The fsck improvements are rather obscure & trivial in the grand scheme
of things, but the object API improvements make it easier to work with
in general.

A trivial re-roll of v4 to s/test_create_repo/git init/g, pointed out
by Andrei Rybak, I changed them to "git init --bare" while I was at
it. For v4 see:

https://lore.kernel.org/git/cover-00.21-00000000000-20210624T191754Z-avarab@gmail.com/

Ævar Arnfjörð Bjarmason (21):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  cache.h: move object functions to object-store.h
  object-file.c: don't set "typep" when returning non-zero
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 178 +++++++++++++++++++++--------------------
 object-store.h         |  62 +++++++++++---
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  10 ++-
 t/t1006-cat-file.sh    | 169 ++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 407 insertions(+), 139 deletions(-)

Range-diff against v4:
 1:  2e37971c016 !  1:  a1259cdedcb fsck tests: refactor one test to use a sub-repo
    @@ t/t1450-fsck.sh: remove_object () {
     -	test_must_fail git fsck 2>out &&
     -	test_i18ngrep "$sha.*corrupt" out
     +test_expect_success 'object with hash mismatch' '
    -+	test_create_repo hash-mismatch &&
    ++	git init --bare hash-mismatch &&
     +	(
     +		cd hash-mismatch &&
     +		oid=$(echo blob | git hash-object -w --stdin) &&
     +		old=$(test_oid_to_path "$oid") &&
     +		new=$(dirname $old)/$(test_oid ff_2) &&
     +		oid="$(dirname $new)$(basename $new)" &&
    -+		mv .git/objects/$old .git/objects/$new &&
    ++		mv objects/$old objects/$new &&
     +		git update-index --add --cacheinfo 100644 $oid foo &&
     +		tree=$(git write-tree) &&
     +		cmt=$(echo bogus | git commit-tree $tree) &&
 2:  79630a99433 !  2:  634f991d7c6 fsck tests: add test for fsck-ing an unknown type
    @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
      '
      
     +test_expect_success 'fsck hard errors on an invalid object type' '
    -+	test_create_repo garbage-type &&
    ++	git init --bare garbage-type &&
     +	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
     +	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
     +	cat >err.expect <<-\EOF &&
 3:  2b5366bfb9d =  3:  ce9dcc423e9 cat-file tests: test for missing object with -t and -s
 4:  ea9a5ef0920 =  4:  50a20741e86 cat-file tests: test that --allow-unknown-type isn't on by default
 5:  8eaf0e6ddda =  5:  f8d0b630d0a rev-list tests: test for behavior with invalid object types
 6:  f0e9d92414e =  6:  43335e653b8 cat-file tests: add corrupt loose object test
 7:  d797d2e8e9d =  7:  a00dfea3fb8 cat-file tests: test for current --allow-unknown-type behavior
 8:  96310a0bb59 =  8:  387d7f08e61 cache.h: move object functions to object-store.h
 9:  54fb9189408 =  9:  e9520953956 object-file.c: don't set "typep" when returning non-zero
10:  9d36fcbc44a = 10:  a8b408eefe6 object-file.c: make parse_loose_header_extended() public
11:  74c308adc19 = 11:  31eee4da0e1 object-file.c: add missing braces to loose_object_info()
12:  3f52149bfde = 12:  dae5cfabd57 object-file.c: simplify unpack_loose_short_header()
13:  ba632be1520 = 13:  0d8385d8d12 object-file.c: split up ternary in parse_loose_header()
14:  ea4f446f5b1 = 14:  d1522291aee object-file.c: stop dying in parse_loose_header()
15:  aacef784eab = 15:  13d4141a21b object-file.c: guard against future bugs in loose_object_info()
16:  050cfc7808c = 16:  912c9edf362 object-file.c: return -1, not "status" from unpack_loose_header()
17:  78e3152fd94 = 17:  7e101f97646 object-file.c: return -2 on "header too long" in unpack_loose_header()
18:  f9bb1b799ac ! 18:  3c04065b0b0 fsck: don't hard die on invalid object types
    @@ t/t1450-fsck.sh: test_expect_success 'detect corrupt index file in fsck' '
      
     -test_expect_success 'fsck hard errors on an invalid object type' '
     +test_expect_success 'fsck error and recovery on invalid object type' '
    - 	test_create_repo garbage-type &&
    + 	git init --bare garbage-type &&
      	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
      	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
     -	cat >err.expect <<-\EOF &&
19:  acbea7e2a2a = 19:  ad920362594 object-store.h: move read_loose_object() below 'struct object_info'
20:  edc28de229d ! 20:  02a148af5cf fsck: report invalid types recorded in objects
    @@ t/t1450-fsck.sh: test_expect_success 'object with hash mismatch' '
      '
      
     +test_expect_success 'object with hash and type mismatch' '
    -+	test_create_repo hash-type-mismatch &&
    ++	git init --bare hash-type-mismatch &&
     +	(
     +		cd hash-type-mismatch &&
     +		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
     +		old=$(test_oid_to_path "$oid") &&
     +		new=$(dirname $old)/$(test_oid ff_2) &&
     +		oid="$(dirname $new)$(basename $new)" &&
    -+		mv .git/objects/$old .git/objects/$new &&
    ++		mv objects/$old objects/$new &&
     +		git update-index --add --cacheinfo 100644 $oid foo &&
     +		tree=$(git write-tree) &&
     +		cmt=$(echo bogus | git commit-tree $tree) &&
21:  e588c05f461 = 21:  730e0a6f805 fsck: report invalid object type-path combinations
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                             ` (20 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose, and makes things easier to debug as a
failing test can have their state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test, I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..7becab5ba1e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 02/21] fsck tests: add test for fsck-ing an unknown type
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                             ` (19 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 7becab5ba1e..f10d6f7b7e8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 03/21] cat-file tests: test for missing object with -t and -s
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 01/21] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 02/21] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                             ` (18 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Test for what happens when the -t and -s flags are asked to operate on
a missing object, this extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5d2dc99b74a..b71ef94329e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 04/21] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (2 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 03/21] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                             ` (17 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the --allow-unknown-type feature
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.

Before this change all the tests would succeed if --allow-unknown-type
was on by default, let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index b71ef94329e..dc01d7c4a9a 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 05/21] rev-list tests: test for behavior with invalid object types
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (3 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 04/21] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                             ` (16 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..edb2ed55846 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 06/21] cat-file tests: add corrupt loose object test
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (4 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 05/21] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                             ` (15 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index dc01d7c4a9a..7f10a92f0e4 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 07/21] cat-file tests: test for current --allow-unknown-type behavior
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (5 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 06/21] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                             ` (14 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 7f10a92f0e4..86fd2a90ca7 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 08/21] cache.h: move object functions to object-store.h
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (6 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 07/21] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                             ` (13 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index ba04ff8bd36..32ea1ea0474 100644
--- a/cache.h
+++ b/cache.h
@@ -1302,16 +1302,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index ec32c23dcb5..9117115a50c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,4 +477,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 09/21] object-file.c: don't set "typep" when returning non-zero
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (7 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 08/21] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                             ` (12 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index f233b440b22..9210e2e6fe4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1480,8 +1480,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 10/21] object-file.c: make parse_loose_header_extended() public
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (8 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 09/21] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                             ` (11 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c, that caller can simply pass
the required "struct object-info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, we
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index 9210e2e6fe4..e0ba1842272 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1340,8 +1340,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1401,14 +1402,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1463,10 +1456,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2547,6 +2540,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2561,7 +2556,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index 9117115a50c..d443964447c 100644
--- a/object-store.h
+++ b/object-store.h
@@ -480,7 +480,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 11/21] object-file.c: add missing braces to loose_object_info()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (9 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 10/21] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                             ` (10 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index e0ba1842272..646ca7f85d6 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1450,17 +1450,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1469,8 +1472,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 12/21] object-file.c: simplify unpack_loose_short_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (10 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 11/21] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
                             ` (9 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified, we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 60 ++++++++++++++++++--------------------------------
 object-store.h | 14 +++++++++++-
 streaming.c    |  3 ++-
 3 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/object-file.c b/object-file.c
index 646ca7f85d6..ef3a1517fed 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1210,11 +1210,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1225,44 +1226,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1410,9 +1392,11 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1446,11 +1430,10 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
@@ -2555,7 +2538,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/object-store.h b/object-store.h
index d443964447c..31327a7f6c3 100644
--- a/object-store.h
+++ b/object-store.h
@@ -477,9 +477,21 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 13/21] object-file.c: split up ternary in parse_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (11 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 12/21] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
                             ` (8 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

This minor formatting change serves to make a subsequent patch easier
to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index ef3a1517fed..e51cf2ca33e 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1381,7 +1381,10 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	return type;
 }
 
 static int loose_object_info(struct repository *r,
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 14/21] object-file.c: stop dying in parse_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (12 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 13/21] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
                             ` (7 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

Since the refactoring of parse_loose_header_extended() into
parse_loose_header() in an earlier commit, its interface accepts a
"unsigned long *sizep". Rather it accepts a "struct object_info *",
that structure will be populated with information about the object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead rely on its
callers to check the populated "oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

In another case added in c84a1f3ed4d (sha1_file: refactor read_object,
2017-06-21) (but the behavior pre-dated that) we did checks of "status
>= 0", because at that point "status" had become the return value of
parse_loose_header(). I.e. a non-negative "enum object_type" (unless
we -1, aka. OBJ_BAD).

Now that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 53 ++++++++++++++++++++++++++------------------------
 object-store.h | 13 +++++++++++--
 streaming.c    |  4 +++-
 3 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index e51cf2ca33e..31263335af9 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1322,9 +1322,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1346,15 +1344,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1384,7 +1373,11 @@ int parse_loose_header(const char *hdr,
 	if (*hdr)
 		return -1;
 
-	return type;
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1399,6 +1392,8 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int parsed_header = 0;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1430,6 +1425,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1440,18 +1437,20 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	if (!status) {
+		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+			/*
+			 * oi->{sizep,typep} are meaningless unless
+			 * parse_loose_header() returns >= 0.
+			 */
+			parsed_header = 1;
+		else
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && parsed_header && *oi->typep < 0)
+		die(_("invalid object type"));
 
-	if (status >= 0 && oi->contentp) {
+	if (parsed_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
 						  *oi->sizep, oid);
 		if (!*oi->contentp) {
@@ -1466,6 +1465,8 @@ static int loose_object_info(struct repository *r,
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2531,6 +2532,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2547,12 +2549,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 31327a7f6c3..65a8e4dc6a8 100644
--- a/object-store.h
+++ b/object-store.h
@@ -492,8 +492,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz, struct strbuf *hdrbuf);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..c3dc241d6a5 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -235,7 +236,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr),
 				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 15/21] object-file.c: guard against future bugs in loose_object_info()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (13 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 14/21] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                             ` (6 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

An earlier version of the preceding commit had a subtle bug where our
"type_scratch" (later assigned to "oi->typep") would be uninitialized
and used in the "!allow_unknown" case, at which point it would contain
a nonsensical value if we'd failed to call parse_loose_header().

The preceding commit introduced "parsed_header" variable to check for
this case, but I think we can do better, let's carry a "oi_header"
variable initially set to NULL, and only set it to "oi" once we're
past parse_loose_header().

This is functionally the same thing, but hopefully makes it even more
obvious in the future that we must not access the "typep" and
"sizep" (or "type_name") unless parse_loose_header() succeeds, but
that accessing other fields set earlier (such as the "disk_sizep" set
earlier) is OK.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 31263335af9..d41f444e6cc 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1393,7 +1393,7 @@ static int loose_object_info(struct repository *r,
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
 	enum object_type type_scratch;
-	int parsed_header = 0;
+	struct object_info *oi_header = NULL;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1441,18 +1441,20 @@ static int loose_object_info(struct repository *r,
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
 			/*
 			 * oi->{sizep,typep} are meaningless unless
-			 * parse_loose_header() returns >= 0.
+			 * parse_loose_header() returns >= 0. Let's
+			 * access them as "oi_header" (just an alias
+			 * for "oi") below to make that intent clear.
 			 */
-			parsed_header = 1;
+			oi_header = oi;
 		else
 			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
-	if (!allow_unknown && parsed_header && *oi->typep < 0)
+	if (!allow_unknown && oi_header && *oi_header->typep < 0)
 		die(_("invalid object type"));
 
-	if (parsed_header && oi->contentp) {
+	if (oi_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
+						  *oi_header->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 16/21] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (14 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 15/21] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                             ` (5 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later rename to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index d41f444e6cc..956ca260518 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1229,7 +1229,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (15 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 16/21] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                             ` (4 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

I tried setting up an enum just for these three return values, but I
think the result was less readable. Let's consider doing that if we
gain even more return values. For now let's do the next best thing and
enumerate our known return values, and BUG() if we encounter one we
don't know about.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 16 +++++++++++++---
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 956ca260518..1866115a1c5 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1243,7 +1243,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1264,7 +1264,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1433,9 +1433,19 @@ static int loose_object_info(struct repository *r,
 
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
-	if (hdr_ret < 0) {
+	switch (hdr_ret) {
+	case 0:
+		break;
+	case -1:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
+		break;
+	case -2:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
+	default:
+		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index 65a8e4dc6a8..1151ce8e820 100644
--- a/object-store.h
+++ b/object-store.h
@@ -481,13 +481,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 86fd2a90ca7..06d38e1fae6 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 18/21] fsck: don't hard die on invalid object types
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (16 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 17/21] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                             ` (3 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimal change
needed to not make fsck hard die on a repository that's been corrupted
in this manner. In subsequent commits we'll teach fsck to recognize
this particular type of corruption and emit a better error message.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..082dadd5629 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 1866115a1c5..8fb55fc6f58 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2536,7 +2536,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2544,6 +2545,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2566,8 +2568,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 1151ce8e820..94ff03072c1 100644
--- a/object-store.h
+++ b/object-store.h
@@ -245,7 +245,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f10d6f7b7e8..d8303db9709 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 19/21] object-store.h: move read_loose_object() below 'struct object_info'
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (17 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 18/21] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                             ` (2 subsequent siblings)
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it, moving it will avoid a forward declaration of the
struct.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index 94ff03072c1..72d668b1674 100644
--- a/object-store.h
+++ b/object-store.h
@@ -234,20 +234,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -388,6 +374,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 20/21] fsck: report invalid types recorded in objects
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (18 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 19/21] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-07-10 13:37           ` [PATCH v5 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall though from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 082dadd5629..07af0434db6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 8fb55fc6f58..e550ea0c7cf 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2534,9 +2534,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2544,10 +2543,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2563,7 +2561,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2585,8 +2583,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index 72d668b1674..96a5970f314 100644
--- a/object-store.h
+++ b/object-store.h
@@ -376,6 +376,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -383,9 +384,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d8303db9709..da2658155c7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v5 21/21] fsck: report invalid object type-path combinations
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (19 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 20/21] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-07-10 13:37           ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  21 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-07-10 13:37 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Felipe Contreras,
	Andrei Rybak, Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently, in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse
misreport whatever issue we encountered. So let's use the new bug()
function to ferry and return code up to fsck_loose() in that case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0f..48a3b6a7f8f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 07af0434db6..158b9dac9b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 3fbc5d70777..bf860b6555e 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1421,7 +1421,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index e550ea0c7cf..923ff759e19 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1039,9 +1039,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1049,8 +1051,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1075,9 +1077,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2534,6 +2536,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2583,9 +2586,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index 96a5970f314..9fc69016361 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -507,7 +508,8 @@ int unpack_loose_header(git_zstream *stream, unsigned char *map,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 14188453c56..5467ead3285 100644
--- a/object.c
+++ b/object.c
@@ -261,7 +261,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -272,7 +272,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index 4b089fe8ec0..e6aa4442c90 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 06d38e1fae6..72386cfec0e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index da2658155c7..7d0d57564b5 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.32.0.636.g43e71d69cff


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-07-10 13:37         ` [PATCH v5 00/21] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                             ` (20 preceding siblings ...)
  2021-07-10 13:37           ` [PATCH v5 21/21] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57           ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
                               ` (22 more replies)
  21 siblings, 23 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

This improves fsck error reporting, see the examples in the commit
messages of 19/22, 21/22 and 22/22. To get there I've lib-ified more
thigs in object-file.c and the general object APIs, i.e. now we'll
return error codes instead of calling die() in these cases.

This series has been in "needs review" state for a while. This re-roll
is mainly to bump it for the list's attention, but while I was at it I
addressed point from Jonathan Tan raised in a previous round: use an
enum instead of int for the unpack_loose_header() return value.

I think the v3 of this got a detailed review, and the v3->v4 delta
wasn't that big (although one commit in particular is a bit tricky):

    https://lore.kernel.org/git/cover-00.21-00000000000-20210624T191754Z-avarab@gmail.com/

I.e. it's just this part being substantially different from v3:

    https://lore.kernel.org/git/patch-12.21-3f52149bfde-20210624T191755Z-avarab@gmail.com/
    https://lore.kernel.org/git/patch-13.21-ba632be1520-20210624T191755Z-avarab@gmail.com/
    https://lore.kernel.org/git/patch-14.21-ea4f446f5b1-20210624T191755Z-avarab@gmail.com/

The v5 was then a trivial change of moving away from the
test_create_repo() test helper:
https://lore.kernel.org/git/cover-00.21-00000000000-20210710T133203Z-avarab@gmail.com/

Perhaps this whole thing just needs to be split up, e.g. the first 7
commits could be some "improve fsck tests" series, the middle part
some general small refactoring, and the real meaty work as of 14/22
its own third series...

But I've resisted that because I think while this is rather long the
first parts adding tests for missing behavior should assure reviewers
that the later changes are all properly tested.

I know the end-goal here isn't all that exciting in itself, but I've
really wanted to improve various things around fsck-ing, bad object
reporting etc., and that's been stalled on this first step for a
while.

Ævar Arnfjörð Bjarmason (22):
  fsck tests: refactor one test to use a sub-repo
  fsck tests: add test for fsck-ing an unknown type
  cat-file tests: test for missing object with -t and -s
  cat-file tests: test that --allow-unknown-type isn't on by default
  rev-list tests: test for behavior with invalid object types
  cat-file tests: add corrupt loose object test
  cat-file tests: test for current --allow-unknown-type behavior
  object-file.c: don't set "typep" when returning non-zero
  cache.h: move object functions to object-store.h
  object-file.c: make parse_loose_header_extended() public
  object-file.c: add missing braces to loose_object_info()
  object-file.c: simplify unpack_loose_short_header()
  object-file.c: split up ternary in parse_loose_header()
  object-file.c: stop dying in parse_loose_header()
  object-file.c: guard against future bugs in loose_object_info()
  object-file.c: return -1, not "status" from unpack_loose_header()
  object-file.c: return -2 on "header too long" in unpack_loose_header()
  object-file.c: use "enum" return type for unpack_loose_header()
  fsck: don't hard die on invalid object types
  object-store.h: move read_loose_object() below 'struct object_info'
  fsck: report invalid types recorded in objects
  fsck: report invalid object type-path combinations

 builtin/fast-export.c  |   2 +-
 builtin/fsck.c         |  28 ++++++-
 builtin/index-pack.c   |   2 +-
 builtin/mktag.c        |   3 +-
 cache.h                |  10 ---
 object-file.c          | 178 +++++++++++++++++++++--------------------
 object-store.h         |  75 ++++++++++++++---
 object.c               |   4 +-
 pack-check.c           |   3 +-
 streaming.c            |  29 ++++---
 t/t1006-cat-file.sh    | 169 ++++++++++++++++++++++++++++++++++++++
 t/t1450-fsck.sh        |  64 +++++++++++----
 t/t6115-rev-list-du.sh |  11 +++
 13 files changed, 432 insertions(+), 146 deletions(-)

Range-diff against v5:
 1:  a1259cdedcb =  1:  ebe89f65354 fsck tests: refactor one test to use a sub-repo
 2:  634f991d7c6 =  2:  9072eef3be3 fsck tests: add test for fsck-ing an unknown type
 3:  ce9dcc423e9 =  3:  d442a309178 cat-file tests: test for missing object with -t and -s
 4:  50a20741e86 =  4:  0358273022f cat-file tests: test that --allow-unknown-type isn't on by default
 5:  f8d0b630d0a =  5:  82db40ebf8a rev-list tests: test for behavior with invalid object types
 6:  43335e653b8 =  6:  d1ffd21acc5 cat-file tests: add corrupt loose object test
 7:  a00dfea3fb8 =  7:  22ab12c2282 cat-file tests: test for current --allow-unknown-type behavior
 9:  e9520953956 =  8:  38e4266772d object-file.c: don't set "typep" when returning non-zero
 8:  387d7f08e61 =  9:  5b9278e7bb4 cache.h: move object functions to object-store.h
10:  a8b408eefe6 = 10:  b15ad53414b object-file.c: make parse_loose_header_extended() public
11:  31eee4da0e1 = 11:  326eb74545d object-file.c: add missing braces to loose_object_info()
12:  dae5cfabd57 = 12:  4f829e9b727 object-file.c: simplify unpack_loose_short_header()
13:  0d8385d8d12 = 13:  90489d9e6ec object-file.c: split up ternary in parse_loose_header()
14:  d1522291aee = 14:  7c9819d37c5 object-file.c: stop dying in parse_loose_header()
15:  13d4141a21b = 15:  3fb660ff944 object-file.c: guard against future bugs in loose_object_info()
16:  912c9edf362 = 16:  9e7dbfb4aa3 object-file.c: return -1, not "status" from unpack_loose_header()
17:  7e101f97646 ! 17:  f28c4f0dfb4 object-file.c: return -2 on "header too long" in unpack_loose_header()
    @@ Commit message
         MAX_HEADER_LEN limit, or other negative values for "unable to unpack
         <OID> header".
     
    -    I tried setting up an enum just for these three return values, but I
    -    think the result was less readable. Let's consider doing that if we
    -    gain even more return values. For now let's do the next best thing and
    -    enumerate our known return values, and BUG() if we encounter one we
    -    don't know about.
    -
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## object-file.c ##
 -:  ----------- > 18:  1b7173a5b5b object-file.c: use "enum" return type for unpack_loose_header()
18:  3c04065b0b0 = 19:  ad1614dbb8d fsck: don't hard die on invalid object types
19:  ad920362594 = 20:  3bf3cf2299d object-store.h: move read_loose_object() below 'struct object_info'
20:  02a148af5cf = 21:  974f650cddf fsck: report invalid types recorded in objects
21:  730e0a6f805 ! 22:  804673a17b0 fsck: report invalid object type-path combinations
    @@ object-store.h: int oid_object_info_extended(struct repository *r,
      		      void **contents,
      		      struct object_info *oi,
      		      unsigned int oi_flags);
    -@@ object-store.h: int unpack_loose_header(git_zstream *stream, unsigned char *map,
    +@@ object-store.h: enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
      int parse_loose_header(const char *hdr, struct object_info *oi);
      
      int check_object_signature(struct repository *r, const struct object_id *oid,
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-16 19:40               ` Taylor Blau
  2021-09-07 10:57             ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
                               ` (21 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Refactor one of the fsck tests to use a throwaway repository. It's a
pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
teardown of a tests so we're not leaving corrupt content for the next
test.

We should instead simply use something like this test_create_repo
pattern. It's both less verbose, and makes things easier to debug as a
failing test can have their state left behind under -d without
damaging the state for other tests.

But let's punt on that general refactoring and just change this one
test, I'm going to change it further in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 34 ++++++++++++++++------------------
 1 file changed, 16 insertions(+), 18 deletions(-)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 5071ac63a5b..7becab5ba1e 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -48,24 +48,22 @@ remove_object () {
 	rm "$(sha1_file "$1")"
 }
 
-test_expect_success 'object with bad sha1' '
-	sha=$(echo blob | git hash-object -w --stdin) &&
-	old=$(test_oid_to_path "$sha") &&
-	new=$(dirname $old)/$(test_oid ff_2) &&
-	sha="$(dirname $new)$(basename $new)" &&
-	mv .git/objects/$old .git/objects/$new &&
-	test_when_finished "remove_object $sha" &&
-	git update-index --add --cacheinfo 100644 $sha foo &&
-	test_when_finished "git read-tree -u --reset HEAD" &&
-	tree=$(git write-tree) &&
-	test_when_finished "remove_object $tree" &&
-	cmt=$(echo bogus | git commit-tree $tree) &&
-	test_when_finished "remove_object $cmt" &&
-	git update-ref refs/heads/bogus $cmt &&
-	test_when_finished "git update-ref -d refs/heads/bogus" &&
-
-	test_must_fail git fsck 2>out &&
-	test_i18ngrep "$sha.*corrupt" out
+test_expect_success 'object with hash mismatch' '
+	git init --bare hash-mismatch &&
+	(
+		cd hash-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		test_i18ngrep "$oid.*corrupt" out
+	)
 '
 
 test_expect_success 'branch pointing to non-commit' '
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-16 19:51               ` Taylor Blau
  2021-09-07 10:57             ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
                               ` (20 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the fsck tests by checking what we do when we
encounter an unknown "garbage" type produced with hash-object's
--literally option.

This behavior needs to be improved, which'll be done in subsequent
patches, but for now let's test for the current behavior.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1450-fsck.sh | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 7becab5ba1e..f10d6f7b7e8 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
+test_expect_success 'fsck hard errors on an invalid object type' '
+	git init --bare garbage-type &&
+	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
+	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_done
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
  2021-09-07 10:57             ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-16 19:57               ` Taylor Blau
  2021-09-07 10:57             ` [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
                               ` (19 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Test for what happens when the -t and -s flags are asked to operate on
a missing object, this extends tests added in 3e370f9faf0 (t1006: add
tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
-s flags are the only ones that can be combined with
--allow-unknown-type, so let's test with and without that flag.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 18b3779ccb6..3a7b138fe4e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -315,6 +315,33 @@ test_expect_success '%(deltabase) reports packed delta bases' '
 	}
 '
 
+missing_oid=$(test_oid deadbeef)
+test_expect_success 'error on type of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -t $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -t --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
+test_expect_success 'error on size of missing object' '
+	cat >expect.err <<-\EOF &&
+	fatal: git cat-file: could not get object info
+	EOF
+	test_must_fail git cat-file -s $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err &&
+
+	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
+	test_must_be_empty out &&
+	test_cmp expect.err err
+'
+
 bogus_type="bogus"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (2 preceding siblings ...)
  2021-09-07 10:57             ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:57             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
                               ` (18 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:57 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the --allow-unknown-type feature
added in 39e4ae38804 (cat-file: teach cat-file a
'--allow-unknown-type' option, 2015-05-03). We should check that
--allow-unknown-type isn't on by default.

Before this change all the tests would succeed if --allow-unknown-type
was on by default, let's fix that by asserting that -t and -s die on a
"garbage" type without --allow-unknown-type.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 3a7b138fe4e..5e05ea0861e 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -347,6 +347,20 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -363,6 +377,21 @@ bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
 bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_type --literally -w --stdin)
 
+test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
+	cat >err.expect <<-EOF &&
+	error: unable to unpack $bogus_sha1 header
+	fatal: git cat-file: could not get object info
+	EOF
+
+	test_must_fail git cat-file -t $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual &&
+
+	test_must_fail git cat-file -s $bogus_sha1 >out.actual 2>err.actual &&
+	test_cmp err.expect err.actual &&
+	test_must_be_empty out.actual
+'
+
 test_expect_success "Type of broken object is correct when type is large" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (3 preceding siblings ...)
  2021-09-07 10:57             ` [PATCH v6 04/22] cat-file tests: test that --allow-unknown-type isn't on by default Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 20:40               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 06/22] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
                               ` (17 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for the "rev-list --disk-usage" feature
added in 16950f8384a (rev-list: add --disk-usage option for
calculating disk usage, 2021-02-09) to test for what happens when it's
asked to calculate the disk usage of invalid object types.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6115-rev-list-du.sh | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
index b4aef32b713..edb2ed55846 100755
--- a/t/t6115-rev-list-du.sh
+++ b/t/t6115-rev-list-du.sh
@@ -48,4 +48,15 @@ check_du HEAD
 check_du --objects HEAD
 check_du --objects HEAD^..HEAD
 
+test_expect_success 'setup garbage repository' '
+	git clone --bare . garbage.git &&
+	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
+	git -C garbage.git rev-list --objects --all --disk-usage &&
+
+	# Manually create a ref because "update-ref", "tag" etc. have
+	# no corresponding --literally option.
+	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
+	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage
+'
+
 test_done
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 06/22] cat-file tests: add corrupt loose object test
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (4 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
                               ` (16 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Fix a blindspot in the tests for "cat-file" (and by proxy, the guts of
object-file.c) by testing that when we can't decode a loose object
with zlib we'll emit an error from zlib.c.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 5e05ea0861e..8f3516db188 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -404,6 +404,58 @@ test_expect_success "Size of large broken object is correct when type is large"
 	test_cmp expect actual
 '
 
+test_expect_success 'cat-file -t and -s on corrupt loose object' '
+	git init --bare corrupt-loose.git &&
+	(
+		cd corrupt-loose.git &&
+
+		# Setup and create the empty blob and its path
+		empty_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$EMPTY_BLOB")) &&
+		git hash-object -w --stdin </dev/null &&
+
+		# Create another blob and its path
+		echo other >other.blob &&
+		other_blob=$(git hash-object -w --stdin <other.blob) &&
+		other_path=$(git rev-parse --git-path objects/$(test_oid_to_path "$other_blob")) &&
+
+		# Before the swap the size is 0
+		cat >out.expect <<-EOF &&
+		0
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# Swap the two to corrupt the repository
+		mv -f "$other_path" "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "hash mismatch" err.fsck &&
+
+		# confirm that cat-file is reading the new swapped-in
+		# blob...
+		cat >out.expect <<-EOF &&
+		blob
+		EOF
+		git cat-file -t "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# ... since it has a different size now.
+		cat >out.expect <<-EOF &&
+		6
+		EOF
+		git cat-file -s "$EMPTY_BLOB" >out.actual 2>err.actual &&
+		test_must_be_empty err.actual &&
+		test_cmp out.expect out.actual &&
+
+		# So far "cat-file" has been happy to spew the found
+		# content out as-is. Try to make it zlib-invalid.
+		mv -f other.blob "$empty_path" &&
+		test_must_fail git fsck 2>err.fsck &&
+		grep "^error: inflate: data stream error (" err.fsck
+	)
+'
+
 # Tests for git cat-file --follow-symlinks
 test_expect_success 'prep for symlink tests' '
 	echo_without_newline "$hello_content" >morx &&
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (5 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 06/22] cat-file tests: add corrupt loose object test Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
                               ` (15 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Add more tests for the current --allow-unknown-type behavior. As noted
in [1] I don't think much of this makes sense, but let's test for it
as-is so we can see if the behavior changes in the future.

1. https://lore.kernel.org/git/87r1i4qf4h.fsf@evledraar.gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t1006-cat-file.sh | 61 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 8f3516db188..98729f1edfc 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -361,6 +361,46 @@ test_expect_success 'die on broken object under -t and -s without --allow-unknow
 	test_must_be_empty out.actual
 '
 
+test_expect_success '-e is OK with a broken object without --allow-unknown-type' '
+	git cat-file -e $bogus_sha1
+'
+
+test_expect_success '-e can not be combined with --allow-unknown-type' '
+	test_expect_code 128 git cat-file -e --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '-p cannot print a broken object even with --allow-unknown-type' '
+	test_must_fail git cat-file -p $bogus_sha1 &&
+	test_expect_code 128 git cat-file -p --allow-unknown-type $bogus_sha1
+'
+
+test_expect_success '<type> <hash> does not work with objects of broken types' '
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type "bogus"
+	EOF
+	test_must_fail git cat-file $bogus_type $bogus_sha1 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'broken types combined with --batch and --batch-check' '
+	echo $bogus_sha1 >bogus-oid &&
+
+	cat >err.expect <<-\EOF &&
+	fatal: invalid object type
+	EOF
+
+	test_must_fail git cat-file --batch <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual &&
+
+	test_must_fail git cat-file --batch-check <bogus-oid 2>err.actual &&
+	test_cmp err.expect err.actual
+'
+
+test_expect_success 'the --batch and --batch-check options do not combine with --allow-unknown-type' '
+	test_expect_code 128 git cat-file --batch --allow-unknown-type <bogus-oid &&
+	test_expect_code 128 git cat-file --batch-check --allow-unknown-type <bogus-oid
+'
+
 test_expect_success "Type of broken object is correct" '
 	echo $bogus_type >expect &&
 	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
@@ -372,6 +412,27 @@ test_expect_success "Size of broken object is correct" '
 	git cat-file -s --allow-unknown-type $bogus_sha1 >actual &&
 	test_cmp expect actual
 '
+
+test_expect_success 'the --allow-unknown-type option does not consider replacement refs' '
+	cat >expect <<-EOF &&
+	$bogus_type
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual &&
+
+	# Create it manually, as "git replace" will die on bogus
+	# types.
+	head=$(git rev-parse --verify HEAD) &&
+	mkdir -p .git/refs/replace &&
+	echo $head >.git/refs/replace/$bogus_sha1 &&
+
+	cat >expect <<-EOF &&
+	commit
+	EOF
+	git cat-file -t --allow-unknown-type $bogus_sha1 >actual &&
+	test_cmp expect actual
+'
+
 bogus_type="abcdefghijklmnopqrstuvwxyz1234679"
 bogus_content="bogus"
 bogus_size=$(strlen "$bogus_content")
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (6 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 07/22] cat-file tests: test for current --allow-unknown-type behavior Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:29               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
                               ` (14 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

When the loose_object_info() function returns an error stop faking up
the "oi->typep" to OBJ_BAD. Let the return value of the function
itself suffice. This code cleanup simplifies subsequent changes.

That we set this at all is a relic from the past. Before
052fe5eaca9 (sha1_loose_object_info: make type lookup optional,
2013-07-12) we would always return the type_from_string(type) via the
parse_sha1_header() function, or -1 (i.e. OBJ_BAD) if we couldn't
parse it.

Then in a combination of 46f034483eb (sha1_file: support reading from
a loose object of unknown type, 2015-05-03) and
b3ea7dd32d6 (sha1_loose_object_info: handle errors from
unpack_sha1_rest, 2017-10-05) our API drifted even further towards
conflating the two again.

Having read the code paths involved carefully I think this is OK. We
are just about to return -1, and we have only one caller:
do_oid_object_info_extended(). That function will in turn go on to
return -1 when we return -1 here.

This might be introducing a subtle bug where a caller of
oid_object_info_extended() would inspect its "typep" and expect a
meaningful value if the function returned -1.

Such a problem would not occur for its simpler oid_object_info()
sister function. That one always returns the "enum object_type", which
in the case of -1 would be the OBJ_BAD.

Having read the code for all the callers of these functions I don't
believe any such bug is being introduced here, and in any case we'd
likely already have such a bug for the "sizep" member (although
blindly checking "typep" first would be a more common case).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/object-file.c b/object-file.c
index a8be8994814..bda3497d5ca 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1503,8 +1503,6 @@ static int loose_object_info(struct repository *r,
 		git_inflate_end(&stream);
 
 	munmap(map, mapsize);
-	if (status && oi->typep)
-		*oi->typep = status;
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 09/22] cache.h: move object functions to object-store.h
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (7 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:33               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
                               ` (13 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Move the declaration of some ancient object functions added in
e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
2005-06-01) from cache.h to object-store.h. This continues work
started in cbd53a2193d (object-store: move object access functions to
object-store.h, 2018-05-15).

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 cache.h        | 10 ----------
 object-store.h |  9 +++++++++
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index d23de693680..11a04a93436 100644
--- a/cache.h
+++ b/cache.h
@@ -1313,16 +1313,6 @@ char *xdg_cache_home(const char *filename);
 
 int git_open_cloexec(const char *name, int flags);
 #define git_open(name) git_open_cloexec(name, O_RDONLY)
-int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
-
-int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
-
-int finalize_object_file(const char *tmpfile, const char *filename);
-
-/* Helper to check and "touch" a file */
-int check_and_freshen_file(const char *fn, int freshen);
 
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
diff --git a/object-store.h b/object-store.h
index d24915ced1b..eb4876ec983 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,4 +485,13 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+int unpack_loose_header(git_zstream *stream, unsigned char *map,
+			unsigned long mapsize, void *buffer,
+			unsigned long bufsiz);
+int parse_loose_header(const char *hdr, unsigned long *sizep);
+int check_object_signature(struct repository *r, const struct object_id *oid,
+			   void *buf, unsigned long size, const char *type);
+int finalize_object_file(const char *tmpfile, const char *filename);
+int check_and_freshen_file(const char *fn, int freshen);
+
 #endif /* OBJECT_STORE_H */
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (8 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:39               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
                               ` (12 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Make the parse_loose_header_extended() function public and remove the
parse_loose_header() wrapper. The only direct user of it outside of
object-file.c itself was in streaming.c, that caller can simply pass
the required "struct object-info *" instead.

This change is being done in preparation for teaching
read_loose_object() to accept a flag to pass to
parse_loose_header(). It isn't strictly necessary for that change, we
could simply use parse_loose_header_extended() there, but will leave
the API in a better end state.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 21 ++++++++-------------
 object-store.h |  3 ++-
 streaming.c    |  5 ++++-
 3 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/object-file.c b/object-file.c
index bda3497d5ca..7a47af68bd8 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1363,8 +1363,9 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
-				       unsigned int flags)
+int parse_loose_header(const char *hdr,
+		       struct object_info *oi,
+		       unsigned int flags)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1424,14 +1425,6 @@ static int parse_loose_header_extended(const char *hdr, struct object_info *oi,
 	return *hdr ? -1 : type;
 }
 
-int parse_loose_header(const char *hdr, unsigned long *sizep)
-{
-	struct object_info oi = OBJECT_INFO_INIT;
-
-	oi.sizep = sizep;
-	return parse_loose_header_extended(hdr, &oi, 0);
-}
-
 static int loose_object_info(struct repository *r,
 			     const struct object_id *oid,
 			     struct object_info *oi, int flags)
@@ -1486,10 +1479,10 @@ static int loose_object_info(struct repository *r,
 	if (status < 0)
 		; /* Do nothing */
 	else if (hdrbuf.len) {
-		if ((status = parse_loose_header_extended(hdrbuf.buf, oi, flags)) < 0)
+		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header_extended(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
 
 	if (status >= 0 && oi->contentp) {
@@ -2573,6 +2566,8 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = size;
 
 	*contents = NULL;
 
@@ -2587,7 +2582,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, size);
+	*type = parse_loose_header(hdr, &oi, 0);
 	if (*type < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
diff --git a/object-store.h b/object-store.h
index eb4876ec983..25e641a606f 100644
--- a/object-store.h
+++ b/object-store.h
@@ -488,7 +488,8 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz);
-int parse_loose_header(const char *hdr, unsigned long *sizep);
+int parse_loose_header(const char *hdr, struct object_info *oi,
+		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index 5f480ad50c4..8beac62cbb7 100644
--- a/streaming.c
+++ b/streaming.c
@@ -223,6 +223,9 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      const struct object_id *oid,
 			      enum object_type *type)
 {
+	struct object_info oi = OBJECT_INFO_INIT;
+	oi.sizep = &st->size;
+
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
@@ -231,7 +234,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr)) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &st->size) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (9 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
                               ` (11 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Change the formatting in loose_object_info() to conform with our usual
coding style:

    When there are multiple arms to a conditional and some of them
    require braces, enclose even a single line block in braces for
    consistency -- Documentation/CodingGuidelines

This formatting-only change makes a subsequent commit easier to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 7a47af68bd8..878a4298c9b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1473,17 +1473,20 @@ static int loose_object_info(struct repository *r,
 		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
 			status = error(_("unable to unpack %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0)
+	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
-	if (status < 0)
-		; /* Do nothing */
-	else if (hdrbuf.len) {
+	}
+
+	if (status < 0) {
+		/* Do nothing */
+	} else if (hdrbuf.len) {
 		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
 			status = error(_("unable to parse %s header with --allow-unknown-type"),
 				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0)
+	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
 		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	}
 
 	if (status >= 0 && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
@@ -1492,8 +1495,9 @@ static int loose_object_info(struct repository *r,
 			git_inflate_end(&stream);
 			status = -1;
 		}
-	} else
+	} else {
 		git_inflate_end(&stream);
+	}
 
 	munmap(map, mapsize);
 	if (oi->sizep == &size_scratch)
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (10 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 11/22] object-file.c: add missing braces to loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
                               ` (10 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Combine the unpack_loose_short_header(),
unpack_loose_header_to_strbuf() and unpack_loose_header() functions
into one.

The unpack_loose_header_to_strbuf() function was added in
46f034483eb (sha1_file: support reading from a loose object of unknown
type, 2015-05-03).

Its code was mostly copy/pasted between it and both of
unpack_loose_header() and unpack_loose_short_header(). We now have a
single unpack_loose_header() function which accepts an optional
"struct strbuf *" instead.

I think the remaining unpack_loose_header() function could be further
simplified, we're carrying some complexity just to be able to emit a
garbage type longer than MAX_HEADER_LEN, we could alternatively just
say "we found a garbage type <first 32 bytes>..." instead. But let's
leave the current behavior in place for now.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 60 ++++++++++++++++++--------------------------------
 object-store.h | 14 +++++++++++-
 streaming.c    |  3 ++-
 3 files changed, 37 insertions(+), 40 deletions(-)

diff --git a/object-file.c b/object-file.c
index 878a4298c9b..2dd4cdd1ae0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,11 +1233,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-static int unpack_loose_short_header(git_zstream *stream,
-				     unsigned char *map, unsigned long mapsize,
-				     void *buffer, unsigned long bufsiz)
+int unpack_loose_header(git_zstream *stream,
+			unsigned char *map, unsigned long mapsize,
+			void *buffer, unsigned long bufsiz,
+			struct strbuf *header)
 {
-	int ret;
+	int status;
 
 	/* Get the data stream */
 	memset(stream, 0, sizeof(*stream));
@@ -1248,44 +1249,25 @@ static int unpack_loose_short_header(git_zstream *stream,
 
 	git_inflate_init(stream);
 	obj_read_unlock();
-	ret = git_inflate(stream, 0);
+	status = git_inflate(stream, 0);
 	obj_read_lock();
-
-	return ret;
-}
-
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz)
-{
-	int status = unpack_loose_short_header(stream, map, mapsize,
-					       buffer, bufsiz);
-
 	if (status < Z_OK)
 		return status;
 
-	/* Make sure we have the terminating NUL */
-	if (!memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
-		return -1;
-	return 0;
-}
-
-static int unpack_loose_header_to_strbuf(git_zstream *stream, unsigned char *map,
-					 unsigned long mapsize, void *buffer,
-					 unsigned long bufsiz, struct strbuf *header)
-{
-	int status;
-
-	status = unpack_loose_short_header(stream, map, mapsize, buffer, bufsiz);
-	if (status < Z_OK)
-		return -1;
-
 	/*
 	 * Check if entire header is unpacked in the first iteration.
 	 */
 	if (memchr(buffer, '\0', stream->next_out - (unsigned char *)buffer))
 		return 0;
 
+	/*
+	 * We have a header longer than MAX_HEADER_LEN. The "header"
+	 * here is only non-NULL when we run "cat-file
+	 * --allow-unknown-type".
+	 */
+	if (!header)
+		return -1;
+
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
 	 * result out to header, and then append the result of further
@@ -1433,9 +1415,11 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
+	int hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
 		oidclr(oi->delta_base_oid);
@@ -1469,11 +1453,10 @@ static int loose_object_info(struct repository *r,
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE)) {
-		if (unpack_loose_header_to_strbuf(&stream, map, mapsize, hdr, sizeof(hdr), &hdrbuf) < 0)
-			status = error(_("unable to unpack %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+
+	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				      allow_unknown ? &hdrbuf : NULL);
+	if (hdr_ret < 0) {
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
@@ -2581,7 +2564,8 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr)) < 0) {
+	if (unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
+				NULL) < 0) {
 		error(_("unable to unpack header of %s"), path);
 		goto out;
 	}
diff --git a/object-store.h b/object-store.h
index 25e641a606f..4064710ae29 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,9 +485,21 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+/**
+ * unpack_loose_header() initializes the data stream needed to unpack
+ * a loose object header.
+ *
+ * Returns 0 on success. Returns negative values on error.
+ *
+ * It will only parse up to MAX_HEADER_LEN bytes unless an optional
+ * "hdrbuf" argument is non-NULL. This is intended for use with
+ * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
+ * reporting. The full header will be extracted to "hdrbuf" for use
+ * with parse_loose_header().
+ */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz);
+			unsigned long bufsiz, struct strbuf *hdrbuf);
 int parse_loose_header(const char *hdr, struct object_info *oi,
 		       unsigned int flags);
 int check_object_signature(struct repository *r, const struct object_id *oid,
diff --git a/streaming.c b/streaming.c
index 8beac62cbb7..cb3c3cf6ff6 100644
--- a/streaming.c
+++ b/streaming.c
@@ -233,7 +233,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.mapped,
 				 st->u.loose.mapsize,
 				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr)) < 0) ||
+				 sizeof(st->u.loose.hdr),
+				 NULL) < 0) ||
 	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (11 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 12/22] object-file.c: simplify unpack_loose_short_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-16 21:58               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
                               ` (9 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

This minor formatting change serves to make a subsequent patch easier
to read.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index 2dd4cdd1ae0..7c6a865a6c0 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1404,7 +1404,10 @@ int parse_loose_header(const char *hdr,
 	/*
 	 * The length must be followed by a zero byte
 	 */
-	return *hdr ? -1 : type;
+	if (*hdr)
+		return -1;
+
+	return type;
 }
 
 static int loose_object_info(struct repository *r,
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 14/22] object-file.c: stop dying in parse_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (12 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  2:32               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
                               ` (8 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Start the libification of parse_loose_header() by making it return
error codes and data instead of invoking die() by itself. For now
we'll move the relevant die() call to loose_object_info() and
read_loose_object() to keep this change smaller, but in subsequent
commits we'll also libify those.

Since the refactoring of parse_loose_header_extended() into
parse_loose_header() in an earlier commit, its interface accepts a
"unsigned long *sizep". Rather it accepts a "struct object_info *",
that structure will be populated with information about the object.

It thus makes sense to further libify the interface so that it stops
calling die() when it encounters OBJ_BAD, and instead rely on its
callers to check the populated "oi->typep".

Because of this we don't need to pass in the "unsigned int flags"
which we used for OBJECT_INFO_ALLOW_UNKNOWN_TYPE, we can instead do
that check in loose_object_info().

This also refactors some confusing control flow around the "status"
variable. In some cases we set it to the return value of "error()",
i.e. -1, and later checked if "status < 0" was true.

In another case added in c84a1f3ed4d (sha1_file: refactor read_object,
2017-06-21) (but the behavior pre-dated that) we did checks of "status
>= 0", because at that point "status" had become the return value of
parse_loose_header(). I.e. a non-negative "enum object_type" (unless
we -1, aka. OBJ_BAD).

Now that parse_loose_header() will return 0 on success instead of the
type (which it'll stick into the "struct object_info") we don't need
to conflate these two cases in its callers.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 53 ++++++++++++++++++++++++++------------------------
 object-store.h | 13 +++++++++++--
 streaming.c    |  4 +++-
 3 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index 7c6a865a6c0..d656960422d 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1345,9 +1345,7 @@ static void *unpack_loose_rest(git_zstream *stream,
  * too permissive for what we want to check. So do an anal
  * object header parse by hand.
  */
-int parse_loose_header(const char *hdr,
-		       struct object_info *oi,
-		       unsigned int flags)
+int parse_loose_header(const char *hdr, struct object_info *oi)
 {
 	const char *type_buf = hdr;
 	unsigned long size;
@@ -1369,15 +1367,6 @@ int parse_loose_header(const char *hdr,
 	type = type_from_string_gently(type_buf, type_len, 1);
 	if (oi->type_name)
 		strbuf_add(oi->type_name, type_buf, type_len);
-	/*
-	 * Set type to 0 if its an unknown object and
-	 * we're obtaining the type using '--allow-unknown-type'
-	 * option.
-	 */
-	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
-		type = 0;
-	else if (type < 0)
-		die(_("invalid object type"));
 	if (oi->typep)
 		*oi->typep = type;
 
@@ -1407,7 +1396,11 @@ int parse_loose_header(const char *hdr,
 	if (*hdr)
 		return -1;
 
-	return type;
+	/*
+	 * The format is valid, but the type may still be bogus. The
+	 * Caller needs to check its oi->typep.
+	 */
+	return 0;
 }
 
 static int loose_object_info(struct repository *r,
@@ -1422,6 +1415,8 @@ static int loose_object_info(struct repository *r,
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
+	enum object_type type_scratch;
+	int parsed_header = 0;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1453,6 +1448,8 @@ static int loose_object_info(struct repository *r,
 
 	if (!oi->sizep)
 		oi->sizep = &size_scratch;
+	if (!oi->typep)
+		oi->typep = &type_scratch;
 
 	if (oi->disk_sizep)
 		*oi->disk_sizep = mapsize;
@@ -1463,18 +1460,20 @@ static int loose_object_info(struct repository *r,
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 	}
-
-	if (status < 0) {
-		/* Do nothing */
-	} else if (hdrbuf.len) {
-		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
-			status = error(_("unable to parse %s header with --allow-unknown-type"),
-				       oid_to_hex(oid));
-	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
-		status = error(_("unable to parse %s header"), oid_to_hex(oid));
+	if (!status) {
+		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
+			/*
+			 * oi->{sizep,typep} are meaningless unless
+			 * parse_loose_header() returns >= 0.
+			 */
+			parsed_header = 1;
+		else
+			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
+	if (!allow_unknown && parsed_header && *oi->typep < 0)
+		die(_("invalid object type"));
 
-	if (status >= 0 && oi->contentp) {
+	if (parsed_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
 						  *oi->sizep, oid);
 		if (!*oi->contentp) {
@@ -1489,6 +1488,8 @@ static int loose_object_info(struct repository *r,
 	if (oi->sizep == &size_scratch)
 		oi->sizep = NULL;
 	strbuf_release(&hdrbuf);
+	if (oi->typep == &type_scratch)
+		oi->typep = NULL;
 	oi->whence = OI_LOOSE;
 	return (status < 0) ? status : 0;
 }
@@ -2557,6 +2558,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	oi.typep = type;
 	oi.sizep = size;
 
 	*contents = NULL;
@@ -2573,12 +2575,13 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	*type = parse_loose_header(hdr, &oi, 0);
-	if (*type < 0) {
+	if (parse_loose_header(hdr, &oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
 	}
+	if (*type < 0)
+		die(_("invalid object type"));
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index 4064710ae29..584bf5556af 100644
--- a/object-store.h
+++ b/object-store.h
@@ -500,8 +500,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
 			unsigned long bufsiz, struct strbuf *hdrbuf);
-int parse_loose_header(const char *hdr, struct object_info *oi,
-		       unsigned int flags);
+
+/**
+ * parse_loose_header() parses the starting "<type> <len>\0" of an
+ * object. If it doesn't follow that format -1 is returned. To check
+ * the validity of the <type> populate the "typep" in the "struct
+ * object_info". It will be OBJ_BAD if the object type is unknown. The
+ * parsed <len> can be retrieved via "oi->sizep", and from there
+ * passed to unpack_loose_rest().
+ */
+int parse_loose_header(const char *hdr, struct object_info *oi);
+
 int check_object_signature(struct repository *r, const struct object_id *oid,
 			   void *buf, unsigned long size, const char *type);
 int finalize_object_file(const char *tmpfile, const char *filename);
diff --git a/streaming.c b/streaming.c
index cb3c3cf6ff6..c3dc241d6a5 100644
--- a/streaming.c
+++ b/streaming.c
@@ -225,6 +225,7 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 {
 	struct object_info oi = OBJECT_INFO_INIT;
 	oi.sizep = &st->size;
+	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
@@ -235,7 +236,8 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 				 st->u.loose.hdr,
 				 sizeof(st->u.loose.hdr),
 				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi, 0) < 0)) {
+	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
+	    *type < 0) {
 		git_inflate_end(&st->z);
 		munmap(st->u.loose.mapped, st->u.loose.mapsize);
 		return -1;
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (13 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  2:35               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
                               ` (7 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

An earlier version of the preceding commit had a subtle bug where our
"type_scratch" (later assigned to "oi->typep") would be uninitialized
and used in the "!allow_unknown" case, at which point it would contain
a nonsensical value if we'd failed to call parse_loose_header().

The preceding commit introduced "parsed_header" variable to check for
this case, but I think we can do better, let's carry a "oi_header"
variable initially set to NULL, and only set it to "oi" once we're
past parse_loose_header().

This is functionally the same thing, but hopefully makes it even more
obvious in the future that we must not access the "typep" and
"sizep" (or "type_name") unless parse_loose_header() succeeds, but
that accessing other fields set earlier (such as the "disk_sizep" set
earlier) is OK.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index d656960422d..ae6a37ab5fb 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1416,7 +1416,7 @@ static int loose_object_info(struct repository *r,
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
 	enum object_type type_scratch;
-	int parsed_header = 0;
+	struct object_info *oi_header = NULL;
 	int allow_unknown = flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 
 	if (oi->delta_base_oid)
@@ -1464,18 +1464,20 @@ static int loose_object_info(struct repository *r,
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
 			/*
 			 * oi->{sizep,typep} are meaningless unless
-			 * parse_loose_header() returns >= 0.
+			 * parse_loose_header() returns >= 0. Let's
+			 * access them as "oi_header" (just an alias
+			 * for "oi") below to make that intent clear.
 			 */
-			parsed_header = 1;
+			oi_header = oi;
 		else
 			status = error(_("unable to parse %s header"), oid_to_hex(oid));
 	}
-	if (!allow_unknown && parsed_header && *oi->typep < 0)
+	if (!allow_unknown && oi_header && *oi_header->typep < 0)
 		die(_("invalid object type"));
 
-	if (parsed_header && oi->contentp) {
+	if (oi_header && oi->contentp) {
 		*oi->contentp = unpack_loose_rest(&stream, hdr,
-						  *oi->sizep, oid);
+						  *oi_header->sizep, oid);
 		if (!*oi->contentp) {
 			git_inflate_end(&stream);
 			status = -1;
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (14 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
                               ` (6 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Return a -1 when git_inflate() fails instead of whatever Z_* status
we'd get from zlib.c. This makes no difference to any error we report,
but makes it more obvious that we don't care about the specific zlib
error codes here.

See d21f8426907 (unpack_sha1_header(): detect malformed object header,
2016-09-25) for the commit that added the "return status" code. As far
as I can tell there was never a real reason (e.g. different reporting)
for carrying down the "status" as opposed to "-1".

At the time that d21f8426907 was written there was a corresponding
"ret < Z_OK" check right after the unpack_sha1_header() call (the
"unpack_sha1_header()" function was later rename to our current
"unpack_loose_header()").

However, that check was removed in c84a1f3ed4d (sha1_file: refactor
read_object, 2017-06-21) without changing the corresponding return
code.

So let's do the minor cleanup of also changing this function to return
a -1.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object-file.c b/object-file.c
index ae6a37ab5fb..11df4485147 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1252,7 +1252,7 @@ int unpack_loose_header(git_zstream *stream,
 	status = git_inflate(stream, 0);
 	obj_read_lock();
 	if (status < Z_OK)
-		return status;
+		return -1;
 
 	/*
 	 * Check if entire header is unpacked in the first iteration.
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (15 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 16/22] object-file.c: return -1, not "status" from unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
                               ` (5 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Split up the return code for "header too long" from the generic
negative return value unpack_loose_header() returns, and report via
error() if we exceed MAX_HEADER_LEN.

As a test added earlier in this series in t1006-cat-file.sh shows
we'll correctly emit zlib errors from zlib.c already in this case, so
we have no need to carry those return codes further down the
stack. Let's instead just return -2 saying we ran into the
MAX_HEADER_LEN limit, or other negative values for "unable to unpack
<OID> header".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c       | 16 +++++++++++++---
 object-store.h      |  6 ++++--
 t/t1006-cat-file.sh |  2 +-
 3 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/object-file.c b/object-file.c
index 11df4485147..0cb5287d3ef 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1266,7 +1266,7 @@ int unpack_loose_header(git_zstream *stream,
 	 * --allow-unknown-type".
 	 */
 	if (!header)
-		return -1;
+		return -2;
 
 	/*
 	 * buffer[0..bufsiz] was not large enough.  Copy the partial
@@ -1287,7 +1287,7 @@ int unpack_loose_header(git_zstream *stream,
 		stream->next_out = buffer;
 		stream->avail_out = bufsiz;
 	} while (status != Z_STREAM_END);
-	return -1;
+	return -2;
 }
 
 static void *unpack_loose_rest(git_zstream *stream,
@@ -1456,9 +1456,19 @@ static int loose_object_info(struct repository *r,
 
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
-	if (hdr_ret < 0) {
+	switch (hdr_ret) {
+	case 0:
+		break;
+	case -1:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
+		break;
+	case -2:
+		status = error(_("header for %s too long, exceeds %d bytes"),
+			       oid_to_hex(oid), MAX_HEADER_LEN);
+		break;
+	default:
+		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index 584bf5556af..e896b813f24 100644
--- a/object-store.h
+++ b/object-store.h
@@ -489,13 +489,15 @@ int for_each_packed_object(each_packed_object_fn, void *,
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error.
+ * Returns 0 on success. Returns negative values on error. If the
+ * header exceeds MAX_HEADER_LEN -2 will be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header().
+ * with parse_loose_header(), -2 will still be returned from this
+ * function to indicate that the header was too long.
  */
 int unpack_loose_header(git_zstream *stream, unsigned char *map,
 			unsigned long mapsize, void *buffer,
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 98729f1edfc..43a9f4e7f0c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -440,7 +440,7 @@ bogus_sha1=$(echo_without_newline "$bogus_content" | git hash-object -t $bogus_t
 
 test_expect_success 'die on broken object with large type under -t and -s without --allow-unknown-type' '
 	cat >err.expect <<-EOF &&
-	error: unable to unpack $bogus_sha1 header
+	error: header for $bogus_sha1 too long, exceeds 32 bytes
 	fatal: git cat-file: could not get object info
 	EOF
 
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header()
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (16 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 17/22] object-file.c: return -2 on "header too long" in unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  2:45               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
                               ` (4 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

In the preceding commits we changed and documented
unpack_loose_header() from return any negative value or zero, to only
-2, -1 or 0. Let's instead add an "enum unpack_loose_header_result"
type and use it, and have the compiler assert that we're exhaustively
covering all return values. This gets rid of the need for having a
"default" BUG() case in loose_object_info().

I'm on the fence about whether this is more readable or worth it, but
since it was suggested in [1] to do this let's go for it.

1. https://lore.kernel.org/git/20210527175433.2673306-1-jonathantanmy@google.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c  | 20 ++++++++++----------
 object-store.h | 27 ++++++++++++++++++++-------
 streaming.c    | 27 ++++++++++++++++-----------
 3 files changed, 46 insertions(+), 28 deletions(-)

diff --git a/object-file.c b/object-file.c
index 0cb5287d3ef..9484c7ce2be 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1233,10 +1233,12 @@ void *map_loose_object(struct repository *r,
 	return map_loose_object_1(r, NULL, oid, size);
 }
 
-int unpack_loose_header(git_zstream *stream,
-			unsigned char *map, unsigned long mapsize,
-			void *buffer, unsigned long bufsiz,
-			struct strbuf *header)
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *header)
 {
 	int status;
 
@@ -1411,7 +1413,7 @@ static int loose_object_info(struct repository *r,
 	unsigned long mapsize;
 	void *map;
 	git_zstream stream;
-	int hdr_ret;
+	enum unpack_loose_header_result hdr_ret;
 	char hdr[MAX_HEADER_LEN];
 	struct strbuf hdrbuf = STRBUF_INIT;
 	unsigned long size_scratch;
@@ -1457,18 +1459,16 @@ static int loose_object_info(struct repository *r,
 	hdr_ret = unpack_loose_header(&stream, map, mapsize, hdr, sizeof(hdr),
 				      allow_unknown ? &hdrbuf : NULL);
 	switch (hdr_ret) {
-	case 0:
+	case UNPACK_LOOSE_HEADER_RESULT_OK:
 		break;
-	case -1:
+	case UNPACK_LOOSE_HEADER_RESULT_BAD:
 		status = error(_("unable to unpack %s header"),
 			       oid_to_hex(oid));
 		break;
-	case -2:
+	case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
 		status = error(_("header for %s too long, exceeds %d bytes"),
 			       oid_to_hex(oid), MAX_HEADER_LEN);
 		break;
-	default:
-		BUG("unknown hdr_ret value %d", hdr_ret);
 	}
 	if (!status) {
 		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
diff --git a/object-store.h b/object-store.h
index e896b813f24..ac55b02f15a 100644
--- a/object-store.h
+++ b/object-store.h
@@ -485,23 +485,36 @@ int for_each_object_in_pack(struct packed_git *p,
 int for_each_packed_object(each_packed_object_fn, void *,
 			   enum for_each_object_flags flags);
 
+enum unpack_loose_header_result {
+	UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG = -2,
+	UNPACK_LOOSE_HEADER_RESULT_BAD = -1,
+	UNPACK_LOOSE_HEADER_RESULT_OK,
+
+};
+
 /**
  * unpack_loose_header() initializes the data stream needed to unpack
  * a loose object header.
  *
- * Returns 0 on success. Returns negative values on error. If the
- * header exceeds MAX_HEADER_LEN -2 will be returned.
+ * Returns UNPACK_LOOSE_HEADER_RESULT_OK on success. Returns
+ * UNPACK_LOOSE_HEADER_RESULT_BAD values on error, or if the header
+ * exceeds MAX_HEADER_LEN UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG will
+ * be returned.
  *
  * It will only parse up to MAX_HEADER_LEN bytes unless an optional
  * "hdrbuf" argument is non-NULL. This is intended for use with
  * OBJECT_INFO_ALLOW_UNKNOWN_TYPE to extract the bad type for (error)
  * reporting. The full header will be extracted to "hdrbuf" for use
- * with parse_loose_header(), -2 will still be returned from this
- * function to indicate that the header was too long.
+ * with parse_loose_header(), UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG
+ * will still be returned from this function to indicate that the
+ * header was too long.
  */
-int unpack_loose_header(git_zstream *stream, unsigned char *map,
-			unsigned long mapsize, void *buffer,
-			unsigned long bufsiz, struct strbuf *hdrbuf);
+enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
+						    unsigned char *map,
+						    unsigned long mapsize,
+						    void *buffer,
+						    unsigned long bufsiz,
+						    struct strbuf *hdrbuf);
 
 /**
  * parse_loose_header() parses the starting "<type> <len>\0" of an
diff --git a/streaming.c b/streaming.c
index c3dc241d6a5..3e5045c004d 100644
--- a/streaming.c
+++ b/streaming.c
@@ -224,24 +224,25 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 			      enum object_type *type)
 {
 	struct object_info oi = OBJECT_INFO_INIT;
+	enum unpack_loose_header_result hdr_ret;
 	oi.sizep = &st->size;
 	oi.typep = type;
 
 	st->u.loose.mapped = map_loose_object(r, oid, &st->u.loose.mapsize);
 	if (!st->u.loose.mapped)
 		return -1;
-	if ((unpack_loose_header(&st->z,
-				 st->u.loose.mapped,
-				 st->u.loose.mapsize,
-				 st->u.loose.hdr,
-				 sizeof(st->u.loose.hdr),
-				 NULL) < 0) ||
-	    (parse_loose_header(st->u.loose.hdr, &oi) < 0) ||
-	    *type < 0) {
-		git_inflate_end(&st->z);
-		munmap(st->u.loose.mapped, st->u.loose.mapsize);
-		return -1;
+	hdr_ret = unpack_loose_header(&st->z, st->u.loose.mapped,
+				      st->u.loose.mapsize, st->u.loose.hdr,
+				      sizeof(st->u.loose.hdr), NULL);
+	switch (hdr_ret) {
+	case UNPACK_LOOSE_HEADER_RESULT_OK:
+		break;
+	case UNPACK_LOOSE_HEADER_RESULT_BAD:
+	case UNPACK_LOOSE_HEADER_RESULT_BAD_TOO_LONG:
+		goto error;
 	}
+	if (parse_loose_header(st->u.loose.hdr, &oi) < 0 || *type < 0)
+		goto error;
 
 	st->u.loose.hdr_used = strlen(st->u.loose.hdr) + 1;
 	st->u.loose.hdr_avail = st->z.total_out;
@@ -250,6 +251,10 @@ static int open_istream_loose(struct git_istream *st, struct repository *r,
 	st->read = read_istream_loose;
 
 	return 0;
+error:
+	git_inflate_end(&st->z);
+	munmap(st->u.loose.mapped, st->u.loose.mapsize);
+	return -1;
 }
 
 
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 19/22] fsck: don't hard die on invalid object types
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (17 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  3:37               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
                               ` (3 subsequent siblings)
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Change the error fsck emits on invalid object types, such as:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    <OID>

From the very ungraceful error of:

    $ git fsck
    fatal: invalid object type
    $

To:

    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ the rest of the fsck output here, i.e. it didn't hard die ]

We'll still exit with non-zero, but now we'll finish the rest of the
traversal. The tests that's being added here asserts that we'll still
complain about other fsck issues (e.g. an unrelated dangling blob).

To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
flag from read_loose_object() through to parse_loose_header(). Since
the read_loose_object() function is only used in builtin/fsck.c we can
simply change it. See f6371f92104 (sha1_file: add read_loose_object()
function, 2017-01-13) for the introduction of read_loose_object().

Why are we complaining about a "hash mismatch" for an object of a type
we don't know about? We shouldn't. This is the bare minimal change
needed to not make fsck hard die on a repository that's been corrupted
in this manner. In subsequent commits we'll teach fsck to recognize
this particular type of corruption and emit a better error message.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  |  3 ++-
 object-file.c   | 11 ++++++++---
 object-store.h  |  3 ++-
 t/t1450-fsck.sh | 14 +++++++-------
 4 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index b42b6fe21f7..082dadd5629 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	void *contents;
 	int eaten;
 
-	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
+	if (read_loose_object(path, oid, &type, &size, &contents,
+			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
 		errors_found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
diff --git a/object-file.c b/object-file.c
index 9484c7ce2be..0e6937fad73 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2562,7 +2562,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents)
+		      void **contents,
+		      unsigned int oi_flags)
 {
 	int ret = -1;
 	void *map = NULL;
@@ -2570,6 +2571,7 @@ int read_loose_object(const char *path,
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
 	struct object_info oi = OBJECT_INFO_INIT;
+	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	oi.typep = type;
 	oi.sizep = size;
 
@@ -2592,8 +2594,11 @@ int read_loose_object(const char *path,
 		git_inflate_end(&stream);
 		goto out;
 	}
-	if (*type < 0)
-		die(_("invalid object type"));
+	if (!allow_unknown && *type < 0) {
+		error(_("header for %s declares an unknown type"), path);
+		git_inflate_end(&stream);
+		goto out;
+	}
 
 	if (*type == OBJ_BLOB && *size > big_file_threshold) {
 		if (check_stream_oid(&stream, hdr, *size, path, expected_oid) < 0)
diff --git a/object-store.h b/object-store.h
index ac55b02f15a..c268662f5ba 100644
--- a/object-store.h
+++ b/object-store.h
@@ -253,7 +253,8 @@ int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
 		      enum object_type *type,
 		      unsigned long *size,
-		      void **contents);
+		      void **contents,
+		      unsigned int oi_flags);
 
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index f10d6f7b7e8..d8303db9709 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -863,16 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
 	test_i18ngrep "bad index file" errors
 '
 
-test_expect_success 'fsck hard errors on an invalid object type' '
+test_expect_success 'fsck error and recovery on invalid object type' '
 	git init --bare garbage-type &&
 	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
-	cat >err.expect <<-\EOF &&
-	fatal: invalid object type
-	EOF
-	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
-	test_cmp err.expect err.actual &&
-	test_must_be_empty out.actual
+	test_must_fail git -C garbage-type fsck >out 2>err &&
+	grep -e "^error" -e "^fatal" err >errors &&
+	test_line_count = 2 errors &&
+	grep "error: hash mismatch for" err &&
+	grep "$garbage_blob: object corrupt or missing:" err &&
+	grep "dangling blob $empty_blob" out
 '
 
 test_done
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info'
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (18 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-07 10:58             ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
                               ` (2 subsequent siblings)
  22 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Move the declaration of read_loose_object() below "struct
object_info". In the next commit we'll add a "struct object_info *"
parameter to it, moving it will avoid a forward declaration of the
struct.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-store.h | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/object-store.h b/object-store.h
index c268662f5ba..dc638335e7d 100644
--- a/object-store.h
+++ b/object-store.h
@@ -242,20 +242,6 @@ int pretend_object_file(void *, unsigned long, enum object_type,
 
 int force_object_loose(const struct object_id *oid, time_t mtime);
 
-/*
- * Open the loose object at path, check its hash, and return the contents,
- * type, and size. If the object is a blob, then "contents" may return NULL,
- * to allow streaming of large blobs.
- *
- * Returns 0 on success, negative on error (details may be written to stderr).
- */
-int read_loose_object(const char *path,
-		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
-		      void **contents,
-		      unsigned int oi_flags);
-
 /* Retry packed storage after checking packed and loose storage */
 #define HAS_OBJECT_RECHECK_PACKED 1
 
@@ -396,6 +382,20 @@ int oid_object_info_extended(struct repository *r,
 			     const struct object_id *,
 			     struct object_info *, unsigned flags);
 
+/*
+ * Open the loose object at path, check its hash, and return the contents,
+ * type, and size. If the object is a blob, then "contents" may return NULL,
+ * to allow streaming of large blobs.
+ *
+ * Returns 0 on success, negative on error (details may be written to stderr).
+ */
+int read_loose_object(const char *path,
+		      const struct object_id *expected_oid,
+		      enum object_type *type,
+		      unsigned long *size,
+		      void **contents,
+		      unsigned int oi_flags);
+
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 21/22] fsck: report invalid types recorded in objects
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (19 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 20/22] object-store.h: move read_loose_object() below 'struct object_info' Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  3:57               ` Taylor Blau
  2021-09-07 10:58             ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
  2021-09-17  4:08             ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Continue the work in the preceding commit and improve the error on:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    $ git fsck
    error: hash mismatch for <OID_PATH> (expected <OID>)
    error: <OID>: object corrupt or missing: <OID_PATH>
    [ other fsck output ]

To instead emit:

    $ git fsck
    error: <OID>: object is of unknown type 'garbage': <OID_PATH>
    [ other fsck output ]

The complaint about a "hash mismatch" was simply an emergent property
of how we'd fall though from read_loose_object() into fsck_loose()
when we didn't get the data we expected. Now we'll correctly note that
the object type is invalid.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fsck.c  | 22 ++++++++++++++++++----
 object-file.c   | 13 +++++--------
 object-store.h  |  4 ++--
 t/t1450-fsck.sh | 24 +++++++++++++++++++++---
 4 files changed, 46 insertions(+), 17 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 082dadd5629..07af0434db6 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	unsigned long size;
 	void *contents;
 	int eaten;
-
-	if (read_loose_object(path, oid, &type, &size, &contents,
-			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
-		errors_found |= ERROR_OBJECT;
+	struct strbuf sb = STRBUF_INIT;
+	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
+	struct object_info oi;
+	int found = 0;
+	oi.type_name = &sb;
+	oi.sizep = &size;
+	oi.typep = &type;
+
+	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+		found |= ERROR_OBJECT;
 		error(_("%s: object corrupt or missing: %s"),
 		      oid_to_hex(oid), path);
+	}
+	if (type < 0) {
+		found |= ERROR_OBJECT;
+		error(_("%s: object is of unknown type '%s': %s"),
+		      oid_to_hex(oid), sb.buf, path);
+	}
+	if (found) {
+		errors_found |= ERROR_OBJECT;
 		return 0; /* keep checking other objects */
 	}
 
diff --git a/object-file.c b/object-file.c
index 0e6937fad73..f4850ba62b4 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2560,9 +2560,8 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags)
 {
 	int ret = -1;
@@ -2570,10 +2569,9 @@ int read_loose_object(const char *path,
 	unsigned long mapsize;
 	git_zstream stream;
 	char hdr[MAX_HEADER_LEN];
-	struct object_info oi = OBJECT_INFO_INIT;
 	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
-	oi.typep = type;
-	oi.sizep = size;
+	enum object_type *type = oi->typep;
+	unsigned long *size = oi->sizep;
 
 	*contents = NULL;
 
@@ -2589,7 +2587,7 @@ int read_loose_object(const char *path,
 		goto out;
 	}
 
-	if (parse_loose_header(hdr, &oi) < 0) {
+	if (parse_loose_header(hdr, oi) < 0) {
 		error(_("unable to parse header of %s"), path);
 		git_inflate_end(&stream);
 		goto out;
@@ -2611,8 +2609,7 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size,
-					   type_name(*type))) {
+					   *contents, *size, oi->type_name->buf)) {
 			error(_("hash mismatch for %s (expected %s)"), path,
 			      oid_to_hex(expected_oid));
 			free(*contents);
diff --git a/object-store.h b/object-store.h
index dc638335e7d..f3045148b89 100644
--- a/object-store.h
+++ b/object-store.h
@@ -384,6 +384,7 @@ int oid_object_info_extended(struct repository *r,
 
 /*
  * Open the loose object at path, check its hash, and return the contents,
+ * use the "oi" argument to assert things about the object, or e.g. populate its
  * type, and size. If the object is a blob, then "contents" may return NULL,
  * to allow streaming of large blobs.
  *
@@ -391,9 +392,8 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
-		      enum object_type *type,
-		      unsigned long *size,
 		      void **contents,
+		      struct object_info *oi,
 		      unsigned int oi_flags);
 
 /*
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index d8303db9709..da2658155c7 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -66,6 +66,25 @@ test_expect_success 'object with hash mismatch' '
 	)
 '
 
+test_expect_success 'object with hash and type mismatch' '
+	git init --bare hash-type-mismatch &&
+	(
+		cd hash-type-mismatch &&
+		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		old=$(test_oid_to_path "$oid") &&
+		new=$(dirname $old)/$(test_oid ff_2) &&
+		oid="$(dirname $new)$(basename $new)" &&
+		mv objects/$old objects/$new &&
+		git update-index --add --cacheinfo 100644 $oid foo &&
+		tree=$(git write-tree) &&
+		cmt=$(echo bogus | git commit-tree $tree) &&
+		git update-ref refs/heads/bogus $cmt &&
+		test_must_fail git fsck 2>out &&
+		grep "^error: hash mismatch for " out &&
+		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+	)
+'
+
 test_expect_success 'branch pointing to non-commit' '
 	git rev-parse HEAD^{tree} >.git/refs/heads/invalid &&
 	test_when_finished "git update-ref -d refs/heads/invalid" &&
@@ -869,9 +888,8 @@ test_expect_success 'fsck error and recovery on invalid object type' '
 	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
 	test_must_fail git -C garbage-type fsck >out 2>err &&
 	grep -e "^error" -e "^fatal" err >errors &&
-	test_line_count = 2 errors &&
-	grep "error: hash mismatch for" err &&
-	grep "$garbage_blob: object corrupt or missing:" err &&
+	test_line_count = 1 errors &&
+	grep "$garbage_blob: object is of unknown type '"'"'garbage'"'"':" err &&
 	grep "dangling blob $empty_blob" out
 '
 
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* [PATCH v6 22/22] fsck: report invalid object type-path combinations
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (20 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-09-07 10:58             ` Ævar Arnfjörð Bjarmason
  2021-09-17  4:06               ` Taylor Blau
  2021-09-17  4:08             ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Taylor Blau
  22 siblings, 1 reply; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-07 10:58 UTC (permalink / raw)
  To: git
  Cc: Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak,
	Ævar Arnfjörð Bjarmason

Improve the error that's emitted in cases where we find a loose object
we parse, but which isn't at the location we expect it to be.

Before this change we'd prefix the error with a not-a-OID derived from
the path at which the object was found, due to an emergent behavior in
how we'd end up with an "OID" in these codepaths.

Now we'll instead say what object we hashed, and what path it was
found at. Before this patch series e.g.:

    $ git hash-object --stdin -w -t blob </dev/null
    e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    $ mv objects/e6/ objects/e7

Would emit ("[...]" used to abbreviate the OIDs):

    git fsck
    error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
    error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]

Now we'll instead emit:

    error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Furthermore, we'll do the right thing when the object type and its
location are bad. I.e. this case:

    $ git hash-object --stdin -w -t garbage --literally </dev/null
    8315a83d2acc4c174aed59430f9a9c4ed926440f
    $ mv objects/83 objects/84

As noted in an earlier commits we'd simply die early in those cases,
until preceding commits fixed the hard die on invalid object type:

    $ git fsck
    fatal: invalid object type

Now we'll instead emit sensible error messages:

    $ git fsck
    error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...]
    error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...]

In both fsck.c and object-file.c we're using null_oid as a sentinel
value for checking whether we got far enough to be certain that the
issue was indeed this OID mismatch.

In the case of check_object_signature() I don't really trust all the
moving parts there to behave consistently, in the face of future
refactorings. Getting it wrong would mean that we'd potentially emit
no error at all on a failing check_object_signature(), or worse
misreport whatever issue we encountered. So let's use the new bug()
function to ferry and return code up to fsck_loose() in that case.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 builtin/fast-export.c |  2 +-
 builtin/fsck.c        | 13 +++++++++----
 builtin/index-pack.c  |  2 +-
 builtin/mktag.c       |  3 ++-
 object-file.c         | 21 ++++++++++++---------
 object-store.h        |  4 +++-
 object.c              |  4 ++--
 pack-check.c          |  3 ++-
 t/t1006-cat-file.sh   |  2 +-
 t/t1450-fsck.sh       |  8 +++++---
 10 files changed, 38 insertions(+), 24 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 3c20f164f0f..48a3b6a7f8f 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -312,7 +312,7 @@ static void export_blob(const struct object_id *oid)
 		if (!buf)
 			die("could not read blob %s", oid_to_hex(oid));
 		if (check_object_signature(the_repository, oid, buf, size,
-					   type_name(type)) < 0)
+					   type_name(type), NULL) < 0)
 			die("oid mismatch in blob %s", oid_to_hex(oid));
 		object = parse_object_buffer(the_repository, oid, type,
 					     size, buf, &eaten);
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 07af0434db6..158b9dac9b3 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
 	struct strbuf sb = STRBUF_INIT;
 	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
 	struct object_info oi;
+	struct object_id real_oid = *null_oid();
 	int found = 0;
 	oi.type_name = &sb;
 	oi.sizep = &size;
 	oi.typep = &type;
 
-	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
+	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
 		found |= ERROR_OBJECT;
-		error(_("%s: object corrupt or missing: %s"),
-		      oid_to_hex(oid), path);
+		if (!oideq(&real_oid, oid))
+			error(_("%s: hash-path mismatch, found at: %s"),
+			      oid_to_hex(&real_oid), path);
+		else
+			error(_("%s: object corrupt or missing: %s"),
+			      oid_to_hex(oid), path);
 	}
 	if (type < 0) {
 		found |= ERROR_OBJECT;
 		error(_("%s: object is of unknown type '%s': %s"),
-		      oid_to_hex(oid), sb.buf, path);
+		      oid_to_hex(&real_oid), sb.buf, path);
 	}
 	if (found) {
 		errors_found |= ERROR_OBJECT;
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 8336466865c..9f540e0236a 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1419,7 +1419,7 @@ static void fix_unresolved_deltas(struct hashfile *f)
 
 		if (check_object_signature(the_repository, &d->oid,
 					   data, size,
-					   type_name(type)))
+					   type_name(type), NULL))
 			die(_("local object %s is corrupt"), oid_to_hex(&d->oid));
 
 		/*
diff --git a/builtin/mktag.c b/builtin/mktag.c
index dddcccdd368..3b2dbbb37e6 100644
--- a/builtin/mktag.c
+++ b/builtin/mktag.c
@@ -62,7 +62,8 @@ static int verify_object_in_tag(struct object_id *tagged_oid, int *tagged_type)
 
 	repl = lookup_replace_object(the_repository, tagged_oid);
 	ret = check_object_signature(the_repository, repl,
-				     buffer, size, type_name(*tagged_type));
+				     buffer, size, type_name(*tagged_type),
+				     NULL);
 	free(buffer);
 
 	return ret;
diff --git a/object-file.c b/object-file.c
index f4850ba62b4..07b3e4d9b4b 100644
--- a/object-file.c
+++ b/object-file.c
@@ -1062,9 +1062,11 @@ void *xmmap(void *start, size_t length,
  * the streaming interface and rehash it to do the same.
  */
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *map, unsigned long size, const char *type)
+			   void *map, unsigned long size, const char *type,
+			   struct object_id *real_oidp)
 {
-	struct object_id real_oid;
+	struct object_id tmp;
+	struct object_id *real_oid = real_oidp ? real_oidp : &tmp;
 	enum object_type obj_type;
 	struct git_istream *st;
 	git_hash_ctx c;
@@ -1072,8 +1074,8 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 	int hdrlen;
 
 	if (map) {
-		hash_object_file(r->hash_algo, map, size, type, &real_oid);
-		return !oideq(oid, &real_oid) ? -1 : 0;
+		hash_object_file(r->hash_algo, map, size, type, real_oid);
+		return !oideq(oid, real_oid) ? -1 : 0;
 	}
 
 	st = open_istream(r, oid, &obj_type, &size, NULL);
@@ -1098,9 +1100,9 @@ int check_object_signature(struct repository *r, const struct object_id *oid,
 			break;
 		r->hash_algo->update_fn(&c, buf, readlen);
 	}
-	r->hash_algo->final_oid_fn(&real_oid, &c);
+	r->hash_algo->final_oid_fn(real_oid, &c);
 	close_istream(st);
-	return !oideq(oid, &real_oid) ? -1 : 0;
+	return !oideq(oid, real_oid) ? -1 : 0;
 }
 
 int git_open_cloexec(const char *name, int flags)
@@ -2560,6 +2562,7 @@ static int check_stream_oid(git_zstream *stream,
 
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags)
@@ -2609,9 +2612,9 @@ int read_loose_object(const char *path,
 			goto out;
 		}
 		if (check_object_signature(the_repository, expected_oid,
-					   *contents, *size, oi->type_name->buf)) {
-			error(_("hash mismatch for %s (expected %s)"), path,
-			      oid_to_hex(expected_oid));
+					   *contents, *size, oi->type_name->buf, real_oid)) {
+			if (oideq(real_oid, null_oid()))
+				BUG("should only get OID mismatch errors with mapped contents");
 			free(*contents);
 			goto out;
 		}
diff --git a/object-store.h b/object-store.h
index f3045148b89..3c4ada23f5d 100644
--- a/object-store.h
+++ b/object-store.h
@@ -392,6 +392,7 @@ int oid_object_info_extended(struct repository *r,
  */
 int read_loose_object(const char *path,
 		      const struct object_id *expected_oid,
+		      struct object_id *real_oid,
 		      void **contents,
 		      struct object_info *oi,
 		      unsigned int oi_flags);
@@ -528,7 +529,8 @@ enum unpack_loose_header_result unpack_loose_header(git_zstream *stream,
 int parse_loose_header(const char *hdr, struct object_info *oi);
 
 int check_object_signature(struct repository *r, const struct object_id *oid,
-			   void *buf, unsigned long size, const char *type);
+			   void *buf, unsigned long size, const char *type,
+			   struct object_id *real_oidp);
 int finalize_object_file(const char *tmpfile, const char *filename);
 int check_and_freshen_file(const char *fn, int freshen);
 
diff --git a/object.c b/object.c
index 4e85955a941..23a24e678a8 100644
--- a/object.c
+++ b/object.c
@@ -279,7 +279,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
 	    (!obj && repo_has_object_file(r, oid) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
-		if (check_object_signature(r, repl, NULL, 0, NULL) < 0) {
+		if (check_object_signature(r, repl, NULL, 0, NULL, NULL) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
 		}
@@ -290,7 +290,7 @@ struct object *parse_object(struct repository *r, const struct object_id *oid)
 	buffer = repo_read_object_file(r, oid, &type, &size);
 	if (buffer) {
 		if (check_object_signature(r, repl, buffer, size,
-					   type_name(type)) < 0) {
+					   type_name(type), NULL) < 0) {
 			free(buffer);
 			error(_("hash mismatch %s"), oid_to_hex(repl));
 			return NULL;
diff --git a/pack-check.c b/pack-check.c
index c8e560d71ab..3f418e3a6af 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -142,7 +142,8 @@ static int verify_packfile(struct repository *r,
 			err = error("cannot unpack %s from %s at offset %"PRIuMAX"",
 				    oid_to_hex(&oid), p->pack_name,
 				    (uintmax_t)entries[i].offset);
-		else if (check_object_signature(r, &oid, data, size, type_name(type)))
+		else if (check_object_signature(r, &oid, data, size,
+						type_name(type), NULL))
 			err = error("packed %s from %s is corrupt",
 				    oid_to_hex(&oid), p->pack_name);
 		else if (fn) {
diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 43a9f4e7f0c..39fe11bc92c 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -490,7 +490,7 @@ test_expect_success 'cat-file -t and -s on corrupt loose object' '
 		# Swap the two to corrupt the repository
 		mv -f "$other_path" "$empty_path" &&
 		test_must_fail git fsck 2>err.fsck &&
-		grep "hash mismatch" err.fsck &&
+		grep "hash-path mismatch" err.fsck &&
 
 		# confirm that cat-file is reading the new swapped-in
 		# blob...
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index da2658155c7..7d0d57564b5 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -53,6 +53,7 @@ test_expect_success 'object with hash mismatch' '
 	(
 		cd hash-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -62,7 +63,7 @@ test_expect_success 'object with hash mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		test_i18ngrep "$oid.*corrupt" out
+		grep "$oldoid: hash-path mismatch, found at: .*$new" out
 	)
 '
 
@@ -71,6 +72,7 @@ test_expect_success 'object with hash and type mismatch' '
 	(
 		cd hash-type-mismatch &&
 		oid=$(echo blob | git hash-object -w --stdin -t garbage --literally) &&
+		oldoid=$oid &&
 		old=$(test_oid_to_path "$oid") &&
 		new=$(dirname $old)/$(test_oid ff_2) &&
 		oid="$(dirname $new)$(basename $new)" &&
@@ -80,8 +82,8 @@ test_expect_success 'object with hash and type mismatch' '
 		cmt=$(echo bogus | git commit-tree $tree) &&
 		git update-ref refs/heads/bogus $cmt &&
 		test_must_fail git fsck 2>out &&
-		grep "^error: hash mismatch for " out &&
-		grep "^error: $oid: object is of unknown type '"'"'garbage'"'"'" out
+		grep "^error: $oldoid: hash-path mismatch, found at: .*$new" out &&
+		grep "^error: $oldoid: object is of unknown type '"'"'garbage'"'"'" out
 	)
 '
 
-- 
2.33.0.815.g21c7aaf6073


^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
  2021-09-07 10:57             ` [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:40               ` Taylor Blau
  2021-09-17  9:27                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 19:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Refactor one of the fsck tests to use a throwaway repository. It's a
> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
> teardown of a tests so we're not leaving corrupt content for the next
> test.

OK. I seem to recall you advocating against this pattern elsewhere[1], but
this is a good example of why it can sometimes make writing tests much
easier when not having to reason about what leaks out of running a test.

[1]: https://lore.kernel.org/git/87zgsnj0q0.fsf@evledraar.gmail.com/,
although after re-reading it it looks like you were more focused on the
unnecessary "rm -fr repo" there and not the "git init +
test_when_finished rm -fr" pattern.

> -test_expect_success 'object with bad sha1' '
> -	sha=$(echo blob | git hash-object -w --stdin) &&
> -	old=$(test_oid_to_path "$sha") &&
> -	new=$(dirname $old)/$(test_oid ff_2) &&
> -	sha="$(dirname $new)$(basename $new)" &&
> -	mv .git/objects/$old .git/objects/$new &&
> -	test_when_finished "remove_object $sha" &&
> -	git update-index --add --cacheinfo 100644 $sha foo &&
> -	test_when_finished "git read-tree -u --reset HEAD" &&
> -	tree=$(git write-tree) &&
> -	test_when_finished "remove_object $tree" &&
> -	cmt=$(echo bogus | git commit-tree $tree) &&
> -	test_when_finished "remove_object $cmt" &&
> -	git update-ref refs/heads/bogus $cmt &&
> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
> -
> -	test_must_fail git fsck 2>out &&
> -	test_i18ngrep "$sha.*corrupt" out
> +test_expect_success 'object with hash mismatch' '
> +	git init --bare hash-mismatch &&
> +	(
> +		cd hash-mismatch &&
> +		oid=$(echo blob | git hash-object -w --stdin) &&
> +		old=$(test_oid_to_path "$oid") &&
> +		new=$(dirname $old)/$(test_oid ff_2) &&
> +		oid="$(dirname $new)$(basename $new)" &&
> +		mv objects/$old objects/$new &&
> +		git update-index --add --cacheinfo 100644 $oid foo &&
> +		tree=$(git write-tree) &&
> +		cmt=$(echo bogus | git commit-tree $tree) &&
> +		git update-ref refs/heads/bogus $cmt &&
> +		test_must_fail git fsck 2>out &&
> +		test_i18ngrep "$oid.*corrupt" out
> +	)
>  '

This all looks fine to me. The translation is s/sha/oid and removing all
of the now-unnecessary test_when_finished calls.

But the test_i18ngrep (which isn't new) could probably also stand to get
cleaned up and converted to a normal grep.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type
  2021-09-07 10:57             ` [PATCH v6 02/22] fsck tests: add test for fsck-ing an unknown type Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:51               ` Taylor Blau
  2021-09-17  9:39                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 19:51 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:57PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the fsck tests by checking what we do when we
> encounter an unknown "garbage" type produced with hash-object's
> --literally option.
>
> This behavior needs to be improved, which'll be done in subsequent
> patches, but for now let's test for the current behavior.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t1450-fsck.sh | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 7becab5ba1e..f10d6f7b7e8 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh
> @@ -863,4 +863,16 @@ test_expect_success 'detect corrupt index file in fsck' '
>  	test_i18ngrep "bad index file" errors
>  '
>
> +test_expect_success 'fsck hard errors on an invalid object type' '
> +	git init --bare garbage-type &&

I wondered whether it was really possible to not cover this, since I
figured such a test may have just been hiding elsewhere. But we really
do seem to be lacking coverage. So, adding this test is good.

> +	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
> +	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&

I'm nitpicking, but I find the -C garbage-type pattern less than ideal
for two reasons:

  - It makes every line longer (since "-C garbage type" is wider than an
    8-wide tab, even indenting this in a subshell would take up fewer
    characters visually)

  - It pollutes the current directory with things like "err.expect" and
    "err.actual" that have nothing to do with the current directory (and
    much more to do with the garbage-type repository within it).

So I don't care, really, but it may be better to just put all of this in
a subshell.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-07 10:57             ` [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s Ævar Arnfjörð Bjarmason
@ 2021-09-16 19:57               ` Taylor Blau
  2021-09-16 20:01                 ` Taylor Blau
  2021-09-16 22:52                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 2 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 19:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Test for what happens when the -t and -s flags are asked to operate on
> a missing object, this extends tests added in 3e370f9faf0 (t1006: add
> tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
> -s flags are the only ones that can be combined with
> --allow-unknown-type, so let's test with and without that flag.

I'm a little skeptical to have tests for all four pairs of `-t` or `-s`
and "with `--allow-unknown-type` and without `--allow-unknown-type`".

Testing both the presence and absence of `--allow-unknown-type` seems
useful to me, but I'm not sure what testing `-t` and `-s` separately
buys us.

(If you really feel the need test both, I'd encourage looping like:

    for arg in -t -s
    do
      test_must_fail git cat-file $arg $missing_oid >out 2>err &&
      test_must_be_empty out &&
      test_cmp expect.err err &&

      test_must_fail git cat-file $arg --allow-unknown-type $missing_oid >out 2>err &&
      test_must_be_empty out &&
      test_cmp expect.err err
    done &&

but I would be equally or perhaps even happier to just have one of the
two tests).

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-16 19:57               ` Taylor Blau
@ 2021-09-16 20:01                 ` Taylor Blau
  2021-09-16 22:52                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 20:01 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Thu, Sep 16, 2021 at 03:57:30PM -0400, Taylor Blau wrote:
> On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > Test for what happens when the -t and -s flags are asked to operate on
> > a missing object, this extends tests added in 3e370f9faf0 (t1006: add
> > tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
> > -s flags are the only ones that can be combined with
> > --allow-unknown-type, so let's test with and without that flag.
>
> I'm a little skeptical to have tests for all four pairs of `-t` or `-s`
> and "with `--allow-unknown-type` and without `--allow-unknown-type`".

Ah. Reading the next patch makes me feel even more certain of this
advice. Consider squashing this and the next patch with my suggestion
to use a loop below?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types
  2021-09-07 10:58             ` [PATCH v6 05/22] rev-list tests: test for behavior with invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-16 20:40               ` Taylor Blau
  2021-09-17 11:59                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 20:40 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Fix a blindspot in the tests for the "rev-list --disk-usage" feature
> added in 16950f8384a (rev-list: add --disk-usage option for
> calculating disk usage, 2021-02-09) to test for what happens when it's
> asked to calculate the disk usage of invalid object types.

I'm not sure that I agree this is a blindspot, or at least one worth
testing. Is the goal to add tests to every Git command that might have
to do something with a corrupt object and make sure that it is handled
correctly?

I'm not sure that doing so would be useful, or at the very least that it
would be worth the effort. That's not to say I'm not interested in
having tests fail when we don't handle corrupt objects correctly, but
more to say that I think there are so many parts of Git that might touch
a corrupt object that trying to test all of them seems like a losing
battle.

Assuming that this is a useful direction, though...

> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  t/t6115-rev-list-du.sh | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t6115-rev-list-du.sh b/t/t6115-rev-list-du.sh
> index b4aef32b713..edb2ed55846 100755
> --- a/t/t6115-rev-list-du.sh
> +++ b/t/t6115-rev-list-du.sh
> @@ -48,4 +48,15 @@ check_du HEAD
>  check_du --objects HEAD
>  check_du --objects HEAD^..HEAD
>
> +test_expect_success 'setup garbage repository' '
> +	git clone --bare . garbage.git &&

Since this is cloned within the working directory, should we bother to
clean this up to avoid munging with future tests?

> +	garbage_oid=$(git -C garbage.git hash-object -t garbage -w --stdin --literally <one.t) &&
> +	git -C garbage.git rev-list --objects --all --disk-usage &&
> +
> +	# Manually create a ref because "update-ref", "tag" etc. have
> +	# no corresponding --literally option.
> +	echo $garbage_oid >garbage.git/refs/tags/garbage-tag &&
> +	test_must_fail git -C garbage.git rev-list --objects --all --disk-usage

See also my earlier comment about this being much more readable in a
sub-shell.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
  2021-09-07 10:58             ` [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:29               ` Taylor Blau
  2021-09-16 21:56                 ` Jeff King
  0 siblings, 1 reply; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 21:29 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> When the loose_object_info() function returns an error stop faking up
> the "oi->typep" to OBJ_BAD. Let the return value of the function
> itself suffice. This code cleanup simplifies subsequent changes.

The obvious danger (which you mention) is that somebody is relying on
what typep points to, and is reading it even if we returned non-zero
from whatever called this function.

Hopefully nobody is, but this change makes me a little uncomfortable
nonetheless, since there are so many potential callers (even though this
function has only one caller, it doesn't take long before the number of
indirect callers explodes).

So it would be nice if we could do without it, but you claim that it
simplifies changes that happen later on. So let's continue to see if we
really do need it...

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 09/22] cache.h: move object functions to object-store.h
  2021-09-07 10:58             ` [PATCH v6 09/22] cache.h: move object functions to object-store.h Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:33               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 21:33 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Move the declaration of some ancient object functions added in
> e.g. c4483576b8d (Add "unpack_sha1_header()" helper function,
> 2005-06-01) from cache.h to object-store.h. This continues work
> started in cbd53a2193d (object-store: move object access functions to
> object-store.h, 2018-05-15).

This builds with DEVELOPER=1, likely as a result of all of the includes
on object-store.h added in cbd53a2193d.

> diff --git a/cache.h b/cache.h
> index d23de693680..11a04a93436 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -1313,16 +1313,6 @@ char *xdg_cache_home(const char *filename);
>
>  int git_open_cloexec(const char *name, int flags);
>  #define git_open(name) git_open_cloexec(name, O_RDONLY)
> -int unpack_loose_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz);
> -int parse_loose_header(const char *hdr, unsigned long *sizep);
> -
> -int check_object_signature(struct repository *r, const struct object_id *oid,
> -			   void *buf, unsigned long size, const char *type);
> -
> -int finalize_object_file(const char *tmpfile, const char *filename);
> -
> -/* Helper to check and "touch" a file */

I'm fine to drop this comment, by the way, since it does not add any
explanation to what a function called check_and_freshen_file() might do
;).

> -int check_and_freshen_file(const char *fn, int freshen);

Everything else looks fine, although it's unclear how this is related to
the rest of your series.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public
  2021-09-07 10:58             ` [PATCH v6 10/22] object-file.c: make parse_loose_header_extended() public Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:39               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 21:39 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:05PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Make the parse_loose_header_extended() function public and remove the
> parse_loose_header() wrapper. The only direct user of it outside of
> object-file.c itself was in streaming.c, that caller can simply pass
> the required "struct object-info *" instead.
>
> This change is being done in preparation for teaching
> read_loose_object() to accept a flag to pass to
> parse_loose_header(). It isn't strictly necessary for that change, we
> could simply use parse_loose_header_extended() there, but will leave
> the API in a better end state.

All seems reasonable. I agree that this is not a necessary step, but at
least the clean-up is self contained and an easy enough read.

The flag that read_loose_object() is going to start passing to
parse_loose_header() is left a bit vague, but I'll continue reading to
figure out what it is.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 08/22] object-file.c: don't set "typep" when returning non-zero
  2021-09-16 21:29               ` Taylor Blau
@ 2021-09-16 21:56                 ` Jeff King
  0 siblings, 0 replies; 155+ messages in thread
From: Jeff King @ 2021-09-16 21:56 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, git, Junio C Hamano,
	Jonathan Tan, Andrei Rybak

On Thu, Sep 16, 2021 at 05:29:30PM -0400, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > When the loose_object_info() function returns an error stop faking up
> > the "oi->typep" to OBJ_BAD. Let the return value of the function
> > itself suffice. This code cleanup simplifies subsequent changes.
> 
> The obvious danger (which you mention) is that somebody is relying on
> what typep points to, and is reading it even if we returned non-zero
> from whatever called this function.
> 
> Hopefully nobody is, but this change makes me a little uncomfortable
> nonetheless, since there are so many potential callers (even though this
> function has only one caller, it doesn't take long before the number of
> indirect callers explodes).
> 
> So it would be nice if we could do without it, but you claim that it
> simplifies changes that happen later on. So let's continue to see if we
> really do need it...

I'm actually reasonable comfortable with this patch. If we return an
error from the *_object_info() functions, then I think all bets are off
on what is in the resulting object_info struct. E.g., we'd already leave
sizep uninitialized in such a case.

It feels like oi->typep may be a little bit special because we conflate
"error" and "type" in the return from oid_object_info(). But
oid_object_info_extended() does not do that, and the innards of
oid_object_info() do the right thing.

Of course we _have_ been setting typep in this way for a while, so it's
worth making sure nobody is depending on. Notably packed_object_info()
does not behave in this way; if it hits an error, typep may be left
unset. So any oid_object_info_extended() callers depending on this were
already potentially buggy. I'd be OK with a quick sweep of the hits of
"git grep typep" here.

I just did that, and all the sites look pretty reasonable (they call
oid_object_info_extended() and bail as soon as they see that it fails).

-Peff

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header()
  2021-09-07 10:58             ` [PATCH v6 13/22] object-file.c: split up ternary in parse_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-16 21:58               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-16 21:58 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:08PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This minor formatting change serves to make a subsequent patch easier
> to read.

Hmm. I'm not sure if I agree.

As far as I can tell from reading the subsequent patch, this is designed
to make it easier to add a comment above "return type" that pertains
just to the case when !*hdr.

I think it would have been fine to go from the ternary to this style
with the comment in a single patch. But I also think it would have been
fine to add the comment above the ternary and instead start it off by
saying "when !*hdr, ...".

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 03/22] cat-file tests: test for missing object with -t and -s
  2021-09-16 19:57               ` Taylor Blau
  2021-09-16 20:01                 ` Taylor Blau
@ 2021-09-16 22:52                 ` Ævar Arnfjörð Bjarmason
  1 sibling, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-16 22:52 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:57:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Test for what happens when the -t and -s flags are asked to operate on
>> a missing object, this extends tests added in 3e370f9faf0 (t1006: add
>> tests for git cat-file --allow-unknown-type, 2015-05-03). The -t and
>> -s flags are the only ones that can be combined with
>> --allow-unknown-type, so let's test with and without that flag.
>
> I'm a little skeptical to have tests for all four pairs of `-t` or `-s`
> and "with `--allow-unknown-type` and without `--allow-unknown-type`".
>
> Testing both the presence and absence of `--allow-unknown-type` seems
> useful to me, but I'm not sure what testing `-t` and `-s` separately
> buys us.
>
> (If you really feel the need test both, I'd encourage looping like:

Thanks, I'll try to simplify it.

>     for arg in -t -s
>     do
>       test_must_fail git cat-file $arg $missing_oid >out 2>err &&
>       test_must_be_empty out &&
>       test_cmp expect.err err &&
>
>       test_must_fail git cat-file $arg --allow-unknown-type $missing_oid >out 2>err &&
>       test_must_be_empty out &&
>       test_cmp expect.err err
>     done &&
>
> but I would be equally or perhaps even happier to just have one of the
> two tests).

A loop like that can be further simplified as just (just inlining
arg=-s):

	test_must_fail git cat-file -s $missing_oid >out 2>err &&
	test_must_be_empty out &&
	test_cmp expect.err err &&

	test_must_fail git cat-file -s --allow-unknown-type $missing_oid >out 2>err &&
	test_must_be_empty out &&
	test_cmp expect.err err

:)

I.e. unless you end &&-chains in loops in the test framework with an ||
return 1 you're only testing your last iteration. Aside from whatever
I'm doing here I generally prefer to either just spell it out twice (if
small enough), or:

    for arg in -t -s
    do
        test_expect_success '...' "[... use $arg ...]"
    done

Which both nicely get around the issue of that easy-to-make mistake.

We've got some in-tree tests that are broken this way, well, at least
4cf67869b2a (list-objects.c: don't segfault for missing cmdline objects,
2018-12-05). But I think I'll leave that for a #leftoverbits submission
given my outstanding patch queue..., oh there's another one in
t1010-mktree.sh ... :)

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 14/22] object-file.c: stop dying in parse_loose_header()
  2021-09-07 10:58             ` [PATCH v6 14/22] object-file.c: stop dying " Ævar Arnfjörð Bjarmason
@ 2021-09-17  2:32               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-17  2:32 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:09PM +0200, Ævar Arnfjörð Bjarmason wrote:
> It thus makes sense to further libify the interface so that it stops
> calling die() when it encounters OBJ_BAD, and instead rely on its
> callers to check the populated "oi->typep".

Hmm. I thought we got rid of this behavior in a previous commit? Perhaps
I'm thinking of something else, but I would certainly appreciate a
clarification :).

> @@ -1369,15 +1367,6 @@ int parse_loose_header(const char *hdr,
>  	type = type_from_string_gently(type_buf, type_len, 1);
>  	if (oi->type_name)
>  		strbuf_add(oi->type_name, type_buf, type_len);
> -	/*
> -	 * Set type to 0 if its an unknown object and
> -	 * we're obtaining the type using '--allow-unknown-type'
> -	 * option.
> -	 */
> -	if ((flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE) && (type < 0))
> -		type = 0;
> -	else if (type < 0)
> -		die(_("invalid object type"));

Good, this part moved to loose_object_info() as you said it would.

> @@ -1463,18 +1460,20 @@ static int loose_object_info(struct repository *r,
>  		status = error(_("unable to unpack %s header"),
>  			       oid_to_hex(oid));
>  	}
> -
> -	if (status < 0) {
> -		/* Do nothing */
> -	} else if (hdrbuf.len) {
> -		if ((status = parse_loose_header(hdrbuf.buf, oi, flags)) < 0)
> -			status = error(_("unable to parse %s header with --allow-unknown-type"),
> -				       oid_to_hex(oid));
> -	} else if ((status = parse_loose_header(hdr, oi, flags)) < 0) {
> -		status = error(_("unable to parse %s header"), oid_to_hex(oid));
> +	if (!status) {
> +		if (!parse_loose_header(hdrbuf.len ? hdrbuf.buf : hdr, oi))
> +			/*
> +			 * oi->{sizep,typep} are meaningless unless
> +			 * parse_loose_header() returns >= 0.
> +			 */

This double negative is a little confusing. Clearer to say:
"oi->{size,type}p is meaningless if parse_loose_header() returns < 0"?

But I was also a little confused to see that the expression we are
checking here is just that parse_loose_header() returned zero. What
about other positive values?

I think we should either update the comment to say "unless it returns
zero" or the conditional expression to check for >= 0.

> diff --git a/object-store.h b/object-store.h
> index 4064710ae29..584bf5556af 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -500,8 +500,17 @@ int for_each_packed_object(each_packed_object_fn, void *,
>  int unpack_loose_header(git_zstream *stream, unsigned char *map,
>  			unsigned long mapsize, void *buffer,
>  			unsigned long bufsiz, struct strbuf *hdrbuf);
> -int parse_loose_header(const char *hdr, struct object_info *oi,
> -		       unsigned int flags);
> +
> +/**
> + * parse_loose_header() parses the starting "<type> <len>\0" of an
> + * object. If it doesn't follow that format -1 is returned. To check
> + * the validity of the <type> populate the "typep" in the "struct
> + * object_info". It will be OBJ_BAD if the object type is unknown. The
> + * parsed <len> can be retrieved via "oi->sizep", and from there
> + * passed to unpack_loose_rest().
> + */
> +int parse_loose_header(const char *hdr, struct object_info *oi);

OK, I guess this must be what I was confused about earlier (that I
thought we didn't support reading typep if returning OBJ_BAD). But it
seems odd to me that we would get rid of it elsewhere, yet continue
using this pattern here.

Or am I mistaken that the two are different?

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info()
  2021-09-07 10:58             ` [PATCH v6 15/22] object-file.c: guard against future bugs in loose_object_info() Ævar Arnfjörð Bjarmason
@ 2021-09-17  2:35               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-17  2:35 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:10PM +0200, Ævar Arnfjörð Bjarmason wrote:
> An earlier version of the preceding commit had a subtle bug where our
> "type_scratch" (later assigned to "oi->typep") would be uninitialized
> and used in the "!allow_unknown" case, at which point it would contain
> a nonsensical value if we'd failed to call parse_loose_header().
>
> The preceding commit introduced "parsed_header" variable to check for
> this case, but I think we can do better, let's carry a "oi_header"
> variable initially set to NULL, and only set it to "oi" once we're
> past parse_loose_header().

Everything in this patch seems OK to me.

For what it's worth, I think that this could likely have been folded
into the previous commit. I was just a little surprised to see
parsed_header go away after I had just a minute or two again spent time
thinking about what it was for.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header()
  2021-09-07 10:58             ` [PATCH v6 18/22] object-file.c: use "enum" return type for unpack_loose_header() Ævar Arnfjörð Bjarmason
@ 2021-09-17  2:45               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-17  2:45 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:13PM +0200, Ævar Arnfjörð Bjarmason wrote:
> In the preceding commits we changed and documented
> unpack_loose_header() from return any negative value or zero, to only
> -2, -1 or 0. Let's instead add an "enum unpack_loose_header_result"
> type and use it, and have the compiler assert that we're exhaustively
> covering all return values. This gets rid of the need for having a
> "default" BUG() case in loose_object_info().
>
> I'm on the fence about whether this is more readable or worth it, but
> since it was suggested in [1] to do this let's go for it.

:-). The first hunk is quite a long line, but I think that only suggests
the enum has a long name. I also can't think of anything shorter, so I
think what you have is just fine.

I do think that this is an improvement in readability, and for what it's
worth I am a fan of the previous two changes as well.

As a workflow comment, I would have perhaps done these conversions a
little earlier, maybe in these steps:

  - First a patch to introduce unpack_loose_header_result with just OK
    and BAD, and then converted all callers that return negative numbers
    to return BAD (and all others to return OK).

  - Then a second patch to convert some of the BAD returns into
    BAD_TOO_LONG.

That gets things done in two patches, instead of three, at the cost of a
slightly more complicated first patch. But I think you also get some
more insight into why we're making the change in the first place instead
of having to read through a couple of commits to get there.

In any case, what you have is certainly fine, and I don't think that one
approach is any better or worse than the other. Just mentioning it in
case it's something may try in the future.

This patch looks good.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 19/22] fsck: don't hard die on invalid object types
  2021-09-07 10:58             ` [PATCH v6 19/22] fsck: don't hard die on invalid object types Ævar Arnfjörð Bjarmason
@ 2021-09-17  3:37               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-17  3:37 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:14PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Change the error fsck emits on invalid object types, such as:
>
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     <OID>
>
> >From the very ungraceful error of:
>
>     $ git fsck
>     fatal: invalid object type
>     $
>
> To:
>
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ the rest of the fsck output here, i.e. it didn't hard die ]

Great. I don't love the second error (since it doesn't really give the
user any new information when read after the first) but that's fsck's
fault, and not your patch's.

> To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE"
> flag from read_loose_object() through to parse_loose_header(). Since
> the read_loose_object() function is only used in builtin/fsck.c we can
> simply change it. See f6371f92104 (sha1_file: add read_loose_object()
> function, 2017-01-13) for the introduction of read_loose_object().
>
> Why are we complaining about a "hash mismatch" for an object of a type
> we don't know about? We shouldn't. This is the bare minimal change
> needed to not make fsck hard die on a repository that's been corrupted
> in this manner. In subsequent commits we'll teach fsck to recognize
> this particular type of corruption and emit a better error message.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/fsck.c  |  3 ++-
>  object-file.c   | 11 ++++++++---
>  object-store.h  |  3 ++-
>  t/t1450-fsck.sh | 14 +++++++-------
>  4 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index b42b6fe21f7..082dadd5629 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -601,7 +601,8 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	void *contents;
>  	int eaten;
>
> -	if (read_loose_object(path, oid, &type, &size, &contents) < 0) {
> +	if (read_loose_object(path, oid, &type, &size, &contents,
> +			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
>  		errors_found |= ERROR_OBJECT;
>  		error(_("%s: object corrupt or missing: %s"),
>  		      oid_to_hex(oid), path);
> diff --git a/object-file.c b/object-file.c
> index 9484c7ce2be..0e6937fad73 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2562,7 +2562,8 @@ int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
>  		      enum object_type *type,
>  		      unsigned long *size,
> -		      void **contents)
> +		      void **contents,
> +		      unsigned int oi_flags)
>  {
>  	int ret = -1;
>  	void *map = NULL;
> @@ -2570,6 +2571,7 @@ int read_loose_object(const char *path,
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
>  	struct object_info oi = OBJECT_INFO_INIT;
> +	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
>  	oi.typep = type;
>  	oi.sizep = size;
>
> @@ -2592,8 +2594,11 @@ int read_loose_object(const char *path,
>  		git_inflate_end(&stream);
>  		goto out;
>  	}
> -	if (*type < 0)
> -		die(_("invalid object type"));
> +	if (!allow_unknown && *type < 0) {
> +		error(_("header for %s declares an unknown type"), path);
> +		git_inflate_end(&stream);
> +		goto out;
> +	}

Hmm. I'm not sure that I new test for this error (which may be
uninteresting, in which case it is fine to skip).
>
> -test_expect_success 'fsck hard errors on an invalid object type' '
> +test_expect_success 'fsck error and recovery on invalid object type' '
>  	git init --bare garbage-type &&
>  	empty_blob=$(git -C garbage-type hash-object --stdin -w -t blob </dev/null) &&
>  	garbage_blob=$(git -C garbage-type hash-object --stdin -w -t garbage --literally </dev/null) &&
> -	cat >err.expect <<-\EOF &&
> -	fatal: invalid object type
> -	EOF
> -	test_must_fail git -C garbage-type fsck >out.actual 2>err.actual &&
> -	test_cmp err.expect err.actual &&
> -	test_must_be_empty out.actual
> +	test_must_fail git -C garbage-type fsck >out 2>err &&
> +	grep -e "^error" -e "^fatal" err >errors &&
> +	test_line_count = 2 errors &&
> +	grep "error: hash mismatch for" err &&
> +	grep "$garbage_blob: object corrupt or missing:" err &&
> +	grep "dangling blob $empty_blob" out
>  '

Great.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 21/22] fsck: report invalid types recorded in objects
  2021-09-07 10:58             ` [PATCH v6 21/22] fsck: report invalid types recorded in objects Ævar Arnfjörð Bjarmason
@ 2021-09-17  3:57               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-17  3:57 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Continue the work in the preceding commit and improve the error on:
>
>     $ git hash-object --stdin -w -t garbage --literally </dev/null
>     $ git fsck
>     error: hash mismatch for <OID_PATH> (expected <OID>)
>     error: <OID>: object corrupt or missing: <OID_PATH>
>     [ other fsck output ]
>
> To instead emit:
>
>     $ git fsck
>     error: <OID>: object is of unknown type 'garbage': <OID_PATH>
>     [ other fsck output ]
>
> The complaint about a "hash mismatch" was simply an emergent property
> of how we'd fall though from read_loose_object() into fsck_loose()
> when we didn't get the data we expected. Now we'll correctly note that
> the object type is invalid.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  builtin/fsck.c  | 22 ++++++++++++++++++----
>  object-file.c   | 13 +++++--------
>  object-store.h  |  4 ++--
>  t/t1450-fsck.sh | 24 +++++++++++++++++++++---
>  4 files changed, 46 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 082dadd5629..07af0434db6 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -600,12 +600,26 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	unsigned long size;
>  	void *contents;
>  	int eaten;
> -
> -	if (read_loose_object(path, oid, &type, &size, &contents,
> -			      OBJECT_INFO_ALLOW_UNKNOWN_TYPE) < 0) {
> -		errors_found |= ERROR_OBJECT;
> +	struct strbuf sb = STRBUF_INIT;
> +	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> +	struct object_info oi;
> +	int found = 0;
> +	oi.type_name = &sb;
> +	oi.sizep = &size;
> +	oi.typep = &type;
> +
> +	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {

OK, now we pass a struct object_info instead of pointers to type and
size separately. Makes sense.

> +		found |= ERROR_OBJECT;

And found tracks the error we found when trying to read this loose
object, if any. Having a separate variable makes sense, since we only
want to avoid calling fsck_obj() if we found any errors for this object
while trying to call read_loose_object().

>  		error(_("%s: object corrupt or missing: %s"),
>  		      oid_to_hex(oid), path);
> +	}
> +	if (type < 0) {
> +		found |= ERROR_OBJECT;
> +		error(_("%s: object is of unknown type '%s': %s"),
> +		      oid_to_hex(oid), sb.buf, path);
> +	}
> +	if (found) {
> +		errors_found |= ERROR_OBJECT;

Perhaps errors_found |= found ?

>  		return 0; /* keep checking other objects */
>  	}
>
> diff --git a/object-file.c b/object-file.c
> index 0e6937fad73..f4850ba62b4 100644
> --- a/object-file.c
> +++ b/object-file.c
> @@ -2560,9 +2560,8 @@ static int check_stream_oid(git_zstream *stream,
>
>  int read_loose_object(const char *path,
>  		      const struct object_id *expected_oid,
> -		      enum object_type *type,
> -		      unsigned long *size,
>  		      void **contents,
> +		      struct object_info *oi,
>  		      unsigned int oi_flags)

All of the changes in this function make perfect sense, except...
>  {
>  	int ret = -1;
> @@ -2570,10 +2569,9 @@ int read_loose_object(const char *path,
>  	unsigned long mapsize;
>  	git_zstream stream;
>  	char hdr[MAX_HEADER_LEN];
> -	struct object_info oi = OBJECT_INFO_INIT;
>  	int allow_unknown = oi_flags & OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
> -	oi.typep = type;
> -	oi.sizep = size;
> +	enum object_type *type = oi->typep;
> +	unsigned long *size = oi->sizep;

...I see that size is used in check_object_signature(), but I don't see
any uses for type. Am I missing it?

The tests look good to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 22/22] fsck: report invalid object type-path combinations
  2021-09-07 10:58             ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-17  4:06               ` Taylor Blau
  0 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-17  4:06 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:58:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
> Improve the error that's emitted in cases where we find a loose object
> we parse, but which isn't at the location we expect it to be.
>
> Before this change we'd prefix the error with a not-a-OID derived from
> the path at which the object was found, due to an emergent behavior in
> how we'd end up with an "OID" in these codepaths.
>
> Now we'll instead say what object we hashed, and what path it was
> found at. Before this patch series e.g.:
>
>     $ git hash-object --stdin -w -t blob </dev/null
>     e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
>     $ mv objects/e6/ objects/e7
>
> Would emit ("[...]" used to abbreviate the OIDs):
>
>     git fsck
>     error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...])
>     error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...]
>
> Now we'll instead emit:
>
>     error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...]

Lovely!

> @@ -603,20 +603,25 @@ static int fsck_loose(const struct object_id *oid, const char *path, void *data)
>  	struct strbuf sb = STRBUF_INIT;
>  	unsigned int oi_flags = OBJECT_INFO_ALLOW_UNKNOWN_TYPE;
>  	struct object_info oi;
> +	struct object_id real_oid = *null_oid();
>  	int found = 0;
>  	oi.type_name = &sb;
>  	oi.sizep = &size;
>  	oi.typep = &type;
>
> -	if (read_loose_object(path, oid, &contents, &oi, oi_flags) < 0) {
> +	if (read_loose_object(path, oid, &real_oid, &contents, &oi, oi_flags) < 0) {
>  		found |= ERROR_OBJECT;
> -		error(_("%s: object corrupt or missing: %s"),
> -		      oid_to_hex(oid), path);
> +		if (!oideq(&real_oid, oid))
> +			error(_("%s: hash-path mismatch, found at: %s"),
> +			      oid_to_hex(&real_oid), path);
> +		else
> +			error(_("%s: object corrupt or missing: %s"),
> +			      oid_to_hex(oid), path);

Nice; this is the important part that this patch is changing, and the
logic is very nice. Before it read "anytime read_loose_object fails,
it's an error" to "it's still an error, but we can handle the case where
the real OID and the one we expected were different separately from
generic corruption".

>  	}
>  	if (type < 0) {
>  		found |= ERROR_OBJECT;
>  		error(_("%s: object is of unknown type '%s': %s"),
> -		      oid_to_hex(oid), sb.buf, path);
> +		      oid_to_hex(&real_oid), sb.buf, path);

Could go either way on this hunk, but I think that I err slightly on
your side now that we have access to the "real_oid".

The rest of the code and test changes in this patch look good to me.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting
  2021-09-07 10:57           ` [PATCH v6 00/22] fsck: lib-ify object-file.c & better fsck "invalid object" error reporting Ævar Arnfjörð Bjarmason
                               ` (21 preceding siblings ...)
  2021-09-07 10:58             ` [PATCH v6 22/22] fsck: report invalid object type-path combinations Ævar Arnfjörð Bjarmason
@ 2021-09-17  4:08             ` Taylor Blau
  22 siblings, 0 replies; 155+ messages in thread
From: Taylor Blau @ 2021-09-17  4:08 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak

On Tue, Sep 07, 2021 at 12:57:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
> This improves fsck error reporting, see the examples in the commit
> messages of 19/22, 21/22 and 22/22. To get there I've lib-ified more
> thigs in object-file.c and the general object APIs, i.e. now we'll
> return error codes instead of calling die() in these cases.
>
> This series has been in "needs review" state for a while. This re-roll
> is mainly to bump it for the list's attention, but while I was at it I
> addressed point from Jonathan Tan raised in a previous round: use an
> enum instead of int for the unpack_loose_header() return value.

I took a thorough look through this series, and left a handful of minor
comments. I didn't spot any glaring issues, and think that this series
is in pretty good shape.

I do admit there were quite a large number of patches to get to the
couple of changes at the end. I left some thoughts throughout for places
that I would have combined things / presented them in a different order
or similar.

I don't think you should spend much time changing the structure now that
it's been looked at with close eyes, but just some idle thoughts for
other large series you might send in the future.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 155+ messages in thread

* Re: [PATCH v6 01/22] fsck tests: refactor one test to use a sub-repo
  2021-09-16 19:40               ` Taylor Blau
@ 2021-09-17  9:27                 ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 155+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2021-09-17  9:27 UTC (permalink / raw)
  To: Taylor Blau; +Cc: git, Junio C Hamano, Jeff King, Jonathan Tan, Andrei Rybak


On Thu, Sep 16 2021, Taylor Blau wrote:

> On Tue, Sep 07, 2021 at 12:57:56PM +0200, Ævar Arnfjörð Bjarmason wrote:
>> Refactor one of the fsck tests to use a throwaway repository. It's a
>> pervasive pattern in t1450-fsck.sh to spend a lot of effort on the
>> teardown of a tests so we're not leaving corrupt content for the next
>> test.
>
> OK. I seem to recall you advocating against this pattern elsewhere[1], but
> this is a good example of why it can sometimes make writing tests much
> easier when not having to reason about what leaks out of running a test.
>
> [1]: https://lore.kernel.org/git/87zgsnj0q0.fsf@evledraar.gmail.com/,
> although after re-reading it it looks like you were more focused on the
> unnecessary "rm -fr repo" there and not the "git init +
> test_when_finished rm -fr" pattern.

I was referring to a different pattern there, replied in some detail at
https://lore.kernel.org/git/87y27veeyj.fsf@evledraar.gmail.com/

>> -test_expect_success 'object with bad sha1' '
>> -	sha=$(echo blob | git hash-object -w --stdin) &&
>> -	old=$(test_oid_to_path "$sha") &&
>> -	new=$(dirname $old)/$(test_oid ff_2) &&
>> -	sha="$(dirname $new)$(basename $new)" &&
>> -	mv .git/objects/$old .git/objects/$new &&
>> -	test_when_finished "remove_object $sha" &&
>> -	git update-index --add --cacheinfo 100644 $sha foo &&
>> -	test_when_finished "git read-tree -u --reset HEAD" &&
>> -	tree=$(git write-tree) &&
>> -	test_when_finished "remove_object $tree" &&
>> -	cmt=$(echo bogus | git commit-tree $tree) &&
>> -	test_when_finished "remove_object $cmt" &&
>> -	git update-ref refs/heads/bogus $cmt &&
>> -	test_when_finished "git update-ref -d refs/heads/bogus" &&
>> -
>> -	test_must_fail git fsck 2>out &&
>> -	test_i18ngrep "$sha.*corrupt" out
>> +test_expect_success 'object with hash mismatch' '
>> +	git init --bare hash-mismatch &&
>> +	(
>> +		cd hash-mismatch &&
>> +		oid=$(echo blob | git hash-object -w --stdin) &&
>> +		old=$(test_oid_to_path "$oid") &&
>> +		new=$(dirname $old)/$(test_oid ff_2) &&
>> +		oid="$(dirname $new)$(basename $new)" &&
>> +		mv objects/$old objects/$new &&
>> +		git update-index --add --cacheinfo 100644 $oid foo &&
>> +		tree=$(git write-tree) &&
>> +		cmt=$(echo bogus | git commit-tree $tree) &&
>> +		git update-ref refs/heads/bogus $cmt &&
>> +		test_must_fail git fsck 2>out &&
>> +		test_i18ngrep "$oid.*corrupt" out
>> +	)
>>  '
>
> This all looks fine to me. The translation is s/sha/oid and removing all
> of the now-unnecessary test_when_finished calls.
>
> But the test_i18ngrep (which isn't new) could probably also stand to get
> cleaned up and converted to a normal grep.

Thanks, I missed that one!

^