git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file()
@ 2022-11-16 16:39 Kousik Sanagavarapu
  2022-11-16 18:20 ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Kousik Sanagavarapu @ 2022-11-16 16:39 UTC (permalink / raw)
  To: git; +Cc: Kousik Sanagavarapu

It is mentioned in object-store.h that the function
repo_has_object_file() is deprecated. One possible alternative for this
function is has_object() (or atleast that is how I understood it).

The file object-store.h also mentions that repo_has_object_file() and
its fellow functions and macros can be removed once the migrations take
place. This patch therefore is an attempt to reduce the usage of these
functions and macros.

I request for comments as I'm not really sure about the "flags" argument
of the has_object() function and its usage in this patch.

Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com>
---
 object.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 8a74eb85e9..0a9516137a 100644
--- a/object.c
+++ b/object.c
@@ -286,8 +286,8 @@ struct object *parse_object_with_flags(struct repository *r,
 			return &commit->object;
 	}
 
-	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
-	    (!obj && repo_has_object_file(r, oid) &&
+	if ((obj && obj->type == OBJ_BLOB && has_object(r, oid, 0)) ||
+	    (!obj && has_object(r, oid, 0) &&
 	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file()
  2022-11-16 16:39 [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file() Kousik Sanagavarapu
@ 2022-11-16 18:20 ` Jeff King
  2022-11-16 21:14   ` Jonathan Tan
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2022-11-16 18:20 UTC (permalink / raw)
  To: Kousik Sanagavarapu; +Cc: Jonathan Tan, git

On Wed, Nov 16, 2022 at 10:09:56PM +0530, Kousik Sanagavarapu wrote:

> It is mentioned in object-store.h that the function
> repo_has_object_file() is deprecated. One possible alternative for this
> function is has_object() (or atleast that is how I understood it).
> 
> The file object-store.h also mentions that repo_has_object_file() and
> its fellow functions and macros can be removed once the migrations take
> place. This patch therefore is an attempt to reduce the usage of these
> functions and macros.
> 
> I request for comments as I'm not really sure about the "flags" argument
> of the has_object() function and its usage in this patch.

So you've stumbled into quite a tricky spot. :)

Yes, without specifying new flags, this patch has a change of behavior
which I think is not what we want. From 1d8d9cb620 (sha1-file: introduce
no-lazy-fetch has_object(), 2020-08-05):

  There have been a few bugs wherein Git fetches missing objects
  whenever the existence of an object is checked, even though it does
  not need to perform such a fetch. To resolve these bugs, we could look
  at all the places that has_object_file() (or a similar function) is
  used. As a first step, introduce a new function has_object() that
  checks for the existence of an object, with a default behavior of not
  fetching if the object is missing and the repository is a partial
  clone. As we verify each has_object_file() (or similar) usage, we can
  replace it with has_object(), and we will know that we are done when
  we can delete has_object_file() (and the other similar functions).
  
  Also, the new function has_object() has more appropriate defaults:
  besides not fetching, it also does not recheck packed storage.

So the new function will:

  - not recheck packed objects unless we pass HAS_OBJECT_RECHECK_PACKED;
    this is done in the default paths because a simultaneous git-gc may
    be repacking objects (and we would rather double-check than racily
    miss it). So it's appropriate behavior if the caller is
    speculatively asking "hey, we _might_ have this object", but not if
    we expect to have it.

  - not lazily fetch objects in a partial-clone repository. This again
    depends on the caller being in a situation where they are OK saying
    "no we don't have it" for an object we _could_ get if it was worth
    spending effort

The call you're touching here is in parse_object_with_flags(), which is
using the check as part of the "is it a blob that we should stream?"
check. If this call returns "no we don't have it", when we could get it
(either due to racy repack or by fetching), then we'll hit the non-blob
path that calls repo_read_object_file(). Where we would do a fresh
lookup, including re-checking packs and/or lazily fetching!

So the new behavior seems strictly worse to me. We don't avoid those
behaviors, _and_ we fail to follow the streaming-blob code path (which
means we may accidentally read a huge blob into memory. I think we'd
want to leave it as-is, or if we really want to eventually drop to a
single interface, we need has_object() to learn a new flag to enable the
lazy-fetch (and then use it along with RECHECK_PACKED).

Now all of that said, I am skeptical that these calls to
repo_has_object_file() are even doing anything useful at all. Looking at
the existing code (dropping the "+" lines from your patch):

> -	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
> -	    (!obj && repo_has_object_file(r, oid) &&
>  	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
>  		if (!skip_hash && stream_object_signature(r, repl) < 0) {
>  			error(_("hash mismatch %s"), oid_to_hex(oid));

we are checking that it exits if either:

  - another object (e.g., a tree) referred to it as a blob during this
    process, and we created an in-memory "struct object" with that type

  - nobody has referred to it, and we want to check its type via
    oid_object_info()

In the second case, this seems totally pointless. We can just ask
oid_object_info() what it's type is, and it will say "no, we don't have
it" if appropriate. It will do the usual recheck-pack and lazy-fetch,
but so is repo_has_object_file(), so the short-circuit "&&" is not
helping. _If_ we were to switch to has_object() it would start to do
something, but I think that's a bad idea for the reasons given above.

In the first case, I'd likewise argue it's not doing anything useful. It
is confirming that we have the object, but so would the call to
stream_object_signature() immediately below (assuming skip_hash is not
set). If skip_hash is set, then we could either:

  1. Just assume we have it. The point of the caller passing skip_hash
     is that we don't care about checking the integrity for this use
     case, and "missing" is not really any different than "there's a
     file on disk but it might contain garbage bytes".

  2. Check repo_has_object_file() in this code path only when skip_hash
     is set. That retains the same "do we even have it" check for
     skip_hash that is performed now.

I.e., I'd suggest this patch to remove both calls entirely:

diff --git a/object.c b/object.c
index 248530ba7b..a370545405 100644
--- a/object.c
+++ b/object.c
@@ -285,9 +285,8 @@ struct object *parse_object_with_flags(struct repository *r,
 			return &commit->object;
 	}
 
-	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
-	    (!obj && repo_has_object_file(r, oid) &&
-	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+	if ((obj && obj->type == OBJ_BLOB) ||
+	    (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;

But there may be some subtlety I'm missing. I'm cc-ing Jonathan Tan, who
added has_object(), and who added the top call to repo_has_object_file()
via df11e19648 (rev-list: support termination at promisor objects,
2017-12-08). The second call is from 090ea12671 (parse_object: avoid
putting whole blob in core, 2012-03-07), but he's no longer active on
the project. Looking at the commit, I think it was just a case of "let's
be extra careful". But as far as I can tell, it's not helping anything,
and both calls are introducing extra work doing object lookups.

-Peff

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file()
  2022-11-16 18:20 ` Jeff King
@ 2022-11-16 21:14   ` Jonathan Tan
  2022-11-17 22:37     ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Jonathan Tan @ 2022-11-16 21:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, Kousik Sanagavarapu, git

Jeff King <peff@peff.net> writes:
> I.e., I'd suggest this patch to remove both calls entirely:
> 
> diff --git a/object.c b/object.c
> index 248530ba7b..a370545405 100644
> --- a/object.c
> +++ b/object.c
> @@ -285,9 +285,8 @@ struct object *parse_object_with_flags(struct repository *r,
>  			return &commit->object;
>  	}
>  
> -	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
> -	    (!obj && repo_has_object_file(r, oid) &&
> -	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
> +	if ((obj && obj->type == OBJ_BLOB) ||
> +	    (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
>  		if (!skip_hash && stream_object_signature(r, repl) < 0) {
>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>  			return NULL;
> 
> But there may be some subtlety I'm missing. I'm cc-ing Jonathan Tan, who
> added has_object(), and who added the top call to repo_has_object_file()
> via df11e19648 (rev-list: support termination at promisor objects,
> 2017-12-08).

Thanks for CC-ing me on this. Looking at that commit and the code at that time,
I'm not sure why I added that call either. My best guess is that I was worried
that the streaming interface wouldn't support missing objects, but both then
and now, a call to istream_source() is made before any streaming occurs (which
does perform the lazy fetch).

So yes, I also think that you can remove these calls.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 0/2] fixing parse_object() check for type mismatch
  2022-11-16 21:14   ` Jonathan Tan
@ 2022-11-17 22:37     ` Jeff King
  2022-11-17 22:37       ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King
                         ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Jeff King @ 2022-11-17 22:37 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Taylor Blau, Kousik Sanagavarapu, git

On Wed, Nov 16, 2022 at 01:14:18PM -0800, Jonathan Tan wrote:

> > But there may be some subtlety I'm missing. I'm cc-ing Jonathan Tan, who
> > added has_object(), and who added the top call to repo_has_object_file()
> > via df11e19648 (rev-list: support termination at promisor objects,
> > 2017-12-08).
> 
> Thanks for CC-ing me on this. Looking at that commit and the code at that time,
> I'm not sure why I added that call either. My best guess is that I was worried
> that the streaming interface wouldn't support missing objects, but both then
> and now, a call to istream_source() is made before any streaming occurs (which
> does perform the lazy fetch).
> 
> So yes, I also think that you can remove these calls.

Thanks. After staring at this a bit, I noticed there is an even more
subtle issue with the case you touched back then, which is that it fails
to notice when a think we expect to be a blob isn't one. Your old patch
didn't make anything worse there, but it also wasn't sufficient to catch
the problem. See patch 2 for details.

I'm adding Taylor to the cc as the author of t6102, when we were
tracking down all of these "oops, it's not really a blob" cases. This
fixes one of the lingering cases from that test script.

  [1/2]: parse_object(): drop extra "has" check before checking object type
  [2/2]: parse_object(): check on-disk type of suspected blob

 object.c                               | 5 ++---
 t/t6102-rev-list-unexpected-objects.sh | 4 ++--
 2 files changed, 4 insertions(+), 5 deletions(-)

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/2] parse_object(): drop extra "has" check before checking object type
  2022-11-17 22:37     ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King
@ 2022-11-17 22:37       ` Jeff King
  2022-11-17 22:41       ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2022-11-17 22:37 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Taylor Blau, Kousik Sanagavarapu, git

When parsing an object of unknown type, we check to see if it's a blob,
so we can use our streaming code path. This uses oid_object_info() to
check the type, but before doing so we call repo_has_object_file(). This
latter is pointless, as oid_object_info() will already fail if the
object is missing. Checking it ahead of time just complicates the code
and is a waste of resources (albeit small).

Let's drop the redundant check.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/object.c b/object.c
index 8a74eb85e9..16eb944e98 100644
--- a/object.c
+++ b/object.c
@@ -287,8 +287,7 @@ struct object *parse_object_with_flags(struct repository *r,
 	}
 
 	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
-	    (!obj && repo_has_object_file(r, oid) &&
-	     oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+	    (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
-- 
2.38.1.890.g50b10763b9


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/2] parse_object(): check on-disk type of suspected blob
  2022-11-17 22:37     ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King
  2022-11-17 22:37       ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King
@ 2022-11-17 22:41       ` Jeff King
  2022-11-18  0:36         ` Ævar Arnfjörð Bjarmason
  2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  2022-11-18 19:05       ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau
  3 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2022-11-17 22:41 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Taylor Blau, Kousik Sanagavarapu, git

In parse_object(), we try to handle blobs by streaming rather than
loading them entirely into memory. The most common case here will be
that we haven't seen the object yet and check oid_object_info(), which
tells us we have a blob.

But we trigger this code on one other case: when we have an in-memory
object struct with type OBJ_BLOB (and without its "parsed" flag set,
since otherwise we'd return early from the function). This indicates
that some other part of the code suspected we have a blob (e.g., it was
mentioned by a tree or tag) but we haven't yet looked at the on-disk
copy.

In this case before hitting the streaming path, we check if we have the
object on-disk at all. This is mostly pointless extra work, as the
streaming path would complain if it couldn't open the object (albeit
with the message "hash mismatch", which is a little misleading).

But it's also insufficient to catch all problems. The streaming code
will only tell us "yes, the on-disk object matches the oid". But it
doesn't actually confirm that what we found was indeed a blob, and
neither does repo_has_object_file().

One way to improve this would be to teach stream_object_signature() to
check the type (either by returning it to us to check, or taking an
"expected" type). But there's an even simpler fix here: if we suspect
the object is a blob, just call oid_object_info() to confirm that we
have it on-disk, and that it really is a blob.

This is slightly less efficient than teaching stream_object_signature()
to do it (since it has to open the object already). But this case very
rarely comes up. In practice, we usually don't have any clue what the
type is, in which case we already call oid_object_info(). This
"suspected" case happens only when some other code created an object
struct but didn't actually parse the blob, which is actually tricky to
trigger at all (see the discussion of the test below).

I reworked the conditional a bit so that instead of:

  if ((suspected_blob && oid_object_info() == OBJ_BLOB)
      (no_clue && oid_object_info() == OBJ_BLOB)

we have the simpler:

  if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB)

This is shorter, but also reflects what we really want say, which is
"have we ruled out this being a blob; if not, check it on-disk".

In either case, if oid_object_info() fails to tell us it's a blob, we'll
skip the streaming code path and call repo_read_object_file(), just as
before. And if we really do have a mismatch with the existing object
struct, we'll eventually call lookup_commit(), etc, via
parse_object_buffer(), which will complain that it doesn't match our
existing obj->type.

So this fixes one of the lingering expect_failure cases from 0616617c7e
(t: introduce tests for unexpected object types, 2019-04-09).  That test
works by peeling a tag that claims to point to a blob (triggering us to
create the struct), but really points to something else, which we later
discover when we call parse_object() as part of the actual traversal).
Prior to this commit, we'd quietly check the sha1 and mark the blob as
"parsed". Now we correctly complain about the mismatch.

Signed-off-by: Jeff King <peff@peff.net>
---
As an aside, I found the "this test is marked as success but testing the
wrong thing" pattern here confusing to deal with (since I had to dig in
history to understand what was going on and what the test was _supposed_
to say).

It comes from cf10c5b4cf (rev-list tests: don't hide abort() in
"test_expect_failure", 2022-03-07). I'm skeptical that it was worth
switching those tests for leak detection purposes.

But more importantly, it looks like pw/test-todo would provide us with a
much nicer pattern there. It seems to be stalled on review, so let's see
if we can get that moving again.

 object.c                               | 4 ++--
 t/t6102-rev-list-unexpected-objects.sh | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/object.c b/object.c
index 16eb944e98..fad1a5af4a 100644
--- a/object.c
+++ b/object.c
@@ -286,8 +286,8 @@ struct object *parse_object_with_flags(struct repository *r,
 			return &commit->object;
 	}
 
-	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
-	    (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
+	if ((!obj || (obj && obj->type == OBJ_BLOB)) &&
+	    oid_object_info(r, oid, NULL) == OBJ_BLOB) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
 			return NULL;
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 4a9a4436e2..9350b5fd2c 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -121,8 +121,8 @@ test_expect_success 'setup unexpected non-blob tag' '
 	tag=$(git hash-object -w --literally -t tag broken-tag)
 '
 
-test_expect_success 'TODO (should fail!): traverse unexpected non-blob tag (lone)' '
-	git rev-list --objects $tag
+test_expect_success 'traverse unexpected non-blob tag (lone)' '
+	test_must_fail git rev-list --objects $tag
 '
 
 test_expect_success 'traverse unexpected non-blob tag (seen)' '
-- 
2.38.1.890.g50b10763b9

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] parse_object(): check on-disk type of suspected blob
  2022-11-17 22:41       ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King
@ 2022-11-18  0:36         ` Ævar Arnfjörð Bjarmason
  2022-11-21 19:21           ` Jeff King
  0 siblings, 1 reply; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-18  0:36 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, Taylor Blau, Kousik Sanagavarapu, git


On Thu, Nov 17 2022, Jeff King wrote:

> I reworked the conditional a bit so that instead of:
>
>   if ((suspected_blob && oid_object_info() == OBJ_BLOB)
>       (no_clue && oid_object_info() == OBJ_BLOB)
>
> we have the simpler:
>
>   if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB)
> [...]
> @@ -286,8 +286,8 @@ struct object *parse_object_with_flags(struct repository *r,
>  			return &commit->object;
>  	}
>  
> -	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
> -	    (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
> +	if ((!obj || (obj && obj->type == OBJ_BLOB)) &&
> +	    oid_object_info(r, oid, NULL) == OBJ_BLOB) {
>  		if (!skip_hash && stream_object_signature(r, repl) < 0) {
>  			error(_("hash mismatch %s"), oid_to_hex(oid));
>  			return NULL;

But why:

	if ((!x || (x && x->m)) && ...)

Instead of:

	if ((!x || x->m)) && ...)

If "!obj" is false then "obj" must be non-NULL, so you don't need to
check it again and can lose the "obj &&".

> [...]
> So this fixes one of the lingering expect_failure cases from 0616617c7e
> (t: introduce tests for unexpected object types, 2019-04-09).  That test
> works by peeling a tag that claims to point to a blob (triggering us to
> create the struct), but really points to something else, which we later
> discover when we call parse_object() as part of the actual traversal).
> Prior to this commit, we'd quietly check the sha1 and mark the blob as
> "parsed". Now we correctly complain about the mismatch.

I applied this on top of "master", and adjusted your test to be this
instead:

	test_expect_success 'traverse unexpected non-blob tag (lone)' '
		cat >expect <<-EOF &&
		error: object $commit is a blob, not a commit
		fatal: bad object $commit
		EOF
		test_must_fail git rev-list --objects $tag >out 2>actual &&
		test_must_be_empty out &&
		test_cmp expect actual
	'

Which passes, showing that we're still not correctly identifying it, but
we are doing it for the purposes of erroring out, but the incorrect type
persists.

Now, this all does seem quite familiar... :) :
https://lore.kernel.org/git/patch-10.11-a84f670ac24-20210328T021238Z-avarab@gmail.com/

I.e. that's the rest of the fix for this issue. I applied this change on
my local branch with that, and they combine nicely. the "test_must_fail"
here works as intended, *and* we'll correctly report & store the type.

> As an aside, I found the "this test is marked as success but testing the
> wrong thing" pattern here confusing to deal with (since I had to dig in
> history to understand what was going on and what the test was _supposed_
> to say).
>
> It comes from cf10c5b4cf (rev-list tests: don't hide abort() in
> "test_expect_failure", 2022-03-07). I'm skeptical that it was worth
> switching those tests for leak detection purposes.

It's not just for leak detection purposes that it was a good idea to
switch it away from test_expect_failure, but also that we've been
ensuring that this didn't turn into a segfault all this time by not
using "test_expect_failure".

> But more importantly, it looks like pw/test-todo would provide us with a
> much nicer pattern there. It seems to be stalled on review, so let's see
> if we can get that moving again.

The "TODO (should fail!)" didn't stand out? But yeah, having a "todo" or
"test_expect_todo" or "test_expect_failure" not suck would be nice.

FWIW I think
https://lore.kernel.org/git/221006.86v8owr986.gmgdl@evledraar.gmail.com/
outlines a good way forward for it that I think should make everyone
happy.

> diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
> index 4a9a4436e2..9350b5fd2c 100755
> --- a/t/t6102-rev-list-unexpected-objects.sh
> +++ b/t/t6102-rev-list-unexpected-objects.sh
> @@ -121,8 +121,8 @@ test_expect_success 'setup unexpected non-blob tag' '
>  	tag=$(git hash-object -w --literally -t tag broken-tag)
>  '
>  
> -test_expect_success 'TODO (should fail!): traverse unexpected non-blob tag (lone)' '
> -	git rev-list --objects $tag
> +test_expect_success 'traverse unexpected non-blob tag (lone)' '
> +	test_must_fail git rev-list --objects $tag
>  '

I think you'll either want the test_cmp I noted above, or to do that in
a subsequent test_expect_failure.

I know that your stance is that you prefer not to "test the bad
behavior" as it were. Personally I thought an all-caps TODO comment
might make it less confusing, but anyway.

In this case your commit message claims you're happy with the end
result, so I think you'd want to test what we actually emit on stderr,
as it's quite ... unintuative.

Or, which I think probably makes more sense, add that as a subsequent
test_expect_failure or whatever. FWIW this somewhat un-idiomatic pattern
will get around the current caveats with it:

	test_expect_success .... '
		... >actual.non-blob-tag
	'
	test_lazy_prereq HAVE_NON_BLOB_TAG 'test -e actual.non-blob-tag'
	test_expect_failure HAVE_NON_BLOB_TAG '...' '
		cat >expect.non-blob-tag <<-\EOF &&
		...
		EOF
		test_cmp expect.non-blob-tag actual.non-blob-tag
	'

I.e. peel off the 'test_cmp" that should have a known-good state from
the already-good status code.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 0/4] tag: don't misreport type of tagged objects in errors
  2022-11-17 22:37     ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King
  2022-11-17 22:37       ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King
  2022-11-17 22:41       ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King
@ 2022-11-18 11:46       ` Ævar Arnfjörð Bjarmason
  2022-11-18 11:46         ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason
                           ` (4 more replies)
  2022-11-18 19:05       ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau
  3 siblings, 5 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

This series fixes a very long-standing issue where we'll get confused
when we parse a tag whose "type" lies about the type of the target
object.

It goes on top of Jeff King's just-submitted [1], and the two
compliment one another. See [2] for my feedback about what was left
over, which this fixes.

Currently we'll parse tags and note what the "type" claims to be. Say
a pointer to a "blob" object that claims to be a "commit" in the
envelope.

Then when we we'd try to parse that supposed "commit' for real we'd
emit a message like:

	error: object <oid> is a blob, not a commit

Which is reversed, i.e. we'd remember the first "blob" we saw, and
then get confused about seeing a "commit" when we did the actual
parsing.

This is now fixed in almost all cases by having the one caller of
parse_tag() which actually knows the type tell it "yes, I'm sure this
is a commit".

We'll then be able to see that we have a non-parsed object as
scaffolding, but that it's really a commit, and emit the correct:

	error: object <oid> is a commit not a blob

Which goes along with other errors where the tag object itself yells
about being unhappy with the object reference.

I submitted a version of these patches back in early 2021[3], this is
significantly slimmed down since then.

At the time Jeff King noted that this approach inherently can't cover
all possible scenarios. I.e. sometimes our parsing of the envelope
isn't followed up by the "real" parse.

Even in those cases we can "get it right as 4/4 here demonstrates.

But there are going to be cases left where we get it wrong, but
they're all cases where we get it wrong now. It's probably not worth
fixing the long tail of those issues, but now we'll emit a sensible
error on the common case of "log" etc.

1. https://lore.kernel.org/git/Y3a3qcqNG8W3ueeb@coredump.intra.peff.net/
2. https://lore.kernel.org/git/221118.86cz9lgjxu.gmgdl@evledraar.gmail.com/
3. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/

Ævar Arnfjörð Bjarmason (4):
  object-file.c: free the "t.tag" in check_tag()
  object tests: add test for unexpected objects in tags
  tag: don't misreport type of tagged objects in errors
  tag: don't emit potentially incorrect "object is a X, not a Y"

 blob.c                                 |  11 +-
 blob.h                                 |   3 +
 commit.c                               |  11 +-
 commit.h                               |   2 +
 object-file.c                          |   1 +
 object.c                               |  20 +++-
 object.h                               |   2 +
 t/t3800-mktag.sh                       |   1 +
 t/t5302-pack-index.sh                  |   2 +
 t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++
 tag.c                                  |  22 +++-
 tag.h                                  |   2 +
 tree.c                                 |  11 +-
 tree.h                                 |   2 +
 14 files changed, 222 insertions(+), 14 deletions(-)

-- 
2.38.0.1511.gcdcff1f1dc2


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/4] object-file.c: free the "t.tag" in check_tag()
  2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2022-11-18 11:46         ` Ævar Arnfjörð Bjarmason
  2022-11-18 11:46         ` [PATCH 2/4] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
                           ` (3 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

Fix a memory leak that's been with us ever since c879daa2372 (Make
hash-object more robust against malformed objects, 2011-02-05). With
"HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse
tags into a throwaway variable on the stack, but weren't freeing the
"item->tag" we might malloc() when doing so.

Mark the tests that now pass in their entirety as passing under
"SANITIZE=leak", which means we'll test them as part of the
"linux-leaks" CI job.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 object-file.c         | 1 +
 t/t3800-mktag.sh      | 1 +
 t/t5302-pack-index.sh | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/object-file.c b/object-file.c
index 957790098fa..a13edba0753 100644
--- a/object-file.c
+++ b/object-file.c
@@ -2338,6 +2338,7 @@ static void check_tag(const void *buf, size_t size)
 	memset(&t, 0, sizeof(t));
 	if (parse_tag_buffer(the_repository, &t, buf, size))
 		die(_("corrupt tag"));
+	free(t.tag);
 }
 
 static int index_mem(struct index_state *istate,
diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh
index e3cf0ffbe59..d3e428ff46e 100755
--- a/t/t3800-mktag.sh
+++ b/t/t3800-mktag.sh
@@ -4,6 +4,7 @@
 
 test_description='git mktag: tag object verify test'
 
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 ###########################################################
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index b0095ab41d3..54b11f81c63 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -4,6 +4,8 @@
 #
 
 test_description='pack index with 64-bit offsets and object CRC'
+
+TEST_PASSES_SANITIZE_LEAK=true
 . ./test-lib.sh
 
 test_expect_success 'setup' '
-- 
2.38.0.1511.gcdcff1f1dc2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/4] object tests: add test for unexpected objects in tags
  2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  2022-11-18 11:46         ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason
@ 2022-11-18 11:46         ` Ævar Arnfjörð Bjarmason
  2022-11-18 11:46         ` [PATCH 3/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
                           ` (2 subsequent siblings)
  4 siblings, 0 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
for unexpected object types, 2019-04-09), there were no meaningful
tests for checking how we reported on finding the incorrect object
type in a tag, i.e. one that broke the "type" promise in the tag
header.

We'll report the wrong object type in these cases, and thus fail on
the "test_cmp", e.g. for the first "error: " output being tested here
we should say "$commit is a tag, not a commit", instead we say
"$commit is a commit, not a tag". This will be fixed in a subsequent
commit.

See the discussion & notes in [1] and downthread of there for test
snippets that are adapted here.

In the case of "fsck" which objects we visit in what order, and if we
report errors on them depends on their OIDs. So the test uses the
technique of extracting the OID/type combinations that fsck does
report, and asserting that those are correct (currently, it's far from
correct).

1. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 9350b5fd2c2..ac49f7182fd 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -130,4 +130,150 @@ test_expect_success 'traverse unexpected non-blob tag (seen)' '
 	test_i18ngrep "not a blob" output
 '
 
+test_expect_success 'setup unexpected non-tag tag' '
+	test_when_finished "git tag -d tag-commit tag-tag" &&
+
+	git tag -a -m"my tagged commit" tag-commit $commit &&
+	tag_commit=$(git rev-parse tag-commit) &&
+	git tag -a -m"my tagged tag" tag-tag tag-commit &&
+	tag_tag=$(git rev-parse tag-tag) &&
+
+	git cat-file tag tag-tag >good-tag-tag &&
+	git cat-file tag tag-commit >good-commit-tag &&
+
+	sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit &&
+	sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree &&
+	sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob &&
+
+	sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag &&
+	sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree &&
+	sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob &&
+
+	tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) &&
+	tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) &&
+	tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) &&
+
+	git update-ref refs/tags/tag_tag_commit $tag_tag_commit &&
+	git update-ref refs/tags/tag_tag_tree $tag_tag_tree &&
+	git update-ref refs/tags/tag_tag_blob $tag_tag_blob &&
+
+	commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) &&
+	commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) &&
+	commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob) &&
+
+	git update-ref refs/tags/commit_tag_tag $commit_tag_tag &&
+	git update-ref refs/tags/commit_tag_tree $commit_tag_tree &&
+	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
+'
+
+test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' '
+	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
+	cat >expect <<-EOF &&
+	error: object $commit is a commit, not a tag
+	fatal: bad object $commit
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tag 2>err &&
+	cat >expect <<-EOF &&
+	error: object $tag_commit is a tag, not a commit
+	fatal: bad object $tag_commit
+	EOF
+	test_cmp expect err
+'
+
+test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' '
+	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
+	cat >expect <<-EOF &&
+	error: object $tree is a tree, not a tag
+	fatal: bad object $tree
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tree 2>err &&
+	cat >expect <<-EOF &&
+	error: object $tree is a tree, not a commit
+	fatal: bad object $tree
+	EOF
+	test_cmp expect err
+'
+
+test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' '
+	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a tag
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $commit_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a commit
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err
+'
+
+test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' '
+	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a commit
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a tag
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err
+'
+
+
+test_expect_failure 'traverse unexpected objects with for-each-ref' '
+	cat >expect <<-EOF &&
+	error: bad tag pointer to $tree in $tag_tag_tree
+	fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree
+	EOF
+	test_must_fail git for-each-ref --format="%(*objectname)" 2>actual &&
+	test_cmp expect actual
+'
+
+>fsck-object-isa
+test_expect_failure 'setup: unexpected objects with fsck' '
+	test_must_fail git fsck 2>err &&
+	sed -n -e "/^error: object .* is a .*, not a .*$/ {
+		s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/;
+		p;
+	}" <err >fsck-object-isa
+'
+
+while read oid type
+do
+	test_expect_failure "fsck knows unexpected object $oid is $type" '
+		git cat-file -t $oid >expect &&
+		echo $type >actual &&
+		test_cmp expect actual
+	'
+done <fsck-object-isa
+
+test_expect_success 'traverse unexpected non-tag tag (blob seen to blob)' '
+	test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a commit
+	error: bad tag pointer to $blob in $commit_tag_blob
+	fatal: bad object $commit_tag_blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $blob $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a tag
+	error: bad tag pointer to $blob in $tag_tag_blob
+	fatal: bad object $tag_tag_blob
+	EOF
+	test_cmp expected err
+'
+
 test_done
-- 
2.38.0.1511.gcdcff1f1dc2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/4] tag: don't misreport type of tagged objects in errors
  2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  2022-11-18 11:46         ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason
  2022-11-18 11:46         ` [PATCH 2/4] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
@ 2022-11-18 11:46         ` Ævar Arnfjörð Bjarmason
  2022-11-18 11:46         ` [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason
  2022-12-30  1:52         ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  4 siblings, 0 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
objects, 2005-06-21) (yes, that ancient!) and correctly report an
error on a tag like:

    object <a tree hash>
    type commit

As:

    error: object <a tree hash> is tree, not a commit

Instead of our long-standing misbehavior of inverting the two, and
reporting:

    error: object <a tree hash> is commit, not a tree

Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
is incorrect.

The reason for this misreporting is that in parse_tag_buffer() we end
up doing a lookup_{blob,commit,tag,tree}() depending on what we read
out of the "type" line.

If we haven't parsed that object before we end up dispatching to the
type-specific lookup functions, e.g. this for commit.c in
lookup_commit_type():

	struct object *obj = lookup_object(r, oid);
	if (!obj)
		return create_object(r, oid, alloc_commit_node(r));

Its allocation will then set the obj->type according to what the tag
told us the type was, but which we've never validated. At this point
we've got an object in memory that hasn't been parsed, and whose type
is incorrect, since we mistrusted a tag to tell us the type.

Then when we actually load the object with parse_object() we read it
and find that it's a "tree". See 8ff226a9d5e (add object_as_type
helper for casting objects, 2014-07-13) for that behavior (that's just
a refactoring commit, but shows all the code involved).

Which explains why we inverted the error report. Normally when
object_as_type() is called it's by the lookup_{blob,commit,tag,tree}()
functions via parse_object(). At that point we can trust the
obj->type.

In the case of parsing objects we've learned about via a tag with an
incorrect type it's the opposite, the obj->type isn't correct and
holds the mislabeled type, but we're parsing the object and know for
sure what object type we're dealing with.

So, let's add "lookup_{blob,commit,tag,tree}_type()" functions to go
with the existing ""lookup_{blob,commit,tag,tree}()", we'll call these
from "parse_object_buffer()" where we actually know the type, as
opposed to the "parse_tag_buffer()" code where we're just guessing
what it might be.

This only help with the cases where we do see the tag reference, and
then end up doing a full parse of the object. But as seen in the
"for-each-ref" and "fsck" tests we have cases where we'll never fully
parse it.

Those will be handled in a subsequent commit, but for now this handles
the common case of "show" etc. running into these.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                                 | 11 +++++++++--
 blob.h                                 |  3 +++
 commit.c                               | 11 +++++++++--
 commit.h                               |  2 ++
 object.c                               | 20 ++++++++++++++++----
 object.h                               |  2 ++
 t/t6102-rev-list-unexpected-objects.sh |  8 ++++----
 tag.c                                  | 21 +++++++++++++++++----
 tag.h                                  |  2 ++
 tree.c                                 | 11 +++++++++--
 tree.h                                 |  2 ++
 11 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/blob.c b/blob.c
index 182718aba9f..ca30a22b2e8 100644
--- a/blob.c
+++ b/blob.c
@@ -5,12 +5,19 @@
 
 const char *blob_type = "blob";
 
-struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_blob_node(r));
-	return object_as_type(obj, OBJ_BLOB, 0);
+	return object_as_type_hint(obj, OBJ_BLOB, type);
+}
+
+struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+{
+	return lookup_blob_type(r, oid, OBJ_NONE);
 }
 
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
diff --git a/blob.h b/blob.h
index 16648720557..066a2effcbf 100644
--- a/blob.h
+++ b/blob.h
@@ -10,6 +10,9 @@ struct blob {
 };
 
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type);
 
 int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
 
diff --git a/commit.c b/commit.c
index 572301b80a2..8a90f279e24 100644
--- a/commit.c
+++ b/commit.c
@@ -67,12 +67,19 @@ struct commit *lookup_commit_object(struct repository *r,
 
 }
 
-struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+struct commit *lookup_commit_type(struct repository *r,
+				  const struct object_id *oid,
+				  enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_commit_node(r));
-	return object_as_type(obj, OBJ_COMMIT, 0);
+	return object_as_type_hint(obj, OBJ_COMMIT, type);
+}
+
+struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+{
+	return lookup_commit_type(r, oid, OBJ_NONE);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/commit.h b/commit.h
index fa39202fa6b..95001a29d6b 100644
--- a/commit.h
+++ b/commit.h
@@ -78,6 +78,8 @@ struct commit *lookup_commit_object(struct repository *r, const struct object_id
  * "oid" is not in the object cache.
  */
 struct commit *lookup_commit(struct repository *r, const struct object_id *oid);
+struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid,
+				  enum object_type type);
 struct commit *lookup_commit_reference(struct repository *r,
 				       const struct object_id *oid);
 struct commit *lookup_commit_reference_gently(struct repository *r,
diff --git a/object.c b/object.c
index fad1a5af4a6..cc17ed0606e 100644
--- a/object.c
+++ b/object.c
@@ -177,6 +177,18 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 }
 
+void *object_as_type_hint(struct object *obj, enum object_type type,
+			  enum object_type hint)
+{
+	if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) {
+		error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid),
+		      type_name(type), type_name(obj->type));
+		obj->type = type;
+		return NULL;
+	}
+	return object_as_type(obj, type, 0);;
+}
+
 struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
@@ -210,14 +222,14 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 
 	obj = NULL;
 	if (type == OBJ_BLOB) {
-		struct blob *blob = lookup_blob(r, oid);
+		struct blob *blob = lookup_blob_type(r, oid, type);
 		if (blob) {
 			if (parse_blob_buffer(blob, buffer, size))
 				return NULL;
 			obj = &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
-		struct tree *tree = lookup_tree(r, oid);
+		struct tree *tree = lookup_tree_type(r, oid, type);
 		if (tree) {
 			obj = &tree->object;
 			if (!tree->buffer)
@@ -229,7 +241,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			}
 		}
 	} else if (type == OBJ_COMMIT) {
-		struct commit *commit = lookup_commit(r, oid);
+		struct commit *commit = lookup_commit_type(r, oid, type);
 		if (commit) {
 			if (parse_commit_buffer(r, commit, buffer, size, 1))
 				return NULL;
@@ -241,7 +253,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			obj = &commit->object;
 		}
 	} else if (type == OBJ_TAG) {
-		struct tag *tag = lookup_tag(r, oid);
+		struct tag *tag = lookup_tag_type(r, oid, type);
 		if (tag) {
 			if (parse_tag_buffer(r, tag, buffer, size))
 			       return NULL;
diff --git a/object.h b/object.h
index 31ebe114585..042c304d3a4 100644
--- a/object.h
+++ b/object.h
@@ -122,6 +122,8 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid);
 void *create_object(struct repository *r, const struct object_id *oid, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
+void *object_as_type_hint(struct object *obj, enum object_type type,
+			  enum object_type hint);
 
 /*
  * Returns the object, having parsed it to find out what it is.
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index ac49f7182fd..2e36d8bcfd9 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -166,7 +166,7 @@ test_expect_success 'setup unexpected non-tag tag' '
 	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
 '
 
-test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' '
+test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' '
 	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
 	cat >expect <<-EOF &&
 	error: object $commit is a commit, not a tag
@@ -182,7 +182,7 @@ test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)
 	test_cmp expect err
 '
 
-test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' '
+test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
 	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
 	cat >expect <<-EOF &&
 	error: object $tree is a tree, not a tag
@@ -198,7 +198,7 @@ test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' '
 	test_cmp expect err
 '
 
-test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' '
+test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
 	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
 	cat >expect <<-EOF &&
 	error: object $blob is a blob, not a tag
@@ -214,7 +214,7 @@ test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' '
 	test_cmp expect err
 '
 
-test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' '
+test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
 	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
 	cat >expect <<-EOF &&
 	error: object $blob is a blob, not a commit
diff --git a/tag.c b/tag.c
index dfbcd7fcc24..19453c2edbf 100644
--- a/tag.c
+++ b/tag.c
@@ -100,12 +100,18 @@ struct object *deref_tag_noverify(struct object *o)
 	return o;
 }
 
-struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tag_node(r));
-	return object_as_type(obj, OBJ_TAG, 0);
+	return object_as_type_hint(obj, OBJ_TAG, type);
+}
+
+struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tag_type(r, oid, OBJ_NONE);
 }
 
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
@@ -135,6 +141,7 @@ void release_tag_memory(struct tag *t)
 
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size)
 {
+	struct object *obj;
 	struct object_id oid;
 	char type[20];
 	const char *bufptr = data;
@@ -169,7 +176,10 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	type[nl - bufptr] = '\0';
 	bufptr = nl + 1;
 
-	if (!strcmp(type, blob_type)) {
+	obj = lookup_object(r, &oid);
+	if (obj) {
+		item->tagged = obj;
+	} else if (!strcmp(type, blob_type)) {
 		item->tagged = (struct object *)lookup_blob(r, &oid);
 	} else if (!strcmp(type, tree_type)) {
 		item->tagged = (struct object *)lookup_tree(r, &oid);
@@ -182,10 +192,13 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 			     type, oid_to_hex(&item->object.oid));
 	}
 
-	if (!item->tagged)
+	if (!item->tagged || strcmp(type_name(item->tagged->type), type)) {
+		error(_("object %s is a %s, not a %s"), oid_to_hex(&oid),
+		      type_name(item->tagged->type), type);
 		return error("bad tag pointer to %s in %s",
 			     oid_to_hex(&oid),
 			     oid_to_hex(&item->object.oid));
+	}
 
 	if (bufptr + 4 < tail && starts_with(bufptr, "tag "))
 		; 		/* good */
diff --git a/tag.h b/tag.h
index 3ce8e721924..42bd3e64011 100644
--- a/tag.h
+++ b/tag.h
@@ -12,6 +12,8 @@ struct tag {
 	timestamp_t date;
 };
 struct tag *lookup_tag(struct repository *r, const struct object_id *oid);
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type);
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size);
 int parse_tag(struct tag *item);
 void release_tag_memory(struct tag *t);
diff --git a/tree.c b/tree.c
index 410e3b477e5..1a730249bb8 100644
--- a/tree.c
+++ b/tree.c
@@ -102,12 +102,19 @@ int cmp_cache_name_compare(const void *a_, const void *b_)
 				  ce2->name, ce2->ce_namelen, ce_stage(ce2));
 }
 
-struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+struct tree *lookup_tree_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tree_node(r));
-	return object_as_type(obj, OBJ_TREE, 0);
+	return object_as_type_hint(obj, OBJ_TREE, type);
+}
+
+struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tree_type(r, oid, OBJ_NONE);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
diff --git a/tree.h b/tree.h
index 6efff003e21..4af3b617f3d 100644
--- a/tree.h
+++ b/tree.h
@@ -15,6 +15,8 @@ struct tree {
 extern const char *tree_type;
 
 struct tree *lookup_tree(struct repository *r, const struct object_id *oid);
+struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid,
+			      enum object_type type);
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
 
-- 
2.38.0.1511.gcdcff1f1dc2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y"
  2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
                           ` (2 preceding siblings ...)
  2022-11-18 11:46         ` [PATCH 3/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2022-11-18 11:46         ` Ævar Arnfjörð Bjarmason
  2022-12-30  1:52         ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  4 siblings, 0 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

As noted in the preceding commit we weren't handling cases where we
see a reference to a bad "type" in a "tag", but then end up not fully
parsing the object.

In those cases let's only claim that we have a bad tag pointer, but
emit "is a %s, not a %s".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6102-rev-list-unexpected-objects.sh | 6 +++---
 tag.c                                  | 5 +++--
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 2e36d8bcfd9..ffc1672d7dc 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -231,7 +231,7 @@ test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
 '
 
 
-test_expect_failure 'traverse unexpected objects with for-each-ref' '
+test_expect_success 'traverse unexpected objects with for-each-ref' '
 	cat >expect <<-EOF &&
 	error: bad tag pointer to $tree in $tag_tag_tree
 	fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree
@@ -241,7 +241,7 @@ test_expect_failure 'traverse unexpected objects with for-each-ref' '
 '
 
 >fsck-object-isa
-test_expect_failure 'setup: unexpected objects with fsck' '
+test_expect_success 'setup: unexpected objects with fsck' '
 	test_must_fail git fsck 2>err &&
 	sed -n -e "/^error: object .* is a .*, not a .*$/ {
 		s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/;
@@ -251,7 +251,7 @@ test_expect_failure 'setup: unexpected objects with fsck' '
 
 while read oid type
 do
-	test_expect_failure "fsck knows unexpected object $oid is $type" '
+	test_expect_success "fsck knows unexpected object $oid is $type" '
 		git cat-file -t $oid >expect &&
 		echo $type >actual &&
 		test_cmp expect actual
diff --git a/tag.c b/tag.c
index 19453c2edbf..ad92cf89209 100644
--- a/tag.c
+++ b/tag.c
@@ -193,8 +193,9 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	}
 
 	if (!item->tagged || strcmp(type_name(item->tagged->type), type)) {
-		error(_("object %s is a %s, not a %s"), oid_to_hex(&oid),
-		      type_name(item->tagged->type), type);
+		if (item->tagged && item->tagged->parsed)
+			error(_("object %s is a %s, not a %s"), oid_to_hex(&oid),
+			      type_name(item->tagged->type), type);
 		return error("bad tag pointer to %s in %s",
 			     oid_to_hex(&oid),
 			     oid_to_hex(&item->object.oid));
-- 
2.38.0.1511.gcdcff1f1dc2


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] fixing parse_object() check for type mismatch
  2022-11-17 22:37     ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King
                         ` (2 preceding siblings ...)
  2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2022-11-18 19:05       ` Taylor Blau
  2022-11-21 19:26         ` Jeff King
  3 siblings, 1 reply; 22+ messages in thread
From: Taylor Blau @ 2022-11-18 19:05 UTC (permalink / raw)
  To: Jeff King; +Cc: Jonathan Tan, Kousik Sanagavarapu, git

On Thu, Nov 17, 2022 at 05:37:29PM -0500, Jeff King wrote:
> I'm adding Taylor to the cc as the author of t6102, when we were
> tracking down all of these "oops, it's not really a blob" cases. This
> fixes one of the lingering cases from that test script.
>
>   [1/2]: parse_object(): drop extra "has" check before checking object type
>   [2/2]: parse_object(): check on-disk type of suspected blob
>
>  object.c                               | 5 ++---
>  t/t6102-rev-list-unexpected-objects.sh | 4 ++--
>  2 files changed, 4 insertions(+), 5 deletions(-)

A blast from the past :-).

I took a careful look at both of these patches and they looked good to
me, so let's start merging them down.

Thanks,
Taylor

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/2] parse_object(): check on-disk type of suspected blob
  2022-11-18  0:36         ` Ævar Arnfjörð Bjarmason
@ 2022-11-21 19:21           ` Jeff King
  0 siblings, 0 replies; 22+ messages in thread
From: Jeff King @ 2022-11-21 19:21 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Jonathan Tan, Taylor Blau, Kousik Sanagavarapu, git

On Fri, Nov 18, 2022 at 01:36:23AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > -	if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) ||
> > -	    (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) {
> > +	if ((!obj || (obj && obj->type == OBJ_BLOB)) &&
> > +	    oid_object_info(r, oid, NULL) == OBJ_BLOB) {
> >  		if (!skip_hash && stream_object_signature(r, repl) < 0) {
> >  			error(_("hash mismatch %s"), oid_to_hex(oid));
> >  			return NULL;
> 
> But why:
> 
> 	if ((!x || (x && x->m)) && ...)
> 
> Instead of:
> 
> 	if ((!x || x->m)) && ...)
> 
> If "!obj" is false then "obj" must be non-NULL, so you don't need to
> check it again and can lose the "obj &&".

Just that it was one more round of refactoring than I did. :)

I agree that it's much more readable. It looks like the original hit
'next', so I'll send a patch on top.

> I applied this on top of "master", and adjusted your test to be this
> instead:
> 
> 	test_expect_success 'traverse unexpected non-blob tag (lone)' '
> 		cat >expect <<-EOF &&
> 		error: object $commit is a blob, not a commit
> 		fatal: bad object $commit
> 		EOF
> 		test_must_fail git rev-list --objects $tag >out 2>actual &&
> 		test_must_be_empty out &&
> 		test_cmp expect actual
> 	'
> 
> Which passes, showing that we're still not correctly identifying it, but
> we are doing it for the purposes of erroring out, but the incorrect type
> persists.
> 
> Now, this all does seem quite familiar... :) :
> https://lore.kernel.org/git/patch-10.11-a84f670ac24-20210328T021238Z-avarab@gmail.com/
> 
> I.e. that's the rest of the fix for this issue. I applied this change on
> my local branch with that, and they combine nicely. the "test_must_fail"
> here works as intended, *and* we'll correctly report & store the type.

Right. It's hitting the exact same code path as all of the other object
types now. You suggested adding to the test here, but I'd prefer not to
do that. Noticing that we have a type mismatch is what is fixed, and it
now does that just like all the other object types. Dealing with the
message reversal is orthogonal.

> > But more importantly, it looks like pw/test-todo would provide us with a
> > much nicer pattern there. It seems to be stalled on review, so let's see
> > if we can get that moving again.
> 
> The "TODO (should fail!)" didn't stand out? But yeah, having a "todo" or
> "test_expect_todo" or "test_expect_failure" not suck would be nice.

I did double-take on the "TODO" just because that is not our usual
pattern, but that was easily fixed. What I really don't like about the
"switch failure to success" pattern is that it requires rewriting the
test to expect the wrong thing! So when somebody later fixes the bug,
they get a confusing failure, but must also rewrite the test back to
what it originally should have been.

That was not too hard here, where it was just replacing a
test_must_fail, but that earlier hunk in cf10c5b4cf that actualy adds in
expected output (that we know is the wrong thing to be printing!) seems
a bit over the top to me. Anybody who encounters it has to dig into the
history to understand what is going on.

-Peff

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] fixing parse_object() check for type mismatch
  2022-11-18 19:05       ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau
@ 2022-11-21 19:26         ` Jeff King
  2022-11-22  0:05           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 22+ messages in thread
From: Jeff King @ 2022-11-21 19:26 UTC (permalink / raw)
  To: Taylor Blau
  Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano,
	Jonathan Tan, Kousik Sanagavarapu, git

On Fri, Nov 18, 2022 at 02:05:04PM -0500, Taylor Blau wrote:

> On Thu, Nov 17, 2022 at 05:37:29PM -0500, Jeff King wrote:
> > I'm adding Taylor to the cc as the author of t6102, when we were
> > tracking down all of these "oops, it's not really a blob" cases. This
> > fixes one of the lingering cases from that test script.
> >
> >   [1/2]: parse_object(): drop extra "has" check before checking object type
> >   [2/2]: parse_object(): check on-disk type of suspected blob
> >
> >  object.c                               | 5 ++---
> >  t/t6102-rev-list-unexpected-objects.sh | 4 ++--
> >  2 files changed, 4 insertions(+), 5 deletions(-)
> 
> A blast from the past :-).
> 
> I took a careful look at both of these patches and they looked good to
> me, so let's start merging them down.

I saw this hit 'next', but I think Ævar's simplification suggestion is
worth taking. So here is a patch on top to do so (the original branch is
jk/parse-object-type-mismatch for the benefit of any newly-returned
maintainers).

I was going to do a "helped-by", but since the only thing in the patch
is the suggested change, I just handed over authorship. :)

I didn't forge a signoff, and I think mine is sufficient under DCO's
part (b), but Ævar please indicate if that's OK.

-- >8 --
From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Subject: [PATCH] parse_object(): simplify blob conditional

Commit 8db2dad7a0 (parse_object(): check on-disk type of suspected blob,
2022-11-17) simplified the conditional for checking if we might have a
blob. But we can simplify it further. In:

  !obj || (obj && obj->type == OBJ_BLOB)

the short-circuit "OR" means "obj" will always be true on the right-hand
side. The compiler almost certainly optimized that out anyway, but
dropping it makes the conditional easier to understand for humans.

Signed-off-by: Jeff King <peff@peff.net>
---
 object.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/object.c b/object.c
index fad1a5af4a..682b852a46 100644
--- a/object.c
+++ b/object.c
@@ -286,7 +286,7 @@ struct object *parse_object_with_flags(struct repository *r,
 			return &commit->object;
 	}
 
-	if ((!obj || (obj && obj->type == OBJ_BLOB)) &&
+	if ((!obj || obj->type == OBJ_BLOB) &&
 	    oid_object_info(r, oid, NULL) == OBJ_BLOB) {
 		if (!skip_hash && stream_object_signature(r, repl) < 0) {
 			error(_("hash mismatch %s"), oid_to_hex(oid));
-- 
2.38.1.950.g65ed8c1df8


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 0/2] fixing parse_object() check for type mismatch
  2022-11-21 19:26         ` Jeff King
@ 2022-11-22  0:05           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-11-22  0:05 UTC (permalink / raw)
  To: Jeff King
  Cc: Taylor Blau, Junio C Hamano, Jonathan Tan, Kousik Sanagavarapu,
	git


On Mon, Nov 21 2022, Jeff King wrote:

> On Fri, Nov 18, 2022 at 02:05:04PM -0500, Taylor Blau wrote:
>
>> On Thu, Nov 17, 2022 at 05:37:29PM -0500, Jeff King wrote:
>> > I'm adding Taylor to the cc as the author of t6102, when we were
>> > tracking down all of these "oops, it's not really a blob" cases. This
>> > fixes one of the lingering cases from that test script.
>> >
>> >   [1/2]: parse_object(): drop extra "has" check before checking object type
>> >   [2/2]: parse_object(): check on-disk type of suspected blob
>> >
>> >  object.c                               | 5 ++---
>> >  t/t6102-rev-list-unexpected-objects.sh | 4 ++--
>> >  2 files changed, 4 insertions(+), 5 deletions(-)
>> 
>> A blast from the past :-).
>> 
>> I took a careful look at both of these patches and they looked good to
>> me, so let's start merging them down.
>
> I saw this hit 'next', but I think Ævar's simplification suggestion is
> worth taking. So here is a patch on top to do so (the original branch is
> jk/parse-object-type-mismatch for the benefit of any newly-returned
> maintainers).
>
> I was going to do a "helped-by", but since the only thing in the patch
> is the suggested change, I just handed over authorship. :)
>
> I didn't forge a signoff, and I think mine is sufficient under DCO's
> part (b), but Ævar please indicate if that's OK.

This looks good to me, thanks for following up. In case my SOB is needed
feel free to add it, but it's fine without that too as far as I'm
concerned.

> -- >8 --
> From: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> Subject: [PATCH] parse_object(): simplify blob conditional
>
> Commit 8db2dad7a0 (parse_object(): check on-disk type of suspected blob,
> 2022-11-17) simplified the conditional for checking if we might have a
> blob. But we can simplify it further. In:
>
>   !obj || (obj && obj->type == OBJ_BLOB)
>
> the short-circuit "OR" means "obj" will always be true on the right-hand
> side. The compiler almost certainly optimized that out anyway, but
> dropping it makes the conditional easier to understand for humans.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  object.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/object.c b/object.c
> index fad1a5af4a..682b852a46 100644
> --- a/object.c
> +++ b/object.c
> @@ -286,7 +286,7 @@ struct object *parse_object_with_flags(struct repository *r,
>  			return &commit->object;
>  	}
>  
> -	if ((!obj || (obj && obj->type == OBJ_BLOB)) &&
> +	if ((!obj || obj->type == OBJ_BLOB) &&
>  	    oid_object_info(r, oid, NULL) == OBJ_BLOB) {
>  		if (!skip_hash && stream_object_signature(r, repl) < 0) {
>  			error(_("hash mismatch %s"), oid_to_hex(oid));


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors
  2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
                           ` (3 preceding siblings ...)
  2022-11-18 11:46         ` [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason
@ 2022-12-30  1:52         ` Ævar Arnfjörð Bjarmason
  2022-12-30  1:52           ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
                             ` (2 more replies)
  4 siblings, 3 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-30  1:52 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

This series fixes a very long-standing issue where we'll get confused
when we parse a tag whose "type" lies about the type of the target
object.

The v1 was on top of jk/parse-object-type-mismatch, which has since
landed on "master". As I noted in [1] this covers remaining
misreporting cases which weren't addressed in that series.

Currently we'll parse tags and note what the "type" claims to be. Say
a pointer to a "blob" object that claims to be a "commit" in the
envelope.

Then when we we'd try to parse that supposed "commit' for real we'd
emit a message like:

	error: object <oid> is a blob, not a commit

Which is reversed, i.e. we'd remember the first "blob" we saw, and
then get confused about seeing a "commit" when we did the actual
parsing.

This is now fixed in almost all cases by having the one caller of
parse_tag() which actually knows the type tell it "yes, I'm sure this
is a commit".

We'll then be able to see that we have a non-parsed object as
scaffolding, but that it's really a commit, and emit the correct:

	error: object <oid> is a commit not a blob

Which goes along with other errors where the tag object itself yells
about being unhappy with the object reference.

I submitted a version of these patches back in early 2021[2], this is
significantly slimmed down since then.

At the time Jeff King noted[3] that this approach inherently can't cover
all possible scenarios. I.e. sometimes our parsing of the envelope
isn't followed up by the "real" parse.

Even in those cases we can "get it right as 3/3 here demonstrates.

But there are going to be cases left where we get it wrong, but
they're all cases where we get it wrong now. It's probably not worth
fixing the long tail of those issues, but now we'll emit a sensible
error on the common case of "log" etc.

Changes since v1:

* The v1 of this included a fix for the t.tag memory leak, which has
  now been ejected. I'm fixing that in another series[4]

  As a result we need to mark the new tests with !SANITIZE_LEAK, once
  some version of [4] lands we can un-mark these, so we'll test them
  under SANITIZE=leak.

* In the previous 1st patch I marked a "setup" test as
  "test_expect_failure", which will pass at that point, let's make it
  "test_expect_success" from the outset.

CI & branch at [5]. The "win build" CI failure is unrelated, it also
happens when I re-push master, root cause unknown, but unrelated to
this topic.

1. https://lore.kernel.org/git/221118.86cz9lgjxu.gmgdl@evledraar.gmail.com/
2. https://lore.kernel.org/git/cover-00.11-00000000000-20210328T021238Z-avarab@gmail.com/
3. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/
4. https://lore.kernel.org/git/cover-00.20-00000000000-20221228T175512Z-avarab@gmail.com/
5. https://github.com/avar/git/tree/avar/correct-object-as-type-minimal-2


Ævar Arnfjörð Bjarmason (3):
  object tests: add test for unexpected objects in tags
  tag: don't misreport type of tagged objects in errors
  tag: don't emit potentially incorrect "object is a X, not a Y"

 blob.c                                 |  11 +-
 blob.h                                 |   3 +
 commit.c                               |  11 +-
 commit.h                               |   2 +
 object.c                               |  20 +++-
 object.h                               |   2 +
 t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++
 tag.c                                  |  22 +++-
 tag.h                                  |   2 +
 tree.c                                 |  11 +-
 tree.h                                 |   2 +
 11 files changed, 218 insertions(+), 14 deletions(-)

Range-diff against v1:
1:  2be8477cd78 < -:  ----------- object-file.c: free the "t.tag" in check_tag()
2:  1b5544ec868 ! 1:  0abf873f1e3 object tests: add test for unexpected objects in tags
    @@ Commit message
         report, and asserting that those are correct (currently, it's far from
         correct).
     
    +    As these tests happen to run into a memory leak skip them under
    +    SANITIZE=leak, as the test file was previously marked leak-free in
    +    [3]. There is a concurrent fix for the leak in question[4].
    +
         1. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/
    +    2. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/
    +    3. dd9cede9136 (leak tests: mark some rev-list tests as passing with
    +       SANITIZE=leak, 2021-10-31)
    +    4. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/
     
         Helped-by: Jeff King <peff@peff.net>
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
      	test_i18ngrep "not a blob" output
      '
      
    -+test_expect_success 'setup unexpected non-tag tag' '
    ++test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' '
     +	test_when_finished "git tag -d tag-commit tag-tag" &&
     +
     +	git tag -a -m"my tagged commit" tag-commit $commit &&
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
     +	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
     +'
     +
    -+test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' '
    ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
     +	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
     +	cat >expect <<-EOF &&
     +	error: object $commit is a commit, not a tag
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
     +	test_cmp expect err
     +'
     +
    -+test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' '
    ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
     +	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
     +	cat >expect <<-EOF &&
     +	error: object $tree is a tree, not a tag
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
     +	test_cmp expect err
     +'
     +
    -+test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' '
    ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
     +	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
     +	cat >expect <<-EOF &&
     +	error: object $blob is a blob, not a tag
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
     +	test_cmp expect err
     +'
     +
    -+test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' '
    ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
     +	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
     +	cat >expect <<-EOF &&
     +	error: object $blob is a blob, not a commit
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
     +'
     +
     +
    -+test_expect_failure 'traverse unexpected objects with for-each-ref' '
    ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' '
     +	cat >expect <<-EOF &&
     +	error: bad tag pointer to $tree in $tag_tag_tree
     +	fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
     +'
     +
     +>fsck-object-isa
    -+test_expect_failure 'setup: unexpected objects with fsck' '
    ++test_expect_success 'setup: unexpected objects with fsck' '
     +	test_must_fail git fsck 2>err &&
     +	sed -n -e "/^error: object .* is a .*, not a .*$/ {
     +		s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/;
    @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected
     +	'
     +done <fsck-object-isa
     +
    -+test_expect_success 'traverse unexpected non-tag tag (blob seen to blob)' '
    ++test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (blob seen to blob)' '
     +	test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err &&
     +	cat >expected <<-EOF &&
     +	error: object $blob is a blob, not a commit
3:  468af961dc4 ! 2:  96398731841 tag: don't misreport type of tagged objects in errors
    @@ blob.c
     +	return lookup_blob_type(r, oid, OBJ_NONE);
      }
      
    - int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size)
    + void parse_blob_buffer(struct blob *item)
     
      ## blob.h ##
     @@ blob.h: struct blob {
    @@ blob.h: struct blob {
     +			      const struct object_id *oid,
     +			      enum object_type type);
      
    - int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size);
    - 
    + /**
    +  * Blobs do not contain references to other objects and do not have
     
      ## commit.c ##
     @@ commit.c: struct commit *lookup_commit_object(struct repository *r,
    @@ object.c: struct object *parse_object_buffer(struct repository *r, const struct
     -		struct blob *blob = lookup_blob(r, oid);
     +		struct blob *blob = lookup_blob_type(r, oid, type);
      		if (blob) {
    - 			if (parse_blob_buffer(blob, buffer, size))
    - 				return NULL;
    + 			parse_blob_buffer(blob);
      			obj = &blob->object;
      		}
      	} else if (type == OBJ_TREE) {
    @@ object.h: struct object *lookup_object(struct repository *r, const struct object
       * Returns the object, having parsed it to find out what it is.
     
      ## t/t6102-rev-list-unexpected-objects.sh ##
    -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'setup unexpected non-tag tag' '
    +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' '
      	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
      '
      
    --test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' '
    -+test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' '
    +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
    ++test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
      	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
      	cat >expect <<-EOF &&
      	error: object $commit is a commit, not a tag
    -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)
    +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
      	test_cmp expect err
      '
      
    --test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' '
    -+test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' '
    +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
    ++test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
      	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
      	cat >expect <<-EOF &&
      	error: object $tree is a tree, not a tag
    -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' '
    +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
      	test_cmp expect err
      '
      
    --test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' '
    -+test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' '
    +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
    ++test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
      	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
      	cat >expect <<-EOF &&
      	error: object $blob is a blob, not a tag
    -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' '
    +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
      	test_cmp expect err
      '
      
    --test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' '
    -+test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
    +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
    ++test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
      	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
      	cat >expect <<-EOF &&
      	error: object $blob is a blob, not a commit
4:  1a9dcb9e05d ! 3:  2493988c41c tag: don't emit potentially incorrect "object is a X, not a Y"
    @@ Commit message
         Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
     
      ## t/t6102-rev-list-unexpected-objects.sh ##
    -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' '
    +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen t
      '
      
      
    --test_expect_failure 'traverse unexpected objects with for-each-ref' '
    -+test_expect_success 'traverse unexpected objects with for-each-ref' '
    +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' '
    ++test_expect_success !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' '
      	cat >expect <<-EOF &&
      	error: bad tag pointer to $tree in $tag_tag_tree
      	fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree
    -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected objects with for-each-ref' '
    - '
    - 
    - >fsck-object-isa
    --test_expect_failure 'setup: unexpected objects with fsck' '
    -+test_expect_success 'setup: unexpected objects with fsck' '
    - 	test_must_fail git fsck 2>err &&
    - 	sed -n -e "/^error: object .* is a .*, not a .*$/ {
    - 		s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/;
    -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'setup: unexpected objects with fsck' '
    +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'setup: unexpected objects with fsck' '
      
      while read oid type
      do
-- 
2.39.0.1153.g589e4efe9dc


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/3] object tests: add test for unexpected objects in tags
  2022-12-30  1:52         ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2022-12-30  1:52           ` Ævar Arnfjörð Bjarmason
  2022-12-30  4:25             ` Junio C Hamano
  2022-12-30  1:52           ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  2022-12-30  1:52           ` [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-30  1:52 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests
for unexpected object types, 2019-04-09), there were no meaningful
tests for checking how we reported on finding the incorrect object
type in a tag, i.e. one that broke the "type" promise in the tag
header.

We'll report the wrong object type in these cases, and thus fail on
the "test_cmp", e.g. for the first "error: " output being tested here
we should say "$commit is a tag, not a commit", instead we say
"$commit is a commit, not a tag". This will be fixed in a subsequent
commit.

See the discussion & notes in [1] and downthread of there for test
snippets that are adapted here.

In the case of "fsck" which objects we visit in what order, and if we
report errors on them depends on their OIDs. So the test uses the
technique of extracting the OID/type combinations that fsck does
report, and asserting that those are correct (currently, it's far from
correct).

As these tests happen to run into a memory leak skip them under
SANITIZE=leak, as the test file was previously marked leak-free in
[3]. There is a concurrent fix for the leak in question[4].

1. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/
2. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/
3. dd9cede9136 (leak tests: mark some rev-list tests as passing with
   SANITIZE=leak, 2021-10-31)
4. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 9350b5fd2c2..f1c30db2654 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -130,4 +130,150 @@ test_expect_success 'traverse unexpected non-blob tag (seen)' '
 	test_i18ngrep "not a blob" output
 '
 
+test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' '
+	test_when_finished "git tag -d tag-commit tag-tag" &&
+
+	git tag -a -m"my tagged commit" tag-commit $commit &&
+	tag_commit=$(git rev-parse tag-commit) &&
+	git tag -a -m"my tagged tag" tag-tag tag-commit &&
+	tag_tag=$(git rev-parse tag-tag) &&
+
+	git cat-file tag tag-tag >good-tag-tag &&
+	git cat-file tag tag-commit >good-commit-tag &&
+
+	sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit &&
+	sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree &&
+	sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob &&
+
+	sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag &&
+	sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree &&
+	sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob &&
+
+	tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) &&
+	tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) &&
+	tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) &&
+
+	git update-ref refs/tags/tag_tag_commit $tag_tag_commit &&
+	git update-ref refs/tags/tag_tag_tree $tag_tag_tree &&
+	git update-ref refs/tags/tag_tag_blob $tag_tag_blob &&
+
+	commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) &&
+	commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) &&
+	commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob) &&
+
+	git update-ref refs/tags/commit_tag_tag $commit_tag_tag &&
+	git update-ref refs/tags/commit_tag_tree $commit_tag_tree &&
+	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
+'
+
+test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
+	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
+	cat >expect <<-EOF &&
+	error: object $commit is a commit, not a tag
+	fatal: bad object $commit
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tag 2>err &&
+	cat >expect <<-EOF &&
+	error: object $tag_commit is a tag, not a commit
+	fatal: bad object $tag_commit
+	EOF
+	test_cmp expect err
+'
+
+test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
+	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
+	cat >expect <<-EOF &&
+	error: object $tree is a tree, not a tag
+	fatal: bad object $tree
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $commit_tag_tree 2>err &&
+	cat >expect <<-EOF &&
+	error: object $tree is a tree, not a commit
+	fatal: bad object $tree
+	EOF
+	test_cmp expect err
+'
+
+test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
+	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a tag
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $commit_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a commit
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err
+'
+
+test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
+	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a commit
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err &&
+
+	test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err &&
+	cat >expect <<-EOF &&
+	error: object $blob is a blob, not a tag
+	fatal: bad object $blob
+	EOF
+	test_cmp expect err
+'
+
+
+test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' '
+	cat >expect <<-EOF &&
+	error: bad tag pointer to $tree in $tag_tag_tree
+	fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree
+	EOF
+	test_must_fail git for-each-ref --format="%(*objectname)" 2>actual &&
+	test_cmp expect actual
+'
+
+>fsck-object-isa
+test_expect_success 'setup: unexpected objects with fsck' '
+	test_must_fail git fsck 2>err &&
+	sed -n -e "/^error: object .* is a .*, not a .*$/ {
+		s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/;
+		p;
+	}" <err >fsck-object-isa
+'
+
+while read oid type
+do
+	test_expect_failure "fsck knows unexpected object $oid is $type" '
+		git cat-file -t $oid >expect &&
+		echo $type >actual &&
+		test_cmp expect actual
+	'
+done <fsck-object-isa
+
+test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (blob seen to blob)' '
+	test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a commit
+	error: bad tag pointer to $blob in $commit_tag_blob
+	fatal: bad object $commit_tag_blob
+	EOF
+	test_cmp expected err &&
+
+	test_must_fail git rev-list --objects $blob $tag_tag_blob 2>err &&
+	cat >expected <<-EOF &&
+	error: object $blob is a blob, not a tag
+	error: bad tag pointer to $blob in $tag_tag_blob
+	fatal: bad object $tag_tag_blob
+	EOF
+	test_cmp expected err
+'
+
 test_done
-- 
2.39.0.1153.g589e4efe9dc


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors
  2022-12-30  1:52         ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  2022-12-30  1:52           ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
@ 2022-12-30  1:52           ` Ævar Arnfjörð Bjarmason
  2022-12-30  6:07             ` Junio C Hamano
  2022-12-30  1:52           ` [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason
  2 siblings, 1 reply; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-30  1:52 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent
objects, 2005-06-21) (yes, that ancient!) and correctly report an
error on a tag like:

    object <a tree hash>
    type commit

As:

    error: object <a tree hash> is tree, not a commit

Instead of our long-standing misbehavior of inverting the two, and
reporting:

    error: object <a tree hash> is commit, not a tree

Which, as can be trivially seen with 'git cat-file -t <a tree hash>'
is incorrect.

The reason for this misreporting is that in parse_tag_buffer() we end
up doing a lookup_{blob,commit,tag,tree}() depending on what we read
out of the "type" line.

If we haven't parsed that object before we end up dispatching to the
type-specific lookup functions, e.g. this for commit.c in
lookup_commit_type():

	struct object *obj = lookup_object(r, oid);
	if (!obj)
		return create_object(r, oid, alloc_commit_node(r));

Its allocation will then set the obj->type according to what the tag
told us the type was, but which we've never validated. At this point
we've got an object in memory that hasn't been parsed, and whose type
is incorrect, since we mistrusted a tag to tell us the type.

Then when we actually load the object with parse_object() we read it
and find that it's a "tree". See 8ff226a9d5e (add object_as_type
helper for casting objects, 2014-07-13) for that behavior (that's just
a refactoring commit, but shows all the code involved).

Which explains why we inverted the error report. Normally when
object_as_type() is called it's by the lookup_{blob,commit,tag,tree}()
functions via parse_object(). At that point we can trust the
obj->type.

In the case of parsing objects we've learned about via a tag with an
incorrect type it's the opposite, the obj->type isn't correct and
holds the mislabeled type, but we're parsing the object and know for
sure what object type we're dealing with.

So, let's add "lookup_{blob,commit,tag,tree}_type()" functions to go
with the existing ""lookup_{blob,commit,tag,tree}()", we'll call these
from "parse_object_buffer()" where we actually know the type, as
opposed to the "parse_tag_buffer()" code where we're just guessing
what it might be.

This only help with the cases where we do see the tag reference, and
then end up doing a full parse of the object. But as seen in the
"for-each-ref" and "fsck" tests we have cases where we'll never fully
parse it.

Those will be handled in a subsequent commit, but for now this handles
the common case of "show" etc. running into these.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 blob.c                                 | 11 +++++++++--
 blob.h                                 |  3 +++
 commit.c                               | 11 +++++++++--
 commit.h                               |  2 ++
 object.c                               | 20 ++++++++++++++++----
 object.h                               |  2 ++
 t/t6102-rev-list-unexpected-objects.sh |  8 ++++----
 tag.c                                  | 21 +++++++++++++++++----
 tag.h                                  |  2 ++
 tree.c                                 | 11 +++++++++--
 tree.h                                 |  2 ++
 11 files changed, 75 insertions(+), 18 deletions(-)

diff --git a/blob.c b/blob.c
index 8f83523b0cd..d6e3d64213d 100644
--- a/blob.c
+++ b/blob.c
@@ -5,12 +5,19 @@
 
 const char *blob_type = "blob";
 
-struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_blob_node(r));
-	return object_as_type(obj, OBJ_BLOB, 0);
+	return object_as_type_hint(obj, OBJ_BLOB, type);
+}
+
+struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
+{
+	return lookup_blob_type(r, oid, OBJ_NONE);
 }
 
 void parse_blob_buffer(struct blob *item)
diff --git a/blob.h b/blob.h
index 74555c90c44..b91b1600bda 100644
--- a/blob.h
+++ b/blob.h
@@ -10,6 +10,9 @@ struct blob {
 };
 
 struct blob *lookup_blob(struct repository *r, const struct object_id *oid);
+struct blob *lookup_blob_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type);
 
 /**
  * Blobs do not contain references to other objects and do not have
diff --git a/commit.c b/commit.c
index d00780bee5f..8e30c21816c 100644
--- a/commit.c
+++ b/commit.c
@@ -67,12 +67,19 @@ struct commit *lookup_commit_object(struct repository *r,
 
 }
 
-struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+struct commit *lookup_commit_type(struct repository *r,
+				  const struct object_id *oid,
+				  enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_commit_node(r));
-	return object_as_type(obj, OBJ_COMMIT, 0);
+	return object_as_type_hint(obj, OBJ_COMMIT, type);
+}
+
+struct commit *lookup_commit(struct repository *r, const struct object_id *oid)
+{
+	return lookup_commit_type(r, oid, OBJ_NONE);
 }
 
 struct commit *lookup_commit_reference_by_name(const char *name)
diff --git a/commit.h b/commit.h
index fa39202fa6b..95001a29d6b 100644
--- a/commit.h
+++ b/commit.h
@@ -78,6 +78,8 @@ struct commit *lookup_commit_object(struct repository *r, const struct object_id
  * "oid" is not in the object cache.
  */
 struct commit *lookup_commit(struct repository *r, const struct object_id *oid);
+struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid,
+				  enum object_type type);
 struct commit *lookup_commit_reference(struct repository *r,
 				       const struct object_id *oid);
 struct commit *lookup_commit_reference_gently(struct repository *r,
diff --git a/object.c b/object.c
index 344087de4de..81ec18f0739 100644
--- a/object.c
+++ b/object.c
@@ -177,6 +177,18 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
 	}
 }
 
+void *object_as_type_hint(struct object *obj, enum object_type type,
+			  enum object_type hint)
+{
+	if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) {
+		error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid),
+		      type_name(type), type_name(obj->type));
+		obj->type = type;
+		return NULL;
+	}
+	return object_as_type(obj, type, 0);;
+}
+
 struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid)
 {
 	struct object *obj = lookup_object(r, oid);
@@ -210,13 +222,13 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 
 	obj = NULL;
 	if (type == OBJ_BLOB) {
-		struct blob *blob = lookup_blob(r, oid);
+		struct blob *blob = lookup_blob_type(r, oid, type);
 		if (blob) {
 			parse_blob_buffer(blob);
 			obj = &blob->object;
 		}
 	} else if (type == OBJ_TREE) {
-		struct tree *tree = lookup_tree(r, oid);
+		struct tree *tree = lookup_tree_type(r, oid, type);
 		if (tree) {
 			obj = &tree->object;
 			if (!tree->buffer)
@@ -228,7 +240,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			}
 		}
 	} else if (type == OBJ_COMMIT) {
-		struct commit *commit = lookup_commit(r, oid);
+		struct commit *commit = lookup_commit_type(r, oid, type);
 		if (commit) {
 			if (parse_commit_buffer(r, commit, buffer, size, 1))
 				return NULL;
@@ -240,7 +252,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id
 			obj = &commit->object;
 		}
 	} else if (type == OBJ_TAG) {
-		struct tag *tag = lookup_tag(r, oid);
+		struct tag *tag = lookup_tag_type(r, oid, type);
 		if (tag) {
 			if (parse_tag_buffer(r, tag, buffer, size))
 			       return NULL;
diff --git a/object.h b/object.h
index 31ebe114585..042c304d3a4 100644
--- a/object.h
+++ b/object.h
@@ -122,6 +122,8 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid);
 void *create_object(struct repository *r, const struct object_id *oid, void *obj);
 
 void *object_as_type(struct object *obj, enum object_type type, int quiet);
+void *object_as_type_hint(struct object *obj, enum object_type type,
+			  enum object_type hint);
 
 /*
  * Returns the object, having parsed it to find out what it is.
diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index f1c30db2654..590e2523d0c 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -166,7 +166,7 @@ test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' '
 	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
 	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&
 	cat >expect <<-EOF &&
 	error: object $commit is a commit, not a tag
@@ -182,7 +182,7 @@ test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
 	test_cmp expect err
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' '
 	test_must_fail git rev-list --objects $tag_tag_tree 2>err &&
 	cat >expect <<-EOF &&
 	error: object $tree is a tree, not a tag
@@ -198,7 +198,7 @@ test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
 	test_cmp expect err
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' '
 	test_must_fail git rev-list --objects $tag_tag_blob 2>err &&
 	cat >expect <<-EOF &&
 	error: object $blob is a blob, not a tag
@@ -214,7 +214,7 @@ test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t
 	test_cmp expect err
 '
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' '
 	test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err &&
 	cat >expect <<-EOF &&
 	error: object $blob is a blob, not a commit
diff --git a/tag.c b/tag.c
index dfbcd7fcc24..19453c2edbf 100644
--- a/tag.c
+++ b/tag.c
@@ -100,12 +100,18 @@ struct object *deref_tag_noverify(struct object *o)
 	return o;
 }
 
-struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tag_node(r));
-	return object_as_type(obj, OBJ_TAG, 0);
+	return object_as_type_hint(obj, OBJ_TAG, type);
+}
+
+struct tag *lookup_tag(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tag_type(r, oid, OBJ_NONE);
 }
 
 static timestamp_t parse_tag_date(const char *buf, const char *tail)
@@ -135,6 +141,7 @@ void release_tag_memory(struct tag *t)
 
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size)
 {
+	struct object *obj;
 	struct object_id oid;
 	char type[20];
 	const char *bufptr = data;
@@ -169,7 +176,10 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	type[nl - bufptr] = '\0';
 	bufptr = nl + 1;
 
-	if (!strcmp(type, blob_type)) {
+	obj = lookup_object(r, &oid);
+	if (obj) {
+		item->tagged = obj;
+	} else if (!strcmp(type, blob_type)) {
 		item->tagged = (struct object *)lookup_blob(r, &oid);
 	} else if (!strcmp(type, tree_type)) {
 		item->tagged = (struct object *)lookup_tree(r, &oid);
@@ -182,10 +192,13 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 			     type, oid_to_hex(&item->object.oid));
 	}
 
-	if (!item->tagged)
+	if (!item->tagged || strcmp(type_name(item->tagged->type), type)) {
+		error(_("object %s is a %s, not a %s"), oid_to_hex(&oid),
+		      type_name(item->tagged->type), type);
 		return error("bad tag pointer to %s in %s",
 			     oid_to_hex(&oid),
 			     oid_to_hex(&item->object.oid));
+	}
 
 	if (bufptr + 4 < tail && starts_with(bufptr, "tag "))
 		; 		/* good */
diff --git a/tag.h b/tag.h
index 3ce8e721924..42bd3e64011 100644
--- a/tag.h
+++ b/tag.h
@@ -12,6 +12,8 @@ struct tag {
 	timestamp_t date;
 };
 struct tag *lookup_tag(struct repository *r, const struct object_id *oid);
+struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid,
+			    enum object_type type);
 int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size);
 int parse_tag(struct tag *item);
 void release_tag_memory(struct tag *t);
diff --git a/tree.c b/tree.c
index 410e3b477e5..1a730249bb8 100644
--- a/tree.c
+++ b/tree.c
@@ -102,12 +102,19 @@ int cmp_cache_name_compare(const void *a_, const void *b_)
 				  ce2->name, ce2->ce_namelen, ce_stage(ce2));
 }
 
-struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+struct tree *lookup_tree_type(struct repository *r,
+			      const struct object_id *oid,
+			      enum object_type type)
 {
 	struct object *obj = lookup_object(r, oid);
 	if (!obj)
 		return create_object(r, oid, alloc_tree_node(r));
-	return object_as_type(obj, OBJ_TREE, 0);
+	return object_as_type_hint(obj, OBJ_TREE, type);
+}
+
+struct tree *lookup_tree(struct repository *r, const struct object_id *oid)
+{
+	return lookup_tree_type(r, oid, OBJ_NONE);
 }
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size)
diff --git a/tree.h b/tree.h
index 6efff003e21..4af3b617f3d 100644
--- a/tree.h
+++ b/tree.h
@@ -15,6 +15,8 @@ struct tree {
 extern const char *tree_type;
 
 struct tree *lookup_tree(struct repository *r, const struct object_id *oid);
+struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid,
+			      enum object_type type);
 
 int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size);
 
-- 
2.39.0.1153.g589e4efe9dc


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y"
  2022-12-30  1:52         ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
  2022-12-30  1:52           ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
  2022-12-30  1:52           ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2022-12-30  1:52           ` Ævar Arnfjörð Bjarmason
  2 siblings, 0 replies; 22+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2022-12-30  1:52 UTC (permalink / raw)
  To: git
  Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan,
	Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason

As noted in the preceding commit we weren't handling cases where we
see a reference to a bad "type" in a "tag", but then end up not fully
parsing the object.

In those cases let's only claim that we have a bad tag pointer, but
emit "is a %s, not a %s".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/t6102-rev-list-unexpected-objects.sh | 4 ++--
 tag.c                                  | 5 +++--
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh
index 590e2523d0c..98e36e73698 100755
--- a/t/t6102-rev-list-unexpected-objects.sh
+++ b/t/t6102-rev-list-unexpected-objects.sh
@@ -231,7 +231,7 @@ test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen t
 '
 
 
-test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' '
+test_expect_success !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' '
 	cat >expect <<-EOF &&
 	error: bad tag pointer to $tree in $tag_tag_tree
 	fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree
@@ -251,7 +251,7 @@ test_expect_success 'setup: unexpected objects with fsck' '
 
 while read oid type
 do
-	test_expect_failure "fsck knows unexpected object $oid is $type" '
+	test_expect_success "fsck knows unexpected object $oid is $type" '
 		git cat-file -t $oid >expect &&
 		echo $type >actual &&
 		test_cmp expect actual
diff --git a/tag.c b/tag.c
index 19453c2edbf..ad92cf89209 100644
--- a/tag.c
+++ b/tag.c
@@ -193,8 +193,9 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	}
 
 	if (!item->tagged || strcmp(type_name(item->tagged->type), type)) {
-		error(_("object %s is a %s, not a %s"), oid_to_hex(&oid),
-		      type_name(item->tagged->type), type);
+		if (item->tagged && item->tagged->parsed)
+			error(_("object %s is a %s, not a %s"), oid_to_hex(&oid),
+			      type_name(item->tagged->type), type);
 		return error("bad tag pointer to %s in %s",
 			     oid_to_hex(&oid),
 			     oid_to_hex(&item->object.oid));
-- 
2.39.0.1153.g589e4efe9dc


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] object tests: add test for unexpected objects in tags
  2022-12-30  1:52           ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
@ 2022-12-30  4:25             ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2022-12-30  4:25 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Taylor Blau, Jeff King, Jonathan Tan, Kousik Sanagavarapu

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> +test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' '
> +	test_when_finished "git tag -d tag-commit tag-tag" &&
> +
> +	git tag -a -m"my tagged commit" tag-commit $commit &&
> +	tag_commit=$(git rev-parse tag-commit) &&
> +	git tag -a -m"my tagged tag" tag-tag tag-commit &&
> +	tag_tag=$(git rev-parse tag-tag) &&
> +
> +	git cat-file tag tag-tag >good-tag-tag &&
> +	git cat-file tag tag-commit >good-commit-tag &&
> +
> +	sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit &&
> +	sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree &&
> +	sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob &&
> +
> +	sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag &&
> +	sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree &&
> +	sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob &&
> +
> +	tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) &&
> +	tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) &&
> +	tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) &&

If the second block of 3 sed commands to prepare data for tags that
incorrectly claim to point at a commit are moved a bit, i.e. make
"sed's && hash-object's && update-ref's" as a logical group, the
above would become slightly easier to read, but in any case the
set-up step looks quite repetitive and boring to read ;-)

There is no strong reason to use broken-tag-* temporary files,
though.  Each of them is used exactly once, but you can just
pipe the output from "sed" to "git hash-object --stdin" without
losing any exit status, e.g.

	tag_tag_commit=$(sed -e '...' good-commit-tag |
			git hash-object -w -t tag --stdin)

> +	git update-ref refs/tags/tag_tag_commit $tag_tag_commit &&
> +	git update-ref refs/tags/tag_tag_tree $tag_tag_tree &&
> +	git update-ref refs/tags/tag_tag_blob $tag_tag_blob &&
> +
> +	commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) &&
> +	commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) &&
> +	commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob) &&
> +
> +	git update-ref refs/tags/commit_tag_tag $commit_tag_tag &&
> +	git update-ref refs/tags/commit_tag_tree $commit_tag_tree &&
> +	git update-ref refs/tags/commit_tag_blob $commit_tag_blob
> +'
> +
> +test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' '
> +	test_must_fail git rev-list --objects $tag_tag_commit 2>err &&

Does this have to be "rev-list --objects" or would something like
"cat-file -t $tag_tag_commit^0" do?

Especially because "rev-list --objects" is a rather heavy-weight
command that does a lot of checking, I am wondering if it is
sensible to rely on the assumption that the errors expected below
will stay to be the only errors we get before the command exits
(hence a wish to replace it with a more narrowly focused comamnd).

> +	cat >expect <<-EOF &&
> +	error: object $commit is a commit, not a tag
> +	fatal: bad object $commit
> +	EOF
> +	test_cmp expect err &&



> +test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' '
> +	cat >expect <<-EOF &&
> +	error: bad tag pointer to $tree in $tag_tag_tree
> +	fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree
> +	EOF

This depends on the fact that among the broken ones tag_tag_tree
sorts the earliest (and the command stops after barfing on a single
bad object), doesn't it?  I wonder if it makes the test more robust
by feeding refs/tags/tag_tag_tree from the command line to limit the
tips the command needs to inspect.

> +>fsck-object-isa

Move it inside the setup as the first command in case "git fsck"
succeeds?

> +test_expect_success 'setup: unexpected objects with fsck' '
> +	test_must_fail git fsck 2>err &&
> +	sed -n -e "/^error: object .* is a .*, not a .*$/ {
> +		s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/;
> +		p;
> +	}" <err >fsck-object-isa
> +'

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors
  2022-12-30  1:52           ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
@ 2022-12-30  6:07             ` Junio C Hamano
  0 siblings, 0 replies; 22+ messages in thread
From: Junio C Hamano @ 2022-12-30  6:07 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: git, Taylor Blau, Jeff King, Jonathan Tan, Kousik Sanagavarapu

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> diff --git a/blob.c b/blob.c
> index 8f83523b0cd..d6e3d64213d 100644
> --- a/blob.c
> +++ b/blob.c
> @@ -5,12 +5,19 @@
>  
>  const char *blob_type = "blob";
>  
> -struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
> +struct blob *lookup_blob_type(struct repository *r,
> +			      const struct object_id *oid,
> +			      enum object_type type)
>  {
>  	struct object *obj = lookup_object(r, oid);
>  	if (!obj)
>  		return create_object(r, oid, alloc_blob_node(r));
> -	return object_as_type(obj, OBJ_BLOB, 0);
> +	return object_as_type_hint(obj, OBJ_BLOB, type);
> +}
> +
> +struct blob *lookup_blob(struct repository *r, const struct object_id *oid)
> +{
> +	return lookup_blob_type(r, oid, OBJ_NONE);
>  }

Between lookup_blob() and lookup_blob_type(), the distinction is
that the latter calls object_as_type_hint() to ensure that the real
type of the given object is of the type, which is given as "hint"?

Very confusing naming.  Perhaps because "hint" argument given to
"as-type-hint" is playing a role that is FAR stronger than "hint"?
It sounds more like enforcement.  Perhaps s/hint/check/ or something?

I am not sure exactly why, but lookup_blob_type() does look
confusing.  A natural question anybody would ask after seeing the
function mame and the extra parmater is: "If we want/expect to see a
blob, why do we give an extra type?"  To answer that question, "_type"
at the end of the function name and "type" that is an extra argument
should say what that thing is used for.  "enum object_type" is enough
to say what the extra argument is---we should use the parameter name
to convey what it is there for.

> +void *object_as_type_hint(struct object *obj, enum object_type type,
> +			  enum object_type hint)
> +{
> +	if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) {
> +		error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid),
> +		      type_name(type), type_name(obj->type));
> +		obj->type = type;
> +		return NULL;
> +	}

There must be something utterly wrong in the above implementation.
As long as "hint" is ANY type other than OBJ_NONE, "check if type
and obj->type match and otherwise complain" logic kicks in, and the
actual value of "hint" does not contribute anything to the outcome.

IOW, the "hint" parameter is misleading to be "enum object_type", as
it is only used as a Boolean "do we bother to check?".

> +	return object_as_type(obj, type, 0);;

Stray extra semicolon at the end.

> +}

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2022-12-30  6:07 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-11-16 16:39 [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file() Kousik Sanagavarapu
2022-11-16 18:20 ` Jeff King
2022-11-16 21:14   ` Jonathan Tan
2022-11-17 22:37     ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King
2022-11-17 22:37       ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King
2022-11-17 22:41       ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King
2022-11-18  0:36         ` Ævar Arnfjörð Bjarmason
2022-11-21 19:21           ` Jeff King
2022-11-18 11:46       ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
2022-11-18 11:46         ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason
2022-11-18 11:46         ` [PATCH 2/4] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
2022-11-18 11:46         ` [PATCH 3/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
2022-11-18 11:46         ` [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason
2022-12-30  1:52         ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
2022-12-30  1:52           ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason
2022-12-30  4:25             ` Junio C Hamano
2022-12-30  1:52           ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason
2022-12-30  6:07             ` Junio C Hamano
2022-12-30  1:52           ` [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason
2022-11-18 19:05       ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau
2022-11-21 19:26         ` Jeff King
2022-11-22  0:05           ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).