git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	git@vger.kernel.org, Taylor Blau <me@ttaylorr.com>,
	Elijah Newren <newren@gmail.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH v2 10/10] tag: don't misreport type of tagged objects in errors
Date: Wed, 31 Mar 2021 14:59:12 -0400
Message-ID: <YGTGgFI19fS7Uv6I@coredump.intra.peff.net>
In-Reply-To: <87k0pnkwej.fsf@evledraar.gmail.com>

On Wed, Mar 31, 2021 at 08:31:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Ævar's patch tries to improve the case where we might _know_ which is
> > correct (because we're actually parsing the object contents), but of
> > course it covers only a fraction of cases. I'm not really opposed to
> > that per se, but I probably wouldn't bother myself.
> 
> What fraction of cases? As far as I can tell it covers all cases where
> we get this error.
> 
> If there is a case like what you're describing I haven't found it.

It would happen any time somebody calls lookup_foo() because they saw an
object referenced, but _doesn't_ parse it. And then somebody later calls
lookup_bar() in the same way. Neither of them consulted the actual
object database.

Try this with your patches:

-- >8 --
git init repo
cd repo

# just for making things deterministic
export GIT_COMMITTER_NAME='A U Thor'
export GIT_COMMITTER_EMAIL='author@example.com'
export GIT_COMMITTER_DATE='@1234567890 +0000'

blob=$(echo foo | git hash-object -w --stdin)
git tag -m 'tag of blob' tag-of-blob $blob
git update-ref refs/tags/tag-of-commit $(
  git cat-file tag tag-of-blob |
  sed s/blob/commit/g |
  git hash-object -w --stdin -t tag
)
git update-ref refs/tags/tag-of-tree $(
  git cat-file tag tag-of-blob |
  sed s/blob/tree/g |
  git hash-object -w --stdin -t tag
)

git fsck
-- >8 --

That fsck produces (257cc5642 is the blob):

  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a blob, not a commit
  error: 257cc5642cb1a054f08cc83f2d943e56fd3ebe99: object could not be parsed: .git/objects/25/7cc5642cb1a054f08cc83f2d943e56fd3ebe99
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
  error: aaff0d42df150e1a734f6a8516878b2ea315ee0a: object could not be parsed: .git/objects/aa/ff0d42df150e1a734f6a8516878b2ea315ee0a
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a blob
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7
  error: bbd2b7077cd91ee6175cdc0e4c477c25c230cdc7: object could not be parsed: .git/objects/bb/d2b7077cd91ee6175cdc0e4c477c25c230cdc7

So we claim "is X, not Y" in multiple directions for the same object.
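
To see why neither message can be trusted, here is a toy model in C. It is
only a sketch and not Git's code: every toy_* name is made up for
illustration, and the "table" stands in for the in-memory object cache.
The first referrer decides the in-memory type, and nothing ever reads the
stored object to check it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: every toy_* name is made up, none of this is Git's code. */
enum toy_type { TOY_COMMIT, TOY_TREE, TOY_BLOB };

struct toy_object {
	char id[8];		/* abbreviated object name, at most 7 chars */
	enum toy_type type;	/* type claimed by whoever referred to it first */
};

#define TOY_MAX 8
static struct toy_object toy_table[TOY_MAX];
static int toy_nr;

static const char *toy_type_name(enum toy_type t)
{
	static const char *names[] = { "commit", "tree", "blob" };
	return names[t];
}

/*
 * Look up "id" as "claimed". The first caller decides the in-memory type;
 * later callers are only compared against it. The stored contents of the
 * object are never read, so nothing here knows the real type.
 * On a mismatch, write a message to "err" and return NULL.
 */
static struct toy_object *toy_lookup(const char *id, enum toy_type claimed,
				     char *err, size_t errlen)
{
	int i;

	err[0] = '\0';
	for (i = 0; i < toy_nr; i++) {
		if (strcmp(toy_table[i].id, id))
			continue;
		if (toy_table[i].type == claimed)
			return &toy_table[i];
		snprintf(err, errlen, "object %s is a %s, not a %s", id,
			 toy_type_name(toy_table[i].type),
			 toy_type_name(claimed));
		return NULL;
	}
	/* Not seen before: allocate it with whatever type the caller claims. */
	assert(toy_nr < TOY_MAX);
	snprintf(toy_table[toy_nr].id, sizeof(toy_table[toy_nr].id), "%s", id);
	toy_table[toy_nr].type = claimed;
	return &toy_table[toy_nr++];
}
```

Calling toy_lookup("257cc56", TOY_COMMIT, ...) and then
toy_lookup("257cc56", TOY_TREE, ...) fills err with "object 257cc56 is a
commit, not a tree", even if the stored object is really a blob. That is
the same shape as the last two "is a commit" messages above.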

It might just be that there are spots in the fsck code that need to be
adjusted to use your new function (if they are indeed parsing the
referred-to object). But there are lots of places that don't actually
parse the object at the moment they're parsing the tag. E.g.:

  $ git for-each-ref --format='%(*objectname)'
  error: object 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 is a commit, not a tree
  error: bad tag pointer to 257cc5642cb1a054f08cc83f2d943e56fd3ebe99 in aaff0d42df150e1a734f6a8516878b2ea315ee0a
  Segmentation fault

Neither of those types is the correct one. And the segfault is just a
bonus! :)

I'd expect similar cases with parsing commit parents and tree pointers.
And probably tree entries whose modes are wrong.

> I.e. it happens when we have an un-parsed "struct object" whose type is
> inferred, and parse it to find out it's not what we expected.
> 
> It's not ambiguous at all what the object actually is. It's just that
> the previous code was leaking the *assumption* about the type at the
> time of emitting the error, due to an apparent oversight with parsed
> vs. non-parsed.
> 
> Or in other words, we're leaking the implementation detail that we
> pre-allocated an object struct of a given type in anticipation of
> holding a parsed version of that object soon.

Right. In the case that you are indeed parsing the object later, you can
say definitively "it is X in the odb, but seen as Y previously". But we
do not always hit the "is X, not Y" error when parsing the object. It
might be caused by two of these "pre-allocations" (though really I think
it is not just an implementation detail; the pre-allocation happened
because some other object referred to us as a given type, so it really
is a corruption in the repository. Just not in the object we mention).

> > @@ -169,10 +169,16 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet)
> >  		return obj;
> >  	}
> >  	else {
> > -		if (!quiet)
> > -			error(_("object %s is a %s, not a %s"),
> > -			      oid_to_hex(&obj->oid),
> > -			      type_name(obj->type), type_name(type));
> > +		if (!(flags & OBJECT_AS_TYPE_QUIET)) {
> > +			if (flags & OBJECT_AS_TYPE_EXPECT_PARSED)
> > +				error(_("object %s is a %s, but was referred to as a %s"),
> > +				      oid_to_hex(&obj->oid), type_name(obj->type),
> > +				      type_name(type));
> > +			else
> > +				error(_("object %s referred to as both a %s and a %s"),
> > +				      oid_to_hex(&obj->oid),
> > +				      type_name(obj->type), type_name(type));
> > +		}
> >  		return NULL;
> >  	}
> >  }
> 
> Per the above I don't understand how you think there's any uncertainty
> here.
> 
> If I'm right and there isn't then first of all I don't see how we could
> emit 1/2 of those errors. The whole problem here is that we don't know
> the type of the un-parsed object (and presumably don't want to eagerly
> know, it would mean hitting the object store).

Forgetting for a moment how to trigger it with actual Git commands, the
root of the problem is that:

  lookup_tree(&oid);
  lookup_blob(&oid);

is going to produce an error message. But we cannot know which object
type is wrong and which is right (if any). So we'd want to produce the
"referred to as both" message.

_If_ the caller happens to know that it has just parsed the object
contents and got a tree, then it would call lookup_parsed_tree(&oid),
which would pass along OBJECT_AS_TYPE_EXPECT_PARSED, and produce the
other message.

In practice, of course those two lookup_foo() calls are not right next
to each other. But they may be triggered on an identical oid by two
references from different objects.
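
A self-contained sketch of that message selection, as a toy in C rather
than a patch against Git. The toy_* names and the struct are hypothetical;
only the two flag names and the message wording are borrowed from the diff
quoted above. In the "parsed" case the toy prints the caller's type first,
which is my reading of "it is X in the odb, but seen as Y previously"; treat
that ordering as an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; this is not Git's object.c. */
enum toy_type { TOY_NONE, TOY_COMMIT, TOY_TREE, TOY_BLOB };

#define TOY_AS_TYPE_QUIET         (1u << 0)
#define TOY_AS_TYPE_EXPECT_PARSED (1u << 1)

struct toy_object {
	const char *id;
	enum toy_type type;	/* TOY_NONE until somebody refers to it */
};

static const char *toy_type_name(enum toy_type t)
{
	static const char *names[] = { "none", "commit", "tree", "blob" };
	return names[t];
}

/*
 * Treat "obj" as "want". Returns 0 if the types agree (an untyped object
 * adopts "want"), or -1 on a mismatch. Unless the QUIET flag is set, "msg"
 * then describes the conflict:
 *  - EXPECT_PARSED: the caller has just parsed the contents, so "want" is
 *    the real type and obj->type is what an earlier referrer claimed.
 *  - otherwise: both types are mere claims, so neither is called wrong.
 */
static int toy_as_type(struct toy_object *obj, enum toy_type want,
		       unsigned flags, char *msg, size_t len)
{
	assert(obj && msg && len > 0);
	msg[0] = '\0';

	if (obj->type == TOY_NONE)
		obj->type = want;
	if (obj->type == want)
		return 0;
	if (flags & TOY_AS_TYPE_QUIET)
		return -1;

	if (flags & TOY_AS_TYPE_EXPECT_PARSED)
		snprintf(msg, len,
			 "object %s is a %s, but was referred to as a %s",
			 obj->id, toy_type_name(want),
			 toy_type_name(obj->type));
	else
		snprintf(msg, len,
			 "object %s referred to as both a %s and a %s",
			 obj->id, toy_type_name(obj->type),
			 toy_type_name(want));
	return -1;
}
```

With an object first seen as a blob, asking for a tree with no flags gives
"referred to as both a blob and a tree". Asking for a tree with
TOY_AS_TYPE_EXPECT_PARSED gives "is a tree, but was referred to as a blob".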

> But when we do know, why would we beat around the bush and say "was
> referred to as X and Y" once we know what it is?
> 
> AFAICT there's no more reason to think that parse_object_buffer() will
> be wrong about the type than "git cat-file -t" will be. They both use
> the same underlying functions to get that information.

My point is that we are not always coming from parse_object_buffer()
when we see these error messages.

-Peff
