* [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file() @ 2022-11-16 16:39 Kousik Sanagavarapu 2022-11-16 18:20 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Kousik Sanagavarapu @ 2022-11-16 16:39 UTC (permalink / raw) To: git; +Cc: Kousik Sanagavarapu It is mentioned in object-store.h that the function repo_has_object_file() is deprecated. One possible alternative for this function is has_object() (or atleast that is how I understood it). The file object-store.h also mentions that repo_has_object_file() and its fellow functions and macros can be removed once the migrations take place. This patch therefore is an attempt to reduce the usage of these functions and macros. I request for comments as I'm not really sure about the "flags" argument of the has_object() function and its usage in this patch. Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com> --- object.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/object.c b/object.c index 8a74eb85e9..0a9516137a 100644 --- a/object.c +++ b/object.c @@ -286,8 +286,8 @@ struct object *parse_object_with_flags(struct repository *r, return &commit->object; } - if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || - (!obj && repo_has_object_file(r, oid) && + if ((obj && obj->type == OBJ_BLOB && has_object(r, oid, 0)) || + (!obj && has_object(r, oid, 0) && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { if (!skip_hash && stream_object_signature(r, repl) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); -- 2.25.1 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file() 2022-11-16 16:39 [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file() Kousik Sanagavarapu @ 2022-11-16 18:20 ` Jeff King 2022-11-16 21:14 ` Jonathan Tan 0 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2022-11-16 18:20 UTC (permalink / raw) To: Kousik Sanagavarapu; +Cc: Jonathan Tan, git On Wed, Nov 16, 2022 at 10:09:56PM +0530, Kousik Sanagavarapu wrote: > It is mentioned in object-store.h that the function > repo_has_object_file() is deprecated. One possible alternative for this > function is has_object() (or atleast that is how I understood it). > > The file object-store.h also mentions that repo_has_object_file() and > its fellow functions and macros can be removed once the migrations take > place. This patch therefore is an attempt to reduce the usage of these > functions and macros. > > I request for comments as I'm not really sure about the "flags" argument > of the has_object() function and its usage in this patch. So you've stumbled into quite a tricky spot. :) Yes, without specifying new flags, this patch has a change of behavior which I think is not what we want. From 1d8d9cb620 (sha1-file: introduce no-lazy-fetch has_object(), 2020-08-05): There have been a few bugs wherein Git fetches missing objects whenever the existence of an object is checked, even though it does not need to perform such a fetch. To resolve these bugs, we could look at all the places that has_object_file() (or a similar function) is used. As a first step, introduce a new function has_object() that checks for the existence of an object, with a default behavior of not fetching if the object is missing and the repository is a partial clone. As we verify each has_object_file() (or similar) usage, we can replace it with has_object(), and we will know that we are done when we can delete has_object_file() (and the other similar functions). Also, the new function has_object() has more appropriate defaults: besides not fetching, it also does not recheck packed storage. So the new function will: - not recheck packed objects unless we pass HAS_OBJECT_RECHECK_PACKED; this is done in the default paths because a simultaneous git-gc may be repacking objects (and we would rather double-check than racily miss it). So it's appropriate behavior if the caller is speculatively asking "hey, we _might_ have this object", but not if we expect to have it. - not lazily fetch objects in a partial-clone repository. This again depends on the caller being in a situation where they are OK saying "no we don't have it" for an object we _could_ get if it was worth spending effort The call you're touching here is in parse_object_with_flags(), which is using the check as part of the "is it a blob that we should stream?" check. If this call returns "no we don't have it", when we could get it (either due to racy repack or by fetching), then we'll hit the non-blob path that calls repo_read_object_file(). Where we would do a fresh lookup, including re-checking packs and/or lazily fetching! So the new behavior seems strictly worse to me. We don't avoid those behaviors, _and_ we fail to follow the streaming-blob code path (which means we may accidentally read a huge blob into memory. I think we'd want to leave it as-is, or if we really want to eventually drop to a single interface, we need has_object() to learn a new flag to enable the lazy-fetch (and then use it along with RECHECK_PACKED). Now all of that said, I am skeptical that these calls to repo_has_object_file() are even doing anything useful at all. Looking at the existing code (dropping the "+" lines from your patch): > - if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || > - (!obj && repo_has_object_file(r, oid) && > oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > if (!skip_hash && stream_object_signature(r, repl) < 0) { > error(_("hash mismatch %s"), oid_to_hex(oid)); we are checking that it exits if either: - another object (e.g., a tree) referred to it as a blob during this process, and we created an in-memory "struct object" with that type - nobody has referred to it, and we want to check its type via oid_object_info() In the second case, this seems totally pointless. We can just ask oid_object_info() what it's type is, and it will say "no, we don't have it" if appropriate. It will do the usual recheck-pack and lazy-fetch, but so is repo_has_object_file(), so the short-circuit "&&" is not helping. _If_ we were to switch to has_object() it would start to do something, but I think that's a bad idea for the reasons given above. In the first case, I'd likewise argue it's not doing anything useful. It is confirming that we have the object, but so would the call to stream_object_signature() immediately below (assuming skip_hash is not set). If skip_hash is set, then we could either: 1. Just assume we have it. The point of the caller passing skip_hash is that we don't care about checking the integrity for this use case, and "missing" is not really any different than "there's a file on disk but it might contain garbage bytes". 2. Check repo_has_object_file() in this code path only when skip_hash is set. That retains the same "do we even have it" check for skip_hash that is performed now. I.e., I'd suggest this patch to remove both calls entirely: diff --git a/object.c b/object.c index 248530ba7b..a370545405 100644 --- a/object.c +++ b/object.c @@ -285,9 +285,8 @@ struct object *parse_object_with_flags(struct repository *r, return &commit->object; } - if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || - (!obj && repo_has_object_file(r, oid) && - oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + if ((obj && obj->type == OBJ_BLOB) || + (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { if (!skip_hash && stream_object_signature(r, repl) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; But there may be some subtlety I'm missing. I'm cc-ing Jonathan Tan, who added has_object(), and who added the top call to repo_has_object_file() via df11e19648 (rev-list: support termination at promisor objects, 2017-12-08). The second call is from 090ea12671 (parse_object: avoid putting whole blob in core, 2012-03-07), but he's no longer active on the project. Looking at the commit, I think it was just a case of "let's be extra careful". But as far as I can tell, it's not helping anything, and both calls are introducing extra work doing object lookups. -Peff ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file() 2022-11-16 18:20 ` Jeff King @ 2022-11-16 21:14 ` Jonathan Tan 2022-11-17 22:37 ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Jonathan Tan @ 2022-11-16 21:14 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Tan, Kousik Sanagavarapu, git Jeff King <peff@peff.net> writes: > I.e., I'd suggest this patch to remove both calls entirely: > > diff --git a/object.c b/object.c > index 248530ba7b..a370545405 100644 > --- a/object.c > +++ b/object.c > @@ -285,9 +285,8 @@ struct object *parse_object_with_flags(struct repository *r, > return &commit->object; > } > > - if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || > - (!obj && repo_has_object_file(r, oid) && > - oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > + if ((obj && obj->type == OBJ_BLOB) || > + (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > if (!skip_hash && stream_object_signature(r, repl) < 0) { > error(_("hash mismatch %s"), oid_to_hex(oid)); > return NULL; > > But there may be some subtlety I'm missing. I'm cc-ing Jonathan Tan, who > added has_object(), and who added the top call to repo_has_object_file() > via df11e19648 (rev-list: support termination at promisor objects, > 2017-12-08). Thanks for CC-ing me on this. Looking at that commit and the code at that time, I'm not sure why I added that call either. My best guess is that I was worried that the streaming interface wouldn't support missing objects, but both then and now, a call to istream_source() is made before any streaming occurs (which does perform the lazy fetch). So yes, I also think that you can remove these calls. ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 0/2] fixing parse_object() check for type mismatch 2022-11-16 21:14 ` Jonathan Tan @ 2022-11-17 22:37 ` Jeff King 2022-11-17 22:37 ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King ` (3 more replies) 0 siblings, 4 replies; 22+ messages in thread From: Jeff King @ 2022-11-17 22:37 UTC (permalink / raw) To: Jonathan Tan; +Cc: Taylor Blau, Kousik Sanagavarapu, git On Wed, Nov 16, 2022 at 01:14:18PM -0800, Jonathan Tan wrote: > > But there may be some subtlety I'm missing. I'm cc-ing Jonathan Tan, who > > added has_object(), and who added the top call to repo_has_object_file() > > via df11e19648 (rev-list: support termination at promisor objects, > > 2017-12-08). > > Thanks for CC-ing me on this. Looking at that commit and the code at that time, > I'm not sure why I added that call either. My best guess is that I was worried > that the streaming interface wouldn't support missing objects, but both then > and now, a call to istream_source() is made before any streaming occurs (which > does perform the lazy fetch). > > So yes, I also think that you can remove these calls. Thanks. After staring at this a bit, I noticed there is an even more subtle issue with the case you touched back then, which is that it fails to notice when a think we expect to be a blob isn't one. Your old patch didn't make anything worse there, but it also wasn't sufficient to catch the problem. See patch 2 for details. I'm adding Taylor to the cc as the author of t6102, when we were tracking down all of these "oops, it's not really a blob" cases. This fixes one of the lingering cases from that test script. [1/2]: parse_object(): drop extra "has" check before checking object type [2/2]: parse_object(): check on-disk type of suspected blob object.c | 5 ++--- t/t6102-rev-list-unexpected-objects.sh | 4 ++-- 2 files changed, 4 insertions(+), 5 deletions(-) -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/2] parse_object(): drop extra "has" check before checking object type 2022-11-17 22:37 ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King @ 2022-11-17 22:37 ` Jeff King 2022-11-17 22:41 ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King ` (2 subsequent siblings) 3 siblings, 0 replies; 22+ messages in thread From: Jeff King @ 2022-11-17 22:37 UTC (permalink / raw) To: Jonathan Tan; +Cc: Taylor Blau, Kousik Sanagavarapu, git When parsing an object of unknown type, we check to see if it's a blob, so we can use our streaming code path. This uses oid_object_info() to check the type, but before doing so we call repo_has_object_file(). This latter is pointless, as oid_object_info() will already fail if the object is missing. Checking it ahead of time just complicates the code and is a waste of resources (albeit small). Let's drop the redundant check. Signed-off-by: Jeff King <peff@peff.net> --- object.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/object.c b/object.c index 8a74eb85e9..16eb944e98 100644 --- a/object.c +++ b/object.c @@ -287,8 +287,7 @@ struct object *parse_object_with_flags(struct repository *r, } if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || - (!obj && repo_has_object_file(r, oid) && - oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { if (!skip_hash && stream_object_signature(r, repl) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; -- 2.38.1.890.g50b10763b9 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/2] parse_object(): check on-disk type of suspected blob 2022-11-17 22:37 ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King 2022-11-17 22:37 ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King @ 2022-11-17 22:41 ` Jeff King 2022-11-18 0:36 ` Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-11-18 19:05 ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau 3 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2022-11-17 22:41 UTC (permalink / raw) To: Jonathan Tan; +Cc: Taylor Blau, Kousik Sanagavarapu, git In parse_object(), we try to handle blobs by streaming rather than loading them entirely into memory. The most common case here will be that we haven't seen the object yet and check oid_object_info(), which tells us we have a blob. But we trigger this code on one other case: when we have an in-memory object struct with type OBJ_BLOB (and without its "parsed" flag set, since otherwise we'd return early from the function). This indicates that some other part of the code suspected we have a blob (e.g., it was mentioned by a tree or tag) but we haven't yet looked at the on-disk copy. In this case before hitting the streaming path, we check if we have the object on-disk at all. This is mostly pointless extra work, as the streaming path would complain if it couldn't open the object (albeit with the message "hash mismatch", which is a little misleading). But it's also insufficient to catch all problems. The streaming code will only tell us "yes, the on-disk object matches the oid". But it doesn't actually confirm that what we found was indeed a blob, and neither does repo_has_object_file(). One way to improve this would be to teach stream_object_signature() to check the type (either by returning it to us to check, or taking an "expected" type). But there's an even simpler fix here: if we suspect the object is a blob, just call oid_object_info() to confirm that we have it on-disk, and that it really is a blob. This is slightly less efficient than teaching stream_object_signature() to do it (since it has to open the object already). But this case very rarely comes up. In practice, we usually don't have any clue what the type is, in which case we already call oid_object_info(). This "suspected" case happens only when some other code created an object struct but didn't actually parse the blob, which is actually tricky to trigger at all (see the discussion of the test below). I reworked the conditional a bit so that instead of: if ((suspected_blob && oid_object_info() == OBJ_BLOB) (no_clue && oid_object_info() == OBJ_BLOB) we have the simpler: if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB) This is shorter, but also reflects what we really want say, which is "have we ruled out this being a blob; if not, check it on-disk". In either case, if oid_object_info() fails to tell us it's a blob, we'll skip the streaming code path and call repo_read_object_file(), just as before. And if we really do have a mismatch with the existing object struct, we'll eventually call lookup_commit(), etc, via parse_object_buffer(), which will complain that it doesn't match our existing obj->type. So this fixes one of the lingering expect_failure cases from 0616617c7e (t: introduce tests for unexpected object types, 2019-04-09). That test works by peeling a tag that claims to point to a blob (triggering us to create the struct), but really points to something else, which we later discover when we call parse_object() as part of the actual traversal). Prior to this commit, we'd quietly check the sha1 and mark the blob as "parsed". Now we correctly complain about the mismatch. Signed-off-by: Jeff King <peff@peff.net> --- As an aside, I found the "this test is marked as success but testing the wrong thing" pattern here confusing to deal with (since I had to dig in history to understand what was going on and what the test was _supposed_ to say). It comes from cf10c5b4cf (rev-list tests: don't hide abort() in "test_expect_failure", 2022-03-07). I'm skeptical that it was worth switching those tests for leak detection purposes. But more importantly, it looks like pw/test-todo would provide us with a much nicer pattern there. It seems to be stalled on review, so let's see if we can get that moving again. object.c | 4 ++-- t/t6102-rev-list-unexpected-objects.sh | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/object.c b/object.c index 16eb944e98..fad1a5af4a 100644 --- a/object.c +++ b/object.c @@ -286,8 +286,8 @@ struct object *parse_object_with_flags(struct repository *r, return &commit->object; } - if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || - (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { + if ((!obj || (obj && obj->type == OBJ_BLOB)) && + oid_object_info(r, oid, NULL) == OBJ_BLOB) { if (!skip_hash && stream_object_signature(r, repl) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); return NULL; diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh index 4a9a4436e2..9350b5fd2c 100755 --- a/t/t6102-rev-list-unexpected-objects.sh +++ b/t/t6102-rev-list-unexpected-objects.sh @@ -121,8 +121,8 @@ test_expect_success 'setup unexpected non-blob tag' ' tag=$(git hash-object -w --literally -t tag broken-tag) ' -test_expect_success 'TODO (should fail!): traverse unexpected non-blob tag (lone)' ' - git rev-list --objects $tag +test_expect_success 'traverse unexpected non-blob tag (lone)' ' + test_must_fail git rev-list --objects $tag ' test_expect_success 'traverse unexpected non-blob tag (seen)' ' -- 2.38.1.890.g50b10763b9 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] parse_object(): check on-disk type of suspected blob 2022-11-17 22:41 ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King @ 2022-11-18 0:36 ` Ævar Arnfjörð Bjarmason 2022-11-21 19:21 ` Jeff King 0 siblings, 1 reply; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-18 0:36 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Tan, Taylor Blau, Kousik Sanagavarapu, git On Thu, Nov 17 2022, Jeff King wrote: > I reworked the conditional a bit so that instead of: > > if ((suspected_blob && oid_object_info() == OBJ_BLOB) > (no_clue && oid_object_info() == OBJ_BLOB) > > we have the simpler: > > if ((suspected_blob || no_clue) && oid_object_info() == OBJ_BLOB) > [...] > @@ -286,8 +286,8 @@ struct object *parse_object_with_flags(struct repository *r, > return &commit->object; > } > > - if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || > - (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > + if ((!obj || (obj && obj->type == OBJ_BLOB)) && > + oid_object_info(r, oid, NULL) == OBJ_BLOB) { > if (!skip_hash && stream_object_signature(r, repl) < 0) { > error(_("hash mismatch %s"), oid_to_hex(oid)); > return NULL; But why: if ((!x || (x && x->m)) && ...) Instead of: if ((!x || x->m)) && ...) If "!obj" is false then "obj" must be non-NULL, so you don't need to check it again and can lose the "obj &&". > [...] > So this fixes one of the lingering expect_failure cases from 0616617c7e > (t: introduce tests for unexpected object types, 2019-04-09). That test > works by peeling a tag that claims to point to a blob (triggering us to > create the struct), but really points to something else, which we later > discover when we call parse_object() as part of the actual traversal). > Prior to this commit, we'd quietly check the sha1 and mark the blob as > "parsed". Now we correctly complain about the mismatch. I applied this on top of "master", and adjusted your test to be this instead: test_expect_success 'traverse unexpected non-blob tag (lone)' ' cat >expect <<-EOF && error: object $commit is a blob, not a commit fatal: bad object $commit EOF test_must_fail git rev-list --objects $tag >out 2>actual && test_must_be_empty out && test_cmp expect actual ' Which passes, showing that we're still not correctly identifying it, but we are doing it for the purposes of erroring out, but the incorrect type persists. Now, this all does seem quite familiar... :) : https://lore.kernel.org/git/patch-10.11-a84f670ac24-20210328T021238Z-avarab@gmail.com/ I.e. that's the rest of the fix for this issue. I applied this change on my local branch with that, and they combine nicely. the "test_must_fail" here works as intended, *and* we'll correctly report & store the type. > As an aside, I found the "this test is marked as success but testing the > wrong thing" pattern here confusing to deal with (since I had to dig in > history to understand what was going on and what the test was _supposed_ > to say). > > It comes from cf10c5b4cf (rev-list tests: don't hide abort() in > "test_expect_failure", 2022-03-07). I'm skeptical that it was worth > switching those tests for leak detection purposes. It's not just for leak detection purposes that it was a good idea to switch it away from test_expect_failure, but also that we've been ensuring that this didn't turn into a segfault all this time by not using "test_expect_failure". > But more importantly, it looks like pw/test-todo would provide us with a > much nicer pattern there. It seems to be stalled on review, so let's see > if we can get that moving again. The "TODO (should fail!)" didn't stand out? But yeah, having a "todo" or "test_expect_todo" or "test_expect_failure" not suck would be nice. FWIW I think https://lore.kernel.org/git/221006.86v8owr986.gmgdl@evledraar.gmail.com/ outlines a good way forward for it that I think should make everyone happy. > diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh > index 4a9a4436e2..9350b5fd2c 100755 > --- a/t/t6102-rev-list-unexpected-objects.sh > +++ b/t/t6102-rev-list-unexpected-objects.sh > @@ -121,8 +121,8 @@ test_expect_success 'setup unexpected non-blob tag' ' > tag=$(git hash-object -w --literally -t tag broken-tag) > ' > > -test_expect_success 'TODO (should fail!): traverse unexpected non-blob tag (lone)' ' > - git rev-list --objects $tag > +test_expect_success 'traverse unexpected non-blob tag (lone)' ' > + test_must_fail git rev-list --objects $tag > ' I think you'll either want the test_cmp I noted above, or to do that in a subsequent test_expect_failure. I know that your stance is that you prefer not to "test the bad behavior" as it were. Personally I thought an all-caps TODO comment might make it less confusing, but anyway. In this case your commit message claims you're happy with the end result, so I think you'd want to test what we actually emit on stderr, as it's quite ... unintuative. Or, which I think probably makes more sense, add that as a subsequent test_expect_failure or whatever. FWIW this somewhat un-idiomatic pattern will get around the current caveats with it: test_expect_success .... ' ... >actual.non-blob-tag ' test_lazy_prereq HAVE_NON_BLOB_TAG 'test -e actual.non-blob-tag' test_expect_failure HAVE_NON_BLOB_TAG '...' ' cat >expect.non-blob-tag <<-\EOF && ... EOF test_cmp expect.non-blob-tag actual.non-blob-tag ' I.e. peel off the 'test_cmp" that should have a known-good state from the already-good status code. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 2/2] parse_object(): check on-disk type of suspected blob 2022-11-18 0:36 ` Ævar Arnfjörð Bjarmason @ 2022-11-21 19:21 ` Jeff King 0 siblings, 0 replies; 22+ messages in thread From: Jeff King @ 2022-11-21 19:21 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: Jonathan Tan, Taylor Blau, Kousik Sanagavarapu, git On Fri, Nov 18, 2022 at 01:36:23AM +0100, Ævar Arnfjörð Bjarmason wrote: > > - if ((obj && obj->type == OBJ_BLOB && repo_has_object_file(r, oid)) || > > - (!obj && oid_object_info(r, oid, NULL) == OBJ_BLOB)) { > > + if ((!obj || (obj && obj->type == OBJ_BLOB)) && > > + oid_object_info(r, oid, NULL) == OBJ_BLOB) { > > if (!skip_hash && stream_object_signature(r, repl) < 0) { > > error(_("hash mismatch %s"), oid_to_hex(oid)); > > return NULL; > > But why: > > if ((!x || (x && x->m)) && ...) > > Instead of: > > if ((!x || x->m)) && ...) > > If "!obj" is false then "obj" must be non-NULL, so you don't need to > check it again and can lose the "obj &&". Just that it was one more round of refactoring than I did. :) I agree that it's much more readable. It looks like the original hit 'next', so I'll send a patch on top. > I applied this on top of "master", and adjusted your test to be this > instead: > > test_expect_success 'traverse unexpected non-blob tag (lone)' ' > cat >expect <<-EOF && > error: object $commit is a blob, not a commit > fatal: bad object $commit > EOF > test_must_fail git rev-list --objects $tag >out 2>actual && > test_must_be_empty out && > test_cmp expect actual > ' > > Which passes, showing that we're still not correctly identifying it, but > we are doing it for the purposes of erroring out, but the incorrect type > persists. > > Now, this all does seem quite familiar... :) : > https://lore.kernel.org/git/patch-10.11-a84f670ac24-20210328T021238Z-avarab@gmail.com/ > > I.e. that's the rest of the fix for this issue. I applied this change on > my local branch with that, and they combine nicely. the "test_must_fail" > here works as intended, *and* we'll correctly report & store the type. Right. It's hitting the exact same code path as all of the other object types now. You suggested adding to the test here, but I'd prefer not to do that. Noticing that we have a type mismatch is what is fixed, and it now does that just like all the other object types. Dealing with the message reversal is orthogonal. > > But more importantly, it looks like pw/test-todo would provide us with a > > much nicer pattern there. It seems to be stalled on review, so let's see > > if we can get that moving again. > > The "TODO (should fail!)" didn't stand out? But yeah, having a "todo" or > "test_expect_todo" or "test_expect_failure" not suck would be nice. I did double-take on the "TODO" just because that is not our usual pattern, but that was easily fixed. What I really don't like about the "switch failure to success" pattern is that it requires rewriting the test to expect the wrong thing! So when somebody later fixes the bug, they get a confusing failure, but must also rewrite the test back to what it originally should have been. That was not too hard here, where it was just replacing a test_must_fail, but that earlier hunk in cf10c5b4cf that actualy adds in expected output (that we know is the wrong thing to be printing!) seems a bit over the top to me. Anybody who encounters it has to dig into the history to understand what is going on. -Peff ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 0/4] tag: don't misreport type of tagged objects in errors 2022-11-17 22:37 ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King 2022-11-17 22:37 ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King 2022-11-17 22:41 ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King @ 2022-11-18 11:46 ` Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason ` (4 more replies) 2022-11-18 19:05 ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau 3 siblings, 5 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason This series fixes a very long-standing issue where we'll get confused when we parse a tag whose "type" lies about the type of the target object. It goes on top of Jeff King's just-submitted [1], and the two compliment one another. See [2] for my feedback about what was left over, which this fixes. Currently we'll parse tags and note what the "type" claims to be. Say a pointer to a "blob" object that claims to be a "commit" in the envelope. Then when we we'd try to parse that supposed "commit' for real we'd emit a message like: error: object <oid> is a blob, not a commit Which is reversed, i.e. we'd remember the first "blob" we saw, and then get confused about seeing a "commit" when we did the actual parsing. This is now fixed in almost all cases by having the one caller of parse_tag() which actually knows the type tell it "yes, I'm sure this is a commit". We'll then be able to see that we have a non-parsed object as scaffolding, but that it's really a commit, and emit the correct: error: object <oid> is a commit not a blob Which goes along with other errors where the tag object itself yells about being unhappy with the object reference. I submitted a version of these patches back in early 2021[3], this is significantly slimmed down since then. At the time Jeff King noted that this approach inherently can't cover all possible scenarios. I.e. sometimes our parsing of the envelope isn't followed up by the "real" parse. Even in those cases we can "get it right as 4/4 here demonstrates. But there are going to be cases left where we get it wrong, but they're all cases where we get it wrong now. It's probably not worth fixing the long tail of those issues, but now we'll emit a sensible error on the common case of "log" etc. 1. https://lore.kernel.org/git/Y3a3qcqNG8W3ueeb@coredump.intra.peff.net/ 2. https://lore.kernel.org/git/221118.86cz9lgjxu.gmgdl@evledraar.gmail.com/ 3. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/ Ævar Arnfjörð Bjarmason (4): object-file.c: free the "t.tag" in check_tag() object tests: add test for unexpected objects in tags tag: don't misreport type of tagged objects in errors tag: don't emit potentially incorrect "object is a X, not a Y" blob.c | 11 +- blob.h | 3 + commit.c | 11 +- commit.h | 2 + object-file.c | 1 + object.c | 20 +++- object.h | 2 + t/t3800-mktag.sh | 1 + t/t5302-pack-index.sh | 2 + t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++ tag.c | 22 +++- tag.h | 2 + tree.c | 11 +- tree.h | 2 + 14 files changed, 222 insertions(+), 14 deletions(-) -- 2.38.0.1511.gcdcff1f1dc2 ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 ` Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 2/4] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason ` (3 subsequent siblings) 4 siblings, 0 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason Fix a memory leak that's been with us ever since c879daa2372 (Make hash-object more robust against malformed objects, 2011-02-05). With "HASH_FORMAT_CHECK" (used by "hash-object" and "replace") we'll parse tags into a throwaway variable on the stack, but weren't freeing the "item->tag" we might malloc() when doing so. Mark the tests that now pass in their entirety as passing under "SANITIZE=leak", which means we'll test them as part of the "linux-leaks" CI job. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- object-file.c | 1 + t/t3800-mktag.sh | 1 + t/t5302-pack-index.sh | 2 ++ 3 files changed, 4 insertions(+) diff --git a/object-file.c b/object-file.c index 957790098fa..a13edba0753 100644 --- a/object-file.c +++ b/object-file.c @@ -2338,6 +2338,7 @@ static void check_tag(const void *buf, size_t size) memset(&t, 0, sizeof(t)); if (parse_tag_buffer(the_repository, &t, buf, size)) die(_("corrupt tag")); + free(t.tag); } static int index_mem(struct index_state *istate, diff --git a/t/t3800-mktag.sh b/t/t3800-mktag.sh index e3cf0ffbe59..d3e428ff46e 100755 --- a/t/t3800-mktag.sh +++ b/t/t3800-mktag.sh @@ -4,6 +4,7 @@ test_description='git mktag: tag object verify test' +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh ########################################################### diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh index b0095ab41d3..54b11f81c63 100755 --- a/t/t5302-pack-index.sh +++ b/t/t5302-pack-index.sh @@ -4,6 +4,8 @@ # test_description='pack index with 64-bit offsets and object CRC' + +TEST_PASSES_SANITIZE_LEAK=true . ./test-lib.sh test_expect_success 'setup' ' -- 2.38.0.1511.gcdcff1f1dc2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 2/4] object tests: add test for unexpected objects in tags 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 ` Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 3/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason ` (2 subsequent siblings) 4 siblings, 0 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests for unexpected object types, 2019-04-09), there were no meaningful tests for checking how we reported on finding the incorrect object type in a tag, i.e. one that broke the "type" promise in the tag header. We'll report the wrong object type in these cases, and thus fail on the "test_cmp", e.g. for the first "error: " output being tested here we should say "$commit is a tag, not a commit", instead we say "$commit is a commit, not a tag". This will be fixed in a subsequent commit. See the discussion & notes in [1] and downthread of there for test snippets that are adapted here. In the case of "fsck" which objects we visit in what order, and if we report errors on them depends on their OIDs. So the test uses the technique of extracting the OID/type combinations that fsck does report, and asserting that those are correct (currently, it's far from correct). 1. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/ Helped-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++ 1 file changed, 146 insertions(+) diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh index 9350b5fd2c2..ac49f7182fd 100755 --- a/t/t6102-rev-list-unexpected-objects.sh +++ b/t/t6102-rev-list-unexpected-objects.sh @@ -130,4 +130,150 @@ test_expect_success 'traverse unexpected non-blob tag (seen)' ' test_i18ngrep "not a blob" output ' +test_expect_success 'setup unexpected non-tag tag' ' + test_when_finished "git tag -d tag-commit tag-tag" && + + git tag -a -m"my tagged commit" tag-commit $commit && + tag_commit=$(git rev-parse tag-commit) && + git tag -a -m"my tagged tag" tag-tag tag-commit && + tag_tag=$(git rev-parse tag-tag) && + + git cat-file tag tag-tag >good-tag-tag && + git cat-file tag tag-commit >good-commit-tag && + + sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit && + sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree && + sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob && + + sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag && + sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree && + sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob && + + tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) && + tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) && + tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) && + + git update-ref refs/tags/tag_tag_commit $tag_tag_commit && + git update-ref refs/tags/tag_tag_tree $tag_tag_tree && + git update-ref refs/tags/tag_tag_blob $tag_tag_blob && + + commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) && + commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) && + commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob) && + + git update-ref refs/tags/commit_tag_tag $commit_tag_tag && + git update-ref refs/tags/commit_tag_tree $commit_tag_tree && + git update-ref refs/tags/commit_tag_blob $commit_tag_blob +' + +test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' ' + test_must_fail git rev-list --objects $tag_tag_commit 2>err && + cat >expect <<-EOF && + error: object $commit is a commit, not a tag + fatal: bad object $commit + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $commit_tag_tag 2>err && + cat >expect <<-EOF && + error: object $tag_commit is a tag, not a commit + fatal: bad object $tag_commit + EOF + test_cmp expect err +' + +test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' ' + test_must_fail git rev-list --objects $tag_tag_tree 2>err && + cat >expect <<-EOF && + error: object $tree is a tree, not a tag + fatal: bad object $tree + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $commit_tag_tree 2>err && + cat >expect <<-EOF && + error: object $tree is a tree, not a commit + fatal: bad object $tree + EOF + test_cmp expect err +' + +test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' ' + test_must_fail git rev-list --objects $tag_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a tag + fatal: bad object $blob + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $commit_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a commit + fatal: bad object $blob + EOF + test_cmp expect err +' + +test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' ' + test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a commit + fatal: bad object $blob + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a tag + fatal: bad object $blob + EOF + test_cmp expect err +' + + +test_expect_failure 'traverse unexpected objects with for-each-ref' ' + cat >expect <<-EOF && + error: bad tag pointer to $tree in $tag_tag_tree + fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree + EOF + test_must_fail git for-each-ref --format="%(*objectname)" 2>actual && + test_cmp expect actual +' + +>fsck-object-isa +test_expect_failure 'setup: unexpected objects with fsck' ' + test_must_fail git fsck 2>err && + sed -n -e "/^error: object .* is a .*, not a .*$/ { + s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/; + p; + }" <err >fsck-object-isa +' + +while read oid type +do + test_expect_failure "fsck knows unexpected object $oid is $type" ' + git cat-file -t $oid >expect && + echo $type >actual && + test_cmp expect actual + ' +done <fsck-object-isa + +test_expect_success 'traverse unexpected non-tag tag (blob seen to blob)' ' + test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err && + cat >expected <<-EOF && + error: object $blob is a blob, not a commit + error: bad tag pointer to $blob in $commit_tag_blob + fatal: bad object $commit_tag_blob + EOF + test_cmp expected err && + + test_must_fail git rev-list --objects $blob $tag_tag_blob 2>err && + cat >expected <<-EOF && + error: object $blob is a blob, not a tag + error: bad tag pointer to $blob in $tag_tag_blob + fatal: bad object $tag_tag_blob + EOF + test_cmp expected err +' + test_done -- 2.38.0.1511.gcdcff1f1dc2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 3/4] tag: don't misreport type of tagged objects in errors 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 2/4] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 ` Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 4 siblings, 0 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent objects, 2005-06-21) (yes, that ancient!) and correctly report an error on a tag like: object <a tree hash> type commit As: error: object <a tree hash> is tree, not a commit Instead of our long-standing misbehavior of inverting the two, and reporting: error: object <a tree hash> is commit, not a tree Which, as can be trivially seen with 'git cat-file -t <a tree hash>' is incorrect. The reason for this misreporting is that in parse_tag_buffer() we end up doing a lookup_{blob,commit,tag,tree}() depending on what we read out of the "type" line. If we haven't parsed that object before we end up dispatching to the type-specific lookup functions, e.g. this for commit.c in lookup_commit_type(): struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_commit_node(r)); Its allocation will then set the obj->type according to what the tag told us the type was, but which we've never validated. At this point we've got an object in memory that hasn't been parsed, and whose type is incorrect, since we mistrusted a tag to tell us the type. Then when we actually load the object with parse_object() we read it and find that it's a "tree". See 8ff226a9d5e (add object_as_type helper for casting objects, 2014-07-13) for that behavior (that's just a refactoring commit, but shows all the code involved). Which explains why we inverted the error report. Normally when object_as_type() is called it's by the lookup_{blob,commit,tag,tree}() functions via parse_object(). At that point we can trust the obj->type. In the case of parsing objects we've learned about via a tag with an incorrect type it's the opposite, the obj->type isn't correct and holds the mislabeled type, but we're parsing the object and know for sure what object type we're dealing with. So, let's add "lookup_{blob,commit,tag,tree}_type()" functions to go with the existing ""lookup_{blob,commit,tag,tree}()", we'll call these from "parse_object_buffer()" where we actually know the type, as opposed to the "parse_tag_buffer()" code where we're just guessing what it might be. This only help with the cases where we do see the tag reference, and then end up doing a full parse of the object. But as seen in the "for-each-ref" and "fsck" tests we have cases where we'll never fully parse it. Those will be handled in a subsequent commit, but for now this handles the common case of "show" etc. running into these. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- blob.c | 11 +++++++++-- blob.h | 3 +++ commit.c | 11 +++++++++-- commit.h | 2 ++ object.c | 20 ++++++++++++++++---- object.h | 2 ++ t/t6102-rev-list-unexpected-objects.sh | 8 ++++---- tag.c | 21 +++++++++++++++++---- tag.h | 2 ++ tree.c | 11 +++++++++-- tree.h | 2 ++ 11 files changed, 75 insertions(+), 18 deletions(-) diff --git a/blob.c b/blob.c index 182718aba9f..ca30a22b2e8 100644 --- a/blob.c +++ b/blob.c @@ -5,12 +5,19 @@ const char *blob_type = "blob"; -struct blob *lookup_blob(struct repository *r, const struct object_id *oid) +struct blob *lookup_blob_type(struct repository *r, + const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_blob_node(r)); - return object_as_type(obj, OBJ_BLOB, 0); + return object_as_type_hint(obj, OBJ_BLOB, type); +} + +struct blob *lookup_blob(struct repository *r, const struct object_id *oid) +{ + return lookup_blob_type(r, oid, OBJ_NONE); } int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size) diff --git a/blob.h b/blob.h index 16648720557..066a2effcbf 100644 --- a/blob.h +++ b/blob.h @@ -10,6 +10,9 @@ struct blob { }; struct blob *lookup_blob(struct repository *r, const struct object_id *oid); +struct blob *lookup_blob_type(struct repository *r, + const struct object_id *oid, + enum object_type type); int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size); diff --git a/commit.c b/commit.c index 572301b80a2..8a90f279e24 100644 --- a/commit.c +++ b/commit.c @@ -67,12 +67,19 @@ struct commit *lookup_commit_object(struct repository *r, } -struct commit *lookup_commit(struct repository *r, const struct object_id *oid) +struct commit *lookup_commit_type(struct repository *r, + const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_commit_node(r)); - return object_as_type(obj, OBJ_COMMIT, 0); + return object_as_type_hint(obj, OBJ_COMMIT, type); +} + +struct commit *lookup_commit(struct repository *r, const struct object_id *oid) +{ + return lookup_commit_type(r, oid, OBJ_NONE); } struct commit *lookup_commit_reference_by_name(const char *name) diff --git a/commit.h b/commit.h index fa39202fa6b..95001a29d6b 100644 --- a/commit.h +++ b/commit.h @@ -78,6 +78,8 @@ struct commit *lookup_commit_object(struct repository *r, const struct object_id * "oid" is not in the object cache. */ struct commit *lookup_commit(struct repository *r, const struct object_id *oid); +struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid, + enum object_type type); struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid); struct commit *lookup_commit_reference_gently(struct repository *r, diff --git a/object.c b/object.c index fad1a5af4a6..cc17ed0606e 100644 --- a/object.c +++ b/object.c @@ -177,6 +177,18 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet) } } +void *object_as_type_hint(struct object *obj, enum object_type type, + enum object_type hint) +{ + if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) { + error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid), + type_name(type), type_name(obj->type)); + obj->type = type; + return NULL; + } + return object_as_type(obj, type, 0);; +} + struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid) { struct object *obj = lookup_object(r, oid); @@ -210,14 +222,14 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id obj = NULL; if (type == OBJ_BLOB) { - struct blob *blob = lookup_blob(r, oid); + struct blob *blob = lookup_blob_type(r, oid, type); if (blob) { if (parse_blob_buffer(blob, buffer, size)) return NULL; obj = &blob->object; } } else if (type == OBJ_TREE) { - struct tree *tree = lookup_tree(r, oid); + struct tree *tree = lookup_tree_type(r, oid, type); if (tree) { obj = &tree->object; if (!tree->buffer) @@ -229,7 +241,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id } } } else if (type == OBJ_COMMIT) { - struct commit *commit = lookup_commit(r, oid); + struct commit *commit = lookup_commit_type(r, oid, type); if (commit) { if (parse_commit_buffer(r, commit, buffer, size, 1)) return NULL; @@ -241,7 +253,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id obj = &commit->object; } } else if (type == OBJ_TAG) { - struct tag *tag = lookup_tag(r, oid); + struct tag *tag = lookup_tag_type(r, oid, type); if (tag) { if (parse_tag_buffer(r, tag, buffer, size)) return NULL; diff --git a/object.h b/object.h index 31ebe114585..042c304d3a4 100644 --- a/object.h +++ b/object.h @@ -122,6 +122,8 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid); void *create_object(struct repository *r, const struct object_id *oid, void *obj); void *object_as_type(struct object *obj, enum object_type type, int quiet); +void *object_as_type_hint(struct object *obj, enum object_type type, + enum object_type hint); /* * Returns the object, having parsed it to find out what it is. diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh index ac49f7182fd..2e36d8bcfd9 100755 --- a/t/t6102-rev-list-unexpected-objects.sh +++ b/t/t6102-rev-list-unexpected-objects.sh @@ -166,7 +166,7 @@ test_expect_success 'setup unexpected non-tag tag' ' git update-ref refs/tags/commit_tag_blob $commit_tag_blob ' -test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' ' +test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' ' test_must_fail git rev-list --objects $tag_tag_commit 2>err && cat >expect <<-EOF && error: object $commit is a commit, not a tag @@ -182,7 +182,7 @@ test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag) test_cmp expect err ' -test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' ' +test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' ' test_must_fail git rev-list --objects $tag_tag_tree 2>err && cat >expect <<-EOF && error: object $tree is a tree, not a tag @@ -198,7 +198,7 @@ test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' ' test_cmp expect err ' -test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' ' +test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' ' test_must_fail git rev-list --objects $tag_tag_blob 2>err && cat >expect <<-EOF && error: object $blob is a blob, not a tag @@ -214,7 +214,7 @@ test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' ' test_cmp expect err ' -test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' ' +test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' ' test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err && cat >expect <<-EOF && error: object $blob is a blob, not a commit diff --git a/tag.c b/tag.c index dfbcd7fcc24..19453c2edbf 100644 --- a/tag.c +++ b/tag.c @@ -100,12 +100,18 @@ struct object *deref_tag_noverify(struct object *o) return o; } -struct tag *lookup_tag(struct repository *r, const struct object_id *oid) +struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_tag_node(r)); - return object_as_type(obj, OBJ_TAG, 0); + return object_as_type_hint(obj, OBJ_TAG, type); +} + +struct tag *lookup_tag(struct repository *r, const struct object_id *oid) +{ + return lookup_tag_type(r, oid, OBJ_NONE); } static timestamp_t parse_tag_date(const char *buf, const char *tail) @@ -135,6 +141,7 @@ void release_tag_memory(struct tag *t) int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size) { + struct object *obj; struct object_id oid; char type[20]; const char *bufptr = data; @@ -169,7 +176,10 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u type[nl - bufptr] = '\0'; bufptr = nl + 1; - if (!strcmp(type, blob_type)) { + obj = lookup_object(r, &oid); + if (obj) { + item->tagged = obj; + } else if (!strcmp(type, blob_type)) { item->tagged = (struct object *)lookup_blob(r, &oid); } else if (!strcmp(type, tree_type)) { item->tagged = (struct object *)lookup_tree(r, &oid); @@ -182,10 +192,13 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u type, oid_to_hex(&item->object.oid)); } - if (!item->tagged) + if (!item->tagged || strcmp(type_name(item->tagged->type), type)) { + error(_("object %s is a %s, not a %s"), oid_to_hex(&oid), + type_name(item->tagged->type), type); return error("bad tag pointer to %s in %s", oid_to_hex(&oid), oid_to_hex(&item->object.oid)); + } if (bufptr + 4 < tail && starts_with(bufptr, "tag ")) ; /* good */ diff --git a/tag.h b/tag.h index 3ce8e721924..42bd3e64011 100644 --- a/tag.h +++ b/tag.h @@ -12,6 +12,8 @@ struct tag { timestamp_t date; }; struct tag *lookup_tag(struct repository *r, const struct object_id *oid); +struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid, + enum object_type type); int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size); int parse_tag(struct tag *item); void release_tag_memory(struct tag *t); diff --git a/tree.c b/tree.c index 410e3b477e5..1a730249bb8 100644 --- a/tree.c +++ b/tree.c @@ -102,12 +102,19 @@ int cmp_cache_name_compare(const void *a_, const void *b_) ce2->name, ce2->ce_namelen, ce_stage(ce2)); } -struct tree *lookup_tree(struct repository *r, const struct object_id *oid) +struct tree *lookup_tree_type(struct repository *r, + const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_tree_node(r)); - return object_as_type(obj, OBJ_TREE, 0); + return object_as_type_hint(obj, OBJ_TREE, type); +} + +struct tree *lookup_tree(struct repository *r, const struct object_id *oid) +{ + return lookup_tree_type(r, oid, OBJ_NONE); } int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size) diff --git a/tree.h b/tree.h index 6efff003e21..4af3b617f3d 100644 --- a/tree.h +++ b/tree.h @@ -15,6 +15,8 @@ struct tree { extern const char *tree_type; struct tree *lookup_tree(struct repository *r, const struct object_id *oid); +struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid, + enum object_type type); int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size); -- 2.38.0.1511.gcdcff1f1dc2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y" 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason ` (2 preceding siblings ...) 2022-11-18 11:46 ` [PATCH 3/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 ` Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 4 siblings, 0 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-18 11:46 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason As noted in the preceding commit we weren't handling cases where we see a reference to a bad "type" in a "tag", but then end up not fully parsing the object. In those cases let's only claim that we have a bad tag pointer, but emit "is a %s, not a %s". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- t/t6102-rev-list-unexpected-objects.sh | 6 +++--- tag.c | 5 +++-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh index 2e36d8bcfd9..ffc1672d7dc 100755 --- a/t/t6102-rev-list-unexpected-objects.sh +++ b/t/t6102-rev-list-unexpected-objects.sh @@ -231,7 +231,7 @@ test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' ' ' -test_expect_failure 'traverse unexpected objects with for-each-ref' ' +test_expect_success 'traverse unexpected objects with for-each-ref' ' cat >expect <<-EOF && error: bad tag pointer to $tree in $tag_tag_tree fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree @@ -241,7 +241,7 @@ test_expect_failure 'traverse unexpected objects with for-each-ref' ' ' >fsck-object-isa -test_expect_failure 'setup: unexpected objects with fsck' ' +test_expect_success 'setup: unexpected objects with fsck' ' test_must_fail git fsck 2>err && sed -n -e "/^error: object .* is a .*, not a .*$/ { s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/; @@ -251,7 +251,7 @@ test_expect_failure 'setup: unexpected objects with fsck' ' while read oid type do - test_expect_failure "fsck knows unexpected object $oid is $type" ' + test_expect_success "fsck knows unexpected object $oid is $type" ' git cat-file -t $oid >expect && echo $type >actual && test_cmp expect actual diff --git a/tag.c b/tag.c index 19453c2edbf..ad92cf89209 100644 --- a/tag.c +++ b/tag.c @@ -193,8 +193,9 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u } if (!item->tagged || strcmp(type_name(item->tagged->type), type)) { - error(_("object %s is a %s, not a %s"), oid_to_hex(&oid), - type_name(item->tagged->type), type); + if (item->tagged && item->tagged->parsed) + error(_("object %s is a %s, not a %s"), oid_to_hex(&oid), + type_name(item->tagged->type), type); return error("bad tag pointer to %s in %s", oid_to_hex(&oid), oid_to_hex(&item->object.oid)); -- 2.38.0.1511.gcdcff1f1dc2 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason ` (3 preceding siblings ...) 2022-11-18 11:46 ` [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 ` Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason ` (2 more replies) 4 siblings, 3 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason This series fixes a very long-standing issue where we'll get confused when we parse a tag whose "type" lies about the type of the target object. The v1 was on top of jk/parse-object-type-mismatch, which has since landed on "master". As I noted in [1] this covers remaining misreporting cases which weren't addressed in that series. Currently we'll parse tags and note what the "type" claims to be. Say a pointer to a "blob" object that claims to be a "commit" in the envelope. Then when we we'd try to parse that supposed "commit' for real we'd emit a message like: error: object <oid> is a blob, not a commit Which is reversed, i.e. we'd remember the first "blob" we saw, and then get confused about seeing a "commit" when we did the actual parsing. This is now fixed in almost all cases by having the one caller of parse_tag() which actually knows the type tell it "yes, I'm sure this is a commit". We'll then be able to see that we have a non-parsed object as scaffolding, but that it's really a commit, and emit the correct: error: object <oid> is a commit not a blob Which goes along with other errors where the tag object itself yells about being unhappy with the object reference. I submitted a version of these patches back in early 2021[2], this is significantly slimmed down since then. At the time Jeff King noted[3] that this approach inherently can't cover all possible scenarios. I.e. sometimes our parsing of the envelope isn't followed up by the "real" parse. Even in those cases we can "get it right as 3/3 here demonstrates. But there are going to be cases left where we get it wrong, but they're all cases where we get it wrong now. It's probably not worth fixing the long tail of those issues, but now we'll emit a sensible error on the common case of "log" etc. Changes since v1: * The v1 of this included a fix for the t.tag memory leak, which has now been ejected. I'm fixing that in another series[4] As a result we need to mark the new tests with !SANITIZE_LEAK, once some version of [4] lands we can un-mark these, so we'll test them under SANITIZE=leak. * In the previous 1st patch I marked a "setup" test as "test_expect_failure", which will pass at that point, let's make it "test_expect_success" from the outset. CI & branch at [5]. The "win build" CI failure is unrelated, it also happens when I re-push master, root cause unknown, but unrelated to this topic. 1. https://lore.kernel.org/git/221118.86cz9lgjxu.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/cover-00.11-00000000000-20210328T021238Z-avarab@gmail.com/ 3. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/ 4. https://lore.kernel.org/git/cover-00.20-00000000000-20221228T175512Z-avarab@gmail.com/ 5. https://github.com/avar/git/tree/avar/correct-object-as-type-minimal-2 Ævar Arnfjörð Bjarmason (3): object tests: add test for unexpected objects in tags tag: don't misreport type of tagged objects in errors tag: don't emit potentially incorrect "object is a X, not a Y" blob.c | 11 +- blob.h | 3 + commit.c | 11 +- commit.h | 2 + object.c | 20 +++- object.h | 2 + t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++ tag.c | 22 +++- tag.h | 2 + tree.c | 11 +- tree.h | 2 + 11 files changed, 218 insertions(+), 14 deletions(-) Range-diff against v1: 1: 2be8477cd78 < -: ----------- object-file.c: free the "t.tag" in check_tag() 2: 1b5544ec868 ! 1: 0abf873f1e3 object tests: add test for unexpected objects in tags @@ Commit message report, and asserting that those are correct (currently, it's far from correct). + As these tests happen to run into a memory leak skip them under + SANITIZE=leak, as the test file was previously marked leak-free in + [3]. There is a concurrent fix for the leak in question[4]. + 1. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/ + 2. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/ + 3. dd9cede9136 (leak tests: mark some rev-list tests as passing with + SANITIZE=leak, 2021-10-31) + 4. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/ Helped-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected test_i18ngrep "not a blob" output ' -+test_expect_success 'setup unexpected non-tag tag' ' ++test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' ' + test_when_finished "git tag -d tag-commit tag-tag" && + + git tag -a -m"my tagged commit" tag-commit $commit && @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected + git update-ref refs/tags/commit_tag_blob $commit_tag_blob +' + -+test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' ' ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' ' + test_must_fail git rev-list --objects $tag_tag_commit 2>err && + cat >expect <<-EOF && + error: object $commit is a commit, not a tag @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected + test_cmp expect err +' + -+test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' ' ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' ' + test_must_fail git rev-list --objects $tag_tag_tree 2>err && + cat >expect <<-EOF && + error: object $tree is a tree, not a tag @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected + test_cmp expect err +' + -+test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' ' ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' ' + test_must_fail git rev-list --objects $tag_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a tag @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected + test_cmp expect err +' + -+test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' ' ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' ' + test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a commit @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected +' + + -+test_expect_failure 'traverse unexpected objects with for-each-ref' ' ++test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' ' + cat >expect <<-EOF && + error: bad tag pointer to $tree in $tag_tag_tree + fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected +' + +>fsck-object-isa -+test_expect_failure 'setup: unexpected objects with fsck' ' ++test_expect_success 'setup: unexpected objects with fsck' ' + test_must_fail git fsck 2>err && + sed -n -e "/^error: object .* is a .*, not a .*$/ { + s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/; @@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected + ' +done <fsck-object-isa + -+test_expect_success 'traverse unexpected non-tag tag (blob seen to blob)' ' ++test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (blob seen to blob)' ' + test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err && + cat >expected <<-EOF && + error: object $blob is a blob, not a commit 3: 468af961dc4 ! 2: 96398731841 tag: don't misreport type of tagged objects in errors @@ blob.c + return lookup_blob_type(r, oid, OBJ_NONE); } - int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size) + void parse_blob_buffer(struct blob *item) ## blob.h ## @@ blob.h: struct blob { @@ blob.h: struct blob { + const struct object_id *oid, + enum object_type type); - int parse_blob_buffer(struct blob *item, void *buffer, unsigned long size); - + /** + * Blobs do not contain references to other objects and do not have ## commit.c ## @@ commit.c: struct commit *lookup_commit_object(struct repository *r, @@ object.c: struct object *parse_object_buffer(struct repository *r, const struct - struct blob *blob = lookup_blob(r, oid); + struct blob *blob = lookup_blob_type(r, oid, type); if (blob) { - if (parse_blob_buffer(blob, buffer, size)) - return NULL; + parse_blob_buffer(blob); obj = &blob->object; } } else if (type == OBJ_TREE) { @@ object.h: struct object *lookup_object(struct repository *r, const struct object * Returns the object, having parsed it to find out what it is. ## t/t6102-rev-list-unexpected-objects.sh ## -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'setup unexpected non-tag tag' ' +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' ' git update-ref refs/tags/commit_tag_blob $commit_tag_blob ' --test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag)' ' -+test_expect_success 'traverse unexpected incorrectly typed tag (to commit & tag)' ' +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' ' ++test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' ' test_must_fail git rev-list --objects $tag_tag_commit 2>err && cat >expect <<-EOF && error: object $commit is a commit, not a tag -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected incorrectly typed tag (to commit & tag) +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t test_cmp expect err ' --test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' ' -+test_expect_success 'traverse unexpected incorrectly typed tag (to tree)' ' +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' ' ++test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' ' test_must_fail git rev-list --objects $tag_tag_tree 2>err && cat >expect <<-EOF && error: object $tree is a tree, not a tag -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected incorrectly typed tag (to tree)' ' +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t test_cmp expect err ' --test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' ' -+test_expect_success 'traverse unexpected incorrectly typed tag (to blob)' ' +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' ' ++test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' ' test_must_fail git rev-list --objects $tag_tag_blob 2>err && cat >expect <<-EOF && error: object $blob is a blob, not a tag -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected incorrectly typed tag (to blob)' ' +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t test_cmp expect err ' --test_expect_failure 'traverse unexpected non-tag tag (tree seen to blob)' ' -+test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' ' +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' ' ++test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' ' test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err && cat >expect <<-EOF && error: object $blob is a blob, not a commit 4: 1a9dcb9e05d ! 3: 2493988c41c tag: don't emit potentially incorrect "object is a X, not a Y" @@ Commit message Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> ## t/t6102-rev-list-unexpected-objects.sh ## -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'traverse unexpected non-tag tag (tree seen to blob)' ' +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen t ' --test_expect_failure 'traverse unexpected objects with for-each-ref' ' -+test_expect_success 'traverse unexpected objects with for-each-ref' ' +-test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' ' ++test_expect_success !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' ' cat >expect <<-EOF && error: bad tag pointer to $tree in $tag_tag_tree fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'traverse unexpected objects with for-each-ref' ' - ' - - >fsck-object-isa --test_expect_failure 'setup: unexpected objects with fsck' ' -+test_expect_success 'setup: unexpected objects with fsck' ' - test_must_fail git fsck 2>err && - sed -n -e "/^error: object .* is a .*, not a .*$/ { - s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/; -@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_failure 'setup: unexpected objects with fsck' ' +@@ t/t6102-rev-list-unexpected-objects.sh: test_expect_success 'setup: unexpected objects with fsck' ' while read oid type do -- 2.39.0.1153.g589e4efe9dc ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 1/3] object tests: add test for unexpected objects in tags 2022-12-30 1:52 ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 ` Ævar Arnfjörð Bjarmason 2022-12-30 4:25 ` Junio C Hamano 2022-12-30 1:52 ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason Fix a blind spot in the tests added in 0616617c7e1 (t: introduce tests for unexpected object types, 2019-04-09), there were no meaningful tests for checking how we reported on finding the incorrect object type in a tag, i.e. one that broke the "type" promise in the tag header. We'll report the wrong object type in these cases, and thus fail on the "test_cmp", e.g. for the first "error: " output being tested here we should say "$commit is a tag, not a commit", instead we say "$commit is a commit, not a tag". This will be fixed in a subsequent commit. See the discussion & notes in [1] and downthread of there for test snippets that are adapted here. In the case of "fsck" which objects we visit in what order, and if we report errors on them depends on their OIDs. So the test uses the technique of extracting the OID/type combinations that fsck does report, and asserting that those are correct (currently, it's far from correct). As these tests happen to run into a memory leak skip them under SANITIZE=leak, as the test file was previously marked leak-free in [3]. There is a concurrent fix for the leak in question[4]. 1. https://lore.kernel.org/git/YGTGgFI19fS7Uv6I@coredump.intra.peff.net/ 2. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/ 3. dd9cede9136 (leak tests: mark some rev-list tests as passing with SANITIZE=leak, 2021-10-31) 4. https://lore.kernel.org/git/patch-18.20-aa4df0e1b5c-20221228T175512Z-avarab@gmail.com/ Helped-by: Jeff King <peff@peff.net> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- t/t6102-rev-list-unexpected-objects.sh | 146 +++++++++++++++++++++++++ 1 file changed, 146 insertions(+) diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh index 9350b5fd2c2..f1c30db2654 100755 --- a/t/t6102-rev-list-unexpected-objects.sh +++ b/t/t6102-rev-list-unexpected-objects.sh @@ -130,4 +130,150 @@ test_expect_success 'traverse unexpected non-blob tag (seen)' ' test_i18ngrep "not a blob" output ' +test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' ' + test_when_finished "git tag -d tag-commit tag-tag" && + + git tag -a -m"my tagged commit" tag-commit $commit && + tag_commit=$(git rev-parse tag-commit) && + git tag -a -m"my tagged tag" tag-tag tag-commit && + tag_tag=$(git rev-parse tag-tag) && + + git cat-file tag tag-tag >good-tag-tag && + git cat-file tag tag-commit >good-commit-tag && + + sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit && + sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree && + sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob && + + sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag && + sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree && + sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob && + + tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) && + tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) && + tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) && + + git update-ref refs/tags/tag_tag_commit $tag_tag_commit && + git update-ref refs/tags/tag_tag_tree $tag_tag_tree && + git update-ref refs/tags/tag_tag_blob $tag_tag_blob && + + commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) && + commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) && + commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob) && + + git update-ref refs/tags/commit_tag_tag $commit_tag_tag && + git update-ref refs/tags/commit_tag_tree $commit_tag_tree && + git update-ref refs/tags/commit_tag_blob $commit_tag_blob +' + +test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' ' + test_must_fail git rev-list --objects $tag_tag_commit 2>err && + cat >expect <<-EOF && + error: object $commit is a commit, not a tag + fatal: bad object $commit + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $commit_tag_tag 2>err && + cat >expect <<-EOF && + error: object $tag_commit is a tag, not a commit + fatal: bad object $tag_commit + EOF + test_cmp expect err +' + +test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' ' + test_must_fail git rev-list --objects $tag_tag_tree 2>err && + cat >expect <<-EOF && + error: object $tree is a tree, not a tag + fatal: bad object $tree + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $commit_tag_tree 2>err && + cat >expect <<-EOF && + error: object $tree is a tree, not a commit + fatal: bad object $tree + EOF + test_cmp expect err +' + +test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' ' + test_must_fail git rev-list --objects $tag_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a tag + fatal: bad object $blob + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $commit_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a commit + fatal: bad object $blob + EOF + test_cmp expect err +' + +test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' ' + test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a commit + fatal: bad object $blob + EOF + test_cmp expect err && + + test_must_fail git rev-list --objects $tree $tag_tag_blob 2>err && + cat >expect <<-EOF && + error: object $blob is a blob, not a tag + fatal: bad object $blob + EOF + test_cmp expect err +' + + +test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' ' + cat >expect <<-EOF && + error: bad tag pointer to $tree in $tag_tag_tree + fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree + EOF + test_must_fail git for-each-ref --format="%(*objectname)" 2>actual && + test_cmp expect actual +' + +>fsck-object-isa +test_expect_success 'setup: unexpected objects with fsck' ' + test_must_fail git fsck 2>err && + sed -n -e "/^error: object .* is a .*, not a .*$/ { + s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/; + p; + }" <err >fsck-object-isa +' + +while read oid type +do + test_expect_failure "fsck knows unexpected object $oid is $type" ' + git cat-file -t $oid >expect && + echo $type >actual && + test_cmp expect actual + ' +done <fsck-object-isa + +test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (blob seen to blob)' ' + test_must_fail git rev-list --objects $blob $commit_tag_blob 2>err && + cat >expected <<-EOF && + error: object $blob is a blob, not a commit + error: bad tag pointer to $blob in $commit_tag_blob + fatal: bad object $commit_tag_blob + EOF + test_cmp expected err && + + test_must_fail git rev-list --objects $blob $tag_tag_blob 2>err && + cat >expected <<-EOF && + error: object $blob is a blob, not a tag + error: bad tag pointer to $blob in $tag_tag_blob + fatal: bad object $tag_tag_blob + EOF + test_cmp expected err +' + test_done -- 2.39.0.1153.g589e4efe9dc ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2 1/3] object tests: add test for unexpected objects in tags 2022-12-30 1:52 ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason @ 2022-12-30 4:25 ` Junio C Hamano 0 siblings, 0 replies; 22+ messages in thread From: Junio C Hamano @ 2022-12-30 4:25 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Taylor Blau, Jeff King, Jonathan Tan, Kousik Sanagavarapu Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > +test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' ' > + test_when_finished "git tag -d tag-commit tag-tag" && > + > + git tag -a -m"my tagged commit" tag-commit $commit && > + tag_commit=$(git rev-parse tag-commit) && > + git tag -a -m"my tagged tag" tag-tag tag-commit && > + tag_tag=$(git rev-parse tag-tag) && > + > + git cat-file tag tag-tag >good-tag-tag && > + git cat-file tag tag-commit >good-commit-tag && > + > + sed -e "s/$tag_commit/$commit/" <good-tag-tag >broken-tag-tag-commit && > + sed -e "s/$tag_commit/$tree/" <good-tag-tag >broken-tag-tag-tree && > + sed -e "s/$tag_commit/$blob/" <good-tag-tag >broken-tag-tag-blob && > + > + sed -e "s/$commit/$tag_commit/" <good-commit-tag >broken-commit-tag-tag && > + sed -e "s/$commit/$tree/" <good-commit-tag >broken-commit-tag-tree && > + sed -e "s/$commit/$blob/" <good-commit-tag >broken-commit-tag-blob && > + > + tag_tag_commit=$(git hash-object -w -t tag broken-tag-tag-commit) && > + tag_tag_tree=$(git hash-object -w -t tag broken-tag-tag-tree) && > + tag_tag_blob=$(git hash-object -w -t tag broken-tag-tag-blob) && If the second block of 3 sed commands to prepare data for tags that incorrectly claim to point at a commit are moved a bit, i.e. make "sed's && hash-object's && update-ref's" as a logical group, the above would become slightly easier to read, but in any case the set-up step looks quite repetitive and boring to read ;-) There is no strong reason to use broken-tag-* temporary files, though. Each of them is used exactly once, but you can just pipe the output from "sed" to "git hash-object --stdin" without losing any exit status, e.g. tag_tag_commit=$(sed -e '...' good-commit-tag | git hash-object -w -t tag --stdin) > + git update-ref refs/tags/tag_tag_commit $tag_tag_commit && > + git update-ref refs/tags/tag_tag_tree $tag_tag_tree && > + git update-ref refs/tags/tag_tag_blob $tag_tag_blob && > + > + commit_tag_tag=$(git hash-object -w -t tag broken-commit-tag-tag) && > + commit_tag_tree=$(git hash-object -w -t tag broken-commit-tag-tree) && > + commit_tag_blob=$(git hash-object -w -t tag broken-commit-tag-blob) && > + > + git update-ref refs/tags/commit_tag_tag $commit_tag_tag && > + git update-ref refs/tags/commit_tag_tree $commit_tag_tree && > + git update-ref refs/tags/commit_tag_blob $commit_tag_blob > +' > + > +test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' ' > + test_must_fail git rev-list --objects $tag_tag_commit 2>err && Does this have to be "rev-list --objects" or would something like "cat-file -t $tag_tag_commit^0" do? Especially because "rev-list --objects" is a rather heavy-weight command that does a lot of checking, I am wondering if it is sensible to rely on the assumption that the errors expected below will stay to be the only errors we get before the command exits (hence a wish to replace it with a more narrowly focused comamnd). > + cat >expect <<-EOF && > + error: object $commit is a commit, not a tag > + fatal: bad object $commit > + EOF > + test_cmp expect err && > +test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' ' > + cat >expect <<-EOF && > + error: bad tag pointer to $tree in $tag_tag_tree > + fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree > + EOF This depends on the fact that among the broken ones tag_tag_tree sorts the earliest (and the command stops after barfing on a single bad object), doesn't it? I wonder if it makes the test more robust by feeding refs/tags/tag_tag_tree from the command line to limit the tips the command needs to inspect. > +>fsck-object-isa Move it inside the setup as the first command in case "git fsck" succeeds? > +test_expect_success 'setup: unexpected objects with fsck' ' > + test_must_fail git fsck 2>err && > + sed -n -e "/^error: object .* is a .*, not a .*$/ { > + s/^error: object \([0-9a-f]*\) is a \([a-z]*\), not a [a-z]*$/\\1 \\2/; > + p; > + }" <err >fsck-object-isa > +' ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors 2022-12-30 1:52 ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 ` Ævar Arnfjörð Bjarmason 2022-12-30 6:07 ` Junio C Hamano 2022-12-30 1:52 ` [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason 2 siblings, 1 reply; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason Fix a regression in 89e4202f982 ([PATCH] Parse tags for absent objects, 2005-06-21) (yes, that ancient!) and correctly report an error on a tag like: object <a tree hash> type commit As: error: object <a tree hash> is tree, not a commit Instead of our long-standing misbehavior of inverting the two, and reporting: error: object <a tree hash> is commit, not a tree Which, as can be trivially seen with 'git cat-file -t <a tree hash>' is incorrect. The reason for this misreporting is that in parse_tag_buffer() we end up doing a lookup_{blob,commit,tag,tree}() depending on what we read out of the "type" line. If we haven't parsed that object before we end up dispatching to the type-specific lookup functions, e.g. this for commit.c in lookup_commit_type(): struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_commit_node(r)); Its allocation will then set the obj->type according to what the tag told us the type was, but which we've never validated. At this point we've got an object in memory that hasn't been parsed, and whose type is incorrect, since we mistrusted a tag to tell us the type. Then when we actually load the object with parse_object() we read it and find that it's a "tree". See 8ff226a9d5e (add object_as_type helper for casting objects, 2014-07-13) for that behavior (that's just a refactoring commit, but shows all the code involved). Which explains why we inverted the error report. Normally when object_as_type() is called it's by the lookup_{blob,commit,tag,tree}() functions via parse_object(). At that point we can trust the obj->type. In the case of parsing objects we've learned about via a tag with an incorrect type it's the opposite, the obj->type isn't correct and holds the mislabeled type, but we're parsing the object and know for sure what object type we're dealing with. So, let's add "lookup_{blob,commit,tag,tree}_type()" functions to go with the existing ""lookup_{blob,commit,tag,tree}()", we'll call these from "parse_object_buffer()" where we actually know the type, as opposed to the "parse_tag_buffer()" code where we're just guessing what it might be. This only help with the cases where we do see the tag reference, and then end up doing a full parse of the object. But as seen in the "for-each-ref" and "fsck" tests we have cases where we'll never fully parse it. Those will be handled in a subsequent commit, but for now this handles the common case of "show" etc. running into these. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- blob.c | 11 +++++++++-- blob.h | 3 +++ commit.c | 11 +++++++++-- commit.h | 2 ++ object.c | 20 ++++++++++++++++---- object.h | 2 ++ t/t6102-rev-list-unexpected-objects.sh | 8 ++++---- tag.c | 21 +++++++++++++++++---- tag.h | 2 ++ tree.c | 11 +++++++++-- tree.h | 2 ++ 11 files changed, 75 insertions(+), 18 deletions(-) diff --git a/blob.c b/blob.c index 8f83523b0cd..d6e3d64213d 100644 --- a/blob.c +++ b/blob.c @@ -5,12 +5,19 @@ const char *blob_type = "blob"; -struct blob *lookup_blob(struct repository *r, const struct object_id *oid) +struct blob *lookup_blob_type(struct repository *r, + const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_blob_node(r)); - return object_as_type(obj, OBJ_BLOB, 0); + return object_as_type_hint(obj, OBJ_BLOB, type); +} + +struct blob *lookup_blob(struct repository *r, const struct object_id *oid) +{ + return lookup_blob_type(r, oid, OBJ_NONE); } void parse_blob_buffer(struct blob *item) diff --git a/blob.h b/blob.h index 74555c90c44..b91b1600bda 100644 --- a/blob.h +++ b/blob.h @@ -10,6 +10,9 @@ struct blob { }; struct blob *lookup_blob(struct repository *r, const struct object_id *oid); +struct blob *lookup_blob_type(struct repository *r, + const struct object_id *oid, + enum object_type type); /** * Blobs do not contain references to other objects and do not have diff --git a/commit.c b/commit.c index d00780bee5f..8e30c21816c 100644 --- a/commit.c +++ b/commit.c @@ -67,12 +67,19 @@ struct commit *lookup_commit_object(struct repository *r, } -struct commit *lookup_commit(struct repository *r, const struct object_id *oid) +struct commit *lookup_commit_type(struct repository *r, + const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_commit_node(r)); - return object_as_type(obj, OBJ_COMMIT, 0); + return object_as_type_hint(obj, OBJ_COMMIT, type); +} + +struct commit *lookup_commit(struct repository *r, const struct object_id *oid) +{ + return lookup_commit_type(r, oid, OBJ_NONE); } struct commit *lookup_commit_reference_by_name(const char *name) diff --git a/commit.h b/commit.h index fa39202fa6b..95001a29d6b 100644 --- a/commit.h +++ b/commit.h @@ -78,6 +78,8 @@ struct commit *lookup_commit_object(struct repository *r, const struct object_id * "oid" is not in the object cache. */ struct commit *lookup_commit(struct repository *r, const struct object_id *oid); +struct commit *lookup_commit_type(struct repository *r, const struct object_id *oid, + enum object_type type); struct commit *lookup_commit_reference(struct repository *r, const struct object_id *oid); struct commit *lookup_commit_reference_gently(struct repository *r, diff --git a/object.c b/object.c index 344087de4de..81ec18f0739 100644 --- a/object.c +++ b/object.c @@ -177,6 +177,18 @@ void *object_as_type(struct object *obj, enum object_type type, int quiet) } } +void *object_as_type_hint(struct object *obj, enum object_type type, + enum object_type hint) +{ + if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) { + error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid), + type_name(type), type_name(obj->type)); + obj->type = type; + return NULL; + } + return object_as_type(obj, type, 0);; +} + struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid) { struct object *obj = lookup_object(r, oid); @@ -210,13 +222,13 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id obj = NULL; if (type == OBJ_BLOB) { - struct blob *blob = lookup_blob(r, oid); + struct blob *blob = lookup_blob_type(r, oid, type); if (blob) { parse_blob_buffer(blob); obj = &blob->object; } } else if (type == OBJ_TREE) { - struct tree *tree = lookup_tree(r, oid); + struct tree *tree = lookup_tree_type(r, oid, type); if (tree) { obj = &tree->object; if (!tree->buffer) @@ -228,7 +240,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id } } } else if (type == OBJ_COMMIT) { - struct commit *commit = lookup_commit(r, oid); + struct commit *commit = lookup_commit_type(r, oid, type); if (commit) { if (parse_commit_buffer(r, commit, buffer, size, 1)) return NULL; @@ -240,7 +252,7 @@ struct object *parse_object_buffer(struct repository *r, const struct object_id obj = &commit->object; } } else if (type == OBJ_TAG) { - struct tag *tag = lookup_tag(r, oid); + struct tag *tag = lookup_tag_type(r, oid, type); if (tag) { if (parse_tag_buffer(r, tag, buffer, size)) return NULL; diff --git a/object.h b/object.h index 31ebe114585..042c304d3a4 100644 --- a/object.h +++ b/object.h @@ -122,6 +122,8 @@ struct object *lookup_object(struct repository *r, const struct object_id *oid); void *create_object(struct repository *r, const struct object_id *oid, void *obj); void *object_as_type(struct object *obj, enum object_type type, int quiet); +void *object_as_type_hint(struct object *obj, enum object_type type, + enum object_type hint); /* * Returns the object, having parsed it to find out what it is. diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh index f1c30db2654..590e2523d0c 100755 --- a/t/t6102-rev-list-unexpected-objects.sh +++ b/t/t6102-rev-list-unexpected-objects.sh @@ -166,7 +166,7 @@ test_expect_success !SANITIZE_LEAK 'setup unexpected non-tag tag' ' git update-ref refs/tags/commit_tag_blob $commit_tag_blob ' -test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' ' +test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to commit & tag)' ' test_must_fail git rev-list --objects $tag_tag_commit 2>err && cat >expect <<-EOF && error: object $commit is a commit, not a tag @@ -182,7 +182,7 @@ test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t test_cmp expect err ' -test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' ' +test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to tree)' ' test_must_fail git rev-list --objects $tag_tag_tree 2>err && cat >expect <<-EOF && error: object $tree is a tree, not a tag @@ -198,7 +198,7 @@ test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t test_cmp expect err ' -test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' ' +test_expect_success !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (to blob)' ' test_must_fail git rev-list --objects $tag_tag_blob 2>err && cat >expect <<-EOF && error: object $blob is a blob, not a tag @@ -214,7 +214,7 @@ test_expect_failure !SANITIZE_LEAK 'traverse unexpected incorrectly typed tag (t test_cmp expect err ' -test_expect_failure !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' ' +test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen to blob)' ' test_must_fail git rev-list --objects $tree $commit_tag_blob 2>err && cat >expect <<-EOF && error: object $blob is a blob, not a commit diff --git a/tag.c b/tag.c index dfbcd7fcc24..19453c2edbf 100644 --- a/tag.c +++ b/tag.c @@ -100,12 +100,18 @@ struct object *deref_tag_noverify(struct object *o) return o; } -struct tag *lookup_tag(struct repository *r, const struct object_id *oid) +struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_tag_node(r)); - return object_as_type(obj, OBJ_TAG, 0); + return object_as_type_hint(obj, OBJ_TAG, type); +} + +struct tag *lookup_tag(struct repository *r, const struct object_id *oid) +{ + return lookup_tag_type(r, oid, OBJ_NONE); } static timestamp_t parse_tag_date(const char *buf, const char *tail) @@ -135,6 +141,7 @@ void release_tag_memory(struct tag *t) int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size) { + struct object *obj; struct object_id oid; char type[20]; const char *bufptr = data; @@ -169,7 +176,10 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u type[nl - bufptr] = '\0'; bufptr = nl + 1; - if (!strcmp(type, blob_type)) { + obj = lookup_object(r, &oid); + if (obj) { + item->tagged = obj; + } else if (!strcmp(type, blob_type)) { item->tagged = (struct object *)lookup_blob(r, &oid); } else if (!strcmp(type, tree_type)) { item->tagged = (struct object *)lookup_tree(r, &oid); @@ -182,10 +192,13 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u type, oid_to_hex(&item->object.oid)); } - if (!item->tagged) + if (!item->tagged || strcmp(type_name(item->tagged->type), type)) { + error(_("object %s is a %s, not a %s"), oid_to_hex(&oid), + type_name(item->tagged->type), type); return error("bad tag pointer to %s in %s", oid_to_hex(&oid), oid_to_hex(&item->object.oid)); + } if (bufptr + 4 < tail && starts_with(bufptr, "tag ")) ; /* good */ diff --git a/tag.h b/tag.h index 3ce8e721924..42bd3e64011 100644 --- a/tag.h +++ b/tag.h @@ -12,6 +12,8 @@ struct tag { timestamp_t date; }; struct tag *lookup_tag(struct repository *r, const struct object_id *oid); +struct tag *lookup_tag_type(struct repository *r, const struct object_id *oid, + enum object_type type); int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, unsigned long size); int parse_tag(struct tag *item); void release_tag_memory(struct tag *t); diff --git a/tree.c b/tree.c index 410e3b477e5..1a730249bb8 100644 --- a/tree.c +++ b/tree.c @@ -102,12 +102,19 @@ int cmp_cache_name_compare(const void *a_, const void *b_) ce2->name, ce2->ce_namelen, ce_stage(ce2)); } -struct tree *lookup_tree(struct repository *r, const struct object_id *oid) +struct tree *lookup_tree_type(struct repository *r, + const struct object_id *oid, + enum object_type type) { struct object *obj = lookup_object(r, oid); if (!obj) return create_object(r, oid, alloc_tree_node(r)); - return object_as_type(obj, OBJ_TREE, 0); + return object_as_type_hint(obj, OBJ_TREE, type); +} + +struct tree *lookup_tree(struct repository *r, const struct object_id *oid) +{ + return lookup_tree_type(r, oid, OBJ_NONE); } int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size) diff --git a/tree.h b/tree.h index 6efff003e21..4af3b617f3d 100644 --- a/tree.h +++ b/tree.h @@ -15,6 +15,8 @@ struct tree { extern const char *tree_type; struct tree *lookup_tree(struct repository *r, const struct object_id *oid); +struct tree *lookup_tree_type(struct repository *r, const struct object_id *oid, + enum object_type type); int parse_tree_buffer(struct tree *item, void *buffer, unsigned long size); -- 2.39.0.1153.g589e4efe9dc ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors 2022-12-30 1:52 ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason @ 2022-12-30 6:07 ` Junio C Hamano 0 siblings, 0 replies; 22+ messages in thread From: Junio C Hamano @ 2022-12-30 6:07 UTC (permalink / raw) To: Ævar Arnfjörð Bjarmason Cc: git, Taylor Blau, Jeff King, Jonathan Tan, Kousik Sanagavarapu Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > diff --git a/blob.c b/blob.c > index 8f83523b0cd..d6e3d64213d 100644 > --- a/blob.c > +++ b/blob.c > @@ -5,12 +5,19 @@ > > const char *blob_type = "blob"; > > -struct blob *lookup_blob(struct repository *r, const struct object_id *oid) > +struct blob *lookup_blob_type(struct repository *r, > + const struct object_id *oid, > + enum object_type type) > { > struct object *obj = lookup_object(r, oid); > if (!obj) > return create_object(r, oid, alloc_blob_node(r)); > - return object_as_type(obj, OBJ_BLOB, 0); > + return object_as_type_hint(obj, OBJ_BLOB, type); > +} > + > +struct blob *lookup_blob(struct repository *r, const struct object_id *oid) > +{ > + return lookup_blob_type(r, oid, OBJ_NONE); > } Between lookup_blob() and lookup_blob_type(), the distinction is that the latter calls object_as_type_hint() to ensure that the real type of the given object is of the type, which is given as "hint"? Very confusing naming. Perhaps because "hint" argument given to "as-type-hint" is playing a role that is FAR stronger than "hint"? It sounds more like enforcement. Perhaps s/hint/check/ or something? I am not sure exactly why, but lookup_blob_type() does look confusing. A natural question anybody would ask after seeing the function mame and the extra parmater is: "If we want/expect to see a blob, why do we give an extra type?" To answer that question, "_type" at the end of the function name and "type" that is an extra argument should say what that thing is used for. "enum object_type" is enough to say what the extra argument is---we should use the parameter name to convey what it is there for. > +void *object_as_type_hint(struct object *obj, enum object_type type, > + enum object_type hint) > +{ > + if (hint != OBJ_NONE && obj->type != OBJ_NONE && obj->type != type) { > + error(_("object %s is a %s, not a %s"), oid_to_hex(&obj->oid), > + type_name(type), type_name(obj->type)); > + obj->type = type; > + return NULL; > + } There must be something utterly wrong in the above implementation. As long as "hint" is ANY type other than OBJ_NONE, "check if type and obj->type match and otherwise complain" logic kicks in, and the actual value of "hint" does not contribute anything to the outcome. IOW, the "hint" parameter is misleading to be "enum object_type", as it is only used as a Boolean "do we bother to check?". > + return object_as_type(obj, type, 0);; Stray extra semicolon at the end. > +} ^ permalink raw reply [flat|nested] 22+ messages in thread
* [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y" 2022-12-30 1:52 ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 ` Ævar Arnfjörð Bjarmason 2 siblings, 0 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-12-30 1:52 UTC (permalink / raw) To: git Cc: Taylor Blau, Junio C Hamano, Jeff King, Jonathan Tan, Kousik Sanagavarapu, Ævar Arnfjörð Bjarmason As noted in the preceding commit we weren't handling cases where we see a reference to a bad "type" in a "tag", but then end up not fully parsing the object. In those cases let's only claim that we have a bad tag pointer, but emit "is a %s, not a %s". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- t/t6102-rev-list-unexpected-objects.sh | 4 ++-- tag.c | 5 +++-- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/t/t6102-rev-list-unexpected-objects.sh b/t/t6102-rev-list-unexpected-objects.sh index 590e2523d0c..98e36e73698 100755 --- a/t/t6102-rev-list-unexpected-objects.sh +++ b/t/t6102-rev-list-unexpected-objects.sh @@ -231,7 +231,7 @@ test_expect_success !SANITIZE_LEAK 'traverse unexpected non-tag tag (tree seen t ' -test_expect_failure !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' ' +test_expect_success !SANITIZE_LEAK 'traverse unexpected objects with for-each-ref' ' cat >expect <<-EOF && error: bad tag pointer to $tree in $tag_tag_tree fatal: parse_object_buffer failed on $tag_tag_tree for refs/tags/tag_tag_tree @@ -251,7 +251,7 @@ test_expect_success 'setup: unexpected objects with fsck' ' while read oid type do - test_expect_failure "fsck knows unexpected object $oid is $type" ' + test_expect_success "fsck knows unexpected object $oid is $type" ' git cat-file -t $oid >expect && echo $type >actual && test_cmp expect actual diff --git a/tag.c b/tag.c index 19453c2edbf..ad92cf89209 100644 --- a/tag.c +++ b/tag.c @@ -193,8 +193,9 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u } if (!item->tagged || strcmp(type_name(item->tagged->type), type)) { - error(_("object %s is a %s, not a %s"), oid_to_hex(&oid), - type_name(item->tagged->type), type); + if (item->tagged && item->tagged->parsed) + error(_("object %s is a %s, not a %s"), oid_to_hex(&oid), + type_name(item->tagged->type), type); return error("bad tag pointer to %s in %s", oid_to_hex(&oid), oid_to_hex(&item->object.oid)); -- 2.39.0.1153.g589e4efe9dc ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 0/2] fixing parse_object() check for type mismatch 2022-11-17 22:37 ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King ` (2 preceding siblings ...) 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason @ 2022-11-18 19:05 ` Taylor Blau 2022-11-21 19:26 ` Jeff King 3 siblings, 1 reply; 22+ messages in thread From: Taylor Blau @ 2022-11-18 19:05 UTC (permalink / raw) To: Jeff King; +Cc: Jonathan Tan, Kousik Sanagavarapu, git On Thu, Nov 17, 2022 at 05:37:29PM -0500, Jeff King wrote: > I'm adding Taylor to the cc as the author of t6102, when we were > tracking down all of these "oops, it's not really a blob" cases. This > fixes one of the lingering cases from that test script. > > [1/2]: parse_object(): drop extra "has" check before checking object type > [2/2]: parse_object(): check on-disk type of suspected blob > > object.c | 5 ++--- > t/t6102-rev-list-unexpected-objects.sh | 4 ++-- > 2 files changed, 4 insertions(+), 5 deletions(-) A blast from the past :-). I took a careful look at both of these patches and they looked good to me, so let's start merging them down. Thanks, Taylor ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH 0/2] fixing parse_object() check for type mismatch 2022-11-18 19:05 ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau @ 2022-11-21 19:26 ` Jeff King 2022-11-22 0:05 ` Ævar Arnfjörð Bjarmason 0 siblings, 1 reply; 22+ messages in thread From: Jeff King @ 2022-11-21 19:26 UTC (permalink / raw) To: Taylor Blau Cc: Ævar Arnfjörð Bjarmason, Junio C Hamano, Jonathan Tan, Kousik Sanagavarapu, git On Fri, Nov 18, 2022 at 02:05:04PM -0500, Taylor Blau wrote: > On Thu, Nov 17, 2022 at 05:37:29PM -0500, Jeff King wrote: > > I'm adding Taylor to the cc as the author of t6102, when we were > > tracking down all of these "oops, it's not really a blob" cases. This > > fixes one of the lingering cases from that test script. > > > > [1/2]: parse_object(): drop extra "has" check before checking object type > > [2/2]: parse_object(): check on-disk type of suspected blob > > > > object.c | 5 ++--- > > t/t6102-rev-list-unexpected-objects.sh | 4 ++-- > > 2 files changed, 4 insertions(+), 5 deletions(-) > > A blast from the past :-). > > I took a careful look at both of these patches and they looked good to > me, so let's start merging them down. I saw this hit 'next', but I think Ævar's simplification suggestion is worth taking. So here is a patch on top to do so (the original branch is jk/parse-object-type-mismatch for the benefit of any newly-returned maintainers). I was going to do a "helped-by", but since the only thing in the patch is the suggested change, I just handed over authorship. :) I didn't forge a signoff, and I think mine is sufficient under DCO's part (b), but Ævar please indicate if that's OK. -- >8 -- From: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Subject: [PATCH] parse_object(): simplify blob conditional Commit 8db2dad7a0 (parse_object(): check on-disk type of suspected blob, 2022-11-17) simplified the conditional for checking if we might have a blob. But we can simplify it further. In: !obj || (obj && obj->type == OBJ_BLOB) the short-circuit "OR" means "obj" will always be true on the right-hand side. The compiler almost certainly optimized that out anyway, but dropping it makes the conditional easier to understand for humans. Signed-off-by: Jeff King <peff@peff.net> --- object.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/object.c b/object.c index fad1a5af4a..682b852a46 100644 --- a/object.c +++ b/object.c @@ -286,7 +286,7 @@ struct object *parse_object_with_flags(struct repository *r, return &commit->object; } - if ((!obj || (obj && obj->type == OBJ_BLOB)) && + if ((!obj || obj->type == OBJ_BLOB) && oid_object_info(r, oid, NULL) == OBJ_BLOB) { if (!skip_hash && stream_object_signature(r, repl) < 0) { error(_("hash mismatch %s"), oid_to_hex(oid)); -- 2.38.1.950.g65ed8c1df8 ^ permalink raw reply related [flat|nested] 22+ messages in thread
* Re: [PATCH 0/2] fixing parse_object() check for type mismatch 2022-11-21 19:26 ` Jeff King @ 2022-11-22 0:05 ` Ævar Arnfjörð Bjarmason 0 siblings, 0 replies; 22+ messages in thread From: Ævar Arnfjörð Bjarmason @ 2022-11-22 0:05 UTC (permalink / raw) To: Jeff King Cc: Taylor Blau, Junio C Hamano, Jonathan Tan, Kousik Sanagavarapu, git On Mon, Nov 21 2022, Jeff King wrote: > On Fri, Nov 18, 2022 at 02:05:04PM -0500, Taylor Blau wrote: > >> On Thu, Nov 17, 2022 at 05:37:29PM -0500, Jeff King wrote: >> > I'm adding Taylor to the cc as the author of t6102, when we were >> > tracking down all of these "oops, it's not really a blob" cases. This >> > fixes one of the lingering cases from that test script. >> > >> > [1/2]: parse_object(): drop extra "has" check before checking object type >> > [2/2]: parse_object(): check on-disk type of suspected blob >> > >> > object.c | 5 ++--- >> > t/t6102-rev-list-unexpected-objects.sh | 4 ++-- >> > 2 files changed, 4 insertions(+), 5 deletions(-) >> >> A blast from the past :-). >> >> I took a careful look at both of these patches and they looked good to >> me, so let's start merging them down. > > I saw this hit 'next', but I think Ævar's simplification suggestion is > worth taking. So here is a patch on top to do so (the original branch is > jk/parse-object-type-mismatch for the benefit of any newly-returned > maintainers). > > I was going to do a "helped-by", but since the only thing in the patch > is the suggested change, I just handed over authorship. :) > > I didn't forge a signoff, and I think mine is sufficient under DCO's > part (b), but Ævar please indicate if that's OK. This looks good to me, thanks for following up. In case my SOB is needed feel free to add it, but it's fine without that too as far as I'm concerned. > -- >8 -- > From: Ævar Arnfjörð Bjarmason <avarab@gmail.com> > Subject: [PATCH] parse_object(): simplify blob conditional > > Commit 8db2dad7a0 (parse_object(): check on-disk type of suspected blob, > 2022-11-17) simplified the conditional for checking if we might have a > blob. But we can simplify it further. In: > > !obj || (obj && obj->type == OBJ_BLOB) > > the short-circuit "OR" means "obj" will always be true on the right-hand > side. The compiler almost certainly optimized that out anyway, but > dropping it makes the conditional easier to understand for humans. > > Signed-off-by: Jeff King <peff@peff.net> > --- > object.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/object.c b/object.c > index fad1a5af4a..682b852a46 100644 > --- a/object.c > +++ b/object.c > @@ -286,7 +286,7 @@ struct object *parse_object_with_flags(struct repository *r, > return &commit->object; > } > > - if ((!obj || (obj && obj->type == OBJ_BLOB)) && > + if ((!obj || obj->type == OBJ_BLOB) && > oid_object_info(r, oid, NULL) == OBJ_BLOB) { > if (!skip_hash && stream_object_signature(r, repl) < 0) { > error(_("hash mismatch %s"), oid_to_hex(oid)); ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2022-12-30 6:07 UTC | newest] Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2022-11-16 16:39 [RFC][PATCH] object.c: use has_object() instead of repo_has_object_file() Kousik Sanagavarapu 2022-11-16 18:20 ` Jeff King 2022-11-16 21:14 ` Jonathan Tan 2022-11-17 22:37 ` [PATCH 0/2] fixing parse_object() check for type mismatch Jeff King 2022-11-17 22:37 ` [PATCH 1/2] parse_object(): drop extra "has" check before checking object type Jeff King 2022-11-17 22:41 ` [PATCH 2/2] parse_object(): check on-disk type of suspected blob Jeff King 2022-11-18 0:36 ` Ævar Arnfjörð Bjarmason 2022-11-21 19:21 ` Jeff King 2022-11-18 11:46 ` [PATCH 0/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 1/4] object-file.c: free the "t.tag" in check_tag() Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 2/4] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 3/4] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-11-18 11:46 ` [PATCH 4/4] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 0/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-12-30 1:52 ` [PATCH v2 1/3] object tests: add test for unexpected objects in tags Ævar Arnfjörð Bjarmason 2022-12-30 4:25 ` Junio C Hamano 2022-12-30 1:52 ` [PATCH v2 2/3] tag: don't misreport type of tagged objects in errors Ævar Arnfjörð Bjarmason 2022-12-30 6:07 ` Junio C Hamano 2022-12-30 1:52 ` [PATCH v2 3/3] tag: don't emit potentially incorrect "object is a X, not a Y" Ævar Arnfjörð Bjarmason 2022-11-18 19:05 ` [PATCH 0/2] fixing parse_object() check for type mismatch Taylor Blau 2022-11-21 19:26 ` Jeff King 2022-11-22 0:05 ` Ævar Arnfjörð Bjarmason
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).