git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] describe: confirm that blobs actually exist
@ 2018-02-12 17:23 Jeff King
  2018-02-12 17:29 ` Jeff King
  2018-02-12 18:38 ` Stefan Beller
  0 siblings, 2 replies; 3+ messages in thread
From: Jeff King @ 2018-02-12 17:23 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Stefan Beller

Prior to 644eb60bd0 (builtin/describe.c: describe a blob,
2017-11-15), we noticed and complained about missing
objects, since they were not valid commits:

  $ git describe 0000000000000000000000000000000000000000
  fatal: 0000000000000000000000000000000000000000 is not a valid 'commit' object

After that commit, we feed any non-commit to lookup_blob(),
and complain only if it returns NULL. But the lookup_*
functions do not actually look at the on-disk object
database at all. They return an entry from the in-memory
object hash if present (and if it matches the requested
type), and otherwise auto-create a "struct object" of the
requested type.

A missing object would hit that latter case: we create a
bogus blob struct, walk all of history looking for it, and
then exit successfully having produced no output.

One reason nobody may have noticed this is that some related
cases do still work OK:

  1. If we ask for a tree by sha1, then the call to
     lookup_commit_referecne_gently() would have parsed it,
     and we would have its true type in the in-memory object
     hash.

  2. If we ask for a name that doesn't exist but isn't a
     40-hex sha1, then get_oid() would complain before we
     even look at the objects at all.

We can fix this by replacing the lookup_blob() call with a
check of the true type via sha1_object_info(). This is not
quite as efficient as we could possibly make this check. We
know in most cases that the object was already parsed in the
earlier commit lookup, so we could call lookup_object(),
which does auto-create, and check the resulting struct's
type (or NULL).  However it's not worth the fragility nor
code complexity to save a single object lookup.

The new tests cover this case, as well as that of a
tree-by-sha1 (which does work as described above, but was
not explicitly tested).

Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Jeff King <peff@peff.net>
---
 builtin/describe.c  | 2 +-
 t/t6120-describe.sh | 8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 6fe1c51281..18c68ec7a4 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -502,7 +502,7 @@ static void describe(const char *arg, int last_one)
 
 	if (cmit)
 		describe_commit(&oid, &sb);
-	else if (lookup_blob(&oid))
+	else if (sha1_object_info(oid.hash, NULL) == OBJ_BLOB)
 		describe_blob(oid, &sb);
 	else
 		die(_("%s is neither a commit nor blob"), arg);
diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index a5d9015024..bae78c4e89 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -378,4 +378,12 @@ check_describe tags/A --all A
 check_describe tags/c --all c
 check_describe heads/branch_A --all --match='branch_*' branch_A
 
+test_expect_success 'describe complains about tree object' '
+	test_must_fail git describe HEAD^{tree}
+'
+
+test_expect_success 'describe complains about missing object' '
+	test_must_fail git describe $_z40
+'
+
 test_done
-- 
2.16.1.464.gc4bae515b7

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] describe: confirm that blobs actually exist
  2018-02-12 17:23 [PATCH] describe: confirm that blobs actually exist Jeff King
@ 2018-02-12 17:29 ` Jeff King
  2018-02-12 18:38 ` Stefan Beller
  1 sibling, 0 replies; 3+ messages in thread
From: Jeff King @ 2018-02-12 17:29 UTC (permalink / raw)
  To: git; +Cc: Michael Haggerty, Stefan Beller

On Mon, Feb 12, 2018 at 12:23:06PM -0500, Jeff King wrote:

> We can fix this by replacing the lookup_blob() call with a
> check of the true type via sha1_object_info(). This is not
> quite as efficient as we could possibly make this check. We
> know in most cases that the object was already parsed in the
> earlier commit lookup, so we could call lookup_object(),
> which does auto-create, and check the resulting struct's
> type (or NULL).  However it's not worth the fragility nor
> code complexity to save a single object lookup.

By the way, I did notice one other inefficiency here: we always call
lookup_commit_reference_gently() first, which will call parse_object().
So if you were to "git describe" an enormous blob, we'd load the whole
thing into memory for no purpose. We could structure this as:

  type = sha1_object_info(oid.hash, NULL);
  if (type == OBJ_BLOB)
          describe_blob(&oid);
  else if (lookup_commit_reference_gently(&oid, 1))
          describe_commit(&oid);
  else
          describe("neither commit nor blob");

That incurs an extra object lookup for the commit case, but potentially
saves reading the blob. We could have our cake and eat it, too, if
sha1_file.c had a function like "parse this object unless it's a blob,
in which case just fill in the type info".

Arguably that should be the default when parse_object() is called on a
blob, but I suspect some older code may rely on parse_object() to check
that the object is present and consistent.

Anyway, I don't know that it's really worth caring about too much, but
just something I noticed.

Maybe a #leftoverbits if somebody cares.

-Peff

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] describe: confirm that blobs actually exist
  2018-02-12 17:23 [PATCH] describe: confirm that blobs actually exist Jeff King
  2018-02-12 17:29 ` Jeff King
@ 2018-02-12 18:38 ` Stefan Beller
  1 sibling, 0 replies; 3+ messages in thread
From: Stefan Beller @ 2018-02-12 18:38 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Michael Haggerty

On Mon, Feb 12, 2018 at 9:23 AM, Jeff King <peff@peff.net> wrote:
> Prior to 644eb60bd0 (builtin/describe.c: describe a blob,
> 2017-11-15), we noticed and complained about missing
> objects, since they were not valid commits:
>
>   $ git describe 0000000000000000000000000000000000000000
>   fatal: 0000000000000000000000000000000000000000 is not a valid 'commit' object
>
> After that commit, we feed any non-commit to lookup_blob(),
> and complain only if it returns NULL. But the lookup_*
> functions do not actually look at the on-disk object
> database at all. They return an entry from the in-memory
> object hash if present (and if it matches the requested
> type), and otherwise auto-create a "struct object" of the
> requested type.
>
> A missing object would hit that latter case: we create a
> bogus blob struct, walk all of history looking for it, and
> then exit successfully having produced no output.
>
> One reason nobody may have noticed this is that some related
> cases do still work OK:
>
>   1. If we ask for a tree by sha1, then the call to
>      lookup_commit_referecne_gently() would have parsed it,

lookup_commit_reference_gently

>      and we would have its true type in the in-memory object
>      hash.
>
>   2. If we ask for a name that doesn't exist but isn't a
>      40-hex sha1, then get_oid() would complain before we
>      even look at the objects at all.
>
> We can fix this by replacing the lookup_blob() call with a
> check of the true type via sha1_object_info(). This is not
> quite as efficient as we could possibly make this check. We
> know in most cases that the object was already parsed in the
> earlier commit lookup, so we could call lookup_object(),
> which does auto-create, and check the resulting struct's
> type (or NULL).  However it's not worth the fragility nor
> code complexity to save a single object lookup.
>
> The new tests cover this case, as well as that of a
> tree-by-sha1 (which does work as described above, but was
> not explicitly tested).
>
> Noticed-by: Michael Haggerty <mhagger@alum.mit.edu>
> Signed-off-by: Jeff King <peff@peff.net>

This makes sense.
Reviewed-by: Stefan Beller <sbeller@google.com>

Thanks!
Stefan

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2018-02-12 18:38 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-02-12 17:23 [PATCH] describe: confirm that blobs actually exist Jeff King
2018-02-12 17:29 ` Jeff King
2018-02-12 18:38 ` Stefan Beller

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).