[PATCH 1/2] tag: factor out get_tagged_oid()

git@vger.kernel.org mailing list mirror (one of many)

* [PATCH 1/2] tag: factor out get_tagged_oid()
@ 2019-09-05 19:55 René Scharfe
  2019-09-05 19:59 ` [PATCH 2/2] use get_tagged_oid() René Scharfe
  2019-09-06  7:13 ` [PATCH 1/2] tag: factor out get_tagged_oid() Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: René Scharfe @ 2019-09-05 19:55 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Stefan Sperling

Add a function for accessing the ID of the object referenced by a tag
safely, i.e. without causing a segfault when encountering a broken tag
where ->tagged is NULL.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 pack-bitmap.c | 4 +---
 revision.c    | 4 +---
 tag.c         | 7 +++++++
 tag.h         | 1 +
 4 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/pack-bitmap.c b/pack-bitmap.c
index ed2befaac6..30842e1e74 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -709,9 +709,7 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs)
 			else
 				object_list_insert(object, &wants);

-			if (!tag->tagged)
-				die("bad tag");
-			object = parse_object_or_die(&tag->tagged->oid, NULL);
+			object = parse_object_or_die(get_tagged_oid(tag), NULL);
 		}

 		if (object->flags & UNINTERESTING)
diff --git a/revision.c b/revision.c
index 07412297f0..ee1b1552b9 100644
--- a/revision.c
+++ b/revision.c
@@ -404,9 +404,7 @@ static struct commit *handle_commit(struct rev_info *revs,
 		struct tag *tag = (struct tag *) object;
 		if (revs->tag_objects && !(flags & UNINTERESTING))
 			add_pending_object(revs, object, tag->tag);
-		if (!tag->tagged)
-			die("bad tag");
-		object = parse_object(revs->repo, &tag->tagged->oid);
+		object = parse_object(revs->repo, get_tagged_oid(tag));
 		if (!object) {
 			if (revs->ignore_missing_links || (flags & UNINTERESTING))
 				return NULL;
diff --git a/tag.c b/tag.c
index 5db870edb9..bfa0e31435 100644
--- a/tag.c
+++ b/tag.c
@@ -212,3 +212,10 @@ int parse_tag(struct tag *item)
 	free(data);
 	return ret;
 }
+
+struct object_id *get_tagged_oid(struct tag *tag)
+{
+	if (!tag->tagged)
+		die("bad tag");
+	return &tag->tagged->oid;
+}
diff --git a/tag.h b/tag.h
index 03265fbfe2..3ce8e72192 100644
--- a/tag.h
+++ b/tag.h
@@ -19,5 +19,6 @@ struct object *deref_tag(struct repository *r, struct object *, const char *, in
 struct object *deref_tag_noverify(struct object *);
 int gpg_verify_tag(const struct object_id *oid,
 		   const char *name_to_report, unsigned flags);
+struct object_id *get_tagged_oid(struct tag *tag);

 #endif /* TAG_H */
--
2.23.0
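
For readers who want to try the helper outside of Git, here is a minimal standalone sketch. The three structs and the die() below are simplified stand-ins (assumptions for illustration, not Git's real definitions), and the assert() on the tag pointer itself is an addition that the patch does not contain.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified stand-ins for Git's types; the real ones carry more fields. */
struct object_id { unsigned char hash[20]; };
struct object { struct object_id oid; };
struct tag { struct object *tagged; };

/* Stand-in for Git's die(): print a message and exit with status 128. */
static void die(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	exit(128);
}

/* Mirrors the helper added to tag.c, plus a precondition check. */
static struct object_id *get_tagged_oid(struct tag *tag)
{
	assert(tag != NULL);	/* not in the patch; sketch-only guard */
	if (!tag->tagged)
		die("bad tag");	/* broken tag: fail loudly instead of segfaulting */
	return &tag->tagged->oid;
}
```

A caller that used to write &tag->tagged->oid passes the tag to get_tagged_oid() instead, so the NULL check lives in one place.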


* [PATCH 2/2] use get_tagged_oid()
  2019-09-05 19:55 [PATCH 1/2] tag: factor out get_tagged_oid() René Scharfe
@ 2019-09-05 19:59 ` René Scharfe
  2019-09-06  7:13 ` [PATCH 1/2] tag: factor out get_tagged_oid() Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: René Scharfe @ 2019-09-05 19:59 UTC (permalink / raw)
  To: Git Mailing List; +Cc: Junio C Hamano, Stefan Sperling

Avoid dereferencing ->tagged without checking for NULL by using the
convenience wrapper for getting the ID of the tagged object.  It die()s
when encountering a broken tag instead of segfaulting.

Signed-off-by: René Scharfe <l.s.r@web.de>
---
 builtin/describe.c | 2 +-
 builtin/log.c      | 5 +++--
 builtin/replace.c  | 2 +-
 packfile.c         | 2 +-
 ref-filter.c       | 4 ++--
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 200154297d..e048f85484 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -313,7 +313,7 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst)
 		 */
 		append_name(n, dst);
 		if (longformat)
-			append_suffix(0, n->tag ? &n->tag->tagged->oid : oid, dst);
+			append_suffix(0, n->tag ? get_tagged_oid(n->tag) : oid, dst);
 		if (suffix)
 			strbuf_addstr(dst, suffix);
 		return;
diff --git a/builtin/log.c b/builtin/log.c
index 44b10b3415..c4b35fdaf9 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -627,6 +627,7 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			break;
 		case OBJ_TAG: {
 			struct tag *t = (struct tag *)o;
+			struct object_id *oid = get_tagged_oid(t);

 			if (rev.shown_one)
 				putchar('\n');
@@ -638,10 +639,10 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 			rev.shown_one = 1;
 			if (ret)
 				break;
-			o = parse_object(the_repository, &t->tagged->oid);
+			o = parse_object(the_repository, oid);
 			if (!o)
 				ret = error(_("could not read object %s"),
-					    oid_to_hex(&t->tagged->oid));
+					    oid_to_hex(oid));
 			objects[i].item = o;
 			i--;
 			break;
diff --git a/builtin/replace.c b/builtin/replace.c
index 644b21ca8d..2a4afb3b93 100644
--- a/builtin/replace.c
+++ b/builtin/replace.c
@@ -421,7 +421,7 @@ static int check_one_mergetag(struct commit *commit,
 		if (get_oid(mergetag_data->argv[i], &oid) < 0)
 			return error(_("not a valid object name: '%s'"),
 				     mergetag_data->argv[i]);
-		if (oideq(&tag->tagged->oid, &oid))
+		if (oideq(get_tagged_oid(tag), &oid))
 			return 0; /* found */
 	}

diff --git a/packfile.c b/packfile.c
index fc43a6c52c..a62ab4cb17 100644
--- a/packfile.c
+++ b/packfile.c
@@ -2139,7 +2139,7 @@ static int add_promisor_object(const struct object_id *oid,
 			oidset_insert(set, &parents->item->object.oid);
 	} else if (obj->type == OBJ_TAG) {
 		struct tag *tag = (struct tag *) obj;
-		oidset_insert(set, &tag->tagged->oid);
+		oidset_insert(set, get_tagged_oid(tag));
 	}
 	return 0;
 }
diff --git a/ref-filter.c b/ref-filter.c
index f27cfc8c3e..8dcc17c049 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1766,7 +1766,7 @@ static int populate_value(struct ref_array_item *ref, struct strbuf *err)
 	 * If it is a tag object, see if we use a value that derefs
 	 * the object, and if we do grab the object it refers to.
 	 */
-	oi_deref.oid = ((struct tag *)obj)->tagged->oid;
+	oi_deref.oid = *get_tagged_oid((struct tag *)obj);

 	/*
 	 * NEEDSWORK: This derefs tag only once, which
@@ -1997,7 +1997,7 @@ static const struct object_id *match_points_at(struct oid_array *points_at,
 	if (!obj)
 		die(_("malformed object at '%s'"), refname);
 	if (obj->type == OBJ_TAG)
-		tagged_oid = &((struct tag *)obj)->tagged->oid;
+		tagged_oid = get_tagged_oid((struct tag *)obj);
 	if (tagged_oid && oid_array_lookup(points_at, tagged_oid) >= 0)
 		return tagged_oid;
 	return NULL;
--
2.23.0



* Re: [PATCH 1/2] tag: factor out get_tagged_oid()
  2019-09-05 19:55 [PATCH 1/2] tag: factor out get_tagged_oid() René Scharfe
  2019-09-05 19:59 ` [PATCH 2/2] use get_tagged_oid() René Scharfe
@ 2019-09-06  7:13 ` Jeff King
  2019-09-06 15:05   ` René Scharfe
  1 sibling, 1 reply; 6+ messages in thread
From: Jeff King @ 2019-09-06  7:13 UTC (permalink / raw)
  To: René Scharfe; +Cc: Git Mailing List, Junio C Hamano, Stefan Sperling

On Thu, Sep 05, 2019 at 09:55:55PM +0200, René Scharfe wrote:

> Add a function for accessing the ID of the object referenced by a tag
> safely, i.e. without causing a segfault when encountering a broken tag
> where ->tagged is NULL.

This approach seems pretty reasonable. As somebody who's been
thinking about this, I'd be curious to hear your thoughts on:

  https://public-inbox.org/git/20190906065606.GC5122@sigill.intra.peff.net/

which _in theory_ means tag->tagged would never be NULL (we'd catch it
at the parsing stage and consider that an error). But we'd still
potentially want to protect ourselves as you do here for code paths
which don't necessarily check the parse result.

-Peff


* Re: [PATCH 1/2] tag: factor out get_tagged_oid()
  2019-09-06  7:13 ` [PATCH 1/2] tag: factor out get_tagged_oid() Jeff King
@ 2019-09-06 15:05   ` René Scharfe
  2019-09-06 15:25     ` René Scharfe
  2019-09-06 17:51     ` Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: René Scharfe @ 2019-09-06 15:05 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Stefan Sperling, Martin Koegler

Am 06.09.19 um 09:13 schrieb Jeff King:
> On Thu, Sep 05, 2019 at 09:55:55PM +0200, René Scharfe wrote:
>
>> Add a function for accessing the ID of the object referenced by a tag
>> safely, i.e. without causing a segfault when encountering a broken tag
>> where ->tagged is NULL.
>
> This approach seems pretty reasonable. As somebody who's been
> thinking about this, I'd be curious to hear your thoughts on:
>
>   https://public-inbox.org/git/20190906065606.GC5122@sigill.intra.peff.net/
>
> which _in theory_ means tag->tagged would never be NULL (we'd catch it
> at the parsing stage and consider that an error). But we'd still
> potentially want to protect ourselves as you do here for code paths
> which don't necessarily check the parse result.

A tag referencing an unknown object sounds strange to me.  I imagine we
might get such a thing when the referenced object is lost (broken repo)
or purpose-built by an attacker.  Could such a tag still be used for
anything?  Are there other possible causes?  I suspect the answer to
both questions is "no", and then it makes sense to reject it as early
as possible.

But I may be missing something.  In particular I'm confused by these
patches from February 2008, which seem to suggest that such tags should
not be reported in all cases, but sometimes just silently ignored:

   9684afd967 revision.c: handle tag->tagged == NULL
   cc36934791 process_tag: handle tag->tagged == NULL
   24e8a3c946 deref_tag: handle tag->tagged = NULL

So is there perhaps a use case for them after all?

Leaving that aside: The parsed flag means we saw and checked the object
already.  That is true also for broken objects.  Clearing the flag can
cause the same error to be reported multiple times.  How about setting
it at the start as before, but returning -1 from parse_tag_buffer() if
.parsed == 1 && .tagged == NULL?
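
A rough model of that suggestion, as a sketch only: the struct layout, the name parse_tag_model() and the parse_count counter are hypothetical and do not match Git's real parse_tag_buffer() signature. Passing a NULL target stands in for a tag whose referenced object could not be resolved.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; in Git the parsed flag lives in struct object. */
struct object { int dummy; };

struct tag {
	unsigned parsed : 1;
	struct object *tagged;
};

/* Counts how often the (simulated) parsing work actually runs. */
static int parse_count;

/*
 * Hypothetical model: `target` is what parsing the buffer would yield
 * for the tagged object, NULL for a broken tag.
 */
static int parse_tag_model(struct tag *item, struct object *target)
{
	assert(item != NULL);
	if (item->parsed)
		/* Seen before: repeat the verdict without parsing again. */
		return item->tagged ? 0 : -1;
	item->parsed = 1;	/* set at the start, as before */
	parse_count++;
	item->tagged = target;
	return item->tagged ? 0 : -1;
}
```

With this shape a broken tag is parsed (and would be reported) once, while every later call still returns -1.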

René


* Re: [PATCH 1/2] tag: factor out get_tagged_oid()
  2019-09-06 15:05   ` René Scharfe
@ 2019-09-06 15:25     ` René Scharfe
  2019-09-06 17:51     ` Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: René Scharfe @ 2019-09-06 15:25 UTC (permalink / raw)
  To: Jeff King
  Cc: Git Mailing List, Junio C Hamano, Stefan Sperling, Martin Koegler

Am 06.09.19 um 17:05 schrieb René Scharfe:
> A tag referencing an unknown object sounds strange to me.  I imagine we
> might get such a thing when the referenced object is lost (broken repo)
> or purpose-built by an attacker.  Could such a tag still be used for
> anything?  Are there other possible causes?
Forward compatibility perhaps, i.e. supporting tags (by ignoring them)
that point to a new type of object introduced by a future version.

René


* Re: [PATCH 1/2] tag: factor out get_tagged_oid()
  2019-09-06 15:05   ` René Scharfe
  2019-09-06 15:25     ` René Scharfe
@ 2019-09-06 17:51     ` Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff King @ 2019-09-06 17:51 UTC (permalink / raw)
  To: René Scharfe
  Cc: Git Mailing List, Junio C Hamano, Stefan Sperling, Martin Koegler

On Fri, Sep 06, 2019 at 05:05:11PM +0200, René Scharfe wrote:

> > This approach seems pretty reasonable. As somebody who's been
> > thinking about this, I'd be curious to hear your thoughts on:
> >
> >   https://public-inbox.org/git/20190906065606.GC5122@sigill.intra.peff.net/
> >
> > which _in theory_ means tag->tagged would never be NULL (we'd catch it
> > at the parsing stage and consider that an error). But we'd still
> > potentially want to protect ourselves as you do here for code paths
> > which don't necessarily check the parse result.
> 
> A tag referencing an unknown object sounds strange to me.  I imagine we
> might get such a thing when the referenced object is lost (broken repo)
> or purpose-built by an attacker.  Could such a tag still be used for
> anything?  Are there other possible causes?  I suspect the answer to
> both questions is "no", and then it makes sense to reject it as early
> as possible.

I don't think there's really a valid case. Keep in mind that this isn't
checking for the object on-disk at all. It's only checking:

  - were we able to parse the tag at all (i.e., is it syntactically
    valid)

  - does it have a type we know about

  - were we able to create an in-memory struct, meaning we did not see
    the same oid with a different type elsewhere in the same process

The first and last imply corruption or a malicious attack. The second
could mean forward-compatibility is a concern, but I think that is just
the tip of the iceberg. You really need a Git that understands the object
types you're working with, or basically everything is going to fail.
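
The three checks above can be modelled roughly as follows. This is a sketch under stated assumptions, not Git's parser: set_tagged(), known_type() and the `seen` parameter are hypothetical names, and whether each failure should return an error at all is the very question under discussion; the sketch simply returns -1 in every case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum obj_type { OBJ_NONE, OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };

/* Simplified stand-ins for Git's structs. */
struct object { enum obj_type type; };
struct tag { struct object *tagged; };

/* Map the value of the "type" header to a known type, OBJ_NONE if unknown. */
static enum obj_type known_type(const char *name)
{
	if (!strcmp(name, "commit"))
		return OBJ_COMMIT;
	if (!strcmp(name, "tree"))
		return OBJ_TREE;
	if (!strcmp(name, "blob"))
		return OBJ_BLOB;
	if (!strcmp(name, "tag"))
		return OBJ_TAG;
	return OBJ_NONE;
}

/*
 * Hypothetical model: `type_name` is NULL when the tag could not be
 * parsed; `seen` is the in-memory object registered for the tagged oid.
 */
static int set_tagged(struct tag *tag, const char *type_name,
		      struct object *seen)
{
	enum obj_type type;

	assert(tag != NULL && seen != NULL);
	tag->tagged = NULL;
	if (!type_name)
		return -1;	/* check 1: not syntactically valid */
	type = known_type(type_name);
	if (type == OBJ_NONE)
		return -1;	/* check 2: a type we do not know about */
	if (seen->type != OBJ_NONE && seen->type != type)
		return -1;	/* check 3: same oid seen with another type */
	seen->type = type;
	tag->tagged = seen;
	return 0;
}
```

None of the three paths looks at the object database, which matches the point that this is not an on-disk existence check.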

So I'm not too concerned with tightening this code. The question to me
is whether to do it in parse_tag(), or when accessing tag->tagged, or
both.

> But I may be missing something.  In particular I'm confused by these
> patches from February 2008, which seem to suggest that such tags should
> not be reported in all cases, but sometimes just silently ignored:
> 
>    9684afd967 revision.c: handle tag->tagged == NULL
>    cc36934791 process_tag: handle tag->tagged == NULL
>    24e8a3c946 deref_tag: handle tag->tagged = NULL
> 
> So is there perhaps a use case for them after all?

There's not much discussion on the list in that thread, but searching
for messages from Martin from around that time suggests the main goal was
protecting against corruption. So I think these NULL checks were
attempting to do the same thing we're doing here, just in a less
centralized way.

> Leaving that aside: The parsed flag means we saw and checked the object
> already.  That is true also for broken objects.  Clearing the flag can
> cause the same error to be reported multiple times.  How about setting
> it at the start as before, but returning -1 from parse_tag_buffer() if
> .parsed == 1 && .tagged == NULL?

That wouldn't cover all of the other reasons that we might have seen a
parse failure (so it fixes "tagged == NULL", but means that a subsequent
parse_tag() will silently return success).

I think the multiple error messages may be OK given that this is
explicitly covering a corrupt situation. But the more robust way is
adding an extra bit to say "I parsed this already, and it was corrupt"
to cause parse_tag() to return an error reliably.
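
To make the "extra bit" idea concrete, here is a minimal sketch. The `corrupt` field, the function name and the `rest_ok` parameter are all hypothetical; `rest_ok` stands in for any parse failure that is unrelated to the tagged pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; in Git the parsed flag lives in struct object. */
struct object { int dummy; };

struct tag {
	unsigned parsed : 1;
	unsigned corrupt : 1;	/* hypothetical: "parsed already, was bad" */
	struct object *tagged;
};

/*
 * Hypothetical model: `target` is the resolved tagged object (NULL if
 * that failed), `rest_ok` is 0 if some other part of the tag was bad.
 */
static int parse_tag_with_corrupt_bit(struct tag *item,
				      struct object *target, int rest_ok)
{
	assert(item != NULL);
	if (item->parsed)
		/* The remembered verdict covers every kind of failure. */
		return item->corrupt ? -1 : 0;
	item->parsed = 1;
	item->tagged = target;
	if (!target || !rest_ok) {
		item->corrupt = 1;
		return -1;
	}
	return 0;
}
```

Unlike a check on .tagged alone, a second call still fails when the first failure happened after .tagged had been set.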

-Peff

