git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: "Stefan Sperling" <stsp@stsp.name>, "René Scharfe" <l.s.r@web.de>,
	git@vger.kernel.org
Subject: Re: [PATCH] fix segv with corrupt tag object
Date: Mon, 26 Aug 2019 14:18:24 -0400	[thread overview]
Message-ID: <20190826181824.GA22960@sigill.intra.peff.net> (raw)
In-Reply-To: <xmqqo90bhmi3.fsf@gitster-ct.c.googlers.com>

On Mon, Aug 26, 2019 at 10:20:20AM -0700, Junio C Hamano wrote:

> Stefan Sperling <stsp@stsp.name> writes:
> 
> > The root cause of this bug seems to be that the valid assumption
> > that obj->parsed implies a successfully parsed object is broken by
> > parse_tag_buffer() because this function sets the 'parsed' flag even
> > if errors occur during parsing.
> 
> I am mildly negative about that approach.  obj->parsed is about
> "we've done all we need to do to attempt parsing this object" (so
> that next person who gets hold of the object knows that fact---one
> of the reasons why may be that the caller who wants to ensure that
> the fields are ready to be accessed does not have to spend extra
> cycles, but that is not the only one).  Those that want to look at
> various fields in the object (e.g. the tagged object of a tag, the
> tagger identity of a tag, etc.) should be prepared to see and react
> to NULL in there so that they can gracefully handle "slightly"
> corrupt objects.

It seems like the right place to notice "we did not parse correctly" is
an error return from parse_tag_buffer(). We're not calling it ourselves
in this instance, but it looks like it does get propagated from
parse_object(), which would yield NULL.

I wonder if some earlier caller in checkout/archive is ignoring a parse
failure, and continuing to work with the object anyway.

Avoiding setting the parse flag is a cheap way to make sure that the
later calls re-attempt the parse and notice the error themselves. That
wastes some work in the case of a bogus tag, but callers who want to
view the corrupted state aren't really any worse off.

That said, the error condition touched by Stefan's updated patch is not
sufficient to guarantee that tag->tagged is non-NULL (whether we detect
the error case by return code or by lack of "parsed" flag). The code
does this:

          if (!strcmp(type, blob_type)) {
                  item->tagged = (struct object *)lookup_blob(r, &oid);
          } else if (!strcmp(type, tree_type)) {
                  item->tagged = (struct object *)lookup_tree(r, &oid);
          } else if (!strcmp(type, commit_type)) {
                  item->tagged = (struct object *)lookup_commit(r, &oid);
          } else if (!strcmp(type, tag_type)) {
                  item->tagged = (struct object *)lookup_tag(r, &oid);
          } else {
                  error("Unknown type %s", type);
                  item->tagged = NULL;
          }

Any of those lookup_* functions may also return. It's relatively rare,
since we don't actually confirm the type against the object database at
that time. But it can happen if the same program already saw that
particular oid as another type. This is tricky to trigger for
checkout/archive because they generally parse the tag first (but not
impossible; e.g., some config like mailmap.blob may read objects early).
But anything using the revision parser is happy to read multiple
objects.

If we want to cover all cases, probably something like:

  if (!item->tagged)
	ret = -1;

would be simplest.

-Peff

  parent reply	other threads:[~2019-08-26 18:18 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-08-24 23:09 [PATCH] fix segv with corrupt tag object Stefan Sperling
2019-08-25  7:52 ` René Scharfe
2019-08-26 11:57   ` Stefan Sperling
2019-08-26 17:20     ` Junio C Hamano
2019-08-26 18:02       ` Stefan Sperling
2019-08-26 18:18       ` Jeff King [this message]
2019-08-29 19:06       ` René Scharfe
2019-08-30 16:29         ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190826181824.GA22960@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=l.s.r@web.de \
    --cc=stsp@stsp.name \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).