git@vger.kernel.org mailing list mirror (one of many)
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Taylor Blau <me@ttaylorr.com>,
	git@vger.kernel.org, gitster@pobox.com, stolee@gmail.com
Subject: Re: [PATCH 3/3] commit-graph.c: handle corrupt/missing trees
Date: Fri, 6 Sep 2019 11:42:14 -0400
Message-ID: <20190906154214.GA3657@syl.local>
In-Reply-To: <20190906061919.GA5122@sigill.intra.peff.net>

On Fri, Sep 06, 2019 at 02:19:20AM -0400, Jeff King wrote:
> On Thu, Sep 05, 2019 at 06:04:57PM -0400, Taylor Blau wrote:
>
> > @@ -846,7 +847,11 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> >  		if (parse_commit_no_graph(*list))
> >  			die(_("unable to parse commit %s"),
> >  				oid_to_hex(&(*list)->object.oid));
> > -		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
> > +		tree = get_commit_tree_oid(*list);
> > +		if (!tree)
> > +			die(_("unable to get tree for %s"),
> > +				oid_to_hex(&(*list)->object.oid));
> > +		hashwrite(f, tree->hash, hash_len);
>
> Yeah, I think this is a good stop-gap to protect ourselves, until a time
> when parse_commit() and friends consistently warn us about the breakage.
>
> > diff --git a/commit.c b/commit.c
> > index a98de16e3d..fab22cb740 100644
> > --- a/commit.c
> > +++ b/commit.c
> > @@ -358,7 +358,8 @@ struct tree *repo_get_commit_tree(struct repository *r,
> >
> >  struct object_id *get_commit_tree_oid(const struct commit *commit)
> >  {
> > -	return &get_commit_tree(commit)->object.oid;
> > +	struct tree *tree = get_commit_tree(commit);
> > +	return tree ? &tree->object.oid : NULL;
> >  }

You mentioned in the version of this series that is rebased on GitHub's
fork that it may be worth putting this hunk in a separate commit
entirely. I don't disagree, so if there are other comments that merit a
reroll of this, I'm happy to pull this change out as 3/4.

> This one in theory benefits lots of other callsites, too, since it means
> we'll actually return NULL instead of nonsense like "8". But grepping
> around for calls to this function, I found literally zero of them
> actually bother checking for a NULL result. So there are probably dozens
> of similar segfaults waiting to happen in other code paths.
> Discouraging.

Discouraging indeed. I think that you suggest it below, but perhaps the
right thing to do here is implement 'get_commit_tree_oid()' as follows:

  struct object_id *get_commit_tree_oid(const struct commit *commit)
  {
    struct tree *tree = get_commit_tree(commit);
    if (!tree)
      die(_("unable to get tree from commit %s"),
          oid_to_hex(&commit->object.oid));
    return &tree->object.oid;
  }

Which then puts the onus on the *caller* to check their commit pointer
to make sure that it has a legit tree in it, unless they're OK with
dying.
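
To make that concrete, here is a minimal sketch of the dying accessor
that compiles on its own. The struct layouts, the use of 'maybe_tree',
and the die()/oid_to_hex() bodies are simplified stand-ins that I am
assuming for illustration, not git's real definitions, and the _()
translation wrapper is dropped:

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Stand-in types: simplified for illustration, NOT git's real layouts. */
struct object_id { unsigned char hash[20]; };
struct object { struct object_id oid; };
struct tree { struct object object; };
struct commit { struct object object; struct tree *maybe_tree; };

/* Stand-in for die(): print a "fatal:" message and exit with status 128. */
static void die(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fputs("fatal: ", stderr);
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	exit(128);
}

/* Stand-in for oid_to_hex(): format the hash into a static buffer. */
static const char *oid_to_hex(const struct object_id *oid)
{
	static char buf[2 * 20 + 1];
	int i;

	for (i = 0; i < 20; i++)
		snprintf(buf + 2 * i, 3, "%02x", oid->hash[i]);
	return buf;
}

/* Stand-in for get_commit_tree(): NULL models a corrupt or missing tree. */
static struct tree *get_commit_tree(const struct commit *commit)
{
	return commit->maybe_tree;
}

/* The accessor as proposed above: never returns NULL, dies instead. */
static struct object_id *get_commit_tree_oid(const struct commit *commit)
{
	struct tree *tree = get_commit_tree(commit);

	if (!tree)
		die("unable to get tree from commit %s",
		    oid_to_hex(&commit->object.oid));
	return &tree->object.oid;
}
```

With this shape, a caller that cannot tolerate dying has to look at
get_commit_tree() itself before asking for the OID.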

In the commit-graph change that this whole thread is in response to,
that's exactly what we want: I don't want to have to check the return
value of two function calls myself. I'm perfectly happy to die() in the
middle of things if there is an object corruption, but the library code
should take care of that for me, and not allow for dozens of checks,
each with their own unique 'die()'-ing message.
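
As a rough sketch of what that buys at a call site (again with
simplified stand-in types; 'write_tree_hashes()' and 'HASH_LEN' are
hypothetical names that only loosely model the hashwrite() loop in
write_graph_chunk_data()):

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HASH_LEN 20 /* hypothetical fixed hash length for this sketch */

/* Stand-in types: simplified for illustration, NOT git's real layouts. */
struct object_id { unsigned char hash[HASH_LEN]; };
struct object { struct object_id oid; };
struct tree { struct object object; };
struct commit { struct object object; struct tree *maybe_tree; };

/* Stand-in for the dying accessor: exits instead of returning NULL. */
static struct object_id *get_commit_tree_oid(const struct commit *commit)
{
	if (!commit->maybe_tree) {
		fprintf(stderr, "fatal: unable to get tree from commit\n");
		exit(128);
	}
	return &commit->maybe_tree->object.oid;
}

/*
 * Hypothetical model of the write loop: append each commit's tree hash
 * to 'out' and return the number of bytes written.
 */
static size_t write_tree_hashes(unsigned char *out,
				struct commit **list, size_t nr)
{
	size_t i, off = 0;

	for (i = 0; i < nr; i++) {
		/*
		 * No NULL check and no per-caller die() message here:
		 * a corrupt tree ends the process inside the accessor.
		 */
		memcpy(out + off, get_commit_tree_oid(list[i])->hash,
		       HASH_LEN);
		off += HASH_LEN;
	}
	return off;
}
```

The loop body stays a single line, and the error message lives in one
place instead of being repeated at every caller.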

All of that said, I'm not convinced that it's worth holding this series
up on the above. I do think that it (or something like it) is generally
worth doing, but I'm not sure that now is the time to do it.

> This is sort-of attributable to my 834876630b (get_commit_tree(): return
> NULL for broken tree, 2019-04-09). Before then it was a BUG(). However,
> that state was relatively short-lived. Before 7b8a21dba1 (commit-graph:
> lazy-load trees for commits, 2018-04-06), we'd have similarly returned
> NULL (and anyway, BUG() is clearly wrong since it's a data error).

Ha, I was wondering why that commit message looked familiar... it turns
out that I'm the culprit, too, via the 'Co-authored-by' trailer. Oops.

> None of which argues against your patches, but it's kind of sad that the
> issue is present in so many code paths. I wonder if we could be handling
> this in a more central way, but I don't see how short of dying.
>
> -Peff

Thanks,
Taylor
