git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Taylor Blau <me@ttaylorr.com>
To: Jeff King <peff@peff.net>
Cc: Taylor Blau <me@ttaylorr.com>, Junio C Hamano <gitster@pobox.com>,
	stolee@gmail.com, git@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits
Date: Wed, 4 Sep 2019 17:18:47 -0400	[thread overview]
Message-ID: <20190904211847.GA20904@syl.local> (raw)
In-Reply-To: <20190904030456.GA28836@sigill.intra.peff.net>

Hi Peff,

On Tue, Sep 03, 2019 at 11:04:56PM -0400, Jeff King wrote:
> On Tue, Sep 03, 2019 at 10:22:57PM -0400, Taylor Blau wrote:
>
> > When we write a commit graph chunk, we process a given list of 'struct
> > commit *'s and parse out the parent(s) and tree OID in order to write
> > out its information.
> >
> > We do this by calling 'parse_commit_no_graph', and then checking the
> > result of 'get_commit_tree_oid' to write the tree OID. This process
> > assumes that 'parse_commit_no_graph' parses the commit successfully.
> > When this isn't the case, 'get_commit_tree_oid(*list)' may return NULL,
> > in which case trying to '->hash' it causes a SIGSEGV.
> >
> > Instead, teach 'write_graph_chunk_data' to stop when a commit isn't able
> > to be parsed, at the peril of failing to write a commit-graph.
>
> Yeah, I think it makes sense for commit-graph to bail completely if it
> can't parse here. In theory it could omit the entry from the
> commit-graph file (and a reader would just fall back to trying to access
> the object contents itself), but I think we're too late in the process
> for that. And besides, this should generally only happen in a corrupt
> repository.

Yep. I sent this as an RFC PATCH because I wasn't quite sure how folks
would feel about 'die()'-ing in the middle of 'write_graph_chunk_data',
but I think that your reasoning makes sense, and it matches my own
preferences.

So, I am glad that we resolved that portion. I'll keep going below...

> However...
>
> > diff --git a/commit-graph.c b/commit-graph.c
> > index f2888c203b..6aa6998ecd 100644
> > --- a/commit-graph.c
> > +++ b/commit-graph.c
> > @@ -843,7 +843,9 @@ static void write_graph_chunk_data(struct hashfile *f, int hash_len,
> >  		uint32_t packedDate[2];
> >  		display_progress(ctx->progress, ++ctx->progress_cnt);
> >
> > -		parse_commit_no_graph(*list);
> > +		if (parse_commit_no_graph(*list))
> > +			die(_("unable to parse commit %s"),
> > +				oid_to_hex(&(*list)->object.oid));
> >  		hashwrite(f, get_commit_tree_oid(*list)->hash, hash_len);
>
> I don't think parse_commit_no_graph() returning 0 assures us that
> get_commit_tree() and friends will return non-NULL.
>
> This is similar to the case discussed recently where a failed parse of a
> tag may leave "tag->tagged == NULL" even though "tag->obj.parsed" is
> set.
>
> Here an earlier parsing error could cause (*list)->object.parsed to be
> true, but with (*list)->maybe_tree still NULL. Our call to
> parse_commit_no_graph() here would silently return "yep, already tried
> to parse this", and then we'd still segfault.
>
> We _could_ check:
>
>   if (parse_commit_no_graph(*list))
> 	die("unable to parse...");
>   tree = get_commit_tree_oid(*list);
>   if (!tree)
> 	die("unable to get tree for %s...");
>
> but trees are just one piece of data. In fact, the situation is much
> worse for parents: there a NULL parent pointer is valid data, so worse
> than segfaulting, we'd write invalid data to the commit graph, skipping
> one or more parents!
>
> And I think there's literally no way for this function to tell the
> difference between "no parent" and "there was an earlier error, but we
> set the parsed flag anyway and the parent flag is invalid".
>
> I think that argues against Junio's response in:
>
>   https://public-inbox.org/git/xmqqo90bhmi3.fsf@gitster-ct.c.googlers.com/
>
> about how we can use the parsed flag to look at "slightly corrupt"
> objects. I think we'd need at least an extra flag for "corrupt", though
> I think it is simpler just _not_ setting "parsed" and letting the next
> caller spend the extra cycles to rediscover the problem if they're
> interested.

All of this makes sense to me, so I'm wondering what part(s) of this you
feel are worth addressing in this first patch series. Presumably, there
is a longer series that we _could_ write which would introduce a new
'corrupt' field and then check for it here.

But, I'm hesitant to write those patches, since I only have this one
call-site in mind. If we introduce 'corrupt', I feel it would be best to
use it uniformly, instead of only checking it here, and relying on other
bespoke mechanisms to detect corruption elsewhere.

So, I'm content to write the pseudo-code you provided above (which is to
say, call and check both 'parse_commit_no_graph', _and_
'get_commit_tree_oid'), because I think that it's expedient, and fix the
issue which I'm pointing out here.

I don't know how to address the parents situation without going further,
so perhaps it's worth it to think of an alternative, or even leave it
out of these first patch(es).

Let me know what you think, and thanks for your thoughts so far.

> (All of this presupposes that you can convince another piece of code in
> the same process to parse the commit buffer and ignore the error. I have
> no idea if that's possible or not in this case, but it sure would be
> nice not to have to care).
>
> -Peff
Thanks,
Taylor

  reply	other threads:[~2019-09-04 21:18 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04  2:22 [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees Taylor Blau
2019-09-04  2:22 ` [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits Taylor Blau
2019-09-04  3:04   ` Jeff King
2019-09-04 21:18     ` Taylor Blau [this message]
2019-09-05  6:47       ` Jeff King
2019-09-06 16:48         ` Derrick Stolee
2019-09-06 17:04           ` Jeff King
2019-09-06 17:19             ` Derrick Stolee
2019-09-06 17:20             ` Derrick Stolee
2019-09-05 22:19     ` Junio C Hamano
2019-09-06  6:35       ` Jeff King
2019-09-06  6:56         ` Jeff King
2019-09-06 16:59         ` Junio C Hamano
2019-09-06 17:04           ` Jeff King
2019-09-09 16:39             ` Junio C Hamano
2019-09-09 16:54               ` Jeff King
2019-09-04 18:25 ` [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees Garima Singh
2019-09-04 21:21   ` Taylor Blau
2019-09-05  6:08     ` Jeff King
2019-09-06 16:48     ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190904211847.GA20904@syl.local \
    --to=me@ttaylorr.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=peff@peff.net \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).