git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Derrick Stolee <stolee@gmail.com>
To: Jeff King <peff@peff.net>, Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>, git@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits
Date: Fri, 6 Sep 2019 12:48:05 -0400	[thread overview]
Message-ID: <36bf0064-b563-74ed-4ae5-01745ced5d2e@gmail.com> (raw)
In-Reply-To: <20190905064723.GC21450@sigill.intra.peff.net>

On 9/5/2019 2:47 AM, Jeff King wrote:
> On Wed, Sep 04, 2019 at 05:18:47PM -0400, Taylor Blau wrote:
> 
>> All of this makes sense to me, so I'm wondering what part(s) of this you
>> feel are worth addressing in this first patch series. Presumably, there
>> is a longer series that we _could_ write which would introduce a new
>> 'corrupt' field and then check for it here.
>>
>> But, I'm hesitant to write those patches, since I only have this one
>> call-site in mind. If we introduce 'corrupt', I feel it would be best to
>> use it uniformly, instead of only checking it here, and relying on other
>> bespoke mechanisms to detect corruption elsewhere.
>>
>> So, I'm content to write the pseudo-code you provided above (which is to
>> say, call and check both 'parse_commit_no_graph', _and_
>> 'get_commit_tree_oid'), because I think that it's expedient, and fix the
>> issue which I'm pointing out here.
> 
> I'd actually be willing to just take the patch you have here, and
> consider the "parsed but we saw an error" thing as an oddity of the
> object code.  IOW, we shouldn't _have_ to be double-checking here.
> Looking for an error return from parse_commit() should really be all a
> caller needs to do. Once that's fixed, then your code would just be
> doing the right thing.
> 
> That said, there's another unhandled case, I think: lookup_tree() might
> return NULL (if somebody previously saw that oid as a non-tree), and
> parse_commit() wouldn't even notice and return an error!
> 
> IMHO that's also something that parse_commit() should be returning an
> error for. And it's probably a lot easier to trigger versus the "parsed
> earlier but corrupted" thing.
> 
> So it might be worth doing the NULL tree check here in the meantime. I
> dunno.
> 
> Below is a sketch of what I'm thinking parse_commit() should do:
> 
>   - remember when an earlier parse returned an error, so we can repeat
>     that error (this requires some unfortunate bit-field adjusting)
> 
>   - notice a lookup_tree failure
> 
>   - likewise, notice a lookup_parent failure
> 
> ---
> diff --git a/commit.c b/commit.c
> index a98de16e3d..7e415932b7 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -391,7 +391,9 @@ const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
>  	return ret;
>  }
>  
> -int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph)
> +static int parse_commit_buffer_1(struct repository *r, struct commit *item,
> +				 const void *buffer, unsigned long size,
> +				 int check_graph)
>  {
>  	const char *tail = buffer;
>  	const char *bufptr = buffer;
> @@ -401,9 +403,6 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
>  	const int tree_entry_len = the_hash_algo->hexsz + 5;
>  	const int parent_entry_len = the_hash_algo->hexsz + 7;
>  
> -	if (item->object.parsed)
> -		return 0;
> -	item->object.parsed = 1;
>  	tail += size;
>  	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
>  			bufptr[tree_entry_len] != '\n')
> @@ -412,6 +411,10 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
>  		return error("bad tree pointer in commit %s",
>  			     oid_to_hex(&item->object.oid));
>  	set_commit_tree(item, lookup_tree(r, &parent));
> +	if (!item->maybe_tree)
> +		return error("bad tree pointer %s in commit %s",
> +			     oid_to_hex(&parent),
> +			     oid_to_hex(&item->object.oid));
>  	bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
>  	pptr = &item->parents;
>  
> @@ -431,15 +434,19 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
>  		if (graft && (graft->nr_parent < 0 || grafts_replace_parents))
>  			continue;
>  		new_parent = lookup_commit(r, &parent);
> -		if (new_parent)
> -			pptr = &commit_list_insert(new_parent, pptr)->next;
> +		if (!new_parent)
> +			return error("bad parent %s in commit %s",
> +				     oid_to_hex(&parent),
> +				     oid_to_hex(&item->object.oid));
> +		pptr = &commit_list_insert(new_parent, pptr)->next;
>  	}
>  	if (graft) {
>  		int i;
>  		struct commit *new_parent;
>  		for (i = 0; i < graft->nr_parent; i++) {
>  			new_parent = lookup_commit(r,
>  						   &graft->parent[i]);
> +			/* Here we ignore bogus grafts. Also should be an error? */
>  			if (!new_parent)
>  				continue;
>  			pptr = &commit_list_insert(new_parent, pptr)->next;
> @@ -453,6 +460,23 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
>  	return 0;
>  }
>  
> +int parse_commit_buffer(struct repository *r, struct commit *item,
> +			const void *buffer, unsigned long size,
> +			int check_graph)
> +{
> +	int ret;
> +
> +	if (item->object.parsed)
> +		return item->object.corrupt ? -1 : 0;
> +	item->object.parsed = 1;
> +
> +	ret = parse_commit_buffer_1(r, item, buffer, size, check_graph);
> +	if (ret < 0)
> +		item->object.corrupt = 1;
> +
> +	return ret;
> +}
> +
>  int repo_parse_commit_internal(struct repository *r,
>  			       struct commit *item,
>  			       int quiet_on_missing,
> diff --git a/object.h b/object.h
> index 0120892bbd..b83d3964ad 100644
> --- a/object.h
> +++ b/object.h
> @@ -59,7 +59,7 @@ struct object_array {
>  
>  /*
>   * object flag allocation:
> - * revision.h:               0---------10                              25----28
> + * revision.h:               0---------10                              24----27
>   * fetch-pack.c:             01
>   * negotiator/default.c:       2--5
>   * walker.c:                 0-2
> @@ -78,13 +78,14 @@ struct object_array {
>   * builtin/show-branch.c:    0-------------------------------------------26
>   * builtin/unpack-objects.c:                                 2021
>   */
> -#define FLAG_BITS  29
> +#define FLAG_BITS  28
>  
>  /*
>   * The object type is stored in 3 bits.
>   */
>  struct object {
>  	unsigned parsed : 1;
> +	unsigned corrupt : 1;
>  	unsigned type : TYPE_BITS;
>  	unsigned flags : FLAG_BITS;
>  	struct object_id oid;
> diff --git a/revision.h b/revision.h
> index 4134dc6029..5c0b831b37 100644
> --- a/revision.h
> +++ b/revision.h
> @@ -33,7 +33,7 @@
>  #define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
>  
>  #define TOPO_WALK_EXPLORED	(1u<<27)
> -#define TOPO_WALK_INDEGREE	(1u<<28)
> +#define TOPO_WALK_INDEGREE	(1u<<24)

As an aside, these flag bit modifications look fine, but would need to
be explained. I'm guessing that since you are adding a bit of data
to struct object you want to avoid increasing the struct size across
a 32-bit boundary. Are we sure that bit 24 is not used anywhere else?
(My search for "1u<<24" found nothing, and "1 << 24" found a bit in
the cache-entry flags, so this seems safe.)

Thanks,
-Stolee

  reply	other threads:[~2019-09-06 16:48 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04  2:22 [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees Taylor Blau
2019-09-04  2:22 ` [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits Taylor Blau
2019-09-04  3:04   ` Jeff King
2019-09-04 21:18     ` Taylor Blau
2019-09-05  6:47       ` Jeff King
2019-09-06 16:48         ` Derrick Stolee [this message]
2019-09-06 17:04           ` Jeff King
2019-09-06 17:19             ` Derrick Stolee
2019-09-06 17:20             ` Derrick Stolee
2019-09-05 22:19     ` Junio C Hamano
2019-09-06  6:35       ` Jeff King
2019-09-06  6:56         ` Jeff King
2019-09-06 16:59         ` Junio C Hamano
2019-09-06 17:04           ` Jeff King
2019-09-09 16:39             ` Junio C Hamano
2019-09-09 16:54               ` Jeff King
2019-09-04 18:25 ` [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees Garima Singh
2019-09-04 21:21   ` Taylor Blau
2019-09-05  6:08     ` Jeff King
2019-09-06 16:48     ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36bf0064-b563-74ed-4ae5-01745ced5d2e@gmail.com \
    --to=stolee@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).