git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: Taylor Blau <me@ttaylorr.com>
Cc: Junio C Hamano <gitster@pobox.com>,
	stolee@gmail.com, git@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits
Date: Thu, 5 Sep 2019 02:47:24 -0400	[thread overview]
Message-ID: <20190905064723.GC21450@sigill.intra.peff.net> (raw)
In-Reply-To: <20190904211847.GA20904@syl.local>

On Wed, Sep 04, 2019 at 05:18:47PM -0400, Taylor Blau wrote:

> All of this makes sense to me, so I'm wondering what part(s) of this you
> feel are worth addressing in this first patch series. Presumably, there
> is a longer series that we _could_ write which would introduce a new
> 'corrupt' field and then check for it here.
> 
> But, I'm hesitant to write those patches, since I only have this one
> call-site in mind. If we introduce 'corrupt', I feel it would be best to
> use it uniformly, instead of only checking it here, and relying on other
> bespoke mechanisms to detect corruption elsewhere.
> 
> So, I'm content to write the pseudo-code you provided above (which is to
> say, call and check both 'parse_commit_no_graph', _and_
> 'get_commit_tree_oid'), because I think that it's expedient, and fix the
> issue which I'm pointing out here.

I'd actually be willing to just take the patch you have here, and
consider the "parsed but we saw an error" thing as an oddity of the
object code.  IOW, we shouldn't _have_ to be double-checking here.
Looking for an error return from parse_commit() should really be all a
caller needs to do. Once that's fixed, then your code would just be
doing the right thing.

That said, there's another unhandled case, I think: lookup_tree() might
return NULL (if somebody previously saw that oid as a non-tree), and
parse_commit() wouldn't even notice and return an error!

IMHO that's also something that parse_commit() should be returning an
error for. And it's probably a lot easier to trigger versus the "parsed
earlier but corrupted" thing.

So it might be worth doing the NULL tree check here in the meantime. I
dunno.

Below is a sketch of what I'm thinking parse_commit() should do:

  - remember when an earlier parse returned an error, so we can repeat
    that error (this requires some unfortunate bit-field adjusting)

  - notice a lookup_tree failure

  - likewise, notice a lookup_parent failure

---
diff --git a/commit.c b/commit.c
index a98de16e3d..7e415932b7 100644
--- a/commit.c
+++ b/commit.c
@@ -391,7 +391,9 @@ const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
 	return ret;
 }
 
-int parse_commit_buffer(struct repository *r, struct commit *item, const void *buffer, unsigned long size, int check_graph)
+static int parse_commit_buffer_1(struct repository *r, struct commit *item,
+				 const void *buffer, unsigned long size,
+				 int check_graph)
 {
 	const char *tail = buffer;
 	const char *bufptr = buffer;
@@ -401,9 +403,6 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	const int tree_entry_len = the_hash_algo->hexsz + 5;
 	const int parent_entry_len = the_hash_algo->hexsz + 7;
 
-	if (item->object.parsed)
-		return 0;
-	item->object.parsed = 1;
 	tail += size;
 	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
 			bufptr[tree_entry_len] != '\n')
@@ -412,6 +411,10 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 		return error("bad tree pointer in commit %s",
 			     oid_to_hex(&item->object.oid));
 	set_commit_tree(item, lookup_tree(r, &parent));
+	if (!item->maybe_tree)
+		return error("bad tree pointer %s in commit %s",
+			     oid_to_hex(&parent),
+			     oid_to_hex(&item->object.oid));
 	bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
 	pptr = &item->parents;
 
@@ -431,15 +434,19 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 		if (graft && (graft->nr_parent < 0 || grafts_replace_parents))
 			continue;
 		new_parent = lookup_commit(r, &parent);
-		if (new_parent)
-			pptr = &commit_list_insert(new_parent, pptr)->next;
+		if (!new_parent)
+			return error("bad parent %s in commit %s",
+				     oid_to_hex(&parent),
+				     oid_to_hex(&item->object.oid));
+		pptr = &commit_list_insert(new_parent, pptr)->next;
 	}
 	if (graft) {
 		int i;
 		struct commit *new_parent;
 		for (i = 0; i < graft->nr_parent; i++) {
 			new_parent = lookup_commit(r,
 						   &graft->parent[i]);
+			/* Here we ignore bogus grafts. Also should be an error? */
 			if (!new_parent)
 				continue;
 			pptr = &commit_list_insert(new_parent, pptr)->next;
@@ -453,6 +460,23 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	return 0;
 }
 
+int parse_commit_buffer(struct repository *r, struct commit *item,
+			const void *buffer, unsigned long size,
+			int check_graph)
+{
+	int ret;
+
+	if (item->object.parsed)
+		return item->object.corrupt ? -1 : 0;
+	item->object.parsed = 1;
+
+	ret = parse_commit_buffer_1(r, item, buffer, size, check_graph);
+	if (ret < 0)
+		item->object.corrupt = 1;
+
+	return ret;
+}
+
 int repo_parse_commit_internal(struct repository *r,
 			       struct commit *item,
 			       int quiet_on_missing,
diff --git a/object.h b/object.h
index 0120892bbd..b83d3964ad 100644
--- a/object.h
+++ b/object.h
@@ -59,7 +59,7 @@ struct object_array {
 
 /*
  * object flag allocation:
- * revision.h:               0---------10                              25----28
+ * revision.h:               0---------10                              24----27
  * fetch-pack.c:             01
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
@@ -78,13 +78,14 @@ struct object_array {
  * builtin/show-branch.c:    0-------------------------------------------26
  * builtin/unpack-objects.c:                                 2021
  */
-#define FLAG_BITS  29
+#define FLAG_BITS  28
 
 /*
  * The object type is stored in 3 bits.
  */
 struct object {
 	unsigned parsed : 1;
+	unsigned corrupt : 1;
 	unsigned type : TYPE_BITS;
 	unsigned flags : FLAG_BITS;
 	struct object_id oid;
diff --git a/revision.h b/revision.h
index 4134dc6029..5c0b831b37 100644
--- a/revision.h
+++ b/revision.h
@@ -33,7 +33,7 @@
 #define ALL_REV_FLAGS	(((1u<<11)-1) | NOT_USER_GIVEN | TRACK_LINEAR)
 
 #define TOPO_WALK_EXPLORED	(1u<<27)
-#define TOPO_WALK_INDEGREE	(1u<<28)
+#define TOPO_WALK_INDEGREE	(1u<<24)
 
 #define DECORATE_SHORT_REFS	1
 #define DECORATE_FULL_REFS	2

  reply	other threads:[~2019-09-05  6:47 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-04  2:22 [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees Taylor Blau
2019-09-04  2:22 ` [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits Taylor Blau
2019-09-04  3:04   ` Jeff King
2019-09-04 21:18     ` Taylor Blau
2019-09-05  6:47       ` Jeff King [this message]
2019-09-06 16:48         ` Derrick Stolee
2019-09-06 17:04           ` Jeff King
2019-09-06 17:19             ` Derrick Stolee
2019-09-06 17:20             ` Derrick Stolee
2019-09-05 22:19     ` Junio C Hamano
2019-09-06  6:35       ` Jeff King
2019-09-06  6:56         ` Jeff King
2019-09-06 16:59         ` Junio C Hamano
2019-09-06 17:04           ` Jeff King
2019-09-09 16:39             ` Junio C Hamano
2019-09-09 16:54               ` Jeff King
2019-09-04 18:25 ` [RFC PATCH 0/1] commit-graph.c: handle corrupt commit trees Garima Singh
2019-09-04 21:21   ` Taylor Blau
2019-09-05  6:08     ` Jeff King
2019-09-06 16:48     ` Derrick Stolee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190905064723.GC21450@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=me@ttaylorr.com \
    --cc=stolee@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).