git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Junio C Hamano <gitster@pobox.com>
Cc: Taylor Blau <me@ttaylorr.com>, stolee@gmail.com, git@vger.kernel.org
Subject: Re: [RFC PATCH 1/1] commit-graph.c: die on un-parseable commits
Date: Fri, 6 Sep 2019 02:56:06 -0400
Message-ID: <20190906065606.GC5122@sigill.intra.peff.net>
In-Reply-To: <20190906063503.GB5122@sigill.intra.peff.net>

On Fri, Sep 06, 2019 at 02:35:03AM -0400, Jeff King wrote:

> > Fair enough.  Forcing later users to reattempt parsing (and failing
> > the same way) would be safer and it should also be sufficient as we
> > are talking about how to handle a broken repository, i.e. an error
> > case.
> 
> One of the tricky things, and the reason I used a "corrupt" flag in my
> earlier sketch, is that the state after we encounter a parse error is
> unknown. So imagine parse_commit_buffer() sees that one of the parent
> lines is bogus, and we return an error. The caller gets to see whatever
> half-parsed state we managed to come up with.
> 
> So far so good. But now imagine we call parse_commit_buffer() again, and
> we re-parse. How does that interact with the half-parsed state? Some of
> it works OK (e.g., lookup_tree() would find the same tree). Some not so
> much (I think we'd keep appending parents at each call).
> 
> I guess this might not be too bad to handle. Value fields like
> timestamp_t are OK to overwrite. Pointers to objects likewise, since the
> memory is owned elsewhere. If we see existing parent pointers in an
> object we're parsing, we could probably free them under the assumption
> they're leftover cruft. Likewise for the "tag" field of "struct tag",
> which is owned by the struct and should be freed.
> 
> Blobs and trees don't actually parse anything into their structs. So
> it's really just special-casing those two items.

So here's something a bit more concrete to play with. Using the patch
below, we maintain the invariant that if you called parse_commit() or
equivalent and got a successful return, then commit->tree is always
non-NULL. That fixes the second "missing tree" test from elsewhere in
this thread:

  https://public-inbox.org/git/042a8ba8b2a98c269f9cd1a8e88488b80d686f0d.1567720960.git.me@ttaylorr.com/

_without_ applying the third patch (though the error message we expect
in the test does need to be tweaked). And it likely also protects most
of the other callers of get_commit_tree_oid(), assuming somewhere in
their code path they call parse_commit() and actually check the error
code.

Likewise, tag->tagged is always non-NULL after a successful tag parse,
which should fix the segfault discussed recently in:

  https://public-inbox.org/git/20190824230944.GA14132@jessup.stsp.name/

as well as probably others.

And callers are still able to view the broken objects, but they have to
ignore the error return from the parse functions.
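
To make the two kinds of caller concrete, here is a rough standalone
sketch. It is a toy model, not git's code: every sketch_* name, the
struct layout, and the made-up buffer format ("tree ok\n" followed by
lines, where '!' marks a bogus line) are assumptions for illustration
only. It only shows the shape of the idea: "parsed" is set at the very
end, so a successful return implies a non-NULL tree, while a caller
that ignores the return can still look at whatever was half-parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; these are not git's real structs. */
struct sketch_tree { int unused; };

struct sketch_commit {
	int parsed;               /* set only after a fully successful parse */
	struct sketch_tree *tree; /* may hold half-parsed state after a failure */
};

static struct sketch_tree sketch_the_tree;

/*
 * Toy parser: the buffer must start with "tree ok\n", and the rest
 * must not contain '!' (our stand-in for a bogus later line).
 * Returns 0 on success and -1 on error.
 */
static int sketch_parse_commit(struct sketch_commit *c, const char *buf)
{
	if (c->parsed)
		return 0;
	if (strncmp(buf, "tree ok\n", 8))
		return -1;        /* no tree at all */
	c->tree = &sketch_the_tree;
	if (strchr(buf + 8, '!'))
		return -1;        /* tree is set, but "parsed" is not */
	c->parsed = 1;            /* only now can the result be trusted */
	return 0;
}

/* A careful caller: touches c->tree only after a successful parse. */
static int sketch_commit_has_tree(struct sketch_commit *c, const char *buf)
{
	if (sketch_parse_commit(c, buf) < 0)
		return 0;
	assert(c->tree != NULL);  /* the invariant described above */
	return 1;
}

/* A caller that wants to inspect a broken object ignores the return. */
static struct sketch_tree *sketch_peek_tree(struct sketch_commit *c,
					    const char *buf)
{
	(void)sketch_parse_commit(c, buf);
	return c->tree;           /* may be NULL; this caller must cope */
}
```

The careful caller never dereferences a NULL tree, and because a failed
parse leaves "parsed" unset, every later call reports the same error
rather than silently returning 0.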

This seems like a promising direction. I'd probably break it into a few
separate patches, and it would be nice to have some tests (especially
one where we actually do try to parse multiple times).
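
As a rough idea of what a "parse multiple times" check would exercise,
here is a small standalone model of the parent-list handling. Again this
is only a sketch under stated assumptions: the toy_* names, the structs,
and the one-character-per-parent input format (with '!' standing in for
a bogus parent line) are all invented for illustration and are not part
of git.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not git's real structs. */
struct toy_parent {
	char id;
	struct toy_parent *next;
};

struct toy_commit {
	int parsed;
	struct toy_parent *parents;
};

static void toy_free_parents(struct toy_parent *p)
{
	while (p) {
		struct toy_parent *next = p->next;
		free(p);
		p = next;
	}
}

/*
 * Toy format: each character of "spec" is one parent id, and '!'
 * stands for a bogus parent line that makes the parse fail.
 */
static int toy_parse_commit(struct toy_commit *c, const char *spec)
{
	struct toy_parent **tail;

	if (c->parsed)
		return 0;

	/* Drop leftovers from an earlier failed parse before re-parsing. */
	toy_free_parents(c->parents);
	c->parents = NULL;
	tail = &c->parents;

	for (; *spec; spec++) {
		struct toy_parent *p;

		if (*spec == '!')
			return -1; /* half-parsed parents stay visible */
		p = malloc(sizeof(*p));
		if (!p)
			return -1;
		p->id = *spec;
		p->next = NULL;
		*tail = p;
		tail = &p->next;
	}
	c->parsed = 1;             /* set only once everything succeeded */
	return 0;
}

static int toy_count_parents(const struct toy_commit *c)
{
	const struct toy_parent *p;
	int n = 0;

	for (p = c->parents; p; p = p->next)
		n++;
	return n;
}

/* Parse the same broken input twice; returns the final parent count. */
static int toy_reparse_count(const char *spec)
{
	struct toy_commit c = { 0, NULL };
	int n;

	assert(toy_parse_commit(&c, spec) < 0);
	assert(toy_parse_commit(&c, spec) < 0); /* same error again */
	n = toy_count_parents(&c);
	toy_free_parents(c.parents);
	return n;
}
```

Without the cleanup step at the top of toy_parse_commit(), the second
call would append to the list left behind by the first one, and the
count would grow on every attempt.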

---
diff --git a/commit.c b/commit.c
index a98de16e3d..47e36fd13e 100644
--- a/commit.c
+++ b/commit.c
@@ -398,20 +398,37 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	struct object_id parent;
 	struct commit_list **pptr;
 	struct commit_graft *graft;
+	struct tree *tree;
 	const int tree_entry_len = the_hash_algo->hexsz + 5;
 	const int parent_entry_len = the_hash_algo->hexsz + 7;
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->parents) {
+		/*
+		 * Presumably this is leftover from an earlier failed parse;
+		 * clear it out in preparation for us re-parsing (we'll hit the
+		 * same error, but that's good, since it lets our caller know
+		 * the result cannot be trusted).
+		 */
+		free_commit_list(item->parents);
+		item->parents = NULL;
+	}
+
 	tail += size;
 	if (tail <= bufptr + tree_entry_len + 1 || memcmp(bufptr, "tree ", 5) ||
 			bufptr[tree_entry_len] != '\n')
 		return error("bogus commit object %s", oid_to_hex(&item->object.oid));
 	if (get_oid_hex(bufptr + 5, &parent) < 0)
 		return error("bad tree pointer in commit %s",
 			     oid_to_hex(&item->object.oid));
-	set_commit_tree(item, lookup_tree(r, &parent));
+	tree = lookup_tree(r, &parent);
+	if (!tree)
+		return error("bad tree pointer %s in commit %s",
+			     oid_to_hex(&parent),
+			     oid_to_hex(&item->object.oid));
+	set_commit_tree(item, tree);
 	bufptr += tree_entry_len + 1; /* "tree " + "hex sha1" + "\n" */
 	pptr = &item->parents;
 
@@ -450,6 +467,7 @@ int parse_commit_buffer(struct repository *r, struct commit *item, const void *b
 	if (check_graph)
 		load_commit_graph_info(r, item);
 
+	item->object.parsed = 1;
 	return 0;
 }
 
diff --git a/tag.c b/tag.c
index 5db870edb9..3dcebb4715 100644
--- a/tag.c
+++ b/tag.c
@@ -141,7 +141,11 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 
 	if (item->object.parsed)
 		return 0;
-	item->object.parsed = 1;
+
+	if (item->tag) {
+		/* Left over from an earlier failed parse */
+		FREE_AND_NULL(item->tag);
+	}
 
 	if (size < the_hash_algo->hexsz + 24)
 		return -1;
@@ -167,10 +171,15 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	} else if (!strcmp(type, tag_type)) {
 		item->tagged = (struct object *)lookup_tag(r, &oid);
 	} else {
-		error("Unknown type %s", type);
-		item->tagged = NULL;
+		return error("unknown tag type '%s' in %s",
+			     type, oid_to_hex(&item->object.oid));
 	}
 
+	if (!item->tagged)
+		return error("bad tag pointer to %s in %s",
+			     oid_to_hex(&oid),
+			     oid_to_hex(&item->object.oid));
+
 	if (bufptr + 4 < tail && starts_with(bufptr, "tag "))
 		; 		/* good */
 	else
@@ -187,6 +196,7 @@ int parse_tag_buffer(struct repository *r, struct tag *item, const void *data, u
 	else
 		item->date = 0;
 
+	item->object.parsed = 1;
 	return 0;
 }
 

