git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: git@vger.kernel.org
Subject: [PATCH 08/23] fsck: require an actual buffer for non-blobs
Date: Fri, 18 Oct 2019 00:54:12 -0400	[thread overview]
Message-ID: <20191018045412.GH17879@sigill.intra.peff.net> (raw)
In-Reply-To: <20191018044103.GA17625@sigill.intra.peff.net>

The fsck_object() function takes in a buffer, but also a "struct
object". The rules for using these vary between types:

  - for a commit, we'll use the provided buffer; if it's NULL, we'll
    fall back to get_commit_buffer(), which loads from either an
    in-memory cache or from disk. If the latter fails, we'd die(), which
    is non-ideal for fsck.

  - for a tag, a NULL buffer will fall back to loading the object from
    disk (and failure would lead to an fsck error)

  - for a tree, we _never_ look at the provided buffer, and always use
    tree->buffer

  - for a blob, we usually don't look at the buffer at all, unless it
    has been marked as a .gitmodule file. In that case we check the
    buffer given to us, or assume a NULL buffer is a very large blob
    (and complain about it)

This is much more complex than it needs to be. It turns out that nobody
ever feeds a NULL buffer that isn't a blob:

  - git-fsck calls fsck_object() only from fsck_obj(). That in turn is
    called by one of:

      - fsck_obj_buffer(), which is a callback to verify_pack(), which
	unpacks everything except large blobs into a buffer (see
	pack-check.c, lines 131-141).

      - fsck_loose(), which hits a BUG() on non-blobs with a NULL buffer
	(builtin/fsck.c, lines 639-640)

    And in either case, we'll have just called parse_object_buffer()
    anyway, which would segfault on a NULL buffer for commits or tags
    (not for trees, but it would install a NULL tree->buffer which would
    later cause a segfault)

  - git-index-pack asserts that the buffer is non-NULL unless the object
    is a blob (see builtin/index-pack.c, line 832)

  - git-unpack-objects always writes a non-NULL buffer into its
    obj_buffer hash, which is then fed to fsck_object(). (There is
    actually a funny thing here where it does not store blob buffers at
    all, nor does it call fsck on them; it does check any needed blobs
    via fsck_finish() though).

Let's make the rules simpler, which reduces the amount of code and gives
us more flexibility in refactoring the fsck code. The new rules are:

  - only blobs are allowed to pass a NULL buffer

  - we always use the provided buffer, never pulling information from
    the object struct

We don't have to adjust any callers, because they were already adhering
to these. Note that we do drop a few fsck identifiers for missing tags,
but that was all dead code (because nobody passed a NULL tag buffer).

Signed-off-by: Jeff King <peff@peff.net>
---
 fsck.c | 51 +++++++++------------------------------------------
 fsck.h |  6 +++++-
 2 files changed, 14 insertions(+), 43 deletions(-)

diff --git a/fsck.c b/fsck.c
index 79ce3a97c8..347a0ef5c9 100644
--- a/fsck.c
+++ b/fsck.c
@@ -49,13 +49,11 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
 	FUNC(MISSING_TAG, ERROR) \
 	FUNC(MISSING_TAG_ENTRY, ERROR) \
-	FUNC(MISSING_TAG_OBJECT, ERROR) \
 	FUNC(MISSING_TREE, ERROR) \
 	FUNC(MISSING_TREE_OBJECT, ERROR) \
 	FUNC(MISSING_TYPE, ERROR) \
 	FUNC(MISSING_TYPE_ENTRY, ERROR) \
 	FUNC(MULTIPLE_AUTHORS, ERROR) \
-	FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
 	FUNC(TREE_NOT_SORTED, ERROR) \
 	FUNC(UNKNOWN_TYPE, ERROR) \
 	FUNC(ZERO_PADDED_DATE, ERROR) \
@@ -541,7 +539,9 @@ static int verify_ordered(unsigned mode1, const char *name1, unsigned mode2, con
 	return c1 < c2 ? 0 : TREE_UNORDERED;
 }
 
-static int fsck_tree(struct tree *item, struct fsck_options *options)
+static int fsck_tree(struct tree *item,
+		     const char *buffer, unsigned long size,
+		     struct fsck_options *options)
 {
 	int retval = 0;
 	int has_null_sha1 = 0;
@@ -558,7 +558,7 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
 	unsigned o_mode;
 	const char *o_name;
 
-	if (init_tree_desc_gently(&desc, item->buffer, item->size)) {
+	if (init_tree_desc_gently(&desc, buffer, size)) {
 		retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 		return retval;
 	}
@@ -733,8 +733,8 @@ static int fsck_ident(const char **ident, struct object *obj, struct fsck_option
 	return 0;
 }
 
-static int fsck_commit_buffer(struct commit *commit, const char *buffer,
-	unsigned long size, struct fsck_options *options)
+static int fsck_commit(struct commit *commit, const char *buffer,
+		       unsigned long size, struct fsck_options *options)
 {
 	struct object_id tree_oid, oid;
 	unsigned author_count;
@@ -788,47 +788,15 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, const char *data,
-	unsigned long size, struct fsck_options *options)
-{
-	const char *buffer = data ?  data : get_commit_buffer(commit, &size);
-	int ret = fsck_commit_buffer(commit, buffer, size, options);
-	if (!data)
-		unuse_commit_buffer(commit, buffer);
-	return ret;
-}
-
-static int fsck_tag(struct tag *tag, const char *data,
+static int fsck_tag(struct tag *tag, const char *buffer,
 		    unsigned long size, struct fsck_options *options)
 {
 	struct object_id oid;
 	int ret = 0;
-	const char *buffer;
-	char *to_free = NULL, *eol;
+	char *eol;
 	struct strbuf sb = STRBUF_INIT;
 	const char *p;
 
-	if (data)
-		buffer = data;
-	else {
-		enum object_type type;
-
-		buffer = to_free =
-			read_object_file(&tag->object.oid, &type, &size);
-		if (!buffer)
-			return report(options, &tag->object,
-				FSCK_MSG_MISSING_TAG_OBJECT,
-				"cannot read tag object");
-
-		if (type != OBJ_TAG) {
-			ret = report(options, &tag->object,
-				FSCK_MSG_TAG_OBJECT_NOT_TAG,
-				"expected tag got %s",
-			    type_name(type));
-			goto done;
-		}
-	}
-
 	ret = verify_headers(buffer, size, &tag->object, options);
 	if (ret)
 		goto done;
@@ -889,7 +857,6 @@ static int fsck_tag(struct tag *tag, const char *data,
 
 done:
 	strbuf_release(&sb);
-	free(to_free);
 	return ret;
 }
 
@@ -979,7 +946,7 @@ int fsck_object(struct object *obj, void *data, unsigned long size,
 	if (obj->type == OBJ_BLOB)
 		return fsck_blob((struct blob *)obj, data, size, options);
 	if (obj->type == OBJ_TREE)
-		return fsck_tree((struct tree *) obj, options);
+		return fsck_tree((struct tree *) obj, data, size, options);
 	if (obj->type == OBJ_COMMIT)
 		return fsck_commit((struct commit *) obj, (const char *) data,
 			size, options);
diff --git a/fsck.h b/fsck.h
index b95595ae5f..e479461075 100644
--- a/fsck.h
+++ b/fsck.h
@@ -52,7 +52,11 @@ struct fsck_options {
  *    0		everything OK
  */
 int fsck_walk(struct object *obj, void *data, struct fsck_options *options);
-/* If NULL is passed for data, we assume the object is local and read it. */
+
+/*
+ * Blob objects my pass a NULL data pointer, which indicates they are too large
+ * to fit in memory. All other types must pass a real buffer.
+ */
 int fsck_object(struct object *obj, void *data, unsigned long size,
 	struct fsck_options *options);
 
-- 
2.23.0.1228.gee29b05929


  parent reply	other threads:[~2019-10-18  4:54 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-10-18  4:41 [PATCH 0/23] parsing and fsck cleanups Jeff King
2019-10-18  4:42 ` [PATCH 01/23] parse_commit_buffer(): treat lookup_commit() failure as parse error Jeff King
2019-10-24  3:37   ` Junio C Hamano
2019-10-24 18:01     ` Jeff King
2019-10-18  4:43 ` [PATCH 02/23] parse_commit_buffer(): treat lookup_tree() " Jeff King
2019-10-24 23:12   ` Jonathan Tan
2019-10-24 23:22     ` Jeff King
2019-10-18  4:45 ` [PATCH 03/23] parse_tag_buffer(): treat NULL tag pointer " Jeff King
2019-10-18  4:47 ` [PATCH 04/23] remember commit/tag parse failures Jeff King
2019-10-24  3:51   ` Junio C Hamano
2019-10-24 23:25   ` Jonathan Tan
2019-10-24 23:41     ` Jeff King
2019-10-18  4:48 ` [PATCH 05/23] fsck: stop checking commit->tree value Jeff King
2019-10-24  3:57   ` Junio C Hamano
2019-10-18  4:49 ` [PATCH 06/23] fsck: stop checking commit->parent counts Jeff King
2019-10-18  4:51 ` [PATCH 07/23] fsck: stop checking tag->tagged Jeff King
2019-10-18  4:54 ` Jeff King [this message]
2019-10-18  4:56 ` [PATCH 09/23] fsck: unify object-name code Jeff King
2019-10-24  6:05   ` Junio C Hamano
2019-10-24 18:07     ` Jeff King
2019-10-25  3:23       ` Junio C Hamano
2019-10-25 21:20         ` Jeff King
2019-10-18  4:56 ` [PATCH 10/23] fsck_describe_object(): build on our get_object_name() primitive Jeff King
2019-10-24  6:06   ` Junio C Hamano
2019-10-18  4:57 ` [PATCH 11/23] fsck: use oids rather than objects for object_name API Jeff King
2019-10-18  4:58 ` [PATCH 12/23] fsck: don't require object structs for display functions Jeff King
2019-10-18  4:58 ` [PATCH 13/23] fsck: only provide oid/type in fsck_error callback Jeff King
2019-10-18  4:58 ` [PATCH 14/23] fsck: only require an oid for skiplist functions Jeff King
2019-10-18  4:59 ` [PATCH 15/23] fsck: don't require an object struct for report() Jeff King
2019-10-18  4:59 ` [PATCH 16/23] fsck: accept an oid instead of a "struct blob" for fsck_blob() Jeff King
2019-10-18  4:59 ` [PATCH 17/23] fsck: drop blob struct from fsck_finish() Jeff King
2019-10-18  5:00 ` [PATCH 18/23] fsck: don't require an object struct for fsck_ident() Jeff King
2019-10-18  5:00 ` [PATCH 19/23] fsck: don't require an object struct in verify_headers() Jeff King
2019-10-18  5:00 ` [PATCH 20/23] fsck: rename vague "oid" local variables Jeff King
2019-10-18  5:01 ` [PATCH 21/23] fsck: accept an oid instead of a "struct tag" for fsck_tag() Jeff King
2019-10-18  5:01 ` [PATCH 22/23] fsck: accept an oid instead of a "struct commit" for fsck_commit() Jeff King
2019-10-18  5:02 ` [PATCH 23/23] fsck: accept an oid instead of a "struct tree" for fsck_tree() Jeff King
2019-10-24 23:49 ` [PATCH 0/23] parsing and fsck cleanups Jonathan Tan
2019-10-25  3:11 ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191018045412.GH17879@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).