git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects
@ 2014-08-28 14:46 Johannes Schindelin
  2014-08-28 14:46 ` [PATCH 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
                   ` (7 more replies)
  0 siblings, 8 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-08-28 14:46 UTC (permalink / raw)
  To: gitster; +Cc: git

This patch series introduces detailed checking of tag objects when calling
git fsck, and also when transfer.fsckobjects is set to true.

To this end, the fsck machinery is reworked to accept the buffer and size
of the object to check, and for commit and tag objects, we verify that the
buffers end in a NUL.

This work was sponsored by GitHub.

Johannes Schindelin (5):
  Refactor type_from_string() to avoid die()ing in case of errors
  fsck: inspect tag objects more closely
  Add regression tests for stricter tag fsck'ing
  Refactor out fsck_object_buffer() accepting the object data
  Make sure that index-pack --strict fails upon invalid tag objects

 builtin/index-pack.c     |   3 +-
 builtin/unpack-objects.c |  14 ++++--
 fsck.c                   | 109 ++++++++++++++++++++++++++++++++++++++++++++---
 fsck.h                   |   3 ++
 object.c                 |  13 +++++-
 object.h                 |   1 +
 t/t1450-fsck.sh          |  39 +++++++++++++++++
 t/t5302-pack-index.sh    |  19 +++++++++
 8 files changed, 188 insertions(+), 13 deletions(-)

-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/6] Refactor type_from_string() to avoid die()ing in case of errors
  2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
@ 2014-08-28 14:46 ` Johannes Schindelin
  2014-08-28 20:43   ` Junio C Hamano
  2014-08-28 14:46 ` [PATCH 2/6] Accept object data in the fsck_object() function Johannes Schindelin
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-08-28 14:46 UTC (permalink / raw)
  To: gitster; +Cc: git

In the next commits, we will enhance the fsck_tag() function to check
tag objects more thoroughly. To this end, we need a function to verify
that a given string is a valid object type, but that does not die() in
the negative case.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object.c | 13 ++++++++++++-
 object.h |  1 +
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/object.c b/object.c
index a16b9f9..5eee592 100644
--- a/object.c
+++ b/object.c
@@ -33,13 +33,24 @@ const char *typename(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string(const char *str)
+int type_from_string_gently(const char *str)
 {
 	int i;
 
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
 		if (!strcmp(str, object_type_strings[i]))
 			return i;
+
+	return -1;
+}
+
+int type_from_string(const char *str)
+{
+	int i = type_from_string_gently(str);
+
+	if (i >= 0)
+		return i;
+
 	die("invalid object type \"%s\"", str);
 }
 
diff --git a/object.h b/object.h
index 5e8d8ee..5c5d22f 100644
--- a/object.h
+++ b/object.h
@@ -54,6 +54,7 @@ struct object {
 
 extern const char *typename(unsigned int type);
 extern int type_from_string(const char *str);
+extern int type_from_string_gently(const char *str);
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 2/6] Accept object data in the fsck_object() function
  2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  2014-08-28 14:46 ` [PATCH 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
@ 2014-08-28 14:46 ` Johannes Schindelin
  2014-08-28 20:47   ` Junio C Hamano
  2014-08-29 23:05   ` Jeff King
  2014-08-28 14:46 ` [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-08-28 14:46 UTC (permalink / raw)
  To: gitster; +Cc: git

When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fsck.c           |  2 +-
 builtin/index-pack.c     |  3 ++-
 builtin/unpack-objects.c | 14 ++++++++++----
 fsck.c                   | 24 +++++++++++++++---------
 fsck.h                   |  4 +++-
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d42a27d..d9f4e6e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -298,7 +298,7 @@ static int fsck_obj(struct object *obj)
 
 	if (fsck_walk(obj, mark_used, NULL))
 		objerror(obj, "broken links");
-	if (fsck_object(obj, check_strict, fsck_error_func))
+	if (fsck_object(obj, NULL, 0, check_strict, fsck_error_func))
 		return -1;
 
 	if (obj->type == OBJ_TREE) {
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5568a5b..f2465ff 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -773,7 +773,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			if (!obj)
 				die(_("invalid %s"), typename(type));
 			if (do_fsck_object &&
-			    fsck_object(obj, 1, fsck_error_function))
+			    fsck_object(obj, buf, size, 1,
+				    fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 99cde45..855d94b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -164,10 +164,10 @@ static unsigned nr_objects;
  * Called only from check_object() after it verified this object
  * is Ok.
  */
-static void write_cached_object(struct object *obj)
+static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
 {
 	unsigned char sha1[20];
-	struct obj_buffer *obj_buf = lookup_object_buffer(obj);
+
 	if (write_sha1_file(obj_buf->buffer, obj_buf->size, typename(obj->type), sha1) < 0)
 		die("failed to write object %s", sha1_to_hex(obj->sha1));
 	obj->flags |= FLAG_WRITTEN;
@@ -180,6 +180,8 @@ static void write_cached_object(struct object *obj)
  */
 static int check_object(struct object *obj, int type, void *data)
 {
+	struct obj_buffer *obj_buf;
+
 	if (!obj)
 		return 1;
 
@@ -198,11 +200,15 @@ static int check_object(struct object *obj, int type, void *data)
 		return 0;
 	}
 
-	if (fsck_object(obj, 1, fsck_error_function))
+	obj_buf = lookup_object_buffer(obj);
+	if (!obj_buf)
+		die("Whoops! Cannot find object '%s'", sha1_to_hex(obj->sha1));
+	if (fsck_object(obj, obj_buf->buffer, obj_buf->size, 1,
+			fsck_error_function))
 		die("Error in object");
 	if (fsck_walk(obj, check_object, NULL))
 		die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
-	write_cached_object(obj);
+	write_cached_object(obj, obj_buf);
 	return 0;
 }
 
diff --git a/fsck.c b/fsck.c
index 56156ff..dd77628 100644
--- a/fsck.c
+++ b/fsck.c
@@ -277,7 +277,7 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
 }
 
 static int fsck_commit_buffer(struct commit *commit, const char *buffer,
-			      fsck_error error_func)
+	unsigned long size, fsck_error error_func)
 {
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
@@ -322,15 +322,18 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, fsck_error error_func)
+static int fsck_commit(struct commit *commit, const char *data,
+	unsigned long size, fsck_error error_func)
 {
-	const char *buffer = get_commit_buffer(commit, NULL);
-	int ret = fsck_commit_buffer(commit, buffer, error_func);
-	unuse_commit_buffer(commit, buffer);
+	const char *buffer = data ?  data : get_commit_buffer(commit, &size);
+	int ret = fsck_commit_buffer(commit, buffer, size, error_func);
+	if (!data)
+		unuse_commit_buffer(commit, buffer);
 	return ret;
 }
 
-static int fsck_tag(struct tag *tag, fsck_error error_func)
+static int fsck_tag(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
 {
 	struct object *tagged = tag->tagged;
 
@@ -339,7 +342,8 @@ static int fsck_tag(struct tag *tag, fsck_error error_func)
 	return 0;
 }
 
-int fsck_object(struct object *obj, int strict, fsck_error error_func)
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func)
 {
 	if (!obj)
 		return error_func(obj, FSCK_ERROR, "no valid object to fsck");
@@ -349,9 +353,11 @@ int fsck_object(struct object *obj, int strict, fsck_error error_func)
 	if (obj->type == OBJ_TREE)
 		return fsck_tree((struct tree *) obj, strict, error_func);
 	if (obj->type == OBJ_COMMIT)
-		return fsck_commit((struct commit *) obj, error_func);
+		return fsck_commit((struct commit *) obj, (const char *) data,
+			size, error_func);
 	if (obj->type == OBJ_TAG)
-		return fsck_tag((struct tag *) obj, error_func);
+		return fsck_tag((struct tag *) obj, (const char *) data,
+			size, error_func);
 
 	return error_func(obj, FSCK_ERROR, "unknown type '%d' (internal fsck error)",
 			  obj->type);
diff --git a/fsck.h b/fsck.h
index 1e4f527..d1e6387 100644
--- a/fsck.h
+++ b/fsck.h
@@ -28,6 +28,8 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...);
  *    0		everything OK
  */
 int fsck_walk(struct object *obj, fsck_walk_func walk, void *data);
-int fsck_object(struct object *obj, int strict, fsck_error error_func);
+/* If NULL is passed for data, we assume the object is local and read it. */
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func);
 
 #endif
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  2014-08-28 14:46 ` [PATCH 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
  2014-08-28 14:46 ` [PATCH 2/6] Accept object data in the fsck_object() function Johannes Schindelin
@ 2014-08-28 14:46 ` Johannes Schindelin
  2014-08-28 20:59   ` Junio C Hamano
  2014-08-29 23:27   ` Jeff King
  2014-08-28 14:46 ` [PATCH 4/6] fsck: check tag objects' headers Johannes Schindelin
                   ` (4 subsequent siblings)
  7 siblings, 2 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-08-28 14:46 UTC (permalink / raw)
  To: gitster; +Cc: git

So far, we assumed that the buffer is NUL terminated, but this is not
a safe assumption, now that we opened the fsck_object() API to pass a
buffer directly.

So let's make sure that there is at least an empty line in the buffer.
That way, our checks would fail if the empty line was encountered
prematurely, and consequently we can get away with the current string
comparisons even with non-NUL-terminated buffers are passed to
fsck_object().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fsck.c b/fsck.c
index dd77628..db6aaa4 100644
--- a/fsck.c
+++ b/fsck.c
@@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
+static int must_have_empty_line(const void *data, unsigned long size,
+	struct object *obj, fsck_error error_func)
+{
+	const char *buffer = (const char *)data;
+	int i;
+
+	for (i = 0; i < size; i++) {
+		switch (buffer[i]) {
+		case '\0':
+			return error_func(obj, FSCK_ERROR,
+				"invalid message: NUL at offset %d", i);
+		case '\n':
+			if (i + 1 < size && buffer[i + 1] == '\n')
+				return 0;
+		}
+	}
+
+	return error_func(obj, FSCK_ERROR, "invalid buffer: missing empty line");
+}
+
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
 {
 	char *end;
@@ -284,6 +304,9 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	unsigned parent_count, parent_line_count = 0;
 	int err;
 
+	if (must_have_empty_line(buffer, size, &commit->object, error_func))
+		return -1;
+
 	if (!skip_prefix(buffer, "tree ", &buffer))
 		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
 	if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 4/6] fsck: check tag objects' headers
  2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                   ` (2 preceding siblings ...)
  2014-08-28 14:46 ` [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
@ 2014-08-28 14:46 ` Johannes Schindelin
  2014-08-28 21:25   ` Junio C Hamano
  2014-08-28 14:46 ` [PATCH 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-08-28 14:46 UTC (permalink / raw)
  To: gitster; +Cc: git

We inspect commit objects pretty much in detail in git-fsck, but we just
glanced over the tag objects. Let's be stricter.

This work was sponsored by GitHub Inc.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 87 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index db6aaa4..d30946b 100644
--- a/fsck.c
+++ b/fsck.c
@@ -6,6 +6,7 @@
 #include "commit.h"
 #include "tag.h"
 #include "fsck.h"
+#include "refs.h"
 
 static int fsck_walk_tree(struct tree *tree, fsck_walk_func walk, void *data)
 {
@@ -355,6 +356,90 @@ static int fsck_commit(struct commit *commit, const char *data,
 	return ret;
 }
 
+static int fsck_tag_buffer(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
+{
+	unsigned char commit_sha1[20];
+	int ret = 0;
+	const char *buffer;
+	char *tmp = NULL, *eol;
+
+	if (data)
+		buffer = data;
+	else {
+		enum object_type type;
+
+		buffer = tmp = read_sha1_file(tag->object.sha1, &type, &size);
+		if (!buffer)
+			return error_func(&tag->object, FSCK_ERROR,
+				"cannot read tag object");
+
+		if (type != OBJ_TAG) {
+			ret = error_func(&tag->object, FSCK_ERROR,
+				"expected tag got %s",
+			    typename(type));
+			goto done;
+		}
+	}
+
+	if (must_have_empty_line(buffer, size, &tag->object, error_func))
+		goto done;
+
+	if (!skip_prefix(buffer, "object ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'object' line");
+		goto done;
+	}
+	if (get_sha1_hex(buffer, commit_sha1) || buffer[40] != '\n') {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'object' line format - bad sha1");
+		goto done;
+	}
+	buffer += 41;
+
+	if (!skip_prefix(buffer, "type ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'type' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	*eol = '\0';
+	if (type_from_string_gently(buffer) < 0)
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'type' value");
+	*eol = '\n';
+	if (ret)
+		goto done;
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tag ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'tag' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	*eol = '\0';
+	if (check_refname_format(buffer, REFNAME_ALLOW_ONELEVEL))
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'tag' name: %s", buffer);
+	*eol = '\n';
+	if (ret)
+		goto done;
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tagger ", &buffer)) {
+		/* early tags do not contain 'tagger' lines; warn only */
+		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");
+	}
+	ret = fsck_ident(&buffer, &tag->object, error_func);
+
+done:
+	free(tmp);
+	return ret;
+}
+
 static int fsck_tag(struct tag *tag, const char *data,
 	unsigned long size, fsck_error error_func)
 {
@@ -362,7 +447,8 @@ static int fsck_tag(struct tag *tag, const char *data,
 
 	if (!tagged)
 		return error_func(&tag->object, FSCK_ERROR, "could not load tagged object");
-	return 0;
+
+	return fsck_tag_buffer(tag, data, size, error_func);
 }
 
 int fsck_object(struct object *obj, void *data, unsigned long size,
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 5/6] Add regression tests for stricter tag fsck'ing
  2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                   ` (3 preceding siblings ...)
  2014-08-28 14:46 ` [PATCH 4/6] fsck: check tag objects' headers Johannes Schindelin
@ 2014-08-28 14:46 ` Johannes Schindelin
  2014-08-28 14:47 ` [PATCH 6/6] Make sure that index-pack --strict fails upon invalid tag objects Johannes Schindelin
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-08-28 14:46 UTC (permalink / raw)
  To: gitster; +Cc: git

The intent of the two new test cases is to catch general breakages in
the fsck_tag() function, not so much to test it extensively, trying to
strike the proper balance between thoroughness and speed.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1450-fsck.sh | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8c739c9..16b3e4a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -194,6 +194,45 @@ test_expect_success 'tag pointing to something else than its type' '
 	test_must_fail git fsck --tags
 '
 
+test_expect_success 'tag with incorrect tag name' '
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<-EOF &&
+	object $sha
+	type commit
+	tag wrong name format
+	tagger T A Gger <tagger@example.com> 1234567890 -0000
+
+	This is an invalid tag.
+	EOF
+
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	test_when_finished "remove_object $tag" &&
+	echo $tag >.git/refs/tags/wrong &&
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	git fsck --tags 2>out &&
+	cat out &&
+	grep "invalid .tag. name" out
+'
+
+test_expect_success 'tag with missing tagger' '
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<-EOF &&
+	object $sha
+	type commit
+	tag gutentag
+
+	This is an invalid tag.
+	EOF
+
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	test_when_finished "remove_object $tag" &&
+	echo $tag >.git/refs/tags/wrong &&
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	git fsck --tags 2>out &&
+	cat out &&
+	grep "expected .tagger. line" out
+'
+
 test_expect_success 'cleaned up' '
 	git fsck >actual 2>&1 &&
 	test_cmp empty actual
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 6/6] Make sure that index-pack --strict fails upon invalid tag objects
  2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                   ` (4 preceding siblings ...)
  2014-08-28 14:46 ` [PATCH 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
@ 2014-08-28 14:47 ` Johannes Schindelin
  2014-09-10 13:52 ` [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
  7 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-08-28 14:47 UTC (permalink / raw)
  To: gitster; +Cc: git

One of the most important use cases for the strict tag object checking
is when transfer.fsckobjects is set to true to catch invalid objects
early on. This new regression test essentially tests the same code path
by directly calling 'index-pack --strict' on a pack containing an
invalid tag object.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5302-pack-index.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index 4bbb718..80bd3ee 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -243,4 +243,23 @@ test_expect_success 'running index-pack in the object store' '
     test -f .git/objects/pack/pack-${pack1}.idx
 '
 
+test_expect_success 'index-pack --strict fails upon invalid tag' '
+    sha=$(git rev-parse HEAD) &&
+    cat >wrong-tag <<EOF &&
+object $sha
+type commit
+tag guten tag
+
+This is an invalid tag.
+EOF
+
+    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+    pack1=$(echo $tag | git pack-objects tag-test) &&
+    echo remove tag object &&
+    thirtyeight=${tag#??} &&
+    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
+    test_must_fail git index-pack --strict tag-test-${pack1}.pack 2> err &&
+    grep "invalid .tag. name" err
+'
+
 test_done
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH 1/6] Refactor type_from_string() to avoid die()ing in case of errors
  2014-08-28 14:46 ` [PATCH 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
@ 2014-08-28 20:43   ` Junio C Hamano
  0 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-08-28 20:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> In the next commits, we will enhance the fsck_tag() function to check
> tag objects more thoroughly. To this end, we need a function to verify
> that a given string is a valid object type, but that does not die() in
> the negative case.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  object.c | 13 ++++++++++++-
>  object.h |  1 +
>  2 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/object.c b/object.c
> index a16b9f9..5eee592 100644
> --- a/object.c
> +++ b/object.c
> @@ -33,13 +33,24 @@ const char *typename(unsigned int type)
>  	return object_type_strings[type];
>  }
>  
> -int type_from_string(const char *str)
> +int type_from_string_gently(const char *str)
>  {
>  	int i;
>  
>  	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
>  		if (!strcmp(str, object_type_strings[i]))
>  			return i;
> +
> +	return -1;
> +}
> +
> +int type_from_string(const char *str)
> +{
> +	int i = type_from_string_gently(str);
> +
> +	if (i >= 0)
> +		return i;
> +
>  	die("invalid object type \"%s\"", str);
>  }

Hmph.  The above is not wrong per-se, but I would have done

	int type_from_string_gently(const char *str, int gentle);
        #define type_from_string(str) type_from_string_gently((str), 1)

and added two lines to renamed type_from_string() function

+	if (gently)
+        	return -1;

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/6] Accept object data in the fsck_object() function
  2014-08-28 14:46 ` [PATCH 2/6] Accept object data in the fsck_object() function Johannes Schindelin
@ 2014-08-28 20:47   ` Junio C Hamano
  2014-08-29 23:10     ` Jeff King
  2014-08-29 23:05   ` Jeff King
  1 sibling, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-08-28 20:47 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> When fsck'ing an incoming pack, we need to fsck objects that cannot be
> read via read_sha1_file() because they are not local yet (and might even
> be rejected if transfer.fsckobjects is set to 'true').
>
> For commits, there is a hack in place: we basically cache commit
> objects' buffers anyway, but the same is not true, say, for tag objects.
>
> By refactoring fsck_object() to take the object buffer and size as
> optional arguments -- optional, because we still fall back to the
> previous method to look at the cached commit objects if the caller
> passes NULL -- we prepare the machinery for the upcoming handling of tag
> objects.
>
> The assumption that such buffers are inherently NUL terminated is now
> wrong, of course, hence we pass the size of the buffer so that we can
> add a sanity check later, to prevent running past the end of the buffer.

A nice side effect may be that we can now check (and perhaps warn) a
commit buffer with a NUL inside, perhaps?   I am not suggesting to
add such a check to this series, but mentioning the possibilty here
may have a merit.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  builtin/fsck.c           |  2 +-
>  builtin/index-pack.c     |  3 ++-
>  builtin/unpack-objects.c | 14 ++++++++++----
>  fsck.c                   | 24 +++++++++++++++---------
>  fsck.h                   |  4 +++-
>  5 files changed, 31 insertions(+), 16 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index d42a27d..d9f4e6e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -298,7 +298,7 @@ static int fsck_obj(struct object *obj)
>  
>  	if (fsck_walk(obj, mark_used, NULL))
>  		objerror(obj, "broken links");
> -	if (fsck_object(obj, check_strict, fsck_error_func))
> +	if (fsck_object(obj, NULL, 0, check_strict, fsck_error_func))
>  		return -1;
>  
>  	if (obj->type == OBJ_TREE) {
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 5568a5b..f2465ff 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -773,7 +773,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
>  			if (!obj)
>  				die(_("invalid %s"), typename(type));
>  			if (do_fsck_object &&
> -			    fsck_object(obj, 1, fsck_error_function))
> +			    fsck_object(obj, buf, size, 1,
> +				    fsck_error_function))
>  				die(_("Error in object"));
>  			if (fsck_walk(obj, mark_link, NULL))
>  				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
> diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
> index 99cde45..855d94b 100644
> --- a/builtin/unpack-objects.c
> +++ b/builtin/unpack-objects.c
> @@ -164,10 +164,10 @@ static unsigned nr_objects;
>   * Called only from check_object() after it verified this object
>   * is Ok.
>   */
> -static void write_cached_object(struct object *obj)
> +static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
>  {
>  	unsigned char sha1[20];
> -	struct obj_buffer *obj_buf = lookup_object_buffer(obj);
> +
>  	if (write_sha1_file(obj_buf->buffer, obj_buf->size, typename(obj->type), sha1) < 0)
>  		die("failed to write object %s", sha1_to_hex(obj->sha1));
>  	obj->flags |= FLAG_WRITTEN;
> @@ -180,6 +180,8 @@ static void write_cached_object(struct object *obj)
>   */
>  static int check_object(struct object *obj, int type, void *data)
>  {
> +	struct obj_buffer *obj_buf;
> +
>  	if (!obj)
>  		return 1;
>  
> @@ -198,11 +200,15 @@ static int check_object(struct object *obj, int type, void *data)
>  		return 0;
>  	}
>  
> -	if (fsck_object(obj, 1, fsck_error_function))
> +	obj_buf = lookup_object_buffer(obj);
> +	if (!obj_buf)
> +		die("Whoops! Cannot find object '%s'", sha1_to_hex(obj->sha1));
> +	if (fsck_object(obj, obj_buf->buffer, obj_buf->size, 1,
> +			fsck_error_function))
>  		die("Error in object");
>  	if (fsck_walk(obj, check_object, NULL))
>  		die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
> -	write_cached_object(obj);
> +	write_cached_object(obj, obj_buf);
>  	return 0;
>  }
>  
> diff --git a/fsck.c b/fsck.c
> index 56156ff..dd77628 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -277,7 +277,7 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
>  }
>  
>  static int fsck_commit_buffer(struct commit *commit, const char *buffer,
> -			      fsck_error error_func)
> +	unsigned long size, fsck_error error_func)
>  {
>  	unsigned char tree_sha1[20], sha1[20];
>  	struct commit_graft *graft;
> @@ -322,15 +322,18 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>  	return 0;
>  }
>  
> -static int fsck_commit(struct commit *commit, fsck_error error_func)
> +static int fsck_commit(struct commit *commit, const char *data,
> +	unsigned long size, fsck_error error_func)
>  {
> -	const char *buffer = get_commit_buffer(commit, NULL);
> -	int ret = fsck_commit_buffer(commit, buffer, error_func);
> -	unuse_commit_buffer(commit, buffer);
> +	const char *buffer = data ?  data : get_commit_buffer(commit, &size);
> +	int ret = fsck_commit_buffer(commit, buffer, size, error_func);
> +	if (!data)
> +		unuse_commit_buffer(commit, buffer);
>  	return ret;
>  }
>  
> -static int fsck_tag(struct tag *tag, fsck_error error_func)
> +static int fsck_tag(struct tag *tag, const char *data,
> +	unsigned long size, fsck_error error_func)
>  {
>  	struct object *tagged = tag->tagged;
>  
> @@ -339,7 +342,8 @@ static int fsck_tag(struct tag *tag, fsck_error error_func)
>  	return 0;
>  }
>  
> -int fsck_object(struct object *obj, int strict, fsck_error error_func)
> +int fsck_object(struct object *obj, void *data, unsigned long size,
> +	int strict, fsck_error error_func)
>  {
>  	if (!obj)
>  		return error_func(obj, FSCK_ERROR, "no valid object to fsck");
> @@ -349,9 +353,11 @@ int fsck_object(struct object *obj, int strict, fsck_error error_func)
>  	if (obj->type == OBJ_TREE)
>  		return fsck_tree((struct tree *) obj, strict, error_func);
>  	if (obj->type == OBJ_COMMIT)
> -		return fsck_commit((struct commit *) obj, error_func);
> +		return fsck_commit((struct commit *) obj, (const char *) data,
> +			size, error_func);
>  	if (obj->type == OBJ_TAG)
> -		return fsck_tag((struct tag *) obj, error_func);
> +		return fsck_tag((struct tag *) obj, (const char *) data,
> +			size, error_func);
>  
>  	return error_func(obj, FSCK_ERROR, "unknown type '%d' (internal fsck error)",
>  			  obj->type);
> diff --git a/fsck.h b/fsck.h
> index 1e4f527..d1e6387 100644
> --- a/fsck.h
> +++ b/fsck.h
> @@ -28,6 +28,8 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...);
>   *    0		everything OK
>   */
>  int fsck_walk(struct object *obj, fsck_walk_func walk, void *data);
> -int fsck_object(struct object *obj, int strict, fsck_error error_func);
> +/* If NULL is passed for data, we assume the object is local and read it. */
> +int fsck_object(struct object *obj, void *data, unsigned long size,
> +	int strict, fsck_error error_func);
>  
>  #endif

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-08-28 14:46 ` [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
@ 2014-08-28 20:59   ` Junio C Hamano
  2014-08-29 23:27   ` Jeff King
  1 sibling, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-08-28 20:59 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> So far, we assumed that the buffer is NUL terminated, but this is not
> a safe assumption, now that we opened the fsck_object() API to pass a
> buffer directly.
>
> So let's make sure that there is at least an empty line in the buffer.
> That way, our checks would fail if the empty line was encountered
> prematurely, and consequently we can get away with the current string
> comparisons even with non-NUL-terminated buffers are passed to
> fsck_object().
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---

Heh, I probably should have read this one before commenting on 2/6.

It makes me feel somewhat uneasy to insist that there must be a
blank line in the commit object, even though from the very first
implementation of "commit-tree" I think we have always had a blank
there at the end of the header, even when you feed nothing as the
message to it.

I think the new check is OK, but the message should be s/empty
line/end of header/ or something.  It is not like we require an
empty line in the log message proper.

>  fsck.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/fsck.c b/fsck.c
> index dd77628..db6aaa4 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
>  	return retval;
>  }
>  
> +static int must_have_empty_line(const void *data, unsigned long size,
> +	struct object *obj, fsck_error error_func)
> +{
> +	const char *buffer = (const char *)data;
> +	int i;
> +
> +	for (i = 0; i < size; i++) {
> +		switch (buffer[i]) {
> +		case '\0':
> +			return error_func(obj, FSCK_ERROR,
> +				"invalid message: NUL at offset %d", i);
> +		case '\n':
> +			if (i + 1 < size && buffer[i + 1] == '\n')
> +				return 0;
> +		}
> +	}
> +
> +	return error_func(obj, FSCK_ERROR, "invalid buffer: missing empty line");
> +}
> +
>  static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
>  {
>  	char *end;
> @@ -284,6 +304,9 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>  	unsigned parent_count, parent_line_count = 0;
>  	int err;
>  
> +	if (must_have_empty_line(buffer, size, &commit->object, error_func))
> +		return -1;
> +
>  	if (!skip_prefix(buffer, "tree ", &buffer))
>  		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
>  	if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-08-28 14:46 ` [PATCH 4/6] fsck: check tag objects' headers Johannes Schindelin
@ 2014-08-28 21:25   ` Junio C Hamano
  2014-08-28 21:36     ` Junio C Hamano
  2014-08-29 23:43     ` Jeff King
  0 siblings, 2 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-08-28 21:25 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> We inspect commit objects pretty much in detail in git-fsck, but we just
> glanced over the tag objects. Let's be stricter.
>
> This work was sponsored by GitHub Inc.

Is it only this commit, or all of these patches in the series?
Does GitHub want their name sprinkled over all changes they sponsor?

>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  fsck.c | 88 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 87 insertions(+), 1 deletion(-)
>
> diff --git a/fsck.c b/fsck.c
> index db6aaa4..d30946b 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -6,6 +6,7 @@
>  #include "commit.h"
>  #include "tag.h"
>  #include "fsck.h"
> +#include "refs.h"
>  
>  static int fsck_walk_tree(struct tree *tree, fsck_walk_func walk, void *data)
>  {
> @@ -355,6 +356,90 @@ static int fsck_commit(struct commit *commit, const char *data,
>  	return ret;
>  }
>  
> +static int fsck_tag_buffer(struct tag *tag, const char *data,
> +	unsigned long size, fsck_error error_func)
> +{
> +	unsigned char commit_sha1[20];
> +	int ret = 0;
> +	const char *buffer;
> +	char *tmp = NULL, *eol;

Call it "to_free" or something; naming it 'tmp' would tempt people
who later touch this code to reuse it temporarily to hold something
unrelated (I know they will notice that mistake later, but noticing
mistake after wasting time is too late).

> +	if (data)
> +		buffer = data;
> +	else {
> +		enum object_type type;
> +
> +		buffer = tmp = read_sha1_file(tag->object.sha1, &type, &size);
> +		if (!buffer)
> +			return error_func(&tag->object, FSCK_ERROR,
> +				"cannot read tag object");
> +
> +		if (type != OBJ_TAG) {
> +			ret = error_func(&tag->object, FSCK_ERROR,
> +				"expected tag got %s",
> +			    typename(type));
> +			goto done;
> +		}
> +	}
> +
> +	if (must_have_empty_line(buffer, size, &tag->object, error_func))
> +		goto done;
> +
> +	if (!skip_prefix(buffer, "object ", &buffer)) {
> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'object' line");
> +		goto done;
> +	}
> +	if (get_sha1_hex(buffer, commit_sha1) || buffer[40] != '\n') {

This code is not making the mistake to assume that tagged objects
are always commits, so let's not call it commit_sha1; I'd suggest 
just calling it sha1[] (or even "tmp" or "junk", as the parsed value
is not used).

> +	*eol = '\0';
> +	if (type_from_string_gently(buffer) < 0)
> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'type' value");
> +	*eol = '\n';
> +	if (ret)
> +		goto done;

I can see that it is reverted back to '\n' immediately after calling
type_from_string_gently(), but it is a bit unfortunate that "const
char *data" needs to be touched this way.

Since the callee is introduced in this series, perhaps it can be
made to take a counted string?

> +	if (!skip_prefix(buffer, "tag ", &buffer)) {
> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'tag' line");
> +		goto done;
> +	}
> +	eol = strchr(buffer, '\n');
> +	if (!eol) {
> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
> +		goto done;
> +	}
> +	*eol = '\0';
> +	if (check_refname_format(buffer, REFNAME_ALLOW_ONELEVEL))
> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'tag' name: %s", buffer);
> +	*eol = '\n';

I actually think this check is harmful.  It often matches the name
of the ref, but there is nothing inherently "refname like" in the
tag name proper.

And I think it is unnecessary.  Don't we already warn if it does not
match the name of the ref we use to point at it?  It would mean that
anything that does not conform to the check-refname-format will get
a warning one way or the other, either it is pointed at by a ref
whose name is kosher and does not match, or a ref itself has a name
that does not pass check-refname-format.

(goes and looks)

Yikes.  I assumed too much.  We do not seem to do much checking on
refs in that

    (1) After "git rev-parse HEAD >.git/refs/heads/ee..oo"
        "fsck" does not complain about the malformed ee..oo branch;

    (2) After "git tag -a foo -m foo && cp .git/refs/tags/foo .git/refs/tags/bar"
        "fsck" does not complain that refs/tags/bar talks about "foo"

But these two are something we would want to fix in a larger context
within "git fack" anyway, so my comments above still stand.

> +	if (!skip_prefix(buffer, "tagger ", &buffer)) {
> +		/* early tags do not contain 'tagger' lines; warn only */
> +		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");

Nice.  Even nicer that this explains why people should not touch
'ret' here.

> +	}
> +	ret = fsck_ident(&buffer, &tag->object, error_func);
> +
> +done:
> +	free(tmp);
> +	return ret;
> +}
> +
>  static int fsck_tag(struct tag *tag, const char *data,
>  	unsigned long size, fsck_error error_func)
>  {
> @@ -362,7 +447,8 @@ static int fsck_tag(struct tag *tag, const char *data,
>  
>  	if (!tagged)
>  		return error_func(&tag->object, FSCK_ERROR, "could not load tagged object");
> -	return 0;
> +
> +	return fsck_tag_buffer(tag, data, size, error_func);
>  }
>  
>  int fsck_object(struct object *obj, void *data, unsigned long size,

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-08-28 21:25   ` Junio C Hamano
@ 2014-08-28 21:36     ` Junio C Hamano
  2014-08-29 23:46       ` Jeff King
  2014-08-29 23:43     ` Jeff King
  1 sibling, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-08-28 21:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

>> +	if (check_refname_format(buffer, REFNAME_ALLOW_ONELEVEL))
>> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'tag' name: %s", buffer);
>> +	*eol = '\n';
>
> I actually think this check is harmful.

Let me take this one back; we do a moral equivalent when we create a
tag, like this:

	strbuf_addf(sb, "refs/tags/%s", name);
        return check_refname_format(sb->buf, 0);

So validating using check_refname_format() is indeed a very good
thing to do.

As you have length and buffer here, I would suggest updating this
part of your patch to print into a strbuf

	strbuf_addf(&sb, "refs/tags/%.*s", (eol - buffer), buffer);
       	if (check_refname_format(sb.buf))
        	ret = ...

and keep the constness of the incoming data.

	

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/6] Accept object data in the fsck_object() function
  2014-08-28 14:46 ` [PATCH 2/6] Accept object data in the fsck_object() function Johannes Schindelin
  2014-08-28 20:47   ` Junio C Hamano
@ 2014-08-29 23:05   ` Jeff King
  1 sibling, 0 replies; 68+ messages in thread
From: Jeff King @ 2014-08-29 23:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: gitster, git

On Thu, Aug 28, 2014 at 04:46:42PM +0200, Johannes Schindelin wrote:

> When fsck'ing an incoming pack, we need to fsck objects that cannot be
> read via read_sha1_file() because they are not local yet (and might even
> be rejected if transfer.fsckobjects is set to 'true').
> 
> For commits, there is a hack in place: we basically cache commit
> objects' buffers anyway, but the same is not true, say, for tag objects.
> 
> By refactoring fsck_object() to take the object buffer and size as
> optional arguments -- optional, because we still fall back to the
> previous method to look at the cached commit objects if the caller
> passes NULL -- we prepare the machinery for the upcoming handling of tag
> objects.
> 
> The assumption that such buffers are inherently NUL terminated is now
> wrong, of course, hence we pass the size of the buffer so that we can
> add a sanity check later, to prevent running past the end of the buffer.

After this, can we get rid of passing a "struct object" to the fsck code
entirely? It should be able to operate only on a type/buffer pair, along
with possibly a sha1 for error reporting.

The buffer management in index-pack's sha1_object function is
significantly complicated by attaching the temporary buffer to a "struct
commit", only to then have to detach it a few lines later. It is the
sole reason we had to add "detach_commit_buffer" a while ago.

-Peff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 2/6] Accept object data in the fsck_object() function
  2014-08-28 20:47   ` Junio C Hamano
@ 2014-08-29 23:10     ` Jeff King
  0 siblings, 0 replies; 68+ messages in thread
From: Jeff King @ 2014-08-29 23:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Thu, Aug 28, 2014 at 01:47:52PM -0700, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
> > When fsck'ing an incoming pack, we need to fsck objects that cannot be
> > read via read_sha1_file() because they are not local yet (and might even
> > be rejected if transfer.fsckobjects is set to 'true').
> >
> > For commits, there is a hack in place: we basically cache commit
> > objects' buffers anyway, but the same is not true, say, for tag objects.
> >
> > By refactoring fsck_object() to take the object buffer and size as
> > optional arguments -- optional, because we still fall back to the
> > previous method to look at the cached commit objects if the caller
> > passes NULL -- we prepare the machinery for the upcoming handling of tag
> > objects.
> >
> > The assumption that such buffers are inherently NUL terminated is now
> > wrong, of course, hence we pass the size of the buffer so that we can
> > add a sanity check later, to prevent running past the end of the buffer.
> 
> A nice side effect may be that we can now check (and perhaps warn) a
> commit buffer with a NUL inside, perhaps?   I am not suggesting to
> add such a check to this series, but mentioning the possibilty here
> may have a merit.

I think that is a good check to add at some point. I demonstrated quite
a while ago that you can get up to some mischief by "hiding" bytes after
a NUL in the commit message. If we ever see feasible collision attacks
against sha1, this is going to be an obvious vector for hiding random
bytes to achieve the collisions.

The downside is that git's data model in theory promises to store
arbitrary bytes in people's commit messages. IMHO we have gotten far
enough from that in practice that I do not think anybody is seriously
doing it (you could not really use most of the history inspection tools
in a reasonable way; you would have to write a parallel set of tools
that never shows log messages, and then extracts and does something
separate with the commit content).

-Peff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-08-28 14:46 ` [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
  2014-08-28 20:59   ` Junio C Hamano
@ 2014-08-29 23:27   ` Jeff King
  1 sibling, 0 replies; 68+ messages in thread
From: Jeff King @ 2014-08-29 23:27 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: gitster, git

On Thu, Aug 28, 2014 at 04:46:49PM +0200, Johannes Schindelin wrote:

> So far, we assumed that the buffer is NUL terminated, but this is not
> a safe assumption, now that we opened the fsck_object() API to pass a
> buffer directly.
> 
> So let's make sure that there is at least an empty line in the buffer.
> That way, our checks would fail if the empty line was encountered
> prematurely, and consequently we can get away with the current string
> comparisons even with non-NUL-terminated buffers are passed to
> fsck_object().

Hmm. So having looked through all of the code, I _think_ this is enough
because all of the parsing:

  - is left-to-right, and stops when it sees something unexpected, which
    would include a double newline

  - uses strcspn for the string-dependent calls, and each call has "\n"
    in its list of stop-characters.

But it seems kind of flaky and easy for later code to get wrong.

Would it be that hard to just allocate an extra NUL at the end of
objects we inflate in index-pack? All of the regular object code paths
provide the extra NUL-termination, and many parts of git depend on that.
I do not think we call into any other reusable code paths besides fsck()
from index-pack, but if we were to add the NUL, we would future-proof
against these "weird" object buffers getting passed elsewhere.

-Peff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-08-28 21:25   ` Junio C Hamano
  2014-08-28 21:36     ` Junio C Hamano
@ 2014-08-29 23:43     ` Jeff King
  2014-09-02 18:41       ` Junio C Hamano
  1 sibling, 1 reply; 68+ messages in thread
From: Jeff King @ 2014-08-29 23:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Thu, Aug 28, 2014 at 02:25:16PM -0700, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
> > We inspect commit objects pretty much in detail in git-fsck, but we just
> > glanced over the tag objects. Let's be stricter.
> >
> > This work was sponsored by GitHub Inc.
> 
> Is it only this commit, or all of these patches in the series?
> Does GitHub want their name sprinkled over all changes they sponsor?

GitHub does not really care either way. I think it is well-known that we
sponsor some git development (i.e., pretty much everything I and Michael
work on), and we do not need that fact sprinkled in the commit history.

But we are also happy for that fact to be transparent if it changes
people's opinions on whether the patch is a good idea (i.e., to know
that Johannes has some motive beyond just "I think this is the right
thing to do"; I hope he _also_ thinks it is the right thing to do or
would not post the series, of course).

Personally, I think the cover letter is a good place for such things.

-Peff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-08-28 21:36     ` Junio C Hamano
@ 2014-08-29 23:46       ` Jeff King
  2014-08-31 22:46         ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff King @ 2014-08-29 23:46 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Thu, Aug 28, 2014 at 02:36:22PM -0700, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> >> +	if (check_refname_format(buffer, REFNAME_ALLOW_ONELEVEL))
> >> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'tag' name: %s", buffer);
> >> +	*eol = '\n';
> >
> > I actually think this check is harmful.
> 
> Let me take this one back; we do a moral equivalent when we create a
> tag, like this:
> 
> 	strbuf_addf(sb, "refs/tags/%s", name);
>         return check_refname_format(sb->buf, 0);

Hmm. But that is because "git tag" always makes one type of tag: one in
which the "tag" field is the same as the refname in which we store it.
So the name must be a valid refname there to meet the ref storage
requirement, and therefore the tag name must, too.

But is that something we necessarily need or want to enforce? Is it OK
for me to have refs/tags/foo pointing to a tag object that is not
related to "foo" (either semantically or syntactically)?

I dunno. I cannot think of a reason you would want to do such a thing,
but this seems like outlawing it because git does not generate it, not
because it is necessarily a problematic thing to be doing.

-Peff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-08-29 23:46       ` Jeff King
@ 2014-08-31 22:46         ` Junio C Hamano
  2014-09-03 22:29           ` Jeff King
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-08-31 22:46 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

Jeff King <peff@peff.net> writes:

> Hmm. But that is because "git tag" always makes one type of tag: one in
> which the "tag" field is the same as the refname in which we store it.
> So the name must be a valid refname there to meet the ref storage
> requirement, and therefore the tag name must, too.
>
> But is that something we necessarily need or want to enforce? Is it OK
> for me to have refs/tags/foo pointing to a tag object that is not
> related to "foo" (either semantically or syntactically)?
>
> I dunno. I cannot think of a reason you would want to do such a thing,
> but this seems like outlawing it because git does not generate it, not
> because it is necessarily a problematic thing to be doing.

Thanks for straightening me out.

If "git fsck" were a tool to validate that the objects and refs are
in line with how "git-core" plumbing and Porcelain toolset uses the
underlying Git data model, it makes sense to insist a tag has a name
that is suitable for a refname, and the tag is pointed by a ref in
"refs/tags/" followed by its name.  The rules such a "git fsck" should
implement would be stricter than what the underlying Git data model
could represent and existing Git tools could handle (i.e. a commit
with broken ident line may not be usable with "shortlog -e" and would
be flagged as corrupt).

But tightening rules in that direction may risk hindering future
progress in an unnecessary way.  We may want to be a bit lenient
when we see something _unusual_ but not necessarily _wrong_, and the
line between them would be blurry in places, as Git is an evolving
software.  It is good to warn about an unsual ones, but we probably
would not want to error on them.

This tightening may be too strict without a very good reason.  For
example, a tentative signed tag (e.g. "for-linus") often used in a
pull request to have it recorded in the resulting merge by the
integrator does not inherently need to be named at all; the ref is
only necessary as a means to transfer the signature from the
contributor to the integrator, and once merged, there is no need for
the tag to have any name.  When we try to improve the workflow to
integrate authenticated work done on the side branch, we may come up
with a way to do so _without_ having to actually have a tag name
(i.e. the "tag" contributor creates for such a purpose may not be
done by "git tag -s" when asking the result to be pulled but do
something different, and it may be perfectly fine for such a
tentative tag to lack the "tag " name line), but still allows us to
record the same merge-tag in the resulting merge commit.

So...

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-08-29 23:43     ` Jeff King
@ 2014-09-02 18:41       ` Junio C Hamano
  2014-09-03 21:38         ` Jeff King
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-02 18:41 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

Jeff King <peff@peff.net> writes:

> On Thu, Aug 28, 2014 at 02:25:16PM -0700, Junio C Hamano wrote:
>
>> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
>> 
>> > We inspect commit objects pretty much in detail in git-fsck, but we just
>> > glanced over the tag objects. Let's be stricter.
>> >
>> > This work was sponsored by GitHub Inc.
>> 
>> Is it only this commit, or all of these patches in the series?
>> Does GitHub want their name sprinkled over all changes they sponsor?
>
> GitHub does not really care either way. I think it is well-known that we
> sponsor some git development (i.e., pretty much everything I and Michael
> work on), and we do not need that fact sprinkled in the commit history.
>
> But we are also happy for that fact to be transparent if it changes
> people's opinions on whether the patch is a good idea (i.e., to know
> that Johannes has some motive beyond just "I think this is the right
> thing to do"; I hope he _also_ thinks it is the right thing to do or
> would not post the series, of course).
>
> Personally, I think the cover letter is a good place for such things.

Yeah, and I saw it explained in the cover.  It looked somewhat out
of place to see it duplicated only for this patch in the 6 patch
series and I was wondering if there was something special about this
specific patch, hence my question.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-09-02 18:41       ` Junio C Hamano
@ 2014-09-03 21:38         ` Jeff King
  0 siblings, 0 replies; 68+ messages in thread
From: Jeff King @ 2014-09-03 21:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Tue, Sep 02, 2014 at 11:41:11AM -0700, Junio C Hamano wrote:

> > Personally, I think the cover letter is a good place for such things.
> 
> Yeah, and I saw it explained in the cover.  It looked somewhat out
> of place to see it duplicated only for this patch in the 6 patch
> series and I was wondering if there was something special about this
> specific patch, hence my question.

Ah. For some reason the cover letter did not make it to me[1]. I agree
that what is there is fine, and it can be dropped from the individual
commit message.

-Peff

[1] It did make it to gmane, so I guess the problem was on my end.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-08-31 22:46         ` Junio C Hamano
@ 2014-09-03 22:29           ` Jeff King
  2014-09-03 23:14             ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff King @ 2014-09-03 22:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Sun, Aug 31, 2014 at 03:46:42PM -0700, Junio C Hamano wrote:

> If "git fsck" were a tool to validate that the objects and refs are
> in line with how "git-core" plumbing and Porcelain toolset uses the
> underlying Git data model, it makes sense to insist a tag has a name
> that is suitable for a refname, and the tag is pointed by a ref in
> "refs/tags/" followed by its name.  The rules such a "git fsck" should
> implement would be stricter than what the underlying Git data model
> could represent and existing Git tools could handle (i.e. a commit
> with broken ident line may not be usable with "shortlog -e" and would
> be flagged as corrupt).

This is a bit of an aside, but why do we have the "tag" line in the tag
object in the first place?

It is part of the object contents, and therefore is part of the
signature (which the refname is not). That's somewhat redundant with the
tag message itself. E.g., the git v2.0.4 tag says:

  object 32f56600bb6ac6fc57183e79d2c1515dfa56672f
  type commit
  tag v2.0.4
  tagger Junio C Hamano <gitster@pobox.com> 1406755201 -0700

  Git 2.0.4
  -----BEGIN PGP SIGNATURE-----
  ...

Imagine an evil person pushed the signed v2.0.4 tag to refs/tags/v2.1.0
(perhaps because there is a bug in v2.0.4, and they want you to run the
wrong version so they can exploit it). You can check with "git show"
that the "tag" field is actually v2.0.4, but then you could similarly
check that the message says "Git 2.0.4".

The main advantage of the "tag" field is that it is machine-readable,
and that your verification process can check that "git verify-tag
v2.1.0" actually returns a tag that says "tag v2.1.0". But I do not
think we do that verification at all. I wonder if that is something we
should add support for.

You gave examples later in your email of tags that would not necessarily
care about this tag field (and anyway, if "for-linus" is used over and
over, it is subject to these sorts of replays), so I do not think it is
something we would want unconditionally in verify-tag.

I think this may need to be filed under "possible policy flags for
verifying" that we discussed earlier (i.e., in the same boat as "does
the committer ident match the commit signature", as it is a
porcelain-ish policy, not an integral part of the plumbing).

So this is mostly food for thought at this point.

-Peff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-09-03 22:29           ` Jeff King
@ 2014-09-03 23:14             ` Junio C Hamano
  2014-09-04  2:04               ` Jeff King
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-03 23:14 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, git

Jeff King <peff@peff.net> writes:

> This is a bit of an aside, but why do we have the "tag" line in the tag
> object in the first place?

http://thread.gmane.org/gmane.linux.kernel/297998/focus=1410

> It is part of the object contents, and therefore is part of the
> signature (which the refname is not). That's somewhat redundant with the
> tag message itself. E.g., the git v2.0.4 tag says:
>
>   object 32f56600bb6ac6fc57183e79d2c1515dfa56672f
>   type commit
>   tag v2.0.4
>   tagger Junio C Hamano <gitster@pobox.com> 1406755201 -0700
>
>   Git 2.0.4
>   -----BEGIN PGP SIGNATURE-----
>   ...

Yes, usually we write a moral equivalent in a human readable form as
the tag message, but the mapping "s/^v/Git /" between the tag name
and the message is purely by convention, and I suspect some old tags
I have may even have used "s/^v/Git v/" or "s/^v/git v/" or a
similar inconsistent mapping.

> The main advantage of the "tag" field is that it is machine-readable,
> and that your verification process can check that "git verify-tag
> v2.1.0" actually returns a tag that says "tag v2.1.0". But I do not
> think we do that verification at all. I wonder if that is something we
> should add support for.

Yes.  That essentially boils down to "refs/tags/$tag" must have "tag $tag"
line (the reverse may not have to be true if the hierarchy is
outside refs/tags/, though).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH 4/6] fsck: check tag objects' headers
  2014-09-03 23:14             ` Junio C Hamano
@ 2014-09-04  2:04               ` Jeff King
  0 siblings, 0 replies; 68+ messages in thread
From: Jeff King @ 2014-09-04  2:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Wed, Sep 03, 2014 at 04:14:34PM -0700, Junio C Hamano wrote:

> > The main advantage of the "tag" field is that it is machine-readable,
> > and that your verification process can check that "git verify-tag
> > v2.1.0" actually returns a tag that says "tag v2.1.0". But I do not
> > think we do that verification at all. I wonder if that is something we
> > should add support for.
> 
> Yes.  That essentially boils down to "refs/tags/$tag" must have "tag $tag"
> line (the reverse may not have to be true if the hierarchy is
> outside refs/tags/, though).

Exactly. But to just bring it back to the original series, I do not
think this is a thing that fsck should care about. It is something that
verify-tag should care about (but again, not unconditionally, and
probably not by default).

-Peff

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                   ` (5 preceding siblings ...)
  2014-08-28 14:47 ` [PATCH 6/6] Make sure that index-pack --strict fails upon invalid tag objects Johannes Schindelin
@ 2014-09-10 13:52 ` Johannes Schindelin
  2014-09-10 13:58   ` Johannes Schindelin
                     ` (2 more replies)
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
  7 siblings, 3 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:52 UTC (permalink / raw)
  To: gitster; +Cc: git

This patch series introduces detailed checking of tag objects when calling
git fsck, and also when transfer.fsckobjects is set to true.

To this end, the fsck machinery is reworked to accept the buffer and size
of the object to check, and for commit and tag objects, we verify that the
buffers contain an end of header (i.e. an empty line) to guarantee that our
checks do not run beyond the buffer.

This work was sponsored by GitHub.

Changes since v1:

- Let type_from_string_gently() accept an optional string length

- Renamed 'tmp' to 'to_free'

- Renamed must_have_empty_line() to require_end_of_header()

- Renamed 'commit_sha1' in fsck_tag_buffer to 'sha1'

- Avoided temporarily inserting NUL characters into the buffer

- Demoted problematic 'tag' value to a mere warning

- Mentioned GitHub only in the cover letter, i.e. keeping it out of the git log

Still unaddressed:

- getting rid of struct object altogether in fsck (I felt this was quite a big
  task, getting much more familiar with the non-tag code paths, and I did not
  want to delay this patch series up any further)

- ensuring that index-pack passes only NUL-terminated buffers to fsck (again,
  I am not familiar enough with the code, and IIRC the problematic unit test
  that revealed that these buffers are not always NUL-terminated exercised the
  unpack-objects code path, not index-pack, again nothing I wanted to let
  delay this patch series any further).

Johannes Schindelin (6):
  Refactor type_from_string() to avoid die()ing in case of errors
  Accept object data in the fsck_object() function
  Make sure fsck_commit_buffer() does not run out of the buffer
  fsck: check tag objects' headers
  Add regression tests for stricter tag fsck'ing
  Make sure that index-pack --strict fails upon invalid tag objects

 builtin/fsck.c           |   2 +-
 builtin/index-pack.c     |   3 +-
 builtin/unpack-objects.c |  14 +++--
 fsck.c                   | 132 +++++++++++++++++++++++++++++++++++++++++++----
 fsck.h                   |   4 +-
 object.c                 |  11 +++-
 object.h                 |   3 +-
 t/t1450-fsck.sh          |  39 ++++++++++++++
 t/t5302-pack-index.sh    |  19 +++++++
 9 files changed, 207 insertions(+), 20 deletions(-)

-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v2 1/6] Refactor type_from_string() to avoid die()ing in case of errors
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
@ 2014-09-10 13:52   ` Johannes Schindelin
  2014-09-10 13:52   ` [PATCH v2 2/6] Accept object data in the fsck_object() function Johannes Schindelin
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:52 UTC (permalink / raw)
  To: gitster; +Cc: git

In the next commits, we will enhance the fsck_tag() function to check
tag objects more thoroughly. To this end, we need a function to verify
that a given string is a valid object type, but that does not die() in
the negative case.

While at it, prepare type_from_string() for counted strings, i.e. strings
with an explicitly specified length rather than a NUL termination.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object.c | 11 +++++++++--
 object.h |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index a16b9f9..aedac24 100644
--- a/object.c
+++ b/object.c
@@ -33,13 +33,20 @@ const char *typename(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string(const char *str)
+int type_from_string_gently(const char *str, ssize_t len, int gentle)
 {
 	int i;
 
+	if (len < 0)
+		len = strlen(str);
+
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
-		if (!strcmp(str, object_type_strings[i]))
+		if (!strncmp(str, object_type_strings[i], len))
 			return i;
+
+	if (gentle)
+		return -1;
+
 	die("invalid object type \"%s\"", str);
 }
 
diff --git a/object.h b/object.h
index 5e8d8ee..e028ced 100644
--- a/object.h
+++ b/object.h
@@ -53,7 +53,8 @@ struct object {
 };
 
 extern const char *typename(unsigned int type);
-extern int type_from_string(const char *str);
+extern int type_from_string_gently(const char *str, ssize_t, int gentle);
+#define type_from_string(str) type_from_string_gently(str, -1, 0)
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 2/6] Accept object data in the fsck_object() function
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
  2014-09-10 13:52   ` [PATCH v2 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
@ 2014-09-10 13:52   ` Johannes Schindelin
  2014-09-10 13:52   ` [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:52 UTC (permalink / raw)
  To: gitster; +Cc: git

When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fsck.c           |  2 +-
 builtin/index-pack.c     |  3 ++-
 builtin/unpack-objects.c | 14 ++++++++++----
 fsck.c                   | 24 +++++++++++++++---------
 fsck.h                   |  4 +++-
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d42a27d..d9f4e6e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -298,7 +298,7 @@ static int fsck_obj(struct object *obj)
 
 	if (fsck_walk(obj, mark_used, NULL))
 		objerror(obj, "broken links");
-	if (fsck_object(obj, check_strict, fsck_error_func))
+	if (fsck_object(obj, NULL, 0, check_strict, fsck_error_func))
 		return -1;
 
 	if (obj->type == OBJ_TREE) {
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5568a5b..f2465ff 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -773,7 +773,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			if (!obj)
 				die(_("invalid %s"), typename(type));
 			if (do_fsck_object &&
-			    fsck_object(obj, 1, fsck_error_function))
+			    fsck_object(obj, buf, size, 1,
+				    fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 99cde45..855d94b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -164,10 +164,10 @@ static unsigned nr_objects;
  * Called only from check_object() after it verified this object
  * is Ok.
  */
-static void write_cached_object(struct object *obj)
+static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
 {
 	unsigned char sha1[20];
-	struct obj_buffer *obj_buf = lookup_object_buffer(obj);
+
 	if (write_sha1_file(obj_buf->buffer, obj_buf->size, typename(obj->type), sha1) < 0)
 		die("failed to write object %s", sha1_to_hex(obj->sha1));
 	obj->flags |= FLAG_WRITTEN;
@@ -180,6 +180,8 @@ static void write_cached_object(struct object *obj)
  */
 static int check_object(struct object *obj, int type, void *data)
 {
+	struct obj_buffer *obj_buf;
+
 	if (!obj)
 		return 1;
 
@@ -198,11 +200,15 @@ static int check_object(struct object *obj, int type, void *data)
 		return 0;
 	}
 
-	if (fsck_object(obj, 1, fsck_error_function))
+	obj_buf = lookup_object_buffer(obj);
+	if (!obj_buf)
+		die("Whoops! Cannot find object '%s'", sha1_to_hex(obj->sha1));
+	if (fsck_object(obj, obj_buf->buffer, obj_buf->size, 1,
+			fsck_error_function))
 		die("Error in object");
 	if (fsck_walk(obj, check_object, NULL))
 		die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
-	write_cached_object(obj);
+	write_cached_object(obj, obj_buf);
 	return 0;
 }
 
diff --git a/fsck.c b/fsck.c
index 56156ff..dd77628 100644
--- a/fsck.c
+++ b/fsck.c
@@ -277,7 +277,7 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
 }
 
 static int fsck_commit_buffer(struct commit *commit, const char *buffer,
-			      fsck_error error_func)
+	unsigned long size, fsck_error error_func)
 {
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
@@ -322,15 +322,18 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, fsck_error error_func)
+static int fsck_commit(struct commit *commit, const char *data,
+	unsigned long size, fsck_error error_func)
 {
-	const char *buffer = get_commit_buffer(commit, NULL);
-	int ret = fsck_commit_buffer(commit, buffer, error_func);
-	unuse_commit_buffer(commit, buffer);
+	const char *buffer = data ?  data : get_commit_buffer(commit, &size);
+	int ret = fsck_commit_buffer(commit, buffer, size, error_func);
+	if (!data)
+		unuse_commit_buffer(commit, buffer);
 	return ret;
 }
 
-static int fsck_tag(struct tag *tag, fsck_error error_func)
+static int fsck_tag(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
 {
 	struct object *tagged = tag->tagged;
 
@@ -339,7 +342,8 @@ static int fsck_tag(struct tag *tag, fsck_error error_func)
 	return 0;
 }
 
-int fsck_object(struct object *obj, int strict, fsck_error error_func)
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func)
 {
 	if (!obj)
 		return error_func(obj, FSCK_ERROR, "no valid object to fsck");
@@ -349,9 +353,11 @@ int fsck_object(struct object *obj, int strict, fsck_error error_func)
 	if (obj->type == OBJ_TREE)
 		return fsck_tree((struct tree *) obj, strict, error_func);
 	if (obj->type == OBJ_COMMIT)
-		return fsck_commit((struct commit *) obj, error_func);
+		return fsck_commit((struct commit *) obj, (const char *) data,
+			size, error_func);
 	if (obj->type == OBJ_TAG)
-		return fsck_tag((struct tag *) obj, error_func);
+		return fsck_tag((struct tag *) obj, (const char *) data,
+			size, error_func);
 
 	return error_func(obj, FSCK_ERROR, "unknown type '%d' (internal fsck error)",
 			  obj->type);
diff --git a/fsck.h b/fsck.h
index 1e4f527..d1e6387 100644
--- a/fsck.h
+++ b/fsck.h
@@ -28,6 +28,8 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...);
  *    0		everything OK
  */
 int fsck_walk(struct object *obj, fsck_walk_func walk, void *data);
-int fsck_object(struct object *obj, int strict, fsck_error error_func);
+/* If NULL is passed for data, we assume the object is local and read it. */
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func);
 
 #endif
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
  2014-09-10 13:52   ` [PATCH v2 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
  2014-09-10 13:52   ` [PATCH v2 2/6] Accept object data in the fsck_object() function Johannes Schindelin
@ 2014-09-10 13:52   ` Johannes Schindelin
  2014-09-10 17:43     ` Junio C Hamano
  2014-09-10 20:45     ` Eric Sunshine
  2014-09-10 13:53   ` [PATCH v2 4/6] fsck: check tag objects' headers Johannes Schindelin
                     ` (2 subsequent siblings)
  5 siblings, 2 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:52 UTC (permalink / raw)
  To: gitster; +Cc: git

So far, we assumed that the buffer is NUL terminated, but this is not
a safe assumption, now that we opened the fsck_object() API to pass a
buffer directly.

So let's make sure that there is at least an empty line in the buffer.
That way, our checks would fail if the empty line was encountered
prematurely, and consequently we can get away with the current string
comparisons even with non-NUL-terminated buffers are passed to
fsck_object().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fsck.c b/fsck.c
index dd77628..9dd7d12 100644
--- a/fsck.c
+++ b/fsck.c
@@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
+static int require_end_of_header(const void *data, unsigned long size,
+	struct object *obj, fsck_error error_func)
+{
+	const char *buffer = (const char *)data;
+	int i;
+
+	for (i = 0; i < size; i++) {
+		switch (buffer[i]) {
+		case '\0':
+			return error_func(obj, FSCK_ERROR,
+				"invalid message: NUL at offset %d", i);
+		case '\n':
+			if (i + 1 < size && buffer[i + 1] == '\n')
+				return 0;
+		}
+	}
+
+	return error_func(obj, FSCK_ERROR, "invalid buffer: missing empty line");
+}
+
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
 {
 	char *end;
@@ -284,6 +304,9 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	unsigned parent_count, parent_line_count = 0;
 	int err;
 
+	if (require_end_of_header(buffer, size, &commit->object, error_func))
+		return -1;
+
 	if (!skip_prefix(buffer, "tree ", &buffer))
 		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
 	if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 4/6] fsck: check tag objects' headers
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
                     ` (2 preceding siblings ...)
  2014-09-10 13:52   ` [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
@ 2014-09-10 13:53   ` Johannes Schindelin
  2014-09-10 17:52     ` Junio C Hamano
  2014-09-10 13:53   ` [PATCH v2 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
  2014-09-10 13:53   ` [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects Johannes Schindelin
  5 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:53 UTC (permalink / raw)
  To: gitster; +Cc: git

We inspect commit objects pretty much in detail in git-fsck, but we just
glanced over the tag objects. Let's be stricter.

Since we do not want to limit 'tag' lines unduly, values that would fail
the refname check only result in warnings, not errors.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 84 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index 9dd7d12..8341a30 100644
--- a/fsck.c
+++ b/fsck.c
@@ -6,6 +6,7 @@
 #include "commit.h"
 #include "tag.h"
 #include "fsck.h"
+#include "refs.h"
 
 static int fsck_walk_tree(struct tree *tree, fsck_walk_func walk, void *data)
 {
@@ -355,6 +356,87 @@ static int fsck_commit(struct commit *commit, const char *data,
 	return ret;
 }
 
+static int fsck_tag_buffer(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
+{
+	unsigned char sha1[20];
+	int ret = 0;
+	const char *buffer;
+	char *to_free = NULL, *eol;
+	struct strbuf sb = STRBUF_INIT;
+
+	if (data)
+		buffer = data;
+	else {
+		enum object_type type;
+
+		buffer = to_free =
+			read_sha1_file(tag->object.sha1, &type, &size);
+		if (!buffer)
+			return error_func(&tag->object, FSCK_ERROR,
+				"cannot read tag object");
+
+		if (type != OBJ_TAG) {
+			ret = error_func(&tag->object, FSCK_ERROR,
+				"expected tag got %s",
+			    typename(type));
+			goto done;
+		}
+	}
+
+	if (require_end_of_header(buffer, size, &tag->object, error_func))
+		goto done;
+
+	if (!skip_prefix(buffer, "object ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'object' line");
+		goto done;
+	}
+	if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'object' line format - bad sha1");
+		goto done;
+	}
+	buffer += 41;
+
+	if (!skip_prefix(buffer, "type ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'type' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	if (type_from_string_gently(buffer, eol - buffer, 1) < 0)
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'type' value");
+	if (ret)
+		goto done;
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tag ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'tag' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	strbuf_addf(&sb, "refs/tags/%.*s", (int)(eol - buffer), buffer);
+	if (check_refname_format(sb.buf, 0))
+		error_func(&tag->object, FSCK_WARN, "invalid 'tag' name: %s", buffer);
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tagger ", &buffer)) {
+		/* early tags do not contain 'tagger' lines; warn only */
+		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");
+	}
+	ret = fsck_ident(&buffer, &tag->object, error_func);
+
+done:
+	free(to_free);
+	return ret;
+}
+
 static int fsck_tag(struct tag *tag, const char *data,
 	unsigned long size, fsck_error error_func)
 {
@@ -362,7 +444,8 @@ static int fsck_tag(struct tag *tag, const char *data,
 
 	if (!tagged)
 		return error_func(&tag->object, FSCK_ERROR, "could not load tagged object");
-	return 0;
+
+	return fsck_tag_buffer(tag, data, size, error_func);
 }
 
 int fsck_object(struct object *obj, void *data, unsigned long size,
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 5/6] Add regression tests for stricter tag fsck'ing
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
                     ` (3 preceding siblings ...)
  2014-09-10 13:53   ` [PATCH v2 4/6] fsck: check tag objects' headers Johannes Schindelin
@ 2014-09-10 13:53   ` Johannes Schindelin
  2014-09-10 17:56     ` Junio C Hamano
  2014-09-10 13:53   ` [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects Johannes Schindelin
  5 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:53 UTC (permalink / raw)
  To: gitster; +Cc: git

The intent of the two new test cases is to catch general breakages in
the fsck_tag() function, not so much to test it extensively, trying to
strike the proper balance between thoroughness and speed.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1450-fsck.sh | 39 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8c739c9..16b3e4a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -194,6 +194,45 @@ test_expect_success 'tag pointing to something else than its type' '
 	test_must_fail git fsck --tags
 '
 
+test_expect_success 'tag with incorrect tag name' '
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<-EOF &&
+	object $sha
+	type commit
+	tag wrong name format
+	tagger T A Gger <tagger@example.com> 1234567890 -0000
+
+	This is an invalid tag.
+	EOF
+
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	test_when_finished "remove_object $tag" &&
+	echo $tag >.git/refs/tags/wrong &&
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	git fsck --tags 2>out &&
+	cat out &&
+	grep "invalid .tag. name" out
+'
+
+test_expect_success 'tag with missing tagger' '
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<-EOF &&
+	object $sha
+	type commit
+	tag gutentag
+
+	This is an invalid tag.
+	EOF
+
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	test_when_finished "remove_object $tag" &&
+	echo $tag >.git/refs/tags/wrong &&
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	git fsck --tags 2>out &&
+	cat out &&
+	grep "expected .tagger. line" out
+'
+
 test_expect_success 'cleaned up' '
 	git fsck >actual 2>&1 &&
 	test_cmp empty actual
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects
       [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
                     ` (4 preceding siblings ...)
  2014-09-10 13:53   ` [PATCH v2 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
@ 2014-09-10 13:53   ` Johannes Schindelin
  2014-09-10 21:54     ` Junio C Hamano
  5 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:53 UTC (permalink / raw)
  To: gitster; +Cc: git

One of the most important use cases for the strict tag object checking
is when transfer.fsckobjects is set to true to catch invalid objects
early on. This new regression test essentially tests the same code path
by directly calling 'index-pack --strict' on a pack containing an
invalid tag object.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5302-pack-index.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index 4bbb718..80bd3ee 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -243,4 +243,23 @@ test_expect_success 'running index-pack in the object store' '
     test -f .git/objects/pack/pack-${pack1}.idx
 '
 
+test_expect_success 'index-pack --strict fails upon invalid tag' '
+    sha=$(git rev-parse HEAD) &&
+    cat >wrong-tag <<EOF &&
+object $sha
+type commit
+tag guten tag
+
+This is an invalid tag.
+EOF
+
+    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+    pack1=$(echo $tag | git pack-objects tag-test) &&
+    echo remove tag object &&
+    thirtyeight=${tag#??} &&
+    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
+    test_must_fail git index-pack --strict tag-test-${pack1}.pack 2> err &&
+    grep "invalid .tag. name" err
+'
+
 test_done
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-10 13:52 ` [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
@ 2014-09-10 13:58   ` Johannes Schindelin
  2014-09-10 21:07   ` Junio C Hamano
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
  2 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-10 13:58 UTC (permalink / raw)
  To: gitster; +Cc: git

Hi Junio,

On Wed, 10 Sep 2014, Johannes Schindelin wrote:

> Still unaddressed:
> 
> - getting rid of struct object altogether in fsck (I felt this was quite a big
>   task, getting much more familiar with the non-tag code paths, and I did not
>   want to delay this patch series up any further)
> 
> - ensuring that index-pack passes only NUL-terminated buffers to fsck (again,
>   I am not familiar enough with the code, and IIRC the problematic unit test
>   that revealed that these buffers are not always NUL-terminated exercised the
>   unpack-objects code path, not index-pack, again nothing I wanted to let
>   delay this patch series any further).

To clarify: I am planning to address these issues later this year, but
consider them not critical enough to finish the fsck-tag patch series.

Please let me know if you disagree.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-09-10 13:52   ` [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
@ 2014-09-10 17:43     ` Junio C Hamano
  2014-09-11 11:59       ` Johannes Schindelin
  2014-09-10 20:45     ` Eric Sunshine
  1 sibling, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-10 17:43 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> So far, we assumed that the buffer is NUL terminated, but this is not
> a safe assumption, now that we opened the fsck_object() API to pass a
> buffer directly.
>
> So let's make sure that there is at least an empty line in the buffer.
> That way, our checks would fail if the empty line was encountered
> prematurely, and consequently we can get away with the current string
> comparisons even with non-NUL-terminated buffers are passed to
> fsck_object().
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  fsck.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/fsck.c b/fsck.c
> index dd77628..9dd7d12 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
>  	return retval;
>  }
>  
> +static int require_end_of_header(const void *data, unsigned long size,
> +	struct object *obj, fsck_error error_func)
> +{
> +	const char *buffer = (const char *)data;
> +	int i;
> +
> +	for (i = 0; i < size; i++) {
> +		switch (buffer[i]) {
> +		case '\0':
> +			return error_func(obj, FSCK_ERROR,
> +				"invalid message: NUL at offset %d", i);

Isn't this "invalid header"?  After all we haven't escaped this loop
and haven't seen the message part of the commit object (and it is
the same if you are going to later reuse this for tag objects).

> +		case '\n':
> +			if (i + 1 < size && buffer[i + 1] == '\n')
> +				return 0;
> +		}
> +	}
> +
> +	return error_func(obj, FSCK_ERROR, "invalid buffer: missing empty line");

I think you meant "missing end of header" here.

Other than these small nits, the patch looks good from a cursory
read.

> +}
> +
>  static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
>  {
>  	char *end;
> @@ -284,6 +304,9 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>  	unsigned parent_count, parent_line_count = 0;
>  	int err;
>  
> +	if (require_end_of_header(buffer, size, &commit->object, error_func))
> +		return -1;
> +
>  	if (!skip_prefix(buffer, "tree ", &buffer))
>  		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
>  	if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 4/6] fsck: check tag objects' headers
  2014-09-10 13:53   ` [PATCH v2 4/6] fsck: check tag objects' headers Johannes Schindelin
@ 2014-09-10 17:52     ` Junio C Hamano
  0 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-09-10 17:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> +	if (!skip_prefix(buffer, "tag ", &buffer)) {
> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'tag' line");
> +		goto done;
> +	}
> +	eol = strchr(buffer, '\n');
> +	if (!eol) {
> +		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
> +		goto done;
> +	}
> +	strbuf_addf(&sb, "refs/tags/%.*s", (int)(eol - buffer), buffer);
> +	if (check_refname_format(sb.buf, 0))
> +		error_func(&tag->object, FSCK_WARN, "invalid 'tag' name: %s", buffer);
> +	buffer = eol + 1;
> +
> +	if (!skip_prefix(buffer, "tagger ", &buffer)) {
> +		/* early tags do not contain 'tagger' lines; warn only */
> +		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");
> +	}
> +	ret = fsck_ident(&buffer, &tag->object, error_func);

Shouldn't this relate to the above conditional, to skip ident check
on early tags that lack tagger lines?

> +
> +done:
> +	free(to_free);

strbuf_release(&sb) here?

> +	return ret;
> +}
> +

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/6] Add regression tests for stricter tag fsck'ing
  2014-09-10 13:53   ` [PATCH v2 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
@ 2014-09-10 17:56     ` Junio C Hamano
  2014-09-11 14:15       ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-10 17:56 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> +	test_when_finished "git update-ref -d refs/tags/wrong" &&
> +	git fsck --tags 2>out &&

I wonder what the command does with or without --tags option
(applies to both tests added by this patch)?

Does running "fsck" without the option not to report broken tags
detected by the new checks added in this series?  Should it?

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-09-10 13:52   ` [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
  2014-09-10 17:43     ` Junio C Hamano
@ 2014-09-10 20:45     ` Eric Sunshine
  1 sibling, 0 replies; 68+ messages in thread
From: Eric Sunshine @ 2014-09-10 20:45 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Git List

On Wed, Sep 10, 2014 at 9:52 AM, Johannes Schindelin
<johannes.schindelin@gmx.de> wrote:
> So far, we assumed that the buffer is NUL terminated, but this is not
> a safe assumption, now that we opened the fsck_object() API to pass a
> buffer directly.
>
> So let's make sure that there is at least an empty line in the buffer.
> That way, our checks would fail if the empty line was encountered
> prematurely, and consequently we can get away with the current string
> comparisons even with non-NUL-terminated buffers are passed to
> fsck_object().
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  fsck.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/fsck.c b/fsck.c
> index dd77628..9dd7d12 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
>         return retval;
>  }
>
> +static int require_end_of_header(const void *data, unsigned long size,
> +       struct object *obj, fsck_error error_func)
> +{
> +       const char *buffer = (const char *)data;
> +       int i;
> +
> +       for (i = 0; i < size; i++) {

Should 'i' have type 'unsigned long', rather than 'int', to be
consistent with the type of 'size'?

> +               switch (buffer[i]) {
> +               case '\0':
> +                       return error_func(obj, FSCK_ERROR,
> +                               "invalid message: NUL at offset %d", i);
> +               case '\n':
> +                       if (i + 1 < size && buffer[i + 1] == '\n')
> +                               return 0;
> +               }
> +       }
> +
> +       return error_func(obj, FSCK_ERROR, "invalid buffer: missing empty line");
> +}
> +
>  static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
>  {
>         char *end;
> @@ -284,6 +304,9 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
>         unsigned parent_count, parent_line_count = 0;
>         int err;
>
> +       if (require_end_of_header(buffer, size, &commit->object, error_func))
> +               return -1;
> +
>         if (!skip_prefix(buffer, "tree ", &buffer))
>                 return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
>         if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
> --
> 2.0.0.rc3.9669.g840d1f9
>
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-10 13:52 ` [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  2014-09-10 13:58   ` Johannes Schindelin
@ 2014-09-10 21:07   ` Junio C Hamano
  2014-09-10 21:31     ` Junio C Hamano
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
  2 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-10 21:07 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> This patch series introduces detailed checking of tag objects when calling
> git fsck, and also when transfer.fsckobjects is set to true.
>
> To this end, the fsck machinery is reworked to accept the buffer and size
> of the object to check, and for commit and tag objects, we verify that the
> buffers contain an end of header (i.e. an empty line) to guarantee that our
> checks do not run beyond the buffer.

Overall they looked sensible and I am trying to queue them on 'pu'
for today's pushout.

Thanks.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-10 21:07   ` Junio C Hamano
@ 2014-09-10 21:31     ` Junio C Hamano
  2014-09-11 14:20       ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-10 21:31 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
>
>> This patch series introduces detailed checking of tag objects when calling
>> git fsck, and also when transfer.fsckobjects is set to true.
>>
>> To this end, the fsck machinery is reworked to accept the buffer and size
>> of the object to check, and for commit and tag objects, we verify that the
>> buffers contain an end of header (i.e. an empty line) to guarantee that our
>> checks do not run beyond the buffer.
>
> Overall they looked sensible and I am trying to queue them on 'pu'
> for today's pushout.
>
> Thanks.

I was expecting to see interesting interactions with the "oops, fsck
was not exiting with non-zero status in some cases" fix by Peff with
the patch 5/6 of this series that adds two new tests to t1450, but
because the breakage in these two cases are both only warning-events
and not error-events, I didn't prefix "git fsck" with test_must_fail
after all.

Are there some more serious kind of breakages that the new code
actually catches as errors which the new tests are not exercising?

The second test however did need the fix I mentioned in the review
to skip ident validation when we know the buffer does not point at a
potential ident to pass, though (see below, which is what I queued
on the tip of your series as a "SQUASH???" single fix-up patch).

Please do an equivalent in the individual patches that introduce
these buglets in a reroll (i.e. not like I did here, which was done
this way only because I needed to make sure that day's integration
result passes the test suite with minimum effort on my end ;-)).

Thanks.

 fsck.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/fsck.c b/fsck.c
index 8341a30..6e6ea25 100644
--- a/fsck.c
+++ b/fsck.c
@@ -248,14 +248,14 @@ static int require_end_of_header(const void *data, unsigned long size,
 		switch (buffer[i]) {
 		case '\0':
 			return error_func(obj, FSCK_ERROR,
-				"invalid message: NUL at offset %d", i);
+				"invalid header: NUL at offset %d", i);
 		case '\n':
 			if (i + 1 < size && buffer[i + 1] == '\n')
 				return 0;
 		}
 	}
 
-	return error_func(obj, FSCK_ERROR, "invalid buffer: missing empty line");
+	return error_func(obj, FSCK_ERROR, "invalid buffer: missing end of header");
 }
 
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
@@ -426,14 +426,16 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 		error_func(&tag->object, FSCK_WARN, "invalid 'tag' name: %s", buffer);
 	buffer = eol + 1;
 
-	if (!skip_prefix(buffer, "tagger ", &buffer)) {
+	if (!skip_prefix(buffer, "tagger ", &buffer))
 		/* early tags do not contain 'tagger' lines; warn only */
-		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");
-	}
-	ret = fsck_ident(&buffer, &tag->object, error_func);
+		error_func(&tag->object, FSCK_WARN,
+			   "invalid format - expected 'tagger' line");
+	else
+		ret = fsck_ident(&buffer, &tag->object, error_func);
 
 done:
 	free(to_free);
+	strbuf_release(&sb);
 	return ret;
 }
 
-- 
2.1.0-451-g8daadf2

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects
  2014-09-10 13:53   ` [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects Johannes Schindelin
@ 2014-09-10 21:54     ` Junio C Hamano
  2014-09-11 14:22       ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-10 21:54 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> +test_expect_success 'index-pack --strict fails upon invalid tag' '
> +    sha=$(git rev-parse HEAD) &&
> +    cat >wrong-tag <<EOF &&
> +object $sha
> +type commit
> +tag guten tag
> +
> +This is an invalid tag.
> +EOF

Missing tagger is merely a warning event (thanks for a good comment
in the code, by the way, to explain the reason why it is so is
because of annotated tags in earlier versions of Git).  tag names
that are not ref-formed is only a warning event in this reroll.

So this is not really "invalid" in the sense that index-pack can
notice it as an error, no?

> +
> +    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
> +    pack1=$(echo $tag | git pack-objects tag-test) &&
> +    echo remove tag object &&
> +    thirtyeight=${tag#??} &&
> +    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
> +    test_must_fail git index-pack --strict tag-test-${pack1}.pack 2> err &&

I had to drop "must fail" from this one (will amend the "SQUASH???"
again).

Perhaps you would want to add a real error, e.g. a tag with tagger
whose ident does not pass fsck-ident or something?

> +    grep "invalid .tag. name" err
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-09-10 17:43     ` Junio C Hamano
@ 2014-09-11 11:59       ` Johannes Schindelin
  2014-09-11 16:49         ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 11:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Wed, 10 Sep 2014, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
> > diff --git a/fsck.c b/fsck.c
> > index dd77628..9dd7d12 100644
> > --- a/fsck.c
> > +++ b/fsck.c
> > @@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
> >  	return retval;
> >  }
> >  
> > +static int require_end_of_header(const void *data, unsigned long size,
> > +	struct object *obj, fsck_error error_func)
> > +{
> > +	const char *buffer = (const char *)data;
> > +	int i;
> > +
> > +	for (i = 0; i < size; i++) {
> > +		switch (buffer[i]) {
> > +		case '\0':
> > +			return error_func(obj, FSCK_ERROR,
> > +				"invalid message: NUL at offset %d", i);
> 
> Isn't this "invalid header"?  After all we haven't escaped this loop
> and haven't seen the message part of the commit object (and it is
> the same if you are going to later reuse this for tag objects).

My reasoning for keeping it saying "message" was that a message consists
of a header and a body. I will change it to "unterminated header" instead,
also in the error message when no NUL was found.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 5/6] Add regression tests for stricter tag fsck'ing
  2014-09-10 17:56     ` Junio C Hamano
@ 2014-09-11 14:15       ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Wed, 10 Sep 2014, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
> > +	test_when_finished "git update-ref -d refs/tags/wrong" &&
> > +	git fsck --tags 2>out &&
> 
> I wonder what the command does with or without --tags option
> (applies to both tests added by this patch)?
> 
> Does running "fsck" without the option not to report broken tags
> detected by the new checks added in this series?  Should it?

fsck fails perfectly fine without the --tags option; this option just
triggers that fsck will show also the objects referenced by the tag
objects (which is nice for debugging).

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-10 21:31     ` Junio C Hamano
@ 2014-09-11 14:20       ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:20 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Wed, 10 Sep 2014, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> >
> >> This patch series introduces detailed checking of tag objects when calling
> >> git fsck, and also when transfer.fsckobjects is set to true.
> >>
> >> To this end, the fsck machinery is reworked to accept the buffer and size
> >> of the object to check, and for commit and tag objects, we verify that the
> >> buffers contain an end of header (i.e. an empty line) to guarantee that our
> >> checks do not run beyond the buffer.
> >
> > Overall they looked sensible and I am trying to queue them on 'pu'
> > for today's pushout.
> >
> > Thanks.
> 
> I was expecting to see interesting interactions with the "oops, fsck
> was not exiting with non-zero status in some cases" fix by Peff with
> the patch 5/6 of this series that adds two new tests to t1450, but
> because the breakage in these two cases are both only warning-events
> and not error-events, I didn't prefix "git fsck" with test_must_fail
> after all.

There were a couple of issues, thanks for pointing out that I missed
something. Short story: hash-object, fsck *and* pack-objects parse tag
objects as they are encountered. Therefore, it would be a bit hard to test
because we would have to emulate broken hash-object and pack-objects to
generate a pack containing an invalid tag object.

As a compromise, I now test for the warnings to make sure that the code
path is triggered, but do not explicitly test with a pack that would make
index-pack --strict *fail*.

Okay?

Ciao,
Dscho

P.S.: I squashed your changes in slightly modified form.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects
  2014-09-10 21:54     ` Junio C Hamano
@ 2014-09-11 14:22       ` Johannes Schindelin
  2014-09-11 16:50         ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Wed, 10 Sep 2014, Junio C Hamano wrote:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
> > +    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
> > +    pack1=$(echo $tag | git pack-objects tag-test) &&
> > +    echo remove tag object &&
> > +    thirtyeight=${tag#??} &&
> > +    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
> > +    test_must_fail git index-pack --strict tag-test-${pack1}.pack 2> err &&
> 
> I had to drop "must fail" from this one (will amend the "SQUASH???"
> again).

Funny. It failed here, but for the wrong reason: index-pack --strict
failed here because the object referenced by the tag object was not in the
pack.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-10 13:52 ` [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  2014-09-10 13:58   ` Johannes Schindelin
  2014-09-10 21:07   ` Junio C Hamano
@ 2014-09-11 14:26   ` Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
                       ` (6 more replies)
  2 siblings, 7 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:26 UTC (permalink / raw)
  To: gitster; +Cc: git

This patch series introduces detailed checking of tag objects when calling
git fsck, and also when transfer.fsckobjects is set to true.

To this end, the fsck machinery is reworked to accept the buffer and size
of the object to check, and for commit and tag objects, we verify that the
buffers contain an end of header (i.e. an empty line) to guarantee that our
checks do not run beyond the buffer.

This work was sponsored by GitHub.

Changes since v2:

- replaced 'invalid message' with 'unterminated header'

- avoided comparison between int and unsigned long (thanks, Eric Sunshine)

- made ident parsing conditional on finding the optional 'tagger' line

- added forgotten strbuf_release()

Still unaddressed:

- getting rid of struct object altogether in fsck (I felt this was quite a big
  task, getting much more familiar with the non-tag code paths, and I did not
  want to delay this patch series up any further)

- ensuring that index-pack passes only NUL-terminated buffers to fsck (again,
  I am not familiar enough with the code, and IIRC the problematic unit test
  that revealed that these buffers are not always NUL-terminated exercised the
  unpack-objects code path, not index-pack, again nothing I wanted to let
  delay this patch series any further).

Johannes Schindelin (6):
  Refactor type_from_string() to avoid die()ing in case of errors
  Accept object data in the fsck_object() function
  Make sure fsck_commit_buffer() does not run out of the buffer
  fsck: check tag objects' headers
  Add regression tests for stricter tag fsck'ing
  Make sure that index-pack --strict checks tag objects

 builtin/fsck.c           |   2 +-
 builtin/index-pack.c     |   3 +-
 builtin/unpack-objects.c |  14 +++--
 fsck.c                   | 133 +++++++++++++++++++++++++++++++++++++++++++----
 fsck.h                   |   4 +-
 object.c                 |  11 +++-
 object.h                 |   3 +-
 t/t1450-fsck.sh          |  20 +++++++
 t/t5302-pack-index.sh    |  19 +++++++
 9 files changed, 189 insertions(+), 20 deletions(-)

-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v3 1/6] Refactor type_from_string() to avoid die()ing in case of errors
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
@ 2014-09-11 14:26     ` Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 2/6] Accept object data in the fsck_object() function Johannes Schindelin
                       ` (5 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:26 UTC (permalink / raw)
  To: gitster; +Cc: git

In the next commits, we will enhance the fsck_tag() function to check
tag objects more thoroughly. To this end, we need a function to verify
that a given string is a valid object type, but that does not die() in
the negative case.

While at it, prepare type_from_string() for counted strings, i.e. strings
with an explicitly specified length rather than a NUL termination.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object.c | 11 +++++++++--
 object.h |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index a16b9f9..aedac24 100644
--- a/object.c
+++ b/object.c
@@ -33,13 +33,20 @@ const char *typename(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string(const char *str)
+int type_from_string_gently(const char *str, ssize_t len, int gentle)
 {
 	int i;
 
+	if (len < 0)
+		len = strlen(str);
+
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
-		if (!strcmp(str, object_type_strings[i]))
+		if (!strncmp(str, object_type_strings[i], len))
 			return i;
+
+	if (gentle)
+		return -1;
+
 	die("invalid object type \"%s\"", str);
 }
 
diff --git a/object.h b/object.h
index 5e8d8ee..e028ced 100644
--- a/object.h
+++ b/object.h
@@ -53,7 +53,8 @@ struct object {
 };
 
 extern const char *typename(unsigned int type);
-extern int type_from_string(const char *str);
+extern int type_from_string_gently(const char *str, ssize_t, int gentle);
+#define type_from_string(str) type_from_string_gently(str, -1, 0)
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 2/6] Accept object data in the fsck_object() function
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
@ 2014-09-11 14:26     ` Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:26 UTC (permalink / raw)
  To: gitster; +Cc: git

When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fsck.c           |  2 +-
 builtin/index-pack.c     |  3 ++-
 builtin/unpack-objects.c | 14 ++++++++++----
 fsck.c                   | 24 +++++++++++++++---------
 fsck.h                   |  4 +++-
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d42a27d..d9f4e6e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -298,7 +298,7 @@ static int fsck_obj(struct object *obj)
 
 	if (fsck_walk(obj, mark_used, NULL))
 		objerror(obj, "broken links");
-	if (fsck_object(obj, check_strict, fsck_error_func))
+	if (fsck_object(obj, NULL, 0, check_strict, fsck_error_func))
 		return -1;
 
 	if (obj->type == OBJ_TREE) {
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5568a5b..f2465ff 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -773,7 +773,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			if (!obj)
 				die(_("invalid %s"), typename(type));
 			if (do_fsck_object &&
-			    fsck_object(obj, 1, fsck_error_function))
+			    fsck_object(obj, buf, size, 1,
+				    fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 99cde45..855d94b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -164,10 +164,10 @@ static unsigned nr_objects;
  * Called only from check_object() after it verified this object
  * is Ok.
  */
-static void write_cached_object(struct object *obj)
+static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
 {
 	unsigned char sha1[20];
-	struct obj_buffer *obj_buf = lookup_object_buffer(obj);
+
 	if (write_sha1_file(obj_buf->buffer, obj_buf->size, typename(obj->type), sha1) < 0)
 		die("failed to write object %s", sha1_to_hex(obj->sha1));
 	obj->flags |= FLAG_WRITTEN;
@@ -180,6 +180,8 @@ static void write_cached_object(struct object *obj)
  */
 static int check_object(struct object *obj, int type, void *data)
 {
+	struct obj_buffer *obj_buf;
+
 	if (!obj)
 		return 1;
 
@@ -198,11 +200,15 @@ static int check_object(struct object *obj, int type, void *data)
 		return 0;
 	}
 
-	if (fsck_object(obj, 1, fsck_error_function))
+	obj_buf = lookup_object_buffer(obj);
+	if (!obj_buf)
+		die("Whoops! Cannot find object '%s'", sha1_to_hex(obj->sha1));
+	if (fsck_object(obj, obj_buf->buffer, obj_buf->size, 1,
+			fsck_error_function))
 		die("Error in object");
 	if (fsck_walk(obj, check_object, NULL))
 		die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
-	write_cached_object(obj);
+	write_cached_object(obj, obj_buf);
 	return 0;
 }
 
diff --git a/fsck.c b/fsck.c
index 56156ff..dd77628 100644
--- a/fsck.c
+++ b/fsck.c
@@ -277,7 +277,7 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
 }
 
 static int fsck_commit_buffer(struct commit *commit, const char *buffer,
-			      fsck_error error_func)
+	unsigned long size, fsck_error error_func)
 {
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
@@ -322,15 +322,18 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, fsck_error error_func)
+static int fsck_commit(struct commit *commit, const char *data,
+	unsigned long size, fsck_error error_func)
 {
-	const char *buffer = get_commit_buffer(commit, NULL);
-	int ret = fsck_commit_buffer(commit, buffer, error_func);
-	unuse_commit_buffer(commit, buffer);
+	const char *buffer = data ?  data : get_commit_buffer(commit, &size);
+	int ret = fsck_commit_buffer(commit, buffer, size, error_func);
+	if (!data)
+		unuse_commit_buffer(commit, buffer);
 	return ret;
 }
 
-static int fsck_tag(struct tag *tag, fsck_error error_func)
+static int fsck_tag(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
 {
 	struct object *tagged = tag->tagged;
 
@@ -339,7 +342,8 @@ static int fsck_tag(struct tag *tag, fsck_error error_func)
 	return 0;
 }
 
-int fsck_object(struct object *obj, int strict, fsck_error error_func)
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func)
 {
 	if (!obj)
 		return error_func(obj, FSCK_ERROR, "no valid object to fsck");
@@ -349,9 +353,11 @@ int fsck_object(struct object *obj, int strict, fsck_error error_func)
 	if (obj->type == OBJ_TREE)
 		return fsck_tree((struct tree *) obj, strict, error_func);
 	if (obj->type == OBJ_COMMIT)
-		return fsck_commit((struct commit *) obj, error_func);
+		return fsck_commit((struct commit *) obj, (const char *) data,
+			size, error_func);
 	if (obj->type == OBJ_TAG)
-		return fsck_tag((struct tag *) obj, error_func);
+		return fsck_tag((struct tag *) obj, (const char *) data,
+			size, error_func);
 
 	return error_func(obj, FSCK_ERROR, "unknown type '%d' (internal fsck error)",
 			  obj->type);
diff --git a/fsck.h b/fsck.h
index 1e4f527..d1e6387 100644
--- a/fsck.h
+++ b/fsck.h
@@ -28,6 +28,8 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...);
  *    0		everything OK
  */
 int fsck_walk(struct object *obj, fsck_walk_func walk, void *data);
-int fsck_object(struct object *obj, int strict, fsck_error error_func);
+/* If NULL is passed for data, we assume the object is local and read it. */
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func);
 
 #endif
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 2/6] Accept object data in the fsck_object() function Johannes Schindelin
@ 2014-09-11 14:26     ` Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 4/6] fsck: check tag objects' headers Johannes Schindelin
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:26 UTC (permalink / raw)
  To: gitster; +Cc: git

So far, we assumed that the buffer is NUL terminated, but this is not
a safe assumption, now that we opened the fsck_object() API to pass a
buffer directly.

So let's make sure that there is at least an empty line in the buffer.
That way, our checks would fail if the empty line was encountered
prematurely, and consequently we can get away with the current string
comparisons even with non-NUL-terminated buffers are passed to
fsck_object().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fsck.c b/fsck.c
index dd77628..73da6f8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
+static int require_end_of_header(const void *data, unsigned long size,
+	struct object *obj, fsck_error error_func)
+{
+	const char *buffer = (const char *)data;
+	unsigned long i;
+
+	for (i = 0; i < size; i++) {
+		switch (buffer[i]) {
+		case '\0':
+			return error_func(obj, FSCK_ERROR,
+				"unterminated header: NUL at offset %d", i);
+		case '\n':
+			if (i + 1 < size && buffer[i + 1] == '\n')
+				return 0;
+		}
+	}
+
+	return error_func(obj, FSCK_ERROR, "unterminated header");
+}
+
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
 {
 	char *end;
@@ -284,6 +304,9 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	unsigned parent_count, parent_line_count = 0;
 	int err;
 
+	if (require_end_of_header(buffer, size, &commit->object, error_func))
+		return -1;
+
 	if (!skip_prefix(buffer, "tree ", &buffer))
 		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
 	if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 4/6] fsck: check tag objects' headers
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
                       ` (2 preceding siblings ...)
  2014-09-11 14:26     ` [PATCH v3 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
@ 2014-09-11 14:26     ` Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
                       ` (2 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:26 UTC (permalink / raw)
  To: gitster; +Cc: git

We inspect commit objects pretty much in detail in git-fsck, but we just
glanced over the tag objects. Let's be stricter.

Since we do not want to limit 'tag' lines unduly, values that would fail
the refname check only result in warnings, not errors.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index 73da6f8..2fffa43 100644
--- a/fsck.c
+++ b/fsck.c
@@ -6,6 +6,7 @@
 #include "commit.h"
 #include "tag.h"
 #include "fsck.h"
+#include "refs.h"
 
 static int fsck_walk_tree(struct tree *tree, fsck_walk_func walk, void *data)
 {
@@ -355,6 +356,88 @@ static int fsck_commit(struct commit *commit, const char *data,
 	return ret;
 }
 
+static int fsck_tag_buffer(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
+{
+	unsigned char sha1[20];
+	int ret = 0;
+	const char *buffer;
+	char *to_free = NULL, *eol;
+	struct strbuf sb = STRBUF_INIT;
+
+	if (data)
+		buffer = data;
+	else {
+		enum object_type type;
+
+		buffer = to_free =
+			read_sha1_file(tag->object.sha1, &type, &size);
+		if (!buffer)
+			return error_func(&tag->object, FSCK_ERROR,
+				"cannot read tag object");
+
+		if (type != OBJ_TAG) {
+			ret = error_func(&tag->object, FSCK_ERROR,
+				"expected tag got %s",
+			    typename(type));
+			goto done;
+		}
+	}
+
+	if (require_end_of_header(buffer, size, &tag->object, error_func))
+		goto done;
+
+	if (!skip_prefix(buffer, "object ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'object' line");
+		goto done;
+	}
+	if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'object' line format - bad sha1");
+		goto done;
+	}
+	buffer += 41;
+
+	if (!skip_prefix(buffer, "type ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'type' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	if (type_from_string_gently(buffer, eol - buffer, 1) < 0)
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'type' value");
+	if (ret)
+		goto done;
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tag ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'tag' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	strbuf_addf(&sb, "refs/tags/%.*s", (int)(eol - buffer), buffer);
+	if (check_refname_format(sb.buf, 0))
+		error_func(&tag->object, FSCK_WARN, "invalid 'tag' name: %s", buffer);
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tagger ", &buffer))
+		/* early tags do not contain 'tagger' lines; warn only */
+		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");
+	else
+		ret = fsck_ident(&buffer, &tag->object, error_func);
+
+done:
+	strbuf_release(&sb);
+	free(to_free);
+	return ret;
+}
+
 static int fsck_tag(struct tag *tag, const char *data,
 	unsigned long size, fsck_error error_func)
 {
@@ -362,7 +445,8 @@ static int fsck_tag(struct tag *tag, const char *data,
 
 	if (!tagged)
 		return error_func(&tag->object, FSCK_ERROR, "could not load tagged object");
-	return 0;
+
+	return fsck_tag_buffer(tag, data, size, error_func);
 }
 
 int fsck_object(struct object *obj, void *data, unsigned long size,
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 5/6] Add regression tests for stricter tag fsck'ing
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
                       ` (3 preceding siblings ...)
  2014-09-11 14:26     ` [PATCH v3 4/6] fsck: check tag objects' headers Johannes Schindelin
@ 2014-09-11 14:26     ` Johannes Schindelin
  2014-09-11 14:26     ` [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:26 UTC (permalink / raw)
  To: gitster; +Cc: git

The intent of the new test case is to catch general breakages in
the fsck_tag() function, not so much to test it extensively, trying to
strike the proper balance between thoroughness and speed.

While it *would* have been nice to test the code path where fsck_object()
encounters an invalid tag object, this is not possible using git fsck: tag
objects are parsed already before fsck'ing (and the parser already fails
upon such objects).

Even worse: we would not even be able write out invalid tag objects
because git hash-object parses those objects, too, unless we resorted to
really ugly hacks such as using something like this in the unit tests
(essentially depending on Perl *and* Compress::Zlib):

	hash_invalid_object () {
		contents="$(printf '%s %d\0%s' "$1" ${#2} "$2")" &&
		sha1=$(echo "$contents" | test-sha1) &&
		suffix=${sha1#??} &&
		mkdir -p .git/objects/${sha1%$suffix} &&
		echo "$contents" |
		perl -MCompress::Zlib -e 'undef $/; print compress(<>)' \
			> .git/objects/${sha1%$suffix}/$suffix &&
		echo $sha1
	}

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1450-fsck.sh | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8c739c9..2b6a6f2 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -194,6 +194,26 @@ test_expect_success 'tag pointing to something else than its type' '
 	test_must_fail git fsck --tags
 '
 
+test_expect_success 'tag with incorrect tag name & missing tagger' '
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<-EOF &&
+	object $sha
+	type commit
+	tag wrong name format
+
+	This is an invalid tag.
+	EOF
+
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	test_when_finished "remove_object $tag" &&
+	echo $tag >.git/refs/tags/wrong &&
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	git fsck --tags 2>out &&
+	cat out &&
+	grep "invalid .tag. name" out &&
+	grep "expected .tagger. line" out
+'
+
 test_expect_success 'cleaned up' '
 	git fsck >actual 2>&1 &&
 	test_cmp empty actual
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
                       ` (4 preceding siblings ...)
  2014-09-11 14:26     ` [PATCH v3 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
@ 2014-09-11 14:26     ` Johannes Schindelin
  2014-09-11 17:58       ` Junio C Hamano
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  6 siblings, 1 reply; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 14:26 UTC (permalink / raw)
  To: gitster; +Cc: git

One of the most important use cases for the strict tag object checking
is when transfer.fsckobjects is set to true to catch invalid objects
early on. This new regression test essentially tests the same code path
by directly calling 'index-pack --strict' on a pack containing an
tag object without a 'tagger' line.

Technically, this test is not enough: it only exercises a code path that
*warns*, not one that *fails*. The reason is that it would be exquisitely
convoluted to test that: not only hash-object, but also pack-index
actually *parse* tag objects when encountering them. Therefore we would
have to actively *break* pack-index in order to test this. Or rewrite
both hash-object and pack-index in shell script. Ain't gonna happen.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5302-pack-index.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index 4bbb718..4d033df 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -243,4 +243,23 @@ test_expect_success 'running index-pack in the object store' '
     test -f .git/objects/pack/pack-${pack1}.idx
 '
 
+test_expect_success 'index-pack --strict warns upon missing tagger in tag' '
+    sha=$(git rev-parse HEAD) &&
+    cat >wrong-tag <<EOF &&
+object $sha
+type commit
+tag guten tag
+
+This is an invalid tag.
+EOF
+
+    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+    pack1=$(echo $tag $sha | git pack-objects tag-test) &&
+    echo remove tag object &&
+    thirtyeight=${tag#??} &&
+    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
+    git index-pack --strict tag-test-${pack1}.pack 2> err &&
+    grep "^error:.* expected .tagger. line" err
+'
+
 test_done
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-09-11 11:59       ` Johannes Schindelin
@ 2014-09-11 16:49         ` Junio C Hamano
  0 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 16:49 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > +	for (i = 0; i < size; i++) {
>> > +		switch (buffer[i]) {
>> > +		case '\0':
>> > +			return error_func(obj, FSCK_ERROR,
>> > +				"invalid message: NUL at offset %d", i);
>> 
>> Isn't this "invalid header"?  After all we haven't escaped this loop
>> and haven't seen the message part of the commit object (and it is
>> the same if you are going to later reuse this for tag objects).
>
> My reasoning for keeping it saying "message" was that a message consists
> of a header and a body. I will change it to "unterminated header" instead,
> also in the error message when no NUL was found.

Because end users think of a "message" in the context of discussing
either a commit or a tag as what they give as the value to the "-m"
option, the object payload consists of a header and a body the
latter of which *is* the message to them.  Choosing a word that is
not "message" is a good way to avoid potential confusion here.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects
  2014-09-11 14:22       ` Johannes Schindelin
@ 2014-09-11 16:50         ` Junio C Hamano
  2014-09-11 17:04           ` Johannes Schindelin
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 16:50 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Hi,
>
> On Wed, 10 Sep 2014, Junio C Hamano wrote:
>
>> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
>> 
>> > +    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
>> > +    pack1=$(echo $tag | git pack-objects tag-test) &&
>> > +    echo remove tag object &&
>> > +    thirtyeight=${tag#??} &&
>> > +    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
>> > +    test_must_fail git index-pack --strict tag-test-${pack1}.pack 2> err &&
>> 
>> I had to drop "must fail" from this one (will amend the "SQUASH???"
>> again).
>
> Funny. It failed here, but for the wrong reason: index-pack --strict
> failed here because the object referenced by the tag object was not in the
> pack.

That is strange.  It failed with the version you sent to the list
for me for a different reason---it tried to verify the ident that
did not exist in the tested object (which we fixed in the squash).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects
  2014-09-11 16:50         ` Junio C Hamano
@ 2014-09-11 17:04           ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-11 17:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Thu, 11 Sep 2014, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > On Wed, 10 Sep 2014, Junio C Hamano wrote:
> >
> >> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> >> 
> >> > +    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
> >> > +    pack1=$(echo $tag | git pack-objects tag-test) &&
> >> > +    echo remove tag object &&
> >> > +    thirtyeight=${tag#??} &&
> >> > +    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
> >> > +    test_must_fail git index-pack --strict tag-test-${pack1}.pack 2> err &&
> >> 
> >> I had to drop "must fail" from this one (will amend the "SQUASH???"
> >> again).
> >
> > Funny. It failed here, but for the wrong reason: index-pack --strict
> > failed here because the object referenced by the tag object was not in the
> > pack.
> 
> That is strange.  It failed with the version you sent to the list
> for me for a different reason---it tried to verify the ident that
> did not exist in the tested object (which we fixed in the squash).

Hmm. It is very well possible that I ran the tests in the middle of a
rebase, i.e. without my changes to t5302. Will pay more attention next
time, sorry!

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects
  2014-09-11 14:26     ` [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
@ 2014-09-11 17:58       ` Junio C Hamano
  2014-09-11 21:16         ` Junio C Hamano
  0 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 17:58 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff King

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> One of the most important use cases for the strict tag object checking
> is when transfer.fsckobjects is set to true to catch invalid objects
> early on. This new regression test essentially tests the same code path
> by directly calling 'index-pack --strict' on a pack containing an
> tag object without a 'tagger' line.
>
> Technically, this test is not enough: it only exercises a code path that
> *warns*, not one that *fails*. The reason is that it would be exquisitely
> convoluted to test that: not only hash-object, but also pack-index
> actually *parse* tag objects when encountering them. Therefore we would
> have to actively *break* pack-index in order to test this. Or rewrite
> both hash-object and pack-index in shell script. Ain't gonna happen.

It is perfectly OK to feel and even say "I am not going to do that
in this series" here, but is not very welcome to cast such a
negative "Ain't gonna happen." attitude in stone in the log message
in our history.

When our toolset has become too tight without leaving enough escape
hatch to hinder further development, it is very sensible to at least
think about adding a new "--for-debug" option to hash-object and
pack-objects that allows us to deliberately create broken
datastreams to test against.

I think peff recently added helpers to t5303 to allow us test
corrupt pack streams, which is a way different from what you
envisioned above (i.e. "actively break pack-index") to test
breakages like the ones you were trying to test here.

Having said all that, I do think the series that added new warnings,
errors and a test for the new warnings is an improvement without a
test for the new errors.  So let's queue this, see if somebody comes
up a way to check for these error detection codepath on top of this
series.

Thanks.

> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t5302-pack-index.sh | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
> index 4bbb718..4d033df 100755
> --- a/t/t5302-pack-index.sh
> +++ b/t/t5302-pack-index.sh
> @@ -243,4 +243,23 @@ test_expect_success 'running index-pack in the object store' '
>      test -f .git/objects/pack/pack-${pack1}.idx
>  '
>  
> +test_expect_success 'index-pack --strict warns upon missing tagger in tag' '
> +    sha=$(git rev-parse HEAD) &&
> +    cat >wrong-tag <<EOF &&
> +object $sha
> +type commit
> +tag guten tag
> +
> +This is an invalid tag.
> +EOF
> +
> +    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
> +    pack1=$(echo $tag $sha | git pack-objects tag-test) &&
> +    echo remove tag object &&
> +    thirtyeight=${tag#??} &&
> +    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
> +    git index-pack --strict tag-test-${pack1}.pack 2> err &&

s/2> err/2>err/;

> +    grep "^error:.* expected .tagger. line" err
> +'
> +
>  test_done

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects
  2014-09-11 17:58       ` Junio C Hamano
@ 2014-09-11 21:16         ` Junio C Hamano
  2014-09-11 21:17           ` [PATCH 0/3] hash-object --literally Junio C Hamano
  2014-09-12  8:04           ` [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
  0 siblings, 2 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 21:16 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> When our toolset has become too tight without leaving enough escape
> hatch to hinder further development, it is very sensible to at least
> think about adding a new "--for-debug" option to hash-object and
> pack-objects that allows us to deliberately create broken
> datastreams to test against.
>
> I think peff recently added helpers to t5303 to allow us test
> corrupt pack streams, which is a way different from what you
> envisioned above (i.e. "actively break pack-index") to test
> breakages like the ones you were trying to test here.
>
> Having said all that, I do think the series that added new warnings,
> errors and a test for the new warnings is an improvement without a
> test for the new errors.  So let's queue this, see if somebody comes
> up a way to check for these error detection codepath on top of this
> series.

It wasn't too painful to do one of them, and the result looks rather
nice.  It lets us add this patch on top of your series.

-- >8 --
Subject: [PATCH] t1450: make sure fsck detects a malformed tagger line

With "hash-object --literally", write a tag object that is not
supposed to pass one of the new checks added to "fsck", and make
sure that the new check catches the breakage.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t1450-fsck.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 9118253..6ecb844 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -234,6 +234,25 @@ test_expect_success 'tag with incorrect tag name & missing tagger' '
 	grep "expected .tagger. line" out
 '
 
+test_expect_success 'tag with bad tagger' '
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<-EOF &&
+	object $sha
+	type commit
+	tag not-quite-wrong
+	tagger Bad Tagger Name
+
+	This is an invalid tag.
+	EOF
+
+	tag=$(git hash-object --literally -t tag -w --stdin <wrong-tag) &&
+	test_when_finished "remove_object $tag" &&
+	echo $tag >.git/refs/tags/wrong &&
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	test_must_fail git fsck --tags 2>out &&
+	grep "error in tag .*: invalid author/committer" out
+'
+
 test_expect_success 'cleaned up' '
 	git fsck >actual 2>&1 &&
 	test_cmp empty actual
-- 
2.1.0-459-g1bc3b2b

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 0/3] hash-object --literally
  2014-09-11 21:16         ` Junio C Hamano
@ 2014-09-11 21:17           ` Junio C Hamano
  2014-09-11 21:17             ` [PATCH 1/3] hash-object: reduce file-scope statics Junio C Hamano
                               ` (2 more replies)
  2014-09-12  8:04           ` [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
  1 sibling, 3 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 21:17 UTC (permalink / raw)
  To: git

Our toolset may have become too tight without leaving enough escape
hatch to hinder further development.  "hash-object" makes minimum
sanity checks by default for a very good reason, but it means that
we cannot deliberately create broken datastreams to test against
fsck and other codepaths that are supposed to detect and report such
broken data that we may encounter in the field, perhaps created by
third-party tools.

These changes teach a new "--literally" option to "hash-object" to
allow us throw any garbage to create a broken loose object.  You can
even do something horrible like

	git hash-object -t bogus --literally -w --stdin </dev/null
	
by any garbage typename if you wanted to.

It probably needs to be accompanied by "cat-file --literally" that
does not take "-t <type>" option or does not complain even if the
contents look unreasonable.  But that is for another day (hint,
hint).

The second patch is optional.  I thought I may want to pass this as
a new HASH_LITERALLY bit down the callchain to index_fd(), but I
decided against it, as that will allow other codepaths to create
broken datastream, spreading this debugging aid a bit too widely for
my taste.

Junio C Hamano (3):
  hash-object: reduce file-scope statics
  hash-object: pass 'write_object' as a flag
  hash-object: add --literally option

 builtin/hash-object.c | 103 ++++++++++++++++++++++++++++++--------------------
 1 file changed, 61 insertions(+), 42 deletions(-)

-- 
2.1.0-459-g1bc3b2b

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH 1/3] hash-object: reduce file-scope statics
  2014-09-11 21:17           ` [PATCH 0/3] hash-object --literally Junio C Hamano
@ 2014-09-11 21:17             ` Junio C Hamano
  2014-09-11 21:17             ` [PATCH 2/3] hash-object: pass 'write_object' as a flag Junio C Hamano
  2014-09-11 21:17             ` [PATCH 3/3] hash-object: add --literally option Junio C Hamano
  2 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 21:17 UTC (permalink / raw)
  To: git

Most of the knobs that affect helper functions called from
cmd_hash_object() were passed to them as parameters already, and the
only effect of having them as file-scope statics was to make the
reader wonder if the parameters are hiding the file-scope global
values by accident.  Adjust their initialisation and make them
function-local variables.

The only exception was no_filters hash_stdin_paths() peeked from the
file-scope global, which was converted to a parameter to the helper
function.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c | 52 +++++++++++++++++++++++----------------------------
 1 file changed, 23 insertions(+), 29 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index d7fcf4c..40008e2 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -36,9 +36,7 @@ static void hash_object(const char *path, const char *type, int write_object,
 	hash_fd(fd, type, write_object, vpath);
 }
 
-static int no_filters;
-
-static void hash_stdin_paths(const char *type, int write_objects)
+static void hash_stdin_paths(const char *type, int write_objects, int no_filters)
 {
 	struct strbuf buf = STRBUF_INIT, nbuf = STRBUF_INIT;
 
@@ -50,42 +48,38 @@ static void hash_stdin_paths(const char *type, int write_objects)
 			strbuf_swap(&buf, &nbuf);
 		}
 		hash_object(buf.buf, type, write_objects,
-		    no_filters ? NULL : buf.buf);
+			    no_filters ? NULL : buf.buf);
 	}
 	strbuf_release(&buf);
 	strbuf_release(&nbuf);
 }
 
-static const char * const hash_object_usage[] = {
-	N_("git hash-object [-t <type>] [-w] [--path=<file>|--no-filters] [--stdin] [--] <file>..."),
-	N_("git hash-object  --stdin-paths < <list-of-paths>"),
-	NULL
-};
-
-static const char *type;
-static int write_object;
-static int hashstdin;
-static int stdin_paths;
-static const char *vpath;
-
-static const struct option hash_object_options[] = {
-	OPT_STRING('t', NULL, &type, N_("type"), N_("object type")),
-	OPT_BOOL('w', NULL, &write_object, N_("write the object into the object database")),
-	OPT_COUNTUP( 0 , "stdin", &hashstdin, N_("read the object from stdin")),
-	OPT_BOOL( 0 , "stdin-paths", &stdin_paths, N_("read file names from stdin")),
-	OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
-	OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")),
-	OPT_END()
-};
-
 int cmd_hash_object(int argc, const char **argv, const char *prefix)
 {
+	static const char * const hash_object_usage[] = {
+		N_("git hash-object [-t <type>] [-w] [--path=<file>|--no-filters] [--stdin] [--] <file>..."),
+		N_("git hash-object  --stdin-paths < <list-of-paths>"),
+		NULL
+	};
+	const char *type = blob_type;
+	int hashstdin = 0;
+	int stdin_paths = 0;
+	int write_object = 0;
+	int no_filters = 0;
+	const char *vpath = NULL;
+	const struct option hash_object_options[] = {
+		OPT_STRING('t', NULL, &type, N_("type"), N_("object type")),
+		OPT_BOOL('w', NULL, &write_object, N_("write the object into the object database")),
+		OPT_COUNTUP( 0 , "stdin", &hashstdin, N_("read the object from stdin")),
+		OPT_BOOL( 0 , "stdin-paths", &stdin_paths, N_("read file names from stdin")),
+		OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
+		OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")),
+		OPT_END()
+	};
 	int i;
 	int prefix_length = -1;
 	const char *errstr = NULL;
 
-	type = blob_type;
-
 	argc = parse_options(argc, argv, NULL, hash_object_options,
 			     hash_object_usage, 0);
 
@@ -131,7 +125,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	}
 
 	if (stdin_paths)
-		hash_stdin_paths(type, write_object);
+		hash_stdin_paths(type, write_object, no_filters);
 
 	return 0;
 }
-- 
2.1.0-459-g1bc3b2b

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 2/3] hash-object: pass 'write_object' as a flag
  2014-09-11 21:17           ` [PATCH 0/3] hash-object --literally Junio C Hamano
  2014-09-11 21:17             ` [PATCH 1/3] hash-object: reduce file-scope statics Junio C Hamano
@ 2014-09-11 21:17             ` Junio C Hamano
  2014-09-11 21:17             ` [PATCH 3/3] hash-object: add --literally option Junio C Hamano
  2 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 21:17 UTC (permalink / raw)
  To: git

Instead of forcing callers of lower level functions write
(write_object ? HASH_WRITE_OBJECT : 0), prepare the flag to be
passed down in the callchain from the command line parser.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c | 32 +++++++++++++++-----------------
 1 file changed, 15 insertions(+), 17 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 40008e2..1fb07ee 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -10,33 +10,31 @@
 #include "parse-options.h"
 #include "exec_cmd.h"
 
-static void hash_fd(int fd, const char *type, int write_object, const char *path)
+static void hash_fd(int fd, const char *type, const char *path, unsigned flags)
 {
 	struct stat st;
 	unsigned char sha1[20];
-	unsigned flags = (HASH_FORMAT_CHECK |
-			  (write_object ? HASH_WRITE_OBJECT : 0));
 
 	if (fstat(fd, &st) < 0 ||
 	    index_fd(sha1, fd, &st, type_from_string(type), path, flags))
-		die(write_object
+		die((flags & HASH_WRITE_OBJECT)
 		    ? "Unable to add %s to database"
 		    : "Unable to hash %s", path);
 	printf("%s\n", sha1_to_hex(sha1));
 	maybe_flush_or_die(stdout, "hash to stdout");
 }
 
-static void hash_object(const char *path, const char *type, int write_object,
-			const char *vpath)
+static void hash_object(const char *path, const char *type, const char *vpath,
+			unsigned flags)
 {
 	int fd;
 	fd = open(path, O_RDONLY);
 	if (fd < 0)
 		die_errno("Cannot open '%s'", path);
-	hash_fd(fd, type, write_object, vpath);
+	hash_fd(fd, type, vpath, flags);
 }
 
-static void hash_stdin_paths(const char *type, int write_objects, int no_filters)
+static void hash_stdin_paths(const char *type, int no_filters, unsigned flags)
 {
 	struct strbuf buf = STRBUF_INIT, nbuf = STRBUF_INIT;
 
@@ -47,8 +45,7 @@ static void hash_stdin_paths(const char *type, int write_objects, int no_filters
 				die("line is badly quoted");
 			strbuf_swap(&buf, &nbuf);
 		}
-		hash_object(buf.buf, type, write_objects,
-			    no_filters ? NULL : buf.buf);
+		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags);
 	}
 	strbuf_release(&buf);
 	strbuf_release(&nbuf);
@@ -64,12 +61,13 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	const char *type = blob_type;
 	int hashstdin = 0;
 	int stdin_paths = 0;
-	int write_object = 0;
 	int no_filters = 0;
+	unsigned flags = HASH_FORMAT_CHECK;
 	const char *vpath = NULL;
 	const struct option hash_object_options[] = {
 		OPT_STRING('t', NULL, &type, N_("type"), N_("object type")),
-		OPT_BOOL('w', NULL, &write_object, N_("write the object into the object database")),
+		OPT_BIT('w', NULL, &flags, N_("write the object into the object database"),
+			HASH_WRITE_OBJECT),
 		OPT_COUNTUP( 0 , "stdin", &hashstdin, N_("read the object from stdin")),
 		OPT_BOOL( 0 , "stdin-paths", &stdin_paths, N_("read file names from stdin")),
 		OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
@@ -83,7 +81,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	argc = parse_options(argc, argv, NULL, hash_object_options,
 			     hash_object_usage, 0);
 
-	if (write_object) {
+	if (flags & HASH_WRITE_OBJECT) {
 		prefix = setup_git_directory();
 		prefix_length = prefix ? strlen(prefix) : 0;
 		if (vpath && prefix)
@@ -113,19 +111,19 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	}
 
 	if (hashstdin)
-		hash_fd(0, type, write_object, vpath);
+		hash_fd(0, type, vpath, flags);
 
 	for (i = 0 ; i < argc; i++) {
 		const char *arg = argv[i];
 
 		if (0 <= prefix_length)
 			arg = prefix_filename(prefix, prefix_length, arg);
-		hash_object(arg, type, write_object,
-			    no_filters ? NULL : vpath ? vpath : arg);
+		hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg,
+			    flags);
 	}
 
 	if (stdin_paths)
-		hash_stdin_paths(type, write_object, no_filters);
+		hash_stdin_paths(type, no_filters, flags);
 
 	return 0;
 }
-- 
2.1.0-459-g1bc3b2b

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH 3/3] hash-object: add --literally option
  2014-09-11 21:17           ` [PATCH 0/3] hash-object --literally Junio C Hamano
  2014-09-11 21:17             ` [PATCH 1/3] hash-object: reduce file-scope statics Junio C Hamano
  2014-09-11 21:17             ` [PATCH 2/3] hash-object: pass 'write_object' as a flag Junio C Hamano
@ 2014-09-11 21:17             ` Junio C Hamano
  2 siblings, 0 replies; 68+ messages in thread
From: Junio C Hamano @ 2014-09-11 21:17 UTC (permalink / raw)
  To: git

This is allows "hash-object --stdin" to just hash any garbage into a
"loose object" that may not pass the standard object parsing check
or fsck, so that different kind of corrupt objects third-party tools
may create can be imitated in our test suite.  That would in turn
allow us to test features that catch these corrupt objects.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/hash-object.c | 45 ++++++++++++++++++++++++++++++++++++---------
 1 file changed, 36 insertions(+), 9 deletions(-)

diff --git a/builtin/hash-object.c b/builtin/hash-object.c
index 1fb07ee..6158363 100644
--- a/builtin/hash-object.c
+++ b/builtin/hash-object.c
@@ -10,13 +10,36 @@
 #include "parse-options.h"
 #include "exec_cmd.h"
 
-static void hash_fd(int fd, const char *type, const char *path, unsigned flags)
+/*
+ * This is to create corrupt objects for debugging and as such it
+ * needs to bypass the data conversion performed by, and the type
+ * limitation imposed by, index_fd() and its callees.
+ */
+static int hash_literally(unsigned char *sha1, int fd, const char *type, unsigned flags)
+{
+	struct strbuf buf = STRBUF_INIT;
+	int ret;
+
+	if (strbuf_read(&buf, fd, 4096) < 0)
+		ret = -1;
+	else if (flags & HASH_WRITE_OBJECT)
+		ret = write_sha1_file(buf.buf, buf.len, type, sha1);
+	else
+		ret = hash_sha1_file(buf.buf, buf.len, type, sha1);
+	strbuf_release(&buf);
+	return ret;
+}
+
+static void hash_fd(int fd, const char *type, const char *path, unsigned flags,
+		    int literally)
 {
 	struct stat st;
 	unsigned char sha1[20];
 
 	if (fstat(fd, &st) < 0 ||
-	    index_fd(sha1, fd, &st, type_from_string(type), path, flags))
+	    (literally
+	     ? hash_literally(sha1, fd, type, flags)
+	     : index_fd(sha1, fd, &st, type_from_string(type), path, flags)))
 		die((flags & HASH_WRITE_OBJECT)
 		    ? "Unable to add %s to database"
 		    : "Unable to hash %s", path);
@@ -25,16 +48,17 @@ static void hash_fd(int fd, const char *type, const char *path, unsigned flags)
 }
 
 static void hash_object(const char *path, const char *type, const char *vpath,
-			unsigned flags)
+			unsigned flags, int literally)
 {
 	int fd;
 	fd = open(path, O_RDONLY);
 	if (fd < 0)
 		die_errno("Cannot open '%s'", path);
-	hash_fd(fd, type, vpath, flags);
+	hash_fd(fd, type, vpath, flags, literally);
 }
 
-static void hash_stdin_paths(const char *type, int no_filters, unsigned flags)
+static void hash_stdin_paths(const char *type, int no_filters, unsigned flags,
+			     int literally)
 {
 	struct strbuf buf = STRBUF_INIT, nbuf = STRBUF_INIT;
 
@@ -45,7 +69,8 @@ static void hash_stdin_paths(const char *type, int no_filters, unsigned flags)
 				die("line is badly quoted");
 			strbuf_swap(&buf, &nbuf);
 		}
-		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags);
+		hash_object(buf.buf, type, no_filters ? NULL : buf.buf, flags,
+			    literally);
 	}
 	strbuf_release(&buf);
 	strbuf_release(&nbuf);
@@ -62,6 +87,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	int hashstdin = 0;
 	int stdin_paths = 0;
 	int no_filters = 0;
+	int literally = 0;
 	unsigned flags = HASH_FORMAT_CHECK;
 	const char *vpath = NULL;
 	const struct option hash_object_options[] = {
@@ -71,6 +97,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		OPT_COUNTUP( 0 , "stdin", &hashstdin, N_("read the object from stdin")),
 		OPT_BOOL( 0 , "stdin-paths", &stdin_paths, N_("read file names from stdin")),
 		OPT_BOOL( 0 , "no-filters", &no_filters, N_("store file as is without filters")),
+		OPT_BOOL( 0, "literally", &literally, N_("just hash any random garbage to create corrupt objects for debugging Git")),
 		OPT_STRING( 0 , "path", &vpath, N_("file"), N_("process file as it were from this path")),
 		OPT_END()
 	};
@@ -111,7 +138,7 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 	}
 
 	if (hashstdin)
-		hash_fd(0, type, vpath, flags);
+		hash_fd(0, type, vpath, flags, literally);
 
 	for (i = 0 ; i < argc; i++) {
 		const char *arg = argv[i];
@@ -119,11 +146,11 @@ int cmd_hash_object(int argc, const char **argv, const char *prefix)
 		if (0 <= prefix_length)
 			arg = prefix_filename(prefix, prefix_length, arg);
 		hash_object(arg, type, no_filters ? NULL : vpath ? vpath : arg,
-			    flags);
+			    flags, literally);
 	}
 
 	if (stdin_paths)
-		hash_stdin_paths(type, no_filters, flags);
+		hash_stdin_paths(type, no_filters, flags, literally);
 
 	return 0;
 }
-- 
2.1.0-459-g1bc3b2b

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects
  2014-09-11 21:16         ` Junio C Hamano
  2014-09-11 21:17           ` [PATCH 0/3] hash-object --literally Junio C Hamano
@ 2014-09-12  8:04           ` Johannes Schindelin
  1 sibling, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Jeff King

Hi Junio,

On Thu, 11 Sep 2014, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
> 
> > When our toolset has become too tight without leaving enough escape
> > hatch to hinder further development, it is very sensible to at least
> > think about adding a new "--for-debug" option to hash-object and
> > pack-objects that allows us to deliberately create broken
> > datastreams to test against.
> >
> > [...]
> 
> It wasn't too painful to do one of them, and the result looks rather
> nice.

I was loathe to make it easier for interested parties to create invalid
Git objects and to push them onto servers that cannot yet benefit from my
patch series. At the very least, I would have preferred to put such
functionality into test-* executables (where I searched for that
functionality in the first place), i.e. outside the distributed binaries.

But since you already did the work and it does the job, I won't worry
about it.

A bigger worry is that the additional test verifies that fsck catches the
invalid tag object and exits, when we really want to be certain that "git
fetch --strict" will abort on such an object. So the test is still
indirect, although I admit that it is closer now to what we want.

Version 4 of the patch series (without your hash-object --literally patch
because mailed patch series cannot declare on what branches from 'pu' they
rely, I always base my patch series on 'next' for that reason [*1*])
coming up.

Ciao,
Dscho

Footnote *1*: As always, I push my patch series to a topic branch on
GitHub. The one corresponding to the upcoming patch series is in
https://github.com/dscho/git/compare/next...fsck-tag, the one with your
additional test in
https://github.com/dscho/git/compare/next...fsck-tag-plus (the latter
being a thicket rather than a linear topic branch).

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
                       ` (5 preceding siblings ...)
  2014-09-11 14:26     ` [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
@ 2014-09-12  8:07     ` Johannes Schindelin
  2014-09-12  8:07       ` [PATCH v4 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
                         ` (6 more replies)
  6 siblings, 7 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:07 UTC (permalink / raw)
  To: gitster; +Cc: git

This patch series introduces detailed checking of tag objects when
calling git fsck, and also when transfer.fsckobjects is set to true.

To this end, the fsck machinery is reworked to accept the buffer and
size of the object to check, and for commit and tag objects, we verify
that the buffers contain an end of header (i.e. an empty line) to
guarantee that our checks do not run beyond the buffer.

This work was sponsored by GitHub.

Changes since v3:

- removed undesired negativity from commit message

- removed space in '2> err'

Johannes Schindelin (6):
  Refactor type_from_string() to avoid die()ing in case of errors
  Accept object data in the fsck_object() function
  Make sure fsck_commit_buffer() does not run out of the buffer
  fsck: check tag objects' headers
  Add regression tests for stricter tag fsck'ing
  Make sure that index-pack --strict checks tag objects

 builtin/fsck.c           |   2 +-
 builtin/index-pack.c     |   3 +-
 builtin/unpack-objects.c |  14 +++--
 fsck.c                   | 133 +++++++++++++++++++++++++++++++++++++++++++----
 fsck.h                   |   4 +-
 object.c                 |  11 +++-
 object.h                 |   3 +-
 t/t1450-fsck.sh          |  20 +++++++
 t/t5302-pack-index.sh    |  19 +++++++
 9 files changed, 189 insertions(+), 20 deletions(-)

-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply	[flat|nested] 68+ messages in thread

* [PATCH v4 1/6] Refactor type_from_string() to avoid die()ing in case of errors
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
@ 2014-09-12  8:07       ` Johannes Schindelin
  2014-09-12  8:07       ` [PATCH v4 2/6] Accept object data in the fsck_object() function Johannes Schindelin
                         ` (5 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:07 UTC (permalink / raw)
  To: gitster; +Cc: git

In the next commits, we will enhance the fsck_tag() function to check
tag objects more thoroughly. To this end, we need a function to verify
that a given string is a valid object type, but that does not die() in
the negative case.

While at it, prepare type_from_string() for counted strings, i.e. strings
with an explicitly specified length rather than a NUL termination.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 object.c | 11 +++++++++--
 object.h |  3 ++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/object.c b/object.c
index a16b9f9..aedac24 100644
--- a/object.c
+++ b/object.c
@@ -33,13 +33,20 @@ const char *typename(unsigned int type)
 	return object_type_strings[type];
 }
 
-int type_from_string(const char *str)
+int type_from_string_gently(const char *str, ssize_t len, int gentle)
 {
 	int i;
 
+	if (len < 0)
+		len = strlen(str);
+
 	for (i = 1; i < ARRAY_SIZE(object_type_strings); i++)
-		if (!strcmp(str, object_type_strings[i]))
+		if (!strncmp(str, object_type_strings[i], len))
 			return i;
+
+	if (gentle)
+		return -1;
+
 	die("invalid object type \"%s\"", str);
 }
 
diff --git a/object.h b/object.h
index 5e8d8ee..e028ced 100644
--- a/object.h
+++ b/object.h
@@ -53,7 +53,8 @@ struct object {
 };
 
 extern const char *typename(unsigned int type);
-extern int type_from_string(const char *str);
+extern int type_from_string_gently(const char *str, ssize_t, int gentle);
+#define type_from_string(str) type_from_string_gently(str, -1, 0)
 
 /*
  * Return the current number of buckets in the object hashmap.
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 2/6] Accept object data in the fsck_object() function
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  2014-09-12  8:07       ` [PATCH v4 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
@ 2014-09-12  8:07       ` Johannes Schindelin
  2014-09-12  8:07       ` [PATCH v4 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
                         ` (4 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:07 UTC (permalink / raw)
  To: gitster; +Cc: git

When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 builtin/fsck.c           |  2 +-
 builtin/index-pack.c     |  3 ++-
 builtin/unpack-objects.c | 14 ++++++++++----
 fsck.c                   | 24 +++++++++++++++---------
 fsck.h                   |  4 +++-
 5 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index d42a27d..d9f4e6e 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -298,7 +298,7 @@ static int fsck_obj(struct object *obj)
 
 	if (fsck_walk(obj, mark_used, NULL))
 		objerror(obj, "broken links");
-	if (fsck_object(obj, check_strict, fsck_error_func))
+	if (fsck_object(obj, NULL, 0, check_strict, fsck_error_func))
 		return -1;
 
 	if (obj->type == OBJ_TREE) {
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 5568a5b..f2465ff 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -773,7 +773,8 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
 			if (!obj)
 				die(_("invalid %s"), typename(type));
 			if (do_fsck_object &&
-			    fsck_object(obj, 1, fsck_error_function))
+			    fsck_object(obj, buf, size, 1,
+				    fsck_error_function))
 				die(_("Error in object"));
 			if (fsck_walk(obj, mark_link, NULL))
 				die(_("Not all child objects of %s are reachable"), sha1_to_hex(obj->sha1));
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 99cde45..855d94b 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -164,10 +164,10 @@ static unsigned nr_objects;
  * Called only from check_object() after it verified this object
  * is Ok.
  */
-static void write_cached_object(struct object *obj)
+static void write_cached_object(struct object *obj, struct obj_buffer *obj_buf)
 {
 	unsigned char sha1[20];
-	struct obj_buffer *obj_buf = lookup_object_buffer(obj);
+
 	if (write_sha1_file(obj_buf->buffer, obj_buf->size, typename(obj->type), sha1) < 0)
 		die("failed to write object %s", sha1_to_hex(obj->sha1));
 	obj->flags |= FLAG_WRITTEN;
@@ -180,6 +180,8 @@ static void write_cached_object(struct object *obj)
  */
 static int check_object(struct object *obj, int type, void *data)
 {
+	struct obj_buffer *obj_buf;
+
 	if (!obj)
 		return 1;
 
@@ -198,11 +200,15 @@ static int check_object(struct object *obj, int type, void *data)
 		return 0;
 	}
 
-	if (fsck_object(obj, 1, fsck_error_function))
+	obj_buf = lookup_object_buffer(obj);
+	if (!obj_buf)
+		die("Whoops! Cannot find object '%s'", sha1_to_hex(obj->sha1));
+	if (fsck_object(obj, obj_buf->buffer, obj_buf->size, 1,
+			fsck_error_function))
 		die("Error in object");
 	if (fsck_walk(obj, check_object, NULL))
 		die("Error on reachable objects of %s", sha1_to_hex(obj->sha1));
-	write_cached_object(obj);
+	write_cached_object(obj, obj_buf);
 	return 0;
 }
 
diff --git a/fsck.c b/fsck.c
index 56156ff..dd77628 100644
--- a/fsck.c
+++ b/fsck.c
@@ -277,7 +277,7 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
 }
 
 static int fsck_commit_buffer(struct commit *commit, const char *buffer,
-			      fsck_error error_func)
+	unsigned long size, fsck_error error_func)
 {
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
@@ -322,15 +322,18 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	return 0;
 }
 
-static int fsck_commit(struct commit *commit, fsck_error error_func)
+static int fsck_commit(struct commit *commit, const char *data,
+	unsigned long size, fsck_error error_func)
 {
-	const char *buffer = get_commit_buffer(commit, NULL);
-	int ret = fsck_commit_buffer(commit, buffer, error_func);
-	unuse_commit_buffer(commit, buffer);
+	const char *buffer = data ?  data : get_commit_buffer(commit, &size);
+	int ret = fsck_commit_buffer(commit, buffer, size, error_func);
+	if (!data)
+		unuse_commit_buffer(commit, buffer);
 	return ret;
 }
 
-static int fsck_tag(struct tag *tag, fsck_error error_func)
+static int fsck_tag(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
 {
 	struct object *tagged = tag->tagged;
 
@@ -339,7 +342,8 @@ static int fsck_tag(struct tag *tag, fsck_error error_func)
 	return 0;
 }
 
-int fsck_object(struct object *obj, int strict, fsck_error error_func)
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func)
 {
 	if (!obj)
 		return error_func(obj, FSCK_ERROR, "no valid object to fsck");
@@ -349,9 +353,11 @@ int fsck_object(struct object *obj, int strict, fsck_error error_func)
 	if (obj->type == OBJ_TREE)
 		return fsck_tree((struct tree *) obj, strict, error_func);
 	if (obj->type == OBJ_COMMIT)
-		return fsck_commit((struct commit *) obj, error_func);
+		return fsck_commit((struct commit *) obj, (const char *) data,
+			size, error_func);
 	if (obj->type == OBJ_TAG)
-		return fsck_tag((struct tag *) obj, error_func);
+		return fsck_tag((struct tag *) obj, (const char *) data,
+			size, error_func);
 
 	return error_func(obj, FSCK_ERROR, "unknown type '%d' (internal fsck error)",
 			  obj->type);
diff --git a/fsck.h b/fsck.h
index 1e4f527..d1e6387 100644
--- a/fsck.h
+++ b/fsck.h
@@ -28,6 +28,8 @@ int fsck_error_function(struct object *obj, int type, const char *fmt, ...);
  *    0		everything OK
  */
 int fsck_walk(struct object *obj, fsck_walk_func walk, void *data);
-int fsck_object(struct object *obj, int strict, fsck_error error_func);
+/* If NULL is passed for data, we assume the object is local and read it. */
+int fsck_object(struct object *obj, void *data, unsigned long size,
+	int strict, fsck_error error_func);
 
 #endif
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 3/6] Make sure fsck_commit_buffer() does not run out of the buffer
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
  2014-09-12  8:07       ` [PATCH v4 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
  2014-09-12  8:07       ` [PATCH v4 2/6] Accept object data in the fsck_object() function Johannes Schindelin
@ 2014-09-12  8:07       ` Johannes Schindelin
  2014-09-12  8:08       ` [PATCH v4 4/6] fsck: check tag objects' headers Johannes Schindelin
                         ` (3 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:07 UTC (permalink / raw)
  To: gitster; +Cc: git

So far, we assumed that the buffer is NUL terminated, but this is not
a safe assumption, now that we opened the fsck_object() API to pass a
buffer directly.

So let's make sure that there is at least an empty line in the buffer.
That way, our checks would fail if the empty line was encountered
prematurely, and consequently we can get away with the current string
comparisons even with non-NUL-terminated buffers are passed to
fsck_object().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/fsck.c b/fsck.c
index dd77628..73da6f8 100644
--- a/fsck.c
+++ b/fsck.c
@@ -237,6 +237,26 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
+static int require_end_of_header(const void *data, unsigned long size,
+	struct object *obj, fsck_error error_func)
+{
+	const char *buffer = (const char *)data;
+	unsigned long i;
+
+	for (i = 0; i < size; i++) {
+		switch (buffer[i]) {
+		case '\0':
+			return error_func(obj, FSCK_ERROR,
+				"unterminated header: NUL at offset %d", i);
+		case '\n':
+			if (i + 1 < size && buffer[i + 1] == '\n')
+				return 0;
+		}
+	}
+
+	return error_func(obj, FSCK_ERROR, "unterminated header");
+}
+
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
 {
 	char *end;
@@ -284,6 +304,9 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	unsigned parent_count, parent_line_count = 0;
 	int err;
 
+	if (require_end_of_header(buffer, size, &commit->object, error_func))
+		return -1;
+
 	if (!skip_prefix(buffer, "tree ", &buffer))
 		return error_func(&commit->object, FSCK_ERROR, "invalid format - expected 'tree' line");
 	if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 4/6] fsck: check tag objects' headers
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                         ` (2 preceding siblings ...)
  2014-09-12  8:07       ` [PATCH v4 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
@ 2014-09-12  8:08       ` Johannes Schindelin
  2014-09-12  8:08       ` [PATCH v4 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:08 UTC (permalink / raw)
  To: gitster; +Cc: git

We inspect commit objects pretty much in detail in git-fsck, but we just
glanced over the tag objects. Let's be stricter.

Since we do not want to limit 'tag' lines unduly, values that would fail
the refname check only result in warnings, not errors.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 fsck.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index 73da6f8..2fffa43 100644
--- a/fsck.c
+++ b/fsck.c
@@ -6,6 +6,7 @@
 #include "commit.h"
 #include "tag.h"
 #include "fsck.h"
+#include "refs.h"
 
 static int fsck_walk_tree(struct tree *tree, fsck_walk_func walk, void *data)
 {
@@ -355,6 +356,88 @@ static int fsck_commit(struct commit *commit, const char *data,
 	return ret;
 }
 
+static int fsck_tag_buffer(struct tag *tag, const char *data,
+	unsigned long size, fsck_error error_func)
+{
+	unsigned char sha1[20];
+	int ret = 0;
+	const char *buffer;
+	char *to_free = NULL, *eol;
+	struct strbuf sb = STRBUF_INIT;
+
+	if (data)
+		buffer = data;
+	else {
+		enum object_type type;
+
+		buffer = to_free =
+			read_sha1_file(tag->object.sha1, &type, &size);
+		if (!buffer)
+			return error_func(&tag->object, FSCK_ERROR,
+				"cannot read tag object");
+
+		if (type != OBJ_TAG) {
+			ret = error_func(&tag->object, FSCK_ERROR,
+				"expected tag got %s",
+			    typename(type));
+			goto done;
+		}
+	}
+
+	if (require_end_of_header(buffer, size, &tag->object, error_func))
+		goto done;
+
+	if (!skip_prefix(buffer, "object ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'object' line");
+		goto done;
+	}
+	if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'object' line format - bad sha1");
+		goto done;
+	}
+	buffer += 41;
+
+	if (!skip_prefix(buffer, "type ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'type' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	if (type_from_string_gently(buffer, eol - buffer, 1) < 0)
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid 'type' value");
+	if (ret)
+		goto done;
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tag ", &buffer)) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'tag' line");
+		goto done;
+	}
+	eol = strchr(buffer, '\n');
+	if (!eol) {
+		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - unexpected end after 'type' line");
+		goto done;
+	}
+	strbuf_addf(&sb, "refs/tags/%.*s", (int)(eol - buffer), buffer);
+	if (check_refname_format(sb.buf, 0))
+		error_func(&tag->object, FSCK_WARN, "invalid 'tag' name: %s", buffer);
+	buffer = eol + 1;
+
+	if (!skip_prefix(buffer, "tagger ", &buffer))
+		/* early tags do not contain 'tagger' lines; warn only */
+		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");
+	else
+		ret = fsck_ident(&buffer, &tag->object, error_func);
+
+done:
+	strbuf_release(&sb);
+	free(to_free);
+	return ret;
+}
+
 static int fsck_tag(struct tag *tag, const char *data,
 	unsigned long size, fsck_error error_func)
 {
@@ -362,7 +445,8 @@ static int fsck_tag(struct tag *tag, const char *data,
 
 	if (!tagged)
 		return error_func(&tag->object, FSCK_ERROR, "could not load tagged object");
-	return 0;
+
+	return fsck_tag_buffer(tag, data, size, error_func);
 }
 
 int fsck_object(struct object *obj, void *data, unsigned long size,
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 5/6] Add regression tests for stricter tag fsck'ing
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                         ` (3 preceding siblings ...)
  2014-09-12  8:08       ` [PATCH v4 4/6] fsck: check tag objects' headers Johannes Schindelin
@ 2014-09-12  8:08       ` Johannes Schindelin
  2014-09-12  8:08       ` [PATCH v4 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
  2014-09-12 18:02       ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Junio C Hamano
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:08 UTC (permalink / raw)
  To: gitster; +Cc: git

The intent of the new test case is to catch general breakages in
the fsck_tag() function, not so much to test it extensively, trying to
strike the proper balance between thoroughness and speed.

While it *would* have been nice to test the code path where fsck_object()
encounters an invalid tag object, this is not possible using git fsck: tag
objects are parsed already before fsck'ing (and the parser already fails
upon such objects).

Even worse: we would not even be able write out invalid tag objects
because git hash-object parses those objects, too, unless we resorted to
really ugly hacks such as using something like this in the unit tests
(essentially depending on Perl *and* Compress::Zlib):

	hash_invalid_object () {
		contents="$(printf '%s %d\0%s' "$1" ${#2} "$2")" &&
		sha1=$(echo "$contents" | test-sha1) &&
		suffix=${sha1#??} &&
		mkdir -p .git/objects/${sha1%$suffix} &&
		echo "$contents" |
		perl -MCompress::Zlib -e 'undef $/; print compress(<>)' \
			> .git/objects/${sha1%$suffix}/$suffix &&
		echo $sha1
	}

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t1450-fsck.sh | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 8c739c9..2b6a6f2 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -194,6 +194,26 @@ test_expect_success 'tag pointing to something else than its type' '
 	test_must_fail git fsck --tags
 '
 
+test_expect_success 'tag with incorrect tag name & missing tagger' '
+	sha=$(git rev-parse HEAD) &&
+	cat >wrong-tag <<-EOF &&
+	object $sha
+	type commit
+	tag wrong name format
+
+	This is an invalid tag.
+	EOF
+
+	tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+	test_when_finished "remove_object $tag" &&
+	echo $tag >.git/refs/tags/wrong &&
+	test_when_finished "git update-ref -d refs/tags/wrong" &&
+	git fsck --tags 2>out &&
+	cat out &&
+	grep "invalid .tag. name" out &&
+	grep "expected .tagger. line" out
+'
+
 test_expect_success 'cleaned up' '
 	git fsck >actual 2>&1 &&
 	test_cmp empty actual
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* [PATCH v4 6/6] Make sure that index-pack --strict checks tag objects
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                         ` (4 preceding siblings ...)
  2014-09-12  8:08       ` [PATCH v4 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
@ 2014-09-12  8:08       ` Johannes Schindelin
  2014-09-12 18:02       ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Junio C Hamano
  6 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-12  8:08 UTC (permalink / raw)
  To: gitster; +Cc: git

One of the most important use cases for the strict tag object checking
is when transfer.fsckobjects is set to true to catch invalid objects
early on. This new regression test essentially tests the same code path
by directly calling 'index-pack --strict' on a pack containing an
tag object without a 'tagger' line.

Technically, this test is not enough: it only exercises a code path that
*warns*, not one that *fails*. The reason is that hash-object and
pack-objects both insist on parsing the tag objects and would fail on
invalid tag objects at this time.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 t/t5302-pack-index.sh | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index 4bbb718..61bc8da 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -243,4 +243,23 @@ test_expect_success 'running index-pack in the object store' '
     test -f .git/objects/pack/pack-${pack1}.idx
 '
 
+test_expect_success 'index-pack --strict warns upon missing tagger in tag' '
+    sha=$(git rev-parse HEAD) &&
+    cat >wrong-tag <<EOF &&
+object $sha
+type commit
+tag guten tag
+
+This is an invalid tag.
+EOF
+
+    tag=$(git hash-object -t tag -w --stdin <wrong-tag) &&
+    pack1=$(echo $tag $sha | git pack-objects tag-test) &&
+    echo remove tag object &&
+    thirtyeight=${tag#??} &&
+    rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight &&
+    git index-pack --strict tag-test-${pack1}.pack 2>err &&
+    grep "^error:.* expected .tagger. line" err
+'
+
 test_done
-- 
2.0.0.rc3.9669.g840d1f9

^ permalink raw reply related	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
                         ` (5 preceding siblings ...)
  2014-09-12  8:08       ` [PATCH v4 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
@ 2014-09-12 18:02       ` Junio C Hamano
  2014-09-13  9:08         ` Johannes Schindelin
  6 siblings, 1 reply; 68+ messages in thread
From: Junio C Hamano @ 2014-09-12 18:02 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Thanks. I think this is ready for 'next' and then down to 'master'
for the next release.

^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects
  2014-09-12 18:02       ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Junio C Hamano
@ 2014-09-13  9:08         ` Johannes Schindelin
  0 siblings, 0 replies; 68+ messages in thread
From: Johannes Schindelin @ 2014-09-13  9:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi Junio,

On Fri, 12 Sep 2014, Junio C Hamano wrote:

> Thanks. I think this is ready for 'next' and then down to 'master'
> for the next release.

Thank you!
Dscho

^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2014-09-13  9:08 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-08-28 14:46 [PATCH 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
2014-08-28 14:46 ` [PATCH 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
2014-08-28 20:43   ` Junio C Hamano
2014-08-28 14:46 ` [PATCH 2/6] Accept object data in the fsck_object() function Johannes Schindelin
2014-08-28 20:47   ` Junio C Hamano
2014-08-29 23:10     ` Jeff King
2014-08-29 23:05   ` Jeff King
2014-08-28 14:46 ` [PATCH 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
2014-08-28 20:59   ` Junio C Hamano
2014-08-29 23:27   ` Jeff King
2014-08-28 14:46 ` [PATCH 4/6] fsck: check tag objects' headers Johannes Schindelin
2014-08-28 21:25   ` Junio C Hamano
2014-08-28 21:36     ` Junio C Hamano
2014-08-29 23:46       ` Jeff King
2014-08-31 22:46         ` Junio C Hamano
2014-09-03 22:29           ` Jeff King
2014-09-03 23:14             ` Junio C Hamano
2014-09-04  2:04               ` Jeff King
2014-08-29 23:43     ` Jeff King
2014-09-02 18:41       ` Junio C Hamano
2014-09-03 21:38         ` Jeff King
2014-08-28 14:46 ` [PATCH 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
2014-08-28 14:47 ` [PATCH 6/6] Make sure that index-pack --strict fails upon invalid tag objects Johannes Schindelin
2014-09-10 13:52 ` [PATCH v2 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
2014-09-10 13:58   ` Johannes Schindelin
2014-09-10 21:07   ` Junio C Hamano
2014-09-10 21:31     ` Junio C Hamano
2014-09-11 14:20       ` Johannes Schindelin
2014-09-11 14:26   ` [PATCH v3 " Johannes Schindelin
2014-09-11 14:26     ` [PATCH v3 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
2014-09-11 14:26     ` [PATCH v3 2/6] Accept object data in the fsck_object() function Johannes Schindelin
2014-09-11 14:26     ` [PATCH v3 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
2014-09-11 14:26     ` [PATCH v3 4/6] fsck: check tag objects' headers Johannes Schindelin
2014-09-11 14:26     ` [PATCH v3 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
2014-09-11 14:26     ` [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
2014-09-11 17:58       ` Junio C Hamano
2014-09-11 21:16         ` Junio C Hamano
2014-09-11 21:17           ` [PATCH 0/3] hash-object --literally Junio C Hamano
2014-09-11 21:17             ` [PATCH 1/3] hash-object: reduce file-scope statics Junio C Hamano
2014-09-11 21:17             ` [PATCH 2/3] hash-object: pass 'write_object' as a flag Junio C Hamano
2014-09-11 21:17             ` [PATCH 3/3] hash-object: add --literally option Junio C Hamano
2014-09-12  8:04           ` [PATCH v3 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
2014-09-12  8:07     ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Johannes Schindelin
2014-09-12  8:07       ` [PATCH v4 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
2014-09-12  8:07       ` [PATCH v4 2/6] Accept object data in the fsck_object() function Johannes Schindelin
2014-09-12  8:07       ` [PATCH v4 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
2014-09-12  8:08       ` [PATCH v4 4/6] fsck: check tag objects' headers Johannes Schindelin
2014-09-12  8:08       ` [PATCH v4 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
2014-09-12  8:08       ` [PATCH v4 6/6] Make sure that index-pack --strict checks tag objects Johannes Schindelin
2014-09-12 18:02       ` [PATCH v4 0/6] Improve tag checking in fsck and with transfer.fsckobjects Junio C Hamano
2014-09-13  9:08         ` Johannes Schindelin
     [not found] ` <cover.1410356761.git.johannes.schindelin@gmx.de>
2014-09-10 13:52   ` [PATCH v2 1/6] Refactor type_from_string() to avoid die()ing in case of errors Johannes Schindelin
2014-09-10 13:52   ` [PATCH v2 2/6] Accept object data in the fsck_object() function Johannes Schindelin
2014-09-10 13:52   ` [PATCH v2 3/6] Make sure fsck_commit_buffer() does not run out of the buffer Johannes Schindelin
2014-09-10 17:43     ` Junio C Hamano
2014-09-11 11:59       ` Johannes Schindelin
2014-09-11 16:49         ` Junio C Hamano
2014-09-10 20:45     ` Eric Sunshine
2014-09-10 13:53   ` [PATCH v2 4/6] fsck: check tag objects' headers Johannes Schindelin
2014-09-10 17:52     ` Junio C Hamano
2014-09-10 13:53   ` [PATCH v2 5/6] Add regression tests for stricter tag fsck'ing Johannes Schindelin
2014-09-10 17:56     ` Junio C Hamano
2014-09-11 14:15       ` Johannes Schindelin
2014-09-10 13:53   ` [PATCH v2 6/6] Make sure that index-pack --strict fails upon invalid tag objects Johannes Schindelin
2014-09-10 21:54     ` Junio C Hamano
2014-09-11 14:22       ` Johannes Schindelin
2014-09-11 16:50         ` Junio C Hamano
2014-09-11 17:04           ` Johannes Schindelin

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).