git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* git error in tag ...: unterminated header
@ 2015-06-25 15:51 Wolfgang Denk
  2015-06-25 17:32 ` Junio C Hamano
  2015-06-25 17:38 ` Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Wolfgang Denk @ 2015-06-25 15:51 UTC (permalink / raw)
  To: git

Hi all,

it turns out that recent versions of git (i. e. git version 2.2.0 or
later, resp. anything which includes commit 4d0d8975 "Make sure
fsck_commit_buffer() does not run out of the buffer") throws errors on
our git repository  git://git.denx.de/u-boot:

-> git fsck --full
Checking object directories: 100% (256/256), done.
error in tag eb394f56db3e05d00891d6dc36a00df0025cf255: unterminated header
error in tag 9bf86baaa3b35b25baa2d664e2f7f6cafad689ee: unterminated header
error in tag c7071e6d645a8e13adb0d4cea2caad27213fa62f: unterminated header
Checking objects: 100% (328644/328644), done.
Checking connectivity: 325719, done.

Apparently for some reason the tags  LABEL_2006_03_12_0025,
LABEL_2006_04_18_1106, and LABEL_2006_05_19_1133 are missing newlines,
which was undetected so far, but now is raised as an error.

Question is: how can we fix that?

Is there a tool to "auto-repair" such kind of errors?

If not, would it be be a harmless thing to just delete and recreate
the same tags (referring to the same commits)?  Or can anybody see any
problems tha might be caused by such a tag re-creation?  I mean, it is
not like a rebase of the repository or something like that?  Right?

Thanks in advance.

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
Artificial Intelligence is the study of how to  make  real  computers
act like the ones in movies.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 15:51 git error in tag ...: unterminated header Wolfgang Denk
@ 2015-06-25 17:32 ` Junio C Hamano
  2015-06-25 20:13   ` Wolfgang Denk
  2015-06-25 17:38 ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-06-25 17:32 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Wolfgang Denk

Wolfgang Denk <wd@denx.de> writes:

> it turns out that recent versions of git (i. e. git version 2.2.0 or
> later, resp. anything which includes commit 4d0d8975 "Make sure
> fsck_commit_buffer() does not run out of the buffer") throws errors on
> our git repository  git://git.denx.de/u-boot:
>
> -> git fsck --full
> Checking object directories: 100% (256/256), done.
> error in tag eb394f56db3e05d00891d6dc36a00df0025cf255: unterminated header
> error in tag 9bf86baaa3b35b25baa2d664e2f7f6cafad689ee: unterminated header
> error in tag c7071e6d645a8e13adb0d4cea2caad27213fa62f: unterminated header
> Checking objects: 100% (328644/328644), done.
> Checking connectivity: 325719, done.
>
> Apparently for some reason the tags  LABEL_2006_03_12_0025,
> LABEL_2006_04_18_1106, and LABEL_2006_05_19_1133 are missing newlines,
> which was undetected so far, but now is raised as an error.
>
> Question is: how can we fix that?

It could be that 4d0d8975 is buggy and barking at a non breakage.

If there is no message in the tag, i.e.

    -- 8< --
    object 84ef51a632063e8cb57b2e9a4252497512831ffe
    type commit
    tag LABEL_2006_03_12_0025
    tagger Wolfgang Denk <wd@pollux.denx.de> 1142119613 +0100
    -- >8 --

I do not offhand see why we want to require that there is a lone
blank line at the end.

Dscho?  What do you think?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 15:51 git error in tag ...: unterminated header Wolfgang Denk
  2015-06-25 17:32 ` Junio C Hamano
@ 2015-06-25 17:38 ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-06-25 17:38 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: git

Wolfgang Denk <wd@denx.de> writes:

> Apparently for some reason the tags  LABEL_2006_03_12_0025,
> LABEL_2006_04_18_1106, and LABEL_2006_05_19_1133 are missing newlines,
> which was undetected so far, but now is raised as an error.
>
> Question is: how can we fix that?
>
> Is there a tool to "auto-repair" such kind of errors?
>
> If not, would it be be a harmless thing to just delete and recreate
> the same tags (referring to the same commits)?  Or can anybody see any
> problems tha might be caused by such a tag re-creation?  I mean, it is
> not like a rebase of the repository or something like that?  Right?

I am inclined to say that these messages are that recent Git barking
at a non-errors, and hopefully there is nothing other than ignoring
them you have to do until it gets fixed.

But if you decide to recreate tags, there are a few things to keep
in mind:

 * You would get a different object name, of course.  You would
   likely to get different timestamps (you can export the desired
   timestamp in GIT_COMMITTER_DATE, though).

 * Those who fetched from you and already have these tags will not
   get the updated one by default.  They need to force fetch them.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 17:32 ` Junio C Hamano
@ 2015-06-25 20:13   ` Wolfgang Denk
  2015-06-25 20:24     ` Junio C Hamano
  2015-06-25 20:48     ` git error in tag ...: unterminated header Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Wolfgang Denk @ 2015-06-25 20:13 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

Dear Junio,

thanks for the fast replies.

In message <xmqqegkzzoaz.fsf@gitster.dls.corp.google.com> you wrote:
> 
> > Question is: how can we fix that?
> 
> It could be that 4d0d8975 is buggy and barking at a non breakage.
> 
> If there is no message in the tag, i.e.
> 
>     -- 8< --
>     object 84ef51a632063e8cb57b2e9a4252497512831ffe
>     type commit
>     tag LABEL_2006_03_12_0025
>     tagger Wolfgang Denk <wd@pollux.denx.de> 1142119613 +0100
>     -- >8 --

It seems recent tools cannot create such tags any more?

> I do not offhand see why we want to require that there is a lone
> blank line at the end.

Hm... it seems I cannot even easily delte these tags:

-> git tag -d LABEL_2006_03_12_0025
Deleted tag 'LABEL_2006_03_12_0025' (was eb394f5)
-> git fsck --full
Checking object directories: 100% (256/256), done.
Checking object directories: 100% (256/256), done.
error in tag eb394f56db3e05d00891d6dc36a00df0025cf255: unterminated header
error in tag 9bf86baaa3b35b25baa2d664e2f7f6cafad689ee: unterminated header
error in tag c7071e6d645a8e13adb0d4cea2caad27213fa62f: unterminated header
error in tag eb394f56db3e05d00891d6dc36a00df0025cf255: unterminated header
error in tag 9bf86baaa3b35b25baa2d664e2f7f6cafad689ee: unterminated header
error in tag c7071e6d645a8e13adb0d4cea2caad27213fa62f: unterminated header
Checking objects: 100% (657288/657288), done.
Checking connectivity: 325718, done.
dangling tag eb394f56db3e05d00891d6dc36a00df0025cf255

Now I also have this "dangling tag" thingy...

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH,      Managing Director: Wolfgang Denk
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de
"An organization dries up if you don't challenge it with growth."
       - Mark Shepherd, former President and CEO of Texas Instruments

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 20:13   ` Wolfgang Denk
@ 2015-06-25 20:24     ` Junio C Hamano
  2015-06-25 21:07       ` Johannes Schindelin
  2015-06-25 20:48     ` git error in tag ...: unterminated header Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-06-25 20:24 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Johannes Schindelin, git

Wolfgang Denk <wd@denx.de> writes:

> Dear Junio,
>
> thanks for the fast replies.
>
> In message <xmqqegkzzoaz.fsf@gitster.dls.corp.google.com> you wrote:
>> 
>> > Question is: how can we fix that?
>> 
>> It could be that 4d0d8975 is buggy and barking at a non breakage.
>> 
>> If there is no message in the tag, i.e.
>> 
>>     -- 8< --
>>     object 84ef51a632063e8cb57b2e9a4252497512831ffe
>>     type commit
>>     tag LABEL_2006_03_12_0025
>>     tagger Wolfgang Denk <wd@pollux.denx.de> 1142119613 +0100
>>     -- >8 --
>
> It seems recent tools cannot create such tags any more?

Yeah, we append an extra blank line after the last header even when
there is no message.  But I do not think that means what you have 
above is invalid.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 20:13   ` Wolfgang Denk
  2015-06-25 20:24     ` Junio C Hamano
@ 2015-06-25 20:48     ` Junio C Hamano
  1 sibling, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-06-25 20:48 UTC (permalink / raw)
  To: Wolfgang Denk; +Cc: Johannes Schindelin, git

Wolfgang Denk <wd@denx.de> writes:

> Hm... it seems I cannot even easily delte these tags:
>
> -> git tag -d LABEL_2006_03_12_0025
> Deleted tag 'LABEL_2006_03_12_0025' (was eb394f5)
> -> git fsck --full
> Checking object directories: 100% (256/256), done.
> Checking object directories: 100% (256/256), done.

This is expected.  "git tag -d" only severed the linkage between
tagname LABEL_2006_03_12_0025 and the tag object eb394f5; without
repacking and pruning, the object eb394f5 is not removed from your
object store.

> dangling tag eb394f56db3e05d00891d6dc36a00df0025cf255
>
> Now I also have this "dangling tag" thingy...

That also is expected.  It now is "dangling" because it is not
reachable from any of your refs.

"git repack -a -d -f" after "git tag -d && git tag -a" would make
your repository fsck-clean, but I highly doubt that you would want
to recreate your tags right now.

If I were you, I'd learn to ignore these 'unterminated header'
errors for now and wait to see if Git folks decide that 4d0d8975
(Make sure fsck_commit_buffer() does not run out of the buffer,
2014-09-11) is overzealous and giving an error message where there
is no error, in which case future versions of Git will not complain
on these objects.  It is not too late to do the re-tagging after
they decide to keep the current behaviour.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 20:24     ` Junio C Hamano
@ 2015-06-25 21:07       ` Johannes Schindelin
  2015-06-25 21:21         ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2015-06-25 21:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Wolfgang Denk, git

Hi Junio & Wolfgang,

On 2015-06-25 22:24, Junio C Hamano wrote:
> Wolfgang Denk <wd@denx.de> writes:
> 
>> In message <xmqqegkzzoaz.fsf@gitster.dls.corp.google.com> you wrote:
>>>
>>> > Question is: how can we fix that?
>>>
>>> It could be that 4d0d8975 is buggy and barking at a non breakage.

Well, I would like to believe that this commit made our code *safer* by making sure that we would never overrun the buffer. Remember: under certain circumstances, the buffer passed to the fsck machinery is *not* terminated by a NUL. The code I introduced simply verifies that there is an empty line because the fsck code stops there and does not look further.

If the buffer does *not* contain an empty line, the fsck code runs the danger of looking beyond the allocated memory because it uses functions that assume NUL-terminated strings, while the buffer passed to the fsck code is a counted string.

The quick & dirty work-around would be to detect when the buffer does not contain an empty line and to make a NUL-terminated copy in that case.

A better solution was outlined by Peff back when I submitted those patches: change all the code paths that read objects and make sure that all of them are terminated by a NUL. AFAIR some code paths did that already, but not all of them.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 21:07       ` Johannes Schindelin
@ 2015-06-25 21:21         ` Junio C Hamano
  2015-06-25 22:29           ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-06-25 21:21 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Wolfgang Denk, git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> Hi Junio & Wolfgang,
>
> On 2015-06-25 22:24, Junio C Hamano wrote:
>> Wolfgang Denk <wd@denx.de> writes:
>> 
>>> In message <xmqqegkzzoaz.fsf@gitster.dls.corp.google.com> you wrote:
>>>>
>>>> > Question is: how can we fix that?
>>>>
>>>> It could be that 4d0d8975 is buggy and barking at a non breakage.
>
> Well, I would like to believe that this commit made our code *safer*
> by making sure that we would never overrun the buffer. Remember: under
> certain circumstances, the buffer passed to the fsck machinery is
> *not* terminated by a NUL. The code I introduced simply verifies that
> there is an empty line because the fsck code stops there and does not
> look further.
>
> If the buffer does *not* contain an empty line, the fsck code runs the
> danger of looking beyond the allocated memory because it uses
> functions that assume NUL-terminated strings, while the buffer passed
> to the fsck code is a counted string.
>
> The quick & dirty work-around would be to detect when the buffer does
> not contain an empty line and to make a NUL-terminated copy in that
> case.

Yes, I can totally understand its quick-and-dirty-ness would break
a valid case where there is no need for a blank after the header.

> A better solution was outlined by Peff back when I submitted those
> patches: change all the code paths that read objects and make sure
> that all of them are terminated by a NUL. AFAIR some code paths did
> that already, but not all of them.

I do not think you necessarily need a NUL.  As you said, your input
is a counted string, so you know the length of the buffer.  And you
are verifying line by line.  As long as you make sure the buffer
ends with "\n" (instead of saying "it has "\n\n" somewhere),
updating the existing code that does

	if (buffer is not well formed wrt "tree")
		barf;
	else
        	advance buffer to point at the next line;
	if (buffer is not well formed wrt "parent")
        	barf;
	...

to do this instead:

	if (buffer is not well formed wrt "tree")
		barf;
	else
        	advance buffer to point at the next line;
	if (buffer is now beyond the end of the original length)
		barf; /* missing "parent" */
	if (buffer is not well formed wrt "parent")
        	barf;
	...

shouldn't be rocket science, no?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 21:21         ` Junio C Hamano
@ 2015-06-25 22:29           ` Junio C Hamano
  2015-06-26  8:06             ` Johannes Schindelin
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-06-25 22:29 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Wolfgang Denk, git

Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> ...
>> If the buffer does *not* contain an empty line, the fsck code runs the
>> danger of looking beyond the allocated memory because it uses
>> functions that assume NUL-terminated strings, while the buffer passed
>> to the fsck code is a counted string.
>> that already, but not all of them.
> ...
> I do not think you necessarily need a NUL.  As you said, your input
> is a counted string, so you know the length of the buffer.  And you
> are verifying line by line.  As long as you make sure the buffer
> ends with "\n" (instead of saying "it has "\n\n" somewhere),
> updating the existing code that does ...
> to do this instead: ...
> shouldn't be rocket science, no?

Here is to illustrate what I meant by the above.

I did only the "tag" side (this is on 'maint'), because I did not
want to conflict with a larger change to fsck you have on 'pu'.

I made it use an updated version of require_end_of_header(), which I
named prescan_headers(), as the new sanity check does not require
\n\n as the only end of header, and also the sanity check (both old
and new) also checks for NUL.

I didn't assess how much more involved to make fsck-commit-buffer to
use prescan-headers (and retire require-end-of-header), though.

 fsck.c | 39 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 34 insertions(+), 5 deletions(-)

diff --git a/fsck.c b/fsck.c
index 10bcb65..7d2d07d 100644
--- a/fsck.c
+++ b/fsck.c
@@ -261,6 +261,29 @@ static int require_end_of_header(const void *data, unsigned long size,
 	return error_func(obj, FSCK_ERROR, "unterminated header");
 }
 
+static int prescan_headers(const void *data, unsigned long size,
+			   struct object *obj, fsck_error error_func)
+{
+	const char *buffer = (const char *)data;
+	unsigned long i;
+
+	for (i = 0; i < size; i++) {
+		switch (buffer[i]) {
+		case '\0':
+			return error_func(obj, FSCK_ERROR,
+					  "unterminated header: NUL at offset %d", i);
+		case '\n':
+			if (i + 1 < size && buffer[i + 1] == '\n')
+				return 0; /* end of header */
+		}
+	}
+
+	/* no blank (i.e. headers and nothing else */
+	if (size && buffer[size - 1] == '\n')
+		return 0;
+	return error_func(obj, FSCK_ERROR, "unterminated header");
+}
+
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
 {
 	char *end;
@@ -365,6 +388,7 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 	unsigned char sha1[20];
 	int ret = 0;
 	const char *buffer;
+	const char *end_buffer;
 	char *to_free = NULL, *eol;
 	struct strbuf sb = STRBUF_INIT;
 
@@ -387,10 +411,12 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 		}
 	}
 
-	if (require_end_of_header(buffer, size, &tag->object, error_func))
+	end_buffer = buffer + size;
+	if (prescan_headers(buffer, size, &tag->object, error_func))
 		goto done;
 
-	if (!skip_prefix(buffer, "object ", &buffer)) {
+	if (end_buffer <= buffer ||
+	    !skip_prefix(buffer, "object ", &buffer)) {
 		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'object' line");
 		goto done;
 	}
@@ -400,7 +426,8 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 	}
 	buffer += 41;
 
-	if (!skip_prefix(buffer, "type ", &buffer)) {
+	if (end_buffer <= buffer ||
+	    !skip_prefix(buffer, "type ", &buffer)) {
 		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'type' line");
 		goto done;
 	}
@@ -415,7 +442,8 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 		goto done;
 	buffer = eol + 1;
 
-	if (!skip_prefix(buffer, "tag ", &buffer)) {
+	if (end_buffer <= buffer ||
+	    !skip_prefix(buffer, "tag ", &buffer)) {
 		ret = error_func(&tag->object, FSCK_ERROR, "invalid format - expected 'tag' line");
 		goto done;
 	}
@@ -430,7 +458,8 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 			   (int)(eol - buffer), buffer);
 	buffer = eol + 1;
 
-	if (!skip_prefix(buffer, "tagger ", &buffer))
+	if (end_buffer <= buffer ||
+	    !skip_prefix(buffer, "tagger ", &buffer))
 		/* early tags do not contain 'tagger' lines; warn only */
 		error_func(&tag->object, FSCK_WARN, "invalid format - expected 'tagger' line");
 	else

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-25 22:29           ` Junio C Hamano
@ 2015-06-26  8:06             ` Johannes Schindelin
  2015-06-26 15:52               ` Jeff King
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2015-06-26  8:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Wolfgang Denk, git

Hi Junio,

On 2015-06-26 00:29, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
>> ...
>>> If the buffer does *not* contain an empty line, the fsck code runs the
>>> danger of looking beyond the allocated memory because it uses
>>> functions that assume NUL-terminated strings, while the buffer passed
>>> to the fsck code is a counted string.
>>> that already, but not all of them.
>> ...
>> I do not think you necessarily need a NUL.  As you said, your input
>> is a counted string, so you know the length of the buffer.  And you
>> are verifying line by line.  As long as you make sure the buffer
>> ends with "\n" (instead of saying "it has "\n\n" somewhere),
>> updating the existing code that does ...
>> to do this instead: ...
>> shouldn't be rocket science, no?
> 
> Here is to illustrate what I meant by the above.

I understood what you were saying, but it still appears too fragile to me to mix functions that assume NUL-terminated strings with an ad-hoc counted string check. Take `skip_prefix()` for example: it really assumes implicitly that the string is NUL-terminated because a first argument that is too short for the prefix will simply compare NUL against non-NUL at some stage. With your idea, we will now assume that all the usages of `skip_prefix()` in `fsck.c`, including future ones, will refrain from asking for a prefix containing `\n`. That might not hold true, and worse: it might not be detected because a couple of code paths *already* pass NUL-terminated buffers to fsck.

Now, we are actually talking about really rare objects, right? So why not be a little generous with memory in the very unlikely event that we encounter such a tag object? I am thinking somewhat along these lines:

-- snipsnap --
diff --git a/fsck.c b/fsck.c
index 2fffa43..4c7530a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -238,10 +238,11 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
-static int require_end_of_header(const void *data, unsigned long size,
+static int require_end_of_header(const char **data, unsigned long size,
 	struct object *obj, fsck_error error_func)
 {
-	const char *buffer = (const char *)data;
+	static char *nul_terminated;
+	const char *buffer = *(const char **)data;
 	unsigned long i;
 
 	for (i = 0; i < size; i++) {
@@ -255,7 +256,14 @@ static int require_end_of_header(const void *data, unsigned long size,
 		}
 	}
 
-	return error_func(obj, FSCK_ERROR, "unterminated header");
+	/* TODO: when fsck-opt hits `next`, test whether to error out here */
+	if (nul_terminated)
+		free(nul_terminated);
+	nul_terminated = xmalloc(size + 1);
+	memcpy(nul_terminated, *data, size);
+	nul_terminated[size] = '\0';
+	*data = nul_terminated;
+	return 0;
 }
 
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)
@@ -297,15 +304,16 @@ static int fsck_ident(const char **ident, struct object *obj, fsck_error error_f
 	return 0;
 }
 
-static int fsck_commit_buffer(struct commit *commit, const char *buffer,
+static int fsck_commit_buffer(struct commit *commit, const char *orig,
 	unsigned long size, fsck_error error_func)
 {
+	const char *buffer = orig;
 	unsigned char tree_sha1[20], sha1[20];
 	struct commit_graft *graft;
 	unsigned parent_count, parent_line_count = 0;
 	int err;
 
-	if (require_end_of_header(buffer, size, &commit->object, error_func))
+	if (require_end_of_header(&buffer, size, &commit->object, error_func))
 		return -1;
 
 	if (!skip_prefix(buffer, "tree ", &buffer))
@@ -356,9 +364,10 @@ static int fsck_commit(struct commit *commit, const char *data,
 	return ret;
 }
 
-static int fsck_tag_buffer(struct tag *tag, const char *data,
+static int fsck_tag_buffer(struct tag *tag, const char *orig,
 	unsigned long size, fsck_error error_func)
 {
+	const char *data = orig;
 	unsigned char sha1[20];
 	int ret = 0;
 	const char *buffer;
@@ -384,7 +393,7 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 		}
 	}
 
-	if (require_end_of_header(buffer, size, &tag->object, error_func))
+	if (require_end_of_header(&buffer, size, &tag->object, error_func))
 		goto done;
 
 	if (!skip_prefix(buffer, "object ", &buffer)) {

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-26  8:06             ` Johannes Schindelin
@ 2015-06-26 15:52               ` Jeff King
  2015-06-26 17:37                 ` Junio C Hamano
  2015-06-28 18:18                 ` [PATCH] fsck: it is OK for a tag and a commit to lack the body Junio C Hamano
  0 siblings, 2 replies; 19+ messages in thread
From: Jeff King @ 2015-06-26 15:52 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Junio C Hamano, Wolfgang Denk, git

On Fri, Jun 26, 2015 at 10:06:20AM +0200, Johannes Schindelin wrote:

> I understood what you were saying, but it still appears too fragile to
> me to mix functions that assume NUL-terminated strings with an ad-hoc
> counted string check.

Yeah, I agree. It is not that you cannot make it safe, but that it is
simply a fragile maintenance burden in the future. I thought we dealt
with this already with a1e920a (index-pack: terminate object buffers
with NUL, 2014-12-08), though.

-Peff

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-26 15:52               ` Jeff King
@ 2015-06-26 17:37                 ` Junio C Hamano
  2015-06-27  8:57                   ` Johannes Schindelin
  2015-06-28 18:18                 ` [PATCH] fsck: it is OK for a tag and a commit to lack the body Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-06-26 17:37 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Wolfgang Denk, git

Jeff King <peff@peff.net> writes:

> On Fri, Jun 26, 2015 at 10:06:20AM +0200, Johannes Schindelin wrote:
>
>> I understood what you were saying, but it still appears too fragile to
>> me to mix functions that assume NUL-terminated strings with an ad-hoc
>> counted string check.
>
> Yeah, I agree. It is not that you cannot make it safe, but that it is
> simply a fragile maintenance burden in the future. I thought we dealt
> with this already with a1e920a (index-pack: terminate object buffers
> with NUL, 2014-12-08), though.

Hmph, that is an interesting point.

It would mean that the require_eoh() can be reduced a bit further.

 * It is still a good idea to make sure we do not have NUL in the
   header part,

 * It can still stop scanning when it finds a blank line (i.e. we do
   not care what is in the message part of commit and tag),

 * It does not have to insist that a commit or a tag has a blank
   line to reject a header-only object.

That would mean the name of the helper needs to change, though.

Perhaps like this?

 fsck.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fsck.c b/fsck.c
index 10bcb65..c23e84e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -241,8 +241,8 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
-static int require_end_of_header(const void *data, unsigned long size,
-	struct object *obj, fsck_error error_func)
+static int verify_headers(const void *data, unsigned long size,
+			  struct object *obj, fsck_error error_func)
 {
 	const char *buffer = (const char *)data;
 	unsigned long i;
@@ -257,8 +257,13 @@ static int require_end_of_header(const void *data, unsigned long size,
 				return 0;
 		}
 	}
-
-	return error_func(obj, FSCK_ERROR, "unterminated header");
+	/*
+	 * did not find a blank line -- is the last header line
+	 * correctly terminated with LF?
+	 */
+	if (size && buffer[size - 1] != '\n')
+		return error_func(obj, FSCK_ERROR, "unterminated header");
+	return 0;
 }
 
 static int fsck_ident(const char **ident, struct object *obj, fsck_error error_func)

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-26 17:37                 ` Junio C Hamano
@ 2015-06-27  8:57                   ` Johannes Schindelin
  2015-06-27 18:36                     ` Junio C Hamano
  0 siblings, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2015-06-27  8:57 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Wolfgang Denk, git

Hi Junio,

On 2015-06-26 19:37, Junio C Hamano wrote:
> Jeff King <peff@peff.net> writes:
> 
>> On Fri, Jun 26, 2015 at 10:06:20AM +0200, Johannes Schindelin wrote:
>>
>>> I understood what you were saying, but it still appears too fragile to
>>> me to mix functions that assume NUL-terminated strings with an ad-hoc
>>> counted string check.
>>
>> Yeah, I agree. It is not that you cannot make it safe, but that it is
>> simply a fragile maintenance burden in the future. I thought we dealt
>> with this already with a1e920a (index-pack: terminate object buffers
>> with NUL, 2014-12-08), though.
> 
> Hmph, that is an interesting point.
> 
> It would mean that the require_eoh() can be reduced a bit further.
> 
>  * It is still a good idea to make sure we do not have NUL in the
>    header part,
> 
>  * It can still stop scanning when it finds a blank line (i.e. we do
>    not care what is in the message part of commit and tag),
> 
>  * It does not have to insist that a commit or a tag has a blank
>    line to reject a header-only object.
> 
> That would mean the name of the helper needs to change, though.

You mean in addition to your changes to read new lines only when we're still inside the buffer? I cannot say that I like this fragility (and would prefer the aforementioned patch that simply allocates a NUL-terminated buffer in the rather unlikely event of tag/commit objects without an empty line), but then: you are stuck with maintaining this code, so it is your decision. ;-)

I will hopefully have time starting Tuesday this week to work on that patch, if nobody else beats me to it.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: git error in tag ...: unterminated header
  2015-06-27  8:57                   ` Johannes Schindelin
@ 2015-06-27 18:36                     ` Junio C Hamano
  0 siblings, 0 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-06-27 18:36 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Wolfgang Denk, git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

> On 2015-06-26 19:37, Junio C Hamano wrote:
>> Jeff King <peff@peff.net> writes:
>> 
>>> On Fri, Jun 26, 2015 at 10:06:20AM +0200, Johannes Schindelin wrote:
>>>
>>>> I understood what you were saying, but it still appears too fragile to
>>>> me to mix functions that assume NUL-terminated strings with an ad-hoc
>>>> counted string check.
>>>
>>> Yeah, I agree. It is not that you cannot make it safe, but that it is
>>> simply a fragile maintenance burden in the future. I thought we dealt
>>> with this already with a1e920a (index-pack: terminate object buffers
>>> with NUL, 2014-12-08), though.
>> 
>> Hmph, that is an interesting point.
>> 
>> It would mean that the require_eoh() can be reduced a bit further.
>> ...
>> That would mean the name of the helper needs to change, though.
>
> You mean in addition to your changes to read new lines only when we're
> still inside the buffer?

I think what Peff meant was that we always have the NUL at the end
of the buffer in a world with with a1e920a0 (index-pack: terminate
object buffers with NUL, 2014-12-08).

That means that we can replace require-eoh with verify-headers in
the message you are responding to and update the callers to call the
new function without doing anything else.

It might be tempting to say that require-eoh is not necessary, but I
think verify-headers still has its values. The running fsck may not
know some of the new headers the version of Git that produced the
object being verified knows; hence it is given that the line-by-line
verification does not check all the header lines individually.  But
at least we know that we know the header part of the object must be
terminated with LF and does not contain a NUL even for any new
header that will be invented in the future.  I.e. an object without
a body and ends with an incomplete line as the last line of the
header will not be allowed, ever.  And the only sane way to verify
that is by scanning the object upfront before we verify the known
ones line-by-line, just like we did with require-eoh.

As long as the code verifies line-by-line (which by the way the
"demotable error level" also depends on to allow it to re-sync to
the next header line after seeing an error in one header line; I do
not expect the line-by-line nature of verification to change in the
future for this reason), "make sure that the header part ends with
LF and before starting to read each header line to verify, make sure
we still have data to read" is not as fragile as you made it sound
in your past few messages, simply because there is no valid reason
to use start_with() with a string that has LF in the actual
verification code.  That would be only necessary if we have a known
header line that consists of a fixed token without any variable data
on it before the terminating LF, but in the context of talking about
Git object header, having such a header line is absurd in the first
place.

But with NUL termination guaranteed, we no longer need "before
starting to read each header, the size says we still have something
to read".  That is why I think updating require-eoh to verify-headers
is the only thing we need to do.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH] fsck: it is OK for a tag and a commit to lack the body
  2015-06-26 15:52               ` Jeff King
  2015-06-26 17:37                 ` Junio C Hamano
@ 2015-06-28 18:18                 ` Junio C Hamano
  2015-06-28 18:21                   ` Eric Sunshine
  2015-06-29  5:12                   ` Johannes Schindelin
  1 sibling, 2 replies; 19+ messages in thread
From: Junio C Hamano @ 2015-06-28 18:18 UTC (permalink / raw)
  To: Jeff King; +Cc: Johannes Schindelin, Wolfgang Denk, git

When fsck validates a commit or a tag object, it scans each line in
the header the object using helper functions such as "start_with()",
etc. that work on a NUL terminated buffer, but before a1e920a0
(index-pack: terminate object buffers with NUL, 2014-12-08), the
validation functions were fed the object data as counted strings,
not necessarily terminated with a NUL.  We added a helper function
require_end_of_header() to be called at the beginning of these
validation functions to insist that the object data contains an
empty line before its end.  The theory is that the validating
functions will notice and stop when it hits an empty line as a
normal end of header (or a required header line that is missing)
before scanning past the end of potentially not NUL-terminated
buffer.

But the theory forgot that in the older days, Git itself happily
created objects with only the header lines without a body. This
caused Git 2.2 and later to issue an unnecessary warning on some
existing repositories.

With a1e920a0, we do not need to require an empty line (or the body)
in these objects to safely parse and validate them.  Drop the
offending "must have an empty line" check from this helper function,
while keeping the other check to make sure that there is no NUL in
the header part of the object, and adjust the name of the helper to
what it does accordingly.

Noticed-by: Wolfgang Denk <wd@denx.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * With a proper commit message this time. I did this directly on
   top of a1e920a0 (index-pack: terminate object buffers with NUL,
   2014-12-08); it has trivial merge conflicts with more recent
   codebase, whose resolutions I'll push out later on 'pu'.

 fsck.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fsck.c b/fsck.c
index 88c92e8..3f264e7 100644
--- a/fsck.c
+++ b/fsck.c
@@ -238,8 +238,8 @@ static int fsck_tree(struct tree *item, int strict, fsck_error error_func)
 	return retval;
 }
 
-static int require_end_of_header(const void *data, unsigned long size,
-	struct object *obj, fsck_error error_func)
+static int verify_headers(const void *data, unsigned long size,
+			  struct object *obj, fsck_error error_func)
 {
 	const char *buffer = (const char *)data;
 	unsigned long i;
@@ -255,6 +255,15 @@ static int require_end_of_header(const void *data, unsigned long size,
 		}
 	}
 
+	/*
+	 * We did not find double-LF that separates the header
+	 * and the body.  Not having a body is not a crime but
+	 * we do want to see the terminating LF for the last header
+	 * line.
+	 */
+	if (size && buffer[size - 1] == '\n')
+		return 0;
+
 	return error_func(obj, FSCK_ERROR, "unterminated header");
 }
 
@@ -305,7 +314,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer,
 	unsigned parent_count, parent_line_count = 0;
 	int err;
 
-	if (require_end_of_header(buffer, size, &commit->object, error_func))
+	if (verify_headers(buffer, size, &commit->object, error_func))
 		return -1;
 
 	if (!skip_prefix(buffer, "tree ", &buffer))
@@ -384,7 +393,7 @@ static int fsck_tag_buffer(struct tag *tag, const char *data,
 		}
 	}
 
-	if (require_end_of_header(buffer, size, &tag->object, error_func))
+	if (verify_headers(buffer, size, &tag->object, error_func))
 		goto done;
 
 	if (!skip_prefix(buffer, "object ", &buffer)) {
-- 
2.5.0-rc0-151-g019519d

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] fsck: it is OK for a tag and a commit to lack the body
  2015-06-28 18:18                 ` [PATCH] fsck: it is OK for a tag and a commit to lack the body Junio C Hamano
@ 2015-06-28 18:21                   ` Eric Sunshine
  2015-06-29  5:12                   ` Johannes Schindelin
  1 sibling, 0 replies; 19+ messages in thread
From: Eric Sunshine @ 2015-06-28 18:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Johannes Schindelin, Wolfgang Denk, Git List

On Sun, Jun 28, 2015 at 2:18 PM, Junio C Hamano <gitster@pobox.com> wrote:
> When fsck validates a commit or a tag object, it scans each line in
> the header the object using helper functions such as "start_with()",

s/header/& of/

> etc. that work on a NUL terminated buffer, but before a1e920a0
> (index-pack: terminate object buffers with NUL, 2014-12-08), the
> validation functions were fed the object data as counted strings,
> not necessarily terminated with a NUL.  We added a helper function
> require_end_of_header() to be called at the beginning of these
> validation functions to insist that the object data contains an
> empty line before its end.  The theory is that the validating
> functions will notice and stop when it hits an empty line as a
> normal end of header (or a required header line that is missing)
> before scanning past the end of potentially not NUL-terminated
> buffer.
>
> But the theory forgot that in the older days, Git itself happily
> created objects with only the header lines without a body. This
> caused Git 2.2 and later to issue an unnecessary warning on some
> existing repositories.
>
> With a1e920a0, we do not need to require an empty line (or the body)
> in these objects to safely parse and validate them.  Drop the
> offending "must have an empty line" check from this helper function,
> while keeping the other check to make sure that there is no NUL in
> the header part of the object, and adjust the name of the helper to
> what it does accordingly.
>
> Noticed-by: Wolfgang Denk <wd@denx.de>
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fsck: it is OK for a tag and a commit to lack the body
  2015-06-28 18:18                 ` [PATCH] fsck: it is OK for a tag and a commit to lack the body Junio C Hamano
  2015-06-28 18:21                   ` Eric Sunshine
@ 2015-06-29  5:12                   ` Johannes Schindelin
  2015-06-29  5:42                     ` Junio C Hamano
  1 sibling, 1 reply; 19+ messages in thread
From: Johannes Schindelin @ 2015-06-29  5:12 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Wolfgang Denk, git

Hi Junio,

On 2015-06-28 20:18, Junio C Hamano wrote:
> diff --git a/fsck.c b/fsck.c
> index 88c92e8..3f264e7 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -255,6 +255,15 @@ static int require_end_of_header(const void
> *data, unsigned long size,
>  		}
>  	}
>  
> +	/*
> +	 * We did not find double-LF that separates the header
> +	 * and the body.  Not having a body is not a crime but
> +	 * we do want to see the terminating LF for the last header
> +	 * line.
> +	 */
> +	if (size && buffer[size - 1] == '\n')
> +		return 0;
> +
>  	return error_func(obj, FSCK_ERROR, "unterminated header");
>  }
>  

Hmm. Maybe we should still warn when there is no empty line finishing the header explicitly, or at least make it FSCK_IGNORE by default so that maintainers who like a stricter check can upgrade the condition to an error?

Otherwise: ACK.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fsck: it is OK for a tag and a commit to lack the body
  2015-06-29  5:12                   ` Johannes Schindelin
@ 2015-06-29  5:42                     ` Junio C Hamano
  2015-06-29 14:51                       ` Johannes Schindelin
  0 siblings, 1 reply; 19+ messages in thread
From: Junio C Hamano @ 2015-06-29  5:42 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Jeff King, Wolfgang Denk, git

Johannes Schindelin <johannes.schindelin@gmx.de> writes:

>> +	/*
>> +	 * We did not find double-LF that separates the header
>> +	 * and the body.  Not having a body is not a crime but
>> +	 * we do want to see the terminating LF for the last header
>> +	 * line.
>> +	 */
>> +	if (size && buffer[size - 1] == '\n')
>> +		return 0;
>> +
>>  	return error_func(obj, FSCK_ERROR, "unterminated header");
>>  }
>
> Hmm. Maybe we should still warn when there is no empty line finishing
> the header explicitly, or at least make it FSCK_IGNORE by default so
> that maintainers who like a stricter check can upgrade the condition
> to an error?

Wolfgang, do you know how these old tags without messages were
created?  I think in the old days we didn't advertise "git tag" as
the sole way to create a tag object and more people drove "git
mktag" from their script, and "mktag" until e0aaf781 (mktag.c:
improve verification of tagger field and tests, 2008-03-27) did not
complain if the header were not terminated with double-LF even when
the tag did not have a body (hence there is no need for double-LF).

Dscho, I do not think it is reasonable to force all repository
owners of projects that started using Git before early 2008 to set
configuration variable to ignore warnings for something that was
perfectly kosher back when their project started.  More importantly,
even though Git core itself adds unnecessary double-LF after the
header in a tag or a commit that does not have any body, I am not
sure if we punish people who use current versions of third-party
reimplementations of Git that do not write that unnecessary
double-LF at the end an object without a body (I am not saying that
there is any known third-party reimplementation to do so---I am
saying that I do not think it is their bug if such implementations
existed today).

Do we have check marked as FSCK_IGNORE by default?  I think a more
interesting "stricter check" may be to flag a tag object or a commit
object that does not have any body, with or without the double-LF at
the end.

And such a check can certainly be added in the future, but what I
sent was a fix to a regression that caused us to start whining on a
syntactically valid object in the v2.2 timeframe, and is not about
adding a new feature.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] fsck: it is OK for a tag and a commit to lack the body
  2015-06-29  5:42                     ` Junio C Hamano
@ 2015-06-29 14:51                       ` Johannes Schindelin
  0 siblings, 0 replies; 19+ messages in thread
From: Johannes Schindelin @ 2015-06-29 14:51 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, Wolfgang Denk, git

Hi Junio,

On 2015-06-29 07:42, Junio C Hamano wrote:
> Johannes Schindelin <johannes.schindelin@gmx.de> writes:
> 
>> Hmm. Maybe we should still warn when there is no empty line finishing
>> the header explicitly, or at least make it FSCK_IGNORE by default so
>> that maintainers who like a stricter check can upgrade the condition
>> to an error?
> 
> [...]
> 
> And such a check can certainly be added in the future

True. I take my suggestion back.

Thanks for the reality check,
Dscho

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-06-29 14:51 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-25 15:51 git error in tag ...: unterminated header Wolfgang Denk
2015-06-25 17:32 ` Junio C Hamano
2015-06-25 20:13   ` Wolfgang Denk
2015-06-25 20:24     ` Junio C Hamano
2015-06-25 21:07       ` Johannes Schindelin
2015-06-25 21:21         ` Junio C Hamano
2015-06-25 22:29           ` Junio C Hamano
2015-06-26  8:06             ` Johannes Schindelin
2015-06-26 15:52               ` Jeff King
2015-06-26 17:37                 ` Junio C Hamano
2015-06-27  8:57                   ` Johannes Schindelin
2015-06-27 18:36                     ` Junio C Hamano
2015-06-28 18:18                 ` [PATCH] fsck: it is OK for a tag and a commit to lack the body Junio C Hamano
2015-06-28 18:21                   ` Eric Sunshine
2015-06-29  5:12                   ` Johannes Schindelin
2015-06-29  5:42                     ` Junio C Hamano
2015-06-29 14:51                       ` Johannes Schindelin
2015-06-25 20:48     ` git error in tag ...: unterminated header Junio C Hamano
2015-06-25 17:38 ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).