git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* Bug in "git am" when the body starts with spaces
@ 2017-04-01  0:24 Linus Torvalds
  2017-04-01  0:52 ` Linus Torvalds
  0 siblings, 1 reply; 8+ messages in thread
From: Linus Torvalds @ 2017-04-01  0:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

Try applying the attached patch with

   git am 0001-Test-patch.patch

in the git repository.

At least for me, it results in a very odd commit that has one single
line in the commit message:

    Test patch This should go in the body not in the subject line

which is obviously bogus.

I think the reason is that the "header continuation line" logic kicks
in because the lines in the body start with spaces, but that's
entirely incorrect, since

 (a) we're not in an email header

 (b) there's an empty line in between anyway, so no way are those body
lines continuation lines.

I didn't check how far back this goes, I guess I'll do that next. But
I thought I'd report it here first in case somebody else goes "ahhh".

                Linus

[-- Attachment #2: 0001-Test-patch.patch --]
[-- Type: text/x-patch, Size: 532 bytes --]

From ad65cf7ba97ac071da1f845ec854165e7bf1efdf Mon Sep 17 00:00:00 2001
From: Linus Torvalds <torvalds@linux-foundation.org>
Date: Fri, 31 Mar 2017 17:18:16 -0700
Subject: [PATCH] Test patch example

Subject: [PATCH] Test patch

  This should go in the body
  not in the subject line
---
 Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Makefile b/Makefile
index 9b36068ac..9f36c149b 100644
--- a/Makefile
+++ b/Makefile
@@ -1,3 +1,4 @@
+
 # The default target of this Makefile is...
 all::
 
-- 
2.12.2.401.g5d4234a49


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Bug in "git am" when the body starts with spaces
  2017-04-01  0:24 Bug in "git am" when the body starts with spaces Linus Torvalds
@ 2017-04-01  0:52 ` Linus Torvalds
  2017-04-01  5:27   ` Jeff King
  2017-04-01 19:03   ` Linus Torvalds
  0 siblings, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2017-04-01  0:52 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan; +Cc: Git Mailing List

Ok, did a bisect, and this bisects to commit 6b4b013f1884 ("mailinfo:
handle in-body header continuations").

The continuation logic is oddly complex, and I can't follow the logic.
But it is completely broken in how it thinks empty lines are somehow
"continuations".

Jonathan?

                     Linus

On Fri, Mar 31, 2017 at 5:24 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> I think the reason is that the "header continuation line" logic kicks
> in because the lines in the body start with spaces, but that's
> entirely incorrect, since
>
>  (a) we're not in an email header
>
>  (b) there's an empty line in between anyway, so no way are those body
> lines continuation lines.
>
> I didn't check how far back this goes, I guess I'll do that next. But
> I thought I'd report it here first in case somebody else goes "ahhh".

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in "git am" when the body starts with spaces
  2017-04-01  0:52 ` Linus Torvalds
@ 2017-04-01  5:27   ` Jeff King
  2017-04-01 19:03   ` Linus Torvalds
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff King @ 2017-04-01  5:27 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jonathan Tan, Git Mailing List

On Fri, Mar 31, 2017 at 05:52:00PM -0700, Linus Torvalds wrote:

> Ok, did a bisect, and this bisects to commit 6b4b013f1884 ("mailinfo:
> handle in-body header continuations").
> 
> The continuation logic is oddly complex, and I can't follow the logic.
> But it is completely broken in how it thinks empty lines are somehow
> "continuations".

I think the continuation logic is OK. The problem is that the newline is
never fed to check_inbody_header() at all. So the next line its state
machine sees is the indented line, which looks like a continuation.

It seems to me that ignoring that newline is a bug, and causes other
problems. For instance, if you have a blank line and then another
header-looking thing (not a continuation) after, it is respected. For
instance, running mailinfo on:

  From ...mbox header...
  From: Email Author <author@example.com>
  Subject: email subject
  Date: whatever

  From: in-body author <author@example.com>

  Subject: in-body subject

  And the rest of the message.

will use "in-body subject" as the subject, though I'd have thought we
should stop parsing in-body headers as soon as we see the first in-body
blank line (the one after the in-body "From").

The blank line gets eaten by the check at the beginning of
handle_commit_msg(), right _before_ we pass it off to the
check_inbody_header() function.

It seems like that should be more like:

diff --git a/mailinfo.c b/mailinfo.c
index a489d9d0f..6d89781eb 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -757,8 +757,10 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line)
 	assert(!mi->filter_stage);
 
 	if (mi->header_stage) {
-		if (!line->len || (line->len == 1 && line->buf[0] == '\n'))
+		if (!line->len || (line->len == 1 && line->buf[0] == '\n')) {
+			mi->header_stage = 0;
 			return 0;
+		}
 	}
 
 	if (mi->use_inbody_headers && mi->header_stage) {


But that breaks a bunch of tests. It looks like the loop in handle_body
always starts by feeding the initial header-separator (the real headers,
not the in-body ones) to the various parsers. So that first newline
makes us trigger "no more in-body headers" before we even start parsing
any lines. Oops.

So doing this:

@@ -960,7 +962,7 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line)
 			goto handle_body_out;
 	}
 
-	do {
+	while (!strbuf_getwholeline(line, mi->input, '\n')) {
 		/* process any boundary lines */
 		if (*(mi->content_top) && is_multipart_boundary(mi, line)) {
 			/* flush any leftover */
@@ -1013,7 +1015,7 @@ static void handle_body(struct mailinfo *mi, struct strbuf *line)
 
 		if (mi->input_error)
 			break;
-	} while (!strbuf_getwholeline(line, mi->input, '\n'));
+	}
 
 	flush_inbody_header_accum(mi);
 

on top fixes that. But that still breaks a test that has blank lines at
the beginning of the message, before the in-body header. So probably the
state-machine needs an extra state: sucking up pre-inbody-header
newlines.

-Peff

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Bug in "git am" when the body starts with spaces
  2017-04-01  0:52 ` Linus Torvalds
  2017-04-01  5:27   ` Jeff King
@ 2017-04-01 19:03   ` Linus Torvalds
  2017-04-02  4:18     ` Jeff King
  2017-04-02 17:35     ` Junio C Hamano
  1 sibling, 2 replies; 8+ messages in thread
From: Linus Torvalds @ 2017-04-01 19:03 UTC (permalink / raw)
  To: Junio C Hamano, Jonathan Tan, Jeff King; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> The continuation logic is oddly complex, and I can't follow the logic.
> But it is completely broken in how it thinks empty lines are somehow
> "continuations".

The attached patch seems to work for me. Comments?

The logic is fairly simple: if we encounter an empty line, and we have
pending in-body headers, we flush the pending headers, and mark us as
no longer in header mode.

The code used to just ignore empty lines when in header mode, which is
garbage, exactly because it would happily continue accumulating more
headers.

There may be other cases this misses, but this at least seems to fix
the case I encountered.

                    Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 640 bytes --]

 mailinfo.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/mailinfo.c b/mailinfo.c
index a489d9d0f..68037758f 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -757,8 +757,13 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line)
 	assert(!mi->filter_stage);
 
 	if (mi->header_stage) {
-		if (!line->len || (line->len == 1 && line->buf[0] == '\n'))
+		if (!line->len || (line->len == 1 && line->buf[0] == '\n')) {
+			if (mi->inbody_header_accum.len) {
+				flush_inbody_header_accum(mi);
+				mi->header_stage = 0;
+			}
 			return 0;
+		}
 	}
 
 	if (mi->use_inbody_headers && mi->header_stage) {

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Bug in "git am" when the body starts with spaces
  2017-04-01 19:03   ` Linus Torvalds
@ 2017-04-02  4:18     ` Jeff King
  2017-04-03 17:42       ` Jonathan Tan
  2017-04-02 17:35     ` Junio C Hamano
  1 sibling, 1 reply; 8+ messages in thread
From: Jeff King @ 2017-04-02  4:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Jonathan Tan, Git Mailing List

On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote:

> On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> >
> > The continuation logic is oddly complex, and I can't follow the logic.
> > But it is completely broken in how it thinks empty lines are somehow
> > "continuations".
> 
> The attached patch seems to work for me. Comments?
> 
> The logic is fairly simple: if we encounter an empty line, and we have
> pending in-body headers, we flush the pending headers, and mark us as
> no longer in header mode.

Hmm. I think this may work. At first I thought it was too strict in
always checking inbody_header_accum.len, because we want this to kick in
always, whether there's whitespace continuation or not. But that
accumulator has to collect preemptively, before it knows if there's
continuation. So it will always be non-empty if we've seen _any_ header,
and it will remain non-empty as long as we keep parsing (because any
time we flush, we do so in order to handle another line).

IOW, I think this implements the state-machine thing I wrote in my
earlier email, because the state "are we inside in-body header parsing"
is always reflected by having a non-empty accumulator. It is a bit
non-obvious though.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in "git am" when the body starts with spaces
  2017-04-01 19:03   ` Linus Torvalds
  2017-04-02  4:18     ` Jeff King
@ 2017-04-02 17:35     ` Junio C Hamano
  1 sibling, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2017-04-02 17:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jonathan Tan, Jeff King, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Fri, Mar 31, 2017 at 5:52 PM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
>>
>> The continuation logic is oddly complex, and I can't follow the logic.
>> But it is completely broken in how it thinks empty lines are somehow
>> "continuations".
>
> The attached patch seems to work for me. Comments?

We start at header_stage set to 1, keep skipping empty lines while
in that state, and we stay in that state as long as we see in-body
header (or a continuation of the in-body header we saw earlier).  We
get out of this state when we see a blank line after we are done
with the in-body headers.  Once header_stage is set to 0 with a
blank line, we don't do in-body headers (scissors will roll back the
whole thing and irrelevant to the analysis of correctness).

But you found that "keep skipping empty" done unconditionally is
wrong, because we may have already seen an in-body header and are
trying to see if the line is a continuation (in which case
check_inbody_header() would append to the previous) or another
in-body header (in which case again check_inbody_header() would use
it), or something else (in which case check_inbody_header() will
return false, we get out of header_stage=1 state and process this
line as the first line of the log message.  An empty line we see in
this state must trigger "we are no longer taking in-body header"
logic, too.

And that is exactly your patch does.  The change "feels" correct to
me.


>  mailinfo.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/mailinfo.c b/mailinfo.c
> index a489d9d0f..68037758f 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -757,8 +757,13 @@ static int handle_commit_msg(struct mailinfo *mi, struct strbuf *line)
>  	assert(!mi->filter_stage);
>  
>  	if (mi->header_stage) {
> -		if (!line->len || (line->len == 1 && line->buf[0] == '\n'))
> +		if (!line->len || (line->len == 1 && line->buf[0] == '\n')) {
> +			if (mi->inbody_header_accum.len) {
> +				flush_inbody_header_accum(mi);
> +				mi->header_stage = 0;
> +			}
>  			return 0;
> +		}
>  	}
>  
>  	if (mi->use_inbody_headers && mi->header_stage) {

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in "git am" when the body starts with spaces
  2017-04-02  4:18     ` Jeff King
@ 2017-04-03 17:42       ` Jonathan Tan
  2017-04-04  6:50         ` Jeff King
  0 siblings, 1 reply; 8+ messages in thread
From: Jonathan Tan @ 2017-04-03 17:42 UTC (permalink / raw)
  To: Jeff King, Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List

Thanks, everyone, for looking into this.

On 04/01/2017 09:18 PM, Jeff King wrote:
> On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote:
>> The logic is fairly simple: if we encounter an empty line, and we have
>> pending in-body headers, we flush the pending headers, and mark us as
>> no longer in header mode.
>
> Hmm. I think this may work. At first I thought it was too strict in
> always checking inbody_header_accum.len, because we want this to kick in
> always, whether there's whitespace continuation or not. But that
> accumulator has to collect preemptively, before it knows if there's
> continuation. So it will always be non-empty if we've seen _any_ header,
> and it will remain non-empty as long as we keep parsing (because any
> time we flush, we do so in order to handle another line).
>
> IOW, I think this implements the state-machine thing I wrote in my
> earlier email, because the state "are we inside in-body header parsing"
> is always reflected by having a non-empty accumulator. It is a bit
> non-obvious though.

About obviousness, I think of a non-empty accumulator merely 
representing that the next line could potentially be a continuation 
line. And it is coincidence that this implies "are we inside in-body 
header parsing"; if not all in-body header lines could be "continued", 
there would be no such implication.

mi->inbody_header_accum.len is already used in check_inbody_header to 
mean "could the next line potentially be a continuation line" and to 
trigger a check for a negative criterion (in this case, a scissors 
line). I think it's fine to do the same thing, the negative criterion 
here being a blank line.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Bug in "git am" when the body starts with spaces
  2017-04-03 17:42       ` Jonathan Tan
@ 2017-04-04  6:50         ` Jeff King
  0 siblings, 0 replies; 8+ messages in thread
From: Jeff King @ 2017-04-04  6:50 UTC (permalink / raw)
  To: Jonathan Tan; +Cc: Linus Torvalds, Junio C Hamano, Git Mailing List

On Mon, Apr 03, 2017 at 10:42:46AM -0700, Jonathan Tan wrote:

> On 04/01/2017 09:18 PM, Jeff King wrote:
> > On Sat, Apr 01, 2017 at 12:03:44PM -0700, Linus Torvalds wrote:
> > > The logic is fairly simple: if we encounter an empty line, and we have
> > > pending in-body headers, we flush the pending headers, and mark us as
> > > no longer in header mode.
> > 
> > Hmm. I think this may work. At first I thought it was too strict in
> > always checking inbody_header_accum.len, because we want this to kick in
> > always, whether there's whitespace continuation or not. But that
> > accumulator has to collect preemptively, before it knows if there's
> > continuation. So it will always be non-empty if we've seen _any_ header,
> > and it will remain non-empty as long as we keep parsing (because any
> > time we flush, we do so in order to handle another line).
> > 
> > IOW, I think this implements the state-machine thing I wrote in my
> > earlier email, because the state "are we inside in-body header parsing"
> > is always reflected by having a non-empty accumulator. It is a bit
> > non-obvious though.
> 
> About obviousness, I think of a non-empty accumulator merely representing
> that the next line could potentially be a continuation line. And it is
> coincidence that this implies "are we inside in-body header parsing"; if not
> all in-body header lines could be "continued", there would be no such
> implication.
> 
> mi->inbody_header_accum.len is already used in check_inbody_header to mean
> "could the next line potentially be a continuation line" and to trigger a
> check for a negative criterion (in this case, a scissors line). I think it's
> fine to do the same thing, the negative criterion here being a blank line.

FWIW, I looked into making a single "state" variable, but it actually
got ugly pretty quickly. I think not because it's not a cleaner method
in the long run, but just because the existing code is so dependent on
individual state variables that needed changing. So I'm OK with Linus's
fix; it certainly follows the existing code patterns.

-Peff

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2017-04-04  6:51 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-01  0:24 Bug in "git am" when the body starts with spaces Linus Torvalds
2017-04-01  0:52 ` Linus Torvalds
2017-04-01  5:27   ` Jeff King
2017-04-01 19:03   ` Linus Torvalds
2017-04-02  4:18     ` Jeff King
2017-04-03 17:42       ` Jonathan Tan
2017-04-04  6:50         ` Jeff King
2017-04-02 17:35     ` Junio C Hamano

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).