git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* [PATCH] mailinfo: unescape quoted-pair in header fields
@ 2016-09-16 21:02 Kevin Daudt
  2016-09-16 22:22 ` Jeff King
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-16 21:02 UTC (permalink / raw)
  To: git; +Cc: Kevin Daudt, Swift Geek, Jeff King, Junio C Hamano

rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.

The only thing git currently does is removing exterior quotes, but
quotes within are left alone.

Tell mailinfo to remove exterior quotes and remove escape characters from the
author so that they don't show up in the commits author field.

Signed-off-by: Kevin Daudt <me@ikke.info>
---
The only thing I could not easily fix is the prevent git am from removing any quotes around the author. This is done in fmt_ident, which calls `strbuf_addstr_without_crud`. 

 mailinfo.c                 | 54 ++++++++++++++++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh        |  6 ++++++
 t/t5100/quoted-pair.expect |  5 +++++
 t/t5100/quoted-pair.in     |  9 ++++++++
 4 files changed, 74 insertions(+)
 create mode 100644 t/t5100/quoted-pair.expect
 create mode 100644 t/t5100/quoted-pair.in

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..04036f3 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 	get_sane_name(&mi->name, &mi->name, &mi->email);
 }
 
+static int unquote_quoted_string(struct strbuf *line)
+{
+	struct strbuf outbuf;
+	const char *in = line->buf;
+	int c, take_next_literally = 0;
+	int found_error = 0;
+	char escape_context=0;
+
+	strbuf_init(&outbuf, line->len);
+
+	while ((c = *in++) != 0) {
+		if (take_next_literally) {
+			take_next_literally = 0;
+		} else {
+			switch (c) {
+			case '"':
+				if (!escape_context)
+					escape_context = '"';
+				else if (escape_context == '"')
+					escape_context = 0;
+				continue;
+			case '\\':
+				if (escape_context) {
+					take_next_literally = 1;
+					continue;
+				}
+				break;
+			case '(':
+				if (!escape_context)
+					escape_context = '(';
+				else if (escape_context == '(')
+					found_error = 1;
+				break;
+			case ')':
+				if (escape_context == '(')
+					escape_context = 0;
+				break;
+			}
+		}
+
+		strbuf_addch(&outbuf, c);
+	}
+
+	strbuf_reset(line);
+	strbuf_addbuf(line, &outbuf);
+	strbuf_release(&outbuf);
+
+	return found_error;
+
+}
+
 static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 {
 	char *at;
 	size_t el;
 	struct strbuf f;
 
+
 	strbuf_init(&f, from->len);
 	strbuf_addbuf(&f, from);
 
+	unquote_quoted_string(&f);
+
 	at = strchr(f.buf, '@');
 	if (!at) {
 		parse_bogus_from(mi, from);
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 1a5a546..d0c21fc 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -142,4 +142,10 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
 	test_cmp expect mboxrd/msg
 '
 
+test_expect_success 'mailinfo unescapes rfc2822 quoted-string' '
+    mkdir quoted-pair &&
+    git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >quoted-pair/info &&
+    test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect quoted-pair/info
+'
+
 test_done
diff --git a/t/t5100/quoted-pair.expect b/t/t5100/quoted-pair.expect
new file mode 100644
index 0000000..cab1bce
--- /dev/null
+++ b/t/t5100/quoted-pair.expect
@@ -0,0 +1,5 @@
+Author: Author "The Author" Name
+Email: somebody@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/quoted-pair.in b/t/t5100/quoted-pair.in
new file mode 100644
index 0000000..e2e627a
--- /dev/null
+++ b/t/t5100/quoted-pair.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "Author \"The Author\" Name" <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted-pair
+
+
+
+---
+patch
-- 
2.10.0.86.g6ffa4f1.dirty


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH] mailinfo: unescape quoted-pair in header fields
  2016-09-16 21:02 [PATCH] mailinfo: unescape quoted-pair in header fields Kevin Daudt
@ 2016-09-16 22:22 ` Jeff King
  2016-09-19 10:51   ` Kevin Daudt
  2016-09-19 18:54 ` [PATCH v2 0/2] Handle escape characters in From field Kevin Daudt
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-09-16 22:22 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Junio C Hamano

On Fri, Sep 16, 2016 at 11:02:04PM +0200, Kevin Daudt wrote:

> rfc2822 has provisions for quoted strings in structured header fields,
> but also allows for escaping these with so-called quoted-pairs.
> 
> The only thing git currently does is removing exterior quotes, but
> quotes within are left alone.
> 
> Tell mailinfo to remove exterior quotes and remove escape characters from the
> author so that they don't show up in the commits author field.
> 
> Signed-off-by: Kevin Daudt <me@ikke.info>
> ---
> The only thing I could not easily fix is the prevent git am from
> removing any quotes around the author. This is done in fmt_ident,
> which calls `strbuf_addstr_without_crud`. 

Ah, OK. I was wondering where that stripping was being done. That makes
sense, and makes me doubly confident this is the right place to be doing
it, since the other quote-stripping was not even intentional, but just a
side effect of the low-level routines.

I think it is OK to leave it in place. If you really want your name to
be:

  "My Name is Always in Quotes"

then tough luck. Git does not support it via git-am, but nor does it via
git-commit, etc.

>  mailinfo.c                 | 54 ++++++++++++++++++++++++++++++++++++++++++++++
>  t/t5100-mailinfo.sh        |  6 ++++++
>  t/t5100/quoted-pair.expect |  5 +++++
>  t/t5100/quoted-pair.in     |  9 ++++++++
>  4 files changed, 74 insertions(+)
>  create mode 100644 t/t5100/quoted-pair.expect
>  create mode 100644 t/t5100/quoted-pair.in
> 
> diff --git a/mailinfo.c b/mailinfo.c
> index e19abe3..04036f3 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
>  	get_sane_name(&mi->name, &mi->name, &mi->email);
>  }
>  
> +static int unquote_quoted_string(struct strbuf *line)
> +{
> +	struct strbuf outbuf;
> +	const char *in = line->buf;
> +	int c, take_next_literally = 0;
> +	int found_error = 0;
> +	char escape_context=0;

Style: whitespace around "=".

I had to wonder why we needed both escape_context and
take_next_literally; shouldn't we just need a single state bit. But
escape_context is not "escape the next character", it is "we are
currently in a mode where we should be escaping".

Could we give it a more descriptive name? I guess it is more than just
"we are in a mode", but rather "here is the character that will end the
escaped mode". Maybe a comment would be more appropriate.

> +	while ((c = *in++) != 0) {
> +		if (take_next_literally) {
> +			take_next_literally = 0;
> +		} else {

OK, so that means the previous one was backslash-quoted, and we don't do
any other cleverness. Good.

> +			switch (c) {
> +			case '"':
> +				if (!escape_context)
> +					escape_context = '"';
> +				else if (escape_context == '"')
> +					escape_context = 0;
> +				continue;

And here we open or close the quoted portion, depending. Makes sense.

> +			case '\\':
> +				if (escape_context) {
> +					take_next_literally = 1;
> +					continue;
> +				}
> +				break;

I didn't look in the RFC. Is:

  From: my \"name\" <foo@example.com>

really the same as:

  From: "my \\\"name\\\"" <foo@example.com>

? That seems weird, but I think it may be that the former is simply
bogus (you are not supposed to use backslashes outside of the quoted
section at all).

> +			case '(':
> +				if (!escape_context)
> +					escape_context = '(';
> +				else if (escape_context == '(')
> +					found_error = 1;
> +				break;

Hmm. Is:

  From: Name (Comment with (another comment))

really disallowed? RFC2822 seems to say that "comment" can contain
"ccontent", which can itself be a comment.

This is obviously getting pretty silly, but if we are going to follow
the RFC, I think you actually have to do a recursive parse, and keep
track of an arbitrary depth of context.

I dunno. This method probably covers most cases in practice, and it's
easy to reason about.

> +			case ')':
> +				if (escape_context == '(')
> +					escape_context = 0;
> +				break;
> +			}
> +		}
> +
> +		strbuf_addch(&outbuf, c);
> +	}
> +
> +	strbuf_reset(line);
> +	strbuf_addbuf(line, &outbuf);
> +	strbuf_release(&outbuf);

I think you can use strbuf_swap() here to avoid copying the line an
extra time, like:

  strbuf_swap(line, &outbuf);
  strbuf_release(&outbuf);

Another option would be to just:

  in = strbuf_detach(&line);

at the beginning, and then output back into "line".

> +	return found_error;

What happens when we get here and take_next_literally is set? I.e., a
backslash at the end of the string. We'll silently print nothing, which
seems reasonable to me (the other option is to print a literal
backslash).

Ditto, what if escape_context is non-zero? We're in the middle of an
unterminated quoted string (or comment).

I'm fine with silently continuing, but it seems weird that we notice
embedded comments (and return an error), but not these other conditions.

>  static void handle_from(struct mailinfo *mi, const struct strbuf *from)
>  {
>  	char *at;
>  	size_t el;
>  	struct strbuf f;
>  
> +
>  	strbuf_init(&f, from->len);
>  	strbuf_addbuf(&f, from);

Funny extra line?

> +test_expect_success 'mailinfo unescapes rfc2822 quoted-string' '
> +    mkdir quoted-pair &&
> +    git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >quoted-pair/info &&
> +    test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect quoted-pair/info
> +'

We usually break long lines with backslash-escapes. Like:

  git mailinfo /dev/null /dev/null \
	<"$TEST_DIRECTORY"/t5100/quoted-pair.in \
	>quoted-pair/info

I'd also wonder if things might be made much more readable by putting
"$TEST_DIRECTORY/t5100" into a shorter variable like $data or something.
That would be best done as a preparatory patch which updates all of the
tests.

> --- /dev/null
> +++ b/t/t5100/quoted-pair.in
> @@ -0,0 +1,9 @@
> +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> +From: "Author \"The Author\" Name" <somebody@example.com>
> +Date: Sun, 25 May 2008 00:38:18 -0700
> +Subject: [PATCH] testing quoted-pair

I do not care that much about the "()" comment behavior myself, but if
we are going to implement it, it probably makes sense to protect it from
regression with a test.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] mailinfo: unescape quoted-pair in header fields
  2016-09-16 22:22 ` Jeff King
@ 2016-09-19 10:51   ` Kevin Daudt
  2016-09-20  3:57     ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Kevin Daudt @ 2016-09-19 10:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Swift Geek, Junio C Hamano

Thanks for the review

On Fri, Sep 16, 2016 at 03:22:06PM -0700, Jeff King wrote:
> On Fri, Sep 16, 2016 at 11:02:04PM +0200, Kevin Daudt wrote:
> 
> >  mailinfo.c                 | 54 ++++++++++++++++++++++++++++++++++++++++++++++
> >  t/t5100-mailinfo.sh        |  6 ++++++
> >  t/t5100/quoted-pair.expect |  5 +++++
> >  t/t5100/quoted-pair.in     |  9 ++++++++
> >  4 files changed, 74 insertions(+)
> >  create mode 100644 t/t5100/quoted-pair.expect
> >  create mode 100644 t/t5100/quoted-pair.in
> > 
> > diff --git a/mailinfo.c b/mailinfo.c
> > index e19abe3..04036f3 100644
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
> >  	get_sane_name(&mi->name, &mi->name, &mi->email);
> >  }
> >  
> > +static int unquote_quoted_string(struct strbuf *line)
> > +{
> > +	struct strbuf outbuf;
> > +	const char *in = line->buf;
> > +	int c, take_next_literally = 0;
> > +	int found_error = 0;
> > +	char escape_context=0;
> 
> Style: whitespace around "=".
> 
> I had to wonder why we needed both escape_context and
> take_next_literally; shouldn't we just need a single state bit. But
> escape_context is not "escape the next character", it is "we are
> currently in a mode where we should be escaping".
> 
> Could we give it a more descriptive name? I guess it is more than just
> "we are in a mode", but rather "here is the character that will end the
> escaped mode". Maybe a comment would be more appropriate.
> 

Yes, your analysis is right, we need to know what character would end
the 'escape context'. I'll add a comment.

> > +	while ((c = *in++) != 0) {
> > +		if (take_next_literally) {
> > +			take_next_literally = 0;
> > +		} else {
> 
> OK, so that means the previous one was backslash-quoted, and we don't do
> any other cleverness. Good.
> 
> > +			switch (c) {
> > +			case '"':
> > +				if (!escape_context)
> > +					escape_context = '"';
> > +				else if (escape_context == '"')
> > +					escape_context = 0;
> > +				continue;
> 
> And here we open or close the quoted portion, depending. Makes sense.
> 
> > +			case '\\':
> > +				if (escape_context) {
> > +					take_next_literally = 1;
> > +					continue;
> > +				}
> > +				break;
> 
> I didn't look in the RFC. Is:
> 
>   From: my \"name\" <foo@example.com>
> 
> really the same as:
> 
>   From: "my \\\"name\\\"" <foo@example.com>
> 
> ? That seems weird, but I think it may be that the former is simply
> bogus (you are not supposed to use backslashes outside of the quoted
> section at all).

Correct, the quoted-pair (escape sequence) can only occur in a quoted
string or a comment. Even more so, the display name *needs* to be quoted
when consisting of more then one word according to the RFC.

> 
> > +			case '(':
> > +				if (!escape_context)
> > +					escape_context = '(';
> > +				else if (escape_context == '(')
> > +					found_error = 1;
> > +				break;
> 
> Hmm. Is:
> 
>   From: Name (Comment with (another comment))
> 
> really disallowed? RFC2822 seems to say that "comment" can contain
> "ccontent", which can itself be a comment.

Yes, you are right, it is allowed, I was just looking at the ctext when
adding this, but failed to see that comments can be nested at that time.

> 
> This is obviously getting pretty silly, but if we are going to follow
> the RFC, I think you actually have to do a recursive parse, and keep
> track of an arbitrary depth of context.
> 
> I dunno. This method probably covers most cases in practice, and it's
> easy to reason about.

The problem is, how do you differentiate between nested comments, and
escaped braces within a comment after one run?
> 
> > +			case ')':
> > +				if (escape_context == '(')
> > +					escape_context = 0;
> > +				break;
> > +			}
> > +		}
> > +
> > +		strbuf_addch(&outbuf, c);
> > +	}
> > +
> > +	strbuf_reset(line);
> > +	strbuf_addbuf(line, &outbuf);
> > +	strbuf_release(&outbuf);
> 
> I think you can use strbuf_swap() here to avoid copying the line an
> extra time, like:
> 
>   strbuf_swap(line, &outbuf);
>   strbuf_release(&outbuf);
> 
> Another option would be to just:
> 
>   in = strbuf_detach(&line);
> 
> at the beginning, and then output back into "line".
> 

Thanks, I just looked at what other functions were doing, but this is
much better indeed.

> > +	return found_error;
> 
> What happens when we get here and take_next_literally is set? I.e., a
> backslash at the end of the string. We'll silently print nothing, which
> seems reasonable to me (the other option is to print a literal
> backslash).
> 
> Ditto, what if escape_context is non-zero? We're in the middle of an
> unterminated quoted string (or comment).
> 
> I'm fine with silently continuing, but it seems weird that we notice
> embedded comments (and return an error), but not these other conditions.
> 

I agree. I'm thinking it's better to just be lenient in this method. If
a quote wasn't properly closed, there would be no e-mail adress for
example. I think it would do little harm, and I'd remove the checking
for the opening brace too.

> >  static void handle_from(struct mailinfo *mi, const struct strbuf *from)
> >  {
> >  	char *at;
> >  	size_t el;
> >  	struct strbuf f;
> >  
> > +
> >  	strbuf_init(&f, from->len);
> >  	strbuf_addbuf(&f, from);
> 
> Funny extra line?

Ugh

> 
> > +test_expect_success 'mailinfo unescapes rfc2822 quoted-string' '
> > +    mkdir quoted-pair &&
> > +    git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >quoted-pair/info &&
> > +    test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect quoted-pair/info
> > +'
> 
> We usually break long lines with backslash-escapes. Like:
> 
>   git mailinfo /dev/null /dev/null \
> 	<"$TEST_DIRECTORY"/t5100/quoted-pair.in \
> 	>quoted-pair/info
> 
> I'd also wonder if things might be made much more readable by putting
> "$TEST_DIRECTORY/t5100" into a shorter variable like $data or something.
> That would be best done as a preparatory patch which updates all of the
> tests.
> 
> > --- /dev/null
> > +++ b/t/t5100/quoted-pair.in
> > @@ -0,0 +1,9 @@
> > +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> > +From: "Author \"The Author\" Name" <somebody@example.com>
> > +Date: Sun, 25 May 2008 00:38:18 -0700
> > +Subject: [PATCH] testing quoted-pair
> 
> I do not care that much about the "()" comment behavior myself, but if
> we are going to implement it, it probably makes sense to protect it from
> regression with a test.

Yeah, good idea.
> 
> -Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/2] Handle escape characters in From field.
  2016-09-16 21:02 [PATCH] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  2016-09-16 22:22 ` Jeff King
@ 2016-09-19 18:54 ` Kevin Daudt
  2016-09-25 21:08   ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
  2016-09-19 18:54 ` [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
  2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  3 siblings, 1 reply; 32+ messages in thread
From: Kevin Daudt @ 2016-09-19 18:54 UTC (permalink / raw)
  To: git; +Cc: Kevin Daudt, Swift Geek, Jeff King, Junio C Hamano

Changes since v2:
- detach from input parameter to reuse it as an output buffer
- don't return error when encountering another open bracket in a comment
- test escaping in comments

Kevin Daudt (2):
  t5100-mailinfo: replace common path prefix with variable
  mailinfo: unescape quoted-pair in header fields

 mailinfo.c                   | 46 +++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh          | 70 +++++++++++++++++++++++++++-----------------
 t/t5100/comment.expect       |  5 ++++
 t/t5100/comment.in           |  9 ++++++
 t/t5100/quoted-string.expect |  5 ++++
 t/t5100/quoted-string.in     |  9 ++++++
 6 files changed, 117 insertions(+), 27 deletions(-)
 create mode 100644 t/t5100/comment.expect
 create mode 100644 t/t5100/comment.in
 create mode 100644 t/t5100/quoted-string.expect
 create mode 100644 t/t5100/quoted-string.in

-- 
2.10.0.86.g6ffa4f1.dirty


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-16 21:02 [PATCH] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  2016-09-16 22:22 ` Jeff King
  2016-09-19 18:54 ` [PATCH v2 0/2] Handle escape characters in From field Kevin Daudt
@ 2016-09-19 18:54 ` Kevin Daudt
  2016-09-19 21:16   ` Junio C Hamano
  2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  3 siblings, 1 reply; 32+ messages in thread
From: Kevin Daudt @ 2016-09-19 18:54 UTC (permalink / raw)
  To: git; +Cc: Kevin Daudt, Swift Geek, Jeff King, Junio C Hamano

Many tests need to store data in a file, and repeat the same pattern to
refer to that path:

    "$TEST_DATA"/t5100/

Create a variable that contains this path, and use that instead.

Signed-off-by: Kevin Daudt <me@ikke.info>
---
 t/t5100-mailinfo.sh | 56 +++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 1a5a546..27bf3b8 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -7,8 +7,10 @@ test_description='git mailinfo and git mailsplit test'
 
 . ./test-lib.sh
 
+DATA="$TEST_DIRECTORY/t5100"
+
 test_expect_success 'split sample box' \
-	'git mailsplit -o. "$TEST_DIRECTORY"/t5100/sample.mbox >last &&
+	'git mailsplit -o. "$DATA"/sample.mbox >last &&
 	last=$(cat last) &&
 	echo total is $last &&
 	test $(cat last) = 17'
@@ -17,9 +19,9 @@ check_mailinfo () {
 	mail=$1 opt=$2
 	mo="$mail$opt"
 	git mailinfo -u $opt msg$mo patch$mo <$mail >info$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/msg$mo msg$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/patch$mo patch$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info$mo info$mo
+	test_cmp "$DATA"/msg$mo msg$mo &&
+	test_cmp "$DATA"/patch$mo patch$mo &&
+	test_cmp "$DATA"/info$mo info$mo
 }
 
 
@@ -27,15 +29,15 @@ for mail in 00*
 do
 	test_expect_success "mailinfo $mail" '
 		check_mailinfo $mail "" &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--scissors
+		if test -f "$DATA"/msg$mail--scissors
 		then
 			check_mailinfo $mail --scissors
 		fi &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--no-inbody-headers
+		if test -f "$DATA"/msg$mail--no-inbody-headers
 		then
 			check_mailinfo $mail --no-inbody-headers
 		fi &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--message-id
+		if test -f "$DATA"/msg$mail--message-id
 		then
 			check_mailinfo $mail --message-id
 		fi
@@ -45,7 +47,7 @@ done
 
 test_expect_success 'split box with rfc2047 samples' \
 	'mkdir rfc2047 &&
-	git mailsplit -orfc2047 "$TEST_DIRECTORY"/t5100/rfc2047-samples.mbox \
+	git mailsplit -orfc2047 "$DATA"/rfc2047-samples.mbox \
 	  >rfc2047/last &&
 	last=$(cat rfc2047/last) &&
 	echo total is $last &&
@@ -56,18 +58,18 @@ do
 	test_expect_success "mailinfo $mail" '
 		git mailinfo -u $mail-msg $mail-patch <$mail >$mail-info &&
 		echo msg &&
-		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-msg &&
+		test_cmp "$DATA"/empty $mail-msg &&
 		echo patch &&
-		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-patch &&
+		test_cmp "$DATA"/empty $mail-patch &&
 		echo info &&
-		test_cmp "$TEST_DIRECTORY"/t5100/rfc2047-info-$(basename $mail) $mail-info
+		test_cmp "$DATA"/rfc2047-info-$(basename $mail) $mail-info
 	'
 done
 
 test_expect_success 'respect NULs' '
 
-	git mailsplit -d3 -o. "$TEST_DIRECTORY"/t5100/nul-plain &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-plain 001 &&
+	git mailsplit -d3 -o. "$DATA"/nul-plain &&
+	test_cmp "$DATA"/nul-plain 001 &&
 	(cat 001 | git mailinfo msg patch) &&
 	test_line_count = 4 patch
 
@@ -75,52 +77,52 @@ test_expect_success 'respect NULs' '
 
 test_expect_success 'Preserve NULs out of MIME encoded message' '
 
-	git mailsplit -d5 -o. "$TEST_DIRECTORY"/t5100/nul-b64.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-b64.in 00001 &&
+	git mailsplit -d5 -o. "$DATA"/nul-b64.in &&
+	test_cmp "$DATA"/nul-b64.in 00001 &&
 	git mailinfo msg patch <00001 &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-b64.expect patch
+	test_cmp "$DATA"/nul-b64.expect patch
 
 '
 
 test_expect_success 'mailinfo on from header without name works' '
 
 	mkdir info-from &&
-	git mailsplit -oinfo-from "$TEST_DIRECTORY"/t5100/info-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info-from.in info-from/0001 &&
+	git mailsplit -oinfo-from "$DATA"/info-from.in &&
+	test_cmp "$DATA"/info-from.in info-from/0001 &&
 	git mailinfo info-from/msg info-from/patch \
 	  <info-from/0001 >info-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info-from.expect info-from/out
+	test_cmp "$DATA"/info-from.expect info-from/out
 
 '
 
 test_expect_success 'mailinfo finds headers after embedded From line' '
 	mkdir embed-from &&
-	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
+	git mailsplit -oembed-from "$DATA"/embed-from.in &&
+	test_cmp "$DATA"/embed-from.in embed-from/0001 &&
 	git mailinfo embed-from/msg embed-from/patch \
 	  <embed-from/0001 >embed-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
+	test_cmp "$DATA"/embed-from.expect embed-from/out
 '
 
 test_expect_success 'mailinfo on message with quoted >From' '
 	mkdir quoted-from &&
-	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
+	git mailsplit -oquoted-from "$DATA"/quoted-from.in &&
+	test_cmp "$DATA"/quoted-from.in quoted-from/0001 &&
 	git mailinfo quoted-from/msg quoted-from/patch \
 	  <quoted-from/0001 >quoted-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
+	test_cmp "$DATA"/quoted-from.expect quoted-from/msg
 '
 
 test_expect_success 'mailinfo unescapes with --mboxrd' '
 	mkdir mboxrd &&
 	git mailsplit -omboxrd --mboxrd \
-		"$TEST_DIRECTORY"/t5100/sample.mboxrd >last &&
+		"$DATA"/sample.mboxrd >last &&
 	test x"$(cat last)" = x2 &&
 	for i in 0001 0002
 	do
 		git mailinfo mboxrd/msg mboxrd/patch \
 		  <mboxrd/$i >mboxrd/out &&
-		test_cmp "$TEST_DIRECTORY"/t5100/${i}mboxrd mboxrd/msg
+		test_cmp "$DATA"/${i}mboxrd mboxrd/msg
 	done &&
 	sp=" " &&
 	echo "From " >expect &&
-- 
2.10.0.86.g6ffa4f1.dirty


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-16 21:02 [PATCH] mailinfo: unescape quoted-pair in header fields Kevin Daudt
                   ` (2 preceding siblings ...)
  2016-09-19 18:54 ` [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
@ 2016-09-19 18:54 ` Kevin Daudt
  2016-09-19 21:24   ` Junio C Hamano
                     ` (2 more replies)
  3 siblings, 3 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-19 18:54 UTC (permalink / raw)
  To: git; +Cc: Kevin Daudt, Swift Geek, Jeff King, Junio C Hamano

rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.

The only thing git currently does is removing exterior quotes, but
quotes within are left alone.

Remove exterior quotes and remove escape characters so that they don't
show up in the author field.

Signed-off-by: Kevin Daudt <me@ikke.info>
---
 mailinfo.c                   | 46 ++++++++++++++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh          | 14 ++++++++++++++
 t/t5100/comment.expect       |  5 +++++
 t/t5100/comment.in           |  9 +++++++++
 t/t5100/quoted-string.expect |  5 +++++
 t/t5100/quoted-string.in     |  9 +++++++++
 6 files changed, 88 insertions(+)
 create mode 100644 t/t5100/comment.expect
 create mode 100644 t/t5100/comment.in
 create mode 100644 t/t5100/quoted-string.expect
 create mode 100644 t/t5100/quoted-string.in

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..6a7c2f2 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -54,6 +54,50 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 	get_sane_name(&mi->name, &mi->name, &mi->email);
 }
 
+static void unquote_quoted_string(struct strbuf *line)
+{
+	const char *in = strbuf_detach(line, NULL);
+	int c, take_next_literally = 0;
+	int found_error = 0;
+
+	/*
+	 * Stores the character that started the escape mode so that we know what
+	 * character will stop it
+	 */
+	char escape_context = 0;
+
+	while ((c = *in++) != 0) {
+		if (take_next_literally) {
+			take_next_literally = 0;
+		} else {
+			switch (c) {
+			case '"':
+				if (!escape_context)
+					escape_context = '"';
+				else if (escape_context == '"')
+					escape_context = 0;
+				continue;
+			case '\\':
+				if (escape_context) {
+					take_next_literally = 1;
+					continue;
+				}
+				break;
+			case '(':
+				if (!escape_context)
+					escape_context = '(';
+				break;
+			case ')':
+				if (escape_context == '(')
+					escape_context = 0;
+				break;
+			}
+		}
+
+		strbuf_addch(line, c);
+	}
+}
+
 static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 {
 	char *at;
@@ -63,6 +107,8 @@ static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 	strbuf_init(&f, from->len);
 	strbuf_addbuf(&f, from);
 
+	unquote_quoted_string(&f);
+
 	at = strchr(f.buf, '@');
 	if (!at) {
 		parse_bogus_from(mi, from);
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 27bf3b8..8c21434 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -144,4 +144,18 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
 	test_cmp expect mboxrd/msg
 '
 
+test_expect_success 'mailinfo handles rfc2822 quoted-string' '
+	mkdir quoted-string &&
+	git mailinfo /dev/null /dev/null <"$DATA"/quoted-string.in \
+		>quoted-string/info &&
+	test_cmp "$DATA"/quoted-string.expect quoted-string/info
+'
+
+test_expect_success 'mailinfo handles rfc2822 comment' '
+	mkdir comment &&
+	git mailinfo /dev/null /dev/null <"$DATA"/comment.in \
+		>comment/info &&
+	test_cmp "$DATA"/comment.expect comment/info
+'
+
 test_done
diff --git a/t/t5100/comment.expect b/t/t5100/comment.expect
new file mode 100644
index 0000000..1197e76
--- /dev/null
+++ b/t/t5100/comment.expect
@@ -0,0 +1,5 @@
+Author: A U Thor (this is a comment (really))
+Email: somebody@example.com
+Subject: testing comments
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/comment.in b/t/t5100/comment.in
new file mode 100644
index 0000000..430ba97
--- /dev/null
+++ b/t/t5100/comment.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "A U Thor" <somebody@example.com> (this is a comment \(really\))
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing comments
+
+
+
+---
+patch
diff --git a/t/t5100/quoted-string.expect b/t/t5100/quoted-string.expect
new file mode 100644
index 0000000..cab1bce
--- /dev/null
+++ b/t/t5100/quoted-string.expect
@@ -0,0 +1,5 @@
+Author: Author "The Author" Name
+Email: somebody@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/quoted-string.in b/t/t5100/quoted-string.in
new file mode 100644
index 0000000..e2e627a
--- /dev/null
+++ b/t/t5100/quoted-string.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "Author \"The Author\" Name" <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted-pair
+
+
+
+---
+patch
-- 
2.10.0.86.g6ffa4f1.dirty


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-19 18:54 ` [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
@ 2016-09-19 21:16   ` Junio C Hamano
  2016-09-20  3:59     ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-19 21:16 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Kevin Daudt <me@ikke.info> writes:

> Many tests need to store data in a file, and repeat the same pattern to
> refer to that path:
>
>     "$TEST_DATA"/t5100/

That obviously is a typo of

	"$TEST_DIRECTORY/t5100"

It is a good change, even though I would have chosen a name
that is a bit more descriptive than "$DATA".

>  test_expect_success 'split sample box' \
> -	'git mailsplit -o. "$TEST_DIRECTORY"/t5100/sample.mbox >last &&
> +	'git mailsplit -o. "$DATA"/sample.mbox >last &&

You are just following the pattern, and this instance is not too
bad, but lines like these

> -	test_cmp "$TEST_DIRECTORY"/t5100/msg$mo msg$mo &&
> -	test_cmp "$TEST_DIRECTORY"/t5100/patch$mo patch$mo &&
> -	test_cmp "$TEST_DIRECTORY"/t5100/info$mo info$mo
> +	test_cmp "$DATA"/msg$mo msg$mo &&
> +	test_cmp "$DATA"/patch$mo patch$mo &&
> +	test_cmp "$DATA"/info$mo info$mo

make me wonder why we don't quote the whole thing, i.e.

	test_cmp "$TEST_DATA/info$mo" "info$mo"

as leaving $mo part unquoted forces reader to wonder if it is our
deliberate attempt to allow shell $IFS in $mo and have the argument
split when that happens, which can be avoided if we quoted more
explicitly.

Perhaps we'd leave that as a low-hanging fruit for future people.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
@ 2016-09-19 21:24   ` Junio C Hamano
  2016-09-19 22:04     ` Junio C Hamano
  2016-09-20  4:28   ` Jeff King
  2016-09-21 11:09   ` Jeff King
  2 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-19 21:24 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Kevin Daudt <me@ikke.info> writes:

> +static void unquote_quoted_string(struct strbuf *line)
> +{
> +	const char *in = strbuf_detach(line, NULL);
> +	int c, take_next_literally = 0;
> +	int found_error = 0;
> +
> +	/*
> +	 * Stores the character that started the escape mode so that we know what
> +	 * character will stop it
> +	 */
> +	char escape_context = 0;
> +
> +	while ((c = *in++) != 0) {
> +		if (take_next_literally) {
> +			take_next_literally = 0;
> +		} else {
> +			switch (c) {
> +			case '"':
> +				if (!escape_context)
> +					escape_context = '"';
> +				else if (escape_context == '"')
> +					escape_context = 0;
> +				continue;
> +			case '\\':
> +				if (escape_context) {
> +					take_next_literally = 1;
> +					continue;
> +				}
> +				break;
> +			case '(':
> +				if (!escape_context)
> +					escape_context = '(';
> +				break;
> +			case ')':
> +				if (escape_context == '(')
> +					escape_context = 0;
> +				break;
> +			}
> +		}
> +
> +		strbuf_addch(line, c);
> +	}
> +}

The additional comment makes it very clear what is going on.

Is it an event unusual enough that is worth warning() about if we
have either take_next_literally or escape_context set to non-NUL
upon leaving the loop, I wonder?

Will queue.  Thanks.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-19 21:24   ` Junio C Hamano
@ 2016-09-19 22:04     ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2016-09-19 22:04 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Kevin Daudt <me@ikke.info> writes:
>
>> +static void unquote_quoted_string(struct strbuf *line)
>> +{
>> +	const char *in = strbuf_detach(line, NULL);
>> +	int c, take_next_literally = 0;
>> +	int found_error = 0;
> ...
>> +	}
>> +}
>
> The additional comment makes it very clear what is going on.
>
> Is it an event unusual enough that is worth warning() about if we
> have either take_next_literally or escape_context set to non-NUL
> upon leaving the loop, I wonder?
>
> Will queue.  Thanks.

It turns out that found_error is not used anywhere and tripped the
-Werror=unused-variable check.  I've removed that line while
queuing.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] mailinfo: unescape quoted-pair in header fields
  2016-09-19 10:51   ` Kevin Daudt
@ 2016-09-20  3:57     ` Jeff King
  2016-09-21 16:07       ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-09-20  3:57 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Junio C Hamano

On Mon, Sep 19, 2016 at 12:51:33PM +0200, Kevin Daudt wrote:

> > I didn't look in the RFC. Is:
> > 
> >   From: my \"name\" <foo@example.com>
> > 
> > really the same as:
> > 
> >   From: "my \\\"name\\\"" <foo@example.com>
> > 
> > ? That seems weird, but I think it may be that the former is simply
> > bogus (you are not supposed to use backslashes outside of the quoted
> > section at all).
> 
> Correct, the quoted-pair (escape sequence) can only occur in a quoted
> string or a comment. Even more so, the display name *needs* to be quoted
> when consisting of more then one word according to the RFC.

Hmm. So, I guess a follow-up question is: what would it be OK to do if
we see a quoted-pair outside of quotes? If the top one above violates
the RFC, it seems like stripping the backslashes would be a reasonable
outcome.

So if that's the case, do we actually need to care if we see any
parenthesized comments? I think we should just leave comments in place
either way, so syntactically they are only interesting insofar as we
replace quoted pairs or not.

IOW, I wonder if:

  while ((c = *in++)) {
	switch (c) {
	case '\\':
		if (!*in)
			return 0; /* ignore trailing backslash */
		/* quoted pair */
		strbuf_addch(out, *in++);
		break;
	case '"':
		/*
		 * This may be starting or ending a quoted section,
		 * but we do not care whether we are in such a section.
		 * We _do_ need to remove the quotes, though, as they
		 * are syntactic.
		 */
		break;
	default:
		/*
		 * Anything else is a normal character we keep. These
		 * _might_ be violating the RFC if they are magic
		 * characters outside of a quoted section, but we'd
		 * rather be liberal and pass them through.
		 */
		strbuf_addch(out, c);
		break;
	}
  }

would work. I certainly do not mind following the RFC more closely, but
AFAICT the very simple code above gives a pretty forgiving outcome.

> > This is obviously getting pretty silly, but if we are going to follow
> > the RFC, I think you actually have to do a recursive parse, and keep
> > track of an arbitrary depth of context.
> > 
> > I dunno. This method probably covers most cases in practice, and it's
> > easy to reason about.
> 
> The problem is, how do you differentiate between nested comments, and
> escaped braces within a comment after one run?

I'm not sure what you mean. Escaped characters are always handled first
in your loop. Can you give an example (although if you agree with what I
wrote above, it may not be worth discussing further)?

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-19 21:16   ` Junio C Hamano
@ 2016-09-20  3:59     ` Jeff King
  0 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-09-20  3:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Daudt, git, Swift Geek

On Mon, Sep 19, 2016 at 02:16:23PM -0700, Junio C Hamano wrote:

> Kevin Daudt <me@ikke.info> writes:
> 
> > Many tests need to store data in a file, and repeat the same pattern to
> > refer to that path:
> >
> >     "$TEST_DATA"/t5100/
> 
> That obviously is a typo of
> 
> 	"$TEST_DIRECTORY/t5100"
> 
> It is a good change, even though I would have chosen a name
> that is a bit more descriptive than "$DATA".

The name "$DATA" was my suggestion. I was shooting for something short
since this is used a lot and is really a script-local variable (I'd have
kept it lowercase to indicate that, but maybe that is just me).
Something like "$root" would also work. I dunno.

> > -	test_cmp "$TEST_DIRECTORY"/t5100/msg$mo msg$mo &&
> > -	test_cmp "$TEST_DIRECTORY"/t5100/patch$mo patch$mo &&
> > -	test_cmp "$TEST_DIRECTORY"/t5100/info$mo info$mo
> > +	test_cmp "$DATA"/msg$mo msg$mo &&
> > +	test_cmp "$DATA"/patch$mo patch$mo &&
> > +	test_cmp "$DATA"/info$mo info$mo
> 
> make me wonder why we don't quote the whole thing, i.e.
> 
> 	test_cmp "$TEST_DATA/info$mo" "info$mo"
> 
> as leaving $mo part unquoted forces reader to wonder if it is our
> deliberate attempt to allow shell $IFS in $mo and have the argument
> split when that happens, which can be avoided if we quoted more
> explicitly.
> 
> Perhaps we'd leave that as a low-hanging fruit for future people.

Yeah, I agree that quoting the whole thing makes it more obvious (though
I guess quoting the second info$mo does add two characters).

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  2016-09-19 21:24   ` Junio C Hamano
@ 2016-09-20  4:28   ` Jeff King
  2016-09-21 11:09   ` Jeff King
  2 siblings, 0 replies; 32+ messages in thread
From: Jeff King @ 2016-09-20  4:28 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Junio C Hamano

On Mon, Sep 19, 2016 at 08:54:40PM +0200, Kevin Daudt wrote:

> diff --git a/t/t5100/comment.expect b/t/t5100/comment.expect
> new file mode 100644
> index 0000000..1197e76
> --- /dev/null
> +++ b/t/t5100/comment.expect
> @@ -0,0 +1,5 @@
> +Author: A U Thor (this is a comment (really))

Hmm. I don't see any recursion in your parsing, so after the first ")"
our escape_context would be 0 again, right? So a more tricky test is:

  Author: A U Thor (this is a comment (really) with \(quoted\) pairs)

We are still inside "ctext" when we hit those quoted pairs, and they
should be unquoted, but your code would not do so (unless we go the
route of simply unquoting pairs everywhere).

I think your parser would have to follow the BNF more closely with a
recursive descent parser, like:

  const char *parse_comment(const char *in, struct strbuf *out)
  {
        size_t orig_out = out->len;

        if ((in = parse_char('(', in, out))) &&
            (in = parse_ccontent(in, out)) &&
            (in = parse_char(')', in, out))))
                return in;

        strbuf_setlen(out, orig_out);
        return NULL;
  }

  const char *parse_ccontent(const char *in, struct strbuf *out)
  {
        while (*in && *in != ')') {
                const char *next;

                if ((next = parse_quoted_pair(in, out)) ||
                    (next = parse_comment(in, out)) ||
                    (next = parse_ctext(in, out))) {
                        in = next;
                        continue;
                }
        }

	/*
	 * if "in" is NUL here we have an unclosed comment; but we'll
	 * just silently ignore and accept it
	 */
	return in;
  }

  const char *parse_char(char c, const char *in, struct strbuf *out)
  {
        if (*in != c)
                return NULL;
        strbuf_addch(out, c);
        return in + 1;
  }

You can probably guess at the implementation of parse_quoted_pair(),
parse_ctext(), etc (and naturally, the above is completely untested and
probably has some bugs in it).

In a former life (back when it was still rfc822!) I remember
implementing a similar parser, which I think was in turn based on the
cclient code in pine. It's not _too_ hard to get it all right based on
the BNF in the RFC, but as you can see it's a bit tedious. And I'm not
convinced we actually need it to be completely right for our purposes.
We really are looking for a single address, with the email in "<>" and
the name as everything before that, but de-quoted.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  2016-09-19 21:24   ` Junio C Hamano
  2016-09-20  4:28   ` Jeff King
@ 2016-09-21 11:09   ` Jeff King
  2016-09-22 22:17     ` Junio C Hamano
  2 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-09-21 11:09 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Junio C Hamano

On Mon, Sep 19, 2016 at 08:54:40PM +0200, Kevin Daudt wrote:

> diff --git a/mailinfo.c b/mailinfo.c
> index e19abe3..6a7c2f2 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -54,6 +54,50 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
>  	get_sane_name(&mi->name, &mi->name, &mi->email);
>  }
>  
> +static void unquote_quoted_string(struct strbuf *line)
> +{
> +	const char *in = strbuf_detach(line, NULL);

I see that this version uses the "detach, and then write into the
replacement" approach, which is good. But...

> +	int c, take_next_literally = 0;
> +	int found_error = 0;
> +
> +	/*
> +	 * Stores the character that started the escape mode so that we know what
> +	 * character will stop it
> +	 */
> +	char escape_context = 0;
> +
> +	while ((c = *in++) != 0) {
> +		if (take_next_literally) {
> +			take_next_literally = 0;
> +		} else {
> [...]
> +		}
> +
> +		strbuf_addch(line, c);
> +	}
> +}

It needs to `free(in)` at the end of the function.

Your original also fed "line->len" as a hint, but I doubt it really
matters in practice, so I don't mind losing that.

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] mailinfo: unescape quoted-pair in header fields
  2016-09-20  3:57     ` Jeff King
@ 2016-09-21 16:07       ` Junio C Hamano
  0 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2016-09-21 16:07 UTC (permalink / raw)
  To: Jeff King; +Cc: Kevin Daudt, git, Swift Geek

Jeff King <peff@peff.net> writes:

> So if that's the case, do we actually need to care if we see any
> parenthesized comments? I think we should just leave comments in place
> either way, so syntactically they are only interesting insofar as we
> replace quoted pairs or not.
>
> IOW, I wonder if:
>
>   while ((c = *in++)) {
> 	switch (c) {
> 	case '\\':
> 		if (!*in)
> 			return 0; /* ignore trailing backslash */
> 		/* quoted pair */
> 		strbuf_addch(out, *in++);
> 		break;
> 	case '"':
> 		/*
> 		 * This may be starting or ending a quoted section,
> 		 * but we do not care whether we are in such a section.
> 		 * We _do_ need to remove the quotes, though, as they
> 		 * are syntactic.
> 		 */
> 		break;
> 	default:
> 		/*
> 		 * Anything else is a normal character we keep. These
> 		 * _might_ be violating the RFC if they are magic
> 		 * characters outside of a quoted section, but we'd
> 		 * rather be liberal and pass them through.
> 		 */
> 		strbuf_addch(out, c);
> 		break;
> 	}
>   }
>
> would work. I certainly do not mind following the RFC more closely, but
> AFAICT the very simple code above gives a pretty forgiving outcome.

The simplicity of the code does look attractive to me.  I do not
offhand see an obvious case/flaw that this simplified rule would
mangle a valid human-readable part.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-21 11:09   ` Jeff King
@ 2016-09-22 22:17     ` Junio C Hamano
  2016-09-23  4:15       ` Jeff King
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-22 22:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Kevin Daudt, git, Swift Geek

Jeff King <peff@peff.net> writes:

> On Mon, Sep 19, 2016 at 08:54:40PM +0200, Kevin Daudt wrote:
>
>> + ...
>> +	while ((c = *in++) != 0) {
>> +		if (take_next_literally) {
>> +			take_next_literally = 0;
>> +		} else {
>> [...]
>> +		}
>> +
>> +		strbuf_addch(line, c);
>> +	}
>> +}
>
> It needs to `free(in)` at the end of the function.

Ehh, in has been incremented and is pointing at the terminating NUL
there, so it would be more like

	char *to_free, *in;

        to_free = strbuf_detach(line, NULL);
        in = to_free;
	...
        while ((c = *in++)) {
        	...
	}
        free(to_free);

I would think ;-).

        

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-22 22:17     ` Junio C Hamano
@ 2016-09-23  4:15       ` Jeff King
  2016-09-25 20:17         ` Kevin Daudt
  0 siblings, 1 reply; 32+ messages in thread
From: Jeff King @ 2016-09-23  4:15 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Kevin Daudt, git, Swift Geek

On Thu, Sep 22, 2016 at 03:17:23PM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Mon, Sep 19, 2016 at 08:54:40PM +0200, Kevin Daudt wrote:
> >
> >> + ...
> >> +	while ((c = *in++) != 0) {
> >> +		if (take_next_literally) {
> >> +			take_next_literally = 0;
> >> +		} else {
> >> [...]
> >> +		}
> >> +
> >> +		strbuf_addch(line, c);
> >> +	}
> >> +}
> >
> > It needs to `free(in)` at the end of the function.
> 
> Ehh, in has been incremented and is pointing at the terminating NUL
> there, so it would be more like
> 
> 	char *to_free, *in;
> 
>         to_free = strbuf_detach(line, NULL);
>         in = to_free;
> 	...
>         while ((c = *in++)) {
>         	...
> 	}
>         free(to_free);
> 
> I would think ;-).

Oops, yes. It is beginning to make the "strbuf_swap()" look less
convoluted. :)

-Peff

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-23  4:15       ` Jeff King
@ 2016-09-25 20:17         ` Kevin Daudt
  2016-09-25 22:38           ` Jakub Narębski
  0 siblings, 1 reply; 32+ messages in thread
From: Kevin Daudt @ 2016-09-25 20:17 UTC (permalink / raw)
  To: Jeff King; +Cc: Junio C Hamano, git, Swift Geek

On Fri, Sep 23, 2016 at 12:15:41AM -0400, Jeff King wrote:
> On Thu, Sep 22, 2016 at 03:17:23PM -0700, Junio C Hamano wrote:
> 
> > Jeff King <peff@peff.net> writes:
> > 
> > > On Mon, Sep 19, 2016 at 08:54:40PM +0200, Kevin Daudt wrote:
> > >
> > >> + ...
> > >> +	while ((c = *in++) != 0) {
> > >> +		if (take_next_literally) {
> > >> +			take_next_literally = 0;
> > >> +		} else {
> > >> [...]
> > >> +		}
> > >> +
> > >> +		strbuf_addch(line, c);
> > >> +	}
> > >> +}
> > >
> > > It needs to `free(in)` at the end of the function.
> > 
> > Ehh, in has been incremented and is pointing at the terminating NUL
> > there, so it would be more like
> > 
> > 	char *to_free, *in;
> > 
> >         to_free = strbuf_detach(line, NULL);
> >         in = to_free;
> > 	...
> >         while ((c = *in++)) {
> >         	...
> > 	}
> >         free(to_free);
> > 
> > I would think ;-).
> 
> Oops, yes. It is beginning to make the "strbuf_swap()" look less
> convoluted. :)
> 

I've switched to strbuf_swap now, much better. I've implemented
recursive parsing without looking at what you provided, just to see what
I'd came up with. Though I've not implemented a recursive descent
parser, but it might suffice.

I'm sending the patches now.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-19 18:54 ` [PATCH v2 0/2] Handle escape characters in From field Kevin Daudt
@ 2016-09-25 21:08   ` Kevin Daudt
  2016-09-25 21:08     ` [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
                       ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-25 21:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Swift Geek, Jeff King, Kevin Daudt

Many tests need to store data in a file, and repeat the same pattern to
refer to that path:

    "$TEST_DIRECTORY"/t5100/

Create a variable that contains this path, and use that instead.

Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Changes since v2:
 - changed $DATA to $data to indicate it's a script-local variable

 t/t5100-mailinfo.sh | 56 +++++++++++++++++++++++++++--------------------------
 1 file changed, 29 insertions(+), 27 deletions(-)

diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 1a5a546..c4ed0f4 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -7,8 +7,10 @@ test_description='git mailinfo and git mailsplit test'
 
 . ./test-lib.sh
 
+data="$TEST_DIRECTORY/t5100"
+
 test_expect_success 'split sample box' \
-	'git mailsplit -o. "$TEST_DIRECTORY"/t5100/sample.mbox >last &&
+	'git mailsplit -o. "$data"/sample.mbox >last &&
 	last=$(cat last) &&
 	echo total is $last &&
 	test $(cat last) = 17'
@@ -17,9 +19,9 @@ check_mailinfo () {
 	mail=$1 opt=$2
 	mo="$mail$opt"
 	git mailinfo -u $opt msg$mo patch$mo <$mail >info$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/msg$mo msg$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/patch$mo patch$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info$mo info$mo
+	test_cmp "$data"/msg$mo msg$mo &&
+	test_cmp "$data"/patch$mo patch$mo &&
+	test_cmp "$data"/info$mo info$mo
 }
 
 
@@ -27,15 +29,15 @@ for mail in 00*
 do
 	test_expect_success "mailinfo $mail" '
 		check_mailinfo $mail "" &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--scissors
+		if test -f "$data"/msg$mail--scissors
 		then
 			check_mailinfo $mail --scissors
 		fi &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--no-inbody-headers
+		if test -f "$data"/msg$mail--no-inbody-headers
 		then
 			check_mailinfo $mail --no-inbody-headers
 		fi &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--message-id
+		if test -f "$data"/msg$mail--message-id
 		then
 			check_mailinfo $mail --message-id
 		fi
@@ -45,7 +47,7 @@ done
 
 test_expect_success 'split box with rfc2047 samples' \
 	'mkdir rfc2047 &&
-	git mailsplit -orfc2047 "$TEST_DIRECTORY"/t5100/rfc2047-samples.mbox \
+	git mailsplit -orfc2047 "$data"/rfc2047-samples.mbox \
 	  >rfc2047/last &&
 	last=$(cat rfc2047/last) &&
 	echo total is $last &&
@@ -56,18 +58,18 @@ do
 	test_expect_success "mailinfo $mail" '
 		git mailinfo -u $mail-msg $mail-patch <$mail >$mail-info &&
 		echo msg &&
-		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-msg &&
+		test_cmp "$data"/empty $mail-msg &&
 		echo patch &&
-		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-patch &&
+		test_cmp "$data"/empty $mail-patch &&
 		echo info &&
-		test_cmp "$TEST_DIRECTORY"/t5100/rfc2047-info-$(basename $mail) $mail-info
+		test_cmp "$data"/rfc2047-info-$(basename $mail) $mail-info
 	'
 done
 
 test_expect_success 'respect NULs' '
 
-	git mailsplit -d3 -o. "$TEST_DIRECTORY"/t5100/nul-plain &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-plain 001 &&
+	git mailsplit -d3 -o. "$data"/nul-plain &&
+	test_cmp "$data"/nul-plain 001 &&
 	(cat 001 | git mailinfo msg patch) &&
 	test_line_count = 4 patch
 
@@ -75,52 +77,52 @@ test_expect_success 'respect NULs' '
 
 test_expect_success 'Preserve NULs out of MIME encoded message' '
 
-	git mailsplit -d5 -o. "$TEST_DIRECTORY"/t5100/nul-b64.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-b64.in 00001 &&
+	git mailsplit -d5 -o. "$data"/nul-b64.in &&
+	test_cmp "$data"/nul-b64.in 00001 &&
 	git mailinfo msg patch <00001 &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-b64.expect patch
+	test_cmp "$data"/nul-b64.expect patch
 
 '
 
 test_expect_success 'mailinfo on from header without name works' '
 
 	mkdir info-from &&
-	git mailsplit -oinfo-from "$TEST_DIRECTORY"/t5100/info-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info-from.in info-from/0001 &&
+	git mailsplit -oinfo-from "$data"/info-from.in &&
+	test_cmp "$data"/info-from.in info-from/0001 &&
 	git mailinfo info-from/msg info-from/patch \
 	  <info-from/0001 >info-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info-from.expect info-from/out
+	test_cmp "$data"/info-from.expect info-from/out
 
 '
 
 test_expect_success 'mailinfo finds headers after embedded From line' '
 	mkdir embed-from &&
-	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
+	git mailsplit -oembed-from "$data"/embed-from.in &&
+	test_cmp "$data"/embed-from.in embed-from/0001 &&
 	git mailinfo embed-from/msg embed-from/patch \
 	  <embed-from/0001 >embed-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
+	test_cmp "$data"/embed-from.expect embed-from/out
 '
 
 test_expect_success 'mailinfo on message with quoted >From' '
 	mkdir quoted-from &&
-	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
+	git mailsplit -oquoted-from "$data"/quoted-from.in &&
+	test_cmp "$data"/quoted-from.in quoted-from/0001 &&
 	git mailinfo quoted-from/msg quoted-from/patch \
 	  <quoted-from/0001 >quoted-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
+	test_cmp "$data"/quoted-from.expect quoted-from/msg
 '
 
 test_expect_success 'mailinfo unescapes with --mboxrd' '
 	mkdir mboxrd &&
 	git mailsplit -omboxrd --mboxrd \
-		"$TEST_DIRECTORY"/t5100/sample.mboxrd >last &&
+		"$data"/sample.mboxrd >last &&
 	test x"$(cat last)" = x2 &&
 	for i in 0001 0002
 	do
 		git mailinfo mboxrd/msg mboxrd/patch \
 		  <mboxrd/$i >mboxrd/out &&
-		test_cmp "$TEST_DIRECTORY"/t5100/${i}mboxrd mboxrd/msg
+		test_cmp "$data"/${i}mboxrd mboxrd/msg
 	done &&
 	sp=" " &&
 	echo "From " >expect &&
-- 
2.10.0.89.ge802c3a.dirty


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-25 21:08   ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
@ 2016-09-25 21:08     ` Kevin Daudt
  2016-09-26 19:11       ` Junio C Hamano
  2016-09-26 19:06     ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Junio C Hamano
  2016-09-28 19:49     ` [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header Kevin Daudt
  2 siblings, 1 reply; 32+ messages in thread
From: Kevin Daudt @ 2016-09-25 21:08 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Swift Geek, Jeff King, Kevin Daudt

rfc2822 has provisions for quoted strings and comments in structured header
fields, but also allows for escaping these with so-called quoted-pairs.

The only thing git currently does is removing exterior quotes, but
quotes within are left alone.

Remove exterior quotes and remove escape characters so that they don't
show up in the author field.

Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 Changes since v2:

 - handle comments inside comments recursively
 - renamed the main function to unquote_quoted_pairs because it also
   handles quoted pairs in comments


 mailinfo.c                   | 82 ++++++++++++++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh          | 14 ++++++++
 t/t5100/comment.expect       |  5 +++
 t/t5100/comment.in           |  9 +++++
 t/t5100/quoted-string.expect |  5 +++
 t/t5100/quoted-string.in     |  9 +++++
 6 files changed, 124 insertions(+)
 create mode 100644 t/t5100/comment.expect
 create mode 100644 t/t5100/comment.in
 create mode 100644 t/t5100/quoted-string.expect
 create mode 100644 t/t5100/quoted-string.in

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..b4118a0 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -54,6 +54,86 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 	get_sane_name(&mi->name, &mi->name, &mi->email);
 }
 
+static const char *unquote_comment(struct strbuf *outbuf, const char *in)
+{
+	int c;
+	int take_next_litterally = 0;
+
+	strbuf_addch(outbuf, '(');
+
+	while ((c = *in++) != 0) {
+		if (take_next_litterally == 1) {
+			take_next_litterally = 0;
+		} else {
+			switch (c) {
+			case '\\':
+				take_next_litterally = 1;
+				continue;
+			case '(':
+				in = unquote_comment(outbuf, in);
+				continue;
+			case ')':
+				strbuf_addch(outbuf, ')');
+				return in;
+			}
+		}
+
+		strbuf_addch(outbuf, c);
+	}
+
+	return in;
+}
+
+static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
+{
+	int c;
+	int take_next_litterally = 0;
+
+	while ((c = *in++) != 0) {
+		if (take_next_litterally == 1) {
+			take_next_litterally = 0;
+		} else {
+			switch (c) {
+			case '\\':
+				take_next_litterally = 1;
+				continue;
+			case '"':
+				return in;
+			}
+		}
+
+		strbuf_addch(outbuf, c);
+	}
+
+	return in;
+}
+
+static void unquote_quoted_pair(struct strbuf *line)
+{
+	struct strbuf outbuf;
+	const char *in = line->buf;
+	int c;
+
+	strbuf_init(&outbuf, line->len);
+
+	while ((c = *in++) != 0) {
+		switch (c) {
+		case '"':
+			in = unquote_quoted_string(&outbuf, in);
+			continue;
+		case '(':
+			in = unquote_comment(&outbuf, in);
+			continue;
+		}
+
+		strbuf_addch(&outbuf, c);
+	}
+
+	strbuf_swap(&outbuf, line);
+	strbuf_release(&outbuf);
+
+}
+
 static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 {
 	char *at;
@@ -63,6 +143,8 @@ static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 	strbuf_init(&f, from->len);
 	strbuf_addbuf(&f, from);
 
+	unquote_quoted_pair(&f);
+
 	at = strchr(f.buf, '@');
 	if (!at) {
 		parse_bogus_from(mi, from);
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index c4ed0f4..3e983c0 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -144,4 +144,18 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
 	test_cmp expect mboxrd/msg
 '
 
+test_expect_success 'mailinfo handles rfc2822 quoted-string' '
+	mkdir quoted-string &&
+	git mailinfo /dev/null /dev/null <"$DATA"/quoted-string.in \
+		>quoted-string/info &&
+	test_cmp "$DATA"/quoted-string.expect quoted-string/info
+'
+
+test_expect_success 'mailinfo handles rfc2822 comment' '
+	mkdir comment &&
+	git mailinfo /dev/null /dev/null <"$DATA"/comment.in \
+		>comment/info &&
+	test_cmp "$DATA"/comment.expect comment/info
+'
+
 test_done
diff --git a/t/t5100/comment.expect b/t/t5100/comment.expect
new file mode 100644
index 0000000..7228177
--- /dev/null
+++ b/t/t5100/comment.expect
@@ -0,0 +1,5 @@
+Author: A U Thor (this is (really) a comment (honestly))
+Email: somebody@example.com
+Subject: testing comments
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/comment.in b/t/t5100/comment.in
new file mode 100644
index 0000000..c53a192
--- /dev/null
+++ b/t/t5100/comment.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "A U Thor" <somebody@example.com> (this is \(really\) a comment (honestly))
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing comments
+
+
+
+---
+patch
diff --git a/t/t5100/quoted-string.expect b/t/t5100/quoted-string.expect
new file mode 100644
index 0000000..cab1bce
--- /dev/null
+++ b/t/t5100/quoted-string.expect
@@ -0,0 +1,5 @@
+Author: Author "The Author" Name
+Email: somebody@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/quoted-string.in b/t/t5100/quoted-string.in
new file mode 100644
index 0000000..e2e627a
--- /dev/null
+++ b/t/t5100/quoted-string.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "Author \"The Author\" Name" <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted-pair
+
+
+
+---
+patch
-- 
2.10.0.89.ge802c3a.dirty


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-25 20:17         ` Kevin Daudt
@ 2016-09-25 22:38           ` Jakub Narębski
  2016-09-26  5:02             ` Kevin Daudt
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Narębski @ 2016-09-25 22:38 UTC (permalink / raw)
  To: Kevin Daudt, Jeff King; +Cc: Junio C Hamano, git, Swift Geek

W dniu 25.09.2016 o 22:17, Kevin Daudt pisze:
> On Fri, Sep 23, 2016 at 12:15:41AM -0400, Jeff King wrote:

>> Oops, yes. It is beginning to make the "strbuf_swap()" look less
>> convoluted. :)
>>
> 
> I've switched to strbuf_swap now, much better. I've implemented
> recursive parsing without looking at what you provided, just to see what
> I'd came up with. Though I've not implemented a recursive descent
> parser, but it might suffice.

I think you can implement a parser handling proper nesting of parens
without recursion.

Though... what is the definition in the RFC?
-- 
Jakub Narębski


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-25 22:38           ` Jakub Narębski
@ 2016-09-26  5:02             ` Kevin Daudt
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-26  5:02 UTC (permalink / raw)
  To: Jakub Narębski; +Cc: Jeff King, Junio C Hamano, git, Swift Geek

On Mon, Sep 26, 2016 at 12:38:42AM +0200, Jakub Narębski wrote:
> W dniu 25.09.2016 o 22:17, Kevin Daudt pisze:
> > On Fri, Sep 23, 2016 at 12:15:41AM -0400, Jeff King wrote:
> 
> >> Oops, yes. It is beginning to make the "strbuf_swap()" look less
> >> convoluted. :)
> >>
> > 
> > I've switched to strbuf_swap now, much better. I've implemented
> > recursive parsing without looking at what you provided, just to see what
> > I'd came up with. Though I've not implemented a recursive descent
> > parser, but it might suffice.
> 
> I think you can implement a parser handling proper nesting of parens
> without recursion.
> 
> Though... what is the definition in the RFC?

This part describes comments.

    ccontent        =       ctext / quoted-pair / comment

    comment         =       "(" *([FWS] ccontent) [FWS] ")"

    CFWS            =       *([FWS] comment) (([FWS] comment) / FWS)

So each comment can itself also contain a comment.

This could be done without recursion by keeping a count of how many open
parens we have encountered.

Kevin

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-25 21:08   ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
  2016-09-25 21:08     ` [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
@ 2016-09-26 19:06     ` Junio C Hamano
  2016-09-28 19:49     ` [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header Kevin Daudt
  2 siblings, 0 replies; 32+ messages in thread
From: Junio C Hamano @ 2016-09-26 19:06 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Kevin Daudt <me@ikke.info> writes:

> Many tests need to store data in a file, and repeat the same pattern to
> refer to that path:
>
>     "$TEST_DIRECTORY"/t5100/
>
> Create a variable that contains this path, and use that instead.
>
> Signed-off-by: Kevin Daudt <me@ikke.info>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Changes since v2:
>  - changed $DATA to $data to indicate it's a script-local variable

If you are rerolling anyway, I would have liked to see the "why is
only the variable part quoted?"  issue addressed which was raised
during the previous round of the review.  I may have said it is OK
to leave it as a low-hanging fruit for others but that only meant
that it alone is not a strong enough reason to reroll this patch.

Other than that, looks good to me, though ;-)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-25 21:08     ` [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
@ 2016-09-26 19:11       ` Junio C Hamano
  2016-09-26 19:26         ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-26 19:11 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Kevin Daudt <me@ikke.info> writes:

> rfc2822 has provisions for quoted strings and comments in structured header
> fields, but also allows for escaping these with so-called quoted-pairs.
>
> The only thing git currently does is removing exterior quotes, but
> quotes within are left alone.
>
> Remove exterior quotes and remove escape characters so that they don't
> show up in the author field.
>
> Signed-off-by: Kevin Daudt <me@ikke.info>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>  Changes since v2:
>
>  - handle comments inside comments recursively
>  - renamed the main function to unquote_quoted_pairs because it also
>    handles quoted pairs in comments

Sounds good, and the implemention looked straight-forward from a
quick scan.

> diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
> index c4ed0f4..3e983c0 100755
> --- a/t/t5100-mailinfo.sh
> +++ b/t/t5100-mailinfo.sh
> @@ -144,4 +144,18 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
>  	test_cmp expect mboxrd/msg
>  '
>  
> +test_expect_success 'mailinfo handles rfc2822 quoted-string' '
> +	mkdir quoted-string &&
> +	git mailinfo /dev/null /dev/null <"$DATA"/quoted-string.in \
> +		>quoted-string/info &&
> +	test_cmp "$DATA"/quoted-string.expect quoted-string/info
> +'
> +
> +test_expect_success 'mailinfo handles rfc2822 comment' '
> +	mkdir comment &&
> +	git mailinfo /dev/null /dev/null <"$DATA"/comment.in \
> +		>comment/info &&
> +	test_cmp "$DATA"/comment.expect comment/info
> +'
> +
>  test_done

Don't these also need to be downcased if you prefer $data over
$DATA, though?

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-26 19:11       ` Junio C Hamano
@ 2016-09-26 19:26         ` Junio C Hamano
  2016-09-26 19:44           ` Kevin Daudt
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-26 19:26 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Don't these also need to be downcased if you prefer $data over
> $DATA, though?

For now, I'll queue a SQUASH??? that reverts s/DATA/data/ you did to
1/2 between your 1/2 and 2/2.

Thanks.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-26 19:26         ` Junio C Hamano
@ 2016-09-26 19:44           ` Kevin Daudt
  2016-09-26 22:23             ` Junio C Hamano
  0 siblings, 1 reply; 32+ messages in thread
From: Kevin Daudt @ 2016-09-26 19:44 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Swift Geek, Jeff King

On Mon, Sep 26, 2016 at 12:26:13PM -0700, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Don't these also need to be downcased if you prefer $data over
> > $DATA, though?
> 
> For now, I'll queue a SQUASH??? that reverts s/DATA/data/ you did to
> 1/2 between your 1/2 and 2/2.
> 

Ugh, thanks. I'd replaced it in the first patch, but forgot it in the
second.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-26 19:44           ` Kevin Daudt
@ 2016-09-26 22:23             ` Junio C Hamano
  2016-09-27 10:26               ` Kevin Daudt
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-26 22:23 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Kevin Daudt <me@ikke.info> writes:

> On Mon, Sep 26, 2016 at 12:26:13PM -0700, Junio C Hamano wrote:
>> Junio C Hamano <gitster@pobox.com> writes:
>> 
>> > Don't these also need to be downcased if you prefer $data over
>> > $DATA, though?
>> 
>> For now, I'll queue a SQUASH??? that reverts s/DATA/data/ you did to
>> 1/2 between your 1/2 and 2/2.
>
> Ugh, thanks. I'd replaced it in the first patch, but forgot it in the
> second.

Heh, I already guessed that much that these were sent without even
be in the tree, after editing only the patch files.  Don't do that
;-)

I am not sure I agree that $data is better over $DATA, though.
Unlike the lowercase $mail and others used in the script that are
clearly "variables", this thing is used as a constant during the
lifetime of the test script.



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-26 22:23             ` Junio C Hamano
@ 2016-09-27 10:26               ` Kevin Daudt
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-27 10:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Swift Geek, Jeff King

On Mon, Sep 26, 2016 at 03:23:23PM -0700, Junio C Hamano wrote:
> Kevin Daudt <me@ikke.info> writes:
> 
> > On Mon, Sep 26, 2016 at 12:26:13PM -0700, Junio C Hamano wrote:
> >> Junio C Hamano <gitster@pobox.com> writes:
> >> 
> >> > Don't these also need to be downcased if you prefer $data over
> >> > $DATA, though?
> >> 
> >> For now, I'll queue a SQUASH??? that reverts s/DATA/data/ you did to
> >> 1/2 between your 1/2 and 2/2.
> >
> > Ugh, thanks. I'd replaced it in the first patch, but forgot it in the
> > second.
> 
> Heh, I already guessed that much that these were sent without even
> be in the tree, after editing only the patch files.  Don't do that
> ;-)

That was just my poor wording. I did a proper rebase, but only changed
it for the first commit.




^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header
  2016-09-25 21:08   ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
  2016-09-25 21:08     ` [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  2016-09-26 19:06     ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Junio C Hamano
@ 2016-09-28 19:49     ` Kevin Daudt
  2016-09-28 19:52       ` [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
  2016-09-28 19:52       ` [PATCH v4 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  2 siblings, 2 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-28 19:49 UTC (permalink / raw)
  To: git; +Cc: Kevin Daudt, Junio C Hamano, Swift Geek, Jeff King

Changes since v3:
- t5100-mailinfo: Reverted back to capital $DATA
- t5100-mailinfo: Moved quotes to around the entire string, instead of the
  variable, as per Junio's suggestion


Kevin Daudt (2):
  t5100-mailinfo: replace common path prefix with variable
  mailinfo: unescape quoted-pair in header fields

 mailinfo.c                   | 82 ++++++++++++++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh          | 82 ++++++++++++++++++++++++++------------------
 t/t5100/comment.expect       |  5 +++
 t/t5100/comment.in           |  9 +++++
 t/t5100/quoted-string.expect |  5 +++
 t/t5100/quoted-string.in     |  9 +++++
 6 files changed, 159 insertions(+), 33 deletions(-)
 create mode 100644 t/t5100/comment.expect
 create mode 100644 t/t5100/comment.in
 create mode 100644 t/t5100/quoted-string.expect
 create mode 100644 t/t5100/quoted-string.in

-- 
2.10.0.372.g6fe1b14


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-28 19:49     ` [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header Kevin Daudt
@ 2016-09-28 19:52       ` Kevin Daudt
  2016-09-28 20:21         ` Junio C Hamano
  2016-09-28 19:52       ` [PATCH v4 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
  1 sibling, 1 reply; 32+ messages in thread
From: Kevin Daudt @ 2016-09-28 19:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Swift Geek, Jeff King, Kevin Daudt

Many tests need to store data in a file, and repeat the same pattern to
refer to that path:

    "$TEST_DIRECTORY"/t5100/

Create a variable that contains this path, and use that instead.

While we're making this change, make sure the quotes are not just around
the variable, but around the entire string to not give the impression
we want shell splitting to affect the other variables.

Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 t/t5100-mailinfo.sh | 68 +++++++++++++++++++++++++++--------------------------
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 1a5a546..56988b7 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -7,8 +7,10 @@ test_description='git mailinfo and git mailsplit test'
 
 . ./test-lib.sh
 
+DATA="$TEST_DIRECTORY/t5100"
+
 test_expect_success 'split sample box' \
-	'git mailsplit -o. "$TEST_DIRECTORY"/t5100/sample.mbox >last &&
+	'git mailsplit -o. "$DATA/sample.mbox" >last &&
 	last=$(cat last) &&
 	echo total is $last &&
 	test $(cat last) = 17'
@@ -16,28 +18,28 @@ test_expect_success 'split sample box' \
 check_mailinfo () {
 	mail=$1 opt=$2
 	mo="$mail$opt"
-	git mailinfo -u $opt msg$mo patch$mo <$mail >info$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/msg$mo msg$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/patch$mo patch$mo &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info$mo info$mo
+	git mailinfo -u $opt "msg$mo" "patch$mo" <"$mail" >"info$mo" &&
+	test_cmp "$DATA/msg$mo" "msg$mo" &&
+	test_cmp "$DATA/patch$mo" "patch$mo" &&
+	test_cmp "$DATA/info$mo" "info$mo"
 }
 
 
 for mail in 00*
 do
 	test_expect_success "mailinfo $mail" '
-		check_mailinfo $mail "" &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--scissors
+		check_mailinfo "$mail" "" &&
+		if test -f "$DATA/msg$mail--scissors"
 		then
-			check_mailinfo $mail --scissors
+			check_mailinfo "$mail" --scissors
 		fi &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--no-inbody-headers
+		if test -f "$DATA/msg$mail--no-inbody-headers"
 		then
-			check_mailinfo $mail --no-inbody-headers
+			check_mailinfo "$mail" --no-inbody-headers
 		fi &&
-		if test -f "$TEST_DIRECTORY"/t5100/msg$mail--message-id
+		if test -f "$DATA/msg$mail--message-id"
 		then
-			check_mailinfo $mail --message-id
+			check_mailinfo "$mail" --message-id
 		fi
 	'
 done
@@ -45,7 +47,7 @@ done
 
 test_expect_success 'split box with rfc2047 samples' \
 	'mkdir rfc2047 &&
-	git mailsplit -orfc2047 "$TEST_DIRECTORY"/t5100/rfc2047-samples.mbox \
+	git mailsplit -orfc2047 "$DATA/rfc2047-samples.mbox" \
 	  >rfc2047/last &&
 	last=$(cat rfc2047/last) &&
 	echo total is $last &&
@@ -54,20 +56,20 @@ test_expect_success 'split box with rfc2047 samples' \
 for mail in rfc2047/00*
 do
 	test_expect_success "mailinfo $mail" '
-		git mailinfo -u $mail-msg $mail-patch <$mail >$mail-info &&
+		git mailinfo -u "$mail-msg" "$mail-patch" <"$mail" >"$mail-info" &&
 		echo msg &&
-		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-msg &&
+		test_cmp "$DATA/empty" "$mail-msg" &&
 		echo patch &&
-		test_cmp "$TEST_DIRECTORY"/t5100/empty $mail-patch &&
+		test_cmp "$DATA/empty" "$mail-patch" &&
 		echo info &&
-		test_cmp "$TEST_DIRECTORY"/t5100/rfc2047-info-$(basename $mail) $mail-info
+		test_cmp "$DATA/rfc2047-info-$(basename $mail)" "$mail-info"
 	'
 done
 
 test_expect_success 'respect NULs' '
 
-	git mailsplit -d3 -o. "$TEST_DIRECTORY"/t5100/nul-plain &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-plain 001 &&
+	git mailsplit -d3 -o. "$DATA/nul-plain" &&
+	test_cmp "$DATA/nul-plain" 001 &&
 	(cat 001 | git mailinfo msg patch) &&
 	test_line_count = 4 patch
 
@@ -75,52 +77,52 @@ test_expect_success 'respect NULs' '
 
 test_expect_success 'Preserve NULs out of MIME encoded message' '
 
-	git mailsplit -d5 -o. "$TEST_DIRECTORY"/t5100/nul-b64.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-b64.in 00001 &&
+	git mailsplit -d5 -o. "$DATA/nul-b64.in" &&
+	test_cmp "$DATA/nul-b64.in" 00001 &&
 	git mailinfo msg patch <00001 &&
-	test_cmp "$TEST_DIRECTORY"/t5100/nul-b64.expect patch
+	test_cmp "$DATA/nul-b64.expect" patch
 
 '
 
 test_expect_success 'mailinfo on from header without name works' '
 
 	mkdir info-from &&
-	git mailsplit -oinfo-from "$TEST_DIRECTORY"/t5100/info-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info-from.in info-from/0001 &&
+	git mailsplit -oinfo-from "$DATA/info-from.in" &&
+	test_cmp "$DATA/info-from.in" info-from/0001 &&
 	git mailinfo info-from/msg info-from/patch \
 	  <info-from/0001 >info-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/info-from.expect info-from/out
+	test_cmp "$DATA/info-from.expect" info-from/out
 
 '
 
 test_expect_success 'mailinfo finds headers after embedded From line' '
 	mkdir embed-from &&
-	git mailsplit -oembed-from "$TEST_DIRECTORY"/t5100/embed-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.in embed-from/0001 &&
+	git mailsplit -oembed-from "$DATA/embed-from.in" &&
+	test_cmp "$DATA/embed-from.in" embed-from/0001 &&
 	git mailinfo embed-from/msg embed-from/patch \
 	  <embed-from/0001 >embed-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/embed-from.expect embed-from/out
+	test_cmp "$DATA/embed-from.expect" embed-from/out
 '
 
 test_expect_success 'mailinfo on message with quoted >From' '
 	mkdir quoted-from &&
-	git mailsplit -oquoted-from "$TEST_DIRECTORY"/t5100/quoted-from.in &&
-	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.in quoted-from/0001 &&
+	git mailsplit -oquoted-from "$DATA/quoted-from.in" &&
+	test_cmp "$DATA/quoted-from.in" quoted-from/0001 &&
 	git mailinfo quoted-from/msg quoted-from/patch \
 	  <quoted-from/0001 >quoted-from/out &&
-	test_cmp "$TEST_DIRECTORY"/t5100/quoted-from.expect quoted-from/msg
+	test_cmp "$DATA/quoted-from.expect" quoted-from/msg
 '
 
 test_expect_success 'mailinfo unescapes with --mboxrd' '
 	mkdir mboxrd &&
 	git mailsplit -omboxrd --mboxrd \
-		"$TEST_DIRECTORY"/t5100/sample.mboxrd >last &&
+		"$DATA/sample.mboxrd" >last &&
 	test x"$(cat last)" = x2 &&
 	for i in 0001 0002
 	do
 		git mailinfo mboxrd/msg mboxrd/patch \
 		  <mboxrd/$i >mboxrd/out &&
-		test_cmp "$TEST_DIRECTORY"/t5100/${i}mboxrd mboxrd/msg
+		test_cmp "$DATA/${i}mboxrd" mboxrd/msg
 	done &&
 	sp=" " &&
 	echo "From " >expect &&
-- 
2.10.0.372.g6fe1b14


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH v4 2/2] mailinfo: unescape quoted-pair in header fields
  2016-09-28 19:49     ` [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header Kevin Daudt
  2016-09-28 19:52       ` [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
@ 2016-09-28 19:52       ` Kevin Daudt
  1 sibling, 0 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-28 19:52 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Swift Geek, Jeff King, Kevin Daudt

rfc2822 has provisions for quoted strings in structured header fields,
but also allows for escaping these with so-called quoted-pairs.

The only thing git currently does is removing exterior quotes, but
quotes within are left alone.

Remove exterior quotes and remove escape characters so that they don't
show up in the author field.

Signed-off-by: Kevin Daudt <me@ikke.info>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 mailinfo.c                   | 82 ++++++++++++++++++++++++++++++++++++++++++++
 t/t5100-mailinfo.sh          | 14 ++++++++
 t/t5100/comment.expect       |  5 +++
 t/t5100/comment.in           |  9 +++++
 t/t5100/quoted-string.expect |  5 +++
 t/t5100/quoted-string.in     |  9 +++++
 6 files changed, 124 insertions(+)
 create mode 100644 t/t5100/comment.expect
 create mode 100644 t/t5100/comment.in
 create mode 100644 t/t5100/quoted-string.expect
 create mode 100644 t/t5100/quoted-string.in

diff --git a/mailinfo.c b/mailinfo.c
index e19abe3..b4118a0 100644
--- a/mailinfo.c
+++ b/mailinfo.c
@@ -54,6 +54,86 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
 	get_sane_name(&mi->name, &mi->name, &mi->email);
 }
 
+static const char *unquote_comment(struct strbuf *outbuf, const char *in)
+{
+	int c;
+	int take_next_litterally = 0;
+
+	strbuf_addch(outbuf, '(');
+
+	while ((c = *in++) != 0) {
+		if (take_next_litterally == 1) {
+			take_next_litterally = 0;
+		} else {
+			switch (c) {
+			case '\\':
+				take_next_litterally = 1;
+				continue;
+			case '(':
+				in = unquote_comment(outbuf, in);
+				continue;
+			case ')':
+				strbuf_addch(outbuf, ')');
+				return in;
+			}
+		}
+
+		strbuf_addch(outbuf, c);
+	}
+
+	return in;
+}
+
+static const char *unquote_quoted_string(struct strbuf *outbuf, const char *in)
+{
+	int c;
+	int take_next_litterally = 0;
+
+	while ((c = *in++) != 0) {
+		if (take_next_litterally == 1) {
+			take_next_litterally = 0;
+		} else {
+			switch (c) {
+			case '\\':
+				take_next_litterally = 1;
+				continue;
+			case '"':
+				return in;
+			}
+		}
+
+		strbuf_addch(outbuf, c);
+	}
+
+	return in;
+}
+
+static void unquote_quoted_pair(struct strbuf *line)
+{
+	struct strbuf outbuf;
+	const char *in = line->buf;
+	int c;
+
+	strbuf_init(&outbuf, line->len);
+
+	while ((c = *in++) != 0) {
+		switch (c) {
+		case '"':
+			in = unquote_quoted_string(&outbuf, in);
+			continue;
+		case '(':
+			in = unquote_comment(&outbuf, in);
+			continue;
+		}
+
+		strbuf_addch(&outbuf, c);
+	}
+
+	strbuf_swap(&outbuf, line);
+	strbuf_release(&outbuf);
+
+}
+
 static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 {
 	char *at;
@@ -63,6 +143,8 @@ static void handle_from(struct mailinfo *mi, const struct strbuf *from)
 	strbuf_init(&f, from->len);
 	strbuf_addbuf(&f, from);
 
+	unquote_quoted_pair(&f);
+
 	at = strchr(f.buf, '@');
 	if (!at) {
 		parse_bogus_from(mi, from);
diff --git a/t/t5100-mailinfo.sh b/t/t5100-mailinfo.sh
index 56988b7..45d228e 100755
--- a/t/t5100-mailinfo.sh
+++ b/t/t5100-mailinfo.sh
@@ -144,4 +144,18 @@ test_expect_success 'mailinfo unescapes with --mboxrd' '
 	test_cmp expect mboxrd/msg
 '
 
+test_expect_success 'mailinfo handles rfc2822 quoted-string' '
+	mkdir quoted-string &&
+	git mailinfo /dev/null /dev/null <"$DATA/quoted-string.in" \
+		>quoted-string/info &&
+	test_cmp "$DATA/quoted-string.expect" quoted-string/info
+'
+
+test_expect_success 'mailinfo handles rfc2822 comment' '
+	mkdir comment &&
+	git mailinfo /dev/null /dev/null <"$DATA/comment.in" \
+		>comment/info &&
+	test_cmp "$DATA/comment.expect" comment/info
+'
+
 test_done
diff --git a/t/t5100/comment.expect b/t/t5100/comment.expect
new file mode 100644
index 0000000..7228177
--- /dev/null
+++ b/t/t5100/comment.expect
@@ -0,0 +1,5 @@
+Author: A U Thor (this is (really) a comment (honestly))
+Email: somebody@example.com
+Subject: testing comments
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/comment.in b/t/t5100/comment.in
new file mode 100644
index 0000000..c53a192
--- /dev/null
+++ b/t/t5100/comment.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "A U Thor" <somebody@example.com> (this is \(really\) a comment (honestly))
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing comments
+
+
+
+---
+patch
diff --git a/t/t5100/quoted-string.expect b/t/t5100/quoted-string.expect
new file mode 100644
index 0000000..cab1bce
--- /dev/null
+++ b/t/t5100/quoted-string.expect
@@ -0,0 +1,5 @@
+Author: Author "The Author" Name
+Email: somebody@example.com
+Subject: testing quoted-pair
+Date: Sun, 25 May 2008 00:38:18 -0700
+
diff --git a/t/t5100/quoted-string.in b/t/t5100/quoted-string.in
new file mode 100644
index 0000000..e2e627a
--- /dev/null
+++ b/t/t5100/quoted-string.in
@@ -0,0 +1,9 @@
+From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
+From: "Author \"The Author\" Name" <somebody@example.com>
+Date: Sun, 25 May 2008 00:38:18 -0700
+Subject: [PATCH] testing quoted-pair
+
+
+
+---
+patch
-- 
2.10.0.372.g6fe1b14


^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-28 19:52       ` [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
@ 2016-09-28 20:21         ` Junio C Hamano
  2016-09-28 20:27           ` Kevin Daudt
  0 siblings, 1 reply; 32+ messages in thread
From: Junio C Hamano @ 2016-09-28 20:21 UTC (permalink / raw)
  To: Kevin Daudt; +Cc: git, Swift Geek, Jeff King

Kevin Daudt <me@ikke.info> writes:

> Many tests need to store data in a file, and repeat the same pattern to
> refer to that path:
>
>     "$TEST_DIRECTORY"/t5100/
>
> Create a variable that contains this path, and use that instead.
>
> While we're making this change, make sure the quotes are not just around
> the variable, but around the entire string to not give the impression
> we want shell splitting to affect the other variables.

Wow.  I was half expecting that you'd say something like "1/2 plus
the SQUASH is OK by me", but you went extra mile to do it right.

Impressed, and very much appreciated.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable
  2016-09-28 20:21         ` Junio C Hamano
@ 2016-09-28 20:27           ` Kevin Daudt
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Daudt @ 2016-09-28 20:27 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Swift Geek, Jeff King

On Wed, Sep 28, 2016 at 01:21:13PM -0700, Junio C Hamano wrote:
> Kevin Daudt <me@ikke.info> writes:
> 
> > Many tests need to store data in a file, and repeat the same pattern to
> > refer to that path:
> >
> >     "$TEST_DIRECTORY"/t5100/
> >
> > Create a variable that contains this path, and use that instead.
> >
> > While we're making this change, make sure the quotes are not just around
> > the variable, but around the entire string to not give the impression
> > we want shell splitting to affect the other variables.
> 
> Wow.  I was half expecting that you'd say something like "1/2 plus
> the SQUASH is OK by me", but you went extra mile to do it right.
> 
> Impressed, and very much appreciated.
> 

You're What's Cooking mail mentioned you expected a reroll, so I guessed
that I could just fix this part as well.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2016-09-28 20:27 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-16 21:02 [PATCH] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-16 22:22 ` Jeff King
2016-09-19 10:51   ` Kevin Daudt
2016-09-20  3:57     ` Jeff King
2016-09-21 16:07       ` Junio C Hamano
2016-09-19 18:54 ` [PATCH v2 0/2] Handle escape characters in From field Kevin Daudt
2016-09-25 21:08   ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-25 21:08     ` [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-26 19:11       ` Junio C Hamano
2016-09-26 19:26         ` Junio C Hamano
2016-09-26 19:44           ` Kevin Daudt
2016-09-26 22:23             ` Junio C Hamano
2016-09-27 10:26               ` Kevin Daudt
2016-09-26 19:06     ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Junio C Hamano
2016-09-28 19:49     ` [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header Kevin Daudt
2016-09-28 19:52       ` [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-28 20:21         ` Junio C Hamano
2016-09-28 20:27           ` Kevin Daudt
2016-09-28 19:52       ` [PATCH v4 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-19 18:54 ` [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-19 21:16   ` Junio C Hamano
2016-09-20  3:59     ` Jeff King
2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-19 21:24   ` Junio C Hamano
2016-09-19 22:04     ` Junio C Hamano
2016-09-20  4:28   ` Jeff King
2016-09-21 11:09   ` Jeff King
2016-09-22 22:17     ` Junio C Hamano
2016-09-23  4:15       ` Jeff King
2016-09-25 20:17         ` Kevin Daudt
2016-09-25 22:38           ` Jakub Narębski
2016-09-26  5:02             ` Kevin Daudt

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).