From: Kevin Daudt <me@ikke.info>
To: Jeff King <peff@peff.net>
Cc: git@vger.kernel.org, Swift Geek <swiftgeek@gmail.com>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH] mailinfo: unescape quoted-pair in header fields
Date: Mon, 19 Sep 2016 12:51:33 +0200 [thread overview]
Message-ID: <20160919105133.GA10901@ikke.info> (raw)
In-Reply-To: <20160916222206.jz2d4gpaxxccia5p@sigill.intra.peff.net>
Thanks for the review
On Fri, Sep 16, 2016 at 03:22:06PM -0700, Jeff King wrote:
> On Fri, Sep 16, 2016 at 11:02:04PM +0200, Kevin Daudt wrote:
>
> > mailinfo.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++
> > t/t5100-mailinfo.sh | 6 ++++++
> > t/t5100/quoted-pair.expect | 5 +++++
> > t/t5100/quoted-pair.in | 9 ++++++++
> > 4 files changed, 74 insertions(+)
> > create mode 100644 t/t5100/quoted-pair.expect
> > create mode 100644 t/t5100/quoted-pair.in
> >
> > diff --git a/mailinfo.c b/mailinfo.c
> > index e19abe3..04036f3 100644
> > --- a/mailinfo.c
> > +++ b/mailinfo.c
> > @@ -54,15 +54,69 @@ static void parse_bogus_from(struct mailinfo *mi, const struct strbuf *line)
> > get_sane_name(&mi->name, &mi->name, &mi->email);
> > }
> >
> > +static int unquote_quoted_string(struct strbuf *line)
> > +{
> > + struct strbuf outbuf;
> > + const char *in = line->buf;
> > + int c, take_next_literally = 0;
> > + int found_error = 0;
> > + char escape_context=0;
>
> Style: whitespace around "=".
>
> I had to wonder why we needed both escape_context and
> take_next_literally; shouldn't we just need a single state bit. But
> escape_context is not "escape the next character", it is "we are
> currently in a mode where we should be escaping".
>
> Could we give it a more descriptive name? I guess it is more than just
> "we are in a mode", but rather "here is the character that will end the
> escaped mode". Maybe a comment would be more appropriate.
>
Yes, your analysis is right, we need to know what character would end
the 'escape context'. I'll add a comment.
> > + while ((c = *in++) != 0) {
> > + if (take_next_literally) {
> > + take_next_literally = 0;
> > + } else {
>
> OK, so that means the previous one was backslash-quoted, and we don't do
> any other cleverness. Good.
>
> > + switch (c) {
> > + case '"':
> > + if (!escape_context)
> > + escape_context = '"';
> > + else if (escape_context == '"')
> > + escape_context = 0;
> > + continue;
>
> And here we open or close the quoted portion, depending. Makes sense.
>
> > + case '\\':
> > + if (escape_context) {
> > + take_next_literally = 1;
> > + continue;
> > + }
> > + break;
>
> I didn't look in the RFC. Is:
>
> From: my \"name\" <foo@example.com>
>
> really the same as:
>
> From: "my \\\"name\\\"" <foo@example.com>
>
> ? That seems weird, but I think it may be that the former is simply
> bogus (you are not supposed to use backslashes outside of the quoted
> section at all).
Correct, the quoted-pair (escape sequence) can only occur in a quoted
string or a comment. Even more so, the display name *needs* to be quoted
when consisting of more then one word according to the RFC.
>
> > + case '(':
> > + if (!escape_context)
> > + escape_context = '(';
> > + else if (escape_context == '(')
> > + found_error = 1;
> > + break;
>
> Hmm. Is:
>
> From: Name (Comment with (another comment))
>
> really disallowed? RFC2822 seems to say that "comment" can contain
> "ccontent", which can itself be a comment.
Yes, you are right, it is allowed, I was just looking at the ctext when
adding this, but failed to see that comments can be nested at that time.
>
> This is obviously getting pretty silly, but if we are going to follow
> the RFC, I think you actually have to do a recursive parse, and keep
> track of an arbitrary depth of context.
>
> I dunno. This method probably covers most cases in practice, and it's
> easy to reason about.
The problem is, how do you differentiate between nested comments, and
escaped braces within a comment after one run?
>
> > + case ')':
> > + if (escape_context == '(')
> > + escape_context = 0;
> > + break;
> > + }
> > + }
> > +
> > + strbuf_addch(&outbuf, c);
> > + }
> > +
> > + strbuf_reset(line);
> > + strbuf_addbuf(line, &outbuf);
> > + strbuf_release(&outbuf);
>
> I think you can use strbuf_swap() here to avoid copying the line an
> extra time, like:
>
> strbuf_swap(line, &outbuf);
> strbuf_release(&outbuf);
>
> Another option would be to just:
>
> in = strbuf_detach(&line);
>
> at the beginning, and then output back into "line".
>
Thanks, I just looked at what other functions were doing, but this is
much better indeed.
> > + return found_error;
>
> What happens when we get here and take_next_literally is set? I.e., a
> backslash at the end of the string. We'll silently print nothing, which
> seems reasonable to me (the other option is to print a literal
> backslash).
>
> Ditto, what if escape_context is non-zero? We're in the middle of an
> unterminated quoted string (or comment).
>
> I'm fine with silently continuing, but it seems weird that we notice
> embedded comments (and return an error), but not these other conditions.
>
I agree. I'm thinking it's better to just be lenient in this method. If
a quote wasn't properly closed, there would be no e-mail adress for
example. I think it would do little harm, and I'd remove the checking
for the opening brace too.
> > static void handle_from(struct mailinfo *mi, const struct strbuf *from)
> > {
> > char *at;
> > size_t el;
> > struct strbuf f;
> >
> > +
> > strbuf_init(&f, from->len);
> > strbuf_addbuf(&f, from);
>
> Funny extra line?
Ugh
>
> > +test_expect_success 'mailinfo unescapes rfc2822 quoted-string' '
> > + mkdir quoted-pair &&
> > + git mailinfo /dev/null /dev/null <"$TEST_DIRECTORY"/t5100/quoted-pair.in >quoted-pair/info &&
> > + test_cmp "$TEST_DIRECTORY"/t5100/quoted-pair.expect quoted-pair/info
> > +'
>
> We usually break long lines with backslash-escapes. Like:
>
> git mailinfo /dev/null /dev/null \
> <"$TEST_DIRECTORY"/t5100/quoted-pair.in \
> >quoted-pair/info
>
> I'd also wonder if things might be made much more readable by putting
> "$TEST_DIRECTORY/t5100" into a shorter variable like $data or something.
> That would be best done as a preparatory patch which updates all of the
> tests.
>
> > --- /dev/null
> > +++ b/t/t5100/quoted-pair.in
> > @@ -0,0 +1,9 @@
> > +From 1234567890123456789012345678901234567890 Mon Sep 17 00:00:00 2001
> > +From: "Author \"The Author\" Name" <somebody@example.com>
> > +Date: Sun, 25 May 2008 00:38:18 -0700
> > +Subject: [PATCH] testing quoted-pair
>
> I do not care that much about the "()" comment behavior myself, but if
> we are going to implement it, it probably makes sense to protect it from
> regression with a test.
Yeah, good idea.
>
> -Peff
next prev parent reply other threads:[~2016-09-19 10:51 UTC|newest]
Thread overview: 32+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-09-16 21:02 [PATCH] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-16 22:22 ` Jeff King
2016-09-19 10:51 ` Kevin Daudt [this message]
2016-09-20 3:57 ` Jeff King
2016-09-21 16:07 ` Junio C Hamano
2016-09-19 18:54 ` [PATCH v2 0/2] Handle escape characters in From field Kevin Daudt
2016-09-25 21:08 ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-25 21:08 ` [PATCH v3 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-26 19:11 ` Junio C Hamano
2016-09-26 19:26 ` Junio C Hamano
2016-09-26 19:44 ` Kevin Daudt
2016-09-26 22:23 ` Junio C Hamano
2016-09-27 10:26 ` Kevin Daudt
2016-09-26 19:06 ` [PATCH v3 1/2] t5100-mailinfo: replace common path prefix with variable Junio C Hamano
2016-09-28 19:49 ` [PATCH v4 0/2] Handle RFC2822 quoted-pairs in From header Kevin Daudt
2016-09-28 19:52 ` [PATCH v4 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-28 20:21 ` Junio C Hamano
2016-09-28 20:27 ` Kevin Daudt
2016-09-28 19:52 ` [PATCH v4 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-19 18:54 ` [PATCH v2 1/2] t5100-mailinfo: replace common path prefix with variable Kevin Daudt
2016-09-19 21:16 ` Junio C Hamano
2016-09-20 3:59 ` Jeff King
2016-09-19 18:54 ` [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields Kevin Daudt
2016-09-19 21:24 ` Junio C Hamano
2016-09-19 22:04 ` Junio C Hamano
2016-09-20 4:28 ` Jeff King
2016-09-21 11:09 ` Jeff King
2016-09-22 22:17 ` Junio C Hamano
2016-09-23 4:15 ` Jeff King
2016-09-25 20:17 ` Kevin Daudt
2016-09-25 22:38 ` Jakub Narębski
2016-09-26 5:02 ` Kevin Daudt
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160919105133.GA10901@ikke.info \
--to=me@ikke.info \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=peff@peff.net \
--cc=swiftgeek@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).