git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Kevin Daudt <me@ikke.info>
Cc: git@vger.kernel.org, Swift Geek <swiftgeek@gmail.com>,
	Junio C Hamano <gitster@pobox.com>
Subject: Re: [PATCH v2 2/2] mailinfo: unescape quoted-pair in header fields
Date: Mon, 19 Sep 2016 21:28:33 -0700
Message-ID: <20160920042832.7xzazxsfiug3llyl@sigill.intra.peff.net>
In-Reply-To: <20160919185440.18234-3-me@ikke.info>

On Mon, Sep 19, 2016 at 08:54:40PM +0200, Kevin Daudt wrote:

> diff --git a/t/t5100/comment.expect b/t/t5100/comment.expect
> new file mode 100644
> index 0000000..1197e76
> --- /dev/null
> +++ b/t/t5100/comment.expect
> @@ -0,0 +1,5 @@
> +Author: A U Thor (this is a comment (really))

Hmm. I don't see any recursion in your parsing, so after the first ")"
our escape_context would be 0 again, right? So a more tricky test is:

  Author: A U Thor (this is a comment (really) with \(quoted\) pairs)

We are still inside "ctext" when we hit those quoted pairs, and they
should be unquoted, but your code would not do so (unless we go the
route of simply unquoting pairs everywhere).
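
To make the nesting problem concrete, here is a minimal sketch that tracks how deep we are inside nested comments with a counter, so that the first ")" does not drop us out of comment context. It is not taken from the patch: the function name unquote_comment_pairs() and the fixed-size output buffer are assumptions made for illustration, and a depth counter is a simpler stand-in for a real parser, not a replacement for one.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): copy "in" to "out",
 * dropping the backslash of every quoted-pair that appears inside a
 * comment, however deeply nested. Quoted-pairs outside of comments
 * are copied through untouched.
 *
 * Returns 0 on success, -1 if "out" is too small.
 */
int unquote_comment_pairs(const char *in, char *out, size_t outsz)
{
        size_t n = 0;
        int depth = 0;  /* number of currently open "(" */

        if (!outsz)
                return -1;

        for (; *in; in++) {
                if (*in == '\\' && in[1]) {
                        if (depth == 0) {
                                /* outside a comment: keep the backslash */
                                if (n + 1 >= outsz)
                                        return -1;
                                out[n++] = '\\';
                        }
                        in++;   /* the escaped char is copied below as-is */
                } else if (*in == '(') {
                        depth++;
                } else if (*in == ')' && depth > 0) {
                        depth--;
                }

                if (n + 1 >= outsz)
                        return -1;
                out[n++] = *in;
        }

        out[n] = '\0';
        return 0;
}
```

Fed the tricky Author line above, this keeps depth at 1 after "(really)" closes, so the later \( and \) are still unquoted. A version that keeps only a single flag and clears it at the first ")" would leave them escaped.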

I think your parser would have to follow the BNF more closely with a
recursive descent parser, like:

  const char *parse_comment(const char *in, struct strbuf *out)
  {
        size_t orig_out = out->len;

        if ((in = parse_char('(', in, out)) &&
            (in = parse_ccontent(in, out)) &&
            (in = parse_char(')', in, out)))
                return in;

        strbuf_setlen(out, orig_out);
        return NULL;
  }

  const char *parse_ccontent(const char *in, struct strbuf *out)
  {
        while (*in && *in != ')') {
                const char *next;

                if ((next = parse_quoted_pair(in, out)) ||
                    (next = parse_comment(in, out)) ||
                    (next = parse_ctext(in, out))) {
                        in = next;
                        continue;
                }
                /* no rule matched; bail out rather than loop forever */
                break;
        }

        /*
         * if "in" is NUL here we have an unclosed comment; but we'll
         * just silently ignore and accept it
         */
        return in;
  }

  const char *parse_char(char c, const char *in, struct strbuf *out)
  {
        if (*in != c)
                return NULL;
        strbuf_addch(out, c);
        return in + 1;
  }

You can probably guess at the implementation of parse_quoted_pair(),
parse_ctext(), etc. (and naturally, the above is completely untested and
probably has some bugs in it).
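
For readers who want something they can compile, the following is one possible standalone version of the sketch above, with guesses at the two missing helpers. Everything specific here is an assumption: "struct buf" and its helpers are a fixed-size stand-in for git's strbuf, parse_quoted_pair() is taken to mean "a backslash plus any one character, emitted without the backslash", and parse_ctext() is taken to mean "any single character other than '(', ')' and a backslash". In this variant an unclosed comment makes parse_comment() fail and roll back the output.

```c
#include <stddef.h>

/* Stand-in for git's struct strbuf (assumption): fixed-size buffer. */
struct buf {
        char data[256];
        size_t len;
};

void buf_addch(struct buf *b, char c)
{
        /* silently drop characters that do not fit */
        if (b->len + 1 < sizeof(b->data)) {
                b->data[b->len++] = c;
                b->data[b->len] = '\0';
        }
}

void buf_setlen(struct buf *b, size_t len)
{
        b->len = len;
        b->data[len] = '\0';
}

const char *parse_comment(const char *in, struct buf *out);

/* quoted-pair: "\" plus one char; emit the char without the "\" */
const char *parse_quoted_pair(const char *in, struct buf *out)
{
        if (in[0] != '\\' || !in[1])
                return NULL;
        buf_addch(out, in[1]);
        return in + 2;
}

/* ctext: any single char that is not "(", ")" or "\" */
const char *parse_ctext(const char *in, struct buf *out)
{
        if (!*in || *in == '(' || *in == ')' || *in == '\\')
                return NULL;
        buf_addch(out, *in);
        return in + 1;
}

const char *parse_char(char c, const char *in, struct buf *out)
{
        if (*in != c)
                return NULL;
        buf_addch(out, c);
        return in + 1;
}

const char *parse_ccontent(const char *in, struct buf *out)
{
        while (*in && *in != ')') {
                const char *next;

                if ((next = parse_quoted_pair(in, out)) ||
                    (next = parse_comment(in, out)) ||
                    (next = parse_ctext(in, out)))
                        in = next;
                else
                        break;  /* e.g. a lone trailing backslash */
        }
        return in;
}

/* comment: "(" ccontent ")"; on failure, undo anything we emitted */
const char *parse_comment(const char *in, struct buf *out)
{
        size_t orig_len = out->len;

        if ((in = parse_char('(', in, out)) &&
            (in = parse_ccontent(in, out)) &&
            (in = parse_char(')', in, out)))
                return in;

        buf_setlen(out, orig_len);
        return NULL;
}
```

Calling parse_comment() on "(this is a comment (really) with \(quoted\) pairs)" with a zero-initialized struct buf should leave the comment in out->data with the two quoted pairs unescaped, and return a pointer just past the final ")".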

In a former life (back when it was still rfc822!) I remember
implementing a similar parser, which I think was in turn based on the
c-client code in Pine. It's not _too_ hard to get it all right based on
the BNF in the RFC, but as you can see it's a bit tedious. And I'm not
convinced we actually need it to be completely right for our purposes.
We really are looking for a single address, with the email in "<>" and
the name as everything before that, but de-quoted.
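
As a rough illustration of that simpler route, here is a sketch that takes the address from the last "<...>" and treats everything before it as the name, dropping the backslash of every quoted pair regardless of context. The function name split_ident(), its signature, and the choice to trim trailing whitespace are all assumptions for the example; it does not strip double quotes or comments, and it is not what mailinfo actually does.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split "Name <email>" into its two parts.
 * In the name, every "\x" becomes "x", wherever it appears.
 *
 * Returns 0 on success, -1 on malformed input or a too-small buffer.
 */
int split_ident(const char *in, char *name, size_t namesz,
                char *email, size_t emailsz)
{
        const char *lt = strrchr(in, '<');
        const char *gt = lt ? strchr(lt, '>') : NULL;
        const char *p;
        size_t n = 0, elen;

        if (!lt || !gt || !namesz)
                return -1;

        /* the email is whatever sits between "<" and ">" */
        elen = (size_t)(gt - lt - 1);
        if (elen >= emailsz)
                return -1;
        memcpy(email, lt + 1, elen);
        email[elen] = '\0';

        /* the name is everything before "<", with pairs unquoted */
        for (p = in; p < lt; p++) {
                if (*p == '\\' && p + 1 < lt)
                        p++;    /* drop the backslash, keep the next char */
                if (n + 1 >= namesz)
                        return -1;
                name[n++] = *p;
        }

        /* trim the whitespace that separated the name from "<" */
        while (n > 0 && (name[n - 1] == ' ' || name[n - 1] == '\t'))
                n--;
        name[n] = '\0';
        return 0;
}
```

The trade-off is the one described above: this is much less code than a parser that follows the BNF, at the cost of unquoting in places where the RFC would not.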

-Peff
