git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Роман Донченко" <dpb@corrigendum.ru>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Jay Soffian" <jaysoffian@gmail.com>,
	"Thomas Rast" <tr@thomasrast.ch>
Subject: Re: [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly
Date: Sun, 7 Dec 2014 04:18:59 -0500	[thread overview]
Message-ID: <20141207091859.GA21311@peff.net> (raw)
In-Reply-To: <1417894583-2352-2-git-send-email-dpb@corrigendum.ru>

On Sat, Dec 06, 2014 at 10:36:23PM +0300, Роман Донченко wrote:

> The RFC says that they are to be concatenated after decoding (i.e. the
> intervening whitespace is ignored).

Thanks. Both patches look good to me, and I'd be happy to have them
applied as-is. I wrote a few comments below, but in all cases I think I
convinced myself that what you wrote is best.

> +	my $sep = qr/[ \t]+/;
> +	s{$re_encoded_word(?:$sep$re_encoded_word)*}{
> +		my @words = split $sep, $&;
> +		foreach (@words) {
> +			m/$re_encoded_word/;
> +			$charset = $1;
> +			my $encoding = $2;
> +			my $text = $3;

It feels a little weird to have to split and rematch $re_encoded_word in
the loop, but I don't think there is a way around it. ($1, $2, $3) will
have our first word, and ($4, $5, $6) will have the final (if any), but
I don't think we can get access to what is in between.

So I think what you have here is the best we can do.

> +			if ($encoding eq 'q' || $encoding eq 'Q') {
> +				$_ = $text;
> +				s/_/ /g;
> +				s/=([0-9A-F]{2})/chr(hex($1))/egi;

It took me a minute to figure out why this works. $_ is a reference to
the iterator for @words, so it is re-assigning that element of the array
first to the encoded text, and then modifying it in place.

I wonder if it would be more obvious like this:

  join '',
  map {
          m/$re_encoded_word/;
	  $charset = $1;
	  my $encoding = $2;
	  my $text = $3;
          if ($encoding eq 'q' || $encoding eq 'Q') {
	    $text =~ s/_/ /g;
	    $text =~ s=([0-9A-F]{2}/chr(hex($1))/egi;
	  } else {
	    # other encoding not supported yet
	  }
  } split($sep, $&);


I dunno. I kind of like your version better now that I understand it,
but it did take me a minute.

One final note on this bit of code: if there are multiple encoded words,
we grab the $charset from the final encoded word, and never report the
earlier charsets. Technically they do not all have to be the same
(rfc2047 even has an example where they are not). I think we can dismiss
this, though, as:

  1. It was like this before your patches (we might have seen multiple
     non-adjacent encoded words; you're just handling adjacent ones),
     and nobody has complained.

  2. Using two separate encodings in the same header is sufficiently
     ridiculous that I can live with us not handling it properly.

> +# This name is long enough to force format-patch to split it into multiple
> +# encoded-words, assuming it uses UTF-8 with the "Q" encoding.
> +test_expect_success $PREREQ 'long non-ascii self name is suppressed' "
> +	test_suppress_self_quoted 'Ƒüñníęř €. Nâṁé' 'odd_?=mail@example.com' \
> +		'long_non_ascii_self_suppressed'
> +"

Cute. :)

-Peff

  reply	other threads:[~2014-12-07  9:19 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-12-06 19:36 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
2014-12-06 19:36 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко
2014-12-07  9:18   ` Jeff King [this message]
2014-12-07 14:35     ` Роман Донченко
2014-12-07 17:48       ` Philip Oakley
2014-12-07 18:17         ` Роман Донченко
2014-12-10 22:21           ` Philip Oakley
  -- strict thread matches above, loose matches on Subject: below --
2014-12-14 15:59 [PATCH v2 1/2] send-email: align RFC 2047 decoding more closely with the spec Роман Донченко
2014-12-14 15:59 ` [PATCH v2 2/2] send-email: handle adjacent RFC 2047-encoded words properly Роман Донченко

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141207091859.GA21311@peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=dpb@corrigendum.ru \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jaysoffian@gmail.com \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).