* [PATCH] send-email: improve RFC2047 quote parsing
@ 2012-07-30 19:25 Thomas Rast
2012-07-30 21:05 ` Junio C Hamano
From: Thomas Rast @ 2012-07-30 19:25 UTC (permalink / raw)
To: git; +Cc: Christoph Miebach, Junio C Hamano, Jeff King,
Jürgen Rühle
The RFC2047 unquoting, used to parse email addresses in From and Cc
headers, is broken in several ways:

* It erroneously substitutes ' ' for '_' in *the whole* header, even
  outside the quoted field.  [Noticed by Christoph.]

* It is too liberal in its matching, and happily matches the start of
  one quoted chunk against the end of another, or even just something
  that looks like such an end.  [Noticed by Junio.]

* It fundamentally cannot cope with encodings that are not a superset
  of ASCII, nor several (incompatible) encodings in the same header.

This patch fixes the first two by doing a more careful decoding of the
=AB outer quoting.  Fixing the fundamental issues is left for a
future, more intrusive, patch.

Noticed-by: Christoph Miebach <christoph.miebach@web.de>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---
This is the easy part, fixed as per Junio's comment that it needs to
use a .*? match for the contents, and with a test.
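
To illustrate the first two problems outside of Perl, here is a minimal Python sketch of the old and new matching strategies. The function names `unquote_old` and `unquote_new` are hypothetical and exist only for this illustration; Python's `re` module is assumed to behave like Perl for the greedy `.*` and non-greedy `.*?` constructs used here. As in the Perl code, `chr()` stands in for raw bytes and no charset decoding is attempted.

```python
import re

HEX = re.compile(r"=([0-9A-F]{2})")


def _unhex(text):
    # Replace each =XX escape with the character whose code point is 0xXX.
    return HEX.sub(lambda m: chr(int(m.group(1), 16)), text)


def unquote_old(header):
    """Sketch of the pre-patch logic: greedy match, then whole-header fixups."""
    # Greedy (.*) runs up to the LAST "?=" found anywhere in the header.
    new, count = re.subn(r"=\?([^?]+)\?q\?(.*)\?=", r"\2", header)
    if count:
        # These substitutions touch the entire header, not just the
        # encoded word, which is the bug Christoph noticed.
        new = _unhex(new.replace("_", " "))
    return new


def unquote_new(header):
    """Sketch of the patched logic: non-greedy match, per-word decoding."""
    def decode(match):
        # Only the captured payload of this encoded word is rewritten.
        return _unhex(match.group(2).replace("_", " "))

    return re.sub(r"=\?([^?]+)\?q\?(.*?)\?=", decode, header)


if __name__ == "__main__":
    header = "=?UTF-8?q?F=C3=BC?= <odd_?=mail@example.com>"
    print(repr(unquote_old(header)))  # address is mangled
    print(repr(unquote_new(header)))  # address is left alone
```

With the sample header, the old strategy swallows the `?=` inside the address and turns its `_` into a space, while the new strategy leaves everything outside the encoded word untouched.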
git-send-email.perl | 10 ++++++----
t/t9001-send-email.sh | 13 +++++++++++++
2 files changed, 19 insertions(+), 4 deletions(-)
diff --git a/git-send-email.perl b/git-send-email.perl
index ef30c55..6647137 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -862,11 +862,13 @@ sub make_message_id {
 sub unquote_rfc2047 {
 	local ($_) = @_;
 	my $encoding;
-	if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
+	s{=\?([^?]+)\?q\?(.*?)\?=}{
 		$encoding = $1;
-		s/_/ /g;
-		s/=([0-9A-F]{2})/chr(hex($1))/eg;
-	}
+		my $e = $2;
+		$e =~ s/_/ /g;
+		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
+		$e;
+	}eg;
 	return wantarray ? ($_, $encoding) : $_;
 }
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 8c12c65..0351228 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -841,6 +841,19 @@ test_expect_success $PREREQ '--compose adds MIME for utf8 subject' '
 	grep "^Subject: =?UTF-8?q?utf8-s=C3=BCbj=C3=ABct?=" msgtxt1
 '
+test_expect_success $PREREQ 'utf8 author is correctly passed on' '
+	clean_fake_sendmail &&
+	test_commit weird_author &&
+	test_when_finished "git reset --hard HEAD^" &&
+	git commit --amend --author "Füñný Nâmé <odd_?=mail@example.com>" &&
+	git format-patch --stdout -1 >funny_name.patch &&
+	git send-email --from="Example <nobody@example.com>" \
+	  --to=nobody@example.com \
+	  --smtp-server="$(pwd)/fake.sendmail" \
+	  funny_name.patch &&
+	grep "^From: Füñný Nâmé <odd_?=mail@example.com>" msgtxt1
+'
+
 test_expect_success $PREREQ 'detects ambiguous reference/file conflict' '
 	echo master > master &&
 	git add master &&
--
1.7.12.rc0.434.gd809d0f
* Re: [PATCH] send-email: improve RFC2047 quote parsing
2012-07-30 19:25 [PATCH] send-email: improve RFC2047 quote parsing Thomas Rast
@ 2012-07-30 21:05 ` Junio C Hamano
2012-07-31 8:09 ` Thomas Rast
From: Junio C Hamano @ 2012-07-30 21:05 UTC (permalink / raw)
To: Thomas Rast; +Cc: git, Christoph Miebach, Jeff King, Jürgen Rühle
Thomas Rast <trast@student.ethz.ch> writes:
> The RFC2047 unquoting, used to parse email addresses in From and Cc
> headers, is broken in several ways:
>
> * It erroneously substitutes ' ' for '_' in *the whole* header, even
> outside the quoted field. [Noticed by Christoph.]
>
> * It is too liberal in its matching, and happily matches the start of
> one quoted chunk against the end of another, or even just something
> that looks like such an end. [Noticed by Junio.]
>
> * It fundamentally cannot cope with encodings that are not a superset
> of ASCII, nor several (incompatible) encodings in the same header.
>
> This patch fixes the first two by doing a more careful decoding of the
> =AB outer quoting. Fixing the fundamental issues is left for a
> future, more intrusive, patch.
What is this =AB thing?
>
> Noticed-by: Christoph Miebach <christoph.miebach@web.de>
> Helped-by: Junio C Hamano <gitster@pobox.com>
> Signed-off-by: Thomas Rast <trast@student.ethz.ch>
> ---
>
> This is the easy part, fixed as per Junio's comment that it needs to
> use a .*? match for the contents, and with a test.
What's the hard part? Do you mean the "fundamentally cannot" part?
Thanks.
> git-send-email.perl | 10 ++++++----
> t/t9001-send-email.sh | 13 +++++++++++++
> 2 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index ef30c55..6647137 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -862,11 +862,13 @@ sub make_message_id {
> sub unquote_rfc2047 {
> local ($_) = @_;
> my $encoding;
> - if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
> + s{=\?([^?]+)\?q\?(.*?)\?=}{
> $encoding = $1;
> - s/_/ /g;
> - s/=([0-9A-F]{2})/chr(hex($1))/eg;
> - }
> + my $e = $2;
> + $e =~ s/_/ /g;
> + $e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
> + $e;
> + }eg;
> return wantarray ? ($_, $encoding) : $_;
> }
>
> diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
> index 8c12c65..0351228 100755
> --- a/t/t9001-send-email.sh
> +++ b/t/t9001-send-email.sh
> @@ -841,6 +841,19 @@ test_expect_success $PREREQ '--compose adds MIME for utf8 subject' '
> grep "^Subject: =?UTF-8?q?utf8-s=C3=BCbj=C3=ABct?=" msgtxt1
> '
>
> +test_expect_success $PREREQ 'utf8 author is correctly passed on' '
> + clean_fake_sendmail &&
> + test_commit weird_author &&
> + test_when_finished "git reset --hard HEAD^" &&
> + git commit --amend --author "Füñný Nâmé <odd_?=mail@example.com>" &&
> + git format-patch --stdout -1 >funny_name.patch &&
> + git send-email --from="Example <nobody@example.com>" \
> + --to=nobody@example.com \
> + --smtp-server="$(pwd)/fake.sendmail" \
> + funny_name.patch &&
> + grep "^From: Füñný Nâmé <odd_?=mail@example.com>" msgtxt1
> +'
> +
> test_expect_success $PREREQ 'detects ambiguous reference/file conflict' '
> echo master > master &&
> git add master &&
* Re: [PATCH] send-email: improve RFC2047 quote parsing
2012-07-30 21:05 ` Junio C Hamano
@ 2012-07-31 8:09 ` Thomas Rast
From: Thomas Rast @ 2012-07-31 8:09 UTC (permalink / raw)
To: Junio C Hamano
Cc: Thomas Rast, git, Christoph Miebach, Jeff King,
Jürgen Rühle
Junio C Hamano <gitster@pobox.com> writes:
> Thomas Rast <trast@student.ethz.ch> writes:
>
>> This patch fixes the first two by doing a more careful decoding of the
>> =AB outer quoting. Fixing the fundamental issues is left for a
>> future, more intrusive, patch.
>
> What is this =AB thing?
The two-hex-digits quoting in the style of MIME quoted-printable. I
called it the "outer quoting" (RFC2047: "encoding") because it serves to
protect the bytes from transport damage; there is another encoding
(RFC2047: "character set") "inside" which is specified by the
=?utf-8?...?= wrapper.
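
The two layers can be made explicit with a small Python sketch (not the code in git-send-email.perl). The helper name `decode_q_word` is hypothetical. Unlike the Perl regex in the patch, this sketch accepts `q` or `Q` and lowercase hex digits, which is an assumption of mine based on RFC2047 treating the encoding name as case-independent.

```python
import re


def decode_q_word(word):
    """Decode one =?charset?q?payload?= encoded word into text."""
    match = re.fullmatch(r"=\?([^?]+)\?[qQ]\?(.*?)\?=", word)
    if match is None:
        raise ValueError("not a Q encoded word: %r" % word)
    charset, payload = match.group(1), match.group(2)

    # Layer 1, the "outer quoting": undo "_" and =XX to recover raw bytes.
    raw = re.sub(
        rb"=([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        payload.replace("_", " ").encode("ascii"),
    )

    # Layer 2, the character set: interpret those bytes as text.
    return raw.decode(charset)


if __name__ == "__main__":
    print(decode_q_word("=?UTF-8?q?utf8-s=C3=BCbj=C3=ABct?="))
```

Keeping the intermediate value as bytes is what lets a charset that is not a superset of ASCII be handled per word, which the current function cannot do because it returns a single string and a single encoding.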
BTW, note that we also only handle the Q outer quoting
(quoted-printable). There is a B encoding, which your email in fact
used in the Cc: header:
Cc: [...] =?utf-8?B?SsO8cmdlbiBSw7xobGU=?= <j-r@online.de>
B means Base64, as you can probably guess from the looks of it.
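
For comparison, a B encoded word only differs in the outer layer. The following Python sketch uses a hypothetical helper, `decode_b_word`, and is not something the patch implements. It decodes the Cc: example above using the standard `base64` module.

```python
import base64
import re


def decode_b_word(word):
    """Decode one =?charset?B?payload?= encoded word into text."""
    match = re.fullmatch(r"=\?([^?]+)\?[bB]\?([A-Za-z0-9+/=]*)\?=", word)
    if match is None:
        raise ValueError("not a B encoded word: %r" % word)
    charset, payload = match.group(1), match.group(2)
    # Outer layer: Base64 to raw bytes; inner layer: bytes to text.
    return base64.b64decode(payload).decode(charset)


if __name__ == "__main__":
    print(decode_b_word("=?utf-8?B?SsO8cmdlbiBSw7xobGU=?="))
```

Python's standard library also provides `email.header.decode_header`, which handles both the Q and B forms.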
>> This is the easy part, fixed as per Junio's comment that it needs to
>> use a .*? match for the contents, and with a test.
>
> What's the hard part? Do you mean the "fundamentally cannot" part?
Yes, and by "fundamentally" I meant "not without fixing something
outside of this function, which I am too lazy to do at this time". ;-)
--
Thomas Rast
trast@{inf,student}.ethz.ch