git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
* bug (?) in send email
@ 2012-07-28 21:33 Christoph Miebach
  2012-07-30 10:38 ` Christoph Miebach
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Miebach @ 2012-07-28 21:33 UTC (permalink / raw)
  To: git

Hello!

send-email (tested versions 1.7.9.2 and 1.7.10.4) breaks email addresses.

Steps to reproduce:


Modify file.

git commit --author="Michał Tz <name_1911@some.com>" modified.file -m "Test"

git format-patch -o patches origin

Now, the patch seems to have the address right, see [1]

git send-email  --to MYOWN.ADDRESS@mail.com --suppress-cc=author 
patches/0001-Test.patch

But checking my inbox now shows an email starting with:
From: Michał Tz <name 1911@some.com>

So the address is splitted at the underscore.

Furthermore, if I don't use --suppress-cc=author, the CC field shows the 
right address.

Regards

Christoph

[1]
less patches/0001-Test.patch
From: =?UTF-8?q?Micha=C5=82=20Tz?= <name_1911@some.com>

git show
Author: Michał Tz <name_1911@some.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug (?) in send email
  2012-07-28 21:33 bug (?) in send email Christoph Miebach
@ 2012-07-30 10:38 ` Christoph Miebach
  2012-07-30 12:30   ` Thomas Rast
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Miebach @ 2012-07-30 10:38 UTC (permalink / raw)
  To: git

Hello!

Removing this line
s/_/ /g;
here
https://github.com/git/git/blob/master/git-send-email.perl#L867

Solves this problem for me. But I really don't have any clue, what kind 
of side effects this modification on "sub unquote_rfc2047" might have.

Regards

Christoph

On 28.07.2012 23:33, Christoph Miebach wrote:
> Hello!
>
> send-email (tested versions 1.7.9.2 and 1.7.10.4) breaks email addresses.
>
> Steps to reproduce:
>
>
> Modify file.
>
> git commit --author="Michał Tz <name_1911@some.com>" modified.file -m
> "Test"
>
> git format-patch -o patches origin
>
> Now, the patch seems to have the address right, see [1]
>
> git send-email  --to MYOWN.ADDRESS@mail.com --suppress-cc=author
> patches/0001-Test.patch
>
> But checking my inbox now shows an email starting with:
> From: Michał Tz <name 1911@some.com>
>
> So the address is splitted at the underscore.
>
> Furthermore, if I don't use --suppress-cc=author, the CC field shows the
> right address.
>
> Regards
>
> Christoph
>
> [1]
> less patches/0001-Test.patch
> From: =?UTF-8?q?Micha=C5=82=20Tz?= <name_1911@some.com>
>
> git show
> Author: Michał Tz <name_1911@some.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug (?) in send email
  2012-07-30 10:38 ` Christoph Miebach
@ 2012-07-30 12:30   ` Thomas Rast
  2012-07-30 15:38     ` Junio C Hamano
  2012-07-30 16:32     ` Jeff King
  0 siblings, 2 replies; 6+ messages in thread
From: Thomas Rast @ 2012-07-30 12:30 UTC (permalink / raw)
  To: Christoph Miebach; +Cc: git, Jürgen Rühle, Jeff King, Junio C Hamano

[+Cc people involved with this function]

Christoph Miebach <christoph.miebach@web.de> writes:

> > git commit --author="Michał Tz <name_1911@some.com>" modified.file -m
> > "Test"
> >
> > git format-patch -o patches origin
> >
> > Now, the patch seems to have the address right, see [1]
> >
> > git send-email  --to MYOWN.ADDRESS@mail.com --suppress-cc=author
> > patches/0001-Test.patch
> >
> > But checking my inbox now shows an email starting with:
> > From: Michał Tz <name 1911@some.com>
> 
> Removing this line
> s/_/ /g;
> here
> https://github.com/git/git/blob/master/git-send-email.perl#L867
>
> Solves this problem for me. But I really don't have any clue, what
> kind of side effects this modification on "sub unquote_rfc2047" might
> have.

It would prevent spaces from being decoded correctly if the encoding
program chooses to make the '_'.  git-format-patch does not actually do
this, see the big comment around pretty.c:304.

I think this patch would be a better match for what RFC2047 specifies.
On the one hand it avoids substituting _ outside of encodings, but OTOH
it also handles more than one encoded-word.  It still does not handle
the case where there are several encoded-words of *different* encodings,
but who would do such a crazy thing?


diff --git i/git-send-email.perl w/git-send-email.perl
index ef30c55..88c4758 100755
--- i/git-send-email.perl
+++ w/git-send-email.perl
@@ -862,11 +862,13 @@ sub make_message_id {
 sub unquote_rfc2047 {
 	local ($_) = @_;
 	my $encoding;
-	if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
+	s{=\?([^?]+)\?q\?(.*)\?=}{
 		$encoding = $1;
-		s/_/ /g;
-		s/=([0-9A-F]{2})/chr(hex($1))/eg;
-	}
+		my $e = $2;
+		$e =~ s/_/ /g;
+		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
+		$e;
+	}eg;
 	return wantarray ? ($_, $encoding) : $_;
 }
 


-- 
Thomas Rast
trast@{inf,student}.ethz.ch

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: bug (?) in send email
  2012-07-30 12:30   ` Thomas Rast
@ 2012-07-30 15:38     ` Junio C Hamano
  2012-07-30 16:34       ` Jeff King
  2012-07-30 16:32     ` Jeff King
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2012-07-30 15:38 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Christoph Miebach, git, Jürgen Rühle, Jeff King

Thomas Rast <trast@student.ethz.ch> writes:

> It would prevent spaces from being decoded correctly if the encoding
> program chooses to make the '_'.  git-format-patch does not actually do
> this, see the big comment around pretty.c:304.
>
> I think this patch would be a better match for what RFC2047 specifies.
> On the one hand it avoids substituting _ outside of encodings, but OTOH
> it also handles more than one encoded-word.

Yeah, I think it is an improvement.

I however wonder if the captured pattern for $2 should be minimized
with ? at the end, i.e. "..\?q\?(.*?)\?="?

> It still does not handle
> the case where there are several encoded-words of *different* encodings,
> but who would do such a crazy thing?

Even if somebody did so, it wouldn't have worked, and to make it
work, the sub and its caller (there is only one caller that actually
cares what the original encoding was) needs to be rethought anyway,
so I do not think it matters.

It may deserve an in-code NEEDSWORK comment, though.

Thanks.

> diff --git i/git-send-email.perl w/git-send-email.perl
> index ef30c55..88c4758 100755
> --- i/git-send-email.perl
> +++ w/git-send-email.perl
> @@ -862,11 +862,13 @@ sub make_message_id {
>  sub unquote_rfc2047 {
>  	local ($_) = @_;
>  	my $encoding;
> -	if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
> +	s{=\?([^?]+)\?q\?(.*)\?=}{
>  		$encoding = $1;
> -		s/_/ /g;
> -		s/=([0-9A-F]{2})/chr(hex($1))/eg;
> -	}
> +		my $e = $2;
> +		$e =~ s/_/ /g;
> +		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
> +		$e;
> +	}eg;
>  	return wantarray ? ($_, $encoding) : $_;
>  }

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug (?) in send email
  2012-07-30 12:30   ` Thomas Rast
  2012-07-30 15:38     ` Junio C Hamano
@ 2012-07-30 16:32     ` Jeff King
  1 sibling, 0 replies; 6+ messages in thread
From: Jeff King @ 2012-07-30 16:32 UTC (permalink / raw)
  To: Thomas Rast
  Cc: Christoph Miebach, git, Jürgen Rühle, Junio C Hamano

On Mon, Jul 30, 2012 at 02:30:35PM +0200, Thomas Rast wrote:

> > Removing this line
> > s/_/ /g;
> > here
> > https://github.com/git/git/blob/master/git-send-email.perl#L867
> >
> > Solves this problem for me. But I really don't have any clue, what
> > kind of side effects this modification on "sub unquote_rfc2047" might
> > have.
> 
> It would prevent spaces from being decoded correctly if the encoding
> program chooses to make the '_'.  git-format-patch does not actually do
> this, see the big comment around pretty.c:304.

Yeah, I think we need to be prepared to decode arbitrary rfc2047. Even
though most of our input will come from format-patch, people can munge
the input between format-patch and send-email.

> I think this patch would be a better match for what RFC2047 specifies.
> On the one hand it avoids substituting _ outside of encodings, but OTOH
> it also handles more than one encoded-word.  It still does not handle
> the case where there are several encoded-words of *different* encodings,
> but who would do such a crazy thing?
> 
> diff --git i/git-send-email.perl w/git-send-email.perl
> index ef30c55..88c4758 100755
> --- i/git-send-email.perl
> +++ w/git-send-email.perl
> @@ -862,11 +862,13 @@ sub make_message_id {
>  sub unquote_rfc2047 {
>  	local ($_) = @_;
>  	my $encoding;
> -	if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) {
> +	s{=\?([^?]+)\?q\?(.*)\?=}{
>  		$encoding = $1;
> -		s/_/ /g;
> -		s/=([0-9A-F]{2})/chr(hex($1))/eg;
> -	}
> +		my $e = $2;
> +		$e =~ s/_/ /g;
> +		$e =~ s/=([0-9A-F]{2})/chr(hex($1))/eg;
> +		$e;
> +	}eg;
>  	return wantarray ? ($_, $encoding) : $_;

That is definitely an improvement. The existing code was just plain
wrong to assume that the whole string would be rfc2047 encoded, as that
is never the case (e.g., I don't think it is even legal to encode the
addr-spec part).

You are right that the result does not handle multiple encoded chunks
with different encodings. However, it also does not handle a single
encoded chunk with non-encoded bits. E.g., consider:

  From: =?UTF-8?q?Some=20N=C3=A1me?= <you@example.com>

We will return:

  ("Some Náme <you@example.com>", "UTF-8")

Note that the second half is unencoded ASCII, but we have simply
concatenated it with the encoded bit and claimed the whole thing at
UTF-8. This works for utf-8, of course, because it is an ASCII superset.
But what about utf-16? You now have incompatible mixed encodings in a
single string. Which of course is only the tip of the iceberg; we later
may stick the author field into the body, which does not necessarily
have the same encoding (and there is a big "uh oh we should reencode"
argument that does not actually do so).

Furthermore, even if we do use an ASCII superset, the callers do not use
the encoding field very well. They do things like:

  if (unquote_rfc2047($addr) eq $sender) {

So we are comparing the unquoted address, in some encoding (that we
throw away) to whatever we have in $sender, which may have come from the
author ident, or the command-line, or the expansion of a MUA alias. So
we would fail to match utf-8 and latin1 versions of the same identity.

I think the only sane thing for unquote_rfc2047 to do would be to
actually return just a string in some canonical encoding (i.e., utf8),
and iconv each of the bits separately from its appropriate
encoding. And then we assume that everything internal is in the same
encoding and we can just compare strings to get the right answer. And
outgoing, we produce encoded utf8 for the headers, or match the body for
an in-body "From" header.

Of course, that means we would have to start using perl's encoding
functions, which we do not use currently. I think Encode has been in the
perl base system long enough for us to rely on it, but I am not very up
to speed on encoding issues in perl. And nobody has been complaining
about those issues, which I assume means everybody is just using utf8
for outgoing messages.  So it might not be worth the trouble to fix.

I also notice that we do not handle the 'b' rfc2047 encoding at all.
We do handle this in mailinfo for incoming messages, but again, I guess
everybody just uses the more common 'q' for outgoing (since it's mostly
generated by git, anyway). So if nobody is complaining, I'd be inclined
to just leave it.

So after all that rambling, I think my response is "yes, your patch is a
step in the right direction, and we should do it". There are still a ton
of unsupported setups, but really, if we are going to handle everything,
I'd rather see us make the jump to using one of the existing
mail-handling modules rather than reinventing the wheel. And I know
that some people really like the no-dependencies aspect of what we have
currently.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: bug (?) in send email
  2012-07-30 15:38     ` Junio C Hamano
@ 2012-07-30 16:34       ` Jeff King
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff King @ 2012-07-30 16:34 UTC (permalink / raw)
  To: Junio C Hamano
  Cc: Thomas Rast, Christoph Miebach, git, Jürgen Rühle

On Mon, Jul 30, 2012 at 08:38:21AM -0700, Junio C Hamano wrote:

> > I think this patch would be a better match for what RFC2047 specifies.
> > On the one hand it avoids substituting _ outside of encodings, but OTOH
> > it also handles more than one encoded-word.
> 
> Yeah, I think it is an improvement.
> 
> I however wonder if the captured pattern for $2 should be minimized
> with ? at the end, i.e. "..\?q\?(.*?)\?="?

Yeah, definitely. "?=" cannot appear inside (it would need to be
quoted).

> > It still does not handle
> > the case where there are several encoded-words of *different* encodings,
> > but who would do such a crazy thing?
> 
> Even if somebody did so, it wouldn't have worked, and to make it
> work, the sub and its caller (there is only one caller that actually
> cares what the original encoding was) needs to be rethought anyway,
> so I do not think it matters.
> 
> It may deserve an in-code NEEDSWORK comment, though.

I rambled about this in much more detail in another reply, but the gist
of it is that yes, that is the right step for now.

-Peff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-07-30 16:34 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-07-28 21:33 bug (?) in send email Christoph Miebach
2012-07-30 10:38 ` Christoph Miebach
2012-07-30 12:30   ` Thomas Rast
2012-07-30 15:38     ` Junio C Hamano
2012-07-30 16:34       ` Jeff King
2012-07-30 16:32     ` Jeff King

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).