* [BUG] encoding problem with format-patch + send-email @ 2007-11-15 10:57 Uwe Kleine-König 2007-11-16 10:49 ` Jeff King 0 siblings, 1 reply; 9+ messages in thread From: Uwe Kleine-König @ 2007-11-15 10:57 UTC (permalink / raw To: git; +Cc: Brian Swetland, Russell King - ARM Linux, Nicolas Pitre Hello, Brian just stumbled over a problem with format-patch + send-email. format-patch only adds Content-Type and Content-Transfer-Encoding headers iff the body needs it. send-email adds "From: A. U. Thor <author@tld>" to the body if sender and From: in the patch to send differ. Both is just fine, but if the author has some non-ascii characters in her name but the body is ascii-only the resulting mail is broken. What about adding the Content-Type and Content-Transfer-Encoding headers in any case? Best regards Uwe -- Uwe Kleine-König, Software Engineer Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email 2007-11-15 10:57 [BUG] encoding problem with format-patch + send-email Uwe Kleine-König @ 2007-11-16 10:49 ` Jeff King 2007-11-16 11:14 ` Uwe Kleine-König 2007-11-17 0:48 ` Junio C Hamano 0 siblings, 2 replies; 9+ messages in thread From: Jeff King @ 2007-11-16 10:49 UTC (permalink / raw To: Uwe Kleine-König, Junio C Hamano Cc: git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre On Thu, Nov 15, 2007 at 11:57:26AM +0100, Uwe Kleine-König wrote: > send-email adds "From: A. U. Thor <author@tld>" to the body if sender > and From: in the patch to send differ. > > Both is just fine, but if the author has some non-ascii characters in > her name but the body is ascii-only the resulting mail is broken. I posted an untested fix to this and discussed the issue in http://article.gmane.org/gmane.comp.version-control.git/64426 http://article.gmane.org/gmane.comp.version-control.git/64436 but nobody seems to have been interested after that (I don't even use git-send-email myself). Below is an updated patch (there was a typo in one of the regexes in the original) that meets my limited testing for the all-utf8 case (I don't know how people actually use alternate encodings with git, if at all, so I don't know that I can put together a good test case). My test case was something like: git-clone git test && cd test echo junk >>Makefile git-commit -m junk --author 'Uwe Kleine-König <peff@peff.net>' -a git-format-patch HEAD^ git-send-email 0001-junk.patch > What about adding the Content-Type and Content-Transfer-Encoding headers > in any case? You could probably add them unconditionally, but it would be nice to have them match the encoding, so you'd still want to pick them out of the rfc2047 encoding in the from header. -Peff -- >8 -- git-send-email: add charset header if we add encoded 'From' We sometimes pick out the original rfc822 'From' header and include it in the body of the message. If the original author's name needs encoding, then we should specify that in the content-type header. If we already had a content-type header in the mail, then we may need to re-encode. The logic is there to detect this case, but it doesn't actually do the re-encoding. Signed-off-by: Jeff King <peff@peff.net> --- git-send-email.perl | 34 +++++++++++++++++++++++++++++++--- 1 files changed, 31 insertions(+), 3 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index f9bd2e5..fd0a4ad 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -514,11 +514,13 @@ $time = time - scalar $#files; sub unquote_rfc2047 { local ($_) = @_; - if (s/=\?utf-8\?q\?(.*)\?=/$1/g) { + my $encoding; + if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) { + $encoding = $1; s/_/ /g; s/=([0-9A-F]{2})/chr(hex($1))/eg; } - return "$_"; + return wantarray ? ($_, $encoding) : $_; } # use the simplest quoting being able to handle the recipient @@ -667,6 +669,9 @@ foreach my $t (@files) { open(F,"<",$t) or die "can't open file $t"; my $author = undef; + my $author_encoding; + my $has_content_type; + my $body_encoding; @cc = @initial_cc; @xh = (); my $input_format = undef; @@ -692,12 +697,20 @@ foreach my $t (@files) { next if ($suppress_from); } elsif ($1 eq 'From') { - $author = unquote_rfc2047($2); + ($author, $author_encoding) + = unquote_rfc2047($2); } printf("(mbox) Adding cc: %s from line '%s'\n", $2, $_) unless $quiet; push @cc, $2; } + elsif (/^Content-type:/i) { + $has_content_type = 1; + if (/charset="?[^ "]+/) { + $body_encoding = $1; + } + push @xh, $_; + } elsif (!/^Date:\s/ && /^[-A-Za-z]+:\s+\S/) { push @xh, $_; } @@ -756,6 +769,21 @@ foreach my $t (@files) { if (defined $author) { $message = "From: $author\n\n$message"; + if (defined $author_encoding) { + if ($has_content_type) { + if ($body_encoding eq $author_encoding) { + # ok, we already have the right encoding + } + else { + # uh oh, we should re-encode + } + } + else { + push @xh, + 'MIME-Version: 1.0', + "Content-Type: text/plain; charset=$author_encoding"; + } + } } send_message(); -- 1.5.3.1.47.g88b7d-dirty ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email 2007-11-16 10:49 ` Jeff King @ 2007-11-16 11:14 ` Uwe Kleine-König 2007-11-17 0:48 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: Uwe Kleine-König @ 2007-11-16 11:14 UTC (permalink / raw To: Jeff King Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre Hello Jeff, > sub unquote_rfc2047 { > local ($_) = @_; > - if (s/=\?utf-8\?q\?(.*)\?=/$1/g) { > + my $encoding; > + if (s/=\?([^?]+)\?q\?(.*)\?=/$2/g) { > + $encoding = $1; > s/_/ /g; > s/=([0-9A-F]{2})/chr(hex($1))/eg; > } > - return "$_"; > + return wantarray ? ($_, $encoding) : $_; > } I don't know perl very well, but that wantarray seems hacky. (Something in my head wants to have it always return ($_, $encoding) and fix all callers. :-) > [...] > @@ -756,6 +769,21 @@ foreach my $t (@files) { > > if (defined $author) { > $message = "From: $author\n\n$message"; > + if (defined $author_encoding) { > + if ($has_content_type) { > + if ($body_encoding eq $author_encoding) { > + # ok, we already have the right encoding > + } > + else { > + # uh oh, we should re-encode IMHO we should bail here or do the recoding (and bail if that fails). OTH this patch improves send-emails behaviour because currently it doesn't bother at all and with this patch it could at least fix the common cases. So Acked-by: Uwe Kleine-König <Uwe.Kleine-Koenig@digi.com> but note I only read the code, I didn't run it. > + } > + } > + else { > + push @xh, > + 'MIME-Version: 1.0', > + "Content-Type: text/plain; charset=$author_encoding"; > + } > + } > } Best regards Uwe -- Uwe Kleine-König, Software Engineer Digi International GmbH Branch Breisach, Küferstrasse 8, 79206 Breisach, Germany Tax: 315/5781/0242 / VAT: DE153662976 / Reg. Amtsgericht Dortmund HRB 13962 ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email 2007-11-16 10:49 ` Jeff King 2007-11-16 11:14 ` Uwe Kleine-König @ 2007-11-17 0:48 ` Junio C Hamano 2007-11-19 10:49 ` Uwe Kleine-König 1 sibling, 1 reply; 9+ messages in thread From: Junio C Hamano @ 2007-11-17 0:48 UTC (permalink / raw To: Jeff King Cc: Uwe Kleine-König, git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre Jeff King <peff@peff.net> writes: > git-send-email: add charset header if we add encoded 'From' Thanks. > If we already had a content-type header in the mail, then we > may need to re-encode. The logic is there to detect > this case, but it doesn't actually do the re-encoding. Although the charset on rfc2047 encoded header fields can be independent of the charset in the body, I think you need to do crazy things to do so. Will queue. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email 2007-11-17 0:48 ` Junio C Hamano @ 2007-11-19 10:49 ` Uwe Kleine-König 2007-11-19 10:58 ` Brian Swetland 2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King 0 siblings, 2 replies; 9+ messages in thread From: Uwe Kleine-König @ 2007-11-19 10:49 UTC (permalink / raw To: Jeff King Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre Hello Jeff Brian sent another mail to the linux-arm-kernel mailing list, now spotting: Content-Type: text/plain; charset=UTF-8 but no Content-Transfer-Encoding:. This yield a 7bit mail with 8bit characters. I think we should add Content-Transfer-Encoding: 8bit , too. Best regards Uwe -- Uwe Kleine-König $ dc << EOF [d1-d1<a]sa99d1<a1[rdn555760928P*pz1<a]salax EOF ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [BUG] encoding problem with format-patch + send-email 2007-11-19 10:49 ` Uwe Kleine-König @ 2007-11-19 10:58 ` Brian Swetland 2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King 1 sibling, 0 replies; 9+ messages in thread From: Brian Swetland @ 2007-11-19 10:58 UTC (permalink / raw To: Uwe Kleine-König, Jeff King, Junio C Hamano, git [Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de>] > Hello Jeff > > Brian sent another mail to the linux-arm-kernel mailing list, now > spotting: > > Content-Type: text/plain; charset=UTF-8 > > but no Content-Transfer-Encoding:. This yield a 7bit mail with 8bit > characters. I actually tacked the Content-Type on by hand on that one. I haven't had a chance to try the updated send-email, so it may well do the right thing. Brian ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] send-email: add transfer encoding header with content-type 2007-11-19 10:49 ` Uwe Kleine-König 2007-11-19 10:58 ` Brian Swetland @ 2007-11-20 12:54 ` Jeff King 2007-11-20 13:49 ` Uwe Kleine-König 2007-11-21 6:58 ` Junio C Hamano 1 sibling, 2 replies; 9+ messages in thread From: Jeff King @ 2007-11-20 12:54 UTC (permalink / raw To: Uwe Kleine-König Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre We add the content-type header only when we have non-7bit characters from the 'From' header, so we really need to specify the encoding (in other cases, where the commit text needed a content-type, git-format-patch will already have added the encoding header). Signed-off-by: Jeff King <peff@peff.net> --- On Mon, Nov 19, 2007 at 11:49:50AM +0100, Uwe Kleine-König wrote: > but no Content-Transfer-Encoding:. This yield a 7bit mail with 8bit > characters. > > I think we should add > > Content-Transfer-Encoding: 8bit Even though Brian's mail turned out to be hand-generated, this problem does exist in git-send-email. I don't know why I didn't add the encoding header in the first place, since it is clearly required. Junio, I think this is maint-worthy. git-send-email.perl | 3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/git-send-email.perl b/git-send-email.perl index fd0a4ad..d7b8391 100755 --- a/git-send-email.perl +++ b/git-send-email.perl @@ -781,7 +781,8 @@ foreach my $t (@files) { else { push @xh, 'MIME-Version: 1.0', - "Content-Type: text/plain; charset=$author_encoding"; + "Content-Type: text/plain; charset=$author_encoding", + 'Content-Transfer-Encoding: 8bit'; } } } -- 1.5.3.6.1784.gd1b1d-dirty ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] send-email: add transfer encoding header with content-type 2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King @ 2007-11-20 13:49 ` Uwe Kleine-König 2007-11-21 6:58 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: Uwe Kleine-König @ 2007-11-20 13:49 UTC (permalink / raw To: Jeff King Cc: Junio C Hamano, git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre Hello, Jeff King wrote: > We add the content-type header only when we have non-7bit > characters from the 'From' header, so we really need to > specify the encoding (in other cases, where the commit text > needed a content-type, git-format-patch will already have > added the encoding header). > > Signed-off-by: Jeff King <peff@peff.net> Acked-by: Uwe Kleine-König <ukleinek@informatik.uni-freiburg.de> -- Uwe Kleine-König $ dc << EOF [d1-d1<a]sa99d1<a1[rdn555760928P*pz1<a]salax EOF ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] send-email: add transfer encoding header with content-type 2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King 2007-11-20 13:49 ` Uwe Kleine-König @ 2007-11-21 6:58 ` Junio C Hamano 1 sibling, 0 replies; 9+ messages in thread From: Junio C Hamano @ 2007-11-21 6:58 UTC (permalink / raw To: Jeff King Cc: Uwe Kleine-König, git, Brian Swetland, Russell King - ARM Linux, Nicolas Pitre Jeff King <peff@peff.net> writes: >> I think we should add >> >> Content-Transfer-Encoding: 8bit > > Even though Brian's mail turned out to be hand-generated, this problem > does exist in git-send-email. I don't know why I didn't add the encoding > header in the first place, since it is clearly required. > > Junio, I think this is maint-worthy. Yeah, looks good. ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2007-11-21 6:59 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2007-11-15 10:57 [BUG] encoding problem with format-patch + send-email Uwe Kleine-König 2007-11-16 10:49 ` Jeff King 2007-11-16 11:14 ` Uwe Kleine-König 2007-11-17 0:48 ` Junio C Hamano 2007-11-19 10:49 ` Uwe Kleine-König 2007-11-19 10:58 ` Brian Swetland 2007-11-20 12:54 ` [PATCH] send-email: add transfer encoding header with content-type Jeff King 2007-11-20 13:49 ` Uwe Kleine-König 2007-11-21 6:58 ` Junio C Hamano
Code repositories for project(s) associated with this public inbox https://80x24.org/mirrors/git.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).