git@vger.kernel.org mailing list mirror (one of many)
* [PATCH 1/2] git-send-email.perl: improve detection of MIME encoded-words
@ 2009-06-07  1:12 Brandon Casey
  2009-06-07  1:12 ` [PATCH 2/2] send-email: use UTF-8 rather than utf-8 for consistency Brandon Casey
  2009-06-07 16:45 ` [PATCH 1/2] git-send-email.perl: improve detection of MIME encoded-words Brandon Casey
  0 siblings, 2 replies; 5+ messages in thread
From: Brandon Casey @ 2009-06-07  1:12 UTC (permalink / raw)
  To: git; +Cc: gitster, Brandon Casey

According to rfc2047, an encoded word has the following form:

   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token

   encoding = token

   token = <Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
               <"> / "/" / "[" / "]" / "?" / "." / "="

   encoded-text = <Any printable ASCII character other than "?"
                     or SPACE>

And rfc822 defines CTLs as:

    CTL = <any ASCII control     ; (  0- 37,  0.- 31.)
           character and DEL>    ; (    177,     127.)

The original code only detected rfc2047 encoded strings when the charset
was UTF-8.  This patch generalizes the matching expression and breaks the
check for an rfc2047 encoded string into its own function.  There's no real
functional change, since any properly rfc2047 encoded string (the ones that
weren't UTF-8) would have fallen through the remaining 'if' statements and
been returned unchanged.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
---
 git-send-email.perl |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 3d6a982..e735815 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -772,6 +772,14 @@ sub quote_rfc2047 {
 	return $_;
 }
 
+sub is_rfc2047_quoted {
+	my $s = shift;
+	my $token = '[^][()<>@,;:"\/?.= \000-\037\177]+';
+	my $encoded_text = '[!->@-~]+';
+	length($s) <= 75 &&
+	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address
 {
@@ -783,7 +791,7 @@ sub sanitize_address
 	}
 
 	# if recipient_name is already quoted, do nothing
-	if ($recipient_name =~ /^("[[:ascii:]]*"|=\?utf-8\?q\?.*\?=)$/) {
+	if (is_rfc2047_quoted($recipient_name)) {
 		return $recipient;
 	}
 
-- 
1.6.3.1.9.g95405b
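
To illustrate what the generalized check changes, here is a minimal standalone sketch. It copies is_rfc2047_quoted from the patch above and compares it with the expression it replaces. The helper name old_check and the sample strings are invented for illustration and are not part of git-send-email.perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Copied from the patch above.
sub is_rfc2047_quoted {
	my $s = shift;
	my $token = '[^][()<>@,;:"\/?.= \000-\037\177]+';
	my $encoded_text = '[!->@-~]+';
	length($s) <= 75 &&
	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
}

# Hypothetical wrapper around the expression the patch removes.
sub old_check {
	my $s = shift;
	return $s =~ /^("[[:ascii:]]*"|=\?utf-8\?q\?.*\?=)$/;
}

# Invented sample names: one UTF-8 encoded-word, one ISO-8859-1
# encoded-word, and one plain unquoted name.
my @samples = (
	'=?utf-8?q?J=C3=BCrgen?=',
	'=?ISO-8859-1?Q?J=FCrgen?=',
	'Plain Name',
);

for my $name (@samples) {
	printf "%-28s old=%d new=%d\n", $name,
		(old_check($name) ? 1 : 0),
		(is_rfc2047_quoted($name) ? 1 : 0);
}
```

Both checks accept the first string and reject the third. Only the new check accepts the ISO-8859-1 encoded-word, since the old expression hard-codes a lower-case "utf-8" charset and "q" encoding.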


* [PATCH 2/2] send-email: use UTF-8 rather than utf-8 for consistency
  2009-06-07  1:12 [PATCH 1/2] git-send-email.perl: improve detection of MIME encoded-words Brandon Casey
@ 2009-06-07  1:12 ` Brandon Casey
  2009-06-07 16:45 ` [PATCH 1/2] git-send-email.perl: improve detection of MIME encoded-words Brandon Casey
  1 sibling, 0 replies; 5+ messages in thread
From: Brandon Casey @ 2009-06-07  1:12 UTC (permalink / raw)
  To: git; +Cc: gitster, Brandon Casey

The rest of the git source has been converted to use upper-case character
encoding names to assist older platforms.  The charset attribute of MIME
is defined to be case-insensitive, but older platforms may still have an
easier time dealing with upper-case rather than lower-case.  So do so for
send-email too.

Update t9001 to handle the changes.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
---
 git-send-email.perl   |    4 ++--
 t/t9001-send-email.sh |    8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index e735815..1d48349 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -577,7 +577,7 @@ EOT
 			if ($need_8bit_cte) {
 				print C2 "MIME-Version: 1.0\n",
 					 "Content-Type: text/plain; ",
-					   "charset=utf-8\n",
+					   "charset=UTF-8\n",
 					 "Content-Transfer-Encoding: 8bit\n";
 			}
 		} elsif (/^MIME-Version:/i) {
@@ -766,7 +766,7 @@ sub unquote_rfc2047 {
 
 sub quote_rfc2047 {
 	local $_ = shift;
-	my $encoding = shift || 'utf-8';
+	my $encoding = shift || 'UTF-8';
 	s/([^-a-zA-Z0-9!*+\/])/sprintf("=%02X", ord($1))/eg;
 	s/(.*)/=\?$encoding\?q\?$1\?=/;
 	return $_;
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index ce26ea4..2ce24cd 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -533,7 +533,7 @@ test_expect_success 'utf8 Cc is rfc2047 encoded' '
 	--smtp-server="$(pwd)/fake.sendmail" \
 	outdir/*.patch &&
 	grep "^Cc:" msgtxt1 |
-	grep "=?utf-8?q?=C3=A0=C3=A9=C3=AC=C3=B6=C3=BA?= <utf8@example.com>"
+	grep "=?UTF-8?q?=C3=A0=C3=A9=C3=AC=C3=B6=C3=BA?= <utf8@example.com>"
 '
 
 test_expect_success '--compose adds MIME for utf8 body' '
@@ -550,7 +550,7 @@ test_expect_success '--compose adds MIME for utf8 body' '
 	  --smtp-server="$(pwd)/fake.sendmail" \
 	  $patches &&
 	grep "^utf8 body" msgtxt1 &&
-	grep "^Content-Type: text/plain; charset=utf-8" msgtxt1
+	grep "^Content-Type: text/plain; charset=UTF-8" msgtxt1
 '
 
 test_expect_success '--compose respects user mime type' '
@@ -573,7 +573,7 @@ test_expect_success '--compose respects user mime type' '
 	  $patches &&
 	grep "^utf8 body" msgtxt1 &&
 	grep "^Content-Type: text/plain; charset=iso-8859-1" msgtxt1 &&
-	! grep "^Content-Type: text/plain; charset=utf-8" msgtxt1
+	! grep "^Content-Type: text/plain; charset=UTF-8" msgtxt1
 '
 
 test_expect_success '--compose adds MIME for utf8 subject' '
@@ -586,7 +586,7 @@ test_expect_success '--compose adds MIME for utf8 subject' '
 	  --smtp-server="$(pwd)/fake.sendmail" \
 	  $patches &&
 	grep "^fake edit" msgtxt1 &&
-	grep "^Subject: =?utf-8?q?utf8-s=C3=BCbj=C3=ABct?=" msgtxt1
+	grep "^Subject: =?UTF-8?q?utf8-s=C3=BCbj=C3=ABct?=" msgtxt1
 '
 
 test_expect_success 'detects ambiguous reference/file conflict' '
-- 
1.6.3.1.9.g95405b
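
As a rough illustration of the visible effect of this change, the sketch below runs quote_rfc2047 as it reads after the patch above. The input bytes are an assumption, chosen to correspond to the subject string that t9001 greps for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# quote_rfc2047 as it reads after this patch (default charset is now
# upper-case 'UTF-8').
sub quote_rfc2047 {
	local $_ = shift;
	my $encoding = shift || 'UTF-8';
	s/([^-a-zA-Z0-9!*+\/])/sprintf("=%02X", ord($1))/eg;
	s/(.*)/=\?$encoding\?q\?$1\?=/;
	return $_;
}

# Assumed input: the UTF-8 bytes of the subject used in the t9001 test.
my $subject = "utf8-s\xC3\xBCbj\xC3\xABct";
my $quoted  = quote_rfc2047($subject);

print "Subject: $quoted\n";
# prints "Subject: =?UTF-8?q?utf8-s=C3=BCbj=C3=ABct?="
```

An explicit second argument still overrides the default, so callers that pass their own charset name are unaffected.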


* Re: [PATCH 1/2] git-send-email.perl: improve detection of MIME encoded-words
  2009-06-07  1:12 [PATCH 1/2] git-send-email.perl: improve detection of MIME encoded-words Brandon Casey
  2009-06-07  1:12 ` [PATCH 2/2] send-email: use UTF-8 rather than utf-8 for consistency Brandon Casey
@ 2009-06-07 16:45 ` Brandon Casey
  2009-06-08  0:25   ` [PATCH v2] " Brandon Casey
  1 sibling, 1 reply; 5+ messages in thread
From: Brandon Casey @ 2009-06-07 16:45 UTC (permalink / raw)
  To: git; +Cc: gitster, Brandon Casey

On Sat, Jun 6, 2009 at 8:12 PM, Brandon Casey <drafnel@gmail.com> wrote:

>  git-send-email.perl |   10 +++++++++-
>  1 files changed, 9 insertions(+), 1 deletions(-)
>
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 3d6a982..e735815 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -772,6 +772,14 @@ sub quote_rfc2047 {
>        return $_;
>  }
>
> +sub is_rfc2047_quoted {
> +       my $s = shift;
> +       my $token = '[^][()<>@,;:"\/?.= \000-\037\177]+';

The last element of this regex should be changed to \177-\377 since
non-ascii characters are disallowed too.

I won't be able to send a new patch for a while.  Home internet is
down at the moment, and I don't have an alternative right now.

-brandon
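
The difference between the two character classes can be seen with a small sketch. The variable names and the sample charset string are hypothetical; only the two class definitions come from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Token class from the original patch, and the corrected one proposed above.
my $token_v1 = '[^][()<>@,;:"\/?.= \000-\037\177]+';
my $token_v2 = '[^][()<>@,;:"\/?.= \000-\037\177-\377]+';

# Hypothetical charset name containing the non-ASCII byte 0xE9.
my $charset = "utf\xE9";

my $v1_accepts = ($charset =~ /^$token_v1$/) ? 1 : 0;
my $v2_accepts = ($charset =~ /^$token_v2$/) ? 1 : 0;

print "v1 accepts: ", ($v1_accepts ? "yes" : "no"), "\n";
print "v2 accepts: ", ($v2_accepts ? "yes" : "no"), "\n";
```

The first class only excludes DEL (\177) at the top of the range, so the byte 0xE9 is accepted as part of a token. Extending the range to \177-\377 rejects it, which is what rfc2047 requires since a token is built from ASCII CHARs only.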


* [PATCH v2] git-send-email.perl: improve detection of MIME encoded-words
  2009-06-07 16:45 ` [PATCH 1/2] git-send-email.perl: improve detection of MIME encoded-words Brandon Casey
@ 2009-06-08  0:25   ` Brandon Casey
  2009-06-08  0:31     ` Brandon Casey
  0 siblings, 1 reply; 5+ messages in thread
From: Brandon Casey @ 2009-06-08  0:25 UTC (permalink / raw)
  To: git; +Cc: gitster, Brandon Casey

According to rfc2047, an encoded word has the following form:

   encoded-word = "=?" charset "?" encoding "?" encoded-text "?="

   charset = token

   encoding = token

   token = <Any CHAR except SPACE, CTLs, and especials>

   especials = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
               <"> / "/" / "[" / "]" / "?" / "." / "="

   encoded-text = <Any printable ASCII character other than "?"
                     or SPACE>

And rfc822 defines CHARs and CTLs as:

    CHAR = <any ASCII character> ; (  0-177,  0.-127.)

    CTL = <any ASCII control     ; (  0- 37,  0.- 31.)
           character and DEL>    ; (    177,     127.)

The original code only detected rfc2047 encoded strings when the charset
was UTF-8.  This patch generalizes the matching expression and breaks the
check for an rfc2047 encoded string into its own function.  There's no real
functional change, since any properly rfc2047 encoded string would have
fallen through the remaining 'if' statements and been returned unchanged.

Signed-off-by: Brandon Casey <drafnel@gmail.com>
---


Here's a replacement patch which widens the range of characters excluded
from tokens, so that only ASCII characters are allowed (minus the other
exclusions).

-brandon


 git-send-email.perl |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 3d6a982..8a1a40d 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -772,6 +772,14 @@ sub quote_rfc2047 {
 	return $_;
 }
 
+sub is_rfc2047_quoted {
+	my $s = shift;
+	my $token = '[^][()<>@,;:"\/?.= \000-\037\177-\377]+';
+	my $encoded_text = '[!->@-~]+';
+	length($s) <= 75 &&
+	$s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;
+}
+
 # use the simplest quoting being able to handle the recipient
 sub sanitize_address
 {
@@ -783,7 +791,7 @@ sub sanitize_address
 	}
 
 	# if recipient_name is already quoted, do nothing
-	if ($recipient_name =~ /^("[[:ascii:]]*"|=\?utf-8\?q\?.*\?=)$/) {
+	if (is_rfc2047_quoted($recipient_name)) {
 		return $recipient;
 	}
 
-- 
1.6.3.1.9.g95405b


* Re: [PATCH v2] git-send-email.perl: improve detection of MIME encoded-words
  2009-06-08  0:25   ` [PATCH v2] " Brandon Casey
@ 2009-06-08  0:31     ` Brandon Casey
  0 siblings, 0 replies; 5+ messages in thread
From: Brandon Casey @ 2009-06-08  0:31 UTC (permalink / raw)
  To: git; +Cc: gitster, Brandon Casey

On Sun, Jun 7, 2009 at 7:25 PM, Brandon Casey <drafnel@gmail.com> wrote:

> Here's a replacement patch which widens the range of characters excluded
> from tokens, so that only ASCII characters are allowed (minus the other
> exclusions).

Here's the diff from the original patch:

diff --git a/git-send-email.perl b/git-send-email.perl
index e735815..8a1a40d 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -774,7 +774,7 @@ sub quote_rfc2047 {

 sub is_rfc2047_quoted {
        my $s = shift;
-       my $token = '[^][()<>@,;:"\/?.= \000-\037\177]+';
+       my $token = '[^][()<>@,;:"\/?.= \000-\037\177-\377]+';
        my $encoded_text = '[!->@-~]+';
        length($s) <= 75 &&
        $s =~ m/^(?:"[[:ascii:]]*"|=\?$token\?$token\?$encoded_text\?=)$/o;

-brandon

