git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Thomas Rast <trast@student.ethz.ch>
To: <git@vger.kernel.org>
Subject: [PATCH] send-email: ask about and declare 8bit mails
Date: Sat, 12 Jun 2010 17:06:20 +0200	[thread overview]
Message-ID: <cebe57bb68b5e8ea445e560bbe6305c915ce8a1c.1276354971.git.trast@student.ethz.ch> (raw)
In-Reply-To: <201006121211.12870.trast@student.ethz.ch>

git-send-email passes on an 8bit mail as-is even if it does not
declare a content-type.  Because the user can edit email between
format-patch and send-email, such invalid mails are unfortunately not
very hard to come by.

Make git-send-email stop and ask about the encoding to use if it
encounters any such mail.  Also provide a configuration setting to
permanently configure an encoding.

Signed-off-by: Thomas Rast <trast@student.ethz.ch>
---

This takes care of what I ran into earlier today.  However, there's
another problem: format-patch doesn't even mark the patch 8bit if its
patch contents (not log message) are non-ASCII.  I'm really not sure
what to do there.

On the practical hand, there's the problem that the entire
log_tree_commit() call chain is geared towards printing on a file, at
which time it's too late.  So we would either have to go in and fix
all of that to support formatting to a strbuf, or rewrite the patch if
it turns out to be non-ASCII.

On the philosophical hand, we don't really care about file encodings
so far, but this requires declaring one.

Either way, I think if vger doesn't accept format-patch;send-email,
something is really wrong :-)


 Documentation/git-send-email.txt |    9 ++++
 git-send-email.perl              |   59 +++++++++++++++++++++++++++++
 t/t9001-send-email.sh            |   77 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 145 insertions(+), 0 deletions(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 12622fc..c283084 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -101,6 +101,15 @@ See the CONFIGURATION section for 'sendemail.multiedit'.
 +
 The --to option must be repeated for each user you want on the to list.
 
+--8bit-encoding=<encoding>::
+	When encountering a non-ASCII message or subject that does not
+	declare its encoding, add headers/quoting to indicate it is
+	encoded in <encoding>.  Default is the value of the
+	'sendemail.assume8bitEncoding'; if that is unspecified, this
+	will be prompted for if any non-ASCII files are encountered.
++
+Note that no attempts whatsoever are made to validate the encoding.
+
 
 Sending
 ~~~~~~~
diff --git a/git-send-email.perl b/git-send-email.perl
index 111c981..6b2ac79 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -54,6 +54,7 @@ git send-email [options] <file | directory | rev-list options >
     --in-reply-to           <str>  * Email "In-Reply-To:"
     --annotate                     * Review each patch that will be sent in an editor.
     --compose                      * Open an editor for introduction.
+    --8bit-encoding         <str>  * Encoding to assume 8bit mails if undeclared
 
   Sending:
     --envelope-sender       <str>  * Email envelope sender.
@@ -191,6 +192,7 @@ my ($smtp_server, $smtp_server_port, $smtp_authuser, $smtp_encryption);
 my ($identity, $aliasfiletype, @alias_files, @smtp_host_parts, $smtp_domain);
 my ($validate, $confirm);
 my (@suppress_cc);
+my ($auto_8bit_encoding);
 
 my ($debug_net_smtp) = 0;		# Net::SMTP, see send_message()
 
@@ -222,6 +224,7 @@ my %config_settings = (
     "multiedit" => \$multiedit,
     "confirm"   => \$confirm,
     "from" => \$sender,
+    "assume8bitencoding" => \$auto_8bit_encoding,
 );
 
 # Help users prepare for 1.7.0
@@ -297,6 +300,7 @@ my $rc = GetOptions("sender|from=s" => \$sender,
 		    "thread!" => \$thread,
 		    "validate!" => \$validate,
 		    "format-patch!" => \$format_patch,
+		    "8bit-encoding=s" => \$auto_8bit_encoding,
 	 );
 
 unless ($rc) {
@@ -669,6 +673,34 @@ sub ask {
 	return undef;
 }
 
+my %broken_encoding;
+
+sub file_declares_8bit_cte($) {
+	my $fn = shift;
+	open (my $fh, '<', $fn);
+	while (my $line = <$fh>) {
+		return 1 if ($line =~ /^Content-Transfer-Encoding: .*8bit.*$/);
+	}
+	close $fh;
+	return 0;
+}
+
+foreach my $f (@files) {
+	next unless (body_or_subject_has_nonascii($f)
+		     && !file_declares_8bit_cte($f));
+	$broken_encoding{$f} = 1;
+}
+
+if (!defined $auto_8bit_encoding && scalar %broken_encoding) {
+	print "The following files are 8bit, but do not declare " .
+		"a Content-Transfer-Encoding.\n";
+	foreach my $f (sort keys %broken_encoding) {
+		print "    $f\n";
+	}
+	$auto_8bit_encoding = ask("Which 8bit encoding should I declare [UTF-8]? ",
+				  default => "UTF-8");
+}
+
 my $prompting = 0;
 if (!defined $sender) {
 	$sender = $repoauthor || $repocommitter || '';
@@ -1221,6 +1253,18 @@ foreach my $t (@files) {
 			or die "(cc-cmd) failed to close pipe to '$cc_cmd'";
 	}
 
+	if ($broken_encoding{$t} && !$has_content_type) {
+		$has_content_type = 1;
+		push @xh, "MIME-Version: 1.0",
+			"Content-Type: text/plain; charset=$auto_8bit_encoding",
+			"Content-Transfer-Encoding: 8bit";
+		$body_encoding = $auto_8bit_encoding;
+	}
+
+	if ($broken_encoding{$t} && !is_rfc2047_quoted($subject)) {
+		$subject = quote_rfc2047($subject, $auto_8bit_encoding);
+	}
+
 	if (defined $author and $author ne $sender) {
 		$message = "From: $author\n\n$message";
 		if (defined $author_encoding) {
@@ -1233,6 +1277,7 @@ foreach my $t (@files) {
 				}
 			}
 			else {
+				$has_content_type = 1;
 				push @xh,
 				  'MIME-Version: 1.0',
 				  "Content-Type: text/plain; charset=$author_encoding",
@@ -1310,3 +1355,17 @@ sub file_has_nonascii {
 	}
 	return 0;
 }
+
+sub body_or_subject_has_nonascii {
+	my $fn = shift;
+	open(my $fh, '<', $fn)
+		or die "unable to open $fn: $!\n";
+	while (my $line = <$fh>) {
+		last if $line =~ /^$/;
+		return 1 if $line =~ /^Subject.*[^[:ascii:]]/;
+	}
+	while (my $line = <$fh>) {
+		return 1 if $line =~ /[^[:ascii:]]/;
+	}
+	return 0;
+}
diff --git a/t/t9001-send-email.sh b/t/t9001-send-email.sh
index 640b3d2..0b8a591 100755
--- a/t/t9001-send-email.sh
+++ b/t/t9001-send-email.sh
@@ -918,4 +918,81 @@ test_expect_success '--no-bcc overrides sendemail.bcc' '
 	! grep "RCPT TO:<other@ex.com>" stdout
 '
 
+cat >email-using-8bit <<EOF
+From fe6ecc66ece37198fe5db91fa2fc41d9f4fe5cc4 Mon Sep 17 00:00:00 2001
+Message-Id: <bogus-message-id@example.com>
+From: author@example.com
+Date: Sat, 12 Jun 2010 15:53:58 +0200
+Subject: subject goes here
+
+Dieser deutsche Text enthält einen Umlaut!
+EOF
+
+cat >content-type-decl <<EOF
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+EOF
+
+test_expect_success 'asks about and fixes 8bit encodings' '
+	clean_fake_sendmail &&
+	echo |
+	git send-email --from=author@example.com --to=nobody@example.com \
+			--smtp-server="$(pwd)/fake.sendmail" \
+			email-using-8bit >stdout &&
+	grep "do not declare a Content-Transfer-Encoding" stdout &&
+	grep email-using-8bit stdout &&
+	grep "Which 8bit encoding" stdout &&
+	grep "Content\\|MIME" msgtxt1 >actual &&
+	test_cmp actual content-type-decl
+'
+
+test_expect_success 'sendemail.8bitEncoding works' '
+	clean_fake_sendmail &&
+	git config sendemail.assume8bitEncoding UTF-8 &&
+	echo bogus |
+	git send-email --from=author@example.com --to=nobody@example.com \
+			--smtp-server="$(pwd)/fake.sendmail" \
+			email-using-8bit >stdout &&
+	grep "Content\\|MIME" msgtxt1 >actual &&
+	test_cmp actual content-type-decl
+'
+
+test_expect_success '--8bit-encoding overrides sendemail.8bitEncoding' '
+	clean_fake_sendmail &&
+	git config sendemail.assume8bitEncoding "bogus too" &&
+	echo bogus |
+	git send-email --from=author@example.com --to=nobody@example.com \
+			--smtp-server="$(pwd)/fake.sendmail" \
+			--8bit-encoding=UTF-8 \
+			email-using-8bit >stdout &&
+	grep "Content\\|MIME" msgtxt1 >actual &&
+	test_cmp actual content-type-decl
+'
+
+cat >email-using-8bit <<EOF
+From fe6ecc66ece37198fe5db91fa2fc41d9f4fe5cc4 Mon Sep 17 00:00:00 2001
+Message-Id: <bogus-message-id@example.com>
+From: author@example.com
+Date: Sat, 12 Jun 2010 15:53:58 +0200
+Subject: Dieser Betreff enthält auch einen Umlaut!
+
+Nothing to see here.
+EOF
+
+cat >expected <<EOF
+Subject: =?UTF-8?q?Dieser=20Betreff=20enth=C3=A4lt=20auch=20einen=20Umlaut!?=
+EOF
+
+test_expect_success '--8bit-encoding also treats subject' '
+	clean_fake_sendmail &&
+	echo bogus |
+	git send-email --from=author@example.com --to=nobody@example.com \
+			--smtp-server="$(pwd)/fake.sendmail" \
+			--8bit-encoding=UTF-8 \
+			email-using-8bit >stdout &&
+	grep "Subject" msgtxt1 >actual &&
+	test_cmp expected actual
+'
+
 test_done
-- 
1.7.1.557.gd161

  reply	other threads:[~2010-06-12 15:09 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2010-06-06  0:05 [PATCH] bash completion: Support "unpushed commits" warnings in __git_ps1 Andrew Sayers
2010-06-06 18:14 ` Thomas Rast
2010-06-06 20:49   ` Andrew Sayers
2010-06-06 21:07     ` Jakub Narebski
2010-06-06 22:19       ` Andrew Sayers
2010-06-07  7:42     ` Thomas Rast
2010-06-08 21:36       ` [RFC/PATCHv2] bash completion: Support "divergence from upstream" " Andrew Sayers
2010-06-09  8:21         ` Peter Kjellerstedt
2010-06-09  8:45         ` John Tapsell
2010-06-09 21:02           ` Steven Michalske
2010-06-09  9:17         ` Michael J Gruber
2010-06-09 20:48           ` Michael J Gruber
2010-06-09 21:03             ` Michael J Gruber
2010-06-10 11:47         ` [PATCH 0/2] " Thomas Rast
2010-06-10 11:47           ` [PATCH 1/2] rev-list: introduce --count option Thomas Rast
2010-06-10 11:47           ` [PATCH 2/2] bash completion: Support "divergence from upstream" warnings in __git_ps1 Thomas Rast
2010-06-12  0:00             ` SZEDER Gábor
2010-06-12 10:03               ` [PATCH v2 0/2] " Thomas Rast
2010-06-12  9:59                 ` [PATCH v2 1/2] rev-list: introduce --count option Thomas Rast
2010-06-12  9:59                 ` [PATCH v2 2/2] bash completion: Support "divergence from upstream" warnings in __git_ps1 Thomas Rast
2010-06-14  3:13                   ` Junio C Hamano
2010-06-14  7:44                     ` Thomas Rast
2010-06-14 12:36                   ` SZEDER Gábor
2010-06-12 10:11                 ` vger doesn't like UTF-8 from send-email Thomas Rast
2010-06-12 15:06                   ` Thomas Rast [this message]
2010-06-12 16:28                     ` [PATCH] send-email: ask about and declare 8bit mails Junio C Hamano
2010-06-13 15:09                       ` Thomas Rast
2010-06-13  4:15                   ` vger doesn't like UTF-8 from send-email Michael Witten
2010-06-14 11:57                     ` Erik Faye-Lund
2010-06-12 20:50                 ` [PATCH] bash completion: Support "divergence from upstream" messages in __git_ps1 Andrew Sayers
2010-06-14  7:42                   ` Thomas Rast
2010-06-15 21:50                     ` [PATCHv4] " Andrew Sayers
2010-06-16 19:05                       ` Junio C Hamano
2010-06-16 19:11                         ` Thomas Rast
2010-06-17 21:31                         ` [PATCHv5 0/2] " Andrew Sayers
2010-06-18 16:10                           ` Junio C Hamano
2010-06-18 21:02                             ` Andrew Sayers
2010-06-17 21:32                         ` [PATCHv5 1/2] " Andrew Sayers
2010-06-17 21:32                         ` [PATCHv5 2/2] bash-completion: Fix __git_ps1 to work with "set -u" Andrew Sayers
2010-06-10 13:31           ` [PATCH 0/2] bash completion: Support "divergence from upstream" warnings in __git_ps1 Michael J Gruber
2010-06-10 12:03         ` [RFC/PATCHv2] " Thomas Rast
2010-06-06 20:12 ` [PATCH] bash completion: Support "unpushed commits" " Thomas Rast

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=cebe57bb68b5e8ea445e560bbe6305c915ce8a1c.1276354971.git.trast@student.ethz.ch \
    --to=trast@student.ethz.ch \
    --cc=git@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).