user/dev discussion of public-inbox itself
* [PATCH] t/msg_iter: test for X-UNKNOWN charset from Alpine
@ 2020-02-14  7:05  4% Eric Wong
  0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2020-02-14  7:05 UTC (permalink / raw)
  To: meta

A long overdue test for behavior established in 2016.

Fixes: 1b28cc7f00a866cb ("view: try assuming UTF-8 for bogus charsets")
---
 MANIFEST               |  1 +
 t/msg_iter.t           | 20 ++++++++++++++++++++
 t/x-unknown-alpine.eml | 21 +++++++++++++++++++++
 3 files changed, 42 insertions(+)
 create mode 100644 t/x-unknown-alpine.eml

diff --git a/MANIFEST b/MANIFEST
index 5acd8531..48df274e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -299,6 +299,7 @@ t/watch_maildir.t
 t/watch_maildir_v2.t
 t/www_listing.t
 t/www_static.t
+t/x-unknown-alpine.eml
 t/xcpdb-reshard.t
 xt/git-http-backend.t
 xt/git_async_cmp.t
diff --git a/t/msg_iter.t b/t/msg_iter.t
index de9c39fa..e33bfc69 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -4,6 +4,7 @@ use strict;
 use warnings;
 use Test::More;
 use Email::MIME;
+use PublicInbox::Hval qw(ascii_html);
 use_ok('PublicInbox::MsgIter');
 
 {
@@ -58,5 +59,24 @@ use_ok('PublicInbox::MsgIter');
 	is(index($raw, '$$$'), -1, 'no unescaped $$$');
 }
 
+{
+	my $f = 't/x-unknown-alpine.eml';
+	my $mime = Email::MIME->new(do {
+		open my $fh, '<', $f or die "open($f): $!";
+		local $/;
+		binmode $fh;
+		<$fh>;
+	});
+	my $raw = '';
+	msg_iter($mime, sub {
+		my ($part, $level, @ex) = @{$_[0]};
+		my ($s, $err) = msg_part_text($part, 'text/plain');
+		$raw .= $s;
+	});
+	like($raw, qr!^\thttps://!ms, 'tab expanded with X-UNKNOWN');
+	like(ascii_html($raw), qr/&#8226; bullet point/s,
+		'got bullet point when X-UNKNOWN assumes UTF-8');
+}
+
 done_testing();
 1;
diff --git a/t/x-unknown-alpine.eml b/t/x-unknown-alpine.eml
new file mode 100644
index 00000000..75b0bc55
--- /dev/null
+++ b/t/x-unknown-alpine.eml
@@ -0,0 +1,21 @@
+Date:	Sat, 13 Aug 2016 12:14:15 +0200 (CEST)
+From:	Alpine User <a@example.com>
+To: <list@example.com>
+Subject: charset=X-UNKNOWN test
+Message-ID: <alpine.DEB.2.20.1608131214070.4924@example>
+User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
+MIME-Version: 1.0
+Content-Type: multipart/mixed; BOUNDARY="8323329-703494712-1471083256=:4924"
+
+  This message is in MIME format.  The first part should be readable text,
+  while the remaining parts are likely unreadable without MIME-aware tools.
+
+--8323329-703494712-1471083256=:4924
+Content-Type: text/plain; charset=X-UNKNOWN
+Content-Transfer-Encoding: QUOTED-PRINTABLE
+
+=09https://example.com/
+
+  =E2=80=A2 bullet point
+
+--8323329-703494712-1471083256=:4924--


* [PATCH 3/3] view: try assuming UTF-8 for bogus charsets
  2016-08-18  1:39  7% [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
@ 2016-08-18  1:39  7% ` Eric Wong
  0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2016-08-18  1:39 UTC (permalink / raw)
  To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin

For some reason, Alpine sets charset=X-UNKNOWN on valid UTF-8 text.
Since we favor UTF-8 HTML anyway, try forcing Email::MIME to
decode text/plain as UTF-8, which is more likely to render correctly.

At least this change renders

	<alpine.DEB.2.20.1608131214070.4924@virtualbox>

properly by showing "•" (&#8226;) instead of
"â ¢" (&#226;&#128;&#162;)

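The two renderings above can be reproduced with core Perl alone. This is
only an illustrative sketch: to_entities() is a hypothetical helper (it is
not ascii_html from PublicInbox::Hval), and the octets are the three bytes
of the bullet point in the quoted-printable body.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The three octets Alpine sent for the bullet point (=E2=80=A2 in QP).
my $octets = "\xE2\x80\xA2";

# Hypothetical helper: turn every non-ASCII character into a numeric entity.
sub to_entities {
	my ($s) = @_;
	$s =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/ge;
	$s;
}

# Without decoding, each octet is treated as its own character.
my $as_octets = to_entities($octets);

# Decoding as UTF-8 first yields the single character U+2022.
my $as_utf8 = to_entities(decode('UTF-8', $octets));

print "undecoded octets: $as_octets\n";
print "decoded as UTF-8: $as_utf8\n";
```

The undecoded case corresponds to showing $part->body as-is; the decoded
case corresponds to body_str succeeding once the charset is UTF-8.
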
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
---
 lib/PublicInbox/View.pm | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3f0e122..6997c1c 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -457,8 +457,14 @@ sub add_text_body {
 	my $err = $@;
 	if ($err) {
 		if ($ct =~ m!\btext/plain\b!i) {
+			# Try to assume UTF-8 because Alpine seems to
+			# do wacky things and set charset=X-UNKNOWN
+			$part->charset_set('UTF-8');
+			$s = eval { $part->body_str };
+
+			# If forcing charset=UTF-8 failed,
 			# attach_link will warn further down...
-			$s = $part->body;
+			$s = $part->body if $@;
 		} else {
 			return attach_link($upfx, $ct, $p, $fn);
 		}
-- 
EW



* [PATCH 0/3] attempt to display text/plain with bogus charsets
@ 2016-08-18  1:39  7% Eric Wong
  2016-08-18  1:39  7% ` [PATCH 3/3] view: try assuming UTF-8 for " Eric Wong
  0 siblings, 1 reply; 3+ results
From: Eric Wong @ 2016-08-18  1:39 UTC (permalink / raw)
  To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin

Thomas Ferris Nicolaisen reported a problem with Dscho's
Git for Windows 2.9.3 announcement not rendering text inline:

https://public-inbox.org/git/alpine.DEB.2.20.1608131214070.4924@virtualbox/

This was caused by the bogus charset=X-UNKNOWN set by Alpine.
Attempt to decode the part as UTF-8, and if that fails, fall back
to showing the raw body as-is.  In either case, warn users that
the text may be mangled.
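
The fallback order described above can be sketched with the core Encode
module. This is not the View.pm code (which goes through Email::MIME's
charset_set and body_str); text_assuming_utf8() is a hypothetical name,
and the second return value only stands in for the warning shown to users.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: try strict UTF-8 first, else return the raw octets.
# Returns (text, failed), where failed is true if UTF-8 decoding died.
sub text_assuming_utf8 {
	my ($octets) = @_;
	my $s = eval { decode('UTF-8', $octets, FB_CROAK | LEAVE_SRC) };
	return ($s, 0) if defined $s;
	($octets, 1); # fall back to showing the raw body as-is
}

# Valid UTF-8 mislabeled by the sender: decoding succeeds.
my ($ok, $ok_failed) = text_assuming_utf8("\xE2\x80\xA2 bullet point");

# Latin-1 octets are not valid UTF-8: the raw octets come back unchanged.
my ($raw, $raw_failed) = text_assuming_utf8("caf\xE9");
```
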


Eric Wong (3):
  view: attach_link uses string concatenation
  view: try to display bogus charsets for text/plain
  view: try assuming UTF-8 for bogus charsets

 lib/PublicInbox/View.pm | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

-- 
EW



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
