* [PATCH] t/msg_iter: test for X-UNKNOWN charset from Alpine
@ 2020-02-14 7:05 4% Eric Wong
0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2020-02-14 7:05 UTC
To: meta
A long overdue test for behavior established in 2016.
Fixes: 1b28cc7f00a866cb ("view: try assuming UTF-8 for bogus charsets")
---
MANIFEST | 1 +
t/msg_iter.t | 20 ++++++++++++++++++++
t/x-unknown-alpine.eml | 21 +++++++++++++++++++++
3 files changed, 42 insertions(+)
create mode 100644 t/x-unknown-alpine.eml
diff --git a/MANIFEST b/MANIFEST
index 5acd8531..48df274e 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -299,6 +299,7 @@ t/watch_maildir.t
t/watch_maildir_v2.t
t/www_listing.t
t/www_static.t
+t/x-unknown-alpine.eml
t/xcpdb-reshard.t
xt/git-http-backend.t
xt/git_async_cmp.t
diff --git a/t/msg_iter.t b/t/msg_iter.t
index de9c39fa..e33bfc69 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -4,6 +4,7 @@ use strict;
use warnings;
use Test::More;
use Email::MIME;
+use PublicInbox::Hval qw(ascii_html);
use_ok('PublicInbox::MsgIter');
{
@@ -58,5 +59,24 @@ use_ok('PublicInbox::MsgIter');
is(index($raw, '$$$'), -1, 'no unescaped $$$');
}
+{
+ my $f = 't/x-unknown-alpine.eml';
+ my $mime = Email::MIME->new(do {
+ open my $fh, '<', $f or die "open($f): $!";
+ local $/;
+ binmode $fh;
+ <$fh>;
+ });
+ my $raw = '';
+ msg_iter($mime, sub {
+ my ($part, $level, @ex) = @{$_[0]};
+ my ($s, $err) = msg_part_text($part, 'text/plain');
+ $raw .= $s;
+ });
+ like($raw, qr!^\thttps://!ms, 'tab expanded with X-UNKNOWN');
+ like(ascii_html($raw), qr/&#8226; bullet point/s,
+ 'got bullet point when X-UNKNOWN assumes UTF-8');
+}
+
done_testing();
1;
diff --git a/t/x-unknown-alpine.eml b/t/x-unknown-alpine.eml
new file mode 100644
index 00000000..75b0bc55
--- /dev/null
+++ b/t/x-unknown-alpine.eml
@@ -0,0 +1,21 @@
+Date: Sat, 13 Aug 2016 12:14:15 +0200 (CEST)
+From: Alpine User <a@example.com>
+To: <list@example.com>
+Subject: charset=X-UNKNOWN test
+Message-ID: <alpine.DEB.2.20.1608131214070.4924@example>
+User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
+MIME-Version: 1.0
+Content-Type: multipart/mixed; BOUNDARY="8323329-703494712-1471083256=:4924"
+
+ This message is in MIME format. The first part should be readable text,
+ while the remaining parts are likely unreadable without MIME-aware tools.
+
+--8323329-703494712-1471083256=:4924
+Content-Type: text/plain; charset=X-UNKNOWN
+Content-Transfer-Encoding: QUOTED-PRINTABLE
+
+=09https://example.com/
+
+ =E2=80=A2 bullet point
+
+--8323329-703494712-1471083256=:4924--
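For readers who want to see what the two assertions in the test above check, here is a minimal sketch using only core Perl modules. It is not the public-inbox code path: qp_to_text and to_ascii_entities are hypothetical helpers, and to_ascii_entities is only a stand-in for an ASCII-only HTML view, assuming non-ASCII characters are emitted as decimal entities.

```perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK HTMLCREF);
use MIME::QuotedPrint qw(decode_qp);

# Body lines copied from t/x-unknown-alpine.eml.
my @lines = ("=09https://example.com/\n", " =E2=80=A2 bullet point\n");

# Hypothetical helper: undo quoted-printable, then assume the octets
# are UTF-8 (dies if they are not valid UTF-8).
sub qp_to_text {
	my ($line) = @_;
	my $octets = decode_qp($line);
	return decode('UTF-8', $octets, FB_CROAK);
}

# Stand-in for an ASCII-only HTML view (an assumption, not the real
# ascii_html): each non-ASCII character becomes a decimal entity.
sub to_ascii_entities {
	my ($text) = @_;
	return encode('US-ASCII', $text, HTMLCREF);
}

my @text = map { qp_to_text($_) } @lines;
print "first line starts with a tab\n" if $text[0] =~ /\A\t/;
print to_ascii_entities($text[1]);
```

The first check mirrors the tab assertion, the second mirrors the bullet assertion once the X-UNKNOWN part has been treated as UTF-8.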
* [PATCH 3/3] view: try assuming UTF-8 for bogus charsets
2016-08-18 1:39 7% [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
@ 2016-08-18 1:39 7% ` Eric Wong
0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2016-08-18 1:39 UTC
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
For some reason, Alpine will set X-UNKNOWN for valid UTF-8.
Since we favor UTF-8 HTML anyways, try forcing Email::MIME to
handle text/plain as UTF-8 which might show up better.
At least this change renders
<alpine.DEB.2.20.1608131214070.4924@virtualbox>
properly by showing "•" (&#8226;) instead of
"⠢" (&#226;&#128;&#162;)
Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
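The difference between the two renderings can be reproduced with core Perl alone. The sketch below assumes the mangled output came from treating each octet of the bullet as its own character; codepoints is a hypothetical helper, not part of public-inbox.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode $octets as $enc and list the resulting
# code points, space-separated.
sub codepoints {
	my ($enc, $octets) = @_;
	my $s = decode($enc, $octets);
	return join ' ', map { sprintf 'U+%04X', ord } split //, $s;
}

# The three octets sent for the bullet (=E2=80=A2 in quoted-printable).
my $bullet = "\xE2\x80\xA2";

# As UTF-8 the octets form one character; as ISO-8859-1 each octet
# becomes a separate character.
print 'UTF-8:      ', codepoints('UTF-8', $bullet), "\n";
print 'ISO-8859-1: ', codepoints('ISO-8859-1', $bullet), "\n";
```

Decoded as UTF-8 the result is the single code point U+2022; decoded octet by octet it is three unrelated characters.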
---
lib/PublicInbox/View.pm | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3f0e122..6997c1c 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -457,8 +457,14 @@ sub add_text_body {
my $err = $@;
if ($err) {
if ($ct =~ m!\btext/plain\b!i) {
+ # Try to assume UTF-8 because Alpine seems to
+ # do wacky things and set charset=X-UNKNOWN
+ $part->charset_set('UTF-8');
+ $s = eval { $part->body_str };
+
+ # If forcing charset=UTF-8 failed,
# attach_link will warn further down...
- $s = $part->body;
+ $s = $part->body if $@;
} else {
return attach_link($upfx, $ct, $p, $fn);
}
--
EW
* [PATCH 0/3] attempt to display text/plain with bogus charsets
@ 2016-08-18 1:39 7% Eric Wong
2016-08-18 1:39 7% ` [PATCH 3/3] view: try assuming UTF-8 for " Eric Wong
0 siblings, 1 reply; 3+ results
From: Eric Wong @ 2016-08-18 1:39 UTC
To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin
Thomas Ferris Nicolaisen reported a problem with Dscho's
Git for Windows 2.9.3 announcement not rendering text inline:
https://public-inbox.org/git/alpine.DEB.2.20.1608131214070.4924@virtualbox/
This was caused by the bogus charset=X-UNKNOWN set by Alpine.
Attempt to handle it as UTF-8, but fall back to blindly showing
it to the user. In either case, warn users about the mangled
text.
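The fallback order described above (declared charset, then UTF-8, then the raw octets) can be sketched with core Encode as follows. This is an illustration only: decode_with_fallback is a hypothetical helper, and the series itself works through Email::MIME in View.pm rather than calling Encode directly.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding FB_CROAK LEAVE_SRC);

# Hypothetical helper, not part of public-inbox: try the declared
# charset, then UTF-8, then give up and return the octets untouched.
# Returns ($text, $how), where $how names the step that succeeded.
sub decode_with_fallback {
	my ($octets, $charset) = @_;
	for my $cs ($charset, 'UTF-8') {
		# Skip charsets Encode does not know, such as X-UNKNOWN.
		next unless defined $cs && find_encoding($cs);
		# FB_CROAK makes decoding strict; LEAVE_SRC keeps $octets
		# intact so the raw fallback below still has the input.
		my $s = eval { decode($cs, $octets, FB_CROAK | LEAVE_SRC) };
		return ($s, $cs) if defined $s;
	}
	return ($octets, 'raw');
}

my ($text, $how) = decode_with_fallback("\xE2\x80\xA2 bullet point", 'X-UNKNOWN');
printf "%s: %d character(s)\n", $how, length $text;
```

A caller can use the second return value to decide whether to warn the reader that the text may be mangled, which is the case whenever it is 'raw'.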
Eric Wong (3):
view: attach_link uses string concatenation
view: try to display bogus charsets for text/plain
view: try assuming UTF-8 for bogus charsets
lib/PublicInbox/View.pm | 40 +++++++++++++++++++++++++++++++---------
1 file changed, 31 insertions(+), 9 deletions(-)
--
EW
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git