user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* [PATCH 0/3] attempt to display text/plain with bogus charsets
@ 2016-08-18  1:39 Eric Wong
  2016-08-18  1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18  1:39 UTC (permalink / raw)
  To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin

Thomas Ferris Nicolaisen reported a problem with Dscho's
Git for Windows 2.9.3 announcement not rendering text inline:

https://public-inbox.org/git/alpine.DEB.2.20.1608131214070.4924@virtualbox/

This was caused by the bogus charset=X-UNKNOWN set by Alpine.
Attempt to handle it as UTF-8, but fall back to blindly showing
it to the user.  In either case, warn users about the mangled
text.


Eric Wong (3):
  view: attach_link uses string concatentation
  view: try to display bogus charsets for text/plain
  view: try assuming UTF-8 for bogus charsets

 lib/PublicInbox/View.pm | 40 +++++++++++++++++++++++++++++++---------
 1 file changed, 31 insertions(+), 9 deletions(-)

-- 
EW

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/3] view: attach_link uses string concatentation
  2016-08-18  1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
@ 2016-08-18  1:39 ` Eric Wong
  2016-08-18  1:39 ` [PATCH 2/3] view: try to display bogus charsets for text/plain Eric Wong
  2016-08-18  1:39 ` [PATCH 3/3] view: try assuming UTF-8 for bogus charsets Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18  1:39 UTC (permalink / raw)
  To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin

There is no point in using an array to join on an
empty string (my original intention was probably to
join on "\n").

This is only preparation for the next change to show
a warning to in the attachment link.
---
 lib/PublicInbox/View.pm | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 6f79f60..3057221 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -427,10 +427,10 @@ sub attach_link ($$$$) {
 	} else {
 		$sfn = 'a.bin';
 	}
-	my @ret = qq($nl<a\nhref="$upfx$idx-$sfn">[-- Attachment #$idx: );
+	my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">[-- Attachment #$idx: );
 	my $ts = "Type: $ct, Size: $size bytes";
-	push(@ret, ($desc eq '') ? "$ts --]" : "$desc --]\n[-- $ts --]");
-	join('', @ret, "</a>\n");
+	$ret .= ($desc eq '') ? "$ts --]" : "$desc --]\n[-- $ts --]";
+	$ret .= "</a>\n";
 }
 
 sub add_text_body {
-- 
EW


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/3] view: try to display bogus charsets for text/plain
  2016-08-18  1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
  2016-08-18  1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
@ 2016-08-18  1:39 ` Eric Wong
  2016-08-18  1:39 ` [PATCH 3/3] view: try assuming UTF-8 for bogus charsets Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18  1:39 UTC (permalink / raw)
  To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin

Alpine seems to set charset=X-UNKNOWN for valid UTF-8 text,
which causes Email::MIME::body_str to fail as X-UNKNOWN
is not a valid encoding.  So, blindly display the body
as plain-text but warn users about possibly mangled text.

Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
---
 lib/PublicInbox/View.pm | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3057221..3f0e122 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -407,14 +407,17 @@ sub flush_quote {
 	$$s .= qq(<span\nclass="q">) . $rv . '</span>'
 }
 
-sub attach_link ($$$$) {
-	my ($upfx, $ct, $p, $fn) = @_;
+sub attach_link ($$$$;$) {
+	my ($upfx, $ct, $p, $fn, $err) = @_;
 	my ($part, $depth, @idx) = @$p;
 	my $nl = $idx[-1] > 1 ? "\n" : '';
 	my $idx = join('.', @idx);
 	my $size = bytes::length($part->body);
 	$ct ||= 'text/plain';
-	$ct =~ s/;.*//; # no attributes
+
+	# hide attributes normally, unless we want to aid users in
+	# spotting MUA problems:
+	$ct =~ s/;.*// unless $err;
 	$ct = ascii_html($ct);
 	my $desc = $part->header('Content-Description');
 	$desc = $fn unless defined $desc;
@@ -427,7 +430,12 @@ sub attach_link ($$$$) {
 	} else {
 		$sfn = 'a.bin';
 	}
-	my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">[-- Attachment #$idx: );
+	my $ret = qq($nl<a\nhref="$upfx$idx-$sfn">);
+	if ($err) {
+		$ret .=
+"[-- Warning: decoded text below may be mangled --]\n";
+	}
+	$ret .= "[-- Attachment #$idx: ";
 	my $ts = "Type: $ct, Size: $size bytes";
 	$ret .= ($desc eq '') ? "$ts --]" : "$desc --]\n[-- $ts --]";
 	$ret .= "</a>\n";
@@ -446,12 +454,20 @@ sub add_text_body {
 	my $s = eval { $part->body_str };
 
 	# badly-encoded message? tell the world about it!
-	return attach_link($upfx, $ct, $p, $fn) if $@;
+	my $err = $@;
+	if ($err) {
+		if ($ct =~ m!\btext/plain\b!i) {
+			# attach_link will warn further down...
+			$s = $part->body;
+		} else {
+			return attach_link($upfx, $ct, $p, $fn);
+		}
+	}
 
 	my @lines = split(/^/m, $s);
 	$s = '';
-	if (defined($fn) || $depth > 0) {
-		$s .= attach_link($upfx, $ct, $p, $fn);
+	if (defined($fn) || $depth > 0 || $err) {
+		$s .= attach_link($upfx, $ct, $p, $fn, $err);
 		$s .= "\n";
 	}
 	my @quot;
-- 
EW


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 3/3] view: try assuming UTF-8 for bogus charsets
  2016-08-18  1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
  2016-08-18  1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
  2016-08-18  1:39 ` [PATCH 2/3] view: try to display bogus charsets for text/plain Eric Wong
@ 2016-08-18  1:39 ` Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2016-08-18  1:39 UTC (permalink / raw)
  To: meta; +Cc: Thomas Ferris Nicolaisen, Johannes Schindelin

For some reason, Alpine will set X-UNKNOWN for valid UTF-8.
Since we favor UTF-8 HTML anyways, try forcing Email::MIME to
handle text/plain as UTF-8 which might show up better.

At least this change renders

	<alpine.DEB.2.20.1608131214070.4924@virtualbox>

properly by showing "•" (&#8226;) instead of
"â ¢" (&#226;&#128;&#162;)

Reported-by: Thomas Ferris Nicolaisen <tfnico@gmail.com>
---
 lib/PublicInbox/View.pm | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 3f0e122..6997c1c 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -457,8 +457,14 @@ sub add_text_body {
 	my $err = $@;
 	if ($err) {
 		if ($ct =~ m!\btext/plain\b!i) {
+			# Try to assume UTF-8 because Alpine seems to
+			# do wacky things and set charset=X-UNKNOWN
+			$part->charset_set('UTF-8');
+			$s = eval { $part->body_str };
+
+			# If forcing charset=UTF-8 failed,
 			# attach_link will warn further down...
-			$s = $part->body;
+			$s = $part->body if $@;
 		} else {
 			return attach_link($upfx, $ct, $p, $fn);
 		}
-- 
EW


^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-08-18  1:39 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-18  1:39 [PATCH 0/3] attempt to display text/plain with bogus charsets Eric Wong
2016-08-18  1:39 ` [PATCH 1/3] view: attach_link uses string concatentation Eric Wong
2016-08-18  1:39 ` [PATCH 2/3] view: try to display bogus charsets for text/plain Eric Wong
2016-08-18  1:39 ` [PATCH 3/3] view: try assuming UTF-8 for bogus charsets Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).