user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [SQUASH] msg_part_text: discover text in application/octet-stream
Date: Fri, 12 Mar 2021 02:31:23 +0200
Message-ID: <20210312003123.GA30304@dcvr>
In-Reply-To: <20210311014539.19756-1-e@80x24.org>

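This simplifies the check: the body is now always passed through
utf8::decode() before testing for non-printable characters, so the
returned text is always Perl "utf8" text (that is, Perl's internal
"utf8" and not the strict "UTF-8").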
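
A minimal sketch of the patched branch as a standalone Perl sub, for
illustration only. The sub name octet_body_as_text and the placeholder
error string are hypothetical (the real logic lives inside
msg_part_text() and $err is set elsewhere); only the decode-then-check
order is taken from the patch. Note that the removed code skipped
utf8::decode() entirely when no non-printable octets were found, so
such bodies used to come back as raw octets.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched branch; the name and the
# error string are illustrative, not public-inbox's API.
sub octet_body_as_text {
	my ($body) = @_;
	my $err = 'not printable text';	# placeholder error value
	my $s = $body;			# work on a copy of the octets

	# Always decode first.  utf8::decode uses Perl's lax internal
	# "utf8"; on malformed input it returns false and leaves $s as-is.
	utf8::decode($s);

	# Reject anything containing a character that is neither
	# printable nor whitespace; otherwise clear the error.
	if ($s =~ /[^\p{XPosixPrint}\s]/s) {
		undef $s;
	} else {
		undef $err;
	}
	($s, $err);
}

# UTF-8 octets come back as a 6-character Perl string.
my ($text, $e1) = octet_body_as_text("caf\xc3\xa9\r\n");
printf "%d chars, utf8 flag %s\n", length($text),
	utf8::is_utf8($text) ? 'on' : 'off';	# prints "6 chars, utf8 flag on"

# A NUL byte is not printable, so the body is rejected.
my ($bin, $e2) = octet_body_as_text("\x00\x01data");
print defined($bin) ? "text\n" : "rejected: $e2\n";	# prints "rejected: not printable text"
```

The if/else is equivalent to the patch's one-line
undef($s =~ ... ? $s : $err) form; it is spelled out here only for
readability.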

diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index e2819523..9c6581cc 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -90,12 +90,8 @@ sub msg_part_text ($$) {
 		# Try to see if it's printable text that we can index
 		# and display:
 		$s = $part->body;
-		if ($s =~ /[^\p{XPosixPrint}\s]/s) {
-			utf8::decode($s);
-			$s =~ /[^\p{XPosixPrint}\s]/s ? undef($s) : undef($err);
-		} else {
-			undef($err);
-		}
+		utf8::decode($s);
+		undef($s =~ /[^\p{XPosixPrint}\s]/s ? $s : $err);
 	}
 	($s, $err);
 }
diff --git a/t/msg_iter.t b/t/msg_iter.t
index 6c52eec8..ae3594da 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -121,6 +121,7 @@ EOM
 		push @parts, $s;
 	});
 	$expect =~ s/\n/\r\n/sg;
+	utf8::decode($expect); # aka "bytes2str"
 	is_deeply(\@parts, [ "blah\r\n", $expect ],
 		'fallback to application/octet-stream as UTF-8 text');
 


Thread overview: 2+ messages
2021-03-11  1:45 [PATCH] msg_part_text: discover text in application/octet-stream Eric Wong
2021-03-12  0:31 ` Eric Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save this message's mbox file, import it into your mail client,
  and reply-to-all from there.

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210312003123.GA30304@dcvr \
    --to=e@80x24.org \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try this message's mailto: link.
  Be sure your reply has a Subject: header at the top and a blank
  line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).