author: Eric Wong <e@yhbt.net> 2020-08-19 08:02:33 +0000
committer: Eric Wong <e@yhbt.net> 2020-08-19 08:05:26 +0000
commit: 86c28d2432292c6bee149f59175486e5610e4462
tree: 9a912e852e40ee6be1280480a20b1f5e362d8fdf /lib/PublicInbox/Smsg.pm
parent: 99850fabd5fc628ab29c718e9d7de09b8114b208
There may be messages in the wild with wide characters in
headers which aren't RFC2047-encoded.  Assume UTF-8 so
those fields can round trip through over.sqlite3.

This doesn't affect docdata.glass in Xapian, but it does
affect how over.sqlite3 stores the same deflated info.
Diffstat (limited to 'lib/PublicInbox/Smsg.pm')
-rw-r--r-- lib/PublicInbox/Smsg.pm | 3
1 file changed, 3 insertions, 0 deletions
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index aaf88f35..62cb951e 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -105,6 +105,9 @@ sub populate {
                 # to protect git and NNTP clients
                 $val =~ tr/\0\t\n/   /;
 
+                # rare: in case headers have wide chars (not RFC2047-encoded)
+                utf8::decode($val);
+
                 # lower-case fields for read-only stuff
                 $self->{lc($f)} = $val;
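
The effect of the added utf8::decode() call can be sketched outside of
public-inbox.  The snippet below is only an illustration under stated
assumptions: normalize_header_value() is a hypothetical helper, not a function
in Smsg.pm, and the sample header values are made up.  It applies the same
tr/// and utf8::decode() steps as the hunk to one value and shows that raw
UTF-8 octets become characters, while input that is not valid UTF-8 is left
unchanged.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of public-inbox): applies the two steps
# from the hunk above to a single header value.
sub normalize_header_value {
    my ($val) = @_;
    $val =~ tr/\0\t\n/   /;  # replace NUL, tab and newline with spaces
    utf8::decode($val);      # in-place; no change if $val is not valid UTF-8
    return $val;
}

# Made-up header value holding raw UTF-8 octets (U+00E9 is \xC3\xA9),
# i.e. wide characters that were never RFC2047-encoded.
my $raw     = "Ren\xC3\xA9\tDupont";
my $decoded = normalize_header_value($raw);

# length() counts octets before decoding and characters afterwards
my $raw_len     = length($raw);      # 12 octets
my $decoded_len = length($decoded);  # 11 characters

# Encoding the decoded characters again yields the original octets
# (apart from the tab that tr/// turned into a space).
my $octets = $decoded;
utf8::encode($octets);

# A lone \xE9 (Latin-1 style) is not valid UTF-8: utf8::decode()
# returns false and leaves the string as it was.
my $copy = "Ren\xE9";
my $ok   = utf8::decode($copy);

printf "raw=%d decoded=%d valid_latin1=%s\n",
    $raw_len, $decoded_len, ($ok ? 'yes' : 'no');
```

Because utf8::decode() modifies its argument in place and simply returns false
on malformed input, calling it unconditionally, as the hunk does, is harmless
for headers that are plain ASCII or in some other 8-bit encoding.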