From 86c28d2432292c6bee149f59175486e5610e4462 Mon Sep 17 00:00:00 2001
From: Eric Wong
Date: Wed, 19 Aug 2020 08:02:33 +0000
Subject: smsg: handle wide characters in raw mail headers

There may be messages in the wild with wide characters in
headers which aren't RFC2047-encoded.  Assume UTF-8 so those
fields can round trip through over.sqlite3.

This doesn't affect docdata.glass in Xapian, but it does affect
how over.sqlite3 stores the same deflated info.
---
 lib/PublicInbox/Smsg.pm | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index aaf88f35..62cb951e 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -105,6 +105,9 @@ sub populate {
 		# to protect git and NNTP clients
 		$val =~ tr/\0\t\n/ /;
 
+		# rare: in case headers have wide chars (not RFC2047-encoded)
+		utf8::decode($val);
+
 		# lower-case fields for read-only stuff
 		$self->{lc($f)} = $val;
-- 
cgit v1.2.3-24-ge0c7
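
For illustration only, here is a minimal standalone sketch of what the added utf8::decode() call does to a raw header value. The clean_header() helper and the sample header values are hypothetical and are not part of PublicInbox::Smsg; only the tr/// and utf8::decode() lines mirror the patch. It assumes the raw header arrives as UTF-8 octets, as the commit message does.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the per-header clean-up in populate();
# this is NOT the actual PublicInbox::Smsg code.
sub clean_header {
	my ($val) = @_;

	# as in the patch: protect git and NNTP clients
	$val =~ tr/\0\t\n/ /;

	# in-place decode of UTF-8 octets to characters; if the octets
	# are not valid UTF-8, it returns false and leaves $val unchanged
	utf8::decode($val);

	return $val;
}

# raw UTF-8 octets for a name with u-umlaut, not RFC2047-encoded
my $raw = "J\xc3\xbcrgen\tExample";
my $val = clean_header($raw);

printf "octets in: %d, characters out: %d\n", length($raw), length($val);
printf "utf8 flag: %s\n", utf8::is_utf8($val) ? 'on' : 'off';

# a Latin-1 octet string is not valid UTF-8, so it passes through as-is
my $latin1 = clean_header("J\xfcrgen");
printf "latin1 length: %d\n", length($latin1);
```

Because utf8::decode() works in place and fails softly, headers that are already plain ASCII or that use some other 8-bit encoding are not altered by this step.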