From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00,
    T_FILL_THIS_FORM_SHORT shortcircuit=no autolearn=ham
    autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
    by dcvr.yhbt.net (Postfix) with ESMTP id 46DCA1F462
    for ; Tue, 4 Jun 2019 11:27:48 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/24] fix IDN linkification, add paranoia
Date: Tue, 4 Jun 2019 11:27:24 +0000
Message-Id: <20190604112748.23598-1-e@80x24.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

Looks like I wasn't up-to-date with the Perl 5.6 Unicode changes
from the turn of the century :x

So, regexp character classes like \w, \s, \d, etc. are all
Unicode- and locale-aware and could match characters I didn't expect.

AFAIK, the worst thing from this screwup was a message with an
IDN (internationalized domain name) in a URL causing a message
to fail rendering as HTML during linkification.

Now, creating attachment and git blob download links with
non-ASCII characters could cause some confusion, and lead to
difficulty removing said files for people who can't type or
tab-complete their way to non-ASCII characters, or lack the
proper fonts to display the character.

The handy "/a" regexp modifier is off-limits for now, since
that's only in Perl 5.14+ and we are just barely taking the bold
step of requiring Perl 5.10.1+ a decade after its release.

Most of the other changes are probably paranoia at best, and/or
optimizations intended to stop bad inputs earlier rather than
later. Deeper parts of the stack which actually interpret
strings of digits as integers will treat non-ASCII digits as
zero.

Anyways, internationalized domain names and email addresses are
a real thing whether or not it's a security and maintenance nightmare.
And it looks like some of the code can already support them (or can
be easily tweaked to support them, like Linkify.pm for IDN).

There's probably more things along these lines which can be
done. Right now, I'm treating git-fast-import and
"git-log --raw" output as trusted in that they won't surprise
us with Unicode digits and such...

But yeah, probably more auditing and eyes for stuff like this
would be helpful...

Eric Wong (24):
  linkify: support Internationalized Domain Names in URLs
  nntp: be explicit about ASCII digit matches
  nntp: ensure we only handle ASCII whitespace
  mid: id_compress requires ASCII-clean words
  feed: only accept ASCII digits for ref~$N
  http: require SERVER_PORT to be ASCII digit
  wwwlisting: require ASCII digit for port number
  wwwattach: only pass the charset through if ASCII
  www: only emit ASCII chars in attachment filenames
  www: require ASCII filenames in git blob downloads
  config: do not accept non-ASCII digits in cgitrc params
  newswww: only accept ASCII digits as article numbers
  view: require YYYYmmDD(HHMMSS) timestamps to be ASCII
  githttpbackend: require Range:, Status: to be ASCII digits
  searchview: do not allow non-ASCII offsets and limits
  msgtime: require ASCII digits for parsing dates
  filter/rubylang: require ASCII digit for mailcount
  inbox: require ASCII digits for feedmax var
  solver|viewdiff: restrict digit matches to ASCII
  www: require ASCII digit for git epoch
  require ASCII digits for local FS items
  githttpbackend: require ASCII in path
  www: require ASCII range for mbox downloads
  www: require ASCII word characters for CSS filenames

 lib/PublicInbox/Config.pm          |  2 +-
 lib/PublicInbox/Feed.pm            |  2 +-
 lib/PublicInbox/Filter/RubyLang.pm |  2 +-
 lib/PublicInbox/GitHTTPBackend.pm  |  8 ++++----
 lib/PublicInbox/HTTP.pm            |  2 +-
 lib/PublicInbox/Hval.pm            |  3 +++
 lib/PublicInbox/Inbox.pm           |  6 +++---
 lib/PublicInbox/Linkify.pm         |  5 +++--
 lib/PublicInbox/MID.pm             |  4 ++--
 lib/PublicInbox/MsgTime.pm         |  7 ++++---
 lib/PublicInbox/NNTP.pm            | 16 ++++++++--------
 lib/PublicInbox/NewsWWW.pm         |  2 +-
 lib/PublicInbox/Search.pm          |  2 +-
 lib/PublicInbox/SearchView.pm      |  4 ++--
 lib/PublicInbox/SolverGit.pm       |  2 +-
 lib/PublicInbox/V2Writable.pm      |  4 ++--
 lib/PublicInbox/View.pm            |  6 +++---
 lib/PublicInbox/ViewDiff.pm        |  4 ++--
 lib/PublicInbox/WWW.pm             | 20 +++++++++++++-------
 lib/PublicInbox/WwwAttach.pm       |  2 +-
 lib/PublicInbox/WwwListing.pm      |  4 ++--
 lib/PublicInbox/Xapcmd.pm          |  6 +++---
 script/public-inbox-purge          |  2 +-
 t/linkify.t                        | 12 ++++++++++++
 24 files changed, 75 insertions(+), 52 deletions(-)

-- 
EW
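
For readers who want to reproduce the problem described in the cover
letter, here is a minimal sketch. It is not code from the series. The
sample inputs and the names $digit, $word, and the result variables are
made up for illustration. It shows \d and \w matching non-ASCII
characters on a decoded string, explicit ASCII ranges (usable without
the Perl 5.14+ "/a" modifier) rejecting them, and numeric conversion
treating a non-ASCII digit as zero.

```perl
use strict;
use warnings;

# Hypothetical inputs, not taken from the patches:
my $digit = "\x{0663}";      # ARABIC-INDIC DIGIT THREE
my $word  = "abc\x{0439}";   # ASCII letters + CYRILLIC SMALL LETTER SHORT I

# \d and \w use Unicode rules for strings holding wide characters
my $d_class = ($digit =~ /\A\d+\z/) ? 1 : 0;
my $w_class = ($word  =~ /\A\w+\z/) ? 1 : 0;

# Explicit ranges only ever match ASCII, and work on Perl 5.10.1
my $d_ascii = ($digit =~ /\A[0-9]+\z/)        ? 1 : 0;
my $w_ascii = ($word  =~ /\A[A-Za-z0-9_]+\z/) ? 1 : 0;

# Numeric conversion only understands ASCII digits, so this yields 0
# (and would emit a "isn't numeric" warning if not silenced)
my $as_int = do { no warnings 'numeric'; $digit + 0 };

printf "\\d=%d [0-9]=%d \\w=%d [A-Za-z0-9_]=%d int=%d\n",
	$d_class, $d_ascii, $w_class, $w_ascii, $as_int;
```

The explicit ranges are more verbose than "/a", but they behave the
same on every Perl version the project currently supports.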