user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 00/24] fix IDN linkification, add paranoia
Date: Tue,  4 Jun 2019 11:27:24 +0000
Message-ID: <20190604112748.23598-1-e@80x24.org>

Looks like I wasn't up-to-date with the Perl 5.6 Unicode changes
from the turn of the century :x   So, regexp character classes
like \w, \s, \d, etc. are all Unicode-aware and could match
characters I didn't expect.
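
To illustrate the surprise, here is a minimal sketch (not taken from
the patches; the sample code point is my own choice) comparing \d
with an explicit ASCII range on a non-ASCII decimal digit:

```perl
use strict;
use warnings;

# ARABIC-INDIC DIGIT THREE (U+0663): a decimal digit in Unicode,
# but not one of the ASCII digits 0-9.
my $s = "\x{0663}";

my $matches_d     = ($s =~ /\A\d\z/)    ? 1 : 0; # Unicode-aware class
my $matches_ascii = ($s =~ /\A[0-9]\z/) ? 1 : 0; # explicit ASCII range

print "\\d: $matches_d, [0-9]: $matches_ascii\n";
```

On a character string like this one, \d accepts the digit while
[0-9] rejects it.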

AFAIK, the worst thing from this screwup was a message with an
IDN (internationalized domain name) in a URL failing to render
as HTML during linkification.

Now, creating attachment and git blob download links with
non-ASCII characters could cause some confusion, and lead to
difficulty removing said files for people who can't type or
tab-complete their way to non-ASCII characters, or lack the
proper fonts to display the characters.

The handy "/a" regexp modifier is off-limits for now, since
that's only in Perl 5.14+ and we are just barely taking the bold
step of requiring Perl 5.10.1+ a decade after its release.
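
Without "/a", one option is to spell out the ASCII classes by hand.
The following is only a sketch under that assumption: the variable
names are hypothetical, and the whitespace set is my approximation
of ASCII \s rather than something copied from the series.

```perl
use strict;
use warnings;

# Explicit ASCII classes; these need nothing newer than Perl 5.10.1.
# Names are illustrative only, not identifiers from public-inbox.
my $ascii_digit = qr/[0-9]/;
my $ascii_word  = qr/[a-zA-Z0-9_]/;
my $ascii_space = qr/[ \t\r\n\f]/;

my $zhe      = "\x{0416}"; # CYRILLIC CAPITAL LETTER ZHE
my $em_space = "\x{2003}"; # EM SPACE

my %result = (
    word_unicode  => ($zhe      =~ /\A\w\z/)            ? 1 : 0,
    word_ascii    => ($zhe      =~ /\A$ascii_word\z/)   ? 1 : 0,
    space_unicode => ($em_space =~ /\A\s\z/)            ? 1 : 0,
    space_ascii   => ($em_space =~ /\A$ascii_space\z/)  ? 1 : 0,
    digits_ascii  => ("42"      =~ /\A$ascii_digit+\z/) ? 1 : 0,
);

print "$_: $result{$_}\n" for sort keys %result;
```

The built-in classes accept the Cyrillic letter and the em space,
while the hand-written ASCII classes reject both and still accept
plain "42".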

Most of the other changes are probably paranoia at best, and/or
optimizations intended to stop bad inputs earlier rather than
later.  Deeper parts of the stack which actually interpret
strings of digits as integers will treat non-ASCII digits as
zero.
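
A small sketch of that last point (the input string is made up for
illustration): a string can pass a \d+ check and still convert to
zero, because Perl's numeric conversion only understands ASCII
digits.

```perl
use strict;
use warnings;

# ARABIC-INDIC DIGITS FOUR and TWO: matched by \d, but not ASCII.
my $s = "\x{0664}\x{0662}";

my $looks_numeric = ($s =~ /\A\d+\z/) ? 1 : 0;

my $n;
{
    no warnings 'numeric'; # the conversion would otherwise warn
    $n = $s + 0;           # numeric conversion ignores non-ASCII digits
}

print "matched \\d+: $looks_numeric, numeric value: $n\n";
```

So a loose \d+ check lets the input through, and the failure only
shows up later as an unexpected zero.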

Anyways, internationalized domain names and email addresses are
a real thing whether or not they're a security and maintenance
nightmare.  And it looks like some of the code can already
support them (or can be easily tweaked to support them, like
Linkify.pm for IDN).

There are probably more things along these lines which can be
done.  Right now, I'm treating git-fast-import and
"git-log --raw" output as trusted in that they won't surprise
us with Unicode digits and such...

But yeah, probably more auditing and eyes for stuff like this
would be helpful...

Eric Wong (24):
  linkify: support Internationalized Domain Names in URLs
  nntp: be explicit about ASCII digit matches
  nntp: ensure we only handle ASCII whitespace
  mid: id_compress requires ASCII-clean words
  feed: only accept ASCII digits for ref~$N
  http: require SERVER_PORT to be ASCII digit
  wwwlisting: require ASCII digit for port number
  wwwattach: only pass the charset through if ASCII
  www: only emit ASCII chars in attachment filenames
  www: require ASCII filenames in git blob downloads
  config: do not accept non-ASCII digits in cgitrc params
  newswww: only accept ASCII digits as article numbers
  view: require YYYYmmDD(HHMMSS) timestamps to be ASCII
  githttpbackend: require Range:, Status: to be ASCII digits
  searchview: do not allow non-ASCII offsets and limits
  msgtime: require ASCII digits for parsing dates
  filter/rubylang: require ASCII digit for mailcount
  inbox: require ASCII digits for feedmax var
  solver|viewdiff: restrict digit matches to ASCII
  www: require ASCII digit for git epoch
  require ASCII digits for local FS items
  githttpbackend: require ASCII in path
  www: require ASCII range for mbox downloads
  www: require ASCII word characters for CSS filenames

 lib/PublicInbox/Config.pm          |  2 +-
 lib/PublicInbox/Feed.pm            |  2 +-
 lib/PublicInbox/Filter/RubyLang.pm |  2 +-
 lib/PublicInbox/GitHTTPBackend.pm  |  8 ++++----
 lib/PublicInbox/HTTP.pm            |  2 +-
 lib/PublicInbox/Hval.pm            |  3 +++
 lib/PublicInbox/Inbox.pm           |  6 +++---
 lib/PublicInbox/Linkify.pm         |  5 +++--
 lib/PublicInbox/MID.pm             |  4 ++--
 lib/PublicInbox/MsgTime.pm         |  7 ++++---
 lib/PublicInbox/NNTP.pm            | 16 ++++++++--------
 lib/PublicInbox/NewsWWW.pm         |  2 +-
 lib/PublicInbox/Search.pm          |  2 +-
 lib/PublicInbox/SearchView.pm      |  4 ++--
 lib/PublicInbox/SolverGit.pm       |  2 +-
 lib/PublicInbox/V2Writable.pm      |  4 ++--
 lib/PublicInbox/View.pm            |  6 +++---
 lib/PublicInbox/ViewDiff.pm        |  4 ++--
 lib/PublicInbox/WWW.pm             | 20 +++++++++++++-------
 lib/PublicInbox/WwwAttach.pm       |  2 +-
 lib/PublicInbox/WwwListing.pm      |  4 ++--
 lib/PublicInbox/Xapcmd.pm          |  6 +++---
 script/public-inbox-purge          |  2 +-
 t/linkify.t                        | 12 ++++++++++++
 24 files changed, 75 insertions(+), 52 deletions(-)

-- 
EW



Thread overview: 26+ messages
2019-06-04 11:27 Eric Wong [this message]
2019-06-04 11:27 ` [PATCH 01/24] linkify: support Internationalized Domain Names in URLs Eric Wong
2019-06-04 11:27 ` [PATCH 02/24] nntp: be explicit about ASCII digit matches Eric Wong
2019-06-04 11:27 ` [PATCH 03/24] nntp: ensure we only handle ASCII whitespace Eric Wong
2019-06-04 11:27 ` [PATCH 04/24] mid: id_compress requires ASCII-clean words Eric Wong
2019-06-04 11:27 ` [PATCH 05/24] feed: only accept ASCII digits for ref~$N Eric Wong
2019-06-04 11:27 ` [PATCH 06/24] http: require SERVER_PORT to be ASCII digit Eric Wong
2019-06-04 11:27 ` [PATCH 07/24] wwwlisting: require ASCII digit for port number Eric Wong
2019-06-04 11:27 ` [PATCH 08/24] wwwattach: only pass the charset through if ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 09/24] www: only emit ASCII chars in attachment filenames Eric Wong
2019-06-04 11:27 ` [PATCH 10/24] www: require ASCII filenames in git blob downloads Eric Wong
2019-06-04 11:27 ` [PATCH 11/24] config: do not accept non-ASCII digits in cgitrc params Eric Wong
2019-06-04 11:27 ` [PATCH 12/24] newswww: only accept ASCII digits as article numbers Eric Wong
2019-06-04 11:27 ` [PATCH 13/24] view: require YYYYmmDD(HHMMSS) timestamps to be ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 14/24] githttpbackend: require Range:, Status: to be ASCII digits Eric Wong
2019-06-04 11:27 ` [PATCH 15/24] searchview: do not allow non-ASCII offsets and limits Eric Wong
2019-06-04 11:27 ` [PATCH 16/24] msgtime: require ASCII digits for parsing dates Eric Wong
2019-06-04 11:27 ` [PATCH 17/24] filter/rubylang: require ASCII digit for mailcount Eric Wong
2019-06-04 11:27 ` [PATCH 18/24] inbox: require ASCII digits for feedmax var Eric Wong
2019-06-04 11:27 ` [PATCH 19/24] solver|viewdiff: restrict digit matches to ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 20/24] www: require ASCII digit for git epoch Eric Wong
2019-06-04 11:27 ` [PATCH 21/24] require ASCII digits for local FS items Eric Wong
2019-06-04 11:27 ` [PATCH 22/24] githttpbackend: require ASCII in path Eric Wong
2019-06-04 11:27 ` [PATCH 23/24] www: require ASCII range for mbox downloads Eric Wong
2019-06-04 11:27 ` [PATCH 24/24] www: require ASCII word characters for CSS filenames Eric Wong
2019-06-05  2:18 ` [PATCH 25/24] tighten up digit matches to ASCII for git output Eric Wong
