user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
Search results ordered by [date|relevance]  view[summary|nested|Atom feed]
thread overview below | download mbox.gz: |
* [PATCH 00/24] fix IDN linkification, add paranoia
@ 2019-06-04 11:27  6% Eric Wong
  2019-06-04 11:27  7% ` [PATCH 09/24] www: only emit ASCII chars in attachment filenames Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2019-06-04 11:27 UTC (permalink / raw)
  To: meta

Looks like I wasn't up-to-date with the Perl 5.6 Unicode changes
from the turn-of-the century :x   So, regexp character classes
like \w, \s, \d, etc are all locale-aware and could match
characters I didn't expect.

AFAIK, the worst thing from this screwup was a message with an
IDN (internationalizd domain name) in a URL causing a message to
fail rendering as HTML during linkification.

Now, creating attachment and git blob download links with
non-ASCII characters could cause some confusion; and lead to
difficulty removing said files for people who can't type or
tab-complete their way to non-ASCII characters, or lack the
proper fonts to display the character.

The handy "/a" regexp modifier is off-limits for now, since
that's only in Perl 5.14+ and we are just barely taking the bold
step of requiring Perl 5.10.1+ a decade after its release.

Most of the other changes are probably paranoia at best, and/or
optimizations intended to stop bad inputs earlier rather than
later.  Deeper parts of the stack which actually interpret
strings of digits as integers will treat non-ASCII digits as
zero.

Anyways, internationalized domain names and email addresses are
a real thing whether or not it's a security and maintenance
nightmare.  And it looks like some of the code can already
support them (or can be easily tweaked to support them, like
Linkify.pm for IDN).

There's probably more things along these lines which can be
done.  Right now, I'm treating git-fast-import and
"git-log --raw" output be trusted in that they won't surprise
us with Unicode digits and such...

But yeah, probably more auditing and eyes for stuff like this
would be helpful...

Eric Wong (24):
  linkify: support Internationalized Domain Names in URLs
  nntp: be explicit about ASCII digit matches
  nntp: ensure we only handle ASCII whitespace
  mid: id_compress requires ASCII-clean words
  feed: only accept ASCII digits for ref~$N
  http: require SERVER_PORT to be ASCII digit
  wwwlisting: require ASCII digit for port number
  wwwattach: only pass the charset through if ASCII
  www: only emit ASCII chars in attachment filenames
  www: require ASCII filenames in git blob downloads
  config: do not accept non-ASCII digits in cgitrc params
  newswww: only accept ASCII digits as article numbers
  view: require YYYYmmDD(HHMMSS) timestamps to be ASCII
  githttpbackend: require Range:, Status: to be ASCII digits
  searchview: do not allow non-ASCII offsets and limits
  msgtime: require ASCII digits for parsing dates
  filter/rubylang: require ASCII digit for mailcount
  inbox: require ASCII digits for feedmax var
  solver|viewdiff: restrict digit matches to ASCII
  www: require ASCII digit for git epoch
  require ASCII digits for local FS items
  githttpbackend: require ASCII in path
  www: require ASCII range for mbox downloads
  www: require ASCII word characters for CSS filenames

 lib/PublicInbox/Config.pm          |  2 +-
 lib/PublicInbox/Feed.pm            |  2 +-
 lib/PublicInbox/Filter/RubyLang.pm |  2 +-
 lib/PublicInbox/GitHTTPBackend.pm  |  8 ++++----
 lib/PublicInbox/HTTP.pm            |  2 +-
 lib/PublicInbox/Hval.pm            |  3 +++
 lib/PublicInbox/Inbox.pm           |  6 +++---
 lib/PublicInbox/Linkify.pm         |  5 +++--
 lib/PublicInbox/MID.pm             |  4 ++--
 lib/PublicInbox/MsgTime.pm         |  7 ++++---
 lib/PublicInbox/NNTP.pm            | 16 ++++++++--------
 lib/PublicInbox/NewsWWW.pm         |  2 +-
 lib/PublicInbox/Search.pm          |  2 +-
 lib/PublicInbox/SearchView.pm      |  4 ++--
 lib/PublicInbox/SolverGit.pm       |  2 +-
 lib/PublicInbox/V2Writable.pm      |  4 ++--
 lib/PublicInbox/View.pm            |  6 +++---
 lib/PublicInbox/ViewDiff.pm        |  4 ++--
 lib/PublicInbox/WWW.pm             | 20 +++++++++++++-------
 lib/PublicInbox/WwwAttach.pm       |  2 +-
 lib/PublicInbox/WwwListing.pm      |  4 ++--
 lib/PublicInbox/Xapcmd.pm          |  6 +++---
 script/public-inbox-purge          |  2 +-
 t/linkify.t                        | 12 ++++++++++++
 24 files changed, 75 insertions(+), 52 deletions(-)

-- 
EW


^ permalink raw reply	[relevance 6%]

* [PATCH 09/24] www: only emit ASCII chars in attachment filenames
  2019-06-04 11:27  6% [PATCH 00/24] fix IDN linkification, add paranoia Eric Wong
@ 2019-06-04 11:27  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2019-06-04 11:27 UTC (permalink / raw)
  To: meta

We don't want to emit funky URLs which can be lost in
translation or cause problems with non-Unicode-aware
clients.

Then, don't accept non-ASCII filenames in URLs, since
a manually-generated URL/filename in attachment downloads
could be used for Unicode homographs to confuse folks who
down the attachment.
---
 lib/PublicInbox/Hval.pm | 3 +++
 lib/PublicInbox/View.pm | 2 +-
 lib/PublicInbox/WWW.pm  | 2 +-
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Hval.pm b/lib/PublicInbox/Hval.pm
index 95a0f70..2b44397 100644
--- a/lib/PublicInbox/Hval.pm
+++ b/lib/PublicInbox/Hval.pm
@@ -13,6 +13,9 @@ our @EXPORT_OK = qw/ascii_html obfuscate_addrs to_filename src_escape
 		to_attr from_attr/;
 my $enc_ascii = find_encoding('us-ascii');
 
+# safe-ish acceptable filename pattern for portability
+our $FN = '[a-zA-Z0-9][a-zA-Z0-9_\-\.]+[a-zA-Z0-9]'; # needs \z anchor
+
 sub new {
 	my ($class, $raw, $href) = @_;
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 09afdaf..83ae99b 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -528,7 +528,7 @@ sub attach_link ($$$$;$) {
 	$desc = $fn unless defined $desc;
 	$desc = '' unless defined $desc;
 	my $sfn;
-	if (defined $fn && $fn =~ /\A[[:alnum:]][\w\.-]+[[:alnum:]]\z/) {
+	if (defined $fn && $fn =~ /\A$PublicInbox::Hval::FN\z/o) {
 		$sfn = $fn;
 	} elsif ($ct eq 'text/plain') {
 		$sfn = 'a.txt';
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index b6f18f8..50b6950 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -28,7 +28,7 @@ use PublicInbox::UserContent;
 our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
 our $MID_RE = qr!([^/]+)!;
 our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
-our $ATTACH_RE = qr!(\d[\.\d]*)-([[:alnum:]][\w\.-]+[[:alnum:]])!i;
+our $ATTACH_RE = qr!([0-9][0-9\.]*)-($PublicInbox::Hval::FN)!;
 our $OID_RE = qr![a-f0-9]{7,40}!;
 
 sub new {
-- 
EW


^ permalink raw reply related	[relevance 7%]

Results 1-2 of 2 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2019-06-04 11:27  6% [PATCH 00/24] fix IDN linkification, add paranoia Eric Wong
2019-06-04 11:27  7% ` [PATCH 09/24] www: only emit ASCII chars in attachment filenames Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).