* [PATCH 09/24] www: only emit ASCII chars in attachment filenames
2019-06-04 11:27 6% [PATCH 00/24] fix IDN linkification, add paranoia Eric Wong
@ 2019-06-04 11:27 7% ` Eric Wong
0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2019-06-04 11:27 UTC (permalink / raw)
To: meta
We don't want to emit funky URLs which can be lost in
translation or cause problems with non-Unicode-aware
clients.
Then, don't accept non-ASCII filenames in URLs, since
a manually-generated URL/filename in attachment downloads
could be used for Unicode homographs to confuse folks who
down the attachment.
---
lib/PublicInbox/Hval.pm | 3 +++
lib/PublicInbox/View.pm | 2 +-
lib/PublicInbox/WWW.pm | 2 +-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/Hval.pm b/lib/PublicInbox/Hval.pm
index 95a0f70..2b44397 100644
--- a/lib/PublicInbox/Hval.pm
+++ b/lib/PublicInbox/Hval.pm
@@ -13,6 +13,9 @@ our @EXPORT_OK = qw/ascii_html obfuscate_addrs to_filename src_escape
to_attr from_attr/;
my $enc_ascii = find_encoding('us-ascii');
+# safe-ish acceptable filename pattern for portability
+our $FN = '[a-zA-Z0-9][a-zA-Z0-9_\-\.]+[a-zA-Z0-9]'; # needs \z anchor
+
sub new {
my ($class, $raw, $href) = @_;
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 09afdaf..83ae99b 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -528,7 +528,7 @@ sub attach_link ($$$$;$) {
$desc = $fn unless defined $desc;
$desc = '' unless defined $desc;
my $sfn;
- if (defined $fn && $fn =~ /\A[[:alnum:]][\w\.-]+[[:alnum:]]\z/) {
+ if (defined $fn && $fn =~ /\A$PublicInbox::Hval::FN\z/o) {
$sfn = $fn;
} elsif ($ct eq 'text/plain') {
$sfn = 'a.txt';
diff --git a/lib/PublicInbox/WWW.pm b/lib/PublicInbox/WWW.pm
index b6f18f8..50b6950 100644
--- a/lib/PublicInbox/WWW.pm
+++ b/lib/PublicInbox/WWW.pm
@@ -28,7 +28,7 @@ use PublicInbox::UserContent;
our $INBOX_RE = qr!\A/([\w\-][\w\.\-]*)!;
our $MID_RE = qr!([^/]+)!;
our $END_RE = qr!(T/|t/|t\.mbox(?:\.gz)?|t\.atom|raw|)!;
-our $ATTACH_RE = qr!(\d[\.\d]*)-([[:alnum:]][\w\.-]+[[:alnum:]])!i;
+our $ATTACH_RE = qr!([0-9][0-9\.]*)-($PublicInbox::Hval::FN)!;
our $OID_RE = qr![a-f0-9]{7,40}!;
sub new {
--
EW
^ permalink raw reply related [relevance 7%]
* [PATCH 00/24] fix IDN linkification, add paranoia
@ 2019-06-04 11:27 6% Eric Wong
2019-06-04 11:27 7% ` [PATCH 09/24] www: only emit ASCII chars in attachment filenames Eric Wong
0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2019-06-04 11:27 UTC (permalink / raw)
To: meta
Looks like I wasn't up-to-date with the Perl 5.6 Unicode changes
from the turn-of-the century :x So, regexp character classes
like \w, \s, \d, etc are all locale-aware and could match
characters I didn't expect.
AFAIK, the worst thing from this screwup was a message with an
IDN (internationalizd domain name) in a URL causing a message to
fail rendering as HTML during linkification.
Now, creating attachment and git blob download links with
non-ASCII characters could cause some confusion; and lead to
difficulty removing said files for people who can't type or
tab-complete their way to non-ASCII characters, or lack the
proper fonts to display the character.
The handy "/a" regexp modifier is off-limits for now, since
that's only in Perl 5.14+ and we are just barely taking the bold
step of requiring Perl 5.10.1+ a decade after its release.
Most of the other changes are probably paranoia at best, and/or
optimizations intended to stop bad inputs earlier rather than
later. Deeper parts of the stack which actually interpret
strings of digits as integers will treat non-ASCII digits as
zero.
Anyways, internationalized domain names and email addresses are
a real thing whether or not it's a security and maintenance
nightmare. And it looks like some of the code can already
support them (or can be easily tweaked to support them, like
Linkify.pm for IDN).
There's probably more things along these lines which can be
done. Right now, I'm treating git-fast-import and
"git-log --raw" output be trusted in that they won't surprise
us with Unicode digits and such...
But yeah, probably more auditing and eyes for stuff like this
would be helpful...
Eric Wong (24):
linkify: support Internationalized Domain Names in URLs
nntp: be explicit about ASCII digit matches
nntp: ensure we only handle ASCII whitespace
mid: id_compress requires ASCII-clean words
feed: only accept ASCII digits for ref~$N
http: require SERVER_PORT to be ASCII digit
wwwlisting: require ASCII digit for port number
wwwattach: only pass the charset through if ASCII
www: only emit ASCII chars in attachment filenames
www: require ASCII filenames in git blob downloads
config: do not accept non-ASCII digits in cgitrc params
newswww: only accept ASCII digits as article numbers
view: require YYYYmmDD(HHMMSS) timestamps to be ASCII
githttpbackend: require Range:, Status: to be ASCII digits
searchview: do not allow non-ASCII offsets and limits
msgtime: require ASCII digits for parsing dates
filter/rubylang: require ASCII digit for mailcount
inbox: require ASCII digits for feedmax var
solver|viewdiff: restrict digit matches to ASCII
www: require ASCII digit for git epoch
require ASCII digits for local FS items
githttpbackend: require ASCII in path
www: require ASCII range for mbox downloads
www: require ASCII word characters for CSS filenames
lib/PublicInbox/Config.pm | 2 +-
lib/PublicInbox/Feed.pm | 2 +-
lib/PublicInbox/Filter/RubyLang.pm | 2 +-
lib/PublicInbox/GitHTTPBackend.pm | 8 ++++----
lib/PublicInbox/HTTP.pm | 2 +-
lib/PublicInbox/Hval.pm | 3 +++
lib/PublicInbox/Inbox.pm | 6 +++---
lib/PublicInbox/Linkify.pm | 5 +++--
lib/PublicInbox/MID.pm | 4 ++--
lib/PublicInbox/MsgTime.pm | 7 ++++---
lib/PublicInbox/NNTP.pm | 16 ++++++++--------
lib/PublicInbox/NewsWWW.pm | 2 +-
lib/PublicInbox/Search.pm | 2 +-
lib/PublicInbox/SearchView.pm | 4 ++--
lib/PublicInbox/SolverGit.pm | 2 +-
lib/PublicInbox/V2Writable.pm | 4 ++--
lib/PublicInbox/View.pm | 6 +++---
lib/PublicInbox/ViewDiff.pm | 4 ++--
lib/PublicInbox/WWW.pm | 20 +++++++++++++-------
lib/PublicInbox/WwwAttach.pm | 2 +-
lib/PublicInbox/WwwListing.pm | 4 ++--
lib/PublicInbox/Xapcmd.pm | 6 +++---
script/public-inbox-purge | 2 +-
t/linkify.t | 12 ++++++++++++
24 files changed, 75 insertions(+), 52 deletions(-)
--
EW
^ permalink raw reply [relevance 6%]
Results 1-2 of 2 | reverse | options above
-- pct% links below jump to the message on this page, permalinks otherwise --
2019-06-04 11:27 6% [PATCH 00/24] fix IDN linkification, add paranoia Eric Wong
2019-06-04 11:27 7% ` [PATCH 09/24] www: only emit ASCII chars in attachment filenames Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).