From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 01/24] linkify: support Internationalized Domain Names in URLs
Date: Tue, 4 Jun 2019 11:27:25 +0000
Message-ID: <20190604112748.23598-2-e@80x24.org>
In-Reply-To: <20190604112748.23598-1-e@80x24.org>

The "\w" character class in Perl matches any word character
in the Unicode database, not just ASCII characters. So we must
be prepared for that and generate links to IDNs.
---
 lib/PublicInbox/Linkify.pm |  5 +++--
 t/linkify.t                | 12 ++++++++++++
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/Linkify.pm b/lib/PublicInbox/Linkify.pm
index d4778e7..84960a9 100644
--- a/lib/PublicInbox/Linkify.pm
+++ b/lib/PublicInbox/Linkify.pm
@@ -13,6 +13,7 @@ package PublicInbox::Linkify;
 use strict;
 use warnings;
 use Digest::SHA qw/sha1_hex/;
+use PublicInbox::Hval qw(ascii_html);
 
 my $SALT = rand;
 my $LINK_RE = qr{([\('!])?\b((?:ftps?|https?|nntps?|gopher)://
@@ -61,12 +62,12 @@ sub linkify_1 {
 			$end = ')';
 		}
 
+		$url = ascii_html($url); # for IDN
+
 		# salt this, as this could be exploited to show
 		# links in the HTML which don't show up in the raw mail.
 		my $key = sha1_hex($url . $SALT);
 
-		# only escape ampersands, others do not match LINK_RE
-		$url =~ s/&/&amp;/g;
 		$_[0]->{$key} = $url;
 		$beg . 'PI-LINK-'. $key . $end;
 	^ge;
diff --git a/t/linkify.t b/t/linkify.t
index fe218b9..c492358 100644
--- a/t/linkify.t
+++ b/t/linkify.t
@@ -132,4 +132,16 @@ use PublicInbox::Linkify;
 	   'punctuation with unpaired ) OK')
 }
 
+if ('IDN example: <ACDB98F4-178C-43C3-99C4-A1D03DD6A8F5@sb.org>') {
+	my $hc = '&#26376;';
+	my $u = "http://www.\x{6708}.example.com/";
+	my $s = $u;
+	my $l = PublicInbox::Linkify->new;
+	$s = $l->linkify_1($s);
+	$s = $l->linkify_2($s);
+	my $expect = qq{<a
+href="http://www.$hc.example.com/">http://www.$hc.example.com/</a>};
+	is($s, $expect, 'IDN message escaped properly');
+}
+
 done_testing();
-- 
EW
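
For readers who want to see the behavior the commit message describes, here is a minimal standalone sketch. It shows "\w" matching a non-ASCII character, the /a modifier restricting it to ASCII, and a URL being reduced to ASCII-only HTML. The to_ascii_html() helper is a hypothetical stand-in written for this illustration; it is an assumption about what such escaping can look like, not the ascii_html() code from PublicInbox::Hval.

```perl
use strict;
use warnings;

# Hypothetical stand-in for ascii_html(); NOT the PublicInbox::Hval code.
sub to_ascii_html {
	my ($s) = @_;
	my %ent = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;',
		   '"' => '&quot;');
	# escape HTML metacharacters first, so the "&" of the numeric
	# references added below is not escaped a second time
	$s =~ s/([&<>"])/$ent{$1}/g;
	# replace each non-ASCII character with a numeric character reference
	$s =~ s/([^\x00-\x7f])/'&#' . ord($1) . ';'/ge;
	return $s;
}

my $host = "www.\x{6708}.example.com";

# By default, \w follows Unicode rules, so U+6708 is a word character
my $unicode_match = ($host =~ /\A[\w.]+\z/) ? 1 : 0;

# The /a modifier (Perl 5.14+) restricts \w to ASCII word characters
my $ascii_match = ($host =~ /\A[\w.]+\z/a) ? 1 : 0;

my $escaped = to_ascii_html("http://$host/?a=1&b=2");
print "$unicode_match $ascii_match $escaped\n";
```

The order of the two substitutions matters in this sketch: escaping "&" after inserting numeric references would turn "&#26376;" into "&amp;#26376;".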
Thread overview: 26+ messages
2019-06-04 11:27 [PATCH 00/24] fix IDN linkification, add paranoia Eric Wong
2019-06-04 11:27 ` Eric Wong [this message]
2019-06-04 11:27 ` [PATCH 02/24] nntp: be explicit about ASCII digit matches Eric Wong
2019-06-04 11:27 ` [PATCH 03/24] nntp: ensure we only handle ASCII whitespace Eric Wong
2019-06-04 11:27 ` [PATCH 04/24] mid: id_compress requires ASCII-clean words Eric Wong
2019-06-04 11:27 ` [PATCH 05/24] feed: only accept ASCII digits for ref~$N Eric Wong
2019-06-04 11:27 ` [PATCH 06/24] http: require SERVER_PORT to be ASCII digit Eric Wong
2019-06-04 11:27 ` [PATCH 07/24] wwwlisting: require ASCII digit for port number Eric Wong
2019-06-04 11:27 ` [PATCH 08/24] wwwattach: only pass the charset through if ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 09/24] www: only emit ASCII chars in attachment filenames Eric Wong
2019-06-04 11:27 ` [PATCH 10/24] www: require ASCII filenames in git blob downloads Eric Wong
2019-06-04 11:27 ` [PATCH 11/24] config: do not accept non-ASCII digits in cgitrc params Eric Wong
2019-06-04 11:27 ` [PATCH 12/24] newswww: only accept ASCII digits as article numbers Eric Wong
2019-06-04 11:27 ` [PATCH 13/24] view: require YYYYmmDD(HHMMSS) timestamps to be ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 14/24] githttpbackend: require Range:, Status: to be ASCII digits Eric Wong
2019-06-04 11:27 ` [PATCH 15/24] searchview: do not allow non-ASCII offsets and limits Eric Wong
2019-06-04 11:27 ` [PATCH 16/24] msgtime: require ASCII digits for parsing dates Eric Wong
2019-06-04 11:27 ` [PATCH 17/24] filter/rubylang: require ASCII digit for mailcount Eric Wong
2019-06-04 11:27 ` [PATCH 18/24] inbox: require ASCII digits for feedmax var Eric Wong
2019-06-04 11:27 ` [PATCH 19/24] solver|viewdiff: restrict digit matches to ASCII Eric Wong
2019-06-04 11:27 ` [PATCH 20/24] www: require ASCII digit for git epoch Eric Wong
2019-06-04 11:27 ` [PATCH 21/24] require ASCII digits for local FS items Eric Wong
2019-06-04 11:27 ` [PATCH 22/24] githttpbackend: require ASCII in path Eric Wong
2019-06-04 11:27 ` [PATCH 23/24] www: require ASCII range for mbox downloads Eric Wong
2019-06-04 11:27 ` [PATCH 24/24] www: require ASCII word characters for CSS filenames Eric Wong
2019-06-05  2:18 ` [PATCH 25/24] tighten up digit matches to ASCII for git output Eric Wong