* [PATCH 2/2] linkify: do not capture trailing '.' or ';' in URLs
From: Eric Wong @ 2016-03-01 3:50 UTC
To: meta
It seems common for users to end statements with URLs,
while it is rare for a URL itself to end with '.' or ';'.
So guess that a trailing '.' or ';' was not meant to be
part of the URL, and leave it outside the link.
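For illustration only, the heuristic above can be tried outside PublicInbox::Linkify with the small standalone Perl sketch below. The helper name split_trailing_punct is hypothetical and is not part of the module. It folds the patch's two substitutions (s/(\.)\z// followed by s/(;)\z//) into one character class. Like the patch, it strips at most one trailing '.' or ';'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of PublicInbox::Linkify: split at most
# one trailing '.' or ';' off a matched URL, as linkify_1 does after
# this patch.  Returns the URL and the stripped character ('' if none).
sub split_trailing_punct {
	my ($url) = @_;
	my $end = '';
	# [.;] behaves like trying s/(\.)\z// and then s/(;)\z//
	if ($url =~ s/([.;])\z//) {
		$end = $1;
	}
	return ($url, $end);
}

for my $s ('http://example.com/a.', 'http://example.com/b;',
		'http://example.com/c', 'http://example.com/d.;') {
	my ($url, $end) = split_trailing_punct($s);
	printf "%s | '%s'\n", $url, $end;
}
# prints:
# http://example.com/a | '.'
# http://example.com/b | ';'
# http://example.com/c | ''
# http://example.com/d. | ';'
```

Only one character is stripped, so "http://example.com/d.;" keeps its period. A URL that really does end in '.' or ';' will lose that character, which is the trade-off this guess accepts.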
---
 MANIFEST                   |  1 +
 lib/PublicInbox/Linkify.pm | 10 +++++++++-
 t/linkify.t                | 26 ++++++++++++++++++++++++++
 3 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 t/linkify.t
diff --git a/MANIFEST b/MANIFEST
index 5d790f9..259f42c 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -80,6 +80,7 @@ t/httpd-corner.psgi
 t/httpd-corner.t
 t/httpd.t
 t/init.t
+t/linkify.t
 t/main-bin/spamc
 t/mda.t
 t/msgmap.t
diff --git a/lib/PublicInbox/Linkify.pm b/lib/PublicInbox/Linkify.pm
index 8f634f4..4eddedd 100644
--- a/lib/PublicInbox/Linkify.pm
+++ b/lib/PublicInbox/Linkify.pm
@@ -25,6 +25,14 @@ sub linkify_1 {
 	my ($self, $s) = @_;
 	$s =~ s!$LINK_RE!
 		my $url = $1;
+		my $end = '';
+
+		# it's fairly common to end URLs in messages with
+		# '.' or ';' to denote the end of a statement.
+		if ($url =~ s/(\.)\z// || $url =~ s/(;)\z//) {
+			$end = $1;
+		}
+
 		# salt this, as this could be exploited to show
 		# links in the HTML which don't show up in the raw mail.
 		my $key = sha1_hex($url . $SALT);
@@ -32,7 +40,7 @@ sub linkify_1 {
 		# only escape ampersands, others do not match LINK_RE
 		$url =~ s/&/&#38;/g;
 		$self->{$key} = $url;
-		'PI-LINK-'. $key;
+		'PI-LINK-'. $key . $end;
 	!ge;
 	$s;
 }
diff --git a/t/linkify.t b/t/linkify.t
new file mode 100644
index 0000000..586691a
--- /dev/null
+++ b/t/linkify.t
@@ -0,0 +1,26 @@
+# Copyright (C) 2016 all contributors <meta@public-inbox.org>
+# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+use strict;
+use warnings;
+use Test::More;
+use PublicInbox::Linkify;
+
+{
+	my $l = PublicInbox::Linkify->new;
+	my $u = 'http://example.com/url-with-trailing-period';
+	my $s = $u . '.';
+	$s = $l->linkify_1($s);
+	$s = $l->linkify_2($s);
+	is($s, qq(<a\nhref="$u">$u</a>.), 'trailing period not in URL');
+}
+
+{
+	my $l = PublicInbox::Linkify->new;
+	my $u = 'http://example.com/url-with-trailing-semicolon';
+	my $s = $u . ';';
+	$s = $l->linkify_1($s);
+	$s = $l->linkify_2($s);
+	is($s, qq(<a\nhref="$u">$u</a>;), 'trailing semicolon not in URL');
+}
+
+done_testing();
--
EW
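The salting comment in linkify_1 refers to a two-pass scheme. First, URLs are swapped for opaque 'PI-LINK-<sha1>' placeholders. Next, the remaining text is HTML-escaped. Finally, the placeholders are expanded into anchors; that last step is done by linkify_2, whose body is not part of this patch. The minimal self-contained sketch below shows that round trip. It assumes a simplified $LINK_RE, a throwaway random salt, and hypothetical pass1/html_escape/pass2 functions. It is not the module's actual code. Because the key is salted, a sender who types a 'PI-LINK-...' string into a message cannot predict a key that pass2 would expand into a link.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Assumptions: the real $LINK_RE and $SALT of PublicInbox::Linkify are
# not shown in this patch; these are simplified stand-ins.
my $LINK_RE = qr!(https?://[^\s<>"]+)!;
my $SALT = rand(); # illustration only; not a secure random source
my %links;         # placeholder key => escaped URL

# Pass 1 (hypothetical, modeled on linkify_1): swap each URL for a
# salted placeholder so HTML escaping cannot touch it.
sub pass1 {
	my ($s) = @_;
	$s =~ s!$LINK_RE!
		my $url = $1;
		my $end = '';
		# same heuristic as the patch: keep one trailing '.' or ';' out
		$end = $1 if $url =~ s/([.;])\z//;
		my $key = sha1_hex($url . $SALT);
		# the stand-in regex excludes <, > and quotes, so escape only '&'
		$url =~ s/&/&#38;/g;
		$links{$key} = $url;
		'PI-LINK-' . $key . $end;
	!ge;
	return $s;
}

# Minimal HTML escaping for the text around the placeholders.
sub html_escape {
	my ($s) = @_;
	$s =~ s/&/&#38;/g; # must run before the other replacements
	$s =~ s/</&lt;/g;
	$s =~ s/>/&gt;/g;
	return $s;
}

# Pass 2 (hypothetical, modeled on linkify_2): expand known placeholders
# into anchors; keys that pass 1 never produced are left untouched.
sub pass2 {
	my ($s) = @_;
	$s =~ s!PI-LINK-([a-f0-9]{40})!
		my $url = $links{$1};
		defined $url ? qq(<a\nhref="$url">$url</a>) : "PI-LINK-$1";
	!ge;
	return $s;
}

my $text = 'see http://example.com/?a=1&b=2.';
print pass2(html_escape(pass1($text))), "\n";
```

The trailing '.' ends up after the closing </a>, which is the same shape the t/linkify.t tests above expect.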
* [PATCH 0/2] linkification improvements
From: Eric Wong @ 2016-03-01 3:50 UTC
To: meta
We'll be reusing the linkification code in repobrowse :)

Eric Wong (2):
      extract linkification code to a separate package
      linkify: do not capture trailing '.' or ';' in URLs

 MANIFEST                   |  2 ++
 lib/PublicInbox/Linkify.pm | 65 ++++++++++++++++++++++++++++++++++++++++++++++
 lib/PublicInbox/View.pm    | 58 ++++++++---------------------------------
 t/linkify.t                | 26 +++++++++++++++++++
 4 files changed, 104 insertions(+), 47 deletions(-)
 create mode 100644 lib/PublicInbox/Linkify.pm
 create mode 100644 t/linkify.t
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git