path: root/lib/PublicInbox/Linkify.pm
Date  Commit message
2023-01-30 use Net::SSLeay (OpenSSL) for SHA-(1|256) if installed
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as the Digest::SHA implementation from Perl, most likely due to an optimized assembly implementation. SHA-1 is a few percent faster, too.
2022-08-28 linkify: avoid digits and dashes in placeholders
The `highlight' module seems to highlight every digit in YAML (and possibly other) source files. This causes problems in linkify_2 which replaces the placeholders with proper URIs. I suspect `-' and other punctuation characters will cause similar problems, so we must stick to [A-Za-z]. Thus transliterate 0-9 to A-J in the hex key to ensure highlight doesn't see digit characters, and rename the prefix to be project-name independent.
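The digit-to-letter transliteration described above can be sketched in Python. This is only an illustration, not the module's Perl code: the `LINKIFY` prefix, the use of SHA-1 for the hex key, and the function name `placeholder_key` are assumptions made for the example.

```python
import hashlib

# Map the digits 0-9 to the letters A-J so a hex string contains letters only.
DIGITS_TO_LETTERS = str.maketrans("0123456789", "ABCDEFGHIJ")


def placeholder_key(url: str, prefix: str = "LINKIFY") -> str:
    """Return a letters-only placeholder for `url`.

    The prefix and the choice of SHA-1 are hypothetical; the point is that
    the result stays within [A-Za-z], so a syntax highlighter has no digits
    or punctuation to wrap in markup.
    """
    hex_key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return prefix + hex_key.translate(DIGITS_TO_LETTERS)


key = placeholder_key("https://example.com/a-1")
print(key, key.isalpha())
```

Because hex digests use only `0-9` and `a-f`, the translated key uses only `A-J` and `a-f`, which keeps it distinct from the uppercase prefix and easy to match again later.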
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-06-27 linkify: support imap, imaps, news, and snews URIs
Since we'll have an IMAP server released soon, maybe imaps:// and imap:// URLs can become popular. news:// is defined with nntp:// in RFC 5538, and we can at least support the news:// form in rendered HTML. snews:// may appear in old mail archives, too, so we'll attempt to support it in case clients do.
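As a rough illustration of scheme matching, the Python sketch below builds one alternation from a list of schemes. The list combines schemes named in this log (imap, imaps, news, snews, nntp, nntps, ftps, gopher) with http, https and ftp, which are assumed here; the names `SCHEMES`, `SCHEME_RE` and `schemes_found` are hypothetical and this is not the module's actual pattern.

```python
import re

# Schemes named in the commit log, plus http/https/ftp (assumed).
SCHEMES = (
    "http", "https", "ftp", "ftps", "nntp", "nntps",
    "imap", "imaps", "news", "snews", "gopher",
)

# \b keeps "news" from matching inside "snews"; the "://" that follows
# forces the regex engine to backtrack from "imap" to "imaps" when needed.
SCHEME_RE = re.compile(r"\b(?:%s)://\S+" % "|".join(SCHEMES))


def schemes_found(text: str) -> list[str]:
    """Return the scheme of every recognized URL in `text`, in order."""
    return [m.group(0).split("://", 1)[0] for m in SCHEME_RE.finditer(text)]


print(schemes_found("read imaps://mail.example/inbox or snews://news.example/group"))
```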
2020-04-02 mid: add $MID_EXTRACT regexp for export
This allows us to consistently enforce the same Message-ID extraction rules everywhere and makes it easier for us to make changes in the future. Update scripts/ssoma-replay, as well, but don't rely on PublicInbox::* modules in that since it's legacy and public-inbox was never a dependency of ssoma.
2020-02-16 view: escape ampersand in Message-IDs
We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it; I only noticed this bug because I started killing off Hval->new* callers.
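A hedged Python sketch of the idea follows. The name `mid_href` comes from the commit message, but its exact escaping rules are not stated here, so the choice to percent-encode everything except unreserved characters and '@' is an assumption for the example.

```python
from urllib.parse import quote


def mid_href(mid: str) -> str:
    """Percent-encode a Message-ID for use inside an href attribute.

    Assumption: only unreserved characters and '@' are left as-is, so
    '&', '/', '<', '>' and '"' all become %XX escapes.
    """
    return quote(mid, safe="@")


print(mid_href("a&b/c@example.com"))
```

Percent-encoding '&' sidesteps both problems the message mentions: the attribute value holds no bare ampersand for an HTML parser to misread, and the URL itself stays valid.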
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27 linkify: move to_html over from ViewDiff
We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
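The two-step idiom can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the module's Perl: `URL_RE`, the `LINKIFY` prefix and the body of `to_html` are hypothetical. URLs are first swapped for letters-only placeholders, the remaining text is HTML-escaped, and the placeholders are then replaced by anchors.

```python
import hashlib
import html
import re

URL_RE = re.compile(r"https?://[^\s<>\"]+")  # simplified, assumed pattern
DIGITS_TO_LETTERS = str.maketrans("0123456789", "ABCDEFGHIJ")
PLACEHOLDER_RE = re.compile(r"LINKIFY[A-Ja-f]{40}")


def to_html(text: str) -> str:
    found = {}  # placeholder -> original URL

    def stash(match):
        url = match.group(0)
        digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
        key = "LINKIFY" + digest.translate(DIGITS_TO_LETTERS)
        found[key] = url
        return key

    def restore(match):
        url = found.get(match.group(0))
        if url is None:  # looks like a placeholder but is ordinary text
            return match.group(0)
        escaped = html.escape(url)
        return f'<a href="{escaped}">{escaped}</a>'

    # Step 1: hide URLs so that escaping cannot mangle them.
    text = URL_RE.sub(stash, text)
    # Escape everything else; placeholders contain nothing to escape.
    text = html.escape(text)
    # Step 2: turn each placeholder into an anchor.
    return PLACEHOLDER_RE.sub(restore, text)


print(to_html("see <https://example.com/?a=1&b=2> now"))
```

Doing the escape between the two steps means the angle brackets around the URL are escaped exactly once, while the generated `<a>` tags are never escaped.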
2020-01-27 linkify: compile $LINK_RE once
This gives a 3-4% performance improvement in xt/perf-msgview.t with a mirror of https://public-inbox.org/meta/
2019-10-28 linkify: support adding "(raw)" link for Message-IDs
And use it for the per-message permalink display.
2019-10-28 view: display redundant headers in permalink
Mail headers can contain multiple headers of any type, so ensure we don't hide any information we're getting in the per-message permalink views. This means it's possible to have multiple From, Date, To, Cc, Subject, and In-Reply-To headers displayed. The thread indices are a special case, I guess, since we run out of space on the line if the headers are too long, and tools like mutt only show the first one.
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-04 linkify: support Internationalized Domain Names in URLs
The "\w" character class in Perl matches any word characters in the Unicode database, not just ASCII characters. So we must be prepared for that and generate links to IDNs.
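Python's `re` module behaves similarly for `str` patterns, so the effect can be shown with a small sketch. The host pattern below is a simplified assumption, not the module's regexp; `re.ASCII` is used only to show what an ASCII-only `\w` would capture instead.

```python
import re

# For str patterns, \w matches Unicode word characters by default.
HOST_RE = re.compile(r"https?://[\w.-]+")
# The same pattern restricted to ASCII word characters, for comparison.
ASCII_HOST_RE = re.compile(r"https?://[\w.-]+", re.ASCII)

text = "see http://bücher.example/ for details"

unicode_match = HOST_RE.search(text).group(0)
ascii_match = ASCII_HOST_RE.search(text).group(0)

print(unicode_match)  # the whole internationalized host name
print(ascii_match)    # stops at the first non-ASCII letter
```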
2019-04-18 linkify: require parentheses pairs in URLs
Dangling parentheses with trailing punctuation usually mean the parentheses are not intended as part of the URL.
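One way to picture the pairing rule is a post-processing step that drops unmatched closing parentheses from the end of a candidate URL. The Python sketch below uses a counting approach chosen for clarity; the module itself works with regular expressions, and the function name `trim_unpaired_parens` is hypothetical.

```python
def trim_unpaired_parens(url: str) -> str:
    """Drop trailing ')' characters that have no matching '(' in `url`."""
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url


# A paired parenthesis is part of the URL and is kept.
print(trim_unpaired_parens("https://en.wikipedia.org/wiki/Git_(software)"))
# A dangling one most likely closes the surrounding sentence or Markdown link.
print(trim_unpaired_parens("https://example.com/page)"))
```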
2019-04-18 linkify: don't get confused by URLs in Perl code, at least
The URLs at the top of WwwStream.pm weren't getting linkified correctly.
2019-02-01 linkify: support proto://hostname without trailing slash
Sometimes users will write "http://example.com" without the trailing slash, which every browser and tool I've tested seems to understand.
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-06-23 linkify: handle URLs in parenthesized statements
Sometimes, URLs exist at the end of parenthesized statements, and we shouldn't unnecessarily capture the closing parenthesis. (example: https://public-inbox.org/ruby-core/20170623032722.GA8124@dcvr/)
2016-12-24 linkify: modify argument in place
This results in over 1% speedup doing $MESSAGE_ID/T/ HTML generation for a 368-message thread.
2016-12-06 linkify: implement Markdown link compatibility (again)
Although unescaped parentheses in URLs are technically allowed, they are uncommon. However, Markdown-like syntaxes are unfortunately common for URLs, so we might as well support them. This fixes parentheses detection at sentence endings, as seen in practice on emails.
2016-12-06 Revert "linkify: implement Markdown link compatibility"
This reverts commit 130d0c4e33c5c73dc69e270fc698735d49e0f159.
2016-12-06 linkify: implement Markdown link compatibility
Although unescaped parentheses in URLs are technically allowed, they are uncommon. However, Markdown-like syntaxes are unfortunately common for URLs, so we might as well support them.
2016-08-18 linkify: be stricter about matching RFC 3986
We're not to-the-letter about percent-encoding, but we should allow all the characters. This is mainly so we can effectively use the link to some Wikipedia pages with parentheses in them: https://en.wikipedia.org/wiki/Atom_(standard) https://en.wikipedia.org/wiki/Git_(software)
2016-07-02 linkify: allow '!' in URLs
GoogleGroups URLs often contain '!' in them
2016-05-01 linkify: match more URL characters [:,\$] and schemes
Adding ':' (colon), ',' (comma), '$' (dollar sign) and supporting TLS-enabled schemes: ftps, nntps variants as well as gopher :D
2016-05-01 linkify: match '~' (tilde) in URLs
Tilde is common for some homepages: http://example.org/~user/ There are probably some other acceptable characters I'm missing.
2016-03-01 linkify: do not capture trailing '.' or ';' in URLs
It seems common for users to end statements with URLs, while it is rare for a URL itself to end with a '.' or ';'. So make a guess and assume the URL was intended to not include the trailing '.' or ';'.
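The guess can be sketched in Python by stripping sentence punctuation from the end of each match. `URL_RE` and `find_urls` are hypothetical names and the pattern is a simplified assumption; the module may instead express the rule inside its regexp.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"]+")  # simplified, assumed pattern


def find_urls(text: str) -> list[str]:
    """Return URLs found in `text`, without any trailing '.' or ';'."""
    return [m.group(0).rstrip(".;") for m in URL_RE.finditer(text)]


print(find_urls("See https://example.com/a.html. Or http://example.org/x;"))
```

Dots inside the URL, such as the one in `a.html`, are untouched because only the end of each match is trimmed.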
2016-03-01 extract linkification code to a separate package
This will allow us to more easily reuse it elsewhere.