Date | Commit message |
|
On my x86-64 machine, OpenSSL SHA-256 is nearly twice as fast as
the Digest::SHA implementation from Perl, most likely due to an
optimized assembly implementation. SHA-1 is a few percent
faster, too.
|
|
The `highlight' module seems to highlight every digit in
YAML (and possibly other) source files. This causes problems
in linkify_2 which replaces the placeholders with proper URIs.
I suspect `-' and other punctuation characters will cause
similar problems, so we must stick to [A-Za-z].
Thus transliterate 0-9 to A-J in the hex key to ensure highlight
doesn't see digit characters, and rename the prefix to be
project-name independent.
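
The transliteration can be sketched as follows, in Python rather
than the project's Perl. The name `letters_only_key` and the use
of SHA-1 for the key are illustrative assumptions, not taken from
the actual code.

```python
import hashlib

# Map the digits 0-9 to the letters A-J so the key is purely [A-Za-z].
DIGITS_TO_LETTERS = str.maketrans("0123456789", "ABCDEFGHIJ")

def letters_only_key(url: str) -> str:
    """Return a hex digest of `url` with every digit replaced by a letter."""
    hex_key = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return hex_key.translate(DIGITS_TO_LETTERS)

key = letters_only_key("https://example.com/")
```

Because the hex digits a-f are lowercase and the substituted
letters A-J are uppercase, distinct hex keys stay distinct after
the transliteration.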
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
Since we'll have an IMAP server released soon, maybe imaps://
and imap:// URLs can become popular.
news:// is defined with nntp:// in RFC 5538, and we can at least
support the news:// form in rendered HTML. snews:// may appear
in old mail archives, too, so we'll attempt to support it in
case clients do.
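
As a rough Python illustration of matching these schemes (the
project itself is Perl), the pattern below is an assumption and
not the real regexp; `find_urls` is a hypothetical helper.

```python
import re

# Schemes named in the message above; the real list may differ.
SCHEMES = ("imaps", "imap", "nntps", "nntp", "snews", "news")
SCHEME_RE = re.compile(r"\b(?:%s)://[^\s<>\"]+" % "|".join(SCHEMES))

def find_urls(text):
    """Return every URL in `text` that uses one of SCHEMES."""
    return SCHEME_RE.findall(text)

urls = find_urls(
    "see news://news.example.org/comp.lang.perl "
    "and imaps://mail.example.org/INBOX too"
)
```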
|
|
This allows us to consistently enforce the same Message-ID
extraction rules everywhere and makes it easier for us to
make changes in the future.
Update scripts/ssoma-replay, as well, but don't rely on
PublicInbox::* modules in that since it's legacy and
public-inbox was never a dependency of ssoma.
|
|
We need to escape ampersands (and some other characters for href
attributes), so introduce a `mid_href' sub to do just that.
'<', '>' and '"' were always escaped, so there's no risk of tag
or attribute injection, but creative Message-IDs could cause
confusion for some parsers and generate invalid URLs.
Start getting rid of the bloated, over-engineered OO Hval API
while we're at it; I only noticed this bug because I started
killing off Hval->new* callers.
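
The idea behind such a sub can be sketched in Python: percent-encode
the Message-ID first, then HTML-escape the result for the attribute.
The set of characters left unescaped here is an assumption; the
message does not say which characters the Perl `mid_href` handles.

```python
from html import escape
from urllib.parse import quote

def mid_href(mid: str) -> str:
    # Percent-encode everything outside a conservative safe set
    # (an assumption), then HTML-escape the result so it can sit
    # inside a double-quoted href attribute.
    return escape(quote(mid, safe="@.-_~"), quote=True)

href = mid_href("a&b<c>@example.com")
```

With this ordering, an ampersand in a Message-ID becomes `%26` and
never reaches the HTML as a bare `&`.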
|
|
I didn't wait until September to do it, this year!
|
|
We use the same idiom in many places for doing two-step
linkification and HTML escaping. Get rid of an outdated
comment in flush_quote while we're at it.
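
A minimal Python sketch of the two-step idiom, assuming a simple
URL pattern and a hypothetical placeholder format (`LINKIFY...Z`);
neither is taken from the Perl code.

```python
import re
from html import escape

URL_RE = re.compile(r"https?://[^\s<>\"]+")
KEY_RE = re.compile(r"LINKIFY[A-J]+Z")

def linkify_and_escape(text: str) -> str:
    urls = {}

    def stash(match):
        # Replace the URL with a letters-only placeholder key.
        digits = str(len(urls))
        key = "LINKIFY" + "".join("ABCDEFGHIJ"[int(d)] for d in digits) + "Z"
        urls[key] = match.group(0)
        return key

    def restore(match):
        # Turn each known placeholder into an anchor tag.
        url = urls.get(match.group(0))
        if url is None:
            return match.group(0)  # not one of ours; leave it alone
        href = escape(url, quote=True)
        return '<a href="%s">%s</a>' % (href, href)

    with_keys = URL_RE.sub(stash, text)
    escaped = escape(with_keys, quote=True)  # escape everything else
    return KEY_RE.sub(restore, escaped)

html = linkify_and_escape("<see> https://example.com/?a=1&b=2 now")
```

Escaping happens between the two substitutions, so the anchor tags
themselves are never escaped while the surrounding text always is.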
|
|
This gives a 3-4% performance improvement in xt/perf-msgview.t
with a mirror of https://public-inbox.org/meta/
|
|
And use it for the per-message permalink display.
|
|
Mail headers can contain multiple headers of any type, so ensure
we don't hide any information we're getting in the per-message
permalink views.
This means it's possible to have multiple From, Date, To, Cc,
Subject, and In-Reply-To headers displayed.
The thread indices are a special case, I guess, since we run
out of space on the line if the headers are too long and tools
like mutt only show the first one.
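
For illustration, Python's standard email package makes the same
distinction between reading one value and reading every occurrence
of a header. The sample message is made up.

```python
from email import message_from_string

raw = (
    "From: a@example.com\n"
    "To: b@example.com\n"
    "To: c@example.com\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
msg = message_from_string(raw)

# msg["To"] yields a single value even when the header repeats;
# get_all returns every occurrence in order.
all_to = msg.get_all("To", [])
all_cc = msg.get_all("Cc", [])
```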
|
|
|
|
The "\w" character class in Perl matches any word characters
in the Unicode database, not just ASCII characters. So we
must be prepared for that and generate links to IDNs.
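
Python's `re` module behaves much the same way for `str` patterns,
which makes for a short illustration. Converting the host to its
ASCII form with the built-in `idna` codec is one possible way to
build the link and is an assumption here, not necessarily what the
Perl code does.

```python
import re

host = "bücher.example"

# By default, \w in a str pattern matches Unicode word characters.
unicode_match = re.fullmatch(r"[\w.-]+", host)
# With re.ASCII, \w is limited to [A-Za-z0-9_], so the same host fails.
ascii_match = re.fullmatch(r"[\w.-]+", host, flags=re.ASCII)

# ASCII-compatible form of the internationalized domain name.
ascii_host = host.encode("idna").decode("ascii")
```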
|
|
A dangling parenthesis with trailing punctuation usually means the
parenthesis is not intended as part of the URL.
|
|
The URLs at the top of WwwStream.pm weren't getting linkified
correctly.
|
|
Sometimes users will write "http://example.com" without the
trailing slash, which every browser and tool I've tested seems
to understand.
|
|
Using update-copyrights from gnulib
While we're at it, use the SPDX identifier for AGPL-3.0+ to
ease mechanical processing.
|
|
Sometimes, URLs exist at the end of parenthesized statements,
and we shouldn't unnecessarily capture the closing parenthesis.
(example: https://public-inbox.org/ruby-core/20170623032722.GA8124@dcvr/)
|
|
This results in over 1% speedup doing $MESSAGE_ID/T/ HTML
generation for a 368-message thread.
|
|
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
This fixes parentheses detection at sentence endings, as seen
in practice on emails.
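
One possible heuristic, sketched in Python: keep a closing
parenthesis only when it is balanced by an opening one inside the
URL. `trim_unbalanced_paren` is a hypothetical name, and the real
regexp-based logic may differ.

```python
def trim_unbalanced_paren(url: str) -> str:
    # Drop trailing ')' characters that have no matching '(' in the URL.
    while url.endswith(")") and url.count(")") > url.count("("):
        url = url[:-1]
    return url

# A Markdown-like "[text](https://example.com/)" leaves a stray ')'.
markdown_url = trim_unbalanced_paren("https://example.com/)")
# Balanced parentheses, as in some Wikipedia page names, are kept.
wiki_url = trim_unbalanced_paren("https://en.wikipedia.org/wiki/Git_(software)")
```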
|
|
This reverts commit 130d0c4e33c5c73dc69e270fc698735d49e0f159.
|
|
Although unescaped parentheses in URLs are technically allowed,
they are uncommon. However, Markdown-like syntaxes are
unfortunately common for URLs, so we might as well support them.
|
|
We're not to-the-letter about percent-encoding, but
we should allow all the characters. This is mainly
so we can link to some Wikipedia pages with
parentheses in them:
https://en.wikipedia.org/wiki/Atom_(standard)
https://en.wikipedia.org/wiki/Git_(software)
|
|
GoogleGroups URLs often contain '!' in them
|
|
Adding ':' (colon), ',' (comma), '$' (dollar sign) and
supporting TLS-enabled schemes: ftps, nntps variants as
well as gopher :D
|
|
Tilde is common for some homepages: http://example.org/~user/
There are probably some other acceptable characters I'm missing.
|
|
It seems common for users to end statements with URLs,
while it is rare for a URL itself to end with a '.' or ';'.
So make a guess and assume the URL was intended to not
include the trailing '.' or ';'
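
The guess can be sketched in Python as a post-processing step on
each match. The URL pattern and the name `extract_urls` are
assumptions for illustration only.

```python
import re

URL_RE = re.compile(r"https?://[^\s<>\"]+")

def extract_urls(text):
    # rstrip removes any run of trailing '.' or ';' from each match.
    return [m.rstrip(".;") for m in URL_RE.findall(text)]

found = extract_urls("See https://example.com/a.html. Or http://example.org/x;")
```

Dots and semicolons inside the URL are untouched; only the ones at
the very end are assumed to belong to the sentence.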
|
|
This will allow us to more easily reuse it elsewhere.
|