about summary refs log tree commit homepage
path: root/lib/PublicInbox/Hval.pm
DateCommit message (Collapse)
2021-04-27lei q + lcat: support --format=text output
This is mainly for "lei lcat" where it's the default, but I find it useful anyways compared to the JSON view. Colors are loaded from ~/.config/lei/config, and fall back to using diff colors from a normal git config (e.g. ~/.gitconfig).
2021-04-11www: do not obfuscate addresses in URLs
As they are likely Message-IDs. If an email address ends up in a URL, then it's likely public, so there's even less reason to obfuscate that particular address. [km: add xt/perf-obfuscate.t] [ew: modernize perf test (5.10.1), use diag instead of print] This version of the patch avoids the massive slowdown noted by Kyle in <https://public-inbox.org/meta/87wnt9or6t.fsf@kyleam.com/>. Performance remains roughly the same, if not slightly faster (which may be due to me testing this on a busy server). Results from xt/perf-obfuscate.t against 6078 messages on a local mirror of <https://public-inbox.org/meta/>: before: 6.67 usr + 0.04 sys = 6.71 CPU after: 6.64 usr + 0.04 sys = 6.68 CPU Reported-by: Kyle Meyer <kyle@kyleam.com> Helped-by: Kyle Meyer <kyle@kyleam.com> Link: https://public-inbox.org/meta/87a6q8p5qa.fsf@kyleam.com/
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-08-28www: improve navigation around contemporary threads
Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by making: a) the top line (where $INBOX_DIR/description) is shown a link to the latest topics in search results and per-thread/per-message views. b) providing a link to contemporaries ("~YYYY-MM-DD") at around the thread overview skeleton area for per-thread and per-message views
2020-07-10hval: to_filename: return `undef' instead of empty string
Returning an empty string for a filename makes no sense, so instead return `undef' so the caller can setup a fallback using the "//" operator. This fixes uninitialized variable warnings because split() on an empty string returns `undef', which caused to_filename to warn on s// and tr// ops.
2020-04-09triewyde: ficks soem speling errrors
Dikshunarees R gude!
2020-04-07view: do not redundantly obfuscate addresses
We shouldn't rerun the address obfuscator on data we've already run through. Instead, run through the unescaped text part and substitute the UTF-8 "\x{2022}" substitution before it hits HTML escaping Fixes: 9bdd81dc16ba6511 ("view: msg_iter calls add_body_text directly")
2020-02-24hval: ascii_html: drop CRLF => LF conversion
Instead, we add CRLF conversion to the only remaining place which needs it, ViewVCS. This save many redundant ops in in many places. The only other place where this mattered was in View::add_text_body, but we already started doing CRLF conversions when we added diff parsing and link generation for ViewVCS. Otherwise, all other places we used this was for header viewing and Email::MIME doesn't preserve CRLF in headers.
2020-02-16view: remove last Hval->new caller
The object-oriented Hval API turned out to be less useful and more clunky than I envisioned years ago, so get rid of it. We'll no longer strip trailing whitespace from From: headers in the HTML display, but I doubt anybody cares.
2020-02-16view: escape ampersand in Message-IDs
We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it, I only noticed this bug because I started killing off Hval->new* callers.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-23hval: from_attr: move to unit test
We don't call from_attr anywhere outside of tests, so don't bloat normal processes with it.
2020-01-23hval: to_attr: support wide characters
We need to escape wide characters when making attribute names from filename-looking things in diffstats.
2020-01-06hval: export prurl and add prototype
This allows to do some compile-time checking and fills in a missing "use" in PublicInbox::NewsWWW, allowing it to be used standalone and independently of PublicInbox::WWW
2020-01-02config: support multi-value inbox.*.*url
Since the beginning of this project, we've implicitly supported inboxes with multiple URLs by relying on the Host: header sent by the client ($env->{HTTP_HOST}). We now offer the option to explicitly configure multiple URLs for every inbox along with the ability to do a best-effort match for matching hostnames.
2019-10-31hval: replace "&apos;" with "&#39;" for compatibility
While testing 216light.css changes, I managed to hit some cases where dillo failed to render &apos; correctly, but I also can't reproduce it reliably. Anyways, it's definitely a problem with some old browsers and newer versions of highlight already work around it, but Debian 10.x has 3.41, so use "&#39;" to maximize compatibility.
2019-10-22hval: remove new_oneline
commit 476fc666c223f0fb ('reduce "PublicInbox::Hval->new_oneline" use') was mis-titled, since it completely eliminated ->new_oneline use.
2019-09-09run update-copyrights from gnulib for 2019
2019-06-04www: only emit ASCII chars in attachment filenames
We don't want to emit funky URLs which can be lost in translation or cause problems with non-Unicode-aware clients. Then, don't accept non-ASCII filenames in URLs, since a manually-generated URL/filename in attachment downloads could be used for Unicode homographs to confuse folks who down the attachment.
2019-02-01hval: routines for attribute escaping
We'll use HTML attributes + anchor links to link to filenames in coming commits.
2019-01-28hval: add src_escape for highlight post-processing
We need to post-process "highlight" output to ensure it doesn't contain odd bytes which cause "wide character" warnings or require odd glyphs in source form.
2019-01-21hval: split out escape sequences to a separate table
We'll want to handle those escape sequences independently, "highlight" already does HTML escaping.
2019-01-20www: admin-configurable CSS via "publicinbox.css"
Maybe we'll default to a dark theme to promote energy savings... See contrib/css/README for details
2019-01-19hval: force monospace for <form> elements, too
Same reasoning as commit 7b7885fc3be2719c068c0a2fc860d53f17a1d933, because GUI browsers have a tendency to use a different font-family (and thus different size) as the rest of the page.
2019-01-01hval: set font-size:100% for all elements
GUI browsers have a tendency to use a larger (though sometimes smaller) font than the rest of the page for some reason I could not find... So set everything to 100% to give uniformity to the page; which benefits visually-challenged users who want to use gigantic fonts for the entire page.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2018-01-29reply: follow obfuscation rules for HTML in sh args
Namely, we do not want to obfuscate the mail address of the site itself.
2018-01-16hval: only allow domain obfuscation in address
Obfuscating username portions of the email address leads to having subsequent parts of the address not being obfuscated; which could mean we show someone else's email entirely. In other words, obfuscating "john.doe@example.com" becomes might mean "doe@example.com" is picked up by scanners. In other news, email address obfuscation is still a horrible usability issue and only exists to appease misguided people.
2017-10-04mbox: support inline filename via Content-Disposition header
This is hopefully more sensical than "raw" files from resulting downloads.
2017-06-29hval: only perform one substitution when obfuscating
Only one substitution character is necessary when obfuscating email addresses.
2017-06-23allow admins to configure non-obfuscated addresses/domains
We will also treat all known list addresses as non-obfuscated. By setting publicinbox.noObfuscate in ~/.public-inbox/config, this will allow users to disable address obfuscation on a per-domain or per-address basis.
2017-06-16view: implement optional address obfuscation
This is lightly-tested and seems to work. I'm still hesitant to support this, but the alternative of receiving death threats for displaying unobfuscated addresses seems to be not worth it.
2016-08-14www: do not double-clean Message-IDs from internal DBs
Ensure we usually strip one level of '<>' from Message-IDs, since our internal SQLite, Xapian, and SHA-1 storage all assume that. Realistically, we screw up if somebody has '<<' or '>>', but those are screwed up mail clients and we can deal with it another time. Currently, this means some messages with '>>' in References or Message-Id are not handled correctly, yet, but we match the behavior of Mail::Thread in keeping the extra '>'.
2016-08-14www: do not unecessarily escape some chars in paths
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '; '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
2016-08-14www: ensure XML validity for some odd ASCII chars
I've seen 0x1b (\e) in at least one message and some other possibly non-printable chars. In any case, make sure they're valid XML with us-ascii encoding as far as xmlstarlet(1) thinks so.
2016-07-06hval: get rid of unused parameter for new_msgid
Exposing compressed Message-IDs in URLs was a mistake, remove a remnant of it.
2016-05-18feed: inline feed entry generation
Remove unnecessary wrapper subroutines and constants which are only used once.
2016-03-12reduce "PublicInbox::Hval->new_oneline" use
It's probably a bad idea to strip extraneous whitespace from some headers as an extra space may convey useful information. Newlines don't seem to be preserved by Email::MIME or Email::Simple anyways, so there's no danger in breaking formatting.
2016-02-25hval: implement common UI for protocol-relative URLs
This allows users to avoid HTTPS -> HTTP downgrade warnings, but we will also avoid encouraging them towards HTTPS, for now. IMHO: the CA system gives a false sense of security, TLS libraries (e.g. OpenSSL) can introduce new bugs and problems (even to attack clients), and TLS libraries also eats memory on cheap servers.
2016-01-09hval: new should not strip leading spaces
We should be able to use this for ASCII art and paragraphs
2015-12-25view: favor whitespace wrap in <head>
If we bite the bullet and rely on inline CSS, we might as well only specify it once per page instead of inline in every <pre> tag which may handle UGC. So this actually saves us a small amount of bandwith on most pages which have multiple <pre> start tags.
2015-12-22hval: move PRE constant for wrapping UGC here
User-generated content (UGC) may have excessively long lines which screw up rendering. This is the only bit of CSS we use.
2015-11-20various internal documentation updates
Hopefully this gives new hackers a better overview of how the components relate to each other.
2015-09-30remove unnecessary fields usage
It doesn't actually give performance improvements unless we use types with "my", but we don't do that. We'll only continue using fields with Danga::Socket-derived classes where they're required.
2015-09-06update copyright headers and email addresses
In the future, it should be possible to use this: git ls-files | UPDATE_COPYRIGHT_HOLDER='all contributors' \ UPDATE_COPYRIGHT_USE_INTERVALS=2 \ xargs /path/to/gnulib/build-aux/update-copyright
2015-09-03get rid of Message-ID compression entirely
Provide a fallback for legacy SHA-1 messages, but do not advertise shorter URLs anymore for data portability concerns. This fixes a regression introduced in commit 81a9c1b476987d845b340ab9013d26cf4487cb9a ("search: disable Message-ID compression in Xapian") which ended up breaking thread-related endpoints for large Message-IDs, as lookups on the SHA-1 message no longer worked.
2015-09-02view: avoid links to unknown compressed Message-IDs
Compressed Message-IDs are irreversible and may not be used at other sites. So avoid compressing Message-IDs we do not know about so users have a chance of finding the message in other archives by doing a Message-ID lookup.
2015-08-25mid: mid_compressed => mid_compress
Consistently name mid_* functions as verbs.
2015-08-15extract redundant Message-ID handling code
Quit repeating ourselves and use a common MID module instead.
2014-09-15hval: fixup bad line endings in HTML output
We should do this in filter, too, but sometimes we prefer to avoid filtering the message at all.