path: root/lib/PublicInbox/View.pm
Date Commit message
2020-01-27 viewdiff: rewrite and simplify
Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely because the longer messages there have larger contiguous groups of lines sharing the same prefix (or no prefix at all), which drastically reduces the number of subroutine calls and Perl ops executed.
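The grouping idea can be sketched as follows. This is a minimal illustration and not the ViewDiff code: the `group_lines` name, the sample input, and the two-prefix regexp are assumptions, and the real regexp handles far more cases.

```perl
use strict;
use warnings;

# Hypothetical helper: split a body into runs of contiguous lines that
# share a '+' or '-' prefix; everything else is left as context.
# The single outer capture group makes split() return the matched runs
# as well as the text between them.
sub group_lines {
    my ($body) = @_;
    grep { length } split(/((?:^\+[^\n]*\n)+|(?:^-[^\n]*\n)+)/m, $body);
}

my $body = "context a\n+added 1\n+added 2\n-removed\ncontext b\n";
for my $group (group_lines($body)) {
    my $class = $group =~ /\A\+/ ? 'add' : $group =~ /\A-/ ? 'del' : 'ctx';
    my $nr = () = $group =~ /\n/g; # count lines in this group
    print "$class: $nr line(s)\n";
}
```

Each group can then be wrapped in one <span> and </span> pair, so the pairing is decided once per group rather than once per line.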
2020-01-27 linkify: move to_html over from ViewDiff
We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-27 view: inline and eliminate msg_html
No need to keep the old sub around, anymore. Rename auxiliary subs to "msg_page_*" instead of the "html" version.
2020-01-27 view: start performing buffering into {obuf}
Get rid of the confusingly named {rv} and {tip} fields and unify them into {obuf} for readability. {obuf} usage may be expanded to more areas in the future. This will eventually make it easier for us to experiment with alternative buffering schemes.
2020-01-27 view: simplify duplicate Message-ID handling
It's an uncommon code path, no need to make it more complex than it needs to be by having extra sub parameters.
2020-01-27 view: thread_skel: drop constant tpfx parameter
It hasn't changed in a few years. Now we can rely on constant folding to avoid extraneous ops to the $skel buffer.
2020-01-27 view: reduce parameters for html_footer
Put more logic into html_footer and less in its only caller so we can control the buffering and string creation.
2020-01-27 view: improve readability around walk_thread
Pass \&coderefs explicitly to walk_thread, and add some prototypes + comments to describe what goes on.
2020-01-27 www: use "skel" terminology consistently
This saves us a few comments and confusion. Yes, it's a destination so "dst" can be appropriate, but we may be using that term elsewhere.
2020-01-25 spelling: favor `publicly' over `publically'
While both can be correct, the former seems more common, is shorter, and is also consistent with the spelling found in the AGPL-3.0 text.
2020-01-12 www: discard multipart parent on iteration
We're often iterating through messages while writing to another buffer in our WWW interface, causing memory usage to multiply. Since we know we won't need to keep the MIME object around in some cases, we can tell msg_iter to clobber the on-stack variable while it operates on subparts of multipart messages. With xt/mem-msgview.t switched to multipart from the previous commit, this shows a 13 MB memory reduction on that test.
2020-01-06 view: update POSIX::strftime usage
The POSIX module is always loaded, so import `strftime' into the namespace so we can use it and take advantage of compile-time arg checking. While we're at it, update and reorder caller functions to use prototypes, too.
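A minimal sketch of the pattern, assuming a hypothetical caller named `fmt_ts`; the format string is also an assumption. The sketch only shows the import and a prototyped caller. A `($)` prototype lets perl reject calls with the wrong number of arguments at compile time, provided the sub is declared before the call site.

```perl
use strict;
use warnings;
use POSIX qw(strftime); # import the name instead of calling POSIX::strftime

# Hypothetical caller with a ($) prototype: exactly one scalar argument.
sub fmt_ts ($) {
    my ($ts) = @_;
    strftime('%Y-%m-%d %H:%M', gmtime($ts));
}

print fmt_ts(0), "\n";
```

With the prototype in place, a later `fmt_ts(1, 2)` fails to compile rather than failing (or silently misbehaving) at run time.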
2020-01-06 hval: export prurl and add prototype
This allows some compile-time checking and fills in a missing "use" in PublicInbox::NewsWWW, allowing it to be used standalone and independently of PublicInbox::WWW.
2020-01-05 view: msg_html: reduce memory use on reused MIDs
In rare cases where Message-IDs get reused, we do not want to hold onto the large Email::MIME objects in memory after showing the first message. So discard each message as soon as we're done using it so we can save memory for the next message. The new and expensive xt/mem-msgview.t test shows a nearly 14MB reduction for two ~7MB messages. run_script() also gets upgraded to make it easier to pass large inputs via IO GLOBs.
2019-12-27 view: msg_iter calls add_body_text directly
No need to waste several kilobytes creating an anonymous sub for every invocation of msg_iter.
2019-12-27 searchview: remove anonymous sub when sorting threads by relevance
We don't need to return a closure or have a separate hash for sorting threads by relevance. Instead, we can stuff the relevance {pct} into the SearchMsg object itself and use that. Note: upon reviewing this code, the sort-by-relevance seems bogus as it only considers the relevance of the topmost message. Instead, it would make more sense to the user to sort by the highest relevance of all messages in that particular thread.
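The alternative ordering suggested in the note could look like the sketch below. It is not what the commit implements. The data model (a thread as an array ref of messages, each with a {pct} field) and both sub names are assumptions made for illustration.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical data model: a thread is an array ref of messages, each a
# hash ref with a {pct} relevance field (undef for irrelevant messages).
sub thread_max_pct {
    my ($thread) = @_;
    max(map { $_->{pct} // 0 } @$thread) // 0;
}

# Sort threads by their best-matching message, highest first.
sub sort_threads_by_relevance {
    map { $_->[1] }
    sort { $b->[0] <=> $a->[0] }
    map { [ thread_max_pct($_), $_ ] } @_;
}

my @threads = (
    [ { mid => 'a', pct => 40 }, { mid => 'b', pct => 95 } ],
    [ { mid => 'c', pct => 80 } ],
);
print join(' ', map { $_->[0]{mid} } sort_threads_by_relevance(@threads)), "\n";
```

Sorting on the topmost message alone would put the second thread first (80 vs. 40); sorting on the per-thread maximum puts the first thread first (95 vs. 80).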
2019-12-27 view: thread_html: pass named sub to WwwStream
We can pass everything we need into the WWW $ctx to avoid allocating kilobytes of memory for an anonymous sub for every $MESSAGE_ID/t/ request.
2019-12-27 view: msg_html: stop using an anonymous sub
Stash 5 local variables into the WWW $ctx hash table instead of allocating several kilobytes for an anonymous sub.
2019-12-27 view: avoid anon sub in stream_thread
WwwStream already passes the WWW $ctx to the callback sub, so we don't need to create a new sub every call to capture local variables for the callback.
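The pattern shared by this group of commits can be sketched as below. The `stream` loop is a hypothetical stand-in for WwwStream, and `next_chunk` and the $ctx field names are likewise assumptions. The point is only that a named sub receiving $ctx needs no closure to be allocated per request.

```perl
use strict;
use warnings;

# Named callback: all state lives in the $ctx hash, nothing is captured.
sub next_chunk {
    my ($ctx) = @_;
    my $msg = shift(@{$ctx->{msgs}}) // return; # undef ends the stream
    "[$ctx->{prefix}] $msg\n";
}

# Hypothetical streaming loop standing in for WwwStream: it hands $ctx
# back to the callback on every call.
sub stream {
    my ($ctx, $cb) = @_;
    my $out = '';
    while (defined(my $chunk = $cb->($ctx))) {
        $out .= $chunk;
    }
    $out;
}

my $ctx = { prefix => 'inbox', msgs => [ 'first', 'second' ] };
print stream($ctx, \&next_chunk);
```

Because `\&next_chunk` refers to a sub compiled once, each request only pays for the values stored in $ctx.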
2019-12-21 searchview: save a column in &x=t thread skeleton
Displaying "100%" wastes a precious column. Show "99%" instead since there's little practical difference and <xapian/mset.h> states: "Note that these generally aren't percentages of anything meaningful (unless you use a custom weighting formula where they are!)" And we're not using a custom weighting formula.
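A minimal sketch of the clamping, assuming a hypothetical `pct_col` formatter; the actual column formatting in the code may differ.

```perl
use strict;
use warnings;

# Hypothetical formatter: clamp to 99 so the column never needs 3 digits.
sub pct_col {
    my ($pct) = @_;
    $pct = 99 if $pct > 99;
    sprintf('%2d%%', $pct);
}

print pct_col($_), "\n" for (100, 87, 5);
```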
2019-12-20 view: show percentage in search results thread skeleton
This displays the Xapian ->get_percent value in the skeleton to improve scanning of relevancy; irrelevant results do not display that. This fixes broken #anchor links introduced in the previous commit; irrelevant messages now link to the /$INBOX/$MESSAGE_ID page.
2019-12-20 searchthread: fix usage of user-supplied parameter
Instead of only passing an Inbox object, we'll pass the $ctx reference as PublicInbox::SearchView::mset_thread did. So although mset_thread was wrong, we now make its usage of SearchThread::thread correct and update other callers to favor the new style of passing the entire $ctx (with ->{-inbox}) instead of just the Inbox object. This makes the thread skeleton at the bottom of the search page show subjects of messages, but unfortunately it links to non-existent #anchors. The next commit will fix that. While we're at it, favor "\&foo" over "*foo" since the former makes the code reference (aka "function pointer") obvious so it won't be confused for other things named "foo" in that scope (e.g. $foo/@foo/%foo).
2019-10-28 view: show X-Alt-Message-ID in permalink view, too
Since we index X-Alt-Message-ID (because we need to placate some NNTP clients), we now display it as well, since that Message-ID could be the X-Alt-Message-ID that the reader is actually interested in.
2019-10-28 linkify: support adding "(raw)" link for Message-IDs
And use it for the per-message permalink display.
2019-10-28 view: improve warning for multiple Message-IDs
"refer" is not the correct term here, since that would mean multiple messages have the current message in the "References:" header, and that's a normal occurrence. Instead, we need to warn the reader that the given message itself has multiple Message-IDs.
2019-10-28 view: move '<' and '>' outside <a>
Browsers may underline '<' and '>' in links, which may be confused with '≤' and '≥'. So have the Message-ID header display follow what we do with In-Reply-To headers and move the "&lt;" and "&gt;" outside of <a> in the HTML.
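The resulting markup can be sketched as below. Both `esc_html` and `mid_link` are hypothetical helpers written for this illustration (the project has its own escaping routines), and the sample Message-ID and href are made up.

```perl
use strict;
use warnings;

# Minimal HTML escaping for the sketch (hypothetical helper).
sub esc_html {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s;
}

# Emit the angle brackets as escaped text outside the <a> element so
# only the Message-ID itself is underlined.
sub mid_link {
    my ($mid, $href) = @_;
    '&lt;<a href="' . esc_html($href) . '">' . esc_html($mid) . '</a>&gt;';
}

print mid_link('20191028@example.com', '../20191028@example.com/'), "\n";
```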
2019-10-28 view: display redundant headers in permalink
Mail headers can contain multiple headers of any type, so ensure we don't hide any information we're getting in the per-message permalink views. This means it's possible to have multiple From, Date, To, Cc, Subject, and In-Reply-To headers displayed. The thread indices are a special case, I guess, since we run out of space on the line if the headers are too long and tools like mutt only show the first one.
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-04 view: require YYYYmmDD(HHMMSS) timestamps to be ASCII
Passing digits to `timegm' which it does not understand would be a waste of time.
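A sketch of the distinction, assuming a hypothetical `parse_ts` helper. In Perl, `\d` can match non-ASCII digits in Unicode strings, while `[0-9]` only matches ASCII digits, so the latter rejects such input before it reaches `timegm`.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical parser for YYYYmmDD with an optional HHMMSS suffix.
# [0-9] is used instead of \d so only ASCII digits are accepted.
sub parse_ts {
    my ($s) = @_;
    $s =~ /\A([0-9]{4})([0-9]{2})([0-9]{2})
            (?:([0-9]{2})([0-9]{2})([0-9]{2}))?\z/x or return;
    my ($Y, $m, $d, $H, $M, $S) = ($1, $2, $3, $4 // 0, $5 // 0, $6 // 0);
    # timegm croaks on out-of-range fields; treat that as "no timestamp"
    my $t = eval { timegm($S, $M, $H, $d, $m - 1, $Y) };
    $t;
}

my $ascii = '20190604120000';
my $arabic_indic = "\x{0662}\x{0660}\x{0661}\x{0669}\x{0660}\x{0666}\x{0660}\x{0664}";
print defined(parse_ts($ascii)) ? "accepted\n" : "rejected\n";
print defined(parse_ts($arabic_indic)) ? "accepted\n" : "rejected\n";
```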
2019-06-04 www: only emit ASCII chars in attachment filenames
We don't want to emit funky URLs which can be lost in translation or cause problems with non-Unicode-aware clients. Then, don't accept non-ASCII filenames in URLs, since a manually-generated URL/filename in attachment downloads could be used for Unicode homographs to confuse folks who download the attachment.
2019-05-15 www: use Inbox->over where appropriate
We don't need to rely on Xapian search functionality for the majority of the WWW code, even. subject_normalized is moved to SearchMsg, where it (probably) makes more sense, anyways.
2019-04-23 view: avoid "1+ messages" in per-message footer of /t/ and /T/
Try to appear grammatically correct and state: "only message in thread" when there's only one known (to us) message in the thread.
2019-04-18 view: show "(no subject)" consistently in HTML
Empty subjects ("") and undefined Subject: headers are both displayed as "(no subject)" for now.
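A minimal sketch of the fallback, assuming a hypothetical `display_subject` helper; the real code path may apply it elsewhere.

```perl
use strict;
use warnings;

# Hypothetical helper: treat undef and "" the same way.
sub display_subject {
    my ($subj) = @_;
    (defined($subj) && $subj ne '') ? $subj : '(no subject)';
}

print display_subject($_), "\n" for (undef, '', 'hello');
```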
2019-04-16 cleanup: use '$ibx' consistently when referring to Inbox refs
'$inbox' is more human-readable, so it is reserved for the more human-readable name in most cases. Making our variable naming more consistent should make the code easier to review and harder to screw up.
2019-02-13 ensure bytes::length is available to callers
We were relying on Danga::Socket using the "bytes" pragma, previously. Nowadays, the "bytes" pragma is not recommended in general, but bytes::length remains acceptable for getting the byte-size of a scalar.
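A short sketch of the idiom. `use bytes ();` loads the module without enabling the pragma, so only the fully-qualified `bytes::length` becomes available. The `byte_size` wrapper and the sample string are assumptions for illustration.

```perl
use strict;
use warnings;
use bytes (); # load bytes::length without enabling the pragma

# Hypothetical wrapper: size of the scalar's internal byte representation.
sub byte_size {
    my ($s) = @_;
    bytes::length($s);
}

my $s = "caf\x{e9} \x{263a}"; # 6 characters, one of them above U+00FF
printf "chars=%d bytes=%d\n", length($s), byte_size($s);
```

Note that `bytes::length` reports the size of perl's internal representation; for data going onto the wire, explicitly encoding first (for example with Encode) makes the byte count unambiguous.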
2019-02-01 viewdiff: support renames and long paths in diffstat anchors
This is best-effort, but works well enough in practice for projects which use shell-friendly filenames as well as the long path names for some Linux kernel selftests.
2019-02-01 view: simplify quote splitting
Perl "split" can capture and group in the regexp itself, so rely on that to shorten our code. Comparing the /T/ HTML output of a thread from hell (on LKML with 1356 messages) reveals no difference in the rendered result. Only the HTML source differs in newline placement before/after the closing </span>. This allows a minor speedup on my X32 Thinkpad @ 1.6GHz with the aforementioned LKML thread from hell:
  before: 3.67s
  after: 3.55s
2019-02-01 view: fix broken hunk header hrefs in Atom feeds
We use absolute URLs in the Atom feeds (to ease syndication/mirroring), so hunk headers need to point to the solver URLs.
2019-02-01 view: diffstat anchors for multi-message/attachment views
diffstat <-> ^diff anchors work within the same attachment or message while in HTML views which display multiple messages.
2019-01-30 Merge remote-tracking branch 'origin/viewvcs' into master
* origin/viewvcs: (66 commits)
  solvergit: deal with alternative diff prefixes
  solvergit: extract mode from diff headers properly
  solvergit: avoid "Wide character" warnings
  solvergit: do not show full path names to "git apply"
  css/216dark: add comments and tweak highlight colors
  viewvcs: avoid segfault with highlight.pm at shutdown
  solvergit: do not solve blobs twice
  t/check-www-inbox: disable history
  t/check-www-inbox: don't follow mboxes
  t/check-www-inbox: replace IPC::Run with PublicInbox::Spawn
  hval: add src_escape for highlight post-processing
  viewvcs: wire up syntax-highlighting for blobs
  hlmod: disable enclosing <pre> tag
  t/hl_mod: extra check to ensure we escape HTML
  wwwhighlight: read_in_full returns undef on errors
  solver: crank up max patches to 9999
  viewvcs: do not show final error message twice
  qspawn: decode $? for user-friendliness
  solver: reduce "git apply" invocations
  solver: hold patches in temporary directory
  ...
2019-01-30 view: remove unused _msg_date sub
Not needed since commit 956abe9ad5f13a0d1755262be412d6a54fda72e9 ("view: depend on SearchMsg for Message-ID")
2019-01-26 view: swap CRLF for LF in HTML output
It makes no difference to browsers aside from saving a few bytes; and this means we won't have to worry about extra '%0D' showing up in links to solver.
2019-01-20 viewdiff: support diff-highlighting w/o coderepo
Having diff highlighting alone is still useful, even if blob-resolution/recreation is too expensive or unfeasible.
2019-01-19 view: wire up diff and vcs viewers with solver
2019-01-19 view: disable bold in topic display
It seems pointless due to the indentation, and interacts badly with some CSS colouring.
2019-01-08 view: more culling for search threads
{mapping} overhead is now down to ~1.3M at the end of a giant thread from hell.
2019-01-08 view: fix wrong date for non-Xapian/SQLite v1 users
We need to parse the MIME object in order to get the datestamp for those sites. Fixes: 7d02b9e64455 ("view: stop storing all MIME objects on large threads")
2019-01-08 view: stop storing all MIME objects on large threads
While we try to discard the $smsg (SearchMsg) objects quickly, they remain referenced via $node (SearchThread::Msg) objects, which are stored forever in $ctx->{mapping} to cull redundant words out of subjects in the thread skeleton. This significantly cuts memory bloat with large search results with '&x=t'. Now, the search results overhead of SearchThread::Msg and linked objects is stable at around 350K instead of ~7M per response in a rough test (there's more savings to be had in the same areas). Several hundred kilobytes is still huge and a large per-client cost; but it's far better than MEGABYTES per-client.
2018-12-30 handle "multipart/mixed" messages which are not multipart
I've found two examples on https://lore.kernel.org/lkml/ where the messages declared themselves to be "multipart/mixed" but were actually plain text: <87llgalspt.fsf@free.fr> <200308111450.h7BEoOu20077@mail.osdl.org> With the mboxrd downloaded, mutt is able to view them without difficulty. Note: this change would require reindexing of Xapian to pick up the changes. But it's only two ancient messages, the first was resent by the original sender and the second is too old to be relevant.
2018-12-28 reply: allow ":none=$REASON" in "replyto" config
This can be useful for configuring archives of lists which are no longer active.