path: root/lib/PublicInbox/View.pm
Date Commit message
2020-08-07 www: avoid warnings on YYYYMMDD-only t= query parameter
While we always generate YYYYMMDDhhmmss query parameters ourselves, the regexps in paginate_recent allow YYYYMMDD-only (no hhmmss) timestamps, so don't trigger Time::Local::timegm warnings about empty numeric comparisons on empty strings when a client starts making up their own URLs.
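A minimal sketch of the idea, assuming a hypothetical helper named `t_param_to_epoch` (neither the name nor the regexp is taken from paginate_recent): when the hhmmss part is absent, the optional capture groups are defaulted to 0 before they reach Time::Local::timegm, so no empty or undefined value is used numerically.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper, not the code in View.pm: convert a t= value of the
# form YYYYMMDD or YYYYMMDDhhmmss to epoch seconds (UTC).
# Returns undef if the value has any other shape.
sub t_param_to_epoch {
    my ($t) = @_;
    my ($Y, $m, $d, $H, $M, $S) = $t =~ m{\A
        ([0-9]{4}) ([0-9]{2}) ([0-9]{2})            # YYYYMMDD
        (?: ([0-9]{2}) ([0-9]{2}) ([0-9]{2}) )?     # optional hhmmss
        \z}x or return;
    # Unmatched optional groups are undef; default them to 0
    $_ //= 0 for ($H, $M, $S);
    timegm($S, $M, $H, $d, $m - 1, $Y);
}
```

timegm croaks on out-of-range fields (a month of 13, for instance), so a caller handling arbitrary client-supplied URLs would probably also want to wrap the call in eval.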
2020-08-02 remove unnecessary ->header_obj calls
We used ->header_obj in the past as an optimization with Email::MIME. That optimization is no longer necessary with PublicInbox::Eml. This doesn't make any functional difference even if we were to go back to Email::MIME. However, it reduces the amount of code we have and slightly reduces allocations with PublicInbox::Eml.
2020-07-06 view: simplify eml_entry callers further
This simplifies the primary callers of eml_entry while only making mknews.perl worse.
2020-07-06 www: update internal docs
We no longer favor getline+close for streaming PSGI responses when using public-inbox-httpd. We still support it for other PSGI servers, though.
2020-07-06 wwwstream: eliminate ::response, use html_oneshot
All of our streaming responses use ::aresponse, now, and our synchronous responses use html_oneshot. So there's no need for the old WwwStream::response.
2020-07-06 view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case
We can build and buffer the HTML <head> section once the first non-ghost message in a thread is loaded, so there's no need to perform an extra check on $ctx->{nr} once the $eml is ready.
2020-07-06 view: eml_entry: reduce parameters
We can save stack space and simplify subroutine calls, here.
2020-07-06 view: update /$INBOX/$MSGID/T/ to be async
Another 10% or so speedup in a frequently-hit endpoint.
2020-07-06 view: /$INBOX/$MSGID/t/ reads blobs asynchronously
Once again, this shows a ~10% speedup with multi-message threads in xt/httpd-async-stream.t regardless of whether TEST_JOBS is 1 or 100.
2020-07-06 view: make /$INBOX/$MSGID/ permalink async
This will allow -httpd to handle other requests if waiting on an HDD seek or git to decode a blob.
2020-07-06 wwwstream: reduce blob fetch paths for ->getline
This will make it easier to support asynchronous blob retrievals. The `$ctx->{nr}' counter is no longer implicitly supplied since many users didn't care for it, so stack overhead is slightly reduced.
2020-07-06 wwwstream: reduce object graph depth
Like with WwwAtomStream and MboxGz, we can bless the existing $ctx object directly to avoid allocating a new hashref. We'll also switch from "->" to "::" to reduce stack utilization.
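The reblessing idea can be sketched roughly as follows. The class name `MyStream` and the `chunks` field are hypothetical and chosen only for illustration; they are not the names used in WwwStream.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

package MyStream; # hypothetical class, for illustration only

# Rebless the caller's hashref rather than wrapping it in a new object.
sub new {
    my ($class, $ctx) = @_;
    bless $ctx, $class;
}

# Return the next buffered chunk, or undef once the stream is exhausted.
sub getline {
    my ($ctx) = @_;
    shift @{$ctx->{chunks}};
}

package main;

my $ctx = { chunks => [ "a\n", "b\n" ] };
my $stream = MyStream->new($ctx);

# Same underlying hash: no second hashref was allocated.
my $same = refaddr($stream) == refaddr($ctx);

# A fully-qualified sub call ("::") skips method resolution ("->").
my $first = MyStream::getline($stream);
```

The trade-off is that the caller's $ctx changes class as a side effect, which is acceptable only when nothing else relies on its previous blessing.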
2020-06-03 www: remove smsg_mime API and adjust callers
To further simplify callers and avoid embarrassing memory explosions[1], we can finally eliminate this method in favor of smsg_eml.
[1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup.
2020-05-26 view: do not offer links to 0-byte multipart attachments
Offering links to download 0-byte files is useless. We could waste memory by preserving $eml->{bdy} during iteration, but offering attachments of type "multipart" is not very useful, as users are usually interested in decoded attachments or the entire raw message.
Fixes: e60231148eb604a3 ("descend into message/(rfc822|news|global) parts")
2020-05-17 descend into message/(rfc822|news|global) parts
Email::MIME never supported this properly, but there are real instances of forwarded messages as message/rfc822 attachments. message/news is a legacy thing which we'll see in archives, and message/global appears to be the new thing. gmime also supports message/rfc2822, so we'll support it anyway despite lacking other evidence of its existence. Existing attachments remain downloadable as a whole message, but individual attachments of subparts are now downloadable and can be displayed in HTML, too. Furthermore, ensure Xapian can now search for common headers inside those messages as well as the message bodies.
2020-05-16 view: drop a newline before first attachment link
However, we'll always have a newline before subsequent attachment links after the first. For the initial part of a multipart message, this regression appeared in 1.5.0, but the display was overly clumped in prior releases, too.
Fixes: 453dee4881a9c764 ("msg_iter: pass $idx as a scalar, not array")
2020-05-09 replace most uses of PublicInbox::MIME with Eml
PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-05-09 msg_iter: pass $idx as a scalar, not array
This doesn't make any difference for most multipart messages (or any single part messages). However, this starts having space savings when parts start nesting. It also slightly simplifies callers.
2020-05-09 msg_iter: make ->each_part method for PublicInbox::MIME
The reliance on Email::MIME->subparts is a tad inefficient with a work-in-progress module to replace Email::MIME. So move towards using ->each_part as a class-specific iterator which can take advantage of more class-specific optimizations in the yet-to-be-revealed PublicInbox::Eml and PublicInbox::Gmime classes. The msg_iter() sub remains for compatibility with existing 3rd-party scripts/modules which use our small public Perl API and Email::MIME.
2020-05-07 viewdiff: stricter highlighting and linkification check
Sometimes senders draw ASCII tables and such which we get fooled into attempting highlighting and diffstat anchoring. We now require 3 consecutive diff header lines: /^--- /, /^\Q+++\E /, and /^@@ / to enable diff highlighting (whether generated with git or not). The presence of a line matching /^diff / is not sufficient or even useful to us for highlighting diffs, since that could just be part of a line-wrapped sentence. However, we'll now check for the presence of a line matching /^diff --git / before enabling diffstat anchors. Otherwise cover letters for a patch series may fool us into creating anchors for diffstats.
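The two checks described above might look roughly like this. The sub names `looks_like_diff` and `has_git_diff` and the exact regexps are assumptions made for this sketch; they are not the ones in ViewDiff.pm.

```perl
use strict;
use warnings;

# Hypothetical check: true only if "--- ", "+++ " and "@@ " header lines
# appear on three consecutive lines, as needed to enable highlighting.
sub looks_like_diff {
    my ($text) = @_;
    $text =~ /^--- [^\n]*\n\+\+\+ [^\n]*\n\@\@ /m ? 1 : 0;
}

# Hypothetical check: a "diff --git " line is additionally required
# before diffstat anchors are generated.
sub has_git_diff {
    my ($text) = @_;
    $text =~ /^diff --git /m ? 1 : 0;
}
```

An ASCII table whose rows happen to begin with "---" fails the first check because the following line does not begin with "+++ ", and a wrapped sentence starting with "diff " fails the second because it lacks "--git ".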
2020-04-22 view: actually omit subject text when dumping topics
Despite dump_topics() calling dedupe_subject() on the subject, the index shows partly duplicated subjects, for example:
 ` [PATCH 2/2] t/www_listing: avoid 'once' warnings
 ` [PATCH v2] t/www_listing: avoid 'once' warnings "
In the second line, the omission character " is appended, but the entire subject is shown. To display the subject with duplicated parts omitted, regenerate it from the array that is modified by dedupe_subject().
2020-04-22 view: strip omission character from current message in thread view
In the thread view shown at the top of a message, the subject for the current message is dropped, leaving just the sender's name. However, if skel_dump() omitted part of the subject because it was duplicated, the omission character is still displayed:
 * [PATCH v2] t/www_listing: avoid 'once' warnings
 2020-03-21 1:10 ` [PATCH 2/2] t/www_listing: avoid 'once' warnings Eric Wong
 @ 2020-03-21 5:24 ` " Eric Wong
Note the " on the last line. Adjust the regular expression in _th_index_lite() to account for the omission character.
[ew: avoid capturing $1, keep under 80 cols]
2020-04-17 searchthread: reduce indirection by removing container
We can rid ourselves of a layer of indirection by subclassing PublicInbox::Smsg instead of using a container object to hold each $smsg. Furthermore, the `{id}' vs. `{mid}' field name confusion is eliminated. This reduces the size of the $rootset passed to walk_thread by around 15%, that is over 50K memory when rendering a /$INBOX/ landing page.
2020-04-09 triewyde: ficks soem speling errrors
Dikshunarees R gude!
2020-04-07 view: do not redundantly obfuscate addresses
We shouldn't rerun the address obfuscator on data we've already run through. Instead, run through the unescaped text part and apply the UTF-8 "\x{2022}" substitution before it hits HTML escaping.
Fixes: 9bdd81dc16ba6511 ("view: msg_iter calls add_body_text directly")
2020-04-05 release large (non ref) scalars using `undef $sv'
Using `undef EXPR' like a function call actually frees the heap memory associated with the scalar, whereas `$sv = undef' or `$sv = ""' will hold the buffer around until $sv goes out of scope. The `sv_set_undef' documentation in the perlapi(1) manpage explicitly states this:
  The perl equivalent is "$sv = undef;". Note that it doesn't free any string buffer, unlike "undef $sv".
And I've confirmed by reading Dump() output from Devel::Peek. We'll also inline the old index_body sub in SearchIdx.pm to make the scope of the scalar more obvious. This change saves several hundred kB RSS on both -index and -httpd when hitting large emails with thousands of lines.
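The difference can be observed without Devel::Peek by asking the core B module for the allocated buffer length of each scalar. This is only a sketch: the helper name `buf_len` and the variable names are made up for illustration, and the strings are built by appending so that each scalar owns its buffer rather than sharing a copy-on-write one.

```perl
use strict;
use warnings;
use B ();

# Allocated string buffer size (SvLEN) of the scalar behind a reference.
sub buf_len { B::svref_2object($_[0])->LEN }

my $kept = '';
$kept .= 'x' x 100_000;   # append so the scalar owns its buffer
$kept = undef;            # value is gone, but the buffer stays allocated
my $kept_len = buf_len(\$kept);

my $freed = '';
$freed .= 'x' x 100_000;
undef $freed;             # value is gone and the buffer is released
my $freed_len = buf_len(\$freed);

print "after \$sv = undef: $kept_len bytes still allocated\n";
print "after undef \$sv:   $freed_len bytes still allocated\n";
```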
2020-04-04 view: inline flush_quote sub
No point in having an extra sub for a short, commonly called function in the same file.
2020-04-04 viewdiff: reduce sub parameter count
We're slowly moving towards doing all of our output buffering into a single buffer, so passing that around on the stack as a dedicated parameter is confusing.
2020-04-04 view: dedupe_subject: allow "0" as a valid Subject
While rare in practice (even by spammers), a single "0" could theoretically be the entire contents of a Subject line. So use the Perl 5.10+ defined-or operator to improve correctness of subject deduplication.
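A short illustration of why defined-or matters here. The sub names and the "(no subject)" placeholder are hypothetical, and this is not the dedupe_subject code itself.

```perl
use strict;
use warnings;

# "||" tests truthiness, so the valid subject "0" is replaced.
sub subj_or  { my ($s) = @_; $s || '(no subject)' }

# "//" (defined-or, Perl 5.10+) only replaces undef, so "0" survives.
sub subj_dor { my ($s) = @_; $s // '(no subject)' }

print subj_or('0'), "\n";    # prints "(no subject)"
print subj_dor('0'), "\n";   # prints "0"
```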
2020-04-04 view: use defined-or operator to simplify checks
We depend on Perl 5.10 features in other places. Shorten the lifetime of the `$desc' scalar while we're at it.
2020-04-04 view: note we assume UTF-8 on unknown encodings
Clarify that we're assuming the text is UTF-8, since users may have no idea how it's mangled.
2020-04-03 quiet "Complex regular subexpression recursion limit" warnings
These seem mostly harmless since Perl will just truncate the match and start a new one on a newline boundary in our case. The only downside is we'd end up with redundant <span> tags in HTML. Limiting the number of lines matched ourselves with `{1,$NUM}' doesn't seem prudent since lines vary in length, so we continue to defer the job of limiting matches to the Perl regexp engine. I've noticed this warning in practice on 100K+ line patches to locale data.
2020-04-03 view: handle the topic-free case properly
There may be no topics for a given timestamp range, so don't attempt to treat `undef' as an arrayref.
2020-04-02 mid: add $MID_EXTRACT regexp for export
This allows us to consistently enforce the same Message-ID extraction rules everywhere and makes it easier for us to make changes in the future. Update scripts/ssoma-replay, as well, but don't rely on PublicInbox::* modules in that since it's legacy and public-inbox was never a dependency of ssoma.
2020-03-22 rename PublicInbox::SearchMsg => PublicInbox::Smsg
Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-02-17 view: shorten life of MIME object for permalink
We don't need to hold onto the Email::MIME object across multiple WwwResponse->getline calls, instead we can stuff the rendered HTML of the first (and hopefully only) message of the buffer into ctx->{-html_tip}.
2020-02-16 view: remove last Hval->new caller
The object-oriented Hval API turned out to be less useful and more clunky than I envisioned years ago, so get rid of it. We'll no longer strip trailing whitespace from From: headers in the HTML display, but I doubt anybody cares.
2020-02-16 view: escape ampersand in Message-IDs
We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it. I only noticed this bug because I started killing off Hval->new* callers.
2020-02-16 view: escape Subject HTML directly
No need to use the over-engineered Hval OO API when the subject is already normalized and there are no trailing spaces because of normalization.
2020-02-16 view,searchview: avoid smsg method calls when using SQLite/Xapian
We already pre-populate the hashref when loading $smsg (PublicInbox::SearchMsg) objects out of over.sqlite3 or Xapian, so making expensive method calls isn't necessary in those cases. We only need to use the method calls when SQLite or Xapian are not available or are being populated (such as during indexing).
2020-02-16 view: cleanup topic accumulation and dumping
Avoid needlessly normalizing the subject when dumping, since it's pushed into the @$topic array during accumulation in normalized form. We can also safely treat $smsg as a hashref and avoid calling "->ds" as a method since we know we've got that loaded via Over||Search and won't have to use Email::MIME header lookup methods.
2020-02-16 view: dump_topics: better naming of top Subject
We use `$top' in other places, so name it to `$top_subj' consistently for `$subj' and `$prev_subj' comparisons down the function.
2020-02-16 view: single id="t" for multi-Subject messages
While multi-Subject messages are unfortunate, try not to generate confusing/invalid HTML with multiple elements having the same HTML id attribute.
2020-02-16 view: remove mhref arg from multipart_text_as_html
No point in passing something on stack only to stash it into the $ctx which holds most other parameters used for rendering the HTML.
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27 viewdiff: rewrite and simplify
Instead of going line-by-line, use split() with a giant regexp to capture groups of contiguous lines. This offloads state management to the regexp itself and makes it FAR easier to keep track of <span> and </span> pairings. Performance seems roughly on par after this change for the meta@public-inbox archives. It seems a tiny bit faster for git@vger with xt/perf-msgview.t, likely due to the longer messages and larger contiguous groups of lines having the same prefix (or no prefix at all). The rewrite also drastically reduces the number of subroutine calls and Perl ops executed.
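A much-simplified sketch of the split() technique, assuming a hypothetical `diff_chunks` sub and a far smaller regexp than the real one in ViewDiff.pm. Because the pattern contains a capturing group, split() returns the matched runs of lines along with the text between them, so each run can later be wrapped in a single <span>.

```perl
use strict;
use warnings;

# Hypothetical helper: break a patch body into runs of contiguous
# "+" lines, runs of contiguous "-" lines, and everything in between.
sub diff_chunks {
    my ($text) = @_;
    # split() yields an empty field between two adjacent runs (and at the
    # start if the text begins with one); grep drops those.
    grep { length } split(/((?:^\+[^\n]*\n)+|(?:^-[^\n]*\n)+)/m, $text);
}

my @chunks = diff_chunks(" ctx\n-old1\n-old2\n+new\n ctx2\n");
# @chunks now holds four elements: the leading context line, the two
# removed lines as one chunk, the added line, and the trailing context.
```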
2020-01-27 linkify: move to_html over from ViewDiff
We use the same idiom in many places for doing two-step linkification and HTML escaping. Get rid of an outdated comment in flush_quote while we're at it.
2020-01-27 view: inline and eliminate msg_html
No need to keep the old sub around, anymore. Rename auxiliary subs to "msg_page_*" instead of the "html" version.
2020-01-27 view: start performing buffering into {obuf}
Get rid of the confusingly named {rv} and {tip} fields and unify them into {obuf} for readability. {obuf} usage may be expanded to more areas in the future. This will eventually make it easier for us to experiment with alternative buffering schemes.
2020-01-27 view: simplify duplicate Message-ID handling
It's an uncommon code path, no need to make it more complex than it needs to be by having extra sub parameters.