path: root/lib/PublicInbox/View.pm
Date / Commit message
2024-02-09 view: decode In-Reply-To comments added by some MUAs
Štěpán Němec <stepnem@smrk.net> wrote:
> Eric Wong wrote:
> > Subject: [PATCH] view: decode In-Reply-To comments added by Gnus
> Or just "some MUAs"? Who knows who else...
Yeah, I wouldn't be surprised if there were more...
---8<---
Subject: [PATCH] view: decode In-Reply-To comments added by some MUAs
Emacs-based MUAs (e.g. Gnus and rmail) can do it, and maybe some others, too. I noticed it in <https://yhbt.net/lore/git/xmqqr0ho9oi9.fsf@gitster.g/> while scanning for something else.
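The following is a minimal Python sketch of the idea, not public-inbox's Perl implementation. It assumes the MUA-added comment follows the Message-ID and may contain RFC 2047 encoded-words. It uses the standard library's `email.header` module. `split_in_reply_to` is a hypothetical name, and the sample header value is made up.

```python
import re
from email.header import decode_header, make_header

# Message-IDs are the <...> tokens; anything left over (e.g. an Emacs-style
# "(Name's message of ...)" comment) is treated as free text to decode.
MSGID_RE = re.compile(r"<([^<>\s]+)>")


def split_in_reply_to(value):
    """Return (msgids, decoded_comment) for an In-Reply-To header value."""
    msgids = MSGID_RE.findall(value)
    rest = MSGID_RE.sub("", value).strip()
    # decode_header() splits out RFC 2047 encoded-words and make_header()
    # reassembles the pieces into one Unicode string.
    comment = str(make_header(decode_header(rest))) if rest else ""
    return msgids, comment


ids, comment = split_in_reply_to(
    "<a@example.com> (=?utf-8?q?=C5=A0t=C4=9Bp=C3=A1n?= wrote)"
)
```

Splitting the Message-IDs out first leaves the link targets untouched. Only the human-readable comment goes through the decoder.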
2024-01-24 view: /$INBOX/ links to topics_{new,active}.html
This makes the new endpoints easier-to-find. The navigation is still at the bottom of the page since I figured having it at the top is too cluttered for users on small terminals.
2024-01-10 address: avoid [ undef, undef ] address pairs
For totally bogus things in address fields, we'll fall back to showing the original entry in the name column when using Email::Address::XS. The pure Perl version differs here, but we'll just let them be different when it comes to handling bogus data.
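Here is a rough Python sketch of that fallback. It uses a deliberately naive regex parser, not Email::Address::XS or public-inbox's code, and `address_pairs` is a hypothetical name. Each entry yields `(name, address)`. An entry that parses as neither form never becomes `(None, None)`; its original text lands in the name column instead.

```python
import re

# "Name <addr>" or "<addr>", with an optionally quoted display name
ADDR_RE = re.compile(r'^\s*(?:"?([^"<]*?)"?\s*)?<([^<>@\s]+@[^<>\s]+)>\s*$')
# A bare "addr" with no angle brackets
BARE_RE = re.compile(r"^\s*([^<>@\s]+@[^<>\s]+)\s*$")


def address_pairs(field):
    """Return (name, address) pairs, never (None, None)."""
    pairs = []
    # Naive split: real parsers also handle commas inside quoted names
    for entry in field.split(","):
        entry = entry.strip()
        if not entry:
            continue
        m = ADDR_RE.match(entry)
        if m:
            pairs.append((m.group(1) or None, m.group(2)))
            continue
        m = BARE_RE.match(entry)
        if m:
            pairs.append((None, m.group(1)))
            continue
        # Bogus entry: show the original text in the name column
        pairs.append((entry, None))
    return pairs
```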
2024-01-10 www: linkify inbox addresses in To/Cc headers
This makes it easier to discover contemporary messages crossposted to other groups within the same WWW instance. The internal cache is necessary for giant threads, and the expiry mechanism is necessary to prevent attackers from trivially OOM-ing.
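The commit message does not spell out the cache design. The following hypothetical Python sketch shows one way to combine a size bound with time-based expiry. That keeps per-address lookups cheap for giant threads without letting crafted input grow memory without limit. `ExpiringCache`, `ttl`, `max_size`, and the injectable `clock` are illustrative names, not public-inbox's.

```python
import time
from collections import OrderedDict


class ExpiringCache:
    """Entries expire after `ttl` seconds; the oldest are evicted past `max_size`."""

    def __init__(self, ttl=60.0, max_size=1000, clock=time.monotonic):
        self.ttl, self.max_size, self.clock = ttl, max_size, clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def get(self, key, compute):
        now = self.clock()
        hit = self._data.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # fresh entry: no recomputation
        value = compute(key)
        self._data[key] = (now + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_size:
            self._data.popitem(last=False)  # evict the least recently stored entry
        return value
```

A caller would pass something like `lambda addr: lookup_inbox_url(addr)` as `compute`. `lookup_inbox_url` is a hypothetical helper.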
2024-01-02 view: always show strict|loose note w/ multi-roots
For thread skeletons with multiple roots, it makes sense to note the strict|loose delineation even when the first message matches the desired Message-ID.
2023-11-29 www: load and use cindex join data
This is a major step in solving the problem of having to manually associate hundreds/thousands of coderepos with hundreds/thousands of public-inboxes to power solver (and more).
2023-02-04 www: sort all /$INBOX/ topics by Received: timestamp
Our previous pinning prevention only worked to prevent older (non-most-recent) topics from being pinned to the landing page, but not the most recent window of messages. We still sort messages within threads by Date: because that makes git-send-email patchsets display more nicely, but we don't want recent topics pinned due to future Date: headers. I nearly switched sort_ds() back to sorting by Received: until I looked back on commit 8e52e5fdea416d6fda0b8d301144af0c043a5a76 (use both Date: and Received: times, 2018-03-21) and was reminded that git-send-email relies on Date: for large series, so I added a note about it for sort_ds().
Reported-by: Kyle Meyer <kyle@kyleam.com>
Tested-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87edr5gx63.fsf@kyleam.com/
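This small Python sketch shows the two-level ordering. It assumes each message carries a sender-controlled Date: timestamp (`ds`) and a locally recorded Received: timestamp (`ts`). The field names, the `Msg` type, and ranking each topic by its newest `ts` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Msg:
    mid: str
    ds: int  # Date: header time (sender-controlled, may be in the future)
    ts: int  # Received: time (recorded on our side)


def sort_topics(threads):
    """Order topics by newest Received: time; keep Date: order within each."""
    for msgs in threads:
        # Date: order keeps git-send-email series in patch order
        msgs.sort(key=lambda m: m.ds)
    # A bogus future Date: cannot pin a topic, since ranking uses ts only
    return sorted(threads, key=lambda msgs: max(m.ts for m in msgs), reverse=True)
```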
2023-01-11 www: /$INBOX/$MSGID/d/ to diff reused Message-IDs
To ensure users aren't abusing the ability to reuse Message-IDs, provide a convenient front-end to `lei mail-diff' from WWW. Most of the time it's just list-appended signatures, so I expect this to be useful for /all/ users.
2022-09-29 www: remove "1\n" lines in $MSGID/t/ view
Fixes: ab9c03ff4aa3 "www: use PerlIO::scalar (zfh) for buffering"
2022-09-11 view: fix solver links with multiple messages
When multiple redundant messages shared the same $MSGID, the link to solver (/$INBOX/$OID/s/) on /$INBOX/$MSGID/ went up too many directory levels. Unfortunately, redundant messages are common with /all/ due to signature trailers. Dynamically assigning {-spfx} by counting `/' is tricky and error-prone, so simplify the code a bit by setting {-spfx} once per HTTP request instead of for every single message.
2022-09-10 www: use PerlIO::scalar (zfh) for buffering
Calling Compress::Raw::Zlib::deflate is fairly expensive. Relying on the `.=' (concat) operator inside the ->zadd method is faster, but the method dispatch overhead is noticeable compared to the original code, which had bare `.=' littered throughout. Fortunately, `print' and `say' with the PerlIO::scalar IO layer appear to offer better performance without high method dispatch overhead. This doesn't save as much memory as I originally hoped, but it does let us rely less on concat operators in other places and just pass a list of args to `print' and `say' as appropriate. It does reduce scratchpad use, however, allowing for large memory savings, and we still ->deflate every single $eml.
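This Python analogue of the buffering idea is only a sketch. It collects many small string writes cheaply and hands them to zlib in larger chunks instead of calling the compressor for every fragment. `ZBuffer` and its `threshold` are hypothetical names. public-inbox itself uses Perl's PerlIO::scalar and Compress::Raw::Zlib.

```python
import io
import zlib


class ZBuffer:
    """Buffer small writes in memory; deflate them in larger chunks."""

    def __init__(self, threshold=8192):
        self._z = zlib.compressobj(wbits=31)  # wbits=31 selects a gzip container
        self._buf = io.StringIO()
        self._pending = 0  # characters buffered since the last deflate
        self._out = []
        self.threshold = threshold

    def write(self, *parts):
        # Like Perl's `print LIST': accept several fragments in one call
        for p in parts:
            self._buf.write(p)
            self._pending += len(p)
        if self._pending >= self.threshold:
            self._deflate_pending()

    def _deflate_pending(self):
        data = self._buf.getvalue().encode("utf-8")
        self._buf = io.StringIO()
        self._pending = 0
        if data:
            self._out.append(self._z.compress(data))

    def close(self):
        """Deflate whatever is left and return the complete gzip body."""
        self._deflate_pending()
        self._out.append(self._z.flush())
        return b"".join(self._out)
```

A larger `threshold` means fewer compressor calls but more memory held between flushes. That is the same trade-off the commit describes.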
2022-09-10 www: switch to zadd for the majority of buffering
This allows us to focus string concatenations in one place to allow Perl internal scratchpad optimizations to reuse memory. Calling Compress::Raw::Zlib::deflate repeatedly proves too expensive in terms of CPU cycles.
2022-09-10 www: drop {obuf} use entirely, for now
This may help us identify hot spots and reduce pad space as needed.
2022-09-10 view: switch a few things to ctx->zmore
Unfortunately, this is actually slower. However, this hopefully makes it easier to improve the internals and make performance improvements down the line.
2022-09-10 view: html_footer: avoid escaping " in a few places
qq() is a nice alternative to "" when the HTML contains embedded " characters.
2022-09-10 view: html_footer: remove obuf dependency
Another step towards giving us more options for speedups and memory reductions.
2022-09-10 view: html_footer: golf out a few lines
We can build `$u' in one line, and drop an unnecessary empty line to reduce the amount of scrolling required to read this sub.
2022-09-10 view: reduce ascii_html calls and {obuf} use
We can rely on {-html_tip} for some things at the top of the page, and reduce ascii_html and obfuscate_addrs calls by working on the whole buffer at once.
2022-09-10 view: _th_index_lite: use `//' defined-or op
Just something I noticed while evaluating this subroutine for the buffering overhaul.
2022-09-10 view: _th_index_lite: avoid one s///, improve symmetry
We can replace an expensive `s///' substitution with a simpler `chop'. Furthermore, we can delay the "</b>\n" replacement to ensure it's on the same line of Perl code as the `<b>' opening tag for readability.
2022-09-10 view: attach_link: reduce obuf manipulation
This is another step toward reducing the maximum size of an obuf by eventually doing compression earlier while we render messages as HTML. And do some golfing while we're at it...
2022-09-10 view: reduce subroutine calls for submsg_hdr
Favor fewer, yet more expensive operations than many smaller ones. While we're still directly manipulating ctx->{obuf} after this, this change makes it easier for us to avoid doing so in the future.
2022-09-10 view: remove multipart_text_as_html
It seems like a pointless wrapper function that's not saving us a whole lot. Drop some direct {obuf} manipulation while we're at it.
2022-09-10 view: eml_entry: reduce manipulation of ctx->{obuf}
This is another step toward avoiding unnecessary copies and pad space waste.
2022-09-10 view: simplify _parent_headers
Having References but lacking In-Reply-To is an uncommon case with email nowadays, so just rely on ->linkify_mids to handle linkification and HTML escaping. Furthermore, headers are short enough to return as-is (and rely on CoW improvements in Perl 5.1x) since linkify_mids needs to operate on an independent string anyway.
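For illustration only, this Python sketch linkifies the Message-IDs found in a References or In-Reply-To value and HTML-escapes everything it emits. `linkify_msgids` and its `prefix` argument are hypothetical. This is not the ->linkify_mids method named above.

```python
import html
import re
from urllib.parse import quote

MSGID_RE = re.compile(r"<([^<>\s]+)>")


def linkify_msgids(header_value, prefix="../"):
    """Turn each <msgid> into an escaped HTML link, one per line."""
    lines = []
    for mid in MSGID_RE.findall(header_value):
        # Percent-encode the Message-ID for the URL path, keeping '@' readable
        href = prefix + quote(mid, safe="@") + "/"
        # Escape both the attribute value and the visible text
        lines.append(
            f'&lt;<a href="{html.escape(href)}">{html.escape(mid)}</a>&gt;'
        )
    return "\n".join(lines)
```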
2022-09-10 viewvcs: use shorter and simpler ctx->html_done
We only return 200s for any response large enough to warrant ->html_done, so we can just assume it. ViewVCS can also take advantage of it with some tweaking to avoid an extra method dispatch.
2022-09-10 www_stream: aresponse assumes 200, too
There's no reason to be streaming large amounts of HTML for anything other than a 200 response.
2022-09-10 view: rework single message page to compress earlier
We can rely on deflate to compress large thread skeletons on single message pages. Subsequent commits will compress bodies, as well.
2022-09-08 view: drop unnecessary comma in date range note
I'm not sure how it got there, but it seems out-of-place in retrospect.
2022-09-02 www: omit [thread overview] link for unindexed v1
Unindexed v1 inboxes do not have the thread overview skeleton at the bottom of /$MSGID/ pages, so do not link to it. And for rare messages without a Date: header (or any headers!), this also ensures the [thread overview] is shown regardless.
2022-09-02 www: fix top nav bar for unindexed v1 inboxes
For /$INBOX/$MSGID/ pages, we need to point all nav bar links to ../ regardless of whether ->over exists. I've also verified this doesn't affect /$INBOX/new.html at all.
2022-09-02 www: always show subject for root of thread skeleton
For users with short attention spans, the root message of the thread skeleton should show the Subject, since <title> is often truncated by browsers.
2022-08-29 www: provide text/help/#search anchor
This allows jumping to the appropriate section of the "help" from under the dfblob textarea search.
2022-08-29 www: atom: fix "changed" href to nowhere
The HTML generated for the Atom feed doesn't have the footer of /T/ and /t/ HTML-only views, so just make "changed" in the diffstat go directly to the permalink #related anchor. Fixes: 66512e177390 ("view: generate query in single-message and commit views")
2022-08-29 view: cleanups and reuse for {obuf} preparation
{obuf} will eventually go away and we'll write directly to {zbuf}, but as an intermediate step we'll make some changes to rely less on return values. While we're in the area, reuse Linkify objects in more places where possible to save some allocations.
2022-08-29 view: /$INBOX/: show "messages from $old to $new"
With the ViewVCS commit view using /$INBOX/?t=YYYYMMDDhhmmss- links, the use of `t=' may not be immediately obvious to a reader and confuse them into thinking the inbox hasn't been updated in a while. So add a header to the top of the page whenever the `t=' query parameter is used. And kill a couple of redundant variable assignments while we're at it.
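This Python sketch covers the two pieces involved: parsing `t=` and building the header line. It assumes `t=` carries a UTC YYYYMMDDhhmmss timestamp with an optional trailing `-`, and it assumes a "YYYY-MM-DD hh:mm" display format. Both function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_t(value):
    """Parse a `t=YYYYMMDDhhmmss` query value; a trailing '-' is ignored."""
    return datetime.strptime(value.rstrip("-"), "%Y%m%d%H%M%S").replace(
        tzinfo=timezone.utc
    )


def range_note(timestamps):
    """Build the header line shown when the `t=` parameter is used."""
    fmt = "%Y-%m-%d %H:%M"
    old, new = min(timestamps), max(timestamps)
    return f"messages from {old.strftime(fmt)} to {new.strftime(fmt)}"
```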
2022-08-29 treewide: ditch inbox->recent method
It's a needless wrapper nowadays. Originally, ->over was added on an experimental basis to optimize for /$INBOX/, where Xapian ->search is slower on gigantic (LKML-sized) inboxes. Nowadays, with extindex, ->over is here to stay given NNTP and IMAP both benefit from it. So reduce the interpreter stack overhead and just access ->over directly. lxs->recent was never used outside of tests, anyways. And while we're in the area, avoid needlessly bumping the refcount of $ctx->{ibx} in View::paginate_recent.
2022-08-29 view: speed up /$INBOX/ landing page by 0.5-1.0%
Array lookups and extra arithmetic in Perl are slower than bumping the internal array offset inside the interpreter. FWIW, using: my ($level, $subj) = splice(@extra, 0, 2) did not result in a performance improvement.
2022-08-29 www: allow html_oneshot to take an array arg
Another step towards making our internal APIs more writev-like and reducing the copies needed for `join' or `.=' concatenation.
2022-08-26 view: add "this message" link above dfblob: textarea
When jumping to #related from /T/ or /t/ views, it could be disconcerting to not have the current message as context. So add a "this message" link back up to #t as we have always done with the reply instructions.
2022-08-23 view: generate query in single-message and commit views
The dfblob: search prefix is probably under-utilized, but is extremely powerful IMHO. To make it easier to use, add a search textarea prefilled with values from the existing patch message. This allows users to easily run a query for all patches which alter or result in either the pre- or post-image blobs in the current patch. Behavior changes are as follows: "changed" in the diffstat jumps to the bottom of the message. For /T/ and /t/, it goes to the "related" anchor, which is just above the reply instructions in the single-message view. For the single message view, it'll jump to the textarea search form. I initially wanted to use a normal `<a href=' link, but figured the textarea is advantageous for two reasons: 1) users should be able to edit the query before submitting; 2) crawlers are less likely to waste CPU/disk on forms. It's probably too noisy to add this directly to the /T/ and /t/ views, but just above the reply instructions in the single message view seems like a good place for it. Note that the queries used by the /$COMMIT_OID/s/ view are subtly different from those in the /$MSGID/ view, since git will lengthen its abbreviations over time, while emails are immutable. I tried adding dfn: (filename) and s: (subject) support, but couldn't come up with cases where it really made sense for /$MSGID/. /$COMMIT_OID/s/ may benefit from it, since patchid: could be flaky due to non-standard diff generation options.
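This minimal Python sketch prefills such a query from a patch. It assumes blob IDs are taken from the `index <pre>..<post>` lines of `git diff` output, with all-zero IDs (new or deleted files) skipped and the rest joined with OR. The exact query text public-inbox generates may differ, and `dfblob_query` is a hypothetical name.

```python
import re

# git diff "index" lines look like: "index 1a2b3c4..5d6e7f8 100644"
INDEX_RE = re.compile(r"^index ([0-9a-f]{7,})\.\.([0-9a-f]{7,})", re.MULTILINE)


def dfblob_query(patch):
    """Build a dfblob: query covering every pre- and post-image blob."""
    oids = []
    for pre, post in INDEX_RE.findall(patch):
        for oid in (pre, post):
            # All-zero IDs mark a missing side (new or deleted file)
            if set(oid) != {"0"} and oid not in oids:
                oids.append(oid)
    return " OR ".join(f"dfblob:{oid}" for oid in oids)
```

Putting the result in an editable textarea rather than a plain link matches the two reasons given above: users can adjust the query, and crawlers are unlikely to submit it.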
2022-08-20 view: do not show pagination footer for small inboxes
For new public inboxes with few messages, the dead pagination footer is a worthless and confusing waste of space: "page: \n"; without `next' or `prev' links for users to follow.
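This Python sketch shows the rule with hypothetical arguments and link markup. When there is neither a `next` nor a `prev` page, it returns nothing instead of a bare "page: " line.

```python
import html


def pagination_footer(next_href=None, prev_href=None):
    """Return the footer HTML, or "" when there is nothing to paginate."""
    links = []
    if next_href:
        links.append(f'<a href="{html.escape(next_href)}">next</a>')
    if prev_href:
        links.append(f'<a href="{html.escape(prev_href)}">prev</a>')
    if not links:
        return ""  # small inbox: omit the dead "page: " line entirely
    return "page: " + " / ".join(links) + "\n"
```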
2022-08-04 view: avoid intermediate array when streaming thread
We can rely on auto-vivification to avoid an intermediate array for the map result.
2022-07-28 www: drop --subject from "git send-email" instructions
Apparently, --subject doesn't work[1] with "git send-email" in this context. So drop the CLI arg and add a note to tell the user to set a "Subject:" line in their response body, instead. [1] I'm not sure if --subject ever worked as I thought it would, or if it's a regression. In either case, there are current versions of git where it doesn't, so just tell users to use the currently supported method. Link: https://80x24.org/lore/git/CAC4O8c-Tf11CpwuRudyrpXv5bGshuyEenV9kKrs0zRWER-+yHA@mail.gmail.com/
2022-04-02 view: remove unused $end variable
Noticed while looking at something else completely unrelated...
2022-02-11 view: remove all CR before LF
While we've rendered CR-LF as LF-only in HTML for many years, some messages end up as CR-CR-LF. So strip ALL CR bytes preceding LF bytes, while preserving odd CR in the middle of lines.
Reported-by: Thomas Weißschuh <thomas@t-8ch.de>
Link: https://public-inbox.org/meta/8d13668f-cac7-4984-bb4e-ad90502dc46d@t-8ch.de/
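In Python, the same rule is a one-line regex substitution, equivalent in spirit to Perl's `s/\r+\n/\n/g`. This is a sketch, and `strip_cr_before_lf` is a hypothetical name.

```python
import re

# Any run of CR bytes immediately followed by LF; a lone CR mid-line is untouched
CR_BEFORE_LF = re.compile(r"\r+\n")


def strip_cr_before_lf(text):
    """Collapse CR-LF, CR-CR-LF, etc. into a bare LF."""
    return CR_BEFORE_LF.sub("\n", text)
```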
2021-10-24 thread: avoid Perl5 internal scratchpad target cache
The use of array-returning built-ins such as `grep' inside arrayref declarations appears to result in permanently allocated scratchpad space for caching, according to my malloc inspector. Thread skeletons get discarded every response, but multiple skeletons can exist in memory at once, so do what we can to prevent long-lived allocations from being made here. In other words, replacing constructs such as: my $foo = [ grep(...) ]; with: my @foo = grep(...); seems to ensure the mortality of the underlying array.
2021-10-09 view: save memory by dropping smsg->{from_name} on use
We'll also save a few LoC when generating it. $smsg objects can linger a while when rendering large threads, so saving a few bytes here can add up to several hundred KB saved. I noticed this while chasing the ref cycle leak in commit b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03). While there's no longer a leak, releasing memory earlier can allow it to be reused sooner and reduce both memory traffic and memory pressure.
2021-10-09 view: discard Eml->{bdy} when done using
We can release the raw body buffer once we've obtained a copy of the decoded buffer. This reduces memory pressure ahead of some expensive diff processing.
2021-10-06 msg_iter: split_quotes adds trailing "\n"
The regexp in split_quotes relies on the presence of a final "\n", so add it wherever we need to instead of making it the responsibility of every caller. This probably doesn't matter in practice since every email seems to have a "\n" as the final byte (due to the way SMTP works), but maybe there are some odd ones that'll get imported via lei.
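This simplified Python sketch shows why the trailing newline matters. Its regex only recognizes quoted lines that end in "\n", so without the fix-up a final quoted line lacking one would not be split out as a quote. `split_quotes` here is a hypothetical stand-in, not the Perl function from msg_iter.

```python
import re

# One or more consecutive lines starting with ">", each terminated by "\n"
QUOTED_RUN = re.compile(r"((?:^>[^\n]*\n)+)", re.MULTILINE)


def split_quotes(text):
    """Split text into alternating unquoted and quoted chunks."""
    if not text.endswith("\n"):
        text += "\n"  # the regex requires every line to end in "\n"
    # re.split keeps the captured quoted runs; drop empty leftovers
    return [part for part in QUOTED_RUN.split(text) if part]
```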