path: root/lib/PublicInbox/SearchView.pm
Date Commit message
2024-01-10 www: linkify inbox addresses in To/Cc headers
This makes it easier to discover contemporary messages crossposted to other groups within the same WWW instance. The internal cache is necessary for giant threads, and the expiry mechanism is necessary to prevent attackers from trivially OOM-ing.
2023-05-31 searchview: clarify that only pct% links are same page
Non-matching messages in the skeleton aren't rendered on the same page.
2023-05-31 searchview: fix 80-column violation for "above" link
I think just noting "options" is enough and the mbox download buttons are visible enough at the top of the search results pages.
2023-01-18 searchview: fix uninitialized variable
Seems harmless, but noise in logs is not good.
2023-01-13 search_view: show "No results" text on 404
Oops, this was broken a while ago. Fixes: 55263c56cf41c87f (wwwstream: reduce blob fetch paths for ->getline, 2020-07-05)
2022-12-15 www_listing: drop "sort options + mbox downloads" bit
The sort options and mbox downloads only apply to individual inbox search endpoints, and they make no sense for the listing of inboxes themselves.
2022-09-10 www: use PerlIO::scalar (zfh) for buffering
Calling Compress::Raw::Zlib::deflate is fairly expensive. Relying on the `.=' (concat) operator inside ->zadd operator is faster, but the method dispatch overhead is noticeable compared to the original code where we had bare `.=' littered throughout. Fortunately, `print' and `say' with the PerlIO::scalar IO layer appear to offer better performance without high method dispatch overhead. This doesn't allow us to save as much memory as I originally hoped, but does allow us to rely less on concat operators in other places and just pass a list of args to `print' and `say' as appropriate. This does reduce scratchpad use, however, allowing for large memory savings, and we still ->deflate every single $eml.
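The buffering idea above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: render_rows() and the row fields are hypothetical, and only the core in-memory filehandle feature (PerlIO::scalar) is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not part of public-inbox.
sub render_rows {
	my (@rows) = @_;
	my $buf = '';
	# open an in-memory filehandle backed by $buf (core PerlIO::scalar layer)
	open(my $zfh, '>', \$buf) or die "open(scalar): $!";
	for my $row (@rows) {
		# pass a list of args to print instead of concatenating them first
		print $zfh '<b>', $row->{subject}, '</b> ', $row->{date}, "\n";
	}
	close($zfh) or die "close(scalar): $!";
	$buf; # the caller could hand this whole buffer to one deflate call
}

my $html = render_rows(
	{ subject => 'hello', date => '2022-09-10' },
	{ subject => 'world', date => '2022-09-11' },
);
print $html;
```

The point of the sketch is only that output accumulates in one scalar via `print', so any expensive compression step could run once per buffer rather than once per fragment.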
2022-09-10 www: switch to zadd for the majority of buffering
This allows us to focus string concatenations in one place to allow Perl internal scratchpad optimizations to reuse memory. Calling Compress::Raw::Zlib::deflate repeatedly proves too expensive in terms of CPU cycles.
2022-09-10 www_stream: aresponse assumes 200, too
There's no reason to be streaming large amounts of HTML for anything other than a 200 response.
2022-09-10 www_atom_stream: require 200 response
This simplifies parameter passing at the moment. I can't imagine an Atom feed reader would be parsing XML for 404s or other error codes.
2022-07-21 www: note "x=m" and "t=1" (mis)use for GET requests
We require "x=m" (requests for mboxes) to be POST requests to avoid unnecessary traffic from crawlers. "t=1" only collapses threads in the summary view, which isn't normally accessible from <form> elements. This also fixes the missing "[summary|nested]" element when "x=m" is used.
2021-10-24 thread: avoid Perl5 internal scratchpad target cache
The use of array-returning built-ins such as `grep' inside arrayref declarations appears to result in permanently allocated scratchpad space for caching according to my malloc inspector. Thread skeletons get discarded every response, but multiple skeletons can exist in memory at once, so do what we can to prevent long-lived allocations from being made, here.
In other words, replacing constructs such as:
	my $foo = [ grep(...) ];
with:
	my @foo = grep(...);
seems to ensure the mortality of the underlying array.
2021-10-13 treewide: use warn() or carp() instead of env->{psgi.errors}
Large chunks of our codebase and 3rd-party dependencies do not use ->{psgi.errors}, so trying to standardize on it was a fruitless endeavor. Since warn() and carp() are standard mechanisms within Perl, just use them instead and simplify a bunch of existing code.
2021-10-09 view: save memory by dropping smsg->{from_name} on use
We'll also save a few LoC when generating it. $smsg objects can linger a while when rendering large threads, so saving a few bytes here can add up to several hundred KB saved. I noticed this while chasing the ref cycle leak in commit b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03). While there's no longer a leak, releasing memory earlier can allow it to be reused sooner and reduce both memory traffic and memory pressure.
2021-10-01 search_view: various navigation tweaks
This improves the "&x=t" navigation between the thread overview (skeleton) section at the bottom and jumping back to the top for the mbox download form. The "--links below ..." text ought to be helpful for users unfamiliar with the /$MSGID/T/ and /$MSGID/t/ views.
2021-06-23 www_listing: start updating for pagination + search
When dealing with thousands of inboxes, displaying all of them on a single page isn't going to work. So steal some pagination and search results code from the message search to generate some basic HTML output that looks good in w3m.
2021-03-21 searchview: collapse Message-ID links in summary
There's no point in showing duplicate links to the same Message-ID in summary view. The per-message page will note the duplication (if any) separately. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> Link: https://public-inbox.org/meta/20210317132723.xx4klonordhsb6ve@chatter.i7.local/
2021-02-11 search: use git approxidate in WWW and "lei q --stdin"
This greatly improves the usability of d:, dt:, and rt: search prefixes for users already familiar with git's "approxidate" feature. That is, users familiar with the --(since|after|until|before)= options in git-log(1) and similar commands will be able to use those dates in the WWW UI.
2021-02-07 lei: replace --thread with --threads
Nobody is expected to use long options, but for consistency with mairix(1), we'll use the pluralized option throughout (including existing PublicInbox::{Search,SearchView}). Link: https://public-inbox.org/meta/20210206090119.GA14519@dcvr/
2021-01-12 lei query + pagination sorta working
Parallelism and interactivity with pager + SIGPIPE need work, but results are shown and phrase search works without shell users having to apply Xapian quoting rules on top of standard shell quoting.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09 treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-12-05 isearch: emulate per-inbox search with ->ALL
Using "eidx_key:" boolean prefix to limit results to a given inbox, we can use ->ALL to emulate and replace per-Inbox xap15/[0-9] search indices. With this change, the presence of "extindex.all.topdir" in the $PI_CONFIG will cause the WWW code to use that extindex and ignore per-inbox Xapian DBs in xap15/[0-9]. Unfortunately IMAP search still requires old per-inbox indices, for now. Mapping extindex Xapian docids to per-Inbox UIDs and vice-versa is proving tricky. Fortunately, IMAP search is rarely used and optional. The RFCs don't specify expensive phrase search, either, so `indexlevel=medium' can be used in per-inbox Xapian indices to save space. For primarily WWW (and future JMAP) users; this should result in significant disk space, FD, and page cache footprint savings for large instances with many inboxes and many cross-posted messages.
2020-12-05 search: remove mdocid export
There's no need to export it, as shown by the change to SearchView. This should pave the way to making search more flexible and allow per-Inbox search to reuse ->ALL.
2020-09-03 search: replace ->query with ->mset
Nearly all of the search uses in the production code rely on a Xapian mset iterator being returned (instead of an array of $smsg objects). So default to returning the mset and move the burden of smsg array conversion into the test cases.
2020-08-28 www: more descriptive pagination
Being an easily confused person, I find "next" and "prev" ambiguous as to whether messages on the next or previous page will be newer or older than the current page. Clarify that for the threaded /$INBOX/ view and search results. For search results sorted by relevance, we'll use "[>= $SCORE]" or "[<= $SCORE]" to indicate directionality. This also fixes $INBOX/new.html for unindexed v1 inboxes.
2020-08-28 www: improve navigation around contemporary threads
Sometimes it's useful to quickly get to threads and messages which are contemporaries of the current thread/message being focused on. This hopefully improves navigation by:
a) making the top line (where $INBOX_DIR/description is shown) a link to the latest topics in search results and per-thread/per-message views
b) providing a link to contemporaries ("~YYYY-MM-DD") at around the thread overview skeleton area for per-thread and per-message views
2020-08-23 mbox: disable "&t" on existing Xapian until full reindex
Expanding threads via over.sqlite3 for mbox.gz downloads without Xapian effectively collapsing on the THREADID column leads to repeated messages getting downloaded. To avoid that situation, use a "has_threadid" Xapian metadata flag that's only set on --reindex (and brand new Xapian DBs). This allows admins to upgrade WWW or do --reindex in any order; without worrying about users eating up bandwidth and CPU cycles.
2020-08-23 search: support downloading mboxes results with full thread
Finally, the addition of THREADID for collapsing results in Xapian lets us emulate the "mairix --threads" feature. That is, instead of returning only the matching messages, the entire thread is included in the downloaded mbox.gz. This requires a "public-inbox-index --reindex" to be usable.
2020-08-22 searchview: fix mbox.gz downloads for lynx users
Unlike w3m and links, the lynx browser seems to require a `name' attribute for `<input type=submit>' elements. Maybe some other browsers do, too. The `name' attribute for submit elements doesn't seem to cause any harm for w3m or links users, either, despite not (AFAIK) being part of historical or current HTML specs.
2020-08-20 search: add mset_to_artnums method
We can avoid importing mdocid() in several places by using this method, simplifying callers.
2020-08-20 searchview: convert nested and Atom display to over.sqlite3
git blob retrieval dominates on these; "&x=t" (nested) is roughly the same due to increased overhead for ->get_percent storage balancing out the mass-loading from SQLite. Atom "&x=A" is sped up slightly and uses less memory in the long-lived response.
2020-08-20 searchview: speed up search summary by ~10%
Instead of loading one article at-a-time from over.sqlite3, we can use SQL to mass-load IN (?,?, ...) all results with a single SQLite query. Despite SQLite being in-process and having no network latency, the reduction in SQL query executions from loading multiple rows at once speeds things up significantly. We'll keep the over->get_art optimizations from the previous commit, since it still speeds up long-lived responses, slightly.
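A rough sketch of the mass-loading idea, building one statement with a placeholder per article number. The helper name and the table and column names here are assumptions for illustration, not the actual over.sqlite3 schema or the code from this commit.

```perl
use strict;
use warnings;

# Hypothetical helper: build a single SELECT covering many article numbers.
# "over", "num" and "ddd" are assumed names used only for this sketch.
sub mass_load_sql {
	my (@nums) = @_;
	# one "?" per value, e.g. "?,?,?" for three article numbers
	my $placeholders = join(',', ('?') x scalar(@nums));
	("SELECT num,ddd FROM over WHERE num IN ($placeholders)", @nums);
}

my ($sql, @bind) = mass_load_sql(3, 5, 8);
print "$sql\n";
# With DBI, the query could then be run once for all rows, roughly:
#   my $rows = $dbh->selectall_arrayref($sql, undef, @bind);
```

A caller would presumably skip the query entirely when the list of article numbers is empty.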
2020-08-20 searchview: use over.sqlite3 instead of Xapian docdata
This is a step towards improving kernel page cache hit rates by relying on over.sqlite3 for document data instead of Xapian. Some micro-optimization to over->get_art was required to maintain performance.
2020-08-20 searchquery: split off from searchview
Since this was already a separate package, split it off into its own file since SearchView may not handle inbox groups.
2020-08-20 www: reduce long-lived PublicInbox::Search references
While this is unlikely to be a problem in current practice, keeping Xapian DBs open for long responses can interfere with free space recovery after -compact. In the future, it will interfere with inbox search grouping and lead to unexpected results.
2020-07-06 view: simplify eml_entry callers further
This simplifies the primary callers of eml_entry while only making mknews.perl worse.
2020-07-06 view: eml_entry: reduce parameters
We can save stack space and simplify subroutine calls, here.
2020-07-06 ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs
Another 10% or so speedup when displaying full messages off search results.
2020-07-06 wwwstream: reduce blob fetch paths for ->getline
This will make it easier to support asynchronous blob retrievals. The `$ctx->{nr}' counter is no longer implicitly supplied since many users didn't care for it, so stack overhead is slightly reduced.
2020-07-06 wwwstream: reduce object graph depth
Like with WwwAtomStream and MboxGz, we can bless the existing $ctx object directly to avoid allocating a new hashref. We'll also switch from "->" to "::" to reduce stack utilization.
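A minimal sketch of blessing an existing context hashref rather than allocating a wrapper object. My::Stream, init() and the {lines} field are hypothetical names for illustration; the real WwwStream code may differ.

```perl
use strict;
use warnings;

package My::Stream; # hypothetical class name for illustration

# Re-bless the caller's existing hashref instead of wrapping it in a new one
sub init {
	my ($ctx) = @_;
	bless $ctx, __PACKAGE__;
}

# Return the next buffered line, or undef when none remain
sub getline {
	my ($ctx) = @_;
	shift @{$ctx->{lines}};
}

package main;

my $ctx = { lines => [ "a\n", "b\n" ] };
# "::" call: a plain sub call, so no method lookup and no class argument
my $stream = My::Stream::init($ctx);
print $stream->getline;
```

Here $stream and $ctx refer to the same hash, so no second hashref is allocated for the stream object.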
2020-06-03 www: remove smsg_mime API and adjust callers
To further simplify callers and avoid embarrassing memory explosions[1], we can finally eliminate this method in favor of smsg_eml. [1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup.
2020-06-03 wwwatomstream: convert callers to use smsg_eml
We can simplify WwwAtomStream callbacks by performing ->smsg_eml calls in the `feed_entry' sub itself. This simplifies callers, by reducing the number of places which can load an Eml object into memory.
2020-04-17 searchthread: reduce indirection by removing container
We can rid ourselves of a layer of indirection by subclassing PublicInbox::Smsg instead of using a container object to hold each $smsg. Furthermore, the `{id}' vs. `{mid}' field name confusion is eliminated. This reduces the size of the $rootset passed to walk_thread by around 15%; that is, over 50K of memory when rendering a /$INBOX/ landing page.
2020-03-22 rename PublicInbox::SearchMsg => PublicInbox::Smsg
Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-02-27 searchview: improve naming and simplify hash override
`%over' could be confused for the overview SQLite DB instance, so call it `%override', instead. There's also no need to write a loop to override a hash when the language can do it for us.
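One way the language can override a hash without a loop is plain list assignment, sketched below. The option names and values are hypothetical, and this is not necessarily the exact construct used in the commit.

```perl
use strict;
use warnings;

# Hypothetical defaults and overrides; the key names are illustrative only
my %opt = (limit => 200, offset => 0, relevance => 0);
my %override = (limit => 1000, relevance => 1);

# Later pairs win in list assignment, so no explicit loop is needed
%opt = (%opt, %override);

print join(' ', map { "$_=$opt{$_}" } sort keys %opt), "\n";
```

Keys present only in %opt (here, offset) keep their original values.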
2020-02-24 searchview: set obfuscation inbox properly
We never lookup `$ctx->{-obfuscate}' anywhere, as the correct key is `$ctx->{-obfs_ibx}' since some of the address obfuscation stuff is inbox-specific. Note: some of the obfuscation stuff still needs tests, but it's low-priority at the moment since I don't think it's a good feature after all.
2020-02-16 view: escape ampersand in Message-IDs
We need to escape ampersands (and some other characters for href attributes), so introduce a `mid_href' sub to do just that. '<', '>' and '"' were always escaped, so there's no risk of tag or attribute injection, but creative Message-IDs could cause confusion for some parsers and generate invalid URLs. Start getting rid of the bloated, over-engineered OO Hval API while we're at it, I only noticed this bug because I started killing off Hval->new* callers.
2020-02-16 view,searchview: avoid smsg method calls when using SQLite/Xapian
We already pre-populate the hashref when loading $smsg (PublicInbox::SearchMsg) objects out of over.sqlite3 or Xapian, so making expensive method calls isn't necessary in those cases. We only need to use the method calls when SQLite or Xapian are not available or are being populated (such as during indexing).
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!