path: root/lib/PublicInbox/Feed.pm
Date Commit message
2023-09-11 treewide: favor Xapian (SWIG binding) over Search::Xapian
The Xapian SWIG bindings are favored by Xapian upstream for ease-of-maintenance compared to the XS version. While Debian lags on this front, the SWIG bindings are widely available on all *BSDs.
2023-05-31 www: more restrictive query string parsing
Only allow single-character query keys to prevent clients from wasting memory in Perl's hash tables. We'll also perform the utf8::decode and tr/+/ / calls once on the whole query string to reduce op calls. This also avoids creating an empty hash in the common case when the QUERY_STRING is empty and instead relies on auto-vivification of Perl.
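As an illustration only, the parsing rule described above might look like the following Python sketch. It is not the public-inbox code (which is Perl); the function name `parse_query` is hypothetical, and the sketch returns an empty dict for an empty query string rather than reproducing the auto-vivification detail.

```python
from urllib.parse import unquote


def parse_query(qs: str) -> dict:
    """Parse a query string, keeping only single-character keys."""
    params = {}
    if not qs:
        return params
    # Translate '+' to ' ' once on the whole string, before percent-decoding,
    # so that an encoded "%2B" still decodes to a literal '+'.
    qs = qs.replace("+", " ")
    for pair in qs.split("&"):
        key, _, val = pair.partition("=")
        if len(key) != 1:
            continue  # drop long keys so clients cannot bloat the table
        params[key] = unquote(val)
    return params
```

Keys longer than one character are silently ignored, which bounds the number of distinct keys a client can force the server to store.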
2022-09-11 view: fix solver links with multiple messages
For redundant messages sharing Message-IDs, the link to solver (/$INBOX/$OID/s/) was going up too many levels for /$INBOX/$MSGID/ when there were multiple messages sharing the same $MSGID. Unfortunately, redundant messages are common with /all/ due to signature trailers. So dynamically assigning {-spfx} is tricky and error prone from counting `/'. So simplify the code a bit by setting {-spfx} once per HTTP request, instead of every single message.
2022-09-10 feed: new_html_i: switch from zmore to `print $zfh'
eml_entry will enable zfh (PerlIO::scalar) buffering, anyways, so there's no point in calling ->zmore to compress small strings. The use of zfh for the skeleton is debatable, but probably of no consequence given html_footer will hit it, anyways.
2022-09-10 www_stream: aresponse assumes 200, too
There's no reason to be streaming large amounts of HTML for anything other than a 200 response.
2022-09-10 www_atom_stream: require 200 response
This simplifies parameter passing at the moment. I can't imagine an Atom feed reader would be parsing XML for 404s or other error codes.
2022-08-26 www: fix unindexed v1 inboxes w/ public-inbox-httpd
Unindexed v1 inboxes were leaving $smsg objects unpopulated when using public-inbox-httpd (but not generic PSGI servers) and causing missing HTML content and uninitialized value warnings. Our existing tests for unindexed v1 inboxes only assumed generic PSGI servers and synchronous blob retrieval. Due to changes several years ago to make git blob retrieval async for slow storage using public-inbox-httpd, our tests were insufficient to detect this regression. So ensure $smsg->populate runs in a few places and rewrite t/plack.t to test against both generic PSGI and -httpd implementations. Fortunately, unindexed v1 inboxes are uncommon, and this bug was only (finally) discovered while developing other features. For ensuring we can test (and not blindly follow) redirects with -httpd, we now provide our own LWP::UserAgent (used internally by Plack::Test::ExternalServer) with redirect following disabled to P:T:ES::test_psgi.
2022-08-04 feed: avoid unnecessary map loop in non-over path
We can bless objects while doing the initial insertion to avoid the extra map iteration and temporary array(s). Fewer ops means memory savings for the likely case of ->over users, too.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-14 PublicInbox::Feed owns `feedmax' default value
There's no need to have extra code in the Inbox package for this or to waste dozens of bytes for every Inbox object which uses the default value. This makes our code more flexible w.r.t Inbox-like ExtSearch objects and fixes uninitialized value warnings with ->ALL.
2020-12-09 treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-09-16 treewide: relax allow >=40 chars for git OID
This will help with eventual git SHA-256 transitions.
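A minimal Python sketch of such a relaxed check, assuming full (unabbreviated) lowercase hex object IDs. The pattern and the name `looks_like_oid` are illustrative and are not taken from the public-inbox source.

```python
import re

# 40 hex digits covers SHA-1; allowing more also admits 64-digit SHA-256 IDs.
OID_RE = re.compile(r"\A[0-9a-f]{40,}\Z")


def looks_like_oid(s: str) -> bool:
    """Return True if `s` is at least 40 lowercase hex digits."""
    return OID_RE.match(s) is not None
```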
2020-08-28 www: more descriptive pagination
Being an easily confused person, I find "next" and "prev" ambiguous as to whether messages on the next or previous page will be newer or older than the current page. Clarify that for the threaded /$INBOX/ view and search results. For search results sorted by relevance, we'll use "[>= $SCORE]" or "[<= $SCORE]" to indicate directionality. This also fixes $INBOX/new.html for unindexed v1 inboxes.
2020-07-06 view: simplify eml_entry callers further
This simplifies the primary callers of eml_entry while only making mknews.perl worse.
2020-07-06 view: eml_entry: reduce parameters
We can save stack space and simplify subroutine calls, here.
2020-07-06 feed: /$INBOX/new.html fetches blobs asynchronously
Once again this speeds another endpoint up 10% or so.
2020-07-06 feed: generate_i: eliminate pointless loop
$ctx->{msgs} won't ever contain undef values.
2020-07-06 wwwstream: reduce blob fetch paths for ->getline
This will make it easier to support asynchronous blob retrievals. The `$ctx->{nr}' counter is no longer implicitly supplied since many users didn't care for it, so stack overhead is slightly reduced.
2020-07-06 wwwstream: reduce object graph depth
Like with WwwAtomStream and MboxGz, we can bless the existing $ctx object directly to avoid allocating a new hashref. We'll also switch from "->" to "::" to reduce stack utilization.
2020-06-03 www: remove smsg_mime API and adjust callers
To further simplify callers and avoid embarrassing memory explosions[1], we can finally eliminate this method in favor of smsg_eml. [1] commit 7d02b9e64455831d3bda20cd2e64e0c15dc07df5 ("view: stop storing all MIME objects on large threads") fixed a huge memory blowup.
2020-06-03 wwwatomstream: convert callers to use smsg_eml
We can simplify WwwAtomStream callbacks by performing ->smsg_eml calls in the `feed_entry' sub itself. This simplifies callers, by reducing the number of places which can load an Eml object into memory.
2020-05-01 feed: remove PublicInbox::MIME module load
We don't call any Email::MIME or any PublicInbox::MIME-specific functions in here.
2020-04-25 feed: drop needless version check
We don't need to be checking inbox versions in parts of the WWW code. Checking the presence of $ibx->over is enough, everywhere.
2020-03-22 rename PublicInbox::SearchMsg => PublicInbox::Smsg
Since the introduction of over.sqlite3, SearchMsg is not tied to our search functionality in any way, so stop confusing ourselves and future hackers by just calling it "PublicInbox::Smsg". Add a missing "use" in ExtMsg while we're at it.
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27 inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2019-12-27 feed: avoid anonymous subs
WwwStream already passes the WWW $ctx to the user-supplied callback, and it's a trivial change for WwwAtomStream to do the same. Callers in Feed.pm can now take advantage of that to save a few kilobytes of memory on every response.
2019-09-22 feed: remove unused $cmt->{-html_url} field
It was never used, and will not be needed.
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-04 feed: only accept ASCII digits for ref~$N
We don't want to waste cycles passing non-ASCII characters to git.
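The distinction matters because a Unicode-aware `\d` also matches non-ASCII decimal digits. The Python sketch below shows the same pitfall and an explicit `[0-9]` check; the name `parse_ref_offset` is hypothetical and the code is not from public-inbox.

```python
import re

# An explicit ASCII class; on str patterns, r"\d" would also match
# non-ASCII decimal digits such as Arabic-Indic ones.
ASCII_DIGITS_RE = re.compile(r"\A[0-9]+\Z")


def parse_ref_offset(s: str):
    """Return the integer N for ref~N, or None if `s` is not ASCII digits."""
    if ASCII_DIGITS_RE.match(s):
        return int(s)
    return None
```

Rejecting such input up front avoids handing it to git at all.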
2019-05-15 www: use Inbox->over where appropriate
We don't need to rely on Xapian search functionality for the majority of the WWW code, even. subject_normalized is moved to SearchMsg, where it (probably) makes more sense, anyways.
2018-04-18 use %H consistently to disable abbreviations
We generally do not want git to waste time finding abbreviations and we do not want the possibility of them becoming ambiguous over time, either.
2018-04-18 feed: respect feedmax, again
Gigantic feeds probably make some clients unhappy, so clamp it to what it was in the past. Fixes: b9534449ecce2c59 ("view: avoid offset during pagination")
2018-04-03 view: avoid offset during pagination
OFFSET in SQLite gets painful to deal with. Instead, rely on timestamps (from Received:) for pagination. This also sets us up for more precise Date searching in case we want it.
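A sketch of timestamp-based pagination against SQLite, in Python. The table `msg`, its columns, and the function `page_before` are assumptions for illustration; they do not reflect the actual overview schema.

```python
import sqlite3


def page_before(db, before_ts, limit):
    """Return up to `limit` rows strictly older than `before_ts`, newest first."""
    cur = db.execute(
        "SELECT num, ts FROM msg WHERE ts < ? ORDER BY ts DESC LIMIT ?",
        (before_ts, limit),
    )
    return cur.fetchall()


# Hypothetical data: ten messages with increasing timestamps.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE msg (num INTEGER PRIMARY KEY, ts INTEGER)")
db.executemany(
    "INSERT INTO msg (num, ts) VALUES (?, ?)",
    [(n, 1000 + n) for n in range(1, 11)],
)

first = page_before(db, 2**62, 3)          # newest page
second = page_before(db, first[-1][1], 3)  # continue from the last timestamp seen
```

Each page resumes from the oldest timestamp of the previous page, so the database never has to scan and discard skipped rows. With a strict `<` comparison, rows sharing a timestamp across a page boundary could be skipped; a real implementation would need a tie-breaker.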
2018-04-02 www: rework query responses to avoid COUNT in SQLite
In many cases, we do not care about the total number of messages. It's a rather expensive operation in SQLite (Xapian only provides an estimate). For LKML, this brings top-level /$INBOX/ loading time from ~375ms to around 60ms on my system. Days ago, this operation was taking 800-900ms(!) for me before introducing the SQLite overview DB.
2018-03-30 feed: optimize query for feeds, too
This is a smaller improvement than the landing /$INBOX/ page because full message bodies are shown; but still saves around 100ms for my system with LKML.
2018-03-27 view: depend on SearchMsg for Message-ID
Since we need to handle messages with multiple and duplicate Message-ID headers, our thread skeleton display must account for that. Since we have a "preferred" Message-ID in case of conflicts, use it as the UUID in an Atom feed so readers do not get confused by conflicts.
2018-03-23 feed: fix new.html for v2
I forget this endpoint is still accessible (even if not linked). This also simplifies new.html all around and removes some unused clutter from the old days while we're at it.
2018-03-22 feed: $INBOX/new.atom endpoint supports v2 inboxes
We can no longer rely on tree name lookups for v2. This also optimizes v1 by relying on git blob object_id lookups while avoiding process spawning overhead for "git log".
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-01-10 introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header (cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head). In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2016-12-17 feed: support publicinbox.<name>.feedmax
This allows users to customize by using smaller or larger Atom feeds than the default value of 25 entries.
2016-12-03 atom: switch to getline/close for response bodies
This will let us stream larger Atom document bodies without wasting too much memory and reduce the amount of round-trip requests needed to get necessary information. Hopefully clients are using streaming (SAX) parsers, too. This is the final transition in the core public-inbox code to allow migrating to a "pull"-based body streaming scheme which allows an HTTP server to respond appropriately to backpressure from slow clients.
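To illustrate the "pull" model, here is a Python sketch of a body object exposing getline/close, with the server deciding when to ask for the next chunk. The class `AtomBody`, the helper `drain`, and the skeletal entry markup are all hypothetical; this is not the WwwAtomStream implementation and the output is not a complete Atom document.

```python
from xml.sax.saxutils import escape


class AtomBody:
    """Produce a feed one chunk at a time instead of building it in memory."""

    def __init__(self, titles):
        self._titles = iter(titles)
        self._state = "head"

    def getline(self):
        """Return the next chunk, or None once the body is exhausted."""
        if self._state == "head":
            self._state = "entries"
            return '<feed xmlns="http://www.w3.org/2005/Atom">\n'
        if self._state == "entries":
            title = next(self._titles, None)
            if title is not None:
                return f"<entry><title>{escape(title)}</title></entry>\n"
            self._state = "done"
            return "</feed>\n"
        return None

    def close(self):
        """Release the underlying iterator; safe to call at any point."""
        self._titles = iter(())
        self._state = "done"


def drain(body, write):
    """Server side: pull chunks only when ready to write them."""
    try:
        while (chunk := body.getline()) is not None:
            write(chunk)
    finally:
        body.close()
```

Because the server calls `getline` only when the client socket can accept more data, a slow client holds at most one chunk in memory at a time.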
2016-08-14 www: do not double-clean Message-IDs from internal DBs
Ensure we usually strip one level of '<>' from Message-IDs, since our internal SQLite, Xapian, and SHA-1 storage all assume that. Realistically, we screw up if somebody has '<<' or '>>', but those are screwed up mail clients and we can deal with it another time. Currently, this means some messages with '>>' in References or Message-Id are not handled correctly, yet, but we match the behavior of Mail::Thread in keeping the extra '>'.
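The intended behavior can be sketched in Python as follows. The helper name `strip_one_level` is an assumption, and the sketch covers only the single-level stripping described above.

```python
def strip_one_level(mid: str) -> str:
    """Remove exactly one enclosing '<' ... '>' pair, if present."""
    mid = mid.strip()
    if mid.startswith("<") and mid.endswith(">"):
        return mid[1:-1]
    return mid
```

Applying it once to `<<a@b>>` yields `<a@b>`, so the extra brackets survive, consistent with the commit's note about keeping the extra '>'. Applying it a second time to an already-clean value from an internal DB would wrongly strip them, which is the double-cleaning this commit avoids.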
2016-08-14 www: do not unnecessarily escape some chars in paths
Based on reading RFC 3986, it seems '@', ':', '!', '$', '&', "'", '(', ')', '*', '+', ',', ';', '=' are all allowed in path-absolute where we have the Message-ID. In any case, it seems '@' is fairly common in path components nowadays and too common in Message-IDs.
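As a sketch of the resulting escaping rule, Python's `urllib.parse.quote` can be told to leave exactly these characters alone. The constant `PATH_SAFE` and the function `mid_to_path` are hypothetical names, not public-inbox APIs.

```python
from urllib.parse import quote

# Characters listed in the commit message as allowed in a path segment.
PATH_SAFE = "@:!$&'()*+,;="


def mid_to_path(mid: str) -> str:
    """Percent-encode a Message-ID for use as a single URL path segment."""
    # '/' is deliberately not in PATH_SAFE, so it is still escaped.
    return quote(mid, safe=PATH_SAFE)
```

A Message-ID such as `a+b@example.com` passes through unchanged, while a space or a '/' is still percent-encoded.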
2016-08-06 www: use <hr> to delimit messages in /new.html view, too
This is necessary to delimit messages when viewed without threading.
2016-07-09 feed: remove dead code and unneeded use
We've cleaned up our code in recent days and WwwStream provides a consistent header for our HTML pages.
2016-07-09 www: cleanup parameter passing
Reduce the size of hashes a bit and drop some unneeded hash lookups for uncommon paths.
2016-07-07 www: remove old footer generation code and normalize new.html
We now generate all of our HTML using WwwStream which forces us to have consistent headers and footers in the HTML itself. This also makes the search-capable vs search-less installs go to the new.html endpoint to maintain consistency (in case an admin decides to enable Xapian).
2016-07-06 feed: fix links to attachments in Atom feed
Oops...