path: root/lib/PublicInbox/GzipFilter.pm
Date       Commit message
2023-12-13 gzip_filter: use OO ->zflush dispatch
While this code path is not intended for WwwCoderepo and RepoAtom, which provide their own ->zflush, using OO dispatch can future-proof our code against future subclasses at a minor performance cost.
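A rough Python analogue of why method dispatch matters here. The class names `GzipFilter` and `FeedFilter` are hypothetical, and Python's `zlib` stands in for Compress::Raw::Zlib. Calling `self.zflush(...)` lets a subclass override take effect, while a hard-wired call to the base function would silently skip it.

```python
import zlib


class GzipFilter:
    """Hypothetical stand-in for a gzip filter base class."""

    def __init__(self):
        self.gz = None   # deflate context, created on first use
        self.out = []    # compressed chunks handed to a (hypothetical) writer

    def zflush(self, *bufs):
        # Default behaviour: compress any remaining buffers and finish the stream.
        if self.gz is None:
            self.gz = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container
        data = b"".join(self.gz.compress(b) for b in bufs)
        return data + self.gz.flush()

    def close(self, *bufs):
        # self.zflush(...) is OO dispatch, so a subclass override is honoured;
        # GzipFilter.zflush(self, ...) would hard-wire the base implementation.
        self.out.append(self.zflush(*bufs))


class FeedFilter(GzipFilter):
    """Hypothetical subclass that supplies its own zflush."""

    def zflush(self, *bufs):
        return super().zflush(b"<feed>", *bufs, b"</feed>")
```

The cost is one method lookup per call, which matches the "minor performance cost" trade-off described above.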
2023-10-25 drop psgi_return, httpd/async and GetlineBody
Now that psgi_yield is used everywhere, the more complex psgi_return and its helper bits can be removed. We'll also fix some outdated comments now that everything using psgi_return has switched to psgi_yield. GetlineResponse replaces GetlineBody and does a better job of isolating generic PSGI-only code.
2023-10-25 qspawn: introduce new psgi_yield API
This is intended to replace psgi_return and HTTPD/Async entirely, hopefully making our code less convoluted while maintaining the ability to handle slow clients on memory-constrained systems. This was made possible by the philosophy shift in commit 21a539a2df0c (httpd/async: switch to buffering-as-fast-as-possible, 2019-06-28). We'll still support generic PSGI via the `pull' model with a GetlineResponse class which is similar to the old GetlineBody.
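The `pull' model mentioned above refers to PSGI response bodies that are objects with `getline' and `close' methods. The server keeps calling `getline' until it returns undef, then calls `close'. Below is a minimal Python sketch of that contract. `PullBody` and `serve` are hypothetical names, not the GetlineResponse implementation, and `None` plays the role of undef.

```python
class PullBody:
    """Hypothetical pull-model body: the server asks for each chunk."""

    def __init__(self, chunks):
        self._it = iter(chunks)
        self.closed = False

    def getline(self):
        return next(self._it, None)  # None means the body is finished

    def close(self):
        self.closed = True


def serve(body, write):
    """Generic server loop: pull chunks until exhausted, then always close."""
    try:
        while True:
            chunk = body.getline()
            if chunk is None:
                break
            write(chunk)
    finally:
        body.close()
```

Because the server decides when to pull, a slow client simply means fewer `getline' calls, which is why this model suits generic PSGI servers.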
2023-05-02 daemon: improve handling of Git->async_abort
The $oid arg for Git->cat_async is defined on async_abort using the original request, so use undefined $type to distinguish that case in caller-supplied callbacks. async_abort isn't common, of course, but sometimes git subprocesses can die unexpectedly.
2023-04-12 gzip_filter: use carp in ->bail for failure checks
carp is more useful since it shows the perspective of the caller and can be made to show a full backtrace with PERL5OPT=-MCarp=verbose
2023-01-04 www_coderepo: implement /$CODE_REPO/atom/ endpoint
This should be similar or identical to what's in cgit, and should tie into the rest of the www_coderepo stuff.
2022-09-10 gzip_filter: write: use multi-arg translate
While we must name this function ->write for PSGI compatibility, our own uses of it can make it operate more like writev(2) or `print' in Perl.
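A minimal Python sketch of a writev(2)-style `write` that takes any number of buffers and feeds them in order into a single gzip stream, so callers never need to concatenate first. `GzipWriter` and its `sent` list are hypothetical stand-ins for the filter and the socket.

```python
import zlib


class GzipWriter:
    """Hypothetical writer whose write() accepts several buffers at once."""

    def __init__(self):
        self._gz = zlib.compressobj(wbits=31)  # gzip container
        self.sent = []                          # what would go to the socket

    def write(self, *bufs):
        # Like writev(2) or Perl's `print LIST': every argument is fed, in
        # order, into one deflate stream, so callers need not concatenate first.
        out = b"".join(self._gz.compress(b) for b in bufs)
        if out:  # deflate often buffers internally and returns b""
            self.sent.append(out)

    def close(self):
        self.sent.append(self._gz.flush())  # Z_FINISH by default
```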
2022-09-10 translate: support multiple buffer args
This will let us drop some calls to zmore in subsequent commits.
2022-09-10 www: use PerlIO::scalar (zfh) for buffering
Calling Compress::Raw::Zlib::deflate is fairly expensive. Relying on the `.=' (concat) operator inside the ->zadd method is faster, but the method dispatch overhead is noticeable compared to the original code where we had bare `.=' littered throughout. Fortunately, `print' and `say' with the PerlIO::scalar IO layer appear to offer better performance without high method dispatch overhead. This doesn't allow us to save as much memory as I originally hoped, but does allow us to rely less on concat operators in other places and just pass a list of args to `print' and `say' as appropriate. This does reduce scratchpad use, however, allowing for large memory savings, and we still ->deflate every single $eml.
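The buffering pattern above can be sketched in Python, with `io.StringIO` as a loose analogue of a PerlIO::scalar handle. Small fragments are printed into an in-memory handle, and deflate runs once per message instead of once per fragment. The `render` function and its HTML markup are hypothetical.

```python
import io
import zlib


def render(messages):
    """Hypothetical renderer: buffer each message in memory, deflate once per message."""
    gz = zlib.compressobj(wbits=31)  # one gzip stream for the whole response
    chunks = []
    for subject, body in messages:
        zfh = io.StringIO()  # in-memory handle, loosely like PerlIO::scalar
        # print() takes a list of args, so no explicit concatenation is needed
        print("<h1>", subject, "</h1>", sep="", file=zfh)
        print("<pre>", body, "</pre>", sep="", file=zfh)
        # a single (relatively expensive) deflate call per message
        chunks.append(gz.compress(zfh.getvalue().encode()))
    chunks.append(gz.flush())
    return b"".join(chunks)
```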
2022-09-10 www: switch to zadd for the majority of buffering
This allows us to focus string concatenations in one place to allow Perl internal scratchpad optimizations to reuse memory. Calling Compress::Raw::Zlib::deflate repeatedly proves too expensive in terms of CPU cycles.
2022-09-10 www: drop {obuf} use entirely, for now
This may help us identify hot spots and reduce pad space as needed.
2022-09-10 gzip_filter: ->translate can reuse zmore/zflush
We can work towards delaying zlib context allocations in future commits, too.
2022-09-10 www: gzip_filter: implicitly flush {obuf} on zmore/zflush
This seems like the least disruptive way to allow more use of ->zmore when streaming large messages to sockets.
2022-08-23 gzip_filter: ->zmore and ->zflush support multiple args
This will make writev-like use easier for the next commit, and also future changes where I'll rely more on zlib for buffering.
2022-08-04 www: gzip_filter: avoid errors after ->write failure
->zflush must return a string to its caller, not undef. Additionally, {http_out} may be deleted on ->write if ->close recurses. This should fix the following errors:
  Use of uninitialized value $_[1] in string eq at PublicInbox/HTTP.pm line 211.
  E: Can't call method "close" on an undefined value at GzipFilter.pm line 167.
Fixes: a6d50dc1098c01a1 (www: gzip_filter: gracefully handle socket ->write failures, 2022-08-03)
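The two fixes above can be illustrated in Python: `zflush` always returns bytes (never `None`) even after the deflate context has been torn down, and `close` tolerates an output handle that has already gone away. This is a hedged sketch, and `SafeGzipFilter` is a hypothetical name.

```python
import zlib


class SafeGzipFilter:
    """Hypothetical filter that stays well-behaved after a failed write."""

    def __init__(self, http_out):
        self.gz = zlib.compressobj(wbits=31)
        self.http_out = http_out  # may already be gone if close() recursed

    def zflush(self, *bufs):
        gz = self.gz
        if gz is None:     # context already torn down:
            return b""     # still return a (byte) string, never None
        out = b"".join(gz.compress(b) for b in bufs) + gz.flush()
        self.gz = None
        return out

    def close(self):
        http_out, self.http_out = self.http_out, None
        if http_out is not None:   # guard against a recursive teardown
            http_out.close()
```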
2022-08-03 www: gzip_filter: update a few comments
A few things I noticed while reviewing and evaluating the PSGI code for JMAP support.
2022-08-03 www: gzip_filter: gracefully handle socket ->write failures
Socket ->write failures are expected and common for TCP traffic, especially if it's facing unreliable remote connections. So just bail out silently if our {gz} field was already clobbered during the small bit of recursion we hit on ->write failures from async responses. This ought to fix some GzipFilter::zflush errors (via $forward ->close from PublicInbox::HTTP) I've been noticing on deployments running -netd. I'm still unsure as to why I hadn't seen them before, but it might've only been ignorance on my part...
Link: https://public-inbox.org/meta/20220802065436.GA13935@dcvr/
2021-10-25 www: $MSGID/raw: set charset in HTTP response
By using the charset specified in the message, web browsers are more likely to display the raw text properly for human readers. Inspired by a patch by Thomas Weißschuh: https://public-inbox.org/meta/20211024214337.161779-3-thomas@t-8ch.de/
Cc: Thomas Weißschuh <thomas@t-8ch.de>
2021-10-25 gzip_filter: delay async wcb call
This will let us modify the response header later to set a proper charset for Content-Type when displaying raw messages.
Cc: Thomas Weißschuh <thomas@t-8ch.de>
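In a PSGI streaming response, calling the writer callback (wcb) with the status and headers commits them. Delaying that call until the first body chunk is ready leaves the headers editable, for example to add a charset to Content-Type. Below is a minimal Python sketch. `DelayedStart` and the `respond` callback are hypothetical stand-ins for the filter and the wcb.

```python
class DelayedStart:
    """Hold status and headers until the first body chunk is ready."""

    def __init__(self, respond, status, headers):
        self.respond = respond  # hypothetical callback: (status, headers) -> write function
        self.status = status
        self.headers = headers  # may still be modified before the first write
        self.writer = None

    def write(self, chunk):
        if self.writer is None:  # first chunk: commit the headers now
            self.writer = self.respond(self.status, self.headers)
        self.writer(chunk)
```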
2021-10-13 treewide: use warn() or carp() instead of env->{psgi.errors}
Large chunks of our codebase and 3rd-party dependencies do not use ->{psgi.errors}, so trying to standardize on it was a fruitless endeavor. Since warn() and carp() are standard mechanisms within Perl, just use them instead and simplify a bunch of existing code.
2021-09-29 www: do not bump {over} refcnt on long responses
SQLite files may be replaced or removed by admins while generating large thread or mailbox responses. Ensure we don't hold onto DBI handles and associated file descriptors past their cleanup.
2021-09-28 www+httpd: lower priority of large mbox downloads
While each git blob request is treated fairly w.r.t. other git blob requests, responses triggering thousands of git blob requests can still noticeably increase latency for less-expensive responses. Move large mbox results and the nasty all.mbox endpoint to a low priority queue which only fires once per event loop iteration. This reduces the response time of short HTTP responses while many gigantic mboxes are being downloaded simultaneously, but still maximizes use of available I/O when there are no inexpensive HTTP responses pending. This only affects PublicInbox::WWW users who use public-inbox-httpd, not generic PSGI servers.
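A toy Python simulation of the scheduling idea above. Cheap responses that are ready in an iteration all run, but the low-priority queue advances by at most one step per event loop iteration. `event_loop` and its inputs are hypothetical and model only a fixed number of iterations, not the real daemon.

```python
from collections import deque


def event_loop(arrivals, low):
    """arrivals[i] lists cheap responses ready in iteration i;
    low holds steps of expensive (e.g. large mbox) responses."""
    low = deque(low)
    served = []
    for ready in arrivals:
        served.extend(ready)              # cheap responses are serviced first
        if low:
            served.append(low.popleft())  # at most one low-priority step per iteration
    return served
```

With no cheap traffic, every iteration still advances the low-priority queue, so idle I/O capacity goes to the large downloads.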
2021-06-24 favor git(1) rather than libgit2 for ExtSearch
While both git and libgit2 take around 16 minutes to load 100K alternates, there's already a proposed patch to make git faster: <https://lore.kernel.org/git/20210624005806.12079-1-e@80x24.org/> It's also easier to patch and install git locally since the git.git build system defaults to prefix=$HOME, and dealing with dynamic linking with libgit2 is more difficult for end users relying on Inline::C. libgit2 remains in use for the non-ALL.git case, but maybe it's not necessary (libgit2 is significantly slower than git in Debian 10 due to SHA-1 collision checking).
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09 treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-08-01 www: rework async_* to use method table
Although the ->async_next method does not take $self as its receiver (it takes a PublicInbox::HTTP object instead), we may still retrieve it via UNIVERSAL->can and call it with the HTTP object.
2020-07-06 gzipfilter: check http->{forward} for client disconnects
We actually don't do anything with {env} or {'psgix.io'} on client aborts, so checking the truthiness of '{forward}' is necessary.
2020-07-06 daemon: warn on missing blobs
Since -edit and -purge should be rare, and TOCTOU races around them rarer still, missing {blobs} could indicate a real bug elsewhere. Warn on them. And I somehow ended up with 3 different field names for Inbox objects. Perhaps they'll be made consistent in the future.
2020-07-06 gzipfilter: drop HTTP connection on bugs or data corruption
While all the {async_next} callbacks needed eval guards anyways because of DS->write, {async_eml} callbacks did not. Ensure any bugs in our code or data corruption result in termination of the HTTP connection, so as not to leave clients hanging on a response which never comes or is mangled in some way.
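The eval-guard idea above has a rough Python equivalent: wrap each async callback so that any exception, whether from a bug or corrupt data, warns and drops the connection instead of leaving the client waiting. `Conn` and `guard` are hypothetical names for this sketch.

```python
import sys


class Conn:
    """Hypothetical stand-in for an HTTP connection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def guard(cb, conn):
    """Wrap a callback so any exception terminates the connection."""
    def wrapper(*args):
        try:
            return cb(*args)
        except Exception as exc:
            # A bug or corrupt data: warn and drop the connection rather
            # than leave the client hanging on a response that never comes.
            print(f"W: {exc!r}", file=sys.stderr)
            conn.close()
            return None
    return wrapper
```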
2020-07-06 www: update internal docs
We no longer favor getline+close for streaming PSGI responses when using public-inbox-httpd. We still support it for other PSGI servers, though.
2020-07-06 remove unused/redundant zlib-related imports
Z_FINISH is the default for Compress::Raw::Zlib::Deflate->flush, anyways, so there's no reason to import it. And none of C::R::Z is needed in WwwText now that gzf_maybe handles it all.
2020-07-06 www: start making gzipfilter the parent response class
Virtually all of our responses are going to be gzipped, anyways. This will allow us to utilize zlib as a buffering layer and share common code for async blob retrieval responses. To streamline this and allow GzipFilter to be a parent class, we'll replace the NoopFilter with a similar CompressNoop class which emulates the two Compress::Raw::Zlib::Deflate methods we use. This drops a bunch of redundant code and will hopefully make upcoming WwwStream changes easier to reason about.
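The CompressNoop approach above is a null-object stand-in that exposes the same methods as a real compressor, so callers never branch on whether gzip is in use. Python's zlib compressor calls those two methods `compress` and `flush`, while Compress::Raw::Zlib::Deflate uses `deflate` and `flush`. `CompressNoop` and `make_compressor` here are a hypothetical sketch, not the Perl class.

```python
import zlib


class CompressNoop:
    """Pass-through stand-in exposing the same two methods as a zlib compressor."""

    def compress(self, data):
        return data          # no compression: hand the bytes straight back

    def flush(self, mode=zlib.Z_FINISH):
        return b""           # nothing is ever buffered, so nothing to flush


def make_compressor(accept_encoding):
    # Hypothetical selection based on the client's Accept-Encoding header.
    if "gzip" in accept_encoding:
        return zlib.compressobj(wbits=31)
    return CompressNoop()
```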
2020-07-06 qspawn: learn to gzip streaming responses
This will allow us to gzip responses generated by cgit and any other CGI programs or long-lived streaming responses we may spawn.
2020-07-06 {gzip,noop}filter: ->zmore returns undef, always
This simplifies callers, as witnessed by the change to WwwListing. It adds overhead to NoopFilter, but NoopFilter should see little use as nearly all HTTP clients request gzip.
2020-07-06 gzipfilter: replace Compress::Raw::Deflate usages
The new ->zmore and ->zflush APIs make it possible to replace existing verbose usages of Compress::Raw::Deflate and simplify buffering logic for streaming large gzipped data. One potentially user-visible change is that we now break the mbox.gz response on zlib failures, instead of silently continuing onto the next message. zlib only seems to fail on OOM, which should be rare, so dropping the connection is the right call anyway.
2020-07-06 wwwlisting: use GzipFilter for HTML
The changes to GzipFilter here may be beneficial for building HTML and XML responses in other places, too.
2020-07-06 www*stream: gzip ->getline responses
Our most common endpoints deserve to be gzipped.
2020-07-06 wwwstream: oneshot: perform gzip without middleware
Plack::Middleware::Deflater forces us to use a memory-intensive closure. Instead, work towards building compressed strings in memory to reduce the overhead of buffering large HTML output.
2020-07-06 gzipfilter: minor cleanups
We currently don't use bytes::length in ->write, so there's no need to `use bytes'. Favor `//=' to describe the intent of the conditional assignment since the C::R::Z::Deflate object is always truthy. Also use the local $gz variable to avoid unnecessary {gz} hash lookups.
2020-03-25 gzipfilter: lazy allocate the deflate context
zlib contexts are memory-intensive, particularly when used for compression. Since the gzip filter may be sitting in a limiter queue for a long period, delay the allocation until we actually have data to translate, and not a moment sooner.
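A minimal Python sketch of lazy allocation. Constructing the filter costs almost nothing, and the deflate context is created only on the first `translate` call, mirroring Perl's `//=` idiom. `LazyGzip` is a hypothetical name, and `None` marking end-of-stream is an assumption of this sketch.

```python
import zlib


class LazyGzip:
    """Hypothetical filter that defers zlib allocation until data arrives."""

    def __init__(self):
        self.gz = None  # nothing allocated while waiting in a queue

    def translate(self, data=None):
        if self.gz is None:  # like Perl's `$self->{gz} //= ...'
            self.gz = zlib.compressobj(wbits=31)
        if data is not None:
            return self.gz.compress(data)
        return self.gz.flush()  # None marks end of stream (assumption)
```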
2020-03-25 qspawn: reinstate filter support, add gzip filter
We'll be supporting gzipped output from sqlite3(1) dumps for altid files in future commits. In the future (and if we survive), we may replace Plack::Middleware::Deflater with our own GzipFilter to work better with asynchronous responses without relying on memory-intensive anonymous subs.