From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 00/43] www: async git cat-file w/ -httpd
Date: Sun, 5 Jul 2020 23:27:16 +0000
Message-Id: <20200705232759.3161-1-e@yhbt.net>

This allows -httpd to make better use of the time it spends waiting
on git-cat-file to respond. It allows us to deal with high-latency
HDD storage without a client monopolizing the event loop.

Even on a mid-range consumer-grade SSD, this seems to give a 10+%
speed improvement for HTTP responses requiring many blobs, including
all /T/, /t/, and /t.mbox.gz endpoints.

This only benefits indexed inboxes (both v1 and v2); I'm not sure if
anybody still uses unindexed v1 inboxes nowadays.

A new xt/httpd-async-stream.t maintainer test ensures checksums for
responses before and after this series match exactly.

This builds off a branch I started several months ago (but never
published here) to integrate gzip responses into our codebase and
remove our optional dependency on Plack::Middleware::Deflater.

We already gzip a bunch of things independent of
Plack::Middleware::Deflater: manifest.js.gz, altid SQLite3 dumps
and all the *.mbox.gz endpoints; so being able to use gzip on all
of our responses without an extra dependency seemed logical.

Being able to consistently use our GzipFilter API to perform
buffering via ->zmore made it significantly easier to reason about
small response chunks for ghost messages interspersed with large
ones when streaming /$INBOX/$MSGID/t/ endpoints.
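To illustrate the buffering idea above, here is a minimal sketch in
Python rather than Perl. It is an analogy only: the GzipBuffer class
and its zmore/zflush methods are hypothetical names chosen to mirror
the description, not the PublicInbox::GzipFilter API. It assumes that
->zmore feeds data to the deflate stream without forcing a flush, so
small chunks (such as ghost message placeholders) and large chunks
can share a single gzip stream.

```python
import gzip
import zlib


class GzipBuffer:
    """Hypothetical stand-in for a gzip response filter (illustration only)."""

    def __init__(self):
        # wbits=31 asks zlib for a gzip container with a 32 KiB window.
        self._z = zlib.compressobj(wbits=31)
        self._out = bytearray()

    def zmore(self, chunk: bytes) -> None:
        # Feed data without flushing; zlib may return b"" while it buffers.
        self._out += self._z.compress(chunk)

    def zflush(self) -> bytes:
        # Finish the stream and return everything produced so far.
        self._out += self._z.flush()
        return bytes(self._out)


def stream_many_calls(chunks):
    # One zmore call per chunk, however small.
    gz = GzipBuffer()
    for chunk in chunks:
        gz.zmore(chunk)
    return gz.zflush()


def stream_few_calls(chunks):
    # Concatenate first, then make a single zmore call.
    gz = GzipBuffer()
    gz.zmore(b"".join(chunks))
    return gz.zflush()


# Small "ghost" chunks interspersed with large message bodies.
chunks = [b"[ghost]\n", b"x" * 100_000, b"[ghost]\n", b"y" * 50_000]
many = stream_many_calls(chunks)
few = stream_few_calls(chunks)

# Both strategies decompress to the same response body.
assert gzip.decompress(many) == gzip.decompress(few) == b"".join(chunks)
```

Both strategies yield the same decompressed body; they differ only in
how many calls cross into the compression library, which is a cost
that would need to be measured rather than assumed.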

I'm not yet maximizing use of ->zmore for all buffering of HTTP
responses; measurements need to happen first. That may happen in
the 1.7 time frame. In particular, we would need to ensure that the
Perl method dispatch and DSO overhead to Zlib.so and libz.so of
making many ->zmore calls doesn't cause performance regressions
compared to the current `.=' use and calling ->zmore/->translate
fewer times.

Eric Wong (43):
  gzipfilter: minor cleanups
  wwwstream: oneshot: perform gzip without middleware
  www*stream: gzip ->getline responses
  wwwtext: gzip text/plain responses, as well
  wwwtext: switch to html_oneshot
  www: need: use WwwStream::html_oneshot
  wwwlisting: use GzipFilter for HTML
  gzipfilter: replace Compress::Raw::Deflate usages
  {gzip,noop}filter: ->zmore returns undef, always
  mbox: remove html_oneshot import
  wwwstatic: support gzipped response directory listings
  qspawn: learn to gzip streaming responses
  stop auto-loading Plack::Middleware::Deflater
  mboxgz: do asynchronous git blob retrievals
  mboxgz: reduce object hash depth
  mbox: async blob retrieval for "single message" raw mboxrd
  wwwatomstream: simplify feed_update callers
  wwwatomstream: use PublicInbox::Inbox->modified for feed_updated
  wwwatomstream: reuse $ctx as $self
  xt/httpd-async-stream: allow more options
  wwwatomstream: support asynchronous blob retrievals
  wwwstream: reduce object graph depth
  wwwstream: reduce blob retrieval paths for ->getline
  www: start making gzipfilter the parent response class
  remove unused/redundant zlib-related imports
  wwwstream: use parent.pm and no warnings
  wwwstream: subclass off GzipFilter
  view: wire up /$INBOX/$MESSAGE_ID/ permalink to async
  view: /$INBOX/$MSGID/t/ reads blobs asynchronously
  view: update /$INBOX/$MSGID/T/ to be async
  feed: generate_i: eliminate pointless loop
  feed: /$INBOX/new.html retrieves blobs asynchronously
  ssearchview: /$INBOX/?q=$QUERY&x=t uses async blobs
  view: eml_entry: reduce parameters
  view: /$INBOX/$MSGID/t/: avoid extra hash lookup in eml case
  wwwstream: eliminate ::response, use html_oneshot
  www: update internal docs
  view: simplify eml_entry callers further
  wwwtext: simplify gzf_maybe use
  wwwattach: support async blob retrievals
  gzipfilter: drop HTTP connection on bugs or data corruption
  daemon: warn on missing blobs
  gzipfilter: check http->{forward} for client disconnects

 Documentation/mknews.perl            |  20 +--
 Documentation/public-inbox-httpd.pod |   1 -
 Documentation/technical/ds.txt       |   4 +-
 INSTALL                              |   5 -
 MANIFEST                             |   2 +
 ci/deps.perl                         |   1 -
 examples/cgit.psgi                   |   8 -
 examples/newswww.psgi                |   8 -
 examples/public-inbox.psgi           |   9 --
 examples/unsubscribe.psgi            |   1 -
 lib/PublicInbox/CompressNoop.pm      |  22 +++
 lib/PublicInbox/Feed.pm              |  22 ++-
 lib/PublicInbox/GetlineBody.pm       |   4 +-
 lib/PublicInbox/GzipFilter.pm        | 168 +++++++++++++++++---
 lib/PublicInbox/HTTP.pm              |   7 +
 lib/PublicInbox/HTTPD.pm             |   5 +-
 lib/PublicInbox/IMAP.pm              |   1 +
 lib/PublicInbox/Mbox.pm              | 137 +++++++++--------
 lib/PublicInbox/MboxGz.pm            |  81 ++++------
 lib/PublicInbox/NNTP.pm              |   1 +
 lib/PublicInbox/Qspawn.pm            |   6 +-
 lib/PublicInbox/SearchView.pm        |  40 ++---
 lib/PublicInbox/View.pm              | 219 ++++++++++++++-------------
 lib/PublicInbox/WWW.pm               |   9 +-
 lib/PublicInbox/WwwAtomStream.pm     |  66 ++++----
 lib/PublicInbox/WwwAttach.pm         |  63 ++++++--
 lib/PublicInbox/WwwListing.pm        |  24 +--
 lib/PublicInbox/WwwStatic.pm         |  14 +-
 lib/PublicInbox/WwwStream.pm         | 110 ++++++++------
 lib/PublicInbox/WwwText.pm           |  26 ++--
 script/public-inbox-httpd            |   9 --
 script/public-inbox.cgi              |   7 -
 t/httpd-corner.psgi                  |   7 +
 t/httpd-corner.t                     |   9 +-
 t/plack.t                            |   4 +
 t/psgi_attach.t                      | 162 +++++++++++---------
 t/psgi_text.t                        |  33 +++-
 t/psgi_v2.t                          |  80 ++++++++--
 t/www_listing.t                      |   8 +-
 t/www_static.t                       |  11 +-
 xt/httpd-async-stream.t              | 104 +++++++++++++
 41 files changed, 964 insertions(+), 554 deletions(-)
 create mode 100644 lib/PublicInbox/CompressNoop.pm
 create mode 100644 xt/httpd-async-stream.t