* [PATCH 03/38] view: rework single message page to compress earlier
From: Eric Wong @ 2022-09-10 8:16 UTC
To: meta
We can rely on deflate to compress large thread skeletons on
single message pages. Subsequent commits will compress bodies,
as well.
---
lib/PublicInbox/View.pm | 42 ++++++++++++++++--------------------
lib/PublicInbox/WwwStream.pm | 14 ++++++++++--
2 files changed, 30 insertions(+), 26 deletions(-)
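A minimal standalone sketch of the pattern this patch moves toward. It is plain Perl, not public-inbox code. Small header fragments accumulate in a string buffer, while a potentially large piece (such as a thread skeleton) goes straight to deflate instead of first being appended to that buffer. The names new_ctx, zmore_sketch, zflush_sketch and html_done_sketch, and the {zbuf} key, are hypothetical stand-ins loosely modeled on the zmore, zflush and html_done calls in the diff. The sketch also assumes the client accepts gzip. The real GzipFilter::gz_or_noop handles the other case, which is ignored here.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib;

# Hypothetical context: {obuf} is a scalar ref holding pending
# uncompressed text, {zbuf} accumulates gzip output, {gz} deflates.
sub new_ctx {
    # -WindowBits => 31 (15 + 16) requests a gzip wrapper;
    # -AppendOutput => 1 appends to {zbuf} instead of overwriting it.
    my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
        -WindowBits => 31, -AppendOutput => 1);
    die "deflateInit failed: $err" unless $err == Z_OK;
    my $obuf = '';
    return { gz => $gz, zbuf => '', obuf => \$obuf };
}

# Compress anything pending in {obuf}, then each extra argument in order.
# Large arguments (e.g. a thread skeleton) are never appended to {obuf}.
sub zmore_sketch {
    my $ctx = shift;
    my $gz = $ctx->{gz};
    if (my $obuf = delete $ctx->{obuf}) {
        $gz->deflate($$obuf, $ctx->{zbuf}) == Z_OK or die 'deflate failed';
    }
    for my $s (@_) {
        $gz->deflate($s, $ctx->{zbuf}) == Z_OK or die 'deflate failed';
    }
}

# Compress the remaining arguments, finish the gzip stream (flush
# defaults to Z_FINISH) and hand back the whole compressed body.
sub zflush_sketch {
    my $ctx = shift;
    zmore_sketch($ctx, @_);
    $ctx->{gz}->flush($ctx->{zbuf}) == Z_OK or die 'flush failed';
    return delete $ctx->{zbuf};
}

# Build a PSGI response; Content-Length counts the compressed bytes.
# Setting Content-Encoding here is an assumption of this sketch.
sub html_done_sketch {
    my ($ctx, $code, @tail) = @_;
    my $bdy = zflush_sketch($ctx, @tail);
    my $hdr = [ 'Content-Type' => 'text/html; charset=UTF-8',
                'Content-Encoding' => 'gzip',
                'Content-Length' => length($bdy) ];
    return [ $code, $hdr, [ $bdy ] ];
}
```

Here the skeleton is passed to deflate as its own argument. That matches the intent of the "$skel may be big for big threads, don't append it to obuf" comment in the diff, though it does not claim to reproduce GzipFilter's internals.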
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 446e6bb8..033af283 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -38,14 +38,12 @@ sub msg_page_i {
: $ctx->gone('over');
$ctx->{mhref} = ($ctx->{nr} || $ctx->{smsg}) ?
"../${\mid_href($smsg->{mid})}/" : '';
- my $obuf = _msg_page_prepare_obuf($eml, $ctx);
- if (length($$obuf)) {
+ if (_msg_page_prepare_obuf($eml, $ctx)) {
multipart_text_as_html($eml, $ctx);
- $$obuf .= '</pre><hr>';
+ ${$ctx->{obuf}} .= '</pre><hr>';
}
- delete $ctx->{obuf};
- $$obuf .= html_footer($ctx, $ctx->{first_hdr}) if !$ctx->{smsg};
- $$obuf;
+ html_footer($ctx, $ctx->{first_hdr}) if !$ctx->{smsg};
+ delete($ctx->{obuf}) // \'';
} else { # called by WwwStream::async_next or getline
$ctx->{smsg}; # may be undef
}
@@ -58,14 +56,12 @@ sub no_over_html ($) {
my $eml = PublicInbox::Eml->new($bref);
$ctx->{mhref} = '';
PublicInbox::WwwStream::init($ctx);
- my $obuf = _msg_page_prepare_obuf($eml, $ctx);
- if (length($$obuf)) {
+ if (_msg_page_prepare_obuf($eml, $ctx)) { # sets {-title_html}
multipart_text_as_html($eml, $ctx);
- $$obuf .= '</pre><hr>';
+ ${$ctx->{obuf}} .= '</pre><hr>';
}
- delete $ctx->{obuf};
- eval { $$obuf .= html_footer($ctx, $eml) };
- html_oneshot($ctx, 200, $$obuf);
+ html_footer($ctx, $eml);
+ $ctx->html_done(200);
}
# public functions: (unstable)
@@ -669,7 +665,7 @@ sub _msg_page_prepare_obuf {
if ($nr) { # unlikely
if ($ctx->{chash} eq content_hash($eml)) {
warn "W: BUG? @$mids not deduplicated properly\n";
- return \$rv;
+ return;
}
$rv .=
"<pre>WARNING: multiple messages have this Message-ID\n</pre><pre>";
@@ -746,7 +742,7 @@ sub _msg_page_prepare_obuf {
}
_parent_headers($ctx, $eml);
$rv .= "\n";
- \$rv;
+ 1;
}
sub SKEL_EXPAND () {
@@ -827,13 +823,11 @@ EOM
}
}
-# returns a string buffer
+# appends to obuf
sub html_footer {
my ($ctx, $hdr) = @_;
my $upfx = '../';
- my $skel;
- my $rv = '<pre>';
- my $related;
+ my ($related, $skel);
my $qry = delete $ctx->{-qry};
if ($qry && $ctx->{ibx}->isrch) {
my $q = ''; # search for either ancestor or descendent patches
@@ -896,15 +890,15 @@ EOF
} elsif ($u) { # unlikely
$parent = " <a\nhref=\"$u\"\nrel=prev>parent</a>";
}
- $rv .= "$next $prev$parent ";
+ ${$ctx->{obuf}} .= "<pre>$next $prev$parent ";
} else { # unindexed inboxes w/o over
+ ${$ctx->{obuf}} .= '<pre>';
$skel = qq( <a\nhref="$upfx">latest</a>);
}
- $rv .= qq(<a\nhref="#R">reply</a>);
- $rv .= $skel;
- $rv .= '</pre>';
- $rv .= $related // '';
- $rv .= msg_reply($ctx, $hdr);
+ ${$ctx->{obuf}} .= qq(<a\nhref="#R">reply</a>);
+ # $skel may be big for big threads, don't append it to obuf
+ $skel .= '</pre>' . ($related // '');
+ $ctx->zmore($skel .= msg_reply($ctx, $hdr)); # flushes obuf
}
sub linkify_ref_no_over {
diff --git a/lib/PublicInbox/WwwStream.pm b/lib/PublicInbox/WwwStream.pm
index f2777fdc..115e0440 100644
--- a/lib/PublicInbox/WwwStream.pm
+++ b/lib/PublicInbox/WwwStream.pm
@@ -27,6 +27,9 @@ sub init {
my ($ctx, $cb) = @_;
$ctx->{cb} = $cb;
$ctx->{base_url} = base_url($ctx);
+ $ctx->{-res_hdr} = [ 'Content-Type' => 'text/html; charset=UTF-8' ];
+ $ctx->{gz} = PublicInbox::GzipFilter::gz_or_noop($ctx->{-res_hdr},
+ $ctx->{env});
bless $ctx, __PACKAGE__;
}
@@ -164,6 +167,14 @@ sub getline {
$ctx->zflush(_html_end($ctx));
}
+sub html_done ($$) {
+ my ($ctx, $code) = @_;
+ my $bdy = $ctx->zflush(_html_end($ctx));
+ my $res_hdr = delete $ctx->{-res_hdr};
+ push @$res_hdr, 'Content-Length', length($bdy);
+ [ $code, $res_hdr, [ $bdy ] ]
+}
+
sub html_oneshot ($$;@) {
my ($ctx, $code) = @_[0, 1];
my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8',
@@ -195,9 +206,8 @@ sub async_next ($) {
sub aresponse {
my ($ctx, $code, $cb) = @_;
- my $res_hdr = [ 'Content-Type' => 'text/html; charset=UTF-8' ];
init($ctx, $cb);
- $ctx->psgi_response($code, $res_hdr);
+ $ctx->psgi_response($code, delete $ctx->{-res_hdr});
}
sub html_init {
* [PATCH 00/38] www: reduce memory usage
From: Eric Wong @ 2022-09-10 8:16 UTC
To: meta
I'm over the moon with this series since it drops dozens of
megabytes of scratchpad use while providing tiny speedups along
the way. For me, that's a 10-15% reduction in memory use under
public-inbox-netd w/ mwrap-perl[1] overhead.
This scratchpad use has been bothering me for a long time
(since I fixed all the other leaks, including one in the core
Encode module).
There's more coming, of course, but this series is big enough
and has already shown good results on https://yhbt.net/lore/
It also provides a good pattern going forward for implementing
future features efficiently.
I actually started out in this series trying to buffer
everything using gzip to avoid space-wasting uncompressed
strings living in memory. Unfortunately,
Compress::Raw::Zlib::deflate proved too expensive to call
frequently on short strings.
Going back to `.=' ops via a ->zadd method brought back some of
the speed while consolidating the scratchpad to a single place,
but I still didn't like the remaining performance regression.
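For illustration only, here is a hypothetical contrast between the two detours described above. The first calls deflate once per short fragment. The second collects fragments with `.=' into one scratch buffer, in the spirit of the ->zadd method, and deflates once per message. ZaddSketch and its methods are invented names, not public-inbox's API. Both styles yield the same decompressed text; the call overhead the cover letter describes is not measured here.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib;

package ZaddSketch;
use Compress::Raw::Zlib;

sub new {
    my ($class) = @_;
    # gzip wrapper (-WindowBits => 31), output appended to {zbuf}
    my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
        -WindowBits => 31, -AppendOutput => 1);
    die "deflateInit failed: $err" unless $err == Z_OK;
    return bless { gz => $gz, zbuf => '', obuf => '' }, $class;
}

# Detour 1: hand every short fragment straight to deflate.
# (Use one style per object; mixing them would reorder output.)
sub zmore_each {
    my $self = shift;
    for my $s (@_) {
        $self->{gz}->deflate($s, $self->{zbuf}) == Z_OK or die 'deflate';
    }
}

# Detour 2: append fragments with `.=' into a single scratch buffer...
sub zadd {
    my $self = shift;
    $self->{obuf} .= $_ for @_;
}

# ...and compress that buffer once, e.g. after each full message.
sub end_message {
    my ($self) = @_;
    $self->{gz}->deflate($self->{obuf}, $self->{zbuf}) == Z_OK
        or die 'deflate';
    $self->{obuf} = '';
}

# Compress any pending text and end the gzip stream (Z_FINISH).
sub finish {
    my ($self) = @_;
    $self->end_message;
    $self->{gz}->flush($self->{zbuf}) == Z_OK or die 'flush';
    return $self->{zbuf};
}

package main;
```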
I kept those detours in the history presented here since I
figure they're worth showing.
Finally, relying on PerlIO::scalar with print|say ops proved to
be the fastest, since OO ->method dispatch overhead is avoided
and these ops use no scratchpad at all.
As before, we still call C:R:Z:deflate after every full message
and flush to the socket periodically.
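Below is a minimal sketch of buffering through PerlIO::scalar, assuming the handle writes into a scalar that is deflated and cleared after each full message, with compressed bytes handed on only once enough have piled up. $zfh mirrors the name in the shortlog. flush_message, maybe_write, finish, the 4096-byte threshold and the $write callback standing in for the socket are all hypothetical.

```perl
use strict;
use warnings;
use feature 'say';
use Compress::Raw::Zlib;

my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
    -WindowBits => 31, -AppendOutput => 1);
die "deflateInit failed: $err" unless $err == Z_OK;

# Opening a handle on a scalar ref uses the core PerlIO::scalar layer:
# print/say on $zfh append to $buf with no ->method dispatch per call.
my $buf = '';
open(my $zfh, '>', \$buf) or die "open: $!";
my $zbuf = '';    # compressed output not yet handed to the socket

# Hypothetical: called after each full message.  Compress what was
# printed, then empty the scalar and rewind the handle; without the
# seek, the next print would land at the old offset and pad with NULs.
sub flush_message {
    $gz->deflate($buf, $zbuf) == Z_OK or die 'deflate failed';
    $buf = '';
    seek($zfh, 0, 0) or die "seek: $!";
}

# Hypothetical: pass compressed bytes to the writer only once at least
# $min bytes are ready, so the socket is written periodically.
sub maybe_write {
    my ($write, $min) = @_;
    return if length($zbuf) < $min;
    $write->($zbuf);
    $zbuf = '';
}

# Compress any tail, end the gzip stream and send everything left.
sub finish {
    my ($write) = @_;
    flush_message();
    $gz->flush($zbuf) == Z_OK or die 'flush failed';
    $write->($zbuf);
    $zbuf = '';
}

# Demo: $sent stands in for the client socket.
my $sent = '';
my $write = sub { $sent .= $_[0] };
for my $n (1 .. 200) {
    print $zfh "<pre>message $n</pre>";
    say $zfh '<hr>';
    flush_message();
    maybe_write($write, 4096);
}
finish($write);
```

Compress::Raw::Zlib is used here because it is the module named above. PerlIO::gzip is left out of the sketch since it is not a core module.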
I may even consider using PerlIO::gzip in the future, but that's
a non-standard module. However, I definitely took inspiration
from it, since I saw that it buffers uncompressed data in
memory before compressing it.
There are also a few small simplifications and speedups I noticed
along the way, and several other bugfixes I posted independently
while working on this series.
[1] I used https://80x24.org/mwrap-perl.git to check malloc use
Eric Wong (38):
xt: fold perf-obfuscate into perf-msgview, future-proof
www: gzip_filter: implicitly flush {obuf} on zmore/zflush
view: rework single message page to compress earlier
www_atom_stream: require 200 response
www_stream: aresponse assumes 200, too
www_text: reduce parameter passing for response header
viewvcs: use shorter and simpler ctx->html_done
www_listing: consolidate some ->zmore dispatches
www_listing: avoid unnecessary work for common cases
www: viewdiff: use return value for diff_hunk
view: simplify _parent_headers
view: eml_entry: reduce manipulation of ctx->{obuf}
gzip_filter: ->translate can reuse zmore/zflush
view: remove multipart_text_as_html
view: reduce subroutine calls for submsg_hdr
view: attach_link: reduce obuf manipulation
viewdiff: reuse existing string in diff_before_or_after
view: _th_index_lite: avoid one s///, improve symmetry
view: _th_index_lite: use `//' defined-or op
view: reduce ascii_html calls and {obuf} use
view: html_footer: golf out a few lines
view: html_footer: remove obuf dependency
view: html_footer: avoid escaping " in a few places
viewdiff: diff_hunk: shorten conditionals, slightly
view: switch a few things to ctx->zmore
www: drop {obuf} use entirely, for now
www: switch to zadd for the majority of buffering
www: use PerlIO::scalar (zfh) for buffering
viewdiff: diff_before_or_after: avoid extra capture
viewdiff: diff_header: shorten function, slightly
www_static: switch to `print $zfh', and optimize
httpd/async: describe which ->write subs it can call
translate: support multiple buffer args
gzip_filter: write: use multi-arg translate
feed: new_html_i: switch from zmore to `print $zfh'
mbox*: use multi-arg ->translate and ->write
www_listing: switch to `print $zfh'
viewvcs: switch to `print $zfh'
Documentation/mknews.perl | 3 +-
MANIFEST | 1 -
lib/PublicInbox/CompressNoop.pm | 4 +-
lib/PublicInbox/Feed.pm | 12 +-
lib/PublicInbox/GzipFilter.pm | 62 +++---
lib/PublicInbox/HTTPD/Async.pm | 9 +-
lib/PublicInbox/Mbox.pm | 11 +-
lib/PublicInbox/MboxGz.pm | 3 +-
lib/PublicInbox/SearchView.pm | 8 +-
lib/PublicInbox/View.pm | 312 ++++++++++++-------------------
lib/PublicInbox/ViewDiff.pm | 115 +++++-------
lib/PublicInbox/ViewVCS.pm | 17 +-
lib/PublicInbox/WwwAtomStream.pm | 19 +-
lib/PublicInbox/WwwListing.pm | 40 ++--
lib/PublicInbox/WwwStatic.pm | 32 ++--
lib/PublicInbox/WwwStream.pm | 23 ++-
lib/PublicInbox/WwwText.pm | 35 ++--
t/psgi_v2.t | 4 +-
xt/perf-msgview.t | 10 +-
xt/perf-obfuscate.t | 66 -------
20 files changed, 320 insertions(+), 466 deletions(-)
delete mode 100644 xt/perf-obfuscate.t
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git