* [PATCH 01/38] xt: fold perf-obfuscate into perf-msgview, future-proof
2022-09-10 8:16 6% [PATCH 00/38] www: reduce memory usage Eric Wong
@ 2022-09-10 8:16 7% ` Eric Wong
0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2022-09-10 8:16 UTC (permalink / raw)
To: meta
perf-obfuscate was close enough to perf-msgview that it only
required setting the `obfuscate' field of the inbox.
Then update perf-msgview to account for upcoming internal
changes. The current use of {obuf} and concat ops results in
excessive scratchpad space, and I may even be able to get
speedups by avoiding concat ops.
---
MANIFEST | 1 -
xt/perf-msgview.t | 10 ++++---
xt/perf-obfuscate.t | 66 ---------------------------------------------
3 files changed, 7 insertions(+), 70 deletions(-)
delete mode 100644 xt/perf-obfuscate.t
diff --git a/MANIFEST b/MANIFEST
index ac21ddcc..8be912d0 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -602,7 +602,6 @@ xt/nntpd-validate.t
xt/over-fsck.perl
xt/perf-msgview.t
xt/perf-nntpd.t
-xt/perf-obfuscate.t
xt/perf-threading.t
xt/pop3d-mpop.t
xt/solver.t
diff --git a/xt/perf-msgview.t b/xt/perf-msgview.t
index 7f92ce85..ef261359 100644
--- a/xt/perf-msgview.t
+++ b/xt/perf-msgview.t
@@ -11,6 +11,8 @@ use PublicInbox::WwwStream;
my $inboxdir = $ENV{GIANT_INBOX_DIR} // $ENV{GIANT_PI_DIR};
my $blob = $ENV{TEST_BLOB};
+my $obfuscate = $ENV{PI_OBFUSCATE} ? 1 : 0;
+diag "PI_OBFUSCATE=$obfuscate";
plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
my @cat = qw(cat-file --buffer --batch-check --batch-all-objects);
@@ -21,7 +23,8 @@ if (require_git(2.19, 1)) {
"git <2.19, cat-file lacks --unordered, locality suffers\n";
}
require_mods qw(Plack::Util);
-my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'name' });
+my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'name',
+ obfuscate => $obfuscate});
my $git = $ibx->git;
my $fh = $blob ? undef : $git->popen(@cat);
if ($fh) {
@@ -46,10 +49,11 @@ $ctx->{mhref} = '../';
my $cb = sub {
$eml = PublicInbox::Eml->new(shift);
$eml->each_part(\&PublicInbox::View::add_text_body, $ctx, 1);
- $ctx->zflush;
+ $ctx->zflush(grep defined, delete @$ctx{'obuf'}); # compat
++$m;
delete $ctx->{zbuf};
- ${$ctx->{obuf}} = '';
+ ${$ctx->{obuf}} = ''; # compat
+ $ctx->{gz} = PublicInbox::GzipFilter::gzip_or_die();
};
my $t = timeit(1, sub {
diff --git a/xt/perf-obfuscate.t b/xt/perf-obfuscate.t
deleted file mode 100644
index 4da36124..00000000
--- a/xt/perf-obfuscate.t
+++ /dev/null
@@ -1,66 +0,0 @@
-#!perl -w
-# Copyright (C) all contributors <meta@public-inbox.org>
-# License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
-use strict;
-use v5.10.1;
-use PublicInbox::TestCommon;
-use Benchmark qw(:all);
-use PublicInbox::Inbox;
-use PublicInbox::View;
-use PublicInbox::WwwStream;
-
-my $inboxdir = $ENV{GIANT_INBOX_DIR};
-plan skip_all => "GIANT_INBOX_DIR not defined for $0" unless $inboxdir;
-
-my $obfuscate = $ENV{PI_OBFUSCATE} ? 1 : 0;
-diag "obfuscate=$obfuscate\n";
-
-my @cat = qw(cat-file --buffer --batch-check --batch-all-objects);
-if (require_git(2.19, 1)) {
- push @cat, '--unordered';
-} else {
- warn
-"git <2.19, cat-file lacks --unordered, locality suffers\n";
-}
-require_mods qw(Plack::Util);
-my $ibx = PublicInbox::Inbox->new({ inboxdir => $inboxdir, name => 'name' ,
- obfuscate => $obfuscate});
-my $git = $ibx->git;
-my $fh = $git->popen(@cat);
-my $vec = '';
-vec($vec, fileno($fh), 1) = 1;
-select($vec, undef, undef, 60) or die "timed out waiting for --batch-check";
-
-my $ctx = bless {
- env => { HTTP_HOST => 'example.com', 'psgi.url_scheme' => 'https' },
- ibx => $ibx,
- www => Plack::Util::inline_object(style => sub {''}),
- gz => PublicInbox::GzipFilter::gzip_or_die(),
-}, 'PublicInbox::WwwStream';
-my ($eml, $res, $oid, $type);
-my $n = 0;
-my $m = 0;
-${$ctx->{obuf}} = '';
-$ctx->{mhref} = '../';
-
-my $cb = sub {
- $eml = PublicInbox::Eml->new(shift);
- $eml->each_part(\&PublicInbox::View::add_text_body, $ctx, 1);
- $ctx->zflush;
- ++$m;
- delete $ctx->{zbuf};
- ${$ctx->{obuf}} = '';
-};
-
-my $t = timeit(1, sub {
- while (<$fh>) {
- ($oid, $type) = split / /;
- next if $type ne 'blob';
- ++$n;
- $git->cat_async($oid, $cb);
- }
- $git->async_wait_all;
-});
-diag 'add_text_body took '.timestr($t)." for $n <=> $m messages";
-is($m, $n, 'rendered all messages');
-done_testing();
* [PATCH 00/38] www: reduce memory usage
@ 2022-09-10 8:16 6% Eric Wong
2022-09-10 8:16 7% ` [PATCH 01/38] xt: fold perf-obfuscate into perf-msgview, future-proof Eric Wong
0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2022-09-10 8:16 UTC (permalink / raw)
To: meta
I'm over the moon with this series since this drops dozens of
megabytes of scratchpad use while providing tiny speedups along
the way. For me, that's a 10-15% reduction in memory use under
public-inbox-netd w/ mwrap-perl[1] overhead.
This scratchpad use has been bothering me for a long time
(since I fixed all the other leaks, including one in the core
Encode module).
There's more coming, of course, but this series is big enough
and has shown good results on https://yhbt.net/lore/
It also provides a good pattern/guidance going forward
on how to efficiently implement future features.
I actually started out in this series trying to buffer
everything using gzip to avoid space-wasting uncompressed
strings living in memory. Unfortunately,
Compress::Raw::Zlib::deflate calls proved too expensive to call
frequently for short strings.
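
A rough way to see that cost for yourself; this is only a sketch and
not code from this series.  The fragment list and sub names are made
up for illustration, and only the Deflate options match what
gzip_or_die uses (to my knowledge).  It times one deflate call per
short string against a single call on the joined string:

```perl
use strict;
use warnings;
use Benchmark qw(timeit timestr);
use Compress::Raw::Zlib qw(Z_OK);

# hypothetical input: many short HTML fragments
my @frag = map { "<b>line $_</b>\n" } (1..20_000);

sub new_gzip {
	my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
		-WindowBits => 15 + 16, # gzip framing
		-AppendOutput => 1);
	$err == Z_OK or die "Deflate->new failed: $err";
	$gz;
}

# one ->deflate call per short string
sub deflate_each {
	my $gz = new_gzip();
	my $out = '';
	for my $s (@frag) {
		$gz->deflate($s, $out) == Z_OK or die 'deflate failed';
	}
	$gz->flush($out) == Z_OK or die 'flush failed';
	$out;
}

# join everything first, then a single ->deflate call
sub deflate_once {
	my $gz = new_gzip();
	my $out = '';
	my $in = join('', @frag);
	$gz->deflate($in, $out) == Z_OK or die 'deflate failed';
	$gz->flush($out) == Z_OK or die 'flush failed';
	$out;
}

print 'per-fragment deflate: ', timestr(timeit(1, \&deflate_each)), "\n";
print 'single deflate:       ', timestr(timeit(1, \&deflate_once)), "\n";
```

Absolute numbers will vary by machine and zlib build, so treat the
output as a relative comparison only.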
Going back to `.=' ops via a ->zadd method brought back some of
the speed while consolidating the scratchpad to a single place;
but I didn't like the performance regression.
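
For readers who don't want to dig through the intermediate patches,
the `.=' approach has roughly the following shape.  This is a
simplified sketch: the package name, the {buf} field and the ->take
helper are hypothetical, and the real ->zadd in the series may differ.

```perl
use strict;
use warnings;

package My::Buf; # hypothetical class, for illustration only

sub new { bless { buf => '' }, shift }

# append every argument to the one per-object buffer
sub zadd {
	my $self = shift;
	$self->{buf} .= $_ for @_;
	$self; # allow chaining
}

# hand back the accumulated string and reset the buffer
sub take {
	my ($self) = @_;
	my $ret = $self->{buf};
	$self->{buf} = '';
	$ret;
}

package main;

my $ctx = My::Buf->new;
$ctx->zadd('<pre>', 'hello', '</pre>');
$ctx->zadd("\n");
print $ctx->take;
```

All appends land in a single string, but every call still pays for
->method dispatch.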
I kept those detours in the history presented here since I
figure they're worth showing.
Finally relying on PerlIO::scalar with print|say ops proved to
be the fastest since OO ->method dispatch overhead can be avoided
and there's no scratchpad use at all from these, either.
As before, we still call C:R:Z:deflate after every full message
and flush to the socket periodically.
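
A minimal standalone sketch of that final pattern, assuming made-up
message contents and variable names (the real code lives in
GzipFilter.pm and is not reproduced here).  The handle is opened in
append mode so the backing scalar can be emptied after each deflate
without leaving a stale write position behind:

```perl
use strict;
use warnings;
use Compress::Raw::Zlib qw(Z_OK);

my ($gz, $err) = Compress::Raw::Zlib::Deflate->new(
	-WindowBits => 15 + 16, # gzip framing
	-AppendOutput => 1);
$err == Z_OK or die "Deflate->new failed: $err";

my $zbuf = ''; # compressed bytes waiting to go to the socket
my $pbuf = ''; # uncompressed scratch buffer behind the handle

# PerlIO::scalar: an in-memory filehandle; `>>' always writes at the
# current end of $pbuf
open(my $zfh, '>>', \$pbuf) or die "open: $!";

my @msgs = ("first message\n", "second message\n"); # hypothetical
for my $msg (@msgs) {
	# plain print ops, no ->method dispatch per fragment
	print $zfh '<pre>', $msg, '</pre>';

	# one deflate call per full message
	$gz->deflate($pbuf, $zbuf) == Z_OK or die 'deflate failed';
	$pbuf = ''; # reuse the same buffer for the next message
}
$gz->flush($zbuf) == Z_OK or die 'flush failed';
printf "%d compressed bytes ready to write\n", length($zbuf);
```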
I may even consider using PerlIO::gzip in the future, but that's
a non-standard module. However, I definitely took inspiration
from it since I saw that it would buffer uncompressed data into
memory before compressing it.
There are also a few small simplifications and speedups I noticed
along the way, and several other bugfixes I posted independently
while working on this series.
[1] I used https://80x24.org/mwrap-perl.git to check malloc use
Eric Wong (38):
xt: fold perf-obfuscate into perf-msgview, future-proof
www: gzip_filter: implicitly flush {obuf} on zmore/zflush
view: rework single message page to compress earlier
www_atom_stream: require 200 response
www_stream: aresponse assumes 200, too
www_text: reduce parameter passing for response header
viewvcs: use shorter and simpler ctx->html_done
www_listing: consolidate some ->zmore dispatches
www_listing: avoid unnecessary work for common cases
www: viewdiff: use return value for diff_hunk
view: simplify _parent_headers
view: eml_entry: reduce manipulation of ctx->{obuf}
gzip_filter: ->translate can reuse zmore/zflush
view: remove multipart_text_as_html
view: reduce subroutine calls for submsg_hdr
view: attach_link: reduce obuf manipulation
viewdiff: reuse existing string in diff_before_or_after
view: _th_index_lite: avoid one s///, improve symmetry
view: _th_index_lite: use `//' defined-or op
view: reduce ascii_html calls and {obuf} use
view: html_footer: golf out a few lines
view: html_footer: remove obuf dependency
view: html_footer: avoid escaping " in a few places
viewdiff: diff_hunk: shorten conditionals, slightly
view: switch a few things to ctx->zmore
www: drop {obuf} use entirely, for now
www: switch to zadd for the majority of buffering
www: use PerlIO::scalar (zfh) for buffering
viewdiff: diff_before_or_after: avoid extra capture
viewdiff: diff_header: shorten function, slightly
www_static: switch to `print $zfh', and optimize
httpd/async: describe which ->write subs it can call
translate: support multiple buffer args
gzip_filter: write: use multi-arg translate
feed: new_html_i: switch from zmore to `print $zfh'
mbox*: use multi-arg ->translate and ->write
www_listing: switch to `print $zfh'
viewvcs: switch to `print $zfh'
Documentation/mknews.perl | 3 +-
MANIFEST | 1 -
lib/PublicInbox/CompressNoop.pm | 4 +-
lib/PublicInbox/Feed.pm | 12 +-
lib/PublicInbox/GzipFilter.pm | 62 +++---
lib/PublicInbox/HTTPD/Async.pm | 9 +-
lib/PublicInbox/Mbox.pm | 11 +-
lib/PublicInbox/MboxGz.pm | 3 +-
lib/PublicInbox/SearchView.pm | 8 +-
lib/PublicInbox/View.pm | 312 ++++++++++++-------------------
lib/PublicInbox/ViewDiff.pm | 115 +++++-------
lib/PublicInbox/ViewVCS.pm | 17 +-
lib/PublicInbox/WwwAtomStream.pm | 19 +-
lib/PublicInbox/WwwListing.pm | 40 ++--
lib/PublicInbox/WwwStatic.pm | 32 ++--
lib/PublicInbox/WwwStream.pm | 23 ++-
lib/PublicInbox/WwwText.pm | 35 ++--
t/psgi_v2.t | 4 +-
xt/perf-msgview.t | 10 +-
xt/perf-obfuscate.t | 66 -------
20 files changed, 320 insertions(+), 466 deletions(-)
delete mode 100644 xt/perf-obfuscate.t
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).