user/dev discussion of public-inbox itself
 help / color / Atom feed
* [PATCH 0/2] www: memory reductions for multipart
@ 2020-01-11  6:28 Eric Wong
  2020-01-11  6:28 ` [PATCH 1/2] xt/mem-msgview.t: change to test one multipart message Eric Wong
  2020-01-11  6:28 ` [PATCH 2/2] www: discard multipart parent on iteration Eric Wong
  0 siblings, 2 replies; 3+ messages in thread
From: Eric Wong @ 2020-01-11  6:28 UTC (permalink / raw)
  To: meta

Multipart messages in Email::MIME still cost us a lot of memory,
but this makes things less bad (not "good", just "less bad" :P)

Eric Wong (2):
  xt/mem-msgview.t: change to test one multipart message
  www: discard multipart parent on iteration

 lib/PublicInbox/MsgIter.pm       |  5 +++--
 lib/PublicInbox/SolverGit.pm     |  5 ++---
 lib/PublicInbox/View.pm          |  6 +++---
 lib/PublicInbox/WwwAtomStream.pm |  4 ++--
 lib/PublicInbox/WwwAttach.pm     |  2 +-
 xt/mem-msgview.t                 | 32 ++++++++++++++++++++++----------
 6 files changed, 33 insertions(+), 21 deletions(-)

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 1/2] xt/mem-msgview.t: change to test one multipart message
  2020-01-11  6:28 [PATCH 0/2] www: memory reductions for multipart Eric Wong
@ 2020-01-11  6:28 ` Eric Wong
  2020-01-11  6:28 ` [PATCH 2/2] www: discard multipart parent on iteration Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2020-01-11  6:28 UTC (permalink / raw)
  To: meta

A single multipart message is far more common than
a reused Message-ID, so rewrite the test to only have
a single multipart message.  Memory improvements will
be implemented in the next commit.
---
 xt/mem-msgview.t | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/xt/mem-msgview.t b/xt/mem-msgview.t
index 1ea0f559..68b5919f 100644
--- a/xt/mem-msgview.t
+++ b/xt/mem-msgview.t
@@ -1,6 +1,8 @@
 #!perl -w
 # Copyright (C) 2020 all contributors <meta@public-inbox.org>
 # License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
+# Note: this may be altered as-needed to demonstrate improvements.
+# See history in git for this file.
 use strict;
 use IO::Handle; # ->flush
 use Fcntl qw(SEEK_SET);
@@ -11,7 +13,7 @@ use Test::More;
 my @mods = qw(DBD::SQLite BSD::Resource);
 require_mods(@mods);
 use_ok($_) for @mods;
-my $lines = $ENV{NR_LINES} // 100000;
+my $lines = $ENV{NR_LINES} // 50000;
 my ($tmpdir, $for_destroy) = tmpdir();
 my $inboxname = 'big';
 my $inboxdir = "$tmpdir/big";
@@ -32,29 +34,39 @@ EOF
 		'inbox initialized');
 
 	$fh = tmpfile('big.eml', undef, my $append = 1) or die;
-	printf($fh <<'EOF', $addr, $mid) or die;
+	my $hdr = sprintf(<<'EOF', $addr, $mid);
 From: Dr. X <x@example.com>
 To: Nikki <%s>
 Date: Tue, 3 May 1988 00:00:00 +0000
 Subject: todo
 Message-ID: <%s>
+MIME-Version: 1.0
+Content-Type: multipart/mixed; boundary="FOO"
+Content-Disposition: inline
+
+--FOO
+Content-Type: text/plain; charset=utf-8
+Content-Disposition: inline
+
+EOF
+	print $fh $hdr or die;
+	for (0..$lines) { print $fh 'x' x 72, "\n" or die }
+	print $fh <<EOF or die;
+
+--FOO
+Content-Type: text/plain; charset=utf-8
+Content-Disposition: inline
 
 EOF
 	for (0..$lines) { print $fh 'x' x 72, "\n" or die }
+	print $fh "\n--FOO--\n" or die;
 	$fh->flush or die;
 	sysseek($fh, 0, SEEK_SET) or die;
 	my $env = { ORIGINAL_RECIPIENT => $addr };
 	my $err = '';
 	my $opt = { 0 => $fh, 2 => \$err, run_mode => 0 };
 	ok(run_script([qw(-mda --no-precheck)], $env, $opt),
-		'1st message delivered');
-
-	# resend the message with same mid but different content
-	print $fh "mindcrime\n" or die;
-	$fh->flush or die;
-	sysseek($fh, 0, SEEK_SET) or die;
-	ok(run_script([qw(-mda --no-precheck)], $env, $opt),
-		'2nd message delivered');
+		'message delivered');
 }
 
 my $www = PublicInbox::WWW->new;

^ permalink raw reply	[flat|nested] 3+ messages in thread

* [PATCH 2/2] www: discard multipart parent on iteration
  2020-01-11  6:28 [PATCH 0/2] www: memory reductions for multipart Eric Wong
  2020-01-11  6:28 ` [PATCH 1/2] xt/mem-msgview.t: change to test one multipart message Eric Wong
@ 2020-01-11  6:28 ` Eric Wong
  1 sibling, 0 replies; 3+ messages in thread
From: Eric Wong @ 2020-01-11  6:28 UTC (permalink / raw)
  To: meta

We're often iterating through messages while writing to another
buffer in our WWW interface, causing memory usage to multiply.
Since we know we won't need to keep the MIME object around in
some cases, and can tell msg_iter to clobber the on-stack
variable while it operates on subparts of multipart messages.

With xt/mem-msgview.t switched to multipart from the previous
commit, this shows a 13 MB memory reduction on that test.
---
 lib/PublicInbox/MsgIter.pm       | 5 +++--
 lib/PublicInbox/SolverGit.pm     | 5 ++---
 lib/PublicInbox/View.pm          | 6 +++---
 lib/PublicInbox/WwwAtomStream.pm | 4 ++--
 lib/PublicInbox/WwwAttach.pm     | 2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index cdd78b39..f238954b 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -12,10 +12,11 @@ use PublicInbox::MIME;
 # Like Email::MIME::walk_parts, but this is:
 # * non-recursive
 # * passes depth and indices to the iterator callback
-sub msg_iter ($$;$) {
-	my ($mime, $cb, $cb_arg) = @_;
+sub msg_iter ($$;$$) {
+	my ($mime, $cb, $cb_arg, $do_undef) = @_;
 	my @parts = $mime->subparts;
 	if (@parts) {
+		$mime = $_[0] = undef if $do_undef; # saves some memory
 		my $i = 0;
 		@parts = map { [ $_, 1, ++$i ] } @parts;
 		while (my $p = shift @parts) {
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 8629f0da..b48e8ac4 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -233,9 +233,8 @@ sub find_extract_diffs ($$$) {
 	my $diffs = [];
 	foreach my $smsg (@$msgs) {
 		$ibx->smsg_mime($smsg) or next;
-		my $mime = delete $smsg->{mime};
-		msg_iter($mime, \&extract_diff,
-				[$self, $diffs, $pre, $post, $ibx, $smsg]);
+		msg_iter(delete $smsg->{mime}, \&extract_diff,
+				[$self, $diffs, $pre, $post, $ibx, $smsg], 1);
 	}
 	@$diffs ? $diffs : undef;
 }
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 405da2a9..d88b34da 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -275,7 +275,7 @@ sub index_entry {
 	# scan through all parts, looking for displayable text
 	$ctx->{mhref} = $mhref;
 	$ctx->{rv} = \$rv;
-	msg_iter($mime, \&add_text_body, $ctx);
+	msg_iter($mime, \&add_text_body, $ctx, 1);
 	delete $ctx->{rv};
 
 	# add the footer
@@ -506,12 +506,12 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 }
 
 sub multipart_text_as_html {
-	my ($mime, $mhref, $ctx) = @_;
+	my (undef, $mhref, $ctx) = @_; # $mime = $_[0]
 	$ctx->{mhref} = $mhref;
 	$ctx->{rv} = \(my $rv = '');
 
 	# scan through all parts, looking for displayable text
-	msg_iter($mime, \&add_text_body, $ctx);
+	msg_iter($_[0], \&add_text_body, $ctx, 1);
 	${delete $ctx->{rv}};
 }
 
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 9430dd97..9ec1383d 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -101,9 +101,9 @@ sub atom_header {
 sub feed_entry {
 	my ($self, $smsg) = @_;
 	my $ctx = $self->{ctx};
-	my $mime = $smsg->{mime};
+	my $mid = $smsg->mid; # may extract Message-ID from {mime}
+	my $mime = delete $smsg->{mime};
 	my $hdr = $mime->header_obj;
-	my $mid = $smsg->mid;
 	my $irt = PublicInbox::View::in_reply_to($hdr);
 	my $uuid = to_uuid($mid);
 	my $base = $ctx->{feed_base_url};
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index cda1c6c8..92f47e49 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -40,7 +40,7 @@ sub get_attach ($$$) {
 	my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res;
 	$mime = PublicInbox::MIME->new($mime);
 	$res->[3] = $idx;
-	msg_iter($mime, \&get_attach_i, $res);
+	msg_iter($mime, \&get_attach_i, $res, 1);
 	pop @$res; # cleanup before letting PSGI server see it
 	$res
 }

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-01-11  6:28 [PATCH 0/2] www: memory reductions for multipart Eric Wong
2020-01-11  6:28 ` [PATCH 1/2] xt/mem-msgview.t: change to test one multipart message Eric Wong
2020-01-11  6:28 ` [PATCH 2/2] www: discard multipart parent on iteration Eric Wong

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror http://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git