user/dev discussion of public-inbox itself
 help / color / Atom feed
From: Eric Wong <e@yhbt.net>
To: meta@public-inbox.org
Subject: [PATCH 2/2] www: discard multipart parent on iteration
Date: Sat, 11 Jan 2020 06:28:16 +0000
Message-ID: <20200111062816.9891-3-e@yhbt.net> (raw)
In-Reply-To: <20200111062816.9891-1-e@yhbt.net>

We're often iterating through messages while writing to another
buffer in our WWW interface, causing memory usage to multiply.
Since we know we won't need to keep the MIME object around in
some cases, and can tell msg_iter to clobber the on-stack
variable while it operates on subparts of multipart messages.

With xt/mem-msgview.t switched to multipart from the previous
commit, this shows a 13 MB memory reduction on that test.
---
 lib/PublicInbox/MsgIter.pm       | 5 +++--
 lib/PublicInbox/SolverGit.pm     | 5 ++---
 lib/PublicInbox/View.pm          | 6 +++---
 lib/PublicInbox/WwwAtomStream.pm | 4 ++--
 lib/PublicInbox/WwwAttach.pm     | 2 +-
 5 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index cdd78b39..f238954b 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -12,10 +12,11 @@ use PublicInbox::MIME;
 # Like Email::MIME::walk_parts, but this is:
 # * non-recursive
 # * passes depth and indices to the iterator callback
-sub msg_iter ($$;$) {
-	my ($mime, $cb, $cb_arg) = @_;
+sub msg_iter ($$;$$) {
+	my ($mime, $cb, $cb_arg, $do_undef) = @_;
 	my @parts = $mime->subparts;
 	if (@parts) {
+		$mime = $_[0] = undef if $do_undef; # saves some memory
 		my $i = 0;
 		@parts = map { [ $_, 1, ++$i ] } @parts;
 		while (my $p = shift @parts) {
diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index 8629f0da..b48e8ac4 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -233,9 +233,8 @@ sub find_extract_diffs ($$$) {
 	my $diffs = [];
 	foreach my $smsg (@$msgs) {
 		$ibx->smsg_mime($smsg) or next;
-		my $mime = delete $smsg->{mime};
-		msg_iter($mime, \&extract_diff,
-				[$self, $diffs, $pre, $post, $ibx, $smsg]);
+		msg_iter(delete $smsg->{mime}, \&extract_diff,
+				[$self, $diffs, $pre, $post, $ibx, $smsg], 1);
 	}
 	@$diffs ? $diffs : undef;
 }
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 405da2a9..d88b34da 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -275,7 +275,7 @@ sub index_entry {
 	# scan through all parts, looking for displayable text
 	$ctx->{mhref} = $mhref;
 	$ctx->{rv} = \$rv;
-	msg_iter($mime, \&add_text_body, $ctx);
+	msg_iter($mime, \&add_text_body, $ctx, 1);
 	delete $ctx->{rv};
 
 	# add the footer
@@ -506,12 +506,12 @@ sub thread_html_i { # PublicInbox::WwwStream::getline callback
 }
 
 sub multipart_text_as_html {
-	my ($mime, $mhref, $ctx) = @_;
+	my (undef, $mhref, $ctx) = @_; # $mime = $_[0]
 	$ctx->{mhref} = $mhref;
 	$ctx->{rv} = \(my $rv = '');
 
 	# scan through all parts, looking for displayable text
-	msg_iter($mime, \&add_text_body, $ctx);
+	msg_iter($_[0], \&add_text_body, $ctx, 1);
 	${delete $ctx->{rv}};
 }
 
diff --git a/lib/PublicInbox/WwwAtomStream.pm b/lib/PublicInbox/WwwAtomStream.pm
index 9430dd97..9ec1383d 100644
--- a/lib/PublicInbox/WwwAtomStream.pm
+++ b/lib/PublicInbox/WwwAtomStream.pm
@@ -101,9 +101,9 @@ sub atom_header {
 sub feed_entry {
 	my ($self, $smsg) = @_;
 	my $ctx = $self->{ctx};
-	my $mime = $smsg->{mime};
+	my $mid = $smsg->mid; # may extract Message-ID from {mime}
+	my $mime = delete $smsg->{mime};
 	my $hdr = $mime->header_obj;
-	my $mid = $smsg->mid;
 	my $irt = PublicInbox::View::in_reply_to($hdr);
 	my $uuid = to_uuid($mid);
 	my $base = $ctx->{feed_base_url};
diff --git a/lib/PublicInbox/WwwAttach.pm b/lib/PublicInbox/WwwAttach.pm
index cda1c6c8..92f47e49 100644
--- a/lib/PublicInbox/WwwAttach.pm
+++ b/lib/PublicInbox/WwwAttach.pm
@@ -40,7 +40,7 @@ sub get_attach ($$$) {
 	my $mime = $ctx->{-inbox}->msg_by_mid($ctx->{mid}) or return $res;
 	$mime = PublicInbox::MIME->new($mime);
 	$res->[3] = $idx;
-	msg_iter($mime, \&get_attach_i, $res);
+	msg_iter($mime, \&get_attach_i, $res, 1);
 	pop @$res; # cleanup before letting PSGI server see it
 	$res
 }

      parent reply index

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-11  6:28 [PATCH 0/2] www: memory reductions for multipart Eric Wong
2020-01-11  6:28 ` [PATCH 1/2] xt/mem-msgview.t: change to test one multipart message Eric Wong
2020-01-11  6:28 ` Eric Wong [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://public-inbox.org/README

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200111062816.9891-3-e@yhbt.net \
    --to=e@yhbt.net \
    --cc=meta@public-inbox.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

user/dev discussion of public-inbox itself

Archives are clonable:
	git clone --mirror https://public-inbox.org/meta
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

Example config snippet for mirrors

Newsgroups are available over NNTP:
	nntp://news.public-inbox.org/inbox.comp.mail.public-inbox.meta
	nntp://ou63pmih66umazou.onion/inbox.comp.mail.public-inbox.meta
	nntp://czquwvybam4bgbro.onion/inbox.comp.mail.public-inbox.meta
	nntp://hjrcffqmbrq6wope.onion/inbox.comp.mail.public-inbox.meta
	nntp://news.gmane.io/gmane.mail.public-inbox.general

 note: .onion URLs require Tor: https://www.torproject.org/

AGPL code for this site: git clone https://public-inbox.org/public-inbox.git