user/dev discussion of public-inbox itself
* [PATCH 0/4] WWW-related memory savings
@ 2021-10-09 12:03 Eric Wong
  2021-10-09 12:03 ` [PATCH 1/4] solver_git: shorten scalar lifetimes Eric Wong
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
  To: meta

Some things I noticed while tracking down our own
reference cycle leak and the Encode <= 3.12 memory leak.
There's more aggressive stuff I'm testing, too, but
I've yet to check throughput performance.

Eric Wong (4):
  solver_git: shorten scalar lifetimes
  view: discard Eml->{bdy} when done using
  http: avoid Perl target cache for psgi.input
  view: save memory by dropping smsg->{from_name} on use

 lib/PublicInbox/HTTP.pm       | 30 ++++++------------------------
 lib/PublicInbox/SearchView.pm |  2 +-
 lib/PublicInbox/Smsg.pm       |  9 +++------
 lib/PublicInbox/SolverGit.pm  |  8 ++++----
 lib/PublicInbox/View.pm       |  4 +++-
 5 files changed, 17 insertions(+), 36 deletions(-)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] solver_git: shorten scalar lifetimes
  2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
  2021-10-09 12:03 ` [PATCH 2/4] view: discard Eml->{bdy} when done using Eric Wong
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
  To: meta

Some of these scalar buffers may be large patches, so try
to keep them as short-lived as possible to reduce memory
pressure.
---
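A minimal sketch of the lifetime idea, not the SolverGit code itself.  It
assumes only that capture variables such as $1 remain valid after the
matched scalar is released, so the large buffer can be dropped right
after the match.  The sample text and variable names are made up for
illustration, and when the interpreter actually returns the memory is
up to Perl.

```perl
use strict;
use warnings;

# hypothetical stand-in for a large decoded patch body
my $s = "index b0cd0f2c..5d5060f4 100644\n" . ("+line\n" x 1000);

my ($pre, $post);
if ($s =~ /^index ([a-f0-9]+)\.\.([a-f0-9]+)/m) {
	undef $s; # drop our reference to the buffer; captures stay valid
	($pre, $post) = ($1, $2);
}
print "pre=$pre post=$post\n";
```
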
 lib/PublicInbox/SolverGit.pm | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/PublicInbox/SolverGit.pm b/lib/PublicInbox/SolverGit.pm
index b0cd0f2c..5d5060f4 100644
--- a/lib/PublicInbox/SolverGit.pm
+++ b/lib/PublicInbox/SolverGit.pm
@@ -111,8 +111,6 @@ sub extract_diff ($$) {
 	my ($self, $want, $smsg) = @$arg;
 	my ($part) = @$p; # ignore $depth and @idx;
 	my $ct = $part->content_type || 'text/plain';
-	my ($s, undef) = msg_part_text($part, $ct);
-	defined $s or return;
 	my $post = $want->{oid_b};
 	my $pre = $want->{oid_a};
 	if (!defined($pre) || $pre !~ /\A[a-f0-9]+\z/) {
@@ -122,11 +120,12 @@ sub extract_diff ($$) {
 	# Email::MIME::Encodings forces QP to be CRLF upon decoding,
 	# change it back to LF:
 	my $cte = $part->header('Content-Transfer-Encoding') || '';
+	my ($s, undef) = msg_part_text($part, $ct);
+	defined $s or return;
+	delete $part->{bdy};
 	if ($cte =~ /\bquoted-printable\b/i && $part->crlf eq "\n") {
 		$s =~ s/\r\n/\n/sg;
 	}
-
-
 	$s =~ m!( # $1 start header lines we save for debugging:
 
 		# everything before ^index is optional, but we don't
@@ -169,6 +168,7 @@ sub extract_diff ($$) {
 		# because git-apply(1) handles that case, too
 		(?:^(?:[\@\+\x20\-\\][^\n]*|)$LF)+
 	)!smx or return;
+	undef $s; # free memory
 
 	my $di = {
 		hdr_lines => $1,

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 2/4] view: discard Eml->{bdy} when done using
  2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
  2021-10-09 12:03 ` [PATCH 1/4] solver_git: shorten scalar lifetimes Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
  2021-10-09 12:03 ` [PATCH 3/4] http: avoid Perl target cache for psgi.input Eric Wong
  2021-10-09 12:03 ` [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
  To: meta

We can release the raw body buffer once we've obtained a copy of
the decoded buffer.  This reduces memory pressure ahead of some
expensive diff processing.
---
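A rough illustration of why deleting the hash slot helps.  The $part
hash below is a hypothetical stand-in, and it assumes {bdy} holds a
reference to the raw body; a weakened copy of that reference shows
that the buffer goes away as soon as the slot is deleted.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# hypothetical part object; {bdy} is assumed to be a scalar ref
my $part = { bdy => do { my $raw = 'x' x 4096; \$raw } };

my $watch = $part->{bdy};
weaken($watch); # observe the buffer without keeping it alive

my $size = length(${$part->{bdy}});
delete $part->{bdy}; # the last strong reference goes away here

print "size=$size freed=", (defined($watch) ? 0 : 1), "\n";
```
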
 lib/PublicInbox/View.pm | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 64e73234..a6944b80 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -533,6 +533,7 @@ sub attach_link ($$$$;$) {
 
 	my $nl = $idx eq '1' ? '' : "\n"; # like join("\n", ...)
 	my $size = length($part->body);
+	delete $part->{bdy}; # save memory
 
 	# hide attributes normally, unless we want to aid users in
 	# spotting MUA problems:
@@ -632,6 +633,7 @@ sub add_text_body { # callback for each_part
 		attach_link($ctx, $ct, $p, $fn, $err);
 		$$rv .= "\n";
 	}
+	delete $part->{bdy}; # save memory
 	foreach my $cur (@sections) {
 		if ($cur =~ /\A>/) {
 			# we use a <span> here to allow users to specify

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 3/4] http: avoid Perl target cache for psgi.input
  2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
  2021-10-09 12:03 ` [PATCH 1/4] solver_git: shorten scalar lifetimes Eric Wong
  2021-10-09 12:03 ` [PATCH 2/4] view: discard Eml->{bdy} when done using Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
  2021-10-09 12:03 ` [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
  To: meta

Use syswrite to populate env->{psgi.input}.  The substr()
call in IO::Handle->write triggers Perl's target/scratchpad
cache and results in a permanent allocation.  Since this is a
cold path, that allocation is pointless, and syswrite() can
already write a substring.

Allowing Perl to cache a large allocation in a cold path only
results in fragmentation and wasted RAM.

write(2) on a regular file won't result in short writes
unless the FS quotas or free space limits are hit, or the buffer
is close to overflowing (e.g. the 0x7ffff000-byte Linux limit).
Since our HTTP server will never buffer that much in RAM,
there's no need to retry syswrite or to rely on the retrying
implicit in IO::Handle->write and the "print" perlop.
---
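A small sketch of the syswrite() behavior relied on here, using an
anonymous temporary file as a stand-in for the psgi.input tmpfile.  The
buffer contents and variable names are illustrative only.  It shows
that the LENGTH argument writes a leading substring without a substr()
temporary, and that a LENGTH larger than the buffer is clamped to the
available data, which is what the removed xwrite() did by hand.

```perl
use strict;
use warnings;

# anonymous temporary file, a stand-in for the psgi.input tmpfile
open(my $fh, '+>', undef) or die "open: $!";

my $rbuf = 'hello world';

# write only the first 5 bytes; no substr() temporary is needed
my $w1 = syswrite($fh, $rbuf, 5) // die "syswrite: $!";

# a LENGTH larger than the buffer is clamped to the available data
my $w2 = syswrite($fh, $rbuf, 1 << 20) // die "syswrite: $!";

# read everything back to confirm what landed in the file
sysseek($fh, 0, 0) // die "sysseek: $!";
sysread($fh, my $got, $w1 + $w2) // die "sysread: $!";
print "w1=$w1 w2=$w2 got=$got\n";
```
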
 lib/PublicInbox/HTTP.pm | 30 ++++++------------------------
 1 file changed, 6 insertions(+), 24 deletions(-)

diff --git a/lib/PublicInbox/HTTP.pm b/lib/PublicInbox/HTTP.pm
index b2c74cf3..82c2b200 100644
--- a/lib/PublicInbox/HTTP.pm
+++ b/lib/PublicInbox/HTTP.pm
@@ -26,7 +26,6 @@ use Plack::HTTPParser qw(parse_http_request); # XS or pure Perl
 use Plack::Util;
 use HTTP::Status qw(status_message);
 use HTTP::Date qw(time2str);
-use IO::Handle; # ->write
 use PublicInbox::DS qw(msg_more);
 use PublicInbox::Syscall qw(EPOLLIN EPOLLONESHOT);
 use PublicInbox::Tmpfile;
@@ -117,15 +116,6 @@ sub rbuf_process {
 	$len ? read_input($self, $rbuf) : app_dispatch($self, undef, $rbuf);
 }
 
-# IO::Handle::write returns boolean, this returns bytes written:
-sub xwrite ($$$) {
-	my ($fh, $rbuf, $max) = @_;
-	my $w = length($$rbuf);
-	$w = $max if $w > $max;
-	$fh->write($$rbuf, $w) or return;
-	$w;
-}
-
 sub read_input ($;$) {
 	my ($self, $rbuf) = @_;
 	$rbuf //= $self->{rbuf} // (\(my $x = ''));
@@ -138,7 +128,7 @@ sub read_input ($;$) {
 
 	while ($len > 0) {
 		if ($$rbuf ne '') {
-			my $w = xwrite($input, $rbuf, $len);
+			my $w = syswrite($input, $$rbuf, $len);
 			return write_err($self, $len) unless $w;
 			$len -= $w;
 			die "BUG: $len < 0 (w=$w)" if $len < 0;
@@ -333,12 +323,6 @@ sub response_write {
 	}
 }
 
-sub input_tmpfile ($) {
-	my $input = tmpfile('http.input', $_[0]->{sock}) or return;
-	$input->autoflush(1);
-	$input;
-}
-
 sub input_prepare {
 	my ($self, $env) = @_;
 	my ($input, $len);
@@ -354,24 +338,22 @@ sub input_prepare {
 		return quit($self, 400) if $hte !~ /\Achunked\z/i;
 
 		$len = CHUNK_START;
-		$input = input_tmpfile($self);
+		$input = tmpfile('http.input', $self->{sock});
 	} else {
 		$len = $env->{CONTENT_LENGTH};
 		if (defined $len) {
 			# rfc7230 3.3.3.4
 			return quit($self, 400) if $len !~ /\A[0-9]+\z/;
-
 			return quit($self, 413) if $len > $MAX_REQUEST_BUFFER;
-			$input = $len ? input_tmpfile($self) : $null_io;
+			$input = $len ? tmpfile('http.input', $self->{sock})
+				: $null_io;
 		} else {
 			$input = $null_io;
 		}
 	}
 
 	# TODO: expire idle clients on ENFILE / EMFILE
-	return unless $input;
-
-	$env->{'psgi.input'} = $input;
+	$env->{'psgi.input'} = $input // return;
 	$self->{env} = $env;
 	$self->{input_left} = $len || 0;
 }
@@ -441,7 +423,7 @@ sub read_input_chunked { # unlikely...
 		# drain the current chunk
 		until ($len <= 0) {
 			if ($$rbuf ne '') {
-				my $w = xwrite($input, $rbuf, $len);
+				my $w = syswrite($input, $$rbuf, $len);
 				return write_err($self, "$len chunk") if !$w;
 				$len -= $w;
 				if ($len == 0) {

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use
  2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
                   ` (2 preceding siblings ...)
  2021-10-09 12:03 ` [PATCH 3/4] http: avoid Perl target cache for psgi.input Eric Wong
@ 2021-10-09 12:03 ` Eric Wong
  3 siblings, 0 replies; 5+ messages in thread
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
  To: meta

We'll also save a few LoC when generating it.  $smsg objects can
linger a while when rendering large threads, so saving a few
bytes here can add up to several hundred KB saved.

I noticed this while chasing the ref cycle leak in commit
b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03).
While there's no longer a leak, releasing memory earlier can
allow it to be reused sooner and reduce both memory traffic and
memory pressure.
---
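A sketch of the two delete idioms used in this patch, on a made-up
record.  The field values are invented and the name extraction is a
crude stand-in for PublicInbox::Address::names, not its real behavior.
delete on a hash slice returns the removed values in order, and delete
on a single key returns the value so it can be passed straight to its
consumer.

```perl
use strict;
use warnings;

# hypothetical search-result record; field values are made up
my $smsg = {
	from => 'Alice <alice@example.com>',
	tid => 1, to => 'meta', cc => '', bytes => 123, lines => 4,
	subject => 'hello',
};

# delete on a hash slice returns the removed values in order;
# the list assignment keeps only the first one (from)
my ($f) = delete @$smsg{qw(from tid to cc bytes lines)};

# crude stand-in for PublicInbox::Address::names (an assumption)
($smsg->{from_name}) = (($f // '') =~ /\A([^<]*?)\s*</);

# delete returns the value, so it can feed the consumer directly
my $shown = uc(delete $smsg->{from_name});
print "$shown; remaining keys: @{[ sort keys %$smsg ]}\n";
```
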
 lib/PublicInbox/SearchView.pm | 2 +-
 lib/PublicInbox/Smsg.pm       | 9 +++------
 lib/PublicInbox/View.pm       | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 91196cca..e74ddb90 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -122,7 +122,7 @@ sub mset_summary {
 		$min = $pct;
 
 		my $s = ascii_html($smsg->{subject});
-		my $f = ascii_html($smsg->{from_name});
+		my $f = ascii_html(delete $smsg->{from_name});
 		if ($obfs_ibx) {
 			obfuscate_addrs($obfs_ibx, $s);
 			obfuscate_addrs($obfs_ibx, $f);
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index fb28eff7..a2f54507 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -57,15 +57,12 @@ sub load_from_data ($$) {
 sub psgi_cull ($) {
 	my ($self) = @_;
 
-	# ghosts don't have ->{from}
-	my $from = delete($self->{from}) // '';
-	my @n = PublicInbox::Address::names($from);
-	$self->{from_name} = join(', ', @n);
-
 	# drop NNTP-only fields which aren't relevant to PSGI results:
 	# saves ~80K on a 200 item search result:
 	# TODO: we may need to keep some of these for JMAP...
-	delete @$self{qw(tid to cc bytes lines)};
+	my ($f) = delete @$self{qw(from tid to cc bytes lines)};
+	# ghosts don't have ->{from}
+	$self->{from_name} = join(', ', PublicInbox::Address::names($f // ''));
 	$self;
 }
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index a6944b80..116aa641 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -978,7 +978,7 @@ sub skel_dump { # walk_thread callback
 		$$skel .= delete($ctx->{sl_note}) || '';
 	}
 
-	my $f = ascii_html($smsg->{from_name});
+	my $f = ascii_html(delete $smsg->{from_name});
 	my $obfs_ibx = $ctx->{-obfs_ibx};
 	obfuscate_addrs($obfs_ibx, $f) if $obfs_ibx;
 

^ permalink raw reply	[flat|nested] 5+ messages in thread

Code repositories for project(s) associated with this inbox:

	https://80x24.org/public-inbox.git