user/dev discussion of public-inbox itself
 help / color / mirror / code / Atom feed
* [PATCH 0/3] some efficiency improvements
@ 2020-04-05  7:53 Eric Wong
  2020-04-05  7:53 ` [PATCH 1/3] release large (non ref) scalars using `undef $sv' Eric Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: Eric Wong @ 2020-04-05  7:53 UTC (permalink / raw)
  To: meta

[1/3] I seem to forget every decade or so that `$sv = undef'
and `undef $sv' have different behaviors.  Sometimes I want
the former, but here I want the latter.

I'm not entirely happy with [2/3], but the worst case is
theoretically better than before.  I really wish Perl
and PSGI could support things like writev+sendmmsg
well.

[3/3] will make some difference with many inboxes.

Eric Wong (3):
  release large (non ref) scalars using `undef $sv'
  mbox: halve ->getline "context switches"
  git: reduce stat buffer storage overhead

 lib/PublicInbox/Git.pm       | 13 ++++++-------
 lib/PublicInbox/Mbox.pm      |  9 +++------
 lib/PublicInbox/SearchIdx.pm | 33 ++++++++++++++++-----------------
 lib/PublicInbox/View.pm      |  4 ++--
 lib/PublicInbox/ViewDiff.pm  |  2 +-
 5 files changed, 28 insertions(+), 33 deletions(-)

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/3] release large (non ref) scalars using `undef $sv'
  2020-04-05  7:53 [PATCH 0/3] some efficiency improvements Eric Wong
@ 2020-04-05  7:53 ` Eric Wong
  2020-04-05  7:53 ` [PATCH 2/3] mbox: halve ->getline "context switches" Eric Wong
  2020-04-05  7:53 ` [PATCH 3/3] git: reduce stat buffer storage overhead Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-04-05  7:53 UTC (permalink / raw)
  To: meta

Using `undef EXPR' like a function call actually frees the heap
memory associated with the scalar, whereas `$sv = undef' or
`$sv = ""' will hold the buffer around until $sv goes out
of scope.

The `sv_set_undef' documentation in the perlapi(1) manpage
explicitly states this:

  The perl equivalent is "$sv = undef;". Note that it doesn't
  free any string buffer, unlike "undef $sv".

And I've confirmed by reading Dump() output from Devel::Peek.

We'll also inline the old index_body sub in SearchIdx.pm to make
the scope of the scalar more obvious.

This change saves several hundred kB RSS on both -index and
-httpd when hitting large emails with thousands of lines.
---
 lib/PublicInbox/SearchIdx.pm | 33 ++++++++++++++++-----------------
 lib/PublicInbox/View.pm      |  4 ++--
 lib/PublicInbox/ViewDiff.pm  |  2 +-
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index 89d8bc2b..9a5484e3 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -275,22 +275,8 @@ sub index_diff ($$$) {
 	index_text($self, join("\n", @xnq), 1, 'XNQ');
 }
 
-sub index_body ($$$) {
-	my ($self, $txt, $doc) = @_;
-	if ($doc) {
-		# does it look like a diff?
-		if ($txt =~ /^(?:diff|---|\+\+\+) /ms) {
-			index_diff($self, $txt, $doc);
-		} else {
-			index_text($self, $txt, 1, 'XNQ');
-		}
-	} else {
-		index_text($self, $txt, 0, 'XQUOT');
-	}
-}
-
 sub index_xapian { # msg_iter callback
-	my ($part, $depth, @idx) = @{$_[0]};
+	my $part = $_[0]->[0]; # ignore $depth and @idx
 	my ($self, $doc) = @{$_[1]};
 	my $ct = $part->content_type || 'text/plain';
 	my $fn = $part->filename;
@@ -300,11 +286,24 @@ sub index_xapian { # msg_iter callback
 
 	my ($s, undef) = msg_part_text($part, $ct);
 	defined $s or return;
+	$_[0]->[0] = $part = undef; # free memory
 
 	# split off quoted and unquoted blocks:
 	my @sections = PublicInbox::MsgIter::split_quotes($s);
-	$part = $s = undef;
-	index_body($self, $_, /\A>/ ? 0 : $doc) for @sections;
+	undef $s; # free memory
+	for my $txt (@sections) {
+		if ($txt =~ /\A>/) {
+			index_text($self, $txt, 0, 'XQUOT');
+		} else {
+			# does it look like a diff?
+			if ($txt =~ /^(?:diff|---|\+\+\+) /ms) {
+				index_diff($self, $txt, $doc);
+			} else {
+				index_text($self, $txt, 1, 'XNQ');
+			}
+		}
+		undef $txt; # free memory
+	}
 }
 
 sub add_xapian ($$$$) {
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index 1e53d8dc..ddd94e48 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -566,7 +566,7 @@ sub add_text_body { # callback for msg_iter
 
 	# split off quoted and unquoted blocks:
 	my @sections = PublicInbox::MsgIter::split_quotes($s);
-	$s = '';
+	undef $s; # free memory
 	my $rv = $ctx->{obuf};
 	if (defined($fn) || $depth > 0 || $err) {
 		# badly-encoded message with $err? tell the world about it!
@@ -587,7 +587,7 @@ sub add_text_body { # callback for msg_iter
 			# regular lines, OK
 			$$rv .= $l->to_html($cur);
 		}
-		$cur = undef;
+		undef $cur; # free memory
 	}
 
 	obfuscate_addrs($ibx, $$rv) if $ibx->{obfuscate};
diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index f7422712..3d6058a9 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -192,7 +192,7 @@ sub flush_diff ($$) {
 	my ($ctx, $cur) = @_;
 
 	my @top = split($EXTRACT_DIFFS, $$cur);
-	$$cur = undef;
+	undef $$cur; # free memory
 
 	my $linkify = $ctx->{-linkify};
 	my $dst = $ctx->{obuf};

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/3] mbox: halve ->getline "context switches"
  2020-04-05  7:53 [PATCH 0/3] some efficiency improvements Eric Wong
  2020-04-05  7:53 ` [PATCH 1/3] release large (non ref) scalars using `undef $sv' Eric Wong
@ 2020-04-05  7:53 ` Eric Wong
  2020-04-05  7:53 ` [PATCH 3/3] git: reduce stat buffer storage overhead Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-04-05  7:53 UTC (permalink / raw)
  To: meta

We don't need to take extra trips through the event loop for a
single message (in the common case of Message-IDs being unique).
In fact, holding the body reference left behind by Email::Simple
could be harmful to memory usage, though in practice it's not a
big problem since code paths which use Email::MIME take far more.
---
 lib/PublicInbox/Mbox.pm | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Mbox.pm b/lib/PublicInbox/Mbox.pm
index 3013dc91..d5beceaf 100644
--- a/lib/PublicInbox/Mbox.pm
+++ b/lib/PublicInbox/Mbox.pm
@@ -44,19 +44,16 @@ sub getline {
 	my ($ctx, $id, $prev, $next, $mref, $hdr) = @$more;
 	if ($hdr) { # first message hits this, only
 		pop @$more; # $hdr
-		return msg_hdr($ctx, $hdr);
-	}
-	if ($mref) { # all messages hit this
 		pop @$more; # $mref
-		return msg_body($$mref);
+		return msg_hdr($ctx, $hdr) . msg_body($$mref);
 	}
 	my $cur = $next or return;
 	my $ibx = $ctx->{-inbox};
 	$next = $ibx->over->next_by_mid($ctx->{mid}, \$id, \$prev);
 	$mref = $ibx->msg_by_smsg($cur) or return;
 	$hdr = Email::Simple->new($mref)->header_obj;
-	@$more = ($ctx, $id, $prev, $next, $mref); # $next may be undef, here
-	msg_hdr($ctx, $hdr); # all but first message hits this
+	@$more = ($ctx, $id, $prev, $next); # $next may be undef, here
+	msg_hdr($ctx, $hdr) . msg_body($$mref);
 }
 
 sub close {} # noop

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 3/3] git: reduce stat buffer storage overhead
  2020-04-05  7:53 [PATCH 0/3] some efficiency improvements Eric Wong
  2020-04-05  7:53 ` [PATCH 1/3] release large (non ref) scalars using `undef $sv' Eric Wong
  2020-04-05  7:53 ` [PATCH 2/3] mbox: halve ->getline "context switches" Eric Wong
@ 2020-04-05  7:53 ` Eric Wong
  2 siblings, 0 replies; 4+ messages in thread
From: Eric Wong @ 2020-04-05  7:53 UTC (permalink / raw)
  To: meta

The stat() array is a whopping 480 bytes (on x86-64, Perl 5.28),
while the new packed representation of two 64-bit doubles as a
scalar is "only" 56 bytes.  This can add up when there's many
inboxes.  Just use a string comparison on the packed
representation.

Some 32-bit Perl builds (IIRC OpenBSD) lack quad support, so
doubles were chosen for pack() portability.
---
 lib/PublicInbox/Git.pm | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/lib/PublicInbox/Git.pm b/lib/PublicInbox/Git.pm
index 9c96b3f0..8410b2fc 100644
--- a/lib/PublicInbox/Git.pm
+++ b/lib/PublicInbox/Git.pm
@@ -55,10 +55,8 @@ sub git_quote ($) {
 
 sub new {
 	my ($class, $git_dir) = @_;
-	my @st;
-	$st[7] = $st[10] = 0;
 	# may contain {-tmp} field for File::Temp::Dir
-	bless { git_dir => $git_dir, st => \@st, -git_path => {} }, $class
+	bless { git_dir => $git_dir, alt_st => '', -git_path => {} }, $class
 }
 
 sub git_path ($$) {
@@ -79,10 +77,11 @@ sub alternates_changed {
 	my ($self) = @_;
 	my $alt = git_path($self, 'objects/info/alternates');
 	my @st = stat($alt) or return 0;
-	my $old_st = $self->{st};
-	# 10 - ctime, 7 - size
-	return 0 if ($st[10] == $old_st->[10] && $st[7] == $old_st->[7]);
-	$self->{st} = \@st;
+
+	# can't rely on 'q' on some 32-bit builds, but `d' works
+	my $st = pack('dd', $st[10], $st[7]); # 10: ctime, 7: size
+	return 0 if $self->{alt_st} eq $st;
+	$self->{alt_st} = $st; # always a true value
 }
 
 sub last_check_err {

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2020-04-05  7:53 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-04-05  7:53 [PATCH 0/3] some efficiency improvements Eric Wong
2020-04-05  7:53 ` [PATCH 1/3] release large (non ref) scalars using `undef $sv' Eric Wong
2020-04-05  7:53 ` [PATCH 2/3] mbox: halve ->getline "context switches" Eric Wong
2020-04-05  7:53 ` [PATCH 3/3] git: reduce stat buffer storage overhead Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).