user/dev discussion of public-inbox itself
* [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use
  @ 2021-10-09 12:03  6% ` Eric Wong
From: Eric Wong @ 2021-10-09 12:03 UTC
  To: meta

We'll also save a few LoC when generating it.  $smsg objects can
linger for a while when rendering large threads, so saving a few
bytes per message can add up to several hundred KB.
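Perl's `delete` returns the value it removes, so a field can be read and released in a single expression; that is what the `ascii_html(delete $smsg->{from_name})` hunks below rely on. A minimal sketch of the pattern, using a hypothetical plain hashref in place of a real `PublicInbox::Smsg` object and leaving out `ascii_html`:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an $smsg hashref; only from_name matters here.
my $smsg = { subject => 'hello', from_name => 'Alice, Bob' };

# delete hands back the removed value, so the field is consumed and
# released in one step instead of lingering until $smsg itself is freed.
my $f = delete $smsg->{from_name};

print "$f\n";                                              # Alice, Bob
print exists($smsg->{from_name}) ? "kept\n" : "dropped\n"; # dropped
```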

I noticed this while chasing the ref cycle leak in commit
b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03).
While there's no longer a leak, releasing memory earlier can
allow it to be reused sooner and reduce both memory traffic and
memory pressure.
---
 lib/PublicInbox/SearchView.pm | 2 +-
 lib/PublicInbox/Smsg.pm       | 9 +++------
 lib/PublicInbox/View.pm       | 2 +-
 3 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 91196cca..e74ddb90 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -122,7 +122,7 @@ sub mset_summary {
 		$min = $pct;
 
 		my $s = ascii_html($smsg->{subject});
-		my $f = ascii_html($smsg->{from_name});
+		my $f = ascii_html(delete $smsg->{from_name});
 		if ($obfs_ibx) {
 			obfuscate_addrs($obfs_ibx, $s);
 			obfuscate_addrs($obfs_ibx, $f);
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index fb28eff7..a2f54507 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -57,15 +57,12 @@ sub load_from_data ($$) {
 sub psgi_cull ($) {
 	my ($self) = @_;
 
-	# ghosts don't have ->{from}
-	my $from = delete($self->{from}) // '';
-	my @n = PublicInbox::Address::names($from);
-	$self->{from_name} = join(', ', @n);
-
 	# drop NNTP-only fields which aren't relevant to PSGI results:
 	# saves ~80K on a 200 item search result:
 	# TODO: we may need to keep some of these for JMAP...
-	delete @$self{qw(tid to cc bytes lines)};
+	my ($f) = delete @$self{qw(from tid to cc bytes lines)};
+	# ghosts don't have ->{from}
+	$self->{from_name} = join(', ', PublicInbox::Address::names($f // ''));
 	$self;
 }
 
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index a6944b80..116aa641 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -978,7 +978,7 @@ sub skel_dump { # walk_thread callback
 		$$skel .= delete($ctx->{sl_note}) || '';
 	}
 
-	my $f = ascii_html($smsg->{from_name});
+	my $f = ascii_html(delete $smsg->{from_name});
 	my $obfs_ibx = $ctx->{-obfs_ibx};
 	obfuscate_addrs($obfs_ibx, $f) if $obfs_ibx;
 

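The `psgi_cull` hunk folds the `{from}` lookup into the existing slice delete: `delete` on a hash slice returns the removed values in slice order, so `my ($f) = ...` keeps only the first one, `{from}`, and a ghost (which has no `{from}`) yields `undef`. A minimal standalone sketch of that pattern, assuming hypothetical `cull_sketch` and `names_sketch` subs; `names_sketch` is only a crude stand-in for `PublicInbox::Address::names` that strips a trailing `<address>` from each comma-separated entry:

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for PublicInbox::Address::names.
sub names_sketch {
    my ($from) = @_;
    my @n;
    for my $addr (split /\s*,\s*/, $from) {
        (my $name = $addr) =~ s/\s*<[^>]*>\s*$//; # drop trailing <address>
        push @n, $name if length $name;
    }
    @n;
}

sub cull_sketch {
    my ($self) = @_;
    # a slice delete returns the removed values in slice order, so the
    # list assignment keeps only the first one: {from}
    my ($f) = delete @$self{qw(from tid to cc bytes lines)};
    # ghosts have no {from}, so fall back to an empty string
    $self->{from_name} = join(', ', names_sketch($f // ''));
    $self;
}

my $msg = { from => 'Alice <a@example.com>, Bob <b@example.com>',
            tid => 1, to => 'x', cc => 'y', bytes => 42, lines => 3 };
cull_sketch($msg);
print join(',', sort keys %$msg), "\n"; # from_name
print $msg->{from_name}, "\n";          # Alice, Bob

my $ghost = cull_sketch({ tid => 2 });
print "[$ghost->{from_name}]\n";        # []
```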

* [PATCH 2/2] www: fix ref cycle from threading w/ extindex
  2021-10-04  0:07  7% ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
@ 2021-10-04  0:07  5%   ` Eric Wong
From: Eric Wong @ 2021-10-04  0:07 UTC
  To: meta

Unlike v1 inboxes (which don't accept duplicate Message-IDs at
all) and v2 inboxes (which generate a new Message-ID for
duplicates), extindex must accept duplicate Message-IDs as-is.

This was fine for storage, but it prevented the cycle-breaking
step of our message threading display algorithm from working
reliably.  When a duplicate Message-ID clobbered an earlier entry
in %id_table, we could no longer delete the ->{parent} field from
the clobbered entry.
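A minimal sketch of the failure mode, using plain hashrefs instead of `PublicInbox::SearchThread::Msg` objects, a single hypothetical `ref` field instead of parsed References, and a hypothetical `link_child` helper in place of `add_child`: when two messages share a Message-ID, a `map` into `%id_table` keeps only the last one, so the pass that deletes `->{parent}` from `values %id_table` never reaches the first copy, and its parent/child cycle survives.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring add_child: a two-way parent/child link,
# which is a reference cycle until one side is removed.
sub link_child {
    my ($parent, $child) = @_;
    push @{$parent->{children}}, $child;
    $child->{parent} = $parent;
}

my @msgs = (
    { mid => 'p' },
    { mid => 'a', ref => 'p', n => 1 },
    { mid => 'a', ref => 'p', n => 2 }, # reuses Message-ID "a"
);

# old-style table: the second "a" clobbers the first
my %id_table = map {; $_->{mid} => $_ } @msgs;
for my $m (@msgs) {
    link_child($id_table{$m->{ref}}, $m) if defined $m->{ref};
}

# the cycle-breaking pass only sees messages still in %id_table
delete $_->{parent} for values %id_table;

my @leaky = grep { exists $_->{parent} } @msgs;
print scalar(@leaky), " message(s) still point at a parent\n"; # 1
print "leaked copy: n=$leaky[0]{n}\n";                          # n=1
```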

So we now account for reused Message-IDs and never clobber
entries in %id_table.  Instead, we mark messages with reused
Message-IDs as "imposters" and special-case them by injecting
them as children of the original message after all other
threading is complete.
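A minimal sketch of the imposter approach, again with plain hashrefs, a single hypothetical `ref` field, and a hypothetical `link_child` helper standing in for `add_child`: the first message with a given Message-ID keeps its `%id_table` slot, later duplicates are set aside, and each is attached as a child of its namesake only after the `->{parent}` links have been deleted, so nothing points back up the tree.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring add_child (two-way link).
sub link_child {
    my ($parent, $child) = @_;
    push @{$parent->{children}}, $child;
    $child->{parent} = $parent;
}

my @msgs = (
    { mid => 'p' },
    { mid => 'a', ref => 'p', n => 1 },
    { mid => 'a', ref => 'p', n => 2 }, # reuses Message-ID "a"
);

my (%id_table, @imposters);
keys(%id_table) = scalar @msgs; # pre-size, as the patch does

# the first message wins; later duplicates are set aside as imposters
my @kids = grep {
    if (exists $id_table{$_->{mid}}) {
        $_->{children} = [];
        push @imposters, $_;
        undef;
    } else {
        $id_table{$_->{mid}} = $_;
        defined $_->{ref};
    }
} @msgs;

link_child($id_table{$_->{ref}}, $_) for @kids;

# every threaded message is in %id_table, so this reaches all of them
delete $_->{parent} for values %id_table;

# imposters are referenced one way only, from their namesake's children
unshift @{$id_table{$_->{mid}}{children}}, $_ for @imposters;

my @leaky = grep { exists $_->{parent} } @msgs;
print scalar(@leaky), " message(s) still point at a parent\n";    # 0
print join(',', map { $_->{n} } @{$id_table{a}{children}}), "\n"; # 2
```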

This cycle was noticed using a pre-release of Devel::Mwrap::PSGI:
  https://80x24.org/mwrap-perl.git
---
 lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++---------------
 t/thread-cycle.t                |  21 ++++++-
 2 files changed, 74 insertions(+), 51 deletions(-)

diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 8fb3a030aa72..507f25baab0e 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -24,70 +24,74 @@ use PublicInbox::MID qw($MID_EXTRACT);
 
 sub thread {
 	my ($msgs, $ordersub, $ctx) = @_;
+	my (%id_table, @imposters);
+	keys(%id_table) = scalar @$msgs; # pre-size
 
-	# A. put all current $msgs (non-ghosts) into %id_table
-	my %id_table = map {;
+	# A. put all current non-imposter $msgs (non-ghosts) into %id_table
+	# (imposters are messages with reused Message-IDs)
+	# Sadly, we sort here anyways since the fill-in-the-blanks References:
+	# can be shakier if somebody used In-Reply-To with multiple, disparate
+	# messages.  So, take the client Date: into account since we can't
+	# always determine ordering when somebody uses multiple In-Reply-To.
+	my @kids = sort { $a->{ds} <=> $b->{ds} } grep {
 		# this delete saves around 4K across 1K messages
 		# TODO: move this to a more appropriate place, breaks tests
 		# if we do it during psgi_cull
 		delete $_->{num};
 
-		$_->{mid} => PublicInbox::SearchThread::Msg::cast($_);
+		PublicInbox::SearchThread::Msg::cast($_);
+		if (exists $id_table{$_->{mid}}) {
+			$_->{children} = [];
+			push @imposters, $_; # we'll deal with them later
+			undef;
+		} else {
+			$id_table{$_->{mid}} = $_;
+			defined($_->{references});
+		}
 	} @$msgs;
+	for my $smsg (@kids) {
+		# This loop exists to help fill in gaps left from missing
+		# messages.  It is not needed in a perfect world where
+		# everything is perfectly referenced, only the last ref
+		# matters.
+		my $prev;
+		for my $ref ($smsg->{references} =~ m/$MID_EXTRACT/go) {
+			# Find a Container object for the given Message-ID
+			my $cont = $id_table{$ref} //=
+				PublicInbox::SearchThread::Msg::ghost($ref);
+
+			# Link the References field's Containers together in
+			# the order implied by the References header
+			#
+			# * If they are already linked don't change the
+			#   existing links
+			# * Do not add a link if adding that link would
+			#   introduce a loop...
+			if ($prev &&
+				!$cont->{parent} &&  # already linked
+				!$cont->has_descendent($prev) # would loop
+			   ) {
+				$prev->add_child($cont);
+			}
+			$prev = $cont;
+		}
 
-	# Sadly, we sort here anyways since the fill-in-the-blanks References:
-	# can be shakier if somebody used In-Reply-To with multiple, disparate
-	# messages.  So, take the client Date: into account since we can't
-	# always determine ordering when somebody uses multiple In-Reply-To.
-	# We'll trust the client Date: header here instead of the Received:
-	# time since this is for display (and not retrieval)
-	_set_parent(\%id_table, $_) for sort { $a->{ds} <=> $b->{ds} } @$msgs;
+		# C. Set the parent of this message to be the last element in
+		# References.
+		if (defined $prev && !$smsg->has_descendent($prev)) {
+			$prev->add_child($smsg);
+		}
+	}
 	my $ibx = $ctx->{ibx};
-	my $rootset = [ grep {
+	my $rootset = [ grep { # n.b.: delete prevents cyclic refs
 			!delete($_->{parent}) && $_->visible($ibx)
 		} values %id_table ];
 	$rootset = $ordersub->($rootset);
 	$_->order_children($ordersub, $ctx) for @$rootset;
-	$rootset;
-}
 
-sub _set_parent ($$) {
-	my ($id_table, $this) = @_;
-
-	# B. For each element in the message's References field:
-	defined(my $refs = $this->{references}) or return;
-
-	# This loop exists to help fill in gaps left from missing
-	# messages.  It is not needed in a perfect world where
-	# everything is perfectly referenced, only the last ref
-	# matters.
-	my $prev;
-	foreach my $ref ($refs =~ m/$MID_EXTRACT/go) {
-		# Find a Container object for the given Message-ID
-		my $cont = $id_table->{$ref} //=
-			PublicInbox::SearchThread::Msg::ghost($ref);
-
-		# Link the References field's Containers together in
-		# the order implied by the References header
-		#
-		# * If they are already linked don't change the
-		#   existing links
-		# * Do not add a link if adding that link would
-		#   introduce a loop...
-		if ($prev &&
-			!$cont->{parent} &&  # already linked
-			!$cont->has_descendent($prev) # would loop
-		   ) {
-			$prev->add_child($cont);
-		}
-		$prev = $cont;
-	}
-
-	# C. Set the parent of this message to be the last element in
-	# References.
-	if (defined $prev && !$this->has_descendent($prev)) { # would loop
-		$prev->add_child($this);
-	}
+	# parent imposter messages with reused Message-IDs
+	unshift(@{$id_table{$_->{mid}}->{children}}, $_) for @imposters;
+	$rootset;
 }
 
 package PublicInbox::SearchThread::Msg;
diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index 4b47c01c37c1..e89b18464a5f 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -96,7 +96,26 @@ if ('sorting by Date') {
 	is("\n".$backward, "\n".$forward, 'forward and backward matches');
 }
 
-done_testing();
+SKIP: {
+	require_mods 'Devel::Cycle', 1;
+	Devel::Cycle->import('find_cycle');
+	my @dup = (
+		{ mid => 5, references => '<6>' },
+		{ mid => 5, references => '<6> <1>' },
+	);
+	open my $fh, '+>', \(my $out = '') or xbail "open: $!";
+	(undef, $smsgs) = $make_objs->(@dup);
+	eval 'package EmptyInbox; sub smsg_by_mid { undef }';
+	my $ctx = { ibx => bless {}, 'EmptyInbox' };
+	my $rootset = PublicInbox::SearchThread::thread($smsgs, sub {
+		[ sort { $a->{mid} cmp $b->{mid} } @{$_[0]} ] }, $ctx);
+	my $oldout = select $fh;
+	find_cycle($rootset);
+	select $oldout;
+	is($out, '', 'nothing from find_cycle');
+} # Devel::Cycle check
+
+done_testing;
 
 sub thread_to_s {
 	my ($msgs) = @_;

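The reworked `thread` loop keeps the existing References handling (steps B and C in its comments). A simplified, self-contained sketch of those two steps, assuming hypothetical `ghost`, `add_child`, `has_descendent` and `link_refs` subs over plain hashrefs and a naive `<...>` regex in place of `$MID_EXTRACT`; it is not the actual `PublicInbox::SearchThread::Msg` implementation:

```perl
use strict;
use warnings;

my %id_table; # Message-ID => container hashref

sub ghost { my ($mid) = @_; return +{ mid => $mid, children => [] } }

sub has_descendent { # true if $target is $node itself or anywhere below it
    my ($node, $target) = @_;
    my @stack = ($node);
    while (my $cur = pop @stack) {
        return 1 if $cur == $target;
        push @stack, @{$cur->{children}};
    }
    return 0;
}

sub add_child {
    my ($parent, $child) = @_;
    if (my $old = $child->{parent}) { # reparent: unlink from old parent
        $old->{children} = [ grep { $_ != $child } @{$old->{children}} ];
    }
    push @{$parent->{children}}, $child;
    $child->{parent} = $parent;
}

sub link_refs {
    my ($smsg) = @_;
    my $prev;
    for my $ref ($smsg->{references} =~ /<([^>]+)>/g) {
        my $cont = $id_table{$ref} //= ghost($ref);
        # B: chain References in order; keep existing links, avoid loops
        add_child($prev, $cont)
            if $prev && !$cont->{parent} && !has_descendent($cont, $prev);
        $prev = $cont;
    }
    # C: the last reference becomes this message's parent
    add_child($prev, $smsg) if $prev && !has_descendent($smsg, $prev);
}

my @msgs = (
    { mid => 'c', references => '<a> <b>', ds => 3, children => [] },
    { mid => 'b', references => '<a>',     ds => 2, children => [] },
    { mid => 'a', references => '',        ds => 1, children => [] },
);
$id_table{$_->{mid}} = $_ for @msgs;
link_refs($_) for sort { $a->{ds} <=> $b->{ds} } @msgs;

print join(' ', map { $_->{mid} } @{$id_table{a}{children}}), "\n"; # b
print $id_table{c}{parent}{mid}, "\n";                              # b
```

Because `b` was already parented when `c` was processed, step B left that link alone and step C attached `c` under `b`.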

* [PATCH 0/2] www: fix ref cycles when threading extindex
  @ 2021-10-04  0:07  7% ` Eric Wong
  2021-10-04  0:07  5%   ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong
From: Eric Wong @ 2021-10-04  0:07 UTC
  To: meta

I finally got Devel::Mwrap::PSGI working and fixed a
long-standing reference cycle.  AFAIK, it only affects
extindex users, not v2, and definitely not v1.
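Perl reclaims memory by reference counting, so a parent that holds its children while each child points back through `->{parent}` keeps both alive after the last outside reference is gone. A minimal sketch with a hypothetical `Node` class, showing that deleting the back-pointer (as the threading code does with `->{parent}`) is enough to let both objects be freed:

```perl
use strict;
use warnings;

my $freed = 0; # counts Node objects whose DESTROY has run

package Node;
sub new { my ($class, %f) = @_; return bless \%f, $class }
sub DESTROY { $freed++ }

package main;

# parent <-> child: the same shape of cycle that ->{parent} creates
sub make_pair {
    my $parent = Node->new(children => []);
    my $child = Node->new(parent => $parent);
    push @{$parent->{children}}, $child;
    return $parent;
}

{ my $p = make_pair(); }            # cycle left intact: nothing is freed
my $with_cycle = $freed;

$freed = 0;
{
    my $p = make_pair();
    delete $_->{parent} for @{$p->{children}}; # break the cycle first
}
my $after_delete = $freed;

print "freed with cycle: $with_cycle\n";     # 0
print "freed after delete: $after_delete\n"; # 2
```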

Eric Wong (2):
  t/thread-cycle: make Email::Simple optional
  www: fix ref cycle from threading w/ extindex

 lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++---------------
 t/thread-cycle.t                |  50 ++++++++++-----
 2 files changed, 88 insertions(+), 66 deletions(-)


2021-09-04 23:53     httpd memory usage? Eric Wong
2021-10-04  0:07  7% ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
2021-10-04  0:07  5%   ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong
2021-10-09 12:03     [PATCH 0/4] WWW-related memory savings Eric Wong
2021-10-09 12:03  6% ` [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use Eric Wong

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).