* [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use
@ 2021-10-09 12:03 6% ` Eric Wong
0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2021-10-09 12:03 UTC (permalink / raw)
To: meta
We'll also save a few LoC when generating it. $smsg objects can
linger a while when rendering large threads, so saving a few
bytes per object can add up to several hundred KB.
I noticed this while chasing the ref cycle leak in commit
b28e74c9dc0a (www: fix ref cycle from threading w/ extindex, 2021-10-03).
While there's no longer a leak, releasing memory earlier can
allow it to be reused sooner and reduce both memory traffic and
memory pressure.
---
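A minimal sketch of the `delete` behavior this patch relies on, using only
core Perl. The `$smsg` hash and its values are hypothetical stand-ins, and
the from_name assignment is simplified (the real code goes through
PublicInbox::Address::names):

```perl
use strict;
use warnings;

# Hypothetical stand-in for an $smsg; real objects come from PublicInbox::Smsg
my $smsg = { from => 'Alice <alice@example.com>', tid => 1, to => 'meta',
	cc => '', bytes => 123, lines => 4, subject => 'hello' };

# delete on a hash slice returns the removed values in key order;
# list assignment to ($f) keeps only the first one (->{from})
my ($f) = delete @$smsg{qw(from tid to cc bytes lines)};

# ghosts don't have ->{from}, hence the defined-or fallback
$smsg->{from_name} = $f // ''; # simplified: no name extraction here

# delete on a single element returns its value and drops the hash entry,
# so the string can be freed as soon as $name goes out of scope
my $name = delete $smsg->{from_name};
print "$name\n";
print join(',', sort keys %$smsg), "\n"; # only "subject" remains
```

Since the value is consumed exactly once per rendered message, deleting
it at the point of use costs nothing extra and shrinks each lingering
object.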
lib/PublicInbox/SearchView.pm | 2 +-
lib/PublicInbox/Smsg.pm | 9 +++------
lib/PublicInbox/View.pm | 2 +-
3 files changed, 5 insertions(+), 8 deletions(-)
diff --git a/lib/PublicInbox/SearchView.pm b/lib/PublicInbox/SearchView.pm
index 91196cca..e74ddb90 100644
--- a/lib/PublicInbox/SearchView.pm
+++ b/lib/PublicInbox/SearchView.pm
@@ -122,7 +122,7 @@ sub mset_summary {
$min = $pct;
my $s = ascii_html($smsg->{subject});
- my $f = ascii_html($smsg->{from_name});
+ my $f = ascii_html(delete $smsg->{from_name});
if ($obfs_ibx) {
obfuscate_addrs($obfs_ibx, $s);
obfuscate_addrs($obfs_ibx, $f);
diff --git a/lib/PublicInbox/Smsg.pm b/lib/PublicInbox/Smsg.pm
index fb28eff7..a2f54507 100644
--- a/lib/PublicInbox/Smsg.pm
+++ b/lib/PublicInbox/Smsg.pm
@@ -57,15 +57,12 @@ sub load_from_data ($$) {
sub psgi_cull ($) {
my ($self) = @_;
- # ghosts don't have ->{from}
- my $from = delete($self->{from}) // '';
- my @n = PublicInbox::Address::names($from);
- $self->{from_name} = join(', ', @n);
-
# drop NNTP-only fields which aren't relevant to PSGI results:
# saves ~80K on a 200 item search result:
# TODO: we may need to keep some of these for JMAP...
- delete @$self{qw(tid to cc bytes lines)};
+ my ($f) = delete @$self{qw(from tid to cc bytes lines)};
+ # ghosts don't have ->{from}
+ $self->{from_name} = join(', ', PublicInbox::Address::names($f // ''));
$self;
}
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index a6944b80..116aa641 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -978,7 +978,7 @@ sub skel_dump { # walk_thread callback
$$skel .= delete($ctx->{sl_note}) || '';
}
- my $f = ascii_html($smsg->{from_name});
+ my $f = ascii_html(delete $smsg->{from_name});
my $obfs_ibx = $ctx->{-obfs_ibx};
obfuscate_addrs($obfs_ibx, $f) if $obfs_ibx;
^ permalink raw reply related [relevance 6%]
* [PATCH 2/2] www: fix ref cycle from threading w/ extindex
2021-10-04 0:07 7% ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
@ 2021-10-04 0:07 5% ` Eric Wong
0 siblings, 0 replies; 3+ results
From: Eric Wong @ 2021-10-04 0:07 UTC (permalink / raw)
To: meta
Unlike v1 inboxes (which don't accept duplicate Message-IDs at
all), and v2 inboxes (which generate a new Message-ID for
duplicates), extindex must accept duplicate Message-IDs as-is.
This was fine for storage, but prevented the reference-cycle
prevention mechanism of our message threading display algorithm
from working reliably: it could no longer delete the ->{parent}
field from clobbered entries in the %id_table.
So we now take into account reused Message-IDs and never clobber
entries in %id_table. Instead, we mark reused Message-IDs as
"imposters" and special-case them by injecting them as children
after all other threading is complete.
This cycle was noticed using a pre-release of Devel::Mwrap::PSGI:
https://80x24.org/mwrap-perl.git
---
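The imposter handling described above can be sketched in isolation as
follows. This is an illustration under assumptions, not the code from
the diff: the message hashes are hypothetical, and the References-based
linking step is elided.

```perl
use strict;
use warnings;

# Hypothetical messages; mid 5 is reused, as extindex must allow
my @msgs = (
	{ mid => '6', ds => 1 },
	{ mid => '5', ds => 2, references => '<6>' },
	{ mid => '5', ds => 3, references => '<6>' },
);

my (%id_table, @imposters);
keys(%id_table) = scalar @msgs; # pre-size the hash buckets
for my $m (@msgs) {
	if (exists $id_table{$m->{mid}}) {
		push @imposters, $m; # never clobber the first entry
	} else {
		$id_table{$m->{mid}} = $m;
	}
}

# ... normal References-based threading would run here ...

# finally, hang each imposter off the message whose Message-ID it reused
unshift(@{$id_table{$_->{mid}}->{children}}, $_) for @imposters;

printf "%d unique, %d imposter(s)\n",
	scalar(keys %id_table), scalar(@imposters);
```

Because every message stays reachable either through %id_table or
through some ->{children} array, nothing is silently dropped with a
dangling ->{parent} still set.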
lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++---------------
t/thread-cycle.t | 21 ++++++-
2 files changed, 74 insertions(+), 51 deletions(-)
diff --git a/lib/PublicInbox/SearchThread.pm b/lib/PublicInbox/SearchThread.pm
index 8fb3a030aa72..507f25baab0e 100644
--- a/lib/PublicInbox/SearchThread.pm
+++ b/lib/PublicInbox/SearchThread.pm
@@ -24,70 +24,74 @@ use PublicInbox::MID qw($MID_EXTRACT);
sub thread {
my ($msgs, $ordersub, $ctx) = @_;
+ my (%id_table, @imposters);
+ keys(%id_table) = scalar @$msgs; # pre-size
- # A. put all current $msgs (non-ghosts) into %id_table
- my %id_table = map {;
+ # A. put all current non-imposter $msgs (non-ghosts) into %id_table
+ # (imposters are messages with reused Message-IDs)
+ # Sadly, we sort here anyways since the fill-in-the-blanks References:
+ # can be shakier if somebody used In-Reply-To with multiple, disparate
+ # messages. So, take the client Date: into account since we can't
+ # always determine ordering when somebody uses multiple In-Reply-To.
+ my @kids = sort { $a->{ds} <=> $b->{ds} } grep {
# this delete saves around 4K across 1K messages
# TODO: move this to a more appropriate place, breaks tests
# if we do it during psgi_cull
delete $_->{num};
- $_->{mid} => PublicInbox::SearchThread::Msg::cast($_);
+ PublicInbox::SearchThread::Msg::cast($_);
+ if (exists $id_table{$_->{mid}}) {
+ $_->{children} = [];
+ push @imposters, $_; # we'll deal with them later
+ undef;
+ } else {
+ $id_table{$_->{mid}} = $_;
+ defined($_->{references});
+ }
} @$msgs;
+ for my $smsg (@kids) {
+ # This loop exists to help fill in gaps left from missing
+ # messages. It is not needed in a perfect world where
+ # everything is perfectly referenced, only the last ref
+ # matters.
+ my $prev;
+ for my $ref ($smsg->{references} =~ m/$MID_EXTRACT/go) {
+ # Find a Container object for the given Message-ID
+ my $cont = $id_table{$ref} //=
+ PublicInbox::SearchThread::Msg::ghost($ref);
+
+ # Link the References field's Containers together in
+ # the order implied by the References header
+ #
+ # * If they are already linked don't change the
+ # existing links
+ # * Do not add a link if adding that link would
+ # introduce a loop...
+ if ($prev &&
+ !$cont->{parent} && # already linked
+ !$cont->has_descendent($prev) # would loop
+ ) {
+ $prev->add_child($cont);
+ }
+ $prev = $cont;
+ }
- # Sadly, we sort here anyways since the fill-in-the-blanks References:
- # can be shakier if somebody used In-Reply-To with multiple, disparate
- # messages. So, take the client Date: into account since we can't
- # always determine ordering when somebody uses multiple In-Reply-To.
- # We'll trust the client Date: header here instead of the Received:
- # time since this is for display (and not retrieval)
- _set_parent(\%id_table, $_) for sort { $a->{ds} <=> $b->{ds} } @$msgs;
+ # C. Set the parent of this message to be the last element in
+ # References.
+ if (defined $prev && !$smsg->has_descendent($prev)) {
+ $prev->add_child($smsg);
+ }
+ }
my $ibx = $ctx->{ibx};
- my $rootset = [ grep {
+ my $rootset = [ grep { # n.b.: delete prevents cyclic refs
!delete($_->{parent}) && $_->visible($ibx)
} values %id_table ];
$rootset = $ordersub->($rootset);
$_->order_children($ordersub, $ctx) for @$rootset;
- $rootset;
-}
-sub _set_parent ($$) {
- my ($id_table, $this) = @_;
-
- # B. For each element in the message's References field:
- defined(my $refs = $this->{references}) or return;
-
- # This loop exists to help fill in gaps left from missing
- # messages. It is not needed in a perfect world where
- # everything is perfectly referenced, only the last ref
- # matters.
- my $prev;
- foreach my $ref ($refs =~ m/$MID_EXTRACT/go) {
- # Find a Container object for the given Message-ID
- my $cont = $id_table->{$ref} //=
- PublicInbox::SearchThread::Msg::ghost($ref);
-
- # Link the References field's Containers together in
- # the order implied by the References header
- #
- # * If they are already linked don't change the
- # existing links
- # * Do not add a link if adding that link would
- # introduce a loop...
- if ($prev &&
- !$cont->{parent} && # already linked
- !$cont->has_descendent($prev) # would loop
- ) {
- $prev->add_child($cont);
- }
- $prev = $cont;
- }
-
- # C. Set the parent of this message to be the last element in
- # References.
- if (defined $prev && !$this->has_descendent($prev)) { # would loop
- $prev->add_child($this);
- }
+ # parent imposter messages with reused Message-IDs
+ unshift(@{$id_table{$_->{mid}}->{children}}, $_) for @imposters;
+ $rootset;
}
package PublicInbox::SearchThread::Msg;
diff --git a/t/thread-cycle.t b/t/thread-cycle.t
index 4b47c01c37c1..e89b18464a5f 100644
--- a/t/thread-cycle.t
+++ b/t/thread-cycle.t
@@ -96,7 +96,26 @@ if ('sorting by Date') {
is("\n".$backward, "\n".$forward, 'forward and backward matches');
}
-done_testing();
+SKIP: {
+ require_mods 'Devel::Cycle', 1;
+ Devel::Cycle->import('find_cycle');
+ my @dup = (
+ { mid => 5, references => '<6>' },
+ { mid => 5, references => '<6> <1>' },
+ );
+ open my $fh, '+>', \(my $out = '') or xbail "open: $!";
+ (undef, $smsgs) = $make_objs->(@dup);
+ eval 'package EmptyInbox; sub smsg_by_mid { undef }';
+ my $ctx = { ibx => bless {}, 'EmptyInbox' };
+ my $rootset = PublicInbox::SearchThread::thread($smsgs, sub {
+ [ sort { $a->{mid} cmp $b->{mid} } @{$_[0]} ] }, $ctx);
+ my $oldout = select $fh;
+ find_cycle($rootset);
+ select $oldout;
+ is($out, '', 'nothing from find_cycle');
+} # Devel::Cycle check
+
+done_testing;
sub thread_to_s {
my ($msgs) = @_;
^ permalink raw reply related [relevance 5%]
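The test above depends on Devel::Cycle. As a rough core-only
illustration of why deleting ->{parent} matters, the sketch below counts
DESTROY calls on a hypothetical `Node` class (a stand-in, not
PublicInbox::SearchThread::Msg) with and without the back-reference
removed:

```perl
use strict;
use warnings;

package Node; # hypothetical stand-in for a threading container
our $destroyed = 0;
sub new { bless { children => [] }, shift }
sub add_child {
	my ($self, $kid) = @_;
	push @{$self->{children}}, $kid;
	$kid->{parent} = $self; # strong back-reference: parent <-> child cycle
}
sub DESTROY { $destroyed++ }

package main;

{ # cycle left intact: neither node is freed at scope exit
	my $p = Node->new;
	$p->add_child(Node->new);
}
my $leaked = 2 - $Node::destroyed;

$Node::destroyed = 0;
{ # cycle broken by deleting ->{parent}, similar to the rootset grep
	my $p = Node->new;
	my $c = Node->new;
	$p->add_child($c);
	delete $c->{parent};
}
my $freed = $Node::destroyed;
print "leaked=$leaked freed=$freed\n";
```

With Perl's reference counting, the first pair is only reclaimed at
interpreter exit, while the second pair is freed as soon as its scope
ends.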
* [PATCH 0/2] www: fix ref cycles when threading extindex
@ 2021-10-04 0:07 7% ` Eric Wong
2021-10-04 0:07 5% ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong
0 siblings, 1 reply; 3+ results
From: Eric Wong @ 2021-10-04 0:07 UTC (permalink / raw)
To: meta
I finally got Devel::Mwrap::PSGI working and fixed a
long-standing reference cycle. AFAIK, it only affects
extindex users, not v2, and definitely not v1.
Eric Wong (2):
t/thread-cycle: make Email::Simple optional
www: fix ref cycle from threading w/ extindex
lib/PublicInbox/SearchThread.pm | 104 +++++++++++++++++---------------
t/thread-cycle.t | 50 ++++++++++-----
2 files changed, 88 insertions(+), 66 deletions(-)
^ permalink raw reply [relevance 7%]
2021-09-04 23:53 httpd memory usage? Eric Wong
2021-10-04 0:07 7% ` [PATCH 0/2] www: fix ref cycles when threading extindex Eric Wong
2021-10-04 0:07 5% ` [PATCH 2/2] www: fix ref cycle from threading w/ extindex Eric Wong
2021-10-09 12:03 [PATCH 0/4] WWW-related memory savings Eric Wong
2021-10-09 12:03 6% ` [PATCH 4/4] view: save memory by dropping smsg->{from_name} on use Eric Wong
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git