* [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs
2024-03-11 19:40 7% [PATCH 0/4] memory reductions for WWW + solver Eric Wong
@ 2024-03-11 19:40 7% ` Eric Wong
0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2024-03-11 19:40 UTC
To: meta
With my current mirror of lore + gko, this saves over 300K
allocations and brings the allocation count in this area down
to under 5K. The reduction in AV refs saves around 45MB RAM
according to measurements done live via Devel::Mwrap.
---
lib/PublicInbox/CodeSearch.pm | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index 1f95a726..48033bb5 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -292,6 +292,7 @@ W: cindex.$name.topdir=$self->{topdir} has no usable join data for $cfg_f
EOM
my ($ekeys, $roots, $ibx2root) = @$jd{qw(ekeys roots ibx2root)};
my $roots2paths = roots2paths($self);
+ my %dedupe; # 50x alloc reduction w/ lore + gko mirror (Mar 2024)
for my $root_offs (@$ibx2root) {
my $ekey = shift(@$ekeys) // die 'BUG: {ekeys} empty';
scalar(@$root_offs) or next;
@@ -320,9 +321,15 @@ EOM
if (my $git = $dir2cr{$_}) {
$ibx_p2g{$_} = $git;
$ibx2self = 1;
- $ibx->{-hide_www} or
- push @{$git->{ibx_score}},
+ if (!$ibx->{-hide_www}) {
+ # don't stringify $nr directly
+ # to avoid long-lived PV
+ my $k = ($nr + 0)."\0".
+ ($ibx + 0);
+ my $s = $dedupe{$k} //=
[ $nr, $ibx->{name} ];
+ push @{$git->{ibx_score}}, $s;
+ }
push @$gits, $git;
} else {
warn <<EOM;
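
The deduplication in the hunk above can be illustrated in isolation. The following is a minimal sketch, not the CodeSearch.pm code itself: the inbox hashes, the repo count, and the scores are hypothetical stand-ins, and only the `%dedupe` keying scheme (numeric score, NUL byte, numeric address of the inbox reference) follows the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for inbox objects; only {name} is used here.
my @inboxes = map { +{ name => $_ } } qw(git lkml);

my (@naive, @shared, %dedupe);
for my $repo (1..1000) { # hypothetical number of coderepos
	for my $ibx (@inboxes) {
		my $nr = $repo % 3; # hypothetical score: 0, 1 or 2

		# before: one fresh 2-element array per association
		push @naive, [ $nr, $ibx->{name} ];

		# after: key on copies ($nr + 0, $ibx + 0) so the string
		# conversion happens on temporaries, not on $nr itself
		my $k = ($nr + 0)."\0".($ibx + 0);
		my $s = $dedupe{$k} //= [ $nr, $ibx->{name} ];
		push @shared, $s;
	}
}

# count distinct underlying arrays by their reference addresses
my (%naive_addrs, %shared_addrs);
$naive_addrs{$_ + 0} = 1 for @naive;
$shared_addrs{$_ + 0} = 1 for @shared;
printf "naive: %d arrays, shared: %d arrays\n",
	scalar(keys %naive_addrs), scalar(keys %shared_addrs);
# prints "naive: 2000 arrays, shared: 6 arrays"
```

Sharing is only safe here because the pairs are treated as read-only after loading; anything that modified one shared pair in place would affect every repo referencing it.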
* [PATCH 0/4] memory reductions for WWW + solver
@ 2024-03-11 19:40 7% Eric Wong
2024-03-11 19:40 7% ` [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs Eric Wong
0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2024-03-11 19:40 UTC
To: meta
1/4 gets rid of some overload caused by parallel solver
invocations under heavy (likely bot) traffic crawling
yhbt.net/lore with many coderepos enabled and joined
to inboxes.
2/4 is a large reduction in allocations from loading
coderepo <=> inbox associations, 4/4 is smaller.
I found 2/4 with Devel::Mwrap and noticed 4/4 while
working on 2/4.
3/4 is just a doc update, but I've been successfully using
jemalloc on my lore+gko mirror for a week or two now
(and I plan to experiment with making glibc||dlmalloc more
resistant to fragmentation).
Eric Wong (4):
  www: use a dedicated limiter for blob solver
  codesearch: deduplicate {ibx_score} name pairs
  doc: tuning: note reduced fragmentation w/ jemalloc
  codesearch: deduplicate $git->{nick} field

 Documentation/public-inbox-tuning.pod |  5 +++
 examples/public-inbox-netd@.service   |  2 ++
 lib/PublicInbox/CodeSearch.pm         | 14 ++++++--
 lib/PublicInbox/SolverGit.pm          | 15 +++++----
 lib/PublicInbox/ViewVCS.pm            | 48 ++++++++++++++++++++++-----
 5 files changed, 66 insertions(+), 18 deletions(-)
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git