user/dev discussion of public-inbox itself
* [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs
  2024-03-11 19:40  7% [PATCH 0/4] memory reductions for WWW + solver Eric Wong
@ 2024-03-11 19:40  7% ` Eric Wong
  0 siblings, 0 replies; 2+ results
From: Eric Wong @ 2024-03-11 19:40 UTC
  To: meta

With my current mirror of lore + gko, this saves over 300K
allocations and brings the allocation count in this area down
to under 5K.  The reduction in AV refs saves around 45MB RAM
according to measurements done live via Devel::Mwrap.
---
 lib/PublicInbox/CodeSearch.pm | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/PublicInbox/CodeSearch.pm b/lib/PublicInbox/CodeSearch.pm
index 1f95a726..48033bb5 100644
--- a/lib/PublicInbox/CodeSearch.pm
+++ b/lib/PublicInbox/CodeSearch.pm
@@ -292,6 +292,7 @@ W: cindex.$name.topdir=$self->{topdir} has no usable join data for $cfg_f
 EOM
 	my ($ekeys, $roots, $ibx2root) = @$jd{qw(ekeys roots ibx2root)};
 	my $roots2paths = roots2paths($self);
+	my %dedupe; # 50x alloc reduction w/ lore + gko mirror (Mar 2024)
 	for my $root_offs (@$ibx2root) {
 		my $ekey = shift(@$ekeys) // die 'BUG: {ekeys} empty';
 		scalar(@$root_offs) or next;
@@ -320,9 +321,15 @@ EOM
 				if (my $git = $dir2cr{$_}) {
 					$ibx_p2g{$_} = $git;
 					$ibx2self = 1;
-					$ibx->{-hide_www} or
-						push @{$git->{ibx_score}},
+					if (!$ibx->{-hide_www}) {
+						# don't stringify $nr directly
+						# to avoid long-lived PV
+						my $k = ($nr + 0)."\0".
+							($ibx + 0);
+						my $s = $dedupe{$k} //=
 							[ $nr, $ibx->{name} ];
+						push @{$git->{ibx_score}}, $s;
+					}
 					push @$gits, $git;
 				} else {
 					warn <<EOM;
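
The effect of the %dedupe hash above can be sketched in isolation.
This is a minimal illustration, not code from public-inbox: the
helper names (make_inboxes, make_repos, load_scores,
count_unique_pairs), the inbox names, and the scores are all
hypothetical.  Only the key construction and the `//=` caching
mirror the patch.  Each [ $nr, $name ] pair is allocated once and
then shared by every coderepo which references it, instead of
being allocated again for each coderepo.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical stand-ins for inbox and coderepo objects
sub make_inboxes { map { +{ name => "inbox$_" } } 1..3 }
sub make_repos { my ($n) = @_; map { +{ ibx_score => [] } } 1..$n }

# Attach a [score, name] pair for every inbox to every repo.
# When $dedupe (a hashref) is given, identical pairs are shared.
sub load_scores {
	my ($repos, $inboxes, $dedupe) = @_;
	for my $git (@$repos) {
		for my $i (0..$#$inboxes) {
			my $ibx = $inboxes->[$i];
			my $nr = $i + 1; # made-up score
			my $pair;
			if ($dedupe) {
				# key on a numeric copy of the score and
				# the numeric address of the inbox ref,
				# as the patch does
				my $k = ($nr + 0)."\0".($ibx + 0);
				$pair = $dedupe->{$k} //=
					[ $nr, $ibx->{name} ];
			} else {
				$pair = [ $nr, $ibx->{name} ];
			}
			push @{$git->{ibx_score}}, $pair;
		}
	}
}

# Count distinct array refs held across all repos
sub count_unique_pairs {
	my ($repos) = @_;
	my %seen;
	$seen{refaddr($_)} = 1 for map { @{$_->{ibx_score}} } @$repos;
	scalar keys %seen;
}

my @inboxes = make_inboxes();
my @plain = make_repos(100);
my @shared = make_repos(100);
load_scores(\@plain, \@inboxes);
load_scores(\@shared, \@inboxes, {});
printf "without dedupe: %d arrays\n", count_unique_pairs(\@plain);
printf "with dedupe:    %d arrays\n", count_unique_pairs(\@shared);
```

With 100 repos and 3 inboxes, the first call keeps 300 separate
arrays alive while the second keeps 3.  Sharing is only safe
because the pairs are treated as read-only after loading; any
code which modified a pair in place would affect every coderepo
holding it.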


* [PATCH 0/4] memory reductions for WWW + solver
@ 2024-03-11 19:40  7% Eric Wong
  2024-03-11 19:40  7% ` [PATCH 2/4] codesearch: deduplicate {ibx_score} name pairs Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2024-03-11 19:40 UTC
  To: meta

1/4 gets rid of some overload caused by parallel solver
invocations under heavy (likely bot) traffic crawling
yhbt.net/lore with many coderepos enabled and joined
to inboxes.

2/4 is a large reduction in allocations from loading
coderepo <=> inbox associations, 4/4 is smaller.
I found 2/4 with Devel::Mwrap and noticed 4/4 while
working on 2/4.

3/4 is just a doc update, but I've been successfully using
jemalloc on my lore+gko mirror for a week or two now
(and I plan to experiment with making glibc||dlmalloc more
resistant to fragmentation).

Eric Wong (4):
  www: use a dedicated limiter for blob solver
  codesearch: deduplicate {ibx_score} name pairs
  doc: tuning: note reduced fragmentation w/ jemalloc
  codesearch: deduplicate $git->{nick} field

 Documentation/public-inbox-tuning.pod |  5 +++
 examples/public-inbox-netd@.service   |  2 ++
 lib/PublicInbox/CodeSearch.pm         | 14 ++++++--
 lib/PublicInbox/SolverGit.pm          | 15 +++++----
 lib/PublicInbox/ViewVCS.pm            | 48 ++++++++++++++++++++++-----
 5 files changed, 66 insertions(+), 18 deletions(-)

