* [PATCH] extindex: support --jobs/-j properly on creation for shard count
@ 2021-07-25 12:44 Eric Wong
From: Eric Wong @ 2021-07-25 12:44 UTC
To: meta
--jobs/-j wasn't wired up properly to the shard count when
creating an extindex. It matters because Xapian appears to
suffer from I/O amplification problems as DB shards get larger:
https://lists.xapian.org/pipermail/xapian-discuss/2019-February/009727.html
<23640.32170.703368.841021@y.dockes.com>
Of course, we shouldn't have too many shards, either, because
performance problems with too many shards were the entire reason
extindex was created:
https://lists.xapian.org/pipermail/xapian-discuss/2020-August/009823.html
<20200826064728.GA32239@dcvr>
---
lib/PublicInbox/ExtSearchIdx.pm | 3 ++-
t/extsearch.t | 11 +++++++++++
2 files changed, 13 insertions(+), 1 deletion(-)
diff --git a/lib/PublicInbox/ExtSearchIdx.pm b/lib/PublicInbox/ExtSearchIdx.pm
index 357312b8..ad56c0d5 100644
--- a/lib/PublicInbox/ExtSearchIdx.pm
+++ b/lib/PublicInbox/ExtSearchIdx.pm
@@ -52,7 +52,8 @@ sub new {
parallel => 1,
lock_path => "$dir/ei.lock",
}, __PACKAGE__;
- $self->{shards} = $self->count_shards || nproc_shards($opt->{creat});
+ $self->{shards} = $self->count_shards ||
+ nproc_shards({ nproc => $opt->{jobs} });
my $oidx = PublicInbox::OverIdx->new("$self->{xpfx}/over.sqlite3");
$self->{-no_fsync} = $oidx->{-no_fsync} = 1 if !$opt->{fsync};
$self->{oidx} = $oidx;
diff --git a/t/extsearch.t b/t/extsearch.t
index 46a6f2ec..1f62e80c 100644
--- a/t/extsearch.t
+++ b/t/extsearch.t
@@ -411,4 +411,15 @@ if ('dedupe + dry-run') {
'--dry-run alone fails');
}
+for my $j (1, 3, 6) {
+ my $o = { 2 => \(my $err = '') };
+ my $d = "$home/extindex-j$j";
+ ok(run_script(['-extindex', "-j$j", '--all', $d], undef, $o),
+ "init with -j$j");
+ my $max = $j - 2;
+ $max = 0 if $max < 0;
+ my @dirs = glob("$d/ei*/?");
+ like($dirs[-1], qr!/ei[0-9]+/$max\z!, '-j works');
+}
+
done_testing;
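
The expectation encoded in the new test can be sketched on its own.
This is only a minimal illustration of the arithmetic the test checks
(highest shard directory is $j - 2, floored at 0), not the actual
nproc_shards() implementation; expected_max_shard() is a hypothetical
helper that does not exist in public-inbox.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of public-inbox: it mirrors only the
# arithmetic the test expects, not the real nproc_shards().
sub expected_max_shard {
	my ($jobs) = @_;
	my $max = $jobs - 2;	# same computation as the test
	$max = 0 if $max < 0;	# at least one shard, in directory "0"
	return $max;
}

for my $j (1, 3, 6) {
	my $max = expected_max_shard($j);
	# shard directories are numbered from 0, so the count is $max + 1
	printf "-j%d => %d shard(s), highest directory: %d\n",
		$j, $max + 1, $max;
}
```

Under that reading, -j3 should leave two shard directories (0 and 1).
Note that the test's glob pattern ("$d/ei*/?") only matches
single-character directory names, so it would not cover ten or more
shards.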