* [PATCH 1/4] v2writable: use a smaller default for Xapian partitions
From: Eric Wong @ 2019-06-14 3:03 UTC
To: meta
Apparently, machines with 16 CPUs (probably hyperthreaded) and
SATA storage are common these days. Having too many Xapian
partitions leads to contention and excessive use of FDs and disk
space. So set a smaller default, but continue allowing
user-specified values to bump this up.
---
lib/PublicInbox/V2Writable.pm | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index a8c33ef..c504651 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -23,7 +23,14 @@ use IO::Handle;
# an estimate of the post-packed size to the raw uncompressed size
my $PACKING_FACTOR = 0.4;
-# assume 2 cores if GNU nproc(1) is not available
+# SATA storage lags behind what CPUs are capable of, so relying on
+# nproc(1) can be misleading and having extra Xapian partitions is a
+# waste of FDs and space. It can also lead to excessive I/O latency
+# and slow things down. Users on NVMe or other fast storage can
+# use the NPROC environment variable or switches in our
+# script/public-inbox-* programs to increase Xapian partitions.
+our $NPROC_MAX_DEFAULT = 4;
+
sub nproc_parts ($) {
my ($creat_opt) = @_;
if (ref($creat_opt) eq 'HASH') {
@@ -32,7 +39,14 @@ sub nproc_parts ($) {
}
}
- my $n = int($ENV{NPROC} || `nproc 2>/dev/null` || 2);
+ my $n = $ENV{NPROC};
+ if (!$n) {
+ chomp($n = `nproc 2>/dev/null`);
+ # assume 2 cores if GNU nproc(1) is not available
+ $n = 2 if !$n;
+ $n = $NPROC_MAX_DEFAULT if $n > $NPROC_MAX_DEFAULT;
+ }
+
# subtract for the main process and git-fast-import
$n -= 1;
$n < 1 ? 1 : $n;
--
EW
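The capping logic in the patch above can be tried in isolation. Below is a minimal standalone sketch of the patched `nproc_parts()` default path: it caps only the autodetected CPU count with `$n > $NPROC_MAX_DEFAULT`, which is the behavior the comment and commit message describe. `default_parts`, `$env_nproc` and `$cpus` are hypothetical names standing in for `$ENV{NPROC}` and the output of nproc(1); the `$creat_opt` override is left out.

```perl
use strict;
use warnings;

# Same default cap as the patch.
our $NPROC_MAX_DEFAULT = 4;

# Hypothetical stand-in for nproc_parts(): $env_nproc plays the role of
# $ENV{NPROC}, $cpus the (already chomped) output of nproc(1).
sub default_parts {
	my ($env_nproc, $cpus) = @_;
	my $n = $env_nproc;
	if (!$n) {
		$n = $cpus;
		# assume 2 cores if nproc(1) gave us nothing
		$n = 2 if !$n;
		# cap only the autodetected value, never an explicit NPROC
		$n = $NPROC_MAX_DEFAULT if $n > $NPROC_MAX_DEFAULT;
	}
	# subtract for the main process and git-fast-import
	$n -= 1;
	$n < 1 ? 1 : $n;
}

for my $cpus (1, 2, 4, 16) {
	printf "nproc=%-2d -> %d partitions\n", $cpus,
		default_parts(undef, $cpus);
}
printf "NPROC=8  -> %d partitions\n", default_parts(8, 16);
```

Under these assumptions, a 16-CPU machine without NPROC set gets 3 partitions (the cap of 4 minus one for the main process and git-fast-import) instead of the 15 produced by the old `int($ENV{NPROC} || `nproc` || 2) - 1` default, while an explicit NPROC=8 still yields 7.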
* [PATCH 0/4] xcpdb: support resharding Xapian DBs
From: Eric Wong @ 2019-06-14 3:03 UTC
To: meta
Defaulting the number of Xapian shards to the number of CPUs
can hurt performance, because common storage systems cannot
keep up; NVMe speeds are not yet common.
To help public-inbox users recover from this inefficiency while
allowing continuous email archival, we can support arbitrary
resharding to have fewer shards (or more, if doing HW upgrades).
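The cover letter does not spell out how messages move between shards during an M:N copy. The following is a minimal sketch under a stated assumption, not the xcpdb implementation: every document is read from all old shards and routed to a new shard by its article number modulo the new shard count. `reshard` and the array-of-arrays layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical M:N reshard: each old shard is an arrayref of article
# numbers.  ASSUMPTION: the shard is chosen by "article number modulo
# shard count"; the real xcpdb code may route documents differently.
sub reshard {
	my ($old_shards, $new_count) = @_;
	my @new = map { [] } 1 .. $new_count;
	for my $shard (@$old_shards) {
		for my $num (@$shard) {
			push @{$new[$num % $new_count]}, $num;
		}
	}
	# keep each new shard in article-number order
	@$_ = sort { $a <=> $b } @$_ for @new;
	\@new;
}

# 15 old shards holding article numbers 1..30, routed by $num % 15
my @old = map { [] } 1 .. 15;
push @{$old[$_ % 15]}, $_ for 1 .. 30;

my $new = reshard(\@old, 3);
for my $i (0 .. $#$new) {
	print "shard $i: @{$new->[$i]}\n";
}
```

Because the routing depends only on a stable per-message key, the same routine works for growing the shard count after a hardware upgrade as well as for shrinking it.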
Note: I'm also going to move the documentation towards using the
word "shard" (instead of "partition") to be consistent with
current Xapian documentation (1.4+, and "master").
Xapian 1.2 did not use the word "shard" at all, but in my experience
interacting with non-Xapian search engine folks, the word
"shard" is pretty common.
Eric Wong (4):
v2writable: use a smaller default for Xapian partitions
xapcmd: preserve indexlevel based on the destination
xcpdb: use destination shard as progress prefix
xcpdb: support resharding v2 repos
Documentation/public-inbox-xcpdb.pod | 11 ++
MANIFEST | 1 +
lib/PublicInbox/V2Writable.pm | 18 ++-
lib/PublicInbox/Xapcmd.pm | 222 +++++++++++++++++++++------
script/public-inbox-xcpdb | 4 +-
t/xcpdb-reshard.t | 83 ++++++++++
6 files changed, 286 insertions(+), 53 deletions(-)
create mode 100644 t/xcpdb-reshard.t
--
EW
* [RFC] v2writable: use a smaller default for Xapian partitions
From: Eric Wong @ 2019-06-12 16:50 UTC
To: Konstantin Ryabitsev; +Cc: meta
Apparently, machines with 16 CPUs (probably hyperthreaded) and
SATA storage are common these days. Having too many Xapian
partitions leads to contention and excessive use of FDs and disk
space. So set a smaller default, but continue allowing
user-specified values to bump this up.
---
I noticed korg had lots of partitions, which seems like
overkill and, at the very least, wastes FDs. Repartitioning
will be a different step.
lib/PublicInbox/V2Writable.pm | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index a8c33ef..c504651 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -23,7 +23,14 @@ use IO::Handle;
# an estimate of the post-packed size to the raw uncompressed size
my $PACKING_FACTOR = 0.4;
-# assume 2 cores if GNU nproc(1) is not available
+# SATA storage lags behind what CPUs are capable of, so relying on
+# nproc(1) can be misleading and having extra Xapian partitions is a
+# waste of FDs and space. It can also lead to excessive I/O latency
+# and slow things down. Users on NVMe or other fast storage can
+# use the NPROC environment variable or switches in our
+# script/public-inbox-* programs to increase Xapian partitions.
+our $NPROC_MAX_DEFAULT = 4;
+
sub nproc_parts ($) {
my ($creat_opt) = @_;
if (ref($creat_opt) eq 'HASH') {
@@ -32,7 +39,14 @@ sub nproc_parts ($) {
}
}
- my $n = int($ENV{NPROC} || `nproc 2>/dev/null` || 2);
+ my $n = $ENV{NPROC};
+ if (!$n) {
+ chomp($n = `nproc 2>/dev/null`);
+ # assume 2 cores if GNU nproc(1) is not available
+ $n = 2 if !$n;
+ $n = $NPROC_MAX_DEFAULT if $n > $NPROC_MAX_DEFAULT;
+ }
+
# subtract for the main process and git-fast-import
$n -= 1;
$n < 1 ? 1 : $n;
--
EW
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git