path: root/lib/PublicInbox/Config.pm
author    Eric Wong <e@yhbt.net>  2020-08-07 01:14:04 +0000
committer Eric Wong <e@yhbt.net>  2020-08-07 23:45:38 +0000
commit    06a2418fd053c9a5b80217e74d1b47b8e1ca85e1
tree      37dc120e64b6f2114164a3e4d2d358373b1b1eb5 /lib/PublicInbox/Config.pm
parent    32f6a1f9498f759041b72d6f4d5cb959088a3dec
This gives better page cache utilization for Xapian indexing on
slow storage by improving locality for random I/O activity on
the Xapian DB.

Instead of doing a single pass to index both SQLite and Xapian,
this indexes them separately.  The first pass is identical to
indexlevel=basic: it indexes both over.sqlite3 and msgmap.sqlite3.

Each subsequent pass operates on a single Xapian shard and only
indexes documents belonging to that shard.  Given enough shards, each
individual shard can be made small enough to fit into the kernel
page cache and avoid HDD seeks for read activity.
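A minimal Perl sketch of the multi-pass split, assuming a message's
Xapian shard is its article number modulo the shard count (an
assumption here, not something this patch shows); `shard_passes`, the
article numbers, and the shard count are all hypothetical.  Each shard
pass still walks the whole message list but writes to only one shard,
which is what keeps that shard's working set small:

```perl
use strict;
use warnings;

# Hypothetical helper: split message numbers into per-shard work lists.
# ASSUMPTION: a message's Xapian shard is $num % $nshard.
sub shard_passes {
	my ($nums, $nshard) = @_;
	my %per_shard;
	for my $shard (0 .. $nshard - 1) {
		# every pass scans all messages, but keeps only its own shard's
		$per_shard{$shard} = [ grep { $_ % $nshard == $shard } @$nums ];
	}
	return \%per_shard;
}

my @nums   = (1 .. 10); # made-up article numbers
my $nshard = 3;         # made-up shard count

# Pass 1 (same as indexlevel=basic): every message goes into
# over.sqlite3 and msgmap.sqlite3.
print "sqlite pass: @nums\n";

# Passes 2..4: each one touches exactly one Xapian shard.
my $passes = shard_passes(\@nums, $nshard);
for my $shard (0 .. $nshard - 1) {
	print "xapian shard $shard: @{ $passes->{$shard} }\n";
}
```

With 3 shards this prints one SQLite pass over messages 1 to 10, then
the shard lists 3 6 9, 1 4 7 10 and 2 5 8.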

In rough tests on a busy system with a 7200 RPM HDD (ext4) and
16GB RAM, with 7 shards configured, fsync(2) disabled (--no-sync),
and `--batch-size=10m', full indexing of LKML (9 epochs) goes from
~80 hours (-j0) to ~30 hours (-j8).
Diffstat (limited to 'lib/PublicInbox/Config.pm')
-rw-r--r-- lib/PublicInbox/Config.pm 9
1 files changed, 5 insertions, 4 deletions
diff --git a/lib/PublicInbox/Config.pm b/lib/PublicInbox/Config.pm
index 67199bb3..f9184bd2 100644
--- a/lib/PublicInbox/Config.pm
+++ b/lib/PublicInbox/Config.pm
@@ -369,8 +369,8 @@ sub _fill_code_repo {
         $git;
 }
 
-sub _git_config_bool ($) {
-        my ($val) = @_;
+sub git_bool {
+        my ($val) = $_[-1]; # $_[0] may be $self, or $val
         if ($val =~ /\A(?:false|no|off|[\-\+]?(?:0x)?0+)\z/i) {
                 0;
         } elsif ($val =~ /\A(?:true|yes|on|[\-\+]?(?:0x)?[0-9]+)\z/i) {
@@ -386,7 +386,8 @@ sub _fill {
 
         foreach my $k (qw(inboxdir filter newsgroup
                         watch httpbackendmax
-                        replyto feedmax nntpserver indexlevel)) {
+                        replyto feedmax nntpserver
+                        indexlevel indexsequentialshard)) {
                 my $v = $self->{"$pfx.$k"};
                 $ibx->{$k} = $v if defined $v;
         }
@@ -400,7 +401,7 @@ sub _fill {
         foreach my $k (qw(obfuscate)) {
                 my $v = $self->{"$pfx.$k"};
                 defined $v or next;
-                if (defined(my $bval = _git_config_bool($v))) {
+                if (defined(my $bval = git_bool($v))) {
                         $ibx->{$k} = $bval;
                 } else {
                         warn "Ignoring $pfx.$k=$v in config, not boolean\n";
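The renamed git_bool reads its argument from $_[-1], so it works both
as a plain function and as a method.  A minimal sketch under stated
assumptions: Example::Config, fill_inbox, and the inbox name "test"
are hypothetical, and the `1`/undef return branches are assumed
because the hunk above stops after the second regex.  It also shows
the raw indexsequentialshard value being copied by a _fill-style loop
and only interpreted as a boolean afterwards, which is an assumption
about where that value is consumed:

```perl
use strict;
use warnings;

package Example::Config;

# Hypothetical constructor: holds flattened "section.subsection.key"
# entries, like the ones _fill reads via $self->{"$pfx.$k"}.
sub new {
	my ($class, %kv) = @_;
	my $self = { %kv };
	return bless $self, $class;
}

# Same regexes as git_bool in the diff; the "1" and undef results are
# ASSUMED, since the hunk is truncated after the second regex.
sub git_bool {
	my ($val) = $_[-1]; # $_[0] may be $self, or $val
	return 0 if $val =~ /\A(?:false|no|off|[\-\+]?(?:0x)?0+)\z/i;
	return 1 if $val =~ /\A(?:true|yes|on|[\-\+]?(?:0x)?[0-9]+)\z/i;
	return; # undef: not a recognized boolean
}

# Hypothetical, trimmed-down copy of the key-copying loop in _fill:
# values are stored verbatim, not interpreted yet.
sub fill_inbox {
	my ($self, $pfx) = @_;
	my $ibx = {};
	foreach my $k (qw(inboxdir indexlevel indexsequentialshard)) {
		my $v = $self->{"$pfx.$k"};
		$ibx->{$k} = $v if defined $v;
	}
	return $ibx;
}

package main;

my $cfg = Example::Config->new(
	'publicinbox.test.inboxdir' => '/path/to/test',
	'publicinbox.test.indexsequentialshard' => 'yes',
);
my $ibx = $cfg->fill_inbox('publicinbox.test');

# Both call styles see the value in $_[-1].
my $seq_m = $cfg->git_bool($ibx->{indexsequentialshard});
my $seq_f = Example::Config::git_bool($ibx->{indexsequentialshard});
print "indexsequentialshard: $seq_m (method), $seq_f (function)\n";
```

This prints `indexsequentialshard: 1 (method), 1 (function)`; an
unset key such as indexlevel is simply absent from the inbox hash.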