| field | value | date |
|---|---|---|
| author | Eric Wong <e@80x24.org> | 2020-12-15 02:02:16 +0000 |
| committer | Eric Wong <e@80x24.org> | 2020-12-17 19:12:53 +0000 |
| commit | 7281c5c492f9d6bbd585da9f061d19819d952352 | |
| tree | 947228cf3f08bc6ae8874d99936fcef457096282 /lib/PublicInbox/V2Writable.pm | |
| parent | d26c2837f479b41182946a6540aad95d34b2b594 | |
--reindex allows us to catch messages that were missed or left stale by races between -extindex and -index prior to commit 02b2fcc46f364b51 ("extsearchidx: enforce -index before -extindex"). We'll also rely on reindex to internally handle v1/v2 inbox removals and the partial unindexing of messages that are removed from only one of several inboxes.

This reindex design is completely different from how normal v1/v2 inbox reindexing operates, because extindex has multiple histories to work with. Instead of scanning git history, it relies exclusively on comparing over.sqlite3 contents between the v1/v2 inboxes and the extindex. Changes to Xapian behavior are now picked up as well.

Xapian indexing is handled by workers with minimal IPC to the parent process. This results in more read I/O but fewer writes when dealing with cross-posted messages.

Changes to $smsg->populate and --rethread still need further work.
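The message above describes reindex as comparing index contents between the inboxes and the extindex, rather than scanning git history, so that missed messages, stale messages, and messages removed from only one of several inboxes can all be found. As a rough illustration only, the following Perl sketch models that comparison with in-memory hashes. The data shapes, the `reindex_plan` sub, and the inbox names are all hypothetical; this is not public-inbox's over.sqlite3 schema or its actual reindex code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical data shapes, NOT the real over.sqlite3 schema:
#   $inbox: inbox name => { blob OID => 1 } (what each v1/v2 inbox indexes)
#   $ext:   blob OID => { inbox name => 1 } (what the extindex recorded)
# Returns what a reindex pass would need to fix; modifies $ext in place.
sub reindex_plan {
	my ($inbox, $ext) = @_;
	my (@missed, @stale, @purge);

	# messages an inbox has, but the extindex never recorded for it
	for my $name (sort keys %$inbox) {
		for my $oid (sort keys %{$inbox->{$name}}) {
			push @missed, "$name:$oid"
				unless $ext->{$oid} && $ext->{$oid}{$name};
		}
	}

	# extindex references to messages an inbox no longer has: drop only
	# that inbox's reference, and purge the message once none remain
	for my $oid (sort keys %$ext) {
		for my $name (sort keys %{$ext->{$oid}}) {
			next if $inbox->{$name} && $inbox->{$name}{$oid};
			push @stale, "$name:$oid";
			delete $ext->{$oid}{$name};
		}
		push @purge, $oid unless %{$ext->{$oid}};
	}
	return { missed => \@missed, stale => \@stale, purge => \@purge };
}

# hypothetical scenario: 'lkml' was removed, and the extindex missed a
# message in 'git' (e.g. due to an -index vs -extindex race)
my %inbox = (
	meta => { aaa => 1, bbb => 1 },
	git  => { bbb => 1, ddd => 1 },
);
my %ext = (
	aaa => { meta => 1 },
	bbb => { meta => 1, git => 1, lkml => 1 }, # cross-posted
	ccc => { lkml => 1 },                      # only in the removed inbox
);
my $plan = reindex_plan(\%inbox, \%ext);
print "$_: @{$plan->{$_}}\n" for qw(missed stale purge);
```

In this toy run, the cross-posted message `bbb` only loses its `lkml` reference and stays indexed for `meta` and `git`, while `ccc` is purged entirely because no remaining inbox references it; that mirrors the "partial-unindexing" case the commit message mentions.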
Diffstat (limited to 'lib/PublicInbox/V2Writable.pm')
-rw-r--r-- | lib/PublicInbox/V2Writable.pm | 6 |
1 file changed, 5 insertions, 1 deletion
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index bef3a67a..572eb418 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -275,13 +275,17 @@ sub _idx_init { # with_umask callback
 	$self->{shards} = $nshards if $nshards && $nshards != $self->{shards};
 	$self->{batch_bytes} = $opt->{batch_size} //
 				$PublicInbox::SearchIdx::BATCH_BYTES;
-	$self->{batch_bytes} *= $self->{shards} if $self->{parallel};
 
 	# need to create all shards before initializing msgmap FD
 	# idx_shards must be visible to all forked processes
 	my $max = $self->{shards} - 1;
 	my $idx = $self->{idx_shards} = [];
 	push @$idx, PublicInbox::SearchIdxShard->new($self, $_) for (0..$max);
+
+	# SearchIdxShard may do their own flushing, so don't scale
+	# until after forking
+	$self->{batch_bytes} *= $self->{shards} if $self->{parallel};
+
 	my $ibx = $self->{ibx} or return; # ExtIdxSearch
 
 	# Now that all subprocesses are up, we can open the FDs
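The hunk moves the `batch_bytes` scaling to after the `SearchIdxShard` objects are created. Assuming, as the new comment and "idx_shards must be visible to all forked processes" suggest, that each shard worker is a forked process that inherits a copy of `$self` at fork time and flushes on its own, the ordering decides which value each worker sees. The following standalone Perl sketch (not public-inbox code) forks stand-in workers to show the difference; `spawn_shards`, `idx_init`, and the constants are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical values; in public-inbox the per-shard default comes from
# --batch-size or $PublicInbox::SearchIdx::BATCH_BYTES.
my $BATCH_BYTES = 1_000_000;
my $NSHARDS = 4;

# Hypothetical stand-in for creating SearchIdxShard workers: fork one
# child per shard and collect the batch_bytes value each child inherited.
sub spawn_shards {
	my ($self) = @_;
	my @seen;
	for (1 .. $self->{shards}) {
		pipe(my $r, my $w) or die "pipe: $!";
		my $pid = fork();
		die "fork: $!" unless defined $pid;
		if ($pid == 0) { # child: works on its own copy of $self
			close $r;
			print $w "$self->{batch_bytes}\n";
			close $w;
			POSIX::_exit(0); # skip END blocks and parent stdio buffers
		}
		close $w;
		chomp(my $v = <$r>);
		close $r;
		waitpid($pid, 0);
		push @seen, $v;
	}
	return \@seen;
}

# Mimics the relevant part of _idx_init; $scale_early selects the
# pre-patch ordering (scale, then fork) or the patched one (fork, then scale).
sub idx_init {
	my ($scale_early) = @_;
	my $self = { shards => $NSHARDS, parallel => 1,
			batch_bytes => $BATCH_BYTES };
	$self->{batch_bytes} *= $self->{shards}
		if $scale_early && $self->{parallel};
	my $workers = spawn_shards($self);
	$self->{batch_bytes} *= $self->{shards}
		if !$scale_early && $self->{parallel};
	return ($self->{batch_bytes}, $workers);
}

my ($old_parent, $old_workers) = idx_init(1);
my ($new_parent, $new_workers) = idx_init(0);
print "before patch: parent=$old_parent workers=@$old_workers\n";
print "after patch:  parent=$new_parent workers=@$new_workers\n";
```

With the pre-patch ordering, every stand-in worker inherits 4000000 (4 shards × 1000000), so each one would buffer up to the whole aggregate budget before flushing. With the patched ordering, workers keep the unscaled 1000000 and only the parent carries the scaled total.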