-rw-r--r--  Documentation/public-inbox-index.pod   12
-rw-r--r--  Documentation/public-inbox-init.pod    15
-rw-r--r--  Documentation/public-inbox-tuning.pod  21
-rw-r--r--  Documentation/public-inbox-xcpdb.pod    1
4 files changed, 40 insertions, 9 deletions
diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 46a53825..207b2ed8 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -39,8 +39,12 @@ normal search functionality.
 Influences the number of Xapian indexing shards in a
 (L<public-inbox-v2-format(5)>) inbox.
 
+See L<public-inbox-init(1)/--jobs> for a full description
+of sharding.
+
 C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+to disable parallel indexing regardless of the number of
+pre-existing shards.
 
 If the inbox has not been indexed or initialized, C<JOBS - 1>
 shards will be created (one job is always needed for indexing
@@ -133,7 +137,11 @@ Available in public-inbox 1.6.0 (PENDING).
 =item --no-fsync
 
 Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
-and Xapian.  This is only effective with Xapian 1.4+.
+and Xapian.  This is only effective with Xapian 1.4+.  This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>.  Users of large C<--batch-size>
+may even find disabling L<fdatasync(2)> causes too much dirty
+data to accumulate, resulting in latency spikes from writeback.
 
 Available in public-inbox 1.6.0 (PENDING).
 
diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod
index b25dd1e4..24645045 100644
--- a/Documentation/public-inbox-init.pod
+++ b/Documentation/public-inbox-init.pod
@@ -86,14 +86,21 @@ Default: unset, no epochs are skipped
 Control the number of Xapian index shards in a
 C<-V2> (L<public-inbox-v2-format(5)>) inbox.
 
-It is useful to use a single shard (C<-j1>) for inboxes on
+It can be useful to use a single shard (C<-j1>) for inboxes on
 high-latency storage (e.g. rotational HDD) unless the system has
 enough RAM to cache 5-10x the size of the git repository.
 
-It is generally not useful to specify higher values than the
-default due to contention in the top-level producer process.
+Another approach for HDDs is to use the
+L<public-inbox-index(1)/publicInbox.indexSequentialShard> option
+and many shards, so each shard may fit into the kernel page
+cache.  Unfortunately, excessive sharding slows down read-only
+query performance.
 
-Default: the number of online CPUs, up to 4
+For fast storage, it is generally not useful to specify higher
+values than the default due to the top-level producer process
+being a bottleneck.
+
+Default: the number of online CPUs, up to 4 (3 shard workers, 1 producer)
 
 =item --skip-docdata
 
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index abc53d1e..e3f2899b 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -69,7 +69,8 @@ footprint when indexing on HDDs.
 
 Initializing a mirror with a high C<--jobs> count to create more
 shards (in C<-V2> inboxes) will keep each shard smaller and
-reduce its kernel page cache footprint.
+reduce its kernel page cache footprint.  Keep in mind excessive
+sharding imposes a performance penalty for read-only queries.
 
 Users with large amounts of RAM are advised to set a large value
 for C<publicinbox.indexBatchSize> as documented in
@@ -88,12 +89,21 @@ used by public-inbox are no exception to that.
 
 public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
 indices on btrfs to achieve acceptable performance (even on SSD).
-Disabling copy-on-write also disables checksumming, thus raid1
-(or higher) configurations may corrupt on unsafe shutdowns.
+Disabling copy-on-write also disables checksumming, thus C<raid1>
+(or higher) configurations may be corrupt after unsafe shutdowns.
 
 Fortunately, these SQLite and Xapian indices are designed to be
 recoverable from git if missing.
 
+Disabling CoW does not prevent all fragmentation.
+
+Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
+Snapshots use CoW despite our efforts to disable it, resulting
+in fragmentation.
+
+L<filefrag(8)> can be used to monitor fragmentation, and
+C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.
+
 Large filesystems benefit significantly from the C<space_cache=v2>
 mount option documented in L<btrfs(5)>.
 
@@ -106,6 +116,11 @@ While SSD read performance is generally good, SSD write performance
 degrades as the drive ages and/or gets full.  Issuing C<TRIM> commands
 via L<fstrim(8)> or similar is required to sustain write performance.
 
+Users of the Flash-Friendly File System
+L<F2FS|https://en.wikipedia.org/wiki/F2FS> may benefit from
+optimizations found in SQLite 3.21.0+.  Benchmarks are greatly
+appreciated.
+
 =head2 Read-only daemons
 
 L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, and
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index 52939894..1397a7f4 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -60,6 +60,7 @@ used with C<--compact>.
 =item --no-fsync
 
 Disable L<fsync(2)> and L<fdatasync(2)>.
+See L<public-inbox-index(1)/--no-fsync> for caveats.
 
 Available in public-inbox 1.6.0 (PENDING).
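
The job and shard arithmetic described in the C<--jobs> hunks above can be illustrated with a short Python sketch. It is not taken from public-inbox itself: the function names C<default_jobs> and C<shard_workers> are hypothetical, and the floor of one shard for C<-j1> is an assumption based on the "single shard (C<-j1>)" wording. The special case C<--jobs=0> is not modeled.

```python
import os

# Cap on the default --jobs value, as stated in the documentation above.
MAX_DEFAULT_JOBS = 4


def default_jobs(online_cpus=None):
    """Default --jobs: the number of online CPUs, up to 4."""
    if online_cpus is None:
        # os.cpu_count() may return None on unusual platforms.
        online_cpus = os.cpu_count() or 1
    return max(1, min(online_cpus, MAX_DEFAULT_JOBS))


def shard_workers(jobs):
    """Shards created for a fresh inbox: JOBS - 1 (one job is the producer).

    Assumption: at least one shard always exists, so -j1 yields one shard.
    """
    return max(1, jobs - 1)


if __name__ == "__main__":
    jobs = default_jobs()
    print(f"jobs={jobs} shards={shard_workers(jobs)}")
```

On a machine with four or more online CPUs this gives 4 jobs and 3 shard workers, matching the "(3 shard workers, 1 producer)" note in the new default text.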