author Eric Wong <e@yhbt.net> 2020-08-25 10:51:20 +0000
committer Eric Wong <e@yhbt.net> 2020-08-26 06:10:45 +0000
commit 78b1c973fba89bc6651ffa16807c40abd38e7bad
tree d344f2059b4dfa949f9e1b288417b19f85817365 /Documentation
parent 9684e0406fd2c67706bc46e4c8e98a53c8edede3

I've learned a thing or three about btrfs in the past few
weeks and remembered some old HDD things, too.

The Xapian MultiDatabase problem will need to be addressed
for 1.7...

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/public-inbox-index.pod  | 12
-rw-r--r-- Documentation/public-inbox-init.pod   | 15
-rw-r--r-- Documentation/public-inbox-tuning.pod | 21
-rw-r--r-- Documentation/public-inbox-xcpdb.pod  |  1
4 files changed, 40 insertions, 9 deletions

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 46a53825..207b2ed8 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -39,8 +39,12 @@ normal search functionality.
 Influences the number of Xapian indexing shards in a
 (L<public-inbox-v2-format(5)>) inbox.
 
+See L<public-inbox-init(1)/--jobs> for a full description
+of sharding.
+
 C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+to disable parallel indexing regardless of the number of
+pre-existing shards.
 
 If the inbox has not been indexed or initialized, C<JOBS - 1>
 shards will be created (one job is always needed for indexing
@@ -133,7 +137,11 @@ Available in public-inbox 1.6.0 (PENDING).
 =item --no-fsync
 
 Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
-and Xapian.  This is only effective with Xapian 1.4+.
+and Xapian.  This is only effective with Xapian 1.4+.  This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>.  Users of large C<--batch-size>
+may even find disabling L<fdatasync(2)> causes too much dirty
+data to accumulate, resulting in latency spikes from writeback.
 
 Available in public-inbox 1.6.0 (PENDING).
 
diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod
index b25dd1e4..24645045 100644
--- a/Documentation/public-inbox-init.pod
+++ b/Documentation/public-inbox-init.pod
@@ -86,14 +86,21 @@ Default: unset, no epochs are skipped
 Control the number of Xapian index shards in a
 C<-V2> (L<public-inbox-v2-format(5)>) inbox.
 
-It is useful to use a single shard (C<-j1>) for inboxes on
+It can be useful to use a single shard (C<-j1>) for inboxes on
 high-latency storage (e.g. rotational HDD) unless the system has
 enough RAM to cache 5-10x the size of the git repository.
 
-It is generally not useful to specify higher values than the
-default due to contention in the top-level producer process.
+Another approach for HDDs is to use the
+L<public-inbox-index(1)/publicInbox.indexSequentialShard> option
+and many shards, so each shard may fit into the kernel page
+cache.  Unfortunately, excessive shards slow down read-only
+query performance.
 
-Default: the number of online CPUs, up to 4
+For fast storage, it is generally not useful to specify higher
+values than the default due to the top-level producer process
+being a bottleneck.
+
+Default: the number of online CPUs, up to 4 (3 shard workers, 1 producer)
 
 =item --skip-docdata
 
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index abc53d1e..e3f2899b 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -69,7 +69,8 @@ footprint when indexing on HDDs.
 
 Initializing a mirror with a high C<--jobs> count to create more
 shards (in C<-V2> inboxes) will keep each shard smaller and
-reduce its kernel page cache footprint.
+reduce its kernel page cache footprint.  Keep in mind excessive
+sharding imposes a performance penalty for read-only queries.
 
 Users with large amounts of RAM are advised to set a large value
 for C<publicinbox.indexBatchSize> as documented in
@@ -88,12 +89,21 @@ used by public-inbox are no exception to that.
 
 public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
 indices on btrfs to achieve acceptable performance (even on SSD).
-Disabling copy-on-write also disables checksumming, thus raid1
-(or higher) configurations may corrupt on unsafe shutdowns.
+Disabling copy-on-write also disables checksumming, thus C<raid1>
+(or higher) configurations may be corrupt after unsafe shutdowns.
 
 Fortunately, these SQLite and Xapian indices are designed to
 be recoverable from git if missing.
 
+Disabling CoW does not prevent all fragmentation.
+
+Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
+Snapshots use CoW despite our efforts to disable it, resulting
+in fragmentation.
+
+L<filefrag(8)> can be used to monitor fragmentation, and
+C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.
+
 Large filesystems benefit significantly from the C<space_cache=v2>
 mount option documented in L<btrfs(5)>.
 
@@ -106,6 +116,11 @@ While SSD read performance is generally good, SSD write performance
 degrades as the drive ages and/or gets full.  Issuing C<TRIM> commands
 via L<fstrim(8)> or similar is required to sustain write performance.
 
+Users of the Flash-Friendly File System
+L<F2FS|https://en.wikipedia.org/wiki/F2FS> may benefit from
+optimizations found in SQLite 3.21.0+.  Benchmarks are greatly
+appreciated.
+
 =head2 Read-only daemons
 
 L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, and
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index 52939894..1397a7f4 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -60,6 +60,7 @@ used with C<--compact>.
 =item --no-fsync
 
 Disable L<fsync(2)> and L<fdatasync(2)>.
+See L<public-inbox-index(1)/--no-fsync> for caveats.
 
 Available in public-inbox 1.6.0 (PENDING).
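
The C<--jobs> defaults described in the public-inbox-init.pod hunk above can be illustrated with a little arithmetic. The following is a minimal sketch in Python, not public-inbox code (which is Perl). The names default_jobs and shards_for_jobs are hypothetical, and mapping C<-j1> to one shard is an assumption taken from the "single shard (C<-j1>)" wording rather than from the implementation.

```python
import os

# "Default: the number of online CPUs, up to 4"
MAX_DEFAULT_JOBS = 4


def default_jobs(online_cpus=None):
    """Return the default --jobs value: online CPUs, capped at 4.

    Hypothetical helper; os.cpu_count() stands in for however the
    real tool counts online CPUs.
    """
    if online_cpus is None:
        online_cpus = os.cpu_count() or 1
    return max(1, min(online_cpus, MAX_DEFAULT_JOBS))


def shards_for_jobs(jobs):
    """Return the number of Xapian shards created for a given --jobs.

    The documentation says JOBS - 1 shards are created because one
    job acts as the top-level producer.  Assumption: -j1 still yields
    one shard, as the "single shard (-j1)" wording suggests.
    """
    if jobs < 1:
        raise ValueError("this sketch only covers --jobs >= 1")
    return max(1, jobs - 1)


if __name__ == "__main__":
    jobs = default_jobs()
    print(f"jobs={jobs} shards={shards_for_jobs(jobs)}")
```

On a machine with four or more online CPUs this gives 4 jobs, matching the "3 shard workers, 1 producer" note in the patch.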