author | Eric Wong <e@yhbt.net> | 2020-08-25 10:51:20 +0000 |
---|---|---|
committer | Eric Wong <e@yhbt.net> | 2020-08-26 06:10:45 +0000 |
commit | 78b1c973fba89bc6651ffa16807c40abd38e7bad (patch) | |
tree | d344f2059b4dfa949f9e1b288417b19f85817365 /Documentation | |
parent | 9684e0406fd2c67706bc46e4c8e98a53c8edede3 (diff) | |
download | public-inbox-78b1c973fba89bc6651ffa16807c40abd38e7bad.tar.gz |
I've learned a thing or three about btrfs in the past few weeks and remembered some old HDD things, too. The Xapian MultiDatabase problem will need to be addressed for 1.7...
Diffstat (limited to 'Documentation')
-rw-r--r-- | Documentation/public-inbox-index.pod | 12 |
-rw-r--r-- | Documentation/public-inbox-init.pod | 15 |
-rw-r--r-- | Documentation/public-inbox-tuning.pod | 21 |
-rw-r--r-- | Documentation/public-inbox-xcpdb.pod | 1 |
4 files changed, 40 insertions, 9 deletions
diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 46a53825..207b2ed8 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -39,8 +39,12 @@ normal search functionality.
 Influences the number of Xapian indexing shards in a
 (L<public-inbox-v2-format(5)>) inbox.
 
+See L<public-inbox-init(1)/--jobs> for a full description
+of sharding.
+
 C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+to disable parallel indexing regardless of the number of
+pre-existing shards.
 
 If the inbox has not been indexed or initialized, C<JOBS - 1> shards
 will be created (one job is always needed for indexing
@@ -133,7 +137,11 @@ Available in public-inbox 1.6.0 (PENDING).
 =item --no-fsync
 
 Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
-and Xapian. This is only effective with Xapian 1.4+.
+and Xapian. This is only effective with Xapian 1.4+. This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>. Users of large C<--batch-size>
+may even find disabling L<fdatasync(2)> causes too much dirty
+data to accumulate, resulting on latency spikes from writeback.
 
 Available in public-inbox 1.6.0 (PENDING).
 
diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod
index b25dd1e4..24645045 100644
--- a/Documentation/public-inbox-init.pod
+++ b/Documentation/public-inbox-init.pod
@@ -86,14 +86,21 @@ Default: unset, no epochs are skipped
 Control the number of Xapian index shards in a
 C<-V2> (L<public-inbox-v2-format(5)>) inbox.
 
-It is useful to use a single shard (C<-j1>) for inboxes on
+It can be useful to use a single shard (C<-j1>) for inboxes on
 high-latency storage (e.g. rotational HDD) unless the system has
 enough RAM to cache 5-10x the size of the git repository.
 
-It is generally not useful to specify higher values than the
-default due to contention in the top-level producer process.
+Another approach for HDDs is to use the
+L<public-inbox-index(1)/publicInbox.indexSequentialShard> option
+and many shards, so each shard may fit into the kernel page
+cache. Unfortunately, excessive shards slows down read-only
+query performance.
 
-Default: the number of online CPUs, up to 4
+For fast storage, it is generally not useful to specify higher
+values than the default due to the top-level producer process
+being a bottleneck.
+
+Default: the number of online CPUs, up to 4 (3 shard workers, 1 producer)
 
 =item --skip-docdata
 
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index abc53d1e..e3f2899b 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -69,7 +69,8 @@ footprint when indexing on HDDs.
 
 Initializing a mirror with a high C<--jobs> count to create
 more shards (in C<-V2> inboxes) will keep each shard smaller and
-reduce its kernel page cache footprint.
+reduce its kernel page cache footprint. Keep in mind excessive
+sharding imposes a performance penalty for read-only queries.
 
 Users with large amounts of RAM are advised to set a large value
 for C<publicinbox.indexBatchSize> as documented in
@@ -88,12 +89,21 @@ used by public-inbox are no exception to that.
 
 public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
 indices on btrfs to achieve acceptable performance (even on SSD).
-Disabling copy-on-write also disables checksumming, thus raid1
-(or higher) configurations may corrupt on unsafe shutdowns.
+Disabling copy-on-write also disables checksumming, thus C<raid1>
+(or higher) configurations may be corrupt after unsafe shutdowns.
 
 Fortunately, these SQLite and Xapian indices are designed to
 recoverable from git if missing.
 
+Disabling CoW does not prevent all fragmentation.
+
+Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
+Snapshots use CoW despite our efforts to disable it, resulting
+in fragmentation.
+
+L<filefrag(8)> can be used to monitor fragmentation, and
+C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.
+
 Large filesystems benefit significantly from the C<space_cache=v2>
 mount option documented in L<btrfs(5)>.
 
@@ -106,6 +116,11 @@ While SSD read performance is generally good, SSD write performance
 degrades as the drive ages and/or gets full. Issuing C<TRIM> commands
 via L<fstrim(8)> or similar is required to sustain write performance.
 
+Users of the Flash-Friendly File System
+L<F2FS|https://en.wikipedia.org/wiki/F2FS> may benefit from
+optimizations found in SQLite 3.21.0+. Benchmarks are greatly
+appreciated.
+
 =head2 Read-only daemons
 
 L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, and
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index 52939894..1397a7f4 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -60,6 +60,7 @@ used with C<--compact>.
 =item --no-fsync
 
 Disable L<fsync(2)> and L<fdatasync(2)>.
+See L<public-inbox-index(1)/--no-fsync> for caveats.
 
 Available in public-inbox 1.6.0 (PENDING).
 
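
The shard arithmetic described in the patched documentation (default C<--jobs> is the number of online CPUs, up to 4, and C<JOBS - 1> shards are created because one job is the producer) can be sketched in Python. This is only an illustration, not public-inbox code: the names `default_jobs` and `shard_count` are hypothetical, and flooring the result at one shard so that `-j1` yields a single shard is an assumption made to reconcile the two statements in the docs.

```python
import os

# "Default: the number of online CPUs, up to 4"
MAX_DEFAULT_JOBS = 4


def default_jobs(online_cpus=None):
    """Return the documented default --jobs value for a CPU count.

    Hypothetical helper: falls back to os.cpu_count() when no count is given.
    """
    if online_cpus is None:
        online_cpus = os.cpu_count() or 1
    return max(1, min(online_cpus, MAX_DEFAULT_JOBS))


def shard_count(jobs):
    """Return the number of Xapian shards for a freshly initialized -V2 inbox.

    The docs state JOBS - 1 shards are created (one job is the producer).
    ASSUMPTION: the result is floored at 1 so that -j1 means a single shard.
    --jobs=0 (parallel indexing disabled) is not modeled here.
    """
    if jobs < 1:
        raise ValueError("this sketch only models --jobs >= 1")
    return max(1, jobs - 1)


if __name__ == "__main__":
    jobs = default_jobs()
    print(f"jobs={jobs} shards={shard_count(jobs)}")
```

Under these assumptions, a machine with 4 or more online CPUs ends up with 3 shard workers and 1 producer, matching the "(3 shard workers, 1 producer)" note added to the default.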