From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id A2FC31F66E
	for ; Tue, 25 Aug 2020 10:51:20 +0000 (UTC)
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH] doc: add some more tuning notes
Date: Tue, 25 Aug 2020 10:51:20 +0000
Message-Id: <20200825105120.30106-1-e@yhbt.net>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id:

I've learned a thing or three about btrfs in the past few
weeks and remembered some old HDD things, too. The Xapian
MultiDatabase problem will need to be addressed for 1.7...
---
 Documentation/public-inbox-index.pod  | 12 ++++++++++--
 Documentation/public-inbox-init.pod   | 15 +++++++++++----
 Documentation/public-inbox-tuning.pod | 21 ++++++++++++++++++---
 Documentation/public-inbox-xcpdb.pod  |  1 +
 4 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 46a53825..207b2ed8 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -39,8 +39,12 @@ normal search functionality.
 Influences the number of Xapian indexing shards in a
 (L) inbox.
 
+See L for a full description
+of sharding.
+
 C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+to disable parallel indexing regardless of the number of
+pre-existing shards.
 
 If the inbox has not been indexed or initialized, C shards
 will be created (one job is always needed for indexing
@@ -133,7 +137,11 @@ Available in public-inbox 1.6.0 (PENDING).
 =item --no-fsync
 
 Disables L and L operations on SQLite
-and Xapian. This is only effective with Xapian 1.4+.
+and Xapian. This is only effective with Xapian 1.4+. This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>. Users of large C<--batch-size>
+may even find disabling L causes too much dirty
+data to accumulate, resulting in latency spikes from writeback.
 
 Available in public-inbox 1.6.0 (PENDING).
 
diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod
index b25dd1e4..24645045 100644
--- a/Documentation/public-inbox-init.pod
+++ b/Documentation/public-inbox-init.pod
@@ -86,14 +86,21 @@ Default: unset, no epochs are skipped
 Control the number of Xapian index shards in a
 C<-V2> (L) inbox.
 
-It is useful to use a single shard (C<-j1>) for inboxes on
+It can be useful to use a single shard (C<-j1>) for inboxes on
 high-latency storage (e.g. rotational HDD) unless the system has
 enough RAM to cache 5-10x the size of the git repository.
 
-It is generally not useful to specify higher values than the
-default due to contention in the top-level producer process.
+Another approach for HDDs is to use the
+L option
+and many shards, so each shard may fit into the kernel page
+cache. Unfortunately, excessive shards slow down read-only
+query performance.
 
-Default: the number of online CPUs, up to 4
+For fast storage, it is generally not useful to specify higher
+values than the default due to the top-level producer process
+being a bottleneck.
+
+Default: the number of online CPUs, up to 4 (3 shard workers, 1 producer)
 
 =item --skip-docdata
 
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index abc53d1e..e3f2899b 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -69,7 +69,8 @@ footprint when indexing on HDDs.
 
 Initializing a mirror with a high C<--jobs> count to create more
 shards (in C<-V2> inboxes) will keep each shard smaller and
-reduce its kernel page cache footprint.
+reduce its kernel page cache footprint. Keep in mind excessive
+sharding imposes a performance penalty for read-only queries.
 
 Users with large amounts of RAM are advised to set a large value
 for C as documented in
@@ -88,12 +89,21 @@ used by public-inbox are no exception to that.
 
 public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
 indices on btrfs to achieve acceptable performance (even on SSD).
-Disabling copy-on-write also disables checksumming, thus raid1
-(or higher) configurations may corrupt on unsafe shutdowns.
+Disabling copy-on-write also disables checksumming, thus C
+(or higher) configurations may be corrupt after unsafe shutdowns.
 
 Fortunately, these SQLite and Xapian indices are designed to
 recoverable from git if missing.
 
+Disabling CoW does not prevent all fragmentation.
+
+Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
+Snapshots use CoW despite our efforts to disable it, resulting
+in fragmentation.
+
+L can be used to monitor fragmentation, and
+C may be necessary.
+
 Large filesystems benefit significantly from the C
 mount option documented in L.
 
@@ -106,6 +116,11 @@ While SSD read performance is generally good, SSD write performance
 degrades as the drive ages and/or gets full. Issuing C commands
 via L or similar is required to sustain write performance.
 
+Users of the Flash-Friendly File System
+L may benefit from
+optimizations found in SQLite 3.21.0+. Benchmarks are greatly
+appreciated.
+
 =head2 Read-only daemons
 
 L, L, and
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index 52939894..1397a7f4 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -60,6 +60,7 @@ used with C<--compact>.
 =item --no-fsync
 
 Disable L and L.
+See L for caveats.
 
 Available in public-inbox 1.6.0 (PENDING).
 
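
For readers who want the jobs-to-shards arithmetic from the init.pod hunk
spelled out, here is a rough sketch. It is not public-inbox code: the
function names are hypothetical, and the rule that -j1 still yields one
shard is an assumption read off the "single shard (C<-j1>)" wording. The
cap of 4 and the "3 shard workers, 1 producer" split come from the patch
text. --jobs=0 (parallel indexing disabled) is not modeled.

```python
import os

# Cap stated in the patch: "the number of online CPUs, up to 4".
MAX_DEFAULT_JOBS = 4


def default_jobs(online_cpus):
    """Default --jobs value: online CPUs, capped at 4, never below 1."""
    return max(1, min(online_cpus, MAX_DEFAULT_JOBS))


def shard_count(jobs):
    """Shards created for a fresh inbox, for jobs >= 1.

    One job is the top-level producer, the rest are shard workers.
    Assumption: -j1 still creates a single shard rather than zero.
    """
    return max(1, jobs - 1)


if __name__ == "__main__":
    cpus = os.cpu_count() or 1  # cpu_count() may return None
    jobs = default_jobs(cpus)
    print(f"{cpus} CPUs: --jobs={jobs}, {shard_count(jobs)} shard(s)")
```

On an HDD-backed host, the patch suggests either forcing a single shard
with -j1 or going the other way with many shards plus --no-fsync, so the
default above is only the starting point for fast storage.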