user/dev discussion of public-inbox itself
 help / color / mirror / Atom feed
* [PATCH] doc: add some more tuning notes
@ 2020-08-25 10:51 Eric Wong
  0 siblings, 0 replies; only message in thread
From: Eric Wong @ 2020-08-25 10:51 UTC (permalink / raw)
  To: meta

I've learned a thing or three about btrfs in the past few
weeks and remembered some old HDD things, too.

The Xapian MultiDatabase problem will need to be addressed
for 1.7...
 Documentation/public-inbox-index.pod  | 12 ++++++++++--
 Documentation/public-inbox-init.pod   | 15 +++++++++++----
 Documentation/public-inbox-tuning.pod | 21 ++++++++++++++++++---
 Documentation/public-inbox-xcpdb.pod  |  1 +
 4 files changed, 40 insertions(+), 9 deletions(-)

diff --git a/Documentation/public-inbox-index.pod b/Documentation/public-inbox-index.pod
index 46a53825..207b2ed8 100644
--- a/Documentation/public-inbox-index.pod
+++ b/Documentation/public-inbox-index.pod
@@ -39,8 +39,12 @@ normal search functionality.
 Influences the number of Xapian indexing shards in a
 (L<public-inbox-v2-format(5)>) inbox.
+See L<public-inbox-init(1)/--jobs> for a full description
+of sharding.
 C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
-to disable parallel indexing.
+to disable parallel indexing regardless of the number of
+pre-existing shards.
 If the inbox has not been indexed or initialized, C<JOBS - 1>
 shards will be created (one job is always needed for indexing
@@ -133,7 +137,11 @@ Available in public-inbox 1.6.0 (PENDING).
 =item --no-fsync
 Disables L<fsync(2)> and L<fdatasync(2)> operations on SQLite
-and Xapian.  This is only effective with Xapian 1.4+.
+and Xapian.  This is only effective with Xapian 1.4+.  This is
+primarily intended for systems with low RAM and the small
+(default) C<--batch-size=1m>.  Users of large C<--batch-size>
+may even find disabling L<fdatasync(2)> causes too much dirty
+data to accumulate, resulting on latency spikes from writeback.
 Available in public-inbox 1.6.0 (PENDING).
diff --git a/Documentation/public-inbox-init.pod b/Documentation/public-inbox-init.pod
index b25dd1e4..24645045 100644
--- a/Documentation/public-inbox-init.pod
+++ b/Documentation/public-inbox-init.pod
@@ -86,14 +86,21 @@ Default: unset, no epochs are skipped
 Control the number of Xapian index shards in a
 C<-V2> (L<public-inbox-v2-format(5)>) inbox.
-It is useful to use a single shard (C<-j1>) for inboxes on
+It can be useful to use a single shard (C<-j1>) for inboxes on
 high-latency storage (e.g. rotational HDD) unless the system has
 enough RAM to cache 5-10x the size of the git repository.
-It is generally not useful to specify higher values than the
-default due to contention in the top-level producer process.
+Another approach for HDDs is to use the
+L<public-inbox-index(1)/publicInbox.indexSequentialShard> option
+and many shards, so each shard may fit into the kernel page
+cache.  Unfortunately, excessive shards slows down read-only
+query performance.
-Default: the number of online CPUs, up to 4
+For fast storage, it is generally not useful to specify higher
+values than the default due to the top-level producer process
+being a bottleneck.
+Default: the number of online CPUs, up to 4 (3 shard workers, 1 producer)
 =item --skip-docdata
diff --git a/Documentation/public-inbox-tuning.pod b/Documentation/public-inbox-tuning.pod
index abc53d1e..e3f2899b 100644
--- a/Documentation/public-inbox-tuning.pod
+++ b/Documentation/public-inbox-tuning.pod
@@ -69,7 +69,8 @@ footprint when indexing on HDDs.
 Initializing a mirror with a high C<--jobs> count to create more
 shards (in C<-V2> inboxes) will keep each shard smaller and
-reduce its kernel page cache footprint.
+reduce its kernel page cache footprint.  Keep in mind excessive
+sharding imposes a performance penalty for read-only queries.
 Users with large amounts of RAM are advised to set a large value
 for C<publicinbox.indexBatchSize> as documented in
@@ -88,12 +89,21 @@ used by public-inbox are no exception to that.
 public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
 indices on btrfs to achieve acceptable performance (even on SSD).
-Disabling copy-on-write also disables checksumming, thus raid1
-(or higher) configurations may corrupt on unsafe shutdowns.
+Disabling copy-on-write also disables checksumming, thus C<raid1>
+(or higher) configurations may be corrupt after unsafe shutdowns.
 Fortunately, these SQLite and Xapian indices are designed to
 recoverable from git if missing.
+Disabling CoW does not prevent all fragmentation.
+Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
+Snapshots use CoW despite our efforts to disable it, resulting
+in fragmentation.
+L<filefrag(8)> can be used to monitor fragmentation, and
+C<btrfs filesystem defragment -fr $INBOX_DIR> may be necessary.
 Large filesystems benefit significantly from the C<space_cache=v2>
 mount option documented in L<btrfs(5)>.
@@ -106,6 +116,11 @@ While SSD read performance is generally good, SSD write performance
 degrades as the drive ages and/or gets full.  Issuing C<TRIM> commands
 via L<fstrim(8)> or similar is required to sustain write performance.
+Users of the Flash-Friendly File System
+L<F2FS|> may benefit from
+optimizations found in SQLite 3.21.0+.  Benchmarks are greatly
 =head2 Read-only daemons
 L<public-inbox-httpd(1)>, L<public-inbox-imapd(1)>, and
diff --git a/Documentation/public-inbox-xcpdb.pod b/Documentation/public-inbox-xcpdb.pod
index 52939894..1397a7f4 100644
--- a/Documentation/public-inbox-xcpdb.pod
+++ b/Documentation/public-inbox-xcpdb.pod
@@ -60,6 +60,7 @@ used with C<--compact>.
 =item --no-fsync
 Disable L<fsync(2)> and L<fdatasync(2)>.
+See L<public-inbox-index(1)/--no-fsync> for caveats.
 Available in public-inbox 1.6.0 (PENDING).

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2020-08-25 10:51 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-08-25 10:51 [PATCH] doc: add some more tuning notes Eric Wong

user/dev discussion of public-inbox itself

This inbox may be cloned and mirrored by anyone:

	git clone --mirror
	git clone --mirror http://czquwvybam4bgbro.onion/meta
	git clone --mirror http://hjrcffqmbrq6wope.onion/meta
	git clone --mirror http://ou63pmih66umazou.onion/meta

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V1 meta meta/ \
	public-inbox-index meta

Example config snippet for mirrors.
Newsgroups are available over NNTP:
 note: .onion URLs require Tor:

code repositories for the project(s) associated with this inbox:

AGPL code for this site: git clone