PUBLIC-INBOX-TUNING(7)     public-inbox user manual     PUBLIC-INBOX-TUNING(7)

NAME
       public-inbox-tuning - tuning public-inbox

DESCRIPTION
       public-inbox intends to support a wide variety of hardware.  While we
       strive to provide the best out-of-the-box performance possible, tuning
       knobs are an unfortunate necessity in some cases.

       1.  New inboxes: public-inbox-init -V2

       2.  Optional Inline::C use

       3.  Performance on rotational hard disk drives

       4.  Btrfs (and possibly other copy-on-write filesystems)

       5.  Performance on solid state drives

       6.  Read-only daemons

       7.  Other OS tuning knobs

       8.  Scalability to many inboxes

   New inboxes: public-inbox-init -V2
       If you're starting a new inbox (and not mirroring an existing one), the
       -V2 requires DBD::SQLite, but is orders of magnitude more scalable than
       the original "-V1" format.

   Optional Inline::C use
       Our optional use of Inline::C speeds up subprocess spawning from large
       daemon processes.

       To enable Inline::C, either set the "PERL_INLINE_DIRECTORY" environment
       variable to point to a writable directory, or create
       "~/.cache/public-inbox/inline-c" for any user(s) running public-inbox
       processes.

       If libgit2 development files are installed and Inline::C is enabled
       (described above), per-inbox "git cat-file --batch" processes are
       replaced with a single perl(1) process running
       "PublicInbox::Gcf2::loop" in read-only daemons.  libgit2 use will be
       available in public-inbox 1.7.0+

       More (optional) Inline::C use will be introduced in the future to lower
       memory use and improve scalability.

       Note: Inline::C is required for lei(1), but not public-inbox-*

   Performance on rotational hard disk drives
       Random I/O performance is poor on rotational HDDs.  Xapian indexing
       performance degrades significantly as DBs grow larger than available
       RAM.  Attempts to parallelize random I/O on HDDs leads to pathological
       slowdowns as inboxes grow.

       While "-V2" introduced Xapian shards as a parallelization mechanism for
       SSDs; enabling "publicInbox.indexSequentialShard" repurposes sharding
       as mechanism to reduce the kernel page cache footprint when indexing on
       HDDs.

       Initializing a mirror with a high "--jobs" count to create more shards
       (in "-V2" inboxes) will keep each shard smaller and reduce its kernel
       page cache footprint.  Keep in mind excessive sharding imposes a
       performance penalty for read-only queries.

       Users with large amounts of RAM are advised to set a large value for
       "publicinbox.indexBatchSize" as documented in public-inbox-index(1).

       "dm-crypt" users on Linux 4.0+ are advised to try the
       "--perf-same_cpu_crypt" "--perf-submit_from_crypt_cpus" switches of
       cryptsetup(8) to reduce I/O contention from kernel workqueue threads.

   Btrfs (and possibly other copy-on-write filesystems)
       btrfs(5) performance degrades from fragmentation when using large
       databases and random writes.  The Xapian + SQLite indices used by
       public-inbox are no exception to that.

       public-inbox 1.6.0+ disables copy-on-write (CoW) on Xapian and SQLite
       indices on btrfs to achieve acceptable performance (even on SSD).
       Disabling copy-on-write also disables checksumming, thus "raid1" (or
       higher) configurations may be corrupt after unsafe shutdowns.

       Fortunately, these SQLite and Xapian indices are designed to
       recoverable from git if missing.

       Disabling CoW does not prevent all fragmentation.  Large values of
       "publicInbox.indexBatchSize" also limit fragmentation during the
       initial index.

       Avoid snapshotting subvolumes containing Xapian and/or SQLite indices.
       Snapshots use CoW despite our efforts to disable it, resulting in
       fragmentation.

       filefrag(8) can be used to monitor fragmentation, and "btrfs filesystem
       defragment -fr $INBOX_DIR" may be necessary.

       Large filesystems benefit significantly from the "space_cache=v2" mount
       option documented in btrfs(5).

       Older, non-CoW filesystems are generally work well out-of-the-box for
       our Xapian and SQLite indices.

   Performance on solid state drives
       While SSD read performance is generally good, SSD write performance
       degrades as the drive ages and/or gets full.  Issuing "TRIM" commands
       via fstrim(8) or similar is required to sustain write performance.

       Users of the Flash-Friendly File System F2FS
       <https://en.wikipedia.org/wiki/F2FS> may benefit from optimizations
       found in SQLite 3.21.0+.  Benchmarks are greatly appreciated.

   Read-only daemons
       public-inbox-httpd(1), public-inbox-imapd(1), and public-inbox-nntpd(1)
       are all designed for C10K (or higher) levels of concurrency from a
       single process.  SMP systems may use "--worker-processes=NUM" as
       documented in public-inbox-daemon(8) for parallelism.

       The open file descriptor limit ("RLIMIT_NOFILE", "ulimit -n" in sh(1),
       "LimitNOFILE=" in systemd.exec(5)) may need to be raised to accommodate
       many concurrent clients.

       Transport Layer Security (IMAPS, NNTPS, or via STARTTLS) significantly
       increases memory use of client sockets, sure to account for that in
       capacity planning.

   Other OS tuning knobs
       Linux users: the "sys.vm.max_map_count" sysctl may need to be increased
       if handling thousands of inboxes (with public-inbox-extindex(1)) to
       avoid out-of-memory errors from git.

       Other OSes may have similar tuning knobs (patches appreciated).

   Scalability to many inboxes
       public-inbox-extindex(1) allows any number of public-inboxes to share
       the same Xapian indices.

       git 2.33+ startup time is orders-of-magnitude faster and uses less
       memory when dealing with thousands of alternates required for thousands
       of inboxes with public-inbox-extindex(1).

       Frequent packing (via git-gc(1)) both improves performance and reduces
       the need to increase "sys.vm.max_map_count".

CONTACT
       Feedback encouraged via plain-text mail to
       <mailto:meta@public-inbox.org>

       Information for *BSDs and non-traditional filesystems especially
       welcome.

       Our archives are hosted at <https://public-inbox.org/meta/>,
       <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>,
       and other places

COPYRIGHT
       Copyright all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>

public-inbox.git                  1993-10-02            PUBLIC-INBOX-TUNING(7)