=head1 NAME

public-inbox-extindex - create and update external search indices

=head1 SYNOPSIS

public-inbox-extindex [OPTIONS] EXTINDEX_DIR INBOX_DIR...

public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all

=head1 DESCRIPTION

public-inbox-extindex creates and updates an external search and
overview database used by the read-only public-inbox PSGI (HTTP),
NNTP, and IMAP interfaces.  This requires either the L SWIG bindings
or L XS bindings along with L and L Perl modules.

=head1 OPTIONS

=over

=item -j JOBS

=item --jobs=JOBS

=item --no-fsync

=item --dangerous

=item --rethread

=item --max-size SIZE

=item --batch-size SIZE

These switches behave as they do for L

=item --all

Index all C entries in C.

C entries indexed by C can have full Xapian searching abilities
with the per-C C set to C and their respective Xapian (C or C)
directories removed.  For multiple public-inboxes where cross-posting
is common, this allows significant space savings on Xapian indices.

=item --dedupe=MSGID

=item --dedupe

Rerun deduplication on messages with the given Message-ID, or on all
messages if no Message-ID is specified.  Deduplication rules may
change and evolve over time, especially if filters are involved.

C<--dedupe=MSGID> may be specified multiple times to deduplicate
multiple Message-IDs.

Use this if you see C warnings from WWW logs.

=item --gc

Perform garbage collection instead of indexing.  Use this if inboxes
are removed from the extindex, a newsgroup name is set or changed,
or if messages are purged or removed from some inboxes.

=item --reindex

Force a re-index of all messages in the extindex.  This can be used
for in-place upgrades and bugfixes while read-only server processes
are utilizing the index.  Keep in mind this roughly doubles the size
of the already-large Xapian database.

=item --fast

When used with C<--reindex>, only look for new and stale entries and
do not touch already-indexed messages.


=back

=head1 FILES

L

=head1 CONFIGURATION

public-inbox-extindex does not write to the L file; any extindex
configuration must be entered manually.  The extindex name of C<all>
is a special case which corresponds to indexing C<--all> inboxes.
An example for C<--all> is as follows:

    [extindex "all"]
        topdir = /path/to/extindex_dir
        url = all
        coderepo = foo
        coderepo = bar

Putting an C entry in the config allows L.

You can have any number of C sections where C<$NAME> is something
other than C<all> to display a union of several inboxes.

It is strongly recommended that any public inboxes indexed by this
command have a stable C entry (regardless of the presence of an NNTP
or IMAP server).  Otherwise, public-inbox-extindex will use C as an
internal key, which can cause needless reindexing and require
C<--gc> if inboxes are relocated.

See L for more details.

=head1 ENVIRONMENT

=over 8

=item PI_CONFIG

Used to override the default "~/.public-inbox/config" value.

=item XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to
disk.  This environment variable is handled directly by Xapian;
refer to the Xapian API documentation for more details.

Setting C or C for a large C<--reindex> may cause L, L and L tasks
to wait for long and unpredictable periods of time during
C<--reindex>.

Default: none, uses C

=back

=head1 UPGRADING

Occasionally, public-inbox will update its schema version and
require a full index by running this command.

=head1 LOCKING

It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while other
processes are writing to covered inboxes or the extindex.  The
extindex locks will be released roughly every 10s to allow L and L
processes to write to the extindex.

=head1 CONTACT

Feedback welcome via plain-text mail to L

The mail archives are hosted at L and L

=head1 COPYRIGHT

Copyright all contributors L

License: AGPL-3.0+ L

=head1 SEE ALSO

L, L
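
=head1 EXAMPLES

The following is a minimal sketch of how a union of two inboxes
might be set up.  It is assembled only from the SYNOPSIS and the
C<[extindex "all"]> example in CONFIGURATION and is not a definitive
recipe.  The name C<union> and every path shown are hypothetical
placeholders.

Assuming the first SYNOPSIS form, the extindex could be created or
updated from the chosen inboxes like this:

    # hypothetical paths: one extindex directory, then the inboxes
    public-inbox-extindex /path/to/union_dir \
        /path/to/inbox_a /path/to/inbox_b

A matching section would then be entered by hand in the config
file, reusing the keys from the C<--all> example:

    ; hypothetical name and paths, for illustration only
    [extindex "union"]
        topdir = /path/to/union_dir
        url = union

Later maintenance of an C<--all> extindex, assuming the second
SYNOPSIS form and the switches documented in OPTIONS, might look
like this:

    # pick up new and stale entries without touching indexed messages
    public-inbox-extindex --reindex --fast --all

    # clean up after inboxes or messages have been removed
    public-inbox-extindex --gc --all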