diff options
Diffstat (limited to 'Documentation/public-inbox-extindex.pod')
-rw-r--r-- | Documentation/public-inbox-extindex.pod | 180 |
1 files changed, 180 insertions, 0 deletions
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod new file mode 100644 index 00000000..2db7d7e9 --- /dev/null +++ b/Documentation/public-inbox-extindex.pod @@ -0,0 +1,180 @@ +=head1 NAME + +public-inbox-extindex - create and update external search indices + +=head1 SYNOPSIS + +public-inbox-extindex [OPTIONS] EXTINDEX_DIR INBOX_DIR... + +public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all + +=head1 DESCRIPTION + +public-inbox-extindex creates and updates an external search and +overview database used by the read-only public-inbox PSGI (HTTP), +NNTP, and IMAP interfaces. This requires either the +L<Xapian> SWIG bindings OR or L<Search::Xapian> XS bindings +along with L<DBD::SQLite> and L<DBI> Perl modules. + +=head1 OPTIONS + +=over + +=item -j JOBS + +=item --jobs=JOBS + +=item --no-fsync + +=item --dangerous + +=item --rethread + +=item --max-size SIZE + +=item --batch-size SIZE + +These switches behave as they do for L<public-inbox-index(1)> + +=item --all + +Index all C<publicinbox> entries in C<PI_CONFIG>. + +C<publicinbox> entries indexed by C<public-inbox-extindex> can +have full Xapian searching abilities with the per-C<publicinbox> +C<indexlevel> set to C<basic> and their respective Xapian +(C<xap15> or C<xapian15>) directories removed. For multiple +public-inboxes where cross-posting is common, this allows +significant space savings on Xapian indices. + +=item --dedupe=MSGID + +=item --dedupe + +Rerun deduplication on messages with the given Message-ID or +all messages if no Message-ID is specified. Deduplication rules may +change and evolve over time, especially if filters are involved. + +C<--dedupe=MSGID> may be specified multiple times to deduplicate +multiple Message-IDs. + +Use this if you see C<W: BUG? $MSGID not deduplicated properly> +warnings from WWW logs. + +=item --gc + +Perform garbage collection instead of indexing. Use this if +inboxes are removed from the extindex, a newsgroup name is +set or changed, or if messages are purged or removed from +some inboxes. + +=item --reindex + +Forces a re-index of all messages in the extindex. This can be +used for in-place upgrades and bugfixes while read-only server +processes are utilizing the index. Keep in mind this roughly +doubles the size of the already-large Xapian database. + +=item --fast + +Used with C<--reindex>, it will only look for new and stale +entries and not touch already-indexed messages. + +=item --no-multi-pack-index + +Disable writing a L<git-multi-pack-index(1)> file to save memory. +Normally, enabling multi-pack-index speeds up startup time of +subsequent L<git-cat-file(1)> processes by 3-4%, but generating +this file requires several GB of memory with large repos. + +Unlike the C<core.multiPackIndex> directive in git, it's still +possible to read existing multi-pack-index files if they are +created elsewhere. + +Available in public-inbox 2.0.0+ + +=back + +=head1 FILES + +L<public-inbox-extindex-format(5)> + +=head1 CONFIGURATION + +public-inbox-extindex does not write to the L<public-inbox-config(5)> +file, it must be entered manually. +The extindex name of C<all> is a special case which +corresponds to indexing C<--all> inboxes. An example for +C<--all> is as follows: + + [extindex "all"] + topdir = /path/to/extindex_dir + url = all + coderepo = foo + coderepo = bar + +Putting an C<extindex> entry in the config allows L<PublicInbox::WWW>. +You can have any number of C<extentry.$NAME> sections where C<$NAME> +is something other than C<all> to display a union of several inboxes. + +It is strongly recommended any public inboxes indexed by this command +have a stable C<publicinbox.$NAME.newsgroup> entry (regardless of +the presence of an NNTP or IMAP server). Otherwise, public-inbox-extindex +will use C<publicinbox.$NAME.inboxdir> as an internal key which can +cause needless reindexing and require L<--gc> if inboxes are relocated. + +See L<public-inbox-config(5)> for more details. + +=head1 ENVIRONMENT + +=over 8 + +=item PI_CONFIG + +Used to override the default "~/.public-inbox/config" value. + +=item XAPIAN_FLUSH_THRESHOLD + +The number of documents to update before committing changes to +disk. This environment is handled directly by Xapian, refer to +Xapian API documentation for more details. + +Setting C<XAPIAN_FLUSH_THRESHOLD> or +C<publicinbox.indexBatchSize> for a large C<--reindex> may cause +L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and +L<public-inbox-watch(1)> tasks to wait long and unpredictable +periods of time during C<--reindex>. + +Default: none, uses C<publicinbox.indexBatchSize> + +=back + +=head1 UPGRADING + +Occasionally, public-inbox will update its schema version and +require a full index by running this command. + +=head1 LOCKING + +It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while +other processes are writing to covered inboxes or extindex. +The extindex locks will be released roughly every 10s to +allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)> +processes to write to the extindex. + +=head1 CONTACT + +Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org> + +The mail archives are hosted at L<https://public-inbox.org/meta/> and +L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/> + +=head1 COPYRIGHT + +Copyright all contributors L<mailto:meta@public-inbox.org> + +License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt> + +=head1 SEE ALSO + +L<Search::Xapian>, L<DBD::SQLite> |