Diffstat (limited to 'Documentation/public-inbox-extindex.pod')
-rw-r--r-- Documentation/public-inbox-extindex.pod 180
1 files changed, 180 insertions, 0 deletions
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
new file mode 100644
index 00000000..2db7d7e9
--- /dev/null
+++ b/Documentation/public-inbox-extindex.pod
@@ -0,0 +1,180 @@
+=head1 NAME
+
+public-inbox-extindex - create and update external search indices
+
+=head1 SYNOPSIS
+
+public-inbox-extindex [OPTIONS] EXTINDEX_DIR INBOX_DIR...
+
+public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all
+
+=head1 DESCRIPTION
+
+public-inbox-extindex creates and updates an external search and
+overview database used by the read-only public-inbox PSGI (HTTP),
+NNTP, and IMAP interfaces.  This requires either the
+L<Xapian> SWIG bindings OR L<Search::Xapian> XS bindings
+along with L<DBD::SQLite> and L<DBI> Perl modules.
+
+=head1 OPTIONS
+
+=over
+
+=item -j JOBS
+
+=item --jobs=JOBS
+
+=item --no-fsync
+
+=item --dangerous
+
+=item --rethread
+
+=item --max-size SIZE
+
+=item --batch-size SIZE
+
+These switches behave as they do for L<public-inbox-index(1)>.
+
+=item --all
+
+Index all C<publicinbox> entries in C<PI_CONFIG>.
+
+C<publicinbox> entries indexed by C<public-inbox-extindex> can
+have full Xapian searching abilities with the per-C<publicinbox>
+C<indexlevel> set to C<basic> and their respective Xapian
+(C<xap15> or C<xapian15>) directories removed.  For multiple
+public-inboxes where cross-posting is common, this allows
+significant space savings on Xapian indices.
+
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages with the given Message-ID or
+all messages if no Message-ID is specified.  Deduplication rules may
+change and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C<W: BUG? $MSGID not deduplicated properly>
+warnings from WWW logs.
+
+=item --gc
+
+Perform garbage collection instead of indexing.  Use this if
+inboxes are removed from the extindex, a newsgroup name is
+set or changed, or if messages are purged or removed from
+some inboxes.
+
+=item --reindex
+
+Forces a re-index of all messages in the extindex.  This can be
+used for in-place upgrades and bugfixes while read-only server
+processes are utilizing the index.  Keep in mind this roughly
+doubles the size of the already-large Xapian database.
+
+=item --fast
+
+Used with C<--reindex>, it will only look for new and stale
+entries and not touch already-indexed messages.
+
+=item --no-multi-pack-index
+
+Disable writing a L<git-multi-pack-index(1)> file to save memory.
+Normally, enabling multi-pack-index speeds up startup time of
+subsequent L<git-cat-file(1)> processes by 3-4%, but generating
+this file requires several GB of memory with large repos.
+
+Unlike the C<core.multiPackIndex> directive in git, it's still
+possible to read existing multi-pack-index files if they are
+created elsewhere.
+
+Available in public-inbox 2.0.0+
+
+=back
+
+=head1 FILES
+
+L<public-inbox-extindex-format(5)>
+
+=head1 CONFIGURATION
+
+public-inbox-extindex does not write to the L<public-inbox-config(5)>
+file; the C<extindex> section must be entered manually.
+The extindex name of C<all> is a special case which
+corresponds to indexing C<--all> inboxes.  An example for
+C<--all> is as follows:
+
+        [extindex "all"]
+                topdir = /path/to/extindex_dir
+                url = all
+                coderepo = foo
+                coderepo = bar
+
+Putting an C<extindex> entry in the config allows L<PublicInbox::WWW> to
+use it.  You can have any number of C<extindex.$NAME> sections where
+C<$NAME> is something other than C<all> to display a union of several inboxes.
+
+It is strongly recommended any public inboxes indexed by this command
+have a stable C<publicinbox.$NAME.newsgroup> entry (regardless of
+the presence of an NNTP or IMAP server).  Otherwise, public-inbox-extindex
+will use C<publicinbox.$NAME.inboxdir> as an internal key which can
+cause needless reindexing and require C<--gc> if inboxes are relocated.
+
+See L<public-inbox-config(5)> for more details.
+
+=head1 ENVIRONMENT
+
+=over 8
+
+=item PI_CONFIG
+
+Used to override the default "~/.public-inbox/config" value.
+
+=item XAPIAN_FLUSH_THRESHOLD
+
+The number of documents to update before committing changes to
+disk.  This environment variable is handled directly by Xapian; refer to
+the Xapian API documentation for more details.
+
+Setting C<XAPIAN_FLUSH_THRESHOLD> or
+C<publicinbox.indexBatchSize> for a large C<--reindex> may cause
+L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
+L<public-inbox-watch(1)> tasks to wait for long and unpredictable
+periods of time during C<--reindex>.
+
+Default: none, uses C<publicinbox.indexBatchSize>
+
+=back
+
+=head1 UPGRADING
+
+Occasionally, public-inbox will update its schema version and
+require a full reindex by running this command.
+
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or the extindex.
+The extindex locks will be released roughly every 10s to
+allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
+processes to write to the extindex.
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/> and
+L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
+
+=head1 COPYRIGHT
+
+Copyright all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<Search::Xapian>, L<DBD::SQLite>