Diffstat (limited to 'Documentation/public-inbox-extindex.pod')
-rw-r--r-- Documentation/public-inbox-extindex.pod 99
1 files changed, 91 insertions, 8 deletions
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index 2e2e6383..2db7d7e9 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -10,13 +10,10 @@ public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all
 
 =head1 DESCRIPTION
 
-FIXME: behavior not finalized  It should probably write to the
-config file the first time --all is used.
-
 public-inbox-extindex creates and updates an external search and
 overview database used by the read-only public-inbox PSGI (HTTP),
 NNTP, and IMAP interfaces.  This requires either the
-L<Search::Xapian> XS bindings OR the L<Xapian> SWIG bindings,
+L<Xapian> SWIG bindings OR L<Search::Xapian> XS bindings
 along with L<DBD::SQLite> and L<DBI> Perl modules.
 
 =head1 OPTIONS
@@ -27,7 +24,17 @@ along with L<DBD::SQLite> and L<DBI> Perl modules.
 
 =item --jobs=JOBS
 
-... TODO, see L<public-inbox-index(5)>
+=item --no-fsync
+
+=item --dangerous
+
+=item --rethread
+
+=item --max-size SIZE
+
+=item --batch-size SIZE
+
+These switches behave as they do for L<public-inbox-index(1)>.
 
 =item --all
 
@@ -40,6 +47,52 @@ C<indexlevel> set to C<basic> and their respective Xapian
 public-inboxes where cross-posting is common, this allows
 significant space savings on Xapian indices.
 
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages with the given Message-ID or
+all messages if no Message-ID is specified.  Deduplication rules may
+change and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C<W: BUG? $MSGID not deduplicated properly>
+warnings from WWW logs.
+
+=item --gc
+
+Perform garbage collection instead of indexing.  Use this if
+inboxes are removed from the extindex, a newsgroup name is
+set or changed, or if messages are purged or removed from
+some inboxes.
+
+=item --reindex
+
+Forces a re-index of all messages in the extindex.  This can be
+used for in-place upgrades and bugfixes while read-only server
+processes are utilizing the index.  Keep in mind this roughly
+doubles the size of the already-large Xapian database.
+
+=item --fast
+
+Used with C<--reindex>, it will only look for new and stale
+entries and not touch already-indexed messages.
+
+=item --no-multi-pack-index
+
+Disable writing a L<git-multi-pack-index(1)> file to save memory.
+Normally, enabling multi-pack-index speeds up startup time of
+subsequent L<git-cat-file(1)> processes by 3-4%, but generating
+this file requires several GB of memory with large repos.
+
+Unlike the C<core.multiPackIndex> directive in git, it's still
+possible to read existing multi-pack-index files if they are
+created elsewhere.
+
+Available in public-inbox 2.0.0+
+
 =back
 
 =head1 FILES
@@ -48,7 +101,29 @@ L<public-inbox-extindex-format(5)>
 
 =head1 CONFIGURATION
 
-... TODO, see L<public-inbox-index(5)>
+public-inbox-extindex does not write to the L<public-inbox-config(5)>
+file; the configuration must be entered manually.
+The extindex name of C<all> is a special case which
+corresponds to indexing C<--all> inboxes.  An example for
+C<--all> is as follows:
+
+        [extindex "all"]
+                topdir = /path/to/extindex_dir
+                url = all
+                coderepo = foo
+                coderepo = bar
+
+Putting an C<extindex> entry in the config allows L<PublicInbox::WWW>.
+You can have any number of C<extindex.$NAME> sections where C<$NAME>
+is something other than C<all> to display a union of several inboxes.
+
+It is strongly recommended any public inboxes indexed by this command
+have a stable C<publicinbox.$NAME.newsgroup> entry (regardless of
+the presence of an NNTP or IMAP server).  Otherwise, public-inbox-extindex
+will use C<publicinbox.$NAME.inboxdir> as an internal key which can
+cause needless reindexing and require C<--gc> if inboxes are relocated.
+
+See L<public-inbox-config(5)> for more details.
 
 =head1 ENVIRONMENT
 
@@ -76,9 +151,17 @@ Default: none, uses C<publicinbox.indexBatchSize>
 
 =head1 UPGRADING
 
-Occasionally, public-inbox will update it's schema version and
+Occasionally, public-inbox will update its schema version and
 require a full index by running this command.
 
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or extindex.
+The extindex locks will be released roughly every 10s to
+allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
+processes to write to the extindex.
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
@@ -88,7 +171,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2021 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
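
Reviewer note: the maintenance switches documented in this patch (C<--gc>, C<--reindex>, C<--fast>, C<--dedupe=MSGID>) can be combined into an invocation following the SYNOPSIS form `public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all`. The following is a minimal sketch of such a wrapper, not part of public-inbox. The function name `extindex_argv` and its keyword arguments are hypothetical, and the only rule it enforces (C<--fast> requires C<--reindex>) is read off the wording of the C<--fast> entry above.

```python
import shlex


def extindex_argv(extindex_dir, *, gc=False, reindex=False, fast=False, dedupe=()):
    """Build an argv list in the documented SYNOPSIS order:
    public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all

    Hypothetical helper for illustration; only flags named in the
    patched manual page are emitted.
    """
    # The manual describes --fast as "used with --reindex", so refuse it alone.
    if fast and not reindex:
        raise ValueError("--fast is documented only in combination with --reindex")

    argv = ["public-inbox-extindex"]
    if gc:
        argv.append("--gc")
    if reindex:
        argv.append("--reindex")
    if fast:
        argv.append("--fast")
    # --dedupe=MSGID may be given multiple times, once per Message-ID.
    for msgid in dedupe:
        argv.append(f"--dedupe={msgid}")
    argv += [extindex_dir, "--all"]
    return argv


if __name__ == "__main__":
    # Print a shell-quoted command line without running anything.
    print(shlex.join(extindex_argv("/path/to/extindex_dir", reindex=True, fast=True)))
```

The helper only assembles arguments and never runs the command, so it can be used to preview a cron entry or a subprocess call before committing to a long reindex.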