Diffstat (limited to 'Documentation/public-inbox-extindex.pod')
-rw-r--r-- | Documentation/public-inbox-extindex.pod | 99 |
1 files changed, 91 insertions, 8 deletions
diff --git a/Documentation/public-inbox-extindex.pod b/Documentation/public-inbox-extindex.pod
index 2e2e6383..2db7d7e9 100644
--- a/Documentation/public-inbox-extindex.pod
+++ b/Documentation/public-inbox-extindex.pod
@@ -10,13 +10,10 @@ public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all
 
 =head1 DESCRIPTION
 
-FIXME: behavior not finalized It should probably write to the
-config file the first time --all is used.
-
 public-inbox-extindex creates and updates an external search and
 overview database used by the read-only public-inbox PSGI (HTTP),
 NNTP, and IMAP interfaces. This requires either the
-L<Search::Xapian> XS bindings OR the L<Xapian> SWIG bindings,
+L<Xapian> SWIG bindings OR L<Search::Xapian> XS bindings
 along with L<DBD::SQLite> and L<DBI> Perl modules.
 
 =head1 OPTIONS
@@ -27,7 +24,17 @@ along with L<DBD::SQLite> and L<DBI> Perl modules.
 
 =item --jobs=JOBS
 
-... TODO, see L<public-inbox-index(5)>
+=item --no-fsync
+
+=item --dangerous
+
+=item --rethread
+
+=item --max-size SIZE
+
+=item --batch-size SIZE
+
+These switches behave as they do for L<public-inbox-index(1)>.
 
 =item --all
 
@@ -40,6 +47,52 @@ C<indexlevel> set to C<basic> and their respective Xapian
 public-inboxes where cross-posting is common, this allows
 significant space savings on Xapian indices.
 
+=item --dedupe=MSGID
+
+=item --dedupe
+
+Rerun deduplication on messages with the given Message-ID or
+all messages if no Message-ID is specified. Deduplication rules may
+change and evolve over time, especially if filters are involved.
+
+C<--dedupe=MSGID> may be specified multiple times to deduplicate
+multiple Message-IDs.
+
+Use this if you see C<W: BUG? $MSGID not deduplicated properly>
+warnings in WWW logs.
+
+=item --gc
+
+Perform garbage collection instead of indexing. Use this if
+inboxes are removed from the extindex, a newsgroup name is
+set or changed, or if messages are purged or removed from
+some inboxes.
+
+=item --reindex
+
+Forces a re-index of all messages in the extindex. This can be
+used for in-place upgrades and bugfixes while read-only server
+processes are utilizing the index. Keep in mind this roughly
+doubles the size of the already-large Xapian database.
+
+=item --fast
+
+Used with C<--reindex>, it will only look for new and stale
+entries and not touch already-indexed messages.
+
+=item --no-multi-pack-index
+
+Disable writing a L<git-multi-pack-index(1)> file to save memory.
+Normally, enabling multi-pack-index speeds up startup time of
+subsequent L<git-cat-file(1)> processes by 3-4%, but generating
+this file requires several GB of memory with large repos.
+
+Unlike the C<core.multiPackIndex> directive in git, it's still
+possible to read existing multi-pack-index files if they are
+created elsewhere.
+
+Available in public-inbox 2.0.0+
+
 =back
 
 =head1 FILES
@@ -48,7 +101,29 @@ L<public-inbox-extindex-format(5)>
 
 =head1 CONFIGURATION
 
-... TODO, see L<public-inbox-index(5)>
+public-inbox-extindex does not write to the L<public-inbox-config(5)>
+file; configuration must be entered manually.
+The extindex name of C<all> is a special case which
+corresponds to indexing C<--all> inboxes. An example for
+C<--all> is as follows:
+
+	[extindex "all"]
+		topdir = /path/to/extindex_dir
+		url = all
+		coderepo = foo
+		coderepo = bar
+
+Putting an C<extindex> entry in the config allows L<PublicInbox::WWW> to serve it.
+You can have any number of C<extindex.$NAME> sections where C<$NAME>
+is something other than C<all> to display a union of several inboxes.
+
+It is strongly recommended that any public inboxes indexed by this command
+have a stable C<publicinbox.$NAME.newsgroup> entry (regardless of
+the presence of an NNTP or IMAP server). Otherwise, public-inbox-extindex
+will use C<publicinbox.$NAME.inboxdir> as an internal key which can
+cause needless reindexing and require C<--gc> if inboxes are relocated.
+
+See L<public-inbox-config(5)> for more details.
 
 =head1 ENVIRONMENT
 
@@ -76,9 +151,17 @@ Default: none, uses C<publicinbox.indexBatchSize>
 
 =head1 UPGRADING
 
-Occasionally, public-inbox will update it's schema version and
+Occasionally, public-inbox will update its schema version and
 require a full index by running this command.
 
+=head1 LOCKING
+
+It is safe to use C<--dedupe>, C<--gc> and C<--reindex> while
+other processes are writing to covered inboxes or the extindex.
+The extindex locks will be released roughly every 10s to
+allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
+processes to write to the extindex.
+
 =head1 CONTACT
 
 Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
@@ -88,7 +171,7 @@ L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
 
 =head1 COPYRIGHT
 
-Copyright 2021 all contributors L<mailto:meta@public-inbox.org>
+Copyright all contributors L<mailto:meta@public-inbox.org>
 
 License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
 
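The [extindex "all"] example added under CONFIGURATION uses git-config syntax: a section header with a quoted subsection flattens into dotted keys such as extindex.all.topdir, and a repeated key such as coderepo keeps every value rather than overwriting the earlier one. The Python sketch below illustrates that mapping with a deliberately minimal parser. flatten_git_config is a hypothetical helper that only understands the subset used in the example (no includes, quoted values, escapes, or continuation lines); it is not public-inbox's own config reader, and git config --file FILE --get-regexp '^extindex\.' remains the authoritative way to query these keys.

```python
import re
from collections import defaultdict

# Hypothetical helper, not public-inbox code: it understands only the
# git-config subset used in the [extindex "all"] example, namely
# [section "subsection"] headers and "key = value" lines.
SECTION_RE = re.compile(r'^\[\s*([A-Za-z0-9.-]+)(?:\s+"([^"]*)")?\s*\]$')
KEY_RE = re.compile(r'^([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*)$')


def flatten_git_config(text):
    """Map git-config text to {dotted.key: [value, ...]}.

    Section and variable names are lowercased because git treats them
    case-insensitively; subsection names keep their case. Every value
    of a repeated key (e.g. coderepo) is kept, in file order.
    """
    out = defaultdict(list)
    prefix = None  # e.g. "extindex.all" once a header has been seen
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in '#;':
            continue  # skip blank lines and comment lines
        m = SECTION_RE.match(line)
        if m:
            section, sub = m.groups()
            prefix = section.lower()
            if sub is not None:
                prefix += '.' + sub
            continue
        m = KEY_RE.match(line)
        if m and prefix is not None:
            key, value = m.groups()
            out[prefix + '.' + key.lower()].append(value.strip())
    return dict(out)


# The CONFIGURATION example, indented with tabs.
EXAMPLE = '''
[extindex "all"]
\ttopdir = /path/to/extindex_dir
\turl = all
\tcoderepo = foo
\tcoderepo = bar
'''

if __name__ == '__main__':
    for key, values in flatten_git_config(EXAMPLE).items():
        for value in values:
            print(f'{key}={value}')  # key=value, one line per value
```

Running it prints one key=value line per value in file order, so extindex.all.coderepo appears twice (foo, then bar), similar in shape to what git config --list reports for the same file.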