about summary refs log tree commit homepage
path: root/Documentation/public-inbox-extindex.pod
blob: 9731dfb0d2a77e03fb342b3f65da9a1bad7d80b0 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
=head1 NAME

public-inbox-extindex - create and update external search indices

=head1 SYNOPSIS

public-inbox-extindex [OPTIONS] EXTINDEX_DIR INBOX_DIR...

public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all

=head1 DESCRIPTION

public-inbox-extindex creates and updates an external search and
overview database used by the read-only public-inbox PSGI (HTTP),
NNTP, and IMAP interfaces.  This requires either the
L<Search::Xapian> XS bindings OR the L<Xapian> SWIG bindings,
along with L<DBD::SQLite> and L<DBI> Perl modules.

=head1 OPTIONS

=over

=item -j JOBS

=item --jobs=JOBS

... TODO, see L<public-inbox-index(5)>

=item --all

Index all C<publicinbox> entries in C<PI_CONFIG>.

C<publicinbox> entries indexed by C<public-inbox-extindex> can
have full Xapian searching abilities with the per-C<publicinbox>
C<indexlevel> set to C<basic> and their respective Xapian
(C<xap15> or C<xapian15>) directories removed.  For multiple
public-inboxes where cross-posting is common, this allows
significant space savings on Xapian indices.

=item --gc

Perform garbage collection instead of indexing.  Use this if
inboxes are removed from the extindex, or if messages are
purged or removed from some inboxes.

=item --reindex

Forces a re-index of all messages in the extindex.  This can be
used for in-place upgrades and bugfixes while read-only server
processes are utilizing the index.  Keep in mind this roughly
doubles the size of the already-large Xapian database.

The extindex locks will be released roughly every 10s to
allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
processes to write to the extindex.

=item --fast

Used with C<--reindex>, it will only look for new and stale
entries and not touch already-indexed messages.

=back

=head1 FILES

L<public-inbox-extindex-format(5)>

=head1 CONFIGURATION

public-inbox-extindex does not currently write to the
L<public-inbox-config(5)> file, configuration may be entered
manually.  The extindex name of C<all> is a special case which
corresponds to indexing C<--all> inboxes.  An example for
C<--all> is as follows:

	[extindex "all"]
		topdir = /path/to/extindex_dir
		url = all
		coderepo = foo
		coderepo = bar

See L<public-inbox-config(5)> for more details.

=head1 ENVIRONMENT

=over 8

=item PI_CONFIG

Used to override the default "~/.public-inbox/config" value.

=item XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to
disk.  This environment is handled directly by Xapian, refer to
Xapian API documentation for more details.

Setting C<XAPIAN_FLUSH_THRESHOLD> or
C<publicinbox.indexBatchSize> for a large C<--reindex> may cause
L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
L<public-inbox-watch(1)> tasks to wait long and unpredictable
periods of time during C<--reindex>.

Default: none, uses C<publicinbox.indexBatchSize>

=back

=head1 UPGRADING

Occasionally, public-inbox will update it's schema version and
require a full index by running this command.

=head1 CONTACT

Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>

The mail archives are hosted at L<https://public-inbox.org/meta/> and
L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

=head1 COPYRIGHT

Copyright 2021 all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>

=head1 SEE ALSO

L<Search::Xapian>, L<DBD::SQLite>