=head1 NAME
public-inbox-index - create and update search indices
=head1 SYNOPSIS
public-inbox-index [OPTIONS] INBOX_DIR...
=head1 DESCRIPTION
public-inbox-index creates and updates the search, overview and
NNTP article number database used by the read-only public-inbox
HTTP and NNTP interfaces. Currently, this requires the
L<DBD::SQLite> and L<DBI> Perl modules. L<Search::Xapian>
is optional and only needed to support the PSGI search interface.
Once the initial indices are created by public-inbox-index,
L<public-inbox-mda(1)> and L<public-inbox-watch(1)> will
automatically maintain them.
Running this manually to update indices is only required when
relying on L<git-fetch(1)> to mirror an existing public-inbox,
or when upgrading to a new version of public-inbox with
the C<--reindex> option.
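As an illustration of the mirroring case, the following is a minimal
sketch and not part of public-inbox itself. It assumes a v1 inbox whose
C<INBOX_DIR> is a bare git repository with a fetch remote already
configured; the function name C<update_mirror> is hypothetical.

```shell
# Hypothetical helper (not shipped with public-inbox): refresh a v1 mirror.
# Assumes $1 is a bare git repository with a fetch remote configured.
update_mirror() {
    inbox_dir=$1
    # Fetch new messages first; only update the indices if the fetch succeeded.
    git --git-dir="$inbox_dir" fetch &&
        public-inbox-index "$inbox_dir"
}
```

It could be run periodically as C<update_mirror /path/to/inbox.git>,
where the path is a placeholder.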
Having the overview and article number database is essential to
running the NNTP interface, and strongly recommended for the
HTTP interface as it provides thread grouping in addition to
normal search functionality.
=head1 OPTIONS
=over
=item --jobs=JOBS, -j
Control the number of Xapian indexing jobs in a
(L<public-inbox-v2-format(5)>) inbox.
C<--jobs=0> is accepted as of public-inbox 1.6.0 (PENDING)
to disable parallel indexing.
Default: the number of existing Xapian shards
=item --compact / -c
Compacts the Xapian DBs after indexing. This is recommended
when using C<--reindex> to avoid running out of disk space
while indexing multiple inboxes.
While this option takes a negligible amount of time compared to
C<--reindex>, it requires temporarily duplicating the entire
contents of the Xapian DB.
This switch may be specified twice, in which case compaction
happens both before and after indexing to minimize the temporal
footprint of the (re)indexing operation.
Available since public-inbox 1.4.0.
=item --reindex
Forces a re-index of all messages in the inbox.
This can be used for in-place upgrades and bugfixes while
NNTP/HTTP server processes are utilizing the index. Keep in
mind this roughly doubles the size of the already-large
Xapian database. Using this with C<--compact> or running
L<public-inbox-compact(1)> afterwards is recommended to
release free space.
public-inbox protects writes to various indices with L<flock(2)>,
so it is safe to reindex while L<public-inbox-watch(1)>,
L<public-inbox-mda(1)> or L<public-inbox-learn(1)> run.
This does not touch the NNTP article number database or
affect threading.
=item --prune
Run L<git-gc(1)> to prune and expire reflogs if discontiguous history
is detected. This is intended to be used in mirrors after running
L<public-inbox-edit(1)> or L<public-inbox-purge(1)> to ensure data
is expunged from mirrors.
Available since public-inbox 1.2.0.
=item --max-size SIZE
Sets or overrides L</publicinbox.indexMaxSize> on a
per-invocation basis. See L</publicinbox.indexMaxSize>
below.
Available since public-inbox 1.5.0.
=item --batch-size SIZE
Sets or overrides L</publicinbox.indexBatchSize> on a
per-invocation basis. See L</publicinbox.indexBatchSize>
below.
Available in public-inbox 1.6.0 (PENDING).
=back
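The recommendation to combine C<--reindex> with C<--compact> when
handling multiple inboxes could look roughly like the sketch below.
It uses only the switches documented above; the function name
C<reindex_and_compact> and the error message are hypothetical.

```shell
# Hypothetical wrapper: reindex each inbox in turn, compacting its
# Xapian DB after indexing so free space is released before the next one.
reindex_and_compact() {
    for inbox_dir in "$@"; do
        if ! public-inbox-index --reindex --compact "$inbox_dir"; then
            echo "reindex failed: $inbox_dir" >&2
            return 1
        fi
    done
}
```

Stopping at the first failure is a design choice of this sketch, made
so that a full disk is noticed before further inboxes are attempted.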
=head1 FILES
For v1 (ssoma) repositories described in L<public-inbox-v1-format>,
all public-inbox-specific files are contained within the
C<$GIT_DIR/public-inbox/> directory.
v2 inboxes are described in L<public-inbox-v2-format>.
=head1 CONFIGURATION
=over 8
=item publicinbox.indexMaxSize
Prevents indexing of messages larger than the specified size
value. A single suffix modifier of C<k>, C<m> or C<g> is
supported, thus a value of C<1m> prevents indexing of
messages larger than one megabyte.
This is useful for avoiding memory exhaustion in mirrors.
This option is only available in public-inbox 1.5 or later.
Default: none
=item publicinbox.indexBatchSize
Flushes changes to the filesystem and releases locks after
indexing the given number of bytes. The default value of C<1m>
(one megabyte) is low to minimize memory use and reduce
contention with parallel invocations of L<public-inbox-mda(1)>,
L<public-inbox-learn(1)>, and L<public-inbox-watch(1)>.
Increase this value on powerful systems to improve throughput at
the expense of memory use. The reduction of lock granularity
may not be noticeable on fast systems.
This option is available in public-inbox 1.6 or later.
public-inbox 1.5 and earlier used the current default, C<1m>.
For L<public-inbox-v2-format(5)> inboxes, this value is
multiplied by the number of Xapian shards. Thus a typical v2
inbox with 3 shards will flush every 3 megabytes by default.
Default: 1m (one megabyte)
=back
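To make the size arithmetic above concrete, here is a small sketch
that converts a suffixed size to bytes and computes the effective
flush interval for a v2 inbox. It assumes the C<k>, C<m> and C<g>
suffixes are powers of 1024, which this document does not state
explicitly; both function names are hypothetical.

```shell
# Convert a size such as "1m" to bytes.
# Assumption: k, m and g are powers of 1024.
size_to_bytes() {
    case $1 in
    *k) echo $(( ${1%?} * 1024 )) ;;
    *m) echo $(( ${1%?} * 1024 * 1024 )) ;;
    *g) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
    *)  echo $(( $1 )) ;;
    esac
}

# Effective flush interval for a v2 inbox:
# indexBatchSize multiplied by the number of Xapian shards.
effective_batch_bytes() {
    echo $(( $(size_to_bytes "$1") * $2 ))
}
```

Under that assumption, C<effective_batch_bytes 1m 3> prints 3145728,
matching the three megabytes described for a typical v2 inbox with
3 shards.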
=head1 ENVIRONMENT
=over 8
=item PI_CONFIG
Used to override the default "~/.public-inbox/config" value.
=item XAPIAN_FLUSH_THRESHOLD
The number of documents to update before committing changes to
disk. This environment variable is handled directly by Xapian;
refer to the Xapian API documentation for more details.
For public-inbox 1.6 and later, use C<publicinbox.indexBatchSize>
instead. Setting C<XAPIAN_FLUSH_THRESHOLD> for a large C<--reindex>
may cause L<public-inbox-mda(1)>, L<public-inbox-learn(1)> and
L<public-inbox-watch(1)> tasks to wait long periods of time
during C<--reindex>.
Default: none, uses C<publicinbox.indexBatchSize>
=back
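As a sketch of how C<PI_CONFIG> might be used, the helper below runs
the indexer against an alternate config file for a single invocation
without changing the caller's environment. The function name
C<index_with_config> is hypothetical.

```shell
# Hypothetical helper: run the indexer against an alternate config file.
# $1 is the config path; remaining arguments are passed through unchanged.
index_with_config() {
    config=$1
    shift
    # The assignment applies only to this one command's environment.
    PI_CONFIG=$config public-inbox-index "$@"
}
```

For example, C<index_with_config /path/to/config /path/to/inbox>,
where both paths are placeholders.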
=head1 UPGRADING
Occasionally, public-inbox will update its schema version and
require a full reindex by running this command with C<--reindex>.
=head1 CONTACT
Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
The mail archives are hosted at L<https://public-inbox.org/meta/>
and L<http://hjrcffqmbrq6wope.onion/meta/>
=head1 COPYRIGHT
Copyright 2016-2020 all contributors L<mailto:meta@public-inbox.org>
License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
=head1 SEE ALSO
L<Search::Xapian>, L<DBD::SQLite>