=head1 NAME

public-inbox-extindex - create and update external search indices

=head1 SYNOPSIS

public-inbox-extindex [OPTIONS] EXTINDEX_DIR INBOX_DIR...

public-inbox-extindex [OPTIONS] [EXTINDEX_DIR] --all
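
For example, the two forms above might be invoked as follows.  The first indexes two specific inboxes into one extindex, and the second indexes every C<publicinbox> entry in C<PI_CONFIG>.  The paths are placeholders, not defaults:

	public-inbox-extindex /path/to/extindex_dir /path/to/inbox1 /path/to/inbox2
	public-inbox-extindex /path/to/extindex_dir --all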

=head1 DESCRIPTION

public-inbox-extindex creates and updates an external search and
overview database used by the read-only public-inbox PSGI (HTTP),
NNTP, and IMAP interfaces.  This requires either the
L<Xapian> SWIG bindings or the L<Search::Xapian> XS bindings,
along with the L<DBD::SQLite> and L<DBI> Perl modules.

=head1 OPTIONS

=over

=item -j JOBS

=item --jobs=JOBS

=item --no-fsync

=item --dangerous

=item --rethread

=item --max-size SIZE

=item --batch-size SIZE

These switches behave as they do for L<public-inbox-index(1)>.

=item --all

Index all C<publicinbox> entries in C<PI_CONFIG>.

C<publicinbox> entries indexed by C<public-inbox-extindex> can
have full Xapian searching abilities with the per-C<publicinbox>
C<indexlevel> set to C<basic> and their respective Xapian
(C<xap15> or C<xapian15>) directories removed.  For multiple
public-inboxes where cross-posting is common, this allows
significant space savings on Xapian indices.

=item --dedupe=MSGID

=item --dedupe

Rerun deduplication on messages with the given Message-ID or
all messages if no Message-ID is specified.  Deduplication rules may
change and evolve over time, especially if filters are involved.

C<--dedupe=MSGID> may be specified multiple times to deduplicate
multiple Message-IDs.

Use this if you see C<W: BUG? $MSGID not deduplicated properly>
warnings in WWW logs.

=item --gc

Perform garbage collection instead of indexing.  Use this if
inboxes are removed from the extindex, a newsgroup name is
set or changed, or if messages are purged or removed from
some inboxes.

=item --reindex

Forces a re-index of all messages in the extindex.  This can be
used for in-place upgrades and bugfixes while read-only server
processes are using the index.  Keep in mind this roughly
doubles the size of the already-large Xapian database.

=item --fast

When used with C<--reindex>, only look for new and stale
entries, leaving already-indexed messages untouched.

=back
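
As an illustrative sketch with placeholder paths, maintenance of an extindex built with C<--all> might combine the options above like this.  C<--gc> is run after inboxes are removed or relocated, and C<--reindex --fast> picks up only new and stale entries:

	public-inbox-extindex --gc /path/to/extindex_dir --all
	public-inbox-extindex --reindex --fast /path/to/extindex_dir --all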

=head1 FILES

L<public-inbox-extindex-format(5)>

=head1 CONFIGURATION

public-inbox-extindex does not write to the L<public-inbox-config(5)>
file; C<extindex> entries must be added to it manually.
The extindex name of C<all> is a special case which
corresponds to indexing C<--all> inboxes.  An example for
C<--all> is as follows:

	[extindex "all"]
		topdir = /path/to/extindex_dir
		url = all
		coderepo = foo
		coderepo = bar

Putting an C<extindex> entry in the config allows L<PublicInbox::WWW>
to use it.  You can have any number of C<extindex.$NAME> sections,
where C<$NAME> is something other than C<all>, to display a union of
several inboxes.

It is strongly recommended that any public inboxes indexed by this command
have a stable C<publicinbox.$NAME.newsgroup> entry (regardless of
whether an NNTP or IMAP server is present).  Otherwise, public-inbox-extindex
will use C<publicinbox.$NAME.inboxdir> as an internal key, which can
cause needless reindexing and require C<--gc> if inboxes are relocated.
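
The fragment below is a minimal sketch of such an entry.  It assumes a hypothetical inbox named C<foo> and placeholder values, and it omits other keys such as C<address>; see L<public-inbox-config(5)> for the full set.  The C<indexlevel = basic> line is optional.  It reflects the space-saving setup described under C<--all>:

	[publicinbox "foo"]
		inboxdir = /path/to/foo
		newsgroup = inbox.comp.foo
		indexlevel = basic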

See L<public-inbox-config(5)> for more details.

=head1 ENVIRONMENT

=over 8

=item PI_CONFIG

Used to override the default "~/.public-inbox/config" value.

=item XAPIAN_FLUSH_THRESHOLD

The number of documents to update before committing changes to
disk.  This environment variable is handled directly by Xapian; refer
to the Xapian API documentation for more details.

Setting C<XAPIAN_FLUSH_THRESHOLD> or
C<publicinbox.indexBatchSize> for a large C<--reindex> may cause
L<public-inbox-mda(1)>, L<public-inbox-learn(1)>, and
L<public-inbox-watch(1)> tasks to wait for long and unpredictable
periods of time during C<--reindex>.

Default: none, uses C<publicinbox.indexBatchSize>

=back
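
For instance, a hypothetical alternate config file could be used for a single run like so (the paths are placeholders):

	PI_CONFIG=/path/to/alt/config public-inbox-extindex /path/to/extindex_dir --all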

=head1 UPGRADING

Occasionally, public-inbox will update its schema version and
require a full index, which is performed by running this command.

=head1 LOCKING

It is safe to use C<--dedupe>, C<--gc>, and C<--reindex> while
other processes are writing to the covered inboxes or the extindex.
The extindex locks will be released roughly every 10s to
allow L<public-inbox-mda(1)> and L<public-inbox-watch(1)>
processes to write to the extindex.

=head1 CONTACT

Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>

The mail archives are hosted at L<https://public-inbox.org/meta/> and
L<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

=head1 COPYRIGHT

Copyright all contributors L<mailto:meta@public-inbox.org>

License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>

=head1 SEE ALSO

L<Search::Xapian>, L<DBD::SQLite>
