NAME
    public-inbox-index - create and update search indices

SYNOPSIS
    public-inbox-index [OPTIONS] INBOX_DIR...

    public-inbox-index [OPTIONS] --all

DESCRIPTION
    public-inbox-index creates and updates the search, overview and NNTP
    article number database used by the read-only public-inbox HTTP and NNTP
    interfaces. Currently, this requires the DBD::SQLite and DBI Perl
    modules. Search::Xapian is optional and only needed to support the PSGI
    search interface.

    Once the initial indices are created by public-inbox-index,
    public-inbox-mda(1) and public-inbox-watch(1) will automatically
    maintain them.

    Running this manually to update indices is only required when relying
    on git-fetch(1) to mirror an existing public-inbox, or when upgrading
    to a new version of public-inbox (using the "--reindex" option).
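
    For the git-fetch(1) mirroring case, a periodic update might be
    sketched as follows. This assumes a v1 inbox, where INBOX_DIR is
    itself a bare git repository; the path is hypothetical, and the "run"
    and "update_mirror" helpers are illustrative only. By default the
    script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical mirror location; adjust to the local layout.
INBOX_DIR=${INBOX_DIR:-/path/to/mirror.git}

# Print each command instead of running it when DRY_RUN=1 (the default here).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Fetch new messages from upstream, then update the indices incrementally.
update_mirror() {
    run git --git-dir="$1" fetch &&
    run public-inbox-index "$1"
}

update_mirror "$INBOX_DIR"
```

    Setting "DRY_RUN=0" in the environment runs the commands for real.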

    Having the overview and article number database is essential to running
    the NNTP interface, and strongly recommended for the HTTP interface as
    it provides thread grouping in addition to normal search functionality.

OPTIONS
    --jobs=JOBS, -j
        Influences the number of Xapian indexing shards in a
        (public-inbox-v2-format(5)) inbox.

        See "--jobs" in public-inbox-init(1) for a full description of
        sharding.

        "--jobs=0" is accepted as of public-inbox 1.6.0 to disable parallel
        indexing regardless of the number of pre-existing shards.

        If the inbox has not been indexed or initialized, "JOBS - 1" shards
        will be created (one job is always needed for indexing the overview
        and article number mapping).

        Default: the number of existing Xapian shards
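
        The "JOBS - 1" rule for a never-indexed inbox can be illustrated
        with a small helper. "initial_shards" is a hypothetical name, and
        the sketch only restates the arithmetic above; it does not query
        a real inbox.

```shell
# Shards created on a never-indexed v2 inbox for a given --jobs value,
# following the "JOBS - 1" rule stated above.
initial_shards() {
    jobs=$1
    if [ "$jobs" -lt 2 ]; then
        # The rule above does not say what happens below 2 jobs,
        # so this sketch declines to guess.
        echo "unspecified for --jobs=$jobs" >&2
        return 1
    fi
    echo $(( jobs - 1 ))
}

initial_shards 4   # prints 3
```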

    --compact, -c
        Compacts the Xapian DBs after indexing. This is recommended when
        using "--reindex" to avoid running out of disk space while indexing
        multiple inboxes.

        While this option takes a negligible amount of time compared to
        "--reindex", it requires temporarily duplicating the entire contents
        of the Xapian DB.

        This switch may be specified twice, in which case compaction happens
        both before and after indexing to minimize the temporal footprint of
        the (re)indexing operation.

        Available since public-inbox 1.4.0.

    --reindex
        Forces a re-index of all messages in the inbox. This can be used for
        in-place upgrades and bugfixes while NNTP/HTTP server processes are
        utilizing the index. Keep in mind this roughly doubles the size of
        the already-large Xapian database. Using this with "--compact" or
        running public-inbox-compact(1) afterwards is recommended to release
        free space.

        public-inbox protects writes to various indices with flock(2), so it
        is safe to reindex (and rethread) while public-inbox-watch(1),
        public-inbox-mda(1) or public-inbox-learn(1) run.

        This does not touch the NNTP article number database. It does not
        affect threading unless "--rethread" is used.
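
        A full reindex command line might be composed as in the sketch
        below. "reindex_cmd" is a hypothetical helper that only prints
        the command; the inbox path is a placeholder. It combines
        "--reindex" with "--compact" as recommended above and adds
        "--rethread" on request.

```shell
# Compose (but do not run) a full reindex command for one inbox.
# Pass "rethread" as the second argument to also regenerate threading.
reindex_cmd() {
    inbox_dir=$1
    mode=${2:-}
    set -- public-inbox-index --reindex --compact
    if [ "$mode" = rethread ]; then
        set -- "$@" --rethread
    fi
    printf '%s\n' "$* $inbox_dir"
}

reindex_cmd /path/to/inbox
reindex_cmd /path/to/inbox rethread
```

        The printed line can be reviewed and then run by hand.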

    --all
        Index all inboxes configured in ~/.public-inbox/config. This is an
        alternative to specifying individual inbox directories on the
        command-line.

    --rethread
        Regenerate internal THREADID and message thread associations when
        reindexing.

        This fixes some bugs in older versions of public-inbox. While it is
        possible to use this without "--reindex", it makes little sense to
        do so.

        Available in public-inbox 1.6.0+.

    --prune
        Run git-gc(1) to prune and expire reflogs if discontiguous history
        is detected. This is intended to be used in mirrors after running
        public-inbox-edit(1) or public-inbox-purge(1) to ensure data is
        expunged from mirrors.

        Available since public-inbox 1.2.0.

    --max-size SIZE
        Sets or overrides "publicinbox.indexMaxSize" on a per-invocation
        basis. See "publicinbox.indexMaxSize" below.

        Available since public-inbox 1.5.0.

    --batch-size SIZE
        Sets or overrides "publicinbox.indexBatchSize" on a per-invocation
        basis. See "publicinbox.indexBatchSize" below.

        When using rotational storage but abundant RAM, using a large value
        (e.g. "500m") with "--sequential-shard" can significantly speed up
        indexing and reduce fragmentation during the initial index and full
        "--reindex" invocations (but not incremental updates).

        Available in public-inbox 1.6.0+.

    --no-fsync
        Disables fsync(2) and fdatasync(2) operations on SQLite and Xapian.
        This is only effective with Xapian 1.4+. This is primarily intended
        for systems with low RAM and the small (default) "--batch-size=1m".
        Users of large "--batch-size" may even find disabling fdatasync(2)
        causes too much dirty data to accumulate, resulting in latency
        spikes from writeback.

        Available in public-inbox 1.6.0+.

    --sequential-shard
        Sets or overrides "publicinbox.indexSequentialShard" on a
        per-invocation basis. See "publicinbox.indexSequentialShard" below.

        Available in public-inbox 1.6.0+.

    --skip-docdata
        Stop storing document data in Xapian on an existing inbox.

        See "--skip-docdata" in public-inbox-init(1) for description and
        caveats.

        Available in public-inbox 1.6.0+.

    --update-extindex=EXTINDEX, -E
        Update the given external index (public-inbox-extindex-format(5)).
        Either the configured section name (e.g. "all") or a directory name
        may be specified.

        Defaults to "all" if "[extindex "all"]" is configured, otherwise no
        external indices are updated.

        May be specified multiple times in rare cases where multiple
        external indices are configured.

    --no-update-extindex
        Do not update the "all" external index by default. This negates all
        uses of "-E" / "--update-extindex=" on the command-line.

FILES
    For v1 (ssoma) repositories described in public-inbox-v1-format(5), all
    public-inbox-specific files are contained within the
    "$GIT_DIR/public-inbox/" directory.

    v2 inboxes are described in public-inbox-v2-format(5).

CONFIGURATION
    publicinbox.indexMaxSize
            Prevents indexing of messages larger than the specified size
            value. A single suffix modifier of "k", "m" or "g" is supported,
            thus the value of "1m" prevents indexing of messages larger
            than one megabyte.

            This is useful for avoiding memory exhaustion in mirrors via
            git. It does not prevent public-inbox-mda(1) or
            public-inbox-watch(1) from importing (and indexing) a message.

            This option is only available in public-inbox 1.5 or later.

            Default: none
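
            The suffix handling can be illustrated with a small shell
            helper. It assumes 1024-based multipliers, as git-config(1)
            uses for its own size values; this manual only says that
            "1m" means one megabyte, so the exact multiplier is an
            assumption. "size_to_bytes" is a hypothetical name.

```shell
# Convert a size such as "512k", "1m" or "2g" to bytes.
# Assumption: multipliers are powers of 1024, as in git-config(1).
size_to_bytes() {
    case $1 in
        *k) echo $(( ${1%?} * 1024 )) ;;
        *m) echo $(( ${1%?} * 1024 * 1024 )) ;;
        *g) echo $(( ${1%?} * 1024 * 1024 * 1024 )) ;;
        *)  echo $(( $1 )) ;;
    esac
}

size_to_bytes 1m   # prints 1048576
```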

    publicinbox.indexBatchSize
            Flushes changes to the filesystem and releases locks after
            indexing the given number of bytes. The default value of "1m"
            (one megabyte) is low to minimize memory use and reduce
            contention with parallel invocations of public-inbox-mda(1),
            public-inbox-learn(1), and public-inbox-watch(1).

            Increase this value on powerful systems to improve throughput at
            the expense of memory use. The reduction of lock granularity may
            not be noticeable on fast systems. With SSDs, values above "4m"
            have little benefit.

            For public-inbox-v2-format(5) inboxes, this value is multiplied
            by the number of Xapian shards. Thus a typical v2 inbox with 3
            shards will flush every 3 megabytes by default unless
            parallelism is disabled via "--sequential-shard" or "--jobs=0".

            This influences memory usage of Xapian, but it is not exact. The
            actual memory used by Xapian and Perl has been observed in
            excess of 10x this value.

            This option is available in public-inbox 1.6 or later.
            public-inbox 1.5 and earlier used the current default, "1m".

            Default: 1m (one megabyte)
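
            The multiplication by shard count described above can be
            checked with simple arithmetic. "flush_interval" is a
            hypothetical helper; it takes the batch size in bytes and
            the number of shards, and applies only when shards are
            indexed in parallel.

```shell
# Approximate number of bytes indexed between flushes for a v2 inbox:
# the batch size (in bytes) multiplied by the number of Xapian shards.
flush_interval() {
    batch_bytes=$1
    shards=$2
    echo $(( batch_bytes * shards ))
}

# Default "1m" batch size (taken as 1048576 bytes) with 3 shards.
flush_interval 1048576 3   # prints 3145728
```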

    publicinbox.indexSequentialShard
            For public-inbox-v2-format(5) inboxes, setting this to "true"
            allows indexing Xapian shards in multiple passes. This speeds up
            indexing on rotational storage with high seek latency by
            allowing individual shards to fit into the kernel page cache.

            Using a higher-than-normal number of "--jobs" with
            public-inbox-init(1) may be required to ensure individual shards
            are small enough to fit into cache.

            Warning: interrupting "public-inbox-index(1)" while this option
            is in use may leave the search indices out-of-date with respect
            to SQLite databases. WWW and IMAP users may notice incomplete
            search results, but it is otherwise non-fatal. Using "--reindex"
            will bring everything back up-to-date.

            Available in public-inbox 1.6.0+.

            This is ignored on public-inbox-v1-format(5) inboxes.

            Default: false, shards are indexed in parallel

    publicinbox.<name>.indexSequentialShard
            Identical to "publicinbox.indexSequentialShard", but only
            affects the inbox matching <name>.
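
            As an illustration, the options in this section could be
            set as below. public-inbox config files use git-config(1)
            syntax. The sketch writes to a scratch file named by
            PI_CONFIG so the real config is untouched; the inbox name
            "example" and the chosen values are hypothetical, and the
            other keys a real inbox section needs are omitted.

```shell
# Write a scratch config so ~/.public-inbox/config is left alone.
PI_CONFIG=$(mktemp)
export PI_CONFIG

cat >"$PI_CONFIG" <<'EOF'
[publicinbox]
    indexMaxSize = 1m
    indexBatchSize = 4m
[publicinbox "example"]
    indexSequentialShard = true
EOF
```

            Command-line switches such as "--batch-size" still override
            these values on a per-invocation basis.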

ENVIRONMENT
    PI_CONFIG
            Used to override the default "~/.public-inbox/config" value.

    XAPIAN_FLUSH_THRESHOLD
            The number of documents to update before committing changes to
            disk. This environment variable is handled directly by Xapian;
            refer to the Xapian API documentation for more details.

            For public-inbox 1.6 and later, use "publicinbox.indexBatchSize"
            instead.

            Setting "XAPIAN_FLUSH_THRESHOLD" or "publicinbox.indexBatchSize"
            for a large "--reindex" may cause public-inbox-mda(1),
            public-inbox-learn(1) and public-inbox-watch(1) tasks to wait
            long and unpredictable periods of time during "--reindex".

            Default: none, uses "publicinbox.indexBatchSize"

UPGRADING
    Occasionally, public-inbox will update its schema version and require a
    full index by running this command.

CONTACT
    Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>

    The mail archives are hosted at <https://public-inbox.org/meta/> and
    <http://hjrcffqmbrq6wope.onion/meta/>

COPYRIGHT
    Copyright 2016-2021 all contributors <mailto:meta@public-inbox.org>

    License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO
    Search::Xapian, DBD::SQLite, public-inbox-extindex-format(5)