PUBLIC-INBOX-CINDEX(1)     public-inbox user manual     PUBLIC-INBOX-CINDEX(1)

NAME
       public-inbox-cindex - create and update coderepo search indices

SYNOPSIS
       public-inbox-cindex -d EXTDIR [OPTIONS] --join

       public-inbox-cindex -d EXTDIR [OPTIONS] --update

       public-inbox-cindex [OPTIONS] -g GIT_DIR [-g GIT_DIR]...

DESCRIPTION
       public-inbox-cindex creates and updates the Xapian search index for git
       code repository ("coderepo") search.  It can associate (fuzzy join)
       coderepos with Xapian-indexed inboxes to enable blob reconstruction in
       the "$INBOX_URL/$BLOB_OID/s/" WWW endpoint.  It only indexes commit
       messages and diffs as they would show up in an email.  It does not
       currently index the contents of blobs directly.

       Like inbox indices, coderepo indices can either be internal or external
       to a coderepo.  Either way, they're both created and updated through
       public-inbox-cindex.  External indices via "-d EXTDIR" are recommended
       for sites hosting multiple coderepos with common history.

       Currently, public-inbox-cindex exists mainly to save WWW admins the
       trouble of associating hundreds/thousands of inboxes and coderepos with
       each other via "publicinbox.*.coderepo" and "coderepo.*.dir" directives
       in public-inbox-config(5).  Eventually, it will allow lei-rediff(1)
       functionality to be ported to the WWW UI and allow searching commits in
       coderepos directly via the WWW interface.
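
       For comparison, the manual association that public-inbox-cindex is
       meant to replace looks roughly like the fragment below, repeated for
       every inbox and coderepo pair.  The "project" nickname and the path
       are placeholders, and a real "publicinbox" section also needs its
       usual keys; see public-inbox-config(5).

```
[coderepo "project"]
	dir = /path/to/repos/pub/project.git
[publicinbox "project"]
	; address, inboxdir, url, etc. omitted for brevity
	coderepo = project
```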

       Once the initial indices are created by public-inbox-cindex, the
       "--update" switch will incrementally update them.

OPTIONS
       -d EXTDIR
           Use the given directory as an external index.  External indices are
           generally preferred over internal indices since they do not need
           write access to any coderepos themselves.  They are highly
           recommended when many coderepos share a common history or if there
           is an M:N relationship between inboxes and coderepos.

       -g GIT_DIR
       --git-dir=GIT_DIR
           When not using "-d EXTDIR", the cindex will be written to
           "$GIT_DIR/public-inbox-cindex".  May also be combined with "-d
           EXTDIR" to index a single (or subset of) git coderepos.

           May be specified multiple times.
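
           A minimal sketch of the two ways "-g" can be used, based only on
           the SYNOPSIS and the text above.  The CINDEX variable and the
           function names are illustrative and not part of public-inbox;
           setting CINDEX=echo gives a dry run.

```shell
# Command to run; override with CINDEX=echo to only print the arguments.
CINDEX=${CINDEX:-public-inbox-cindex}

# Internal index: written to "$1/public-inbox-cindex" inside the coderepo,
# so write access to the coderepo is needed.
index_internal() {
	"$CINDEX" -g "$1"
}

# External index: index only the coderepo "$2" into the EXTDIR "$1".
index_into_extdir() {
	"$CINDEX" -d "$1" -g "$2"
}
```

           For example, "index_into_extdir /path/to/cidx-all
           /path/to/repos/pub/project.git" indexes a single coderepo into an
           existing external index.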

       -j JOBS
       --jobs=JOBS
           Influences the number of Xapian indexing shards.

           If the repo has not been indexed or initialized, "JOBS - 1" shards
           will be created.

           Default: the number of existing Xapian shards

       --join
           Attempt a fuzzy association of all inboxes and coderepos to enhance
           the WWW interface.  See "EXAMPLE" below.

           A C++ compiler, xapian-delve(1) and Xapian development files (e.g.
           libxapian-dev or xapian-core-devel) will make this operation orders
           of magnitude faster.

           This operation should be rerun whenever inboxes or coderepos are
           added or removed, or when one project merges with another.

           Web servers running PublicInbox::WWW (e.g. public-inbox-netd(1) or
           public-inbox-httpd(1)) currently need to be restarted to pick up
           new (or expire old) associations.

       --reindex
           Forces a re-index of all commits.  This can be used for in-place
           upgrades and bugfixes while read-only processes are utilizing the
           index.

       --update
       -u  Incrementally index all previously-indexed coderepos without
           checking for new ones.

       --prune
           Unindexes commits which are no longer accessible via git.  Use this
           after git-gc(1) (or git-prune(1)), or if coderepos are removed.

       --no-fsync
       --dangerous
       --max-size SIZE
       --batch-size SIZE
           These affect the coderepo index the same way they affect inbox
           indices.  See public-inbox-index(1).

           A smaller value of "--max-size" (e.g. "--max-size=10m") is highly
           recommended to limit memory usage for gigantic commits.

       --project-list=FILE
           The same project list used by cgit, gitweb, grokmirror and
           public-inbox-clone(1).  Requires "--project-root=DIRECTORY".

       --project-root=DIRECTORY
       -r DIRECTORY
           Specifies the top-level directory for projects in
           "--project-list=FILE".

       --exclude (GLOB|PATH)
           Exclude given coderepos when using "--project-list=FILE".  May be
           specified multiple times.

FILES
       For internal indices, the Xapian DB is stored in
       "$GIT_DIR/public-inbox-cindex".

       External indices are stored wherever "-d EXTDIR" points.

CONFIGURATION
       publicinbox.indexMaxSize
       publicinbox.indexBatchSize
               These configuration knobs affect the coderepo index the same
               way they affect inbox indices.  See public-inbox-index(1).
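
               As a sketch, the "--max-size=10m" recommendation from OPTIONS
               could be made persistent in the config file as follows.  This
               assumes the size suffixes documented in public-inbox-index(1)
               apply here as well.

```
[publicinbox]
	indexMaxSize = 10m
```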

       cindex.$NAME.topdir
               Directory where an external coderepo index was created (with
               "-d EXTDIR").

               $NAME is the URL prefix (without leading/trailing slashes) for
               all coderepos in the WWW interface.  An empty string ("") is
               allowed when repos are stored at the toplevel.

               Combined with cindex.$NAME.localprefix, this allows admins to
               avoid specifying a separate coderepo.$NICK.dir entry for every
               coderepo indexed.

       cindex.$NAME.localprefix
               The local directory name prefix of all coderepos to be
               displayed in the WWW interface.  This is typically a
               subdirectory of "--project-root=DIRECTORY".
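
               The relationship between $NAME, localprefix and the resulting
               WWW path can be illustrated with a small shell function.  This
               is only a reading of the description above, not code from
               public-inbox, and the function name is hypothetical.

```shell
# Illustration only: derive the expected WWW path of a coderepo from
# the cindex.$NAME section name, its localprefix and the local GIT_DIR.
coderepo_url_path() {
	name=$1 localprefix=$2 git_dir=$3
	# strip "$localprefix/" from the front of the local path
	rel=${git_dir#"$localprefix"/}
	if [ -n "$name" ]
	then
		printf '/%s/%s\n' "$name" "$rel"
	else
		# empty $NAME: repos are served from the toplevel
		printf '/%s\n' "$rel"
	fi
}

coderepo_url_path pub /path/to/repos/pub /path/to/repos/pub/project.git
```

               With the values from EXAMPLE below, this prints
               "/pub/project.git".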

ENVIRONMENT
       PI_CONFIG
               Used to override the default "~/.public-inbox/config" value.

       XAPIAN_FLUSH_THRESHOLD
               The number of documents to update before committing changes to
               disk.  This variable is handled directly by Xapian; refer to
               the Xapian API documentation for more details.

               Use "publicinbox.indexBatchSize" instead.

UPGRADING
       Occasionally, public-inbox will update its schema version and require a
       full reindex by running this command with "--reindex".

EXAMPLE
       Assuming you have an "all" extindex for your inboxes and store
       coderepos in "/path/to/repos", the contents of your "PI_CONFIG" file
       should include something like this:

               [extindex "all"]
                       topdir = /path/to/eidx-all
               [cindex "pub"]
                       localprefix = /path/to/repos/pub
                       topdir = /path/to/cidx-all

       Assuming you're using a cgit/gitweb/grokmirror/public-inbox-clone(1)
       compatible --project-list, you can rerun "--join" whenever coderepos
       are added or removed, or when one project merges with another:

               public-inbox-cindex -d /path/to/cidx-all --max-size=10m \
                       -L medium --join \
                       --exclude='**/uninteresting-a.git' \
                       --exclude='**/uninteresting-b.git' \
                       --project-list=/path/to/repos/projects.list \
                       --project-root=/path/to/repos

       After this (and restarting the webserver), a project in
       "/path/to/repos/pub/project.git" should be visible via
       "https://$HOSTNAME/pub/project.git" from a WWW instance and the bottom
       of the project.git HTML page should display a list of "associated
       public inboxes".  When viewing patch emails in any associated inbox,
       diff hunk headers (those "@@ -123,4 +123,4 @@" lines) will link to
       "$INBOX_URL/$BLOB_OID/s/" URLs which attempt to display the blob
       (reconstructing blobs from patch emails if necessary).

       Since "--join" is expensive and (coderepo|inbox) additions/removals are
       rare, incrementally updating the index can be done more quickly with
       "--update":

               public-inbox-cindex -d /path/to/cidx-all --max-size=10m --update
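
       One possible way to combine the two commands in a periodic job is
       sketched below: run "--join" only when the project list is newer
       than a marker file, and "--update" otherwise.  The function name,
       the CINDEX variable and the marker file are assumptions made for
       this sketch, and "-nt" is a common but non-POSIX test(1) extension.

```shell
# Command to run; override with CINDEX=echo for a dry run.
CINDEX=${CINDEX:-public-inbox-cindex}

# usage: cidx_refresh EXTDIR PROJECT_ROOT STAMP_FILE
cidx_refresh() {
	extdir=$1 root=$2 stamp=$3
	list=$root/projects.list
	if [ ! -e "$stamp" ] || [ "$list" -nt "$stamp" ]
	then
		# Create the marker before joining so that a list change
		# made during the join triggers another join next time.
		: >"$stamp.tmp" &&
		"$CINDEX" -d "$extdir" --max-size=10m --join \
			--project-list="$list" --project-root="$root" &&
		mv "$stamp.tmp" "$stamp"
	else
		"$CINDEX" -d "$extdir" --max-size=10m --update
	fi
}
```

       A call such as "cidx_refresh /path/to/cidx-all /path/to/repos
       /path/to/cidx-join.stamp" could then be run from cron.  The web
       server still needs a restart after the "--join" branch runs.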

CONTACT
       Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>

       The mail archives are hosted at <https://public-inbox.org/meta/> and
       <http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>

COPYRIGHT
       Copyright all contributors <mailto:meta@public-inbox.org>

       License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>

SEE ALSO
       public-inbox-index(1)

public-inbox.git                  1993-10-02            PUBLIC-INBOX-CINDEX(1)