PUBLIC-INBOX-CINDEX(1) public-inbox user manual PUBLIC-INBOX-CINDEX(1)
NAME
public-inbox-cindex - create and update coderepo search indices
SYNOPSIS
public-inbox-cindex -d EXTDIR [OPTIONS] --join
public-inbox-cindex -d EXTDIR [OPTIONS] --update
public-inbox-cindex [OPTIONS] -g GIT_DIR [-g GIT_DIR]...
DESCRIPTION
public-inbox-cindex creates and updates the Xapian search index for git
code repository ("coderepo") search. It can associate (fuzzy join)
coderepos with Xapian-indexed inboxes to enable blob reconstruction in
the "$INBOX_URL/$BLOB_OID/s/" WWW endpoint. It only indexes commit
messages and diffs as they would show up in an email. It does not
currently index the contents of blobs directly.
Like inbox indices, coderepo indices can be either internal or external
to a coderepo. Either way, both kinds are created and updated through
public-inbox-cindex. External indices via "-d EXTDIR" are recommended
for sites hosting multiple coderepos with common history.
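As a rough illustration of the external layout, the POSIX sh function
below indexes a chosen subset of coderepos into one external index by
repeating "-g GIT_DIR" after "-d EXTDIR". It is only a sketch and not
part of public-inbox: the function name "cindex_subset" and the
"CINDEX_CMD" override (set it to "echo" for a dry run) are hypothetical.

```sh
#!/bin/sh
# Hypothetical override: CINDEX_CMD=echo prints the command instead.
CINDEX_CMD=${CINDEX_CMD:-public-inbox-cindex}

# usage: cindex_subset EXTDIR GIT_DIR...
# The body runs in a subshell "( ... )" so its variables stay local.
cindex_subset() (
	extdir=$1
	shift
	n=$#
	# append "-g GIT_DIR" for every coderepo given as an argument;
	# the "for" word list is expanded once, so appending is safe
	for dir do
		set -- "$@" -g "$dir"
	done
	shift "$n" # drop the original bare GIT_DIR arguments
	# CINDEX_CMD is unquoted on purpose so "echo" or a path works
	$CINDEX_CMD -d "$extdir" "$@"
)
```

For example, "cindex_subset /path/to/cidx-all /path/to/repos/pub/a.git
/path/to/repos/pub/b.git" runs "public-inbox-cindex -d /path/to/cidx-all
-g /path/to/repos/pub/a.git -g /path/to/repos/pub/b.git". Omitting "-d
EXTDIR" and passing a single "-g GIT_DIR" would create an internal index
instead.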
Currently, public-inbox-cindex exists mainly to save WWW admins the
trouble of associating hundreds/thousands of inboxes and coderepos with
each other via "publicinbox.*.coderepo" and "coderepo.*.dir" directives
in public-inbox-config(5). Eventually, it will allow lei-rediff(1)
functionality to be ported to the WWW UI and allow searching commits in
coderepos directly via the WWW interface.
Once the initial indices are created by public-inbox-cindex, the
"--update" switch will incrementally update them.
OPTIONS
-d EXTDIR
Use the given directory as an external index. External indices are
generally recommended over internal indices since they do not need
write access to any coderepos themselves. They are highly
recommended when many coderepos share a common history or if there
is an M:N relationship between inboxes and coderepos.
-g GIT_DIR
--git-dir=GIT_DIR
When not using "-d EXTDIR", the cindex will be written to
"$GIT_DIR/public-inbox-cindex". May also be combined with "-d
EXTDIR" to index a single coderepo (or a subset of coderepos).
May be specified multiple times.
-j JOBS
--jobs=JOBS
Influences the number of Xapian indexing shards.
If the repo has not been indexed or initialized, "JOBS - 1" shards
will be created.
Default: the number of existing Xapian shards
--join
Attempt a fuzzy association of all inboxes and coderepos to enhance
the WWW interface. See "EXAMPLE" below.
A C++ compiler, xapian-delve(1) and Xapian development files (e.g.
libxapian-dev or xapian-core-devel) will make this operation orders
of magnitude faster.
This operation should be rerun whenever inboxes or coderepos are
added or removed, or when one project merges with another.
Web servers running PublicInbox::WWW (e.g. public-inbox-netd(1) or
public-inbox-httpd(1)) currently need to be restarted to pick up
new (or expire old) associations.
--reindex
Forces a re-index of all commits. This can be used for in-place
upgrades and bugfixes while read-only processes are utilizing the
index.
--update
-u
Incrementally index all previously-indexed coderepos without
checking for new ones.
--prune
Unindexes commits which are no longer accessible via git. Use this
after git-gc(1) (or git-prune(1)), or if coderepos are removed.
--no-fsync
--dangerous
--max-size SIZE
--batch-size SIZE
These affect the coderepo index the same way they affect inbox
indices. See public-inbox-index(1).
A smaller value of "--max-size" (e.g. "--max-size=10m") is highly
recommended to limit memory usage for gigantic commits.
--project-list=FILE
A project list in the same format used by cgit, gitweb, grokmirror
and public-inbox-clone(1). Requires "--project-root=DIRECTORY".
--project-root=DIRECTORY
-r DIRECTORY
Specifies the top-level directory for projects in
"--project-list=FILE".
--exclude (GLOB|PATH)
Exclude given coderepos when using "--project-list=FILE". May be
specified multiple times.
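If no project list exists yet, one can be generated from the directory
tree. The sketch below assumes bare coderepos named "*.git" under the
project root and the cgit-style format of one repository path per line,
relative to "--project-root=DIRECTORY"; check the format your other
tools produce before relying on it. "make_project_list" is a
hypothetical helper, not part of public-inbox.

```sh
#!/bin/sh
# usage: make_project_list ROOT > ROOT/projects.list
# Runs in a subshell "( ... )" so the "cd" does not leak to the caller.
make_project_list() (
	cd "$1" || exit 1
	# print each *.git directory once without descending into it,
	# then strip find's leading "./" so paths are relative to ROOT
	find . -type d -name '*.git' -prune -print |
		sed 's,^\./,,' |
		LC_ALL=C sort
)
```

For example, "make_project_list /path/to/repos >
/path/to/repos/projects.list" followed by
"--project-list=/path/to/repos/projects.list
--project-root=/path/to/repos". Coderepos listed there which should not
be indexed can still be skipped with "--exclude".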
FILES
For internal indices, the Xapian DB is stored in
"$GIT_DIR/public-inbox-cindex".
External indices are stored wherever "-d EXTDIR" points.
CONFIGURATION
publicinbox.indexMaxSize
publicinbox.indexBatchSize
These configuration knobs affect the coderepo index the same
way they affect inbox indices. See public-inbox-index(1).
cindex.$NAME.topdir
Directory where an external coderepo index was created (with
"-d EXTDIR").
$NAME is the URL prefix (without leading/trailing slashes) for
all coderepos in the WWW interface. An empty string ("") is
allowed when repos are stored at the top level.
Combined with cindex.$NAME.localprefix, this allows admins to
avoid specifying a separate coderepo.$NICK.dir entry for every
coderepo indexed.
cindex.$NAME.localprefix
The local directory name prefix of all coderepos to be
displayed in the WWW interface. This is typically a
subdirectory of "--project-root=DIRECTORY".
ENVIRONMENT
PI_CONFIG
Used to override the default "~/.public-inbox/config" value.
XAPIAN_FLUSH_THRESHOLD
The number of documents to update before committing changes to
disk. This variable is handled directly by Xapian; refer to the
Xapian API documentation for more details.
Use "publicinbox.indexBatchSize" instead.
UPGRADING
Occasionally, public-inbox will update its schema version and require a
full reindex by running this command with "--reindex".
EXAMPLE
Assuming you have an "all" extindex for your inboxes and store
coderepos in "/path/to/repos", the contents of your "PI_CONFIG" file
should include something like this:
[extindex "all"]
	topdir = /path/to/eidx-all
[cindex "pub"]
	localprefix = /path/to/repos/pub
	topdir = /path/to/cidx-all
Assuming you're using a cgit/gitweb/grokmirror/public-inbox-clone(1)
compatible --project-list, you can periodically use "--join" when new
coderepos are added, deleted, or when one project merges with another:
public-inbox-cindex -d /path/to/cidx-all --max-size=10m \
-L medium --join \
--exclude='**/uninteresting-a.git' \
--exclude='**/uninteresting-b.git' \
--project-list=/path/to/repos/projects.list \
--project-root=/path/to/repos
After this (and restarting the webserver), a project in
"/path/to/repos/pub/project.git" should be visible via
"https://$HOSTNAME/pub/project.git" from a WWW instance and the bottom
of the project.git HTML page should display a list of "associated
public inboxes". When viewing patch emails in any associated inbox,
diff hunk headers (those "@@ -123,4 +123,4 @@" lines) will link to
"$INBOX_URL/$BLOB_OID/s/" URLs which attempt to display the blob
(reconstructing blobs from patch emails if necessary).
Since "--join" is expensive and additions or removals of coderepos and
inboxes are rare, the index can be updated incrementally and more
quickly with "--update":
public-inbox-cindex -d /path/to/cidx-all --max-size=10m --update
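A periodic job (e.g. run from cron(8)) can combine both commands. The
sketch below reruns "--join" only when the project list file is newer
than a stamp file left by the last successful join, and otherwise runs
"--update". It is an illustration under several assumptions:
"cindex_refresh", "CINDEX_CMD" and the stamp file are hypothetical, the
"-nt" test operator is supported by bash, dash and busybox sh but is not
required by POSIX, and changes to the project list are only a proxy for
added or removed coderepos. It cannot notice newly added inboxes, so
"--join" still has to be run by hand in that case. Other options from
the example above (such as "-L medium" or "--exclude") can be added as
needed.

```sh
#!/bin/sh
# Hypothetical override: CINDEX_CMD=echo prints the command instead.
CINDEX_CMD=${CINDEX_CMD:-public-inbox-cindex}

# usage: cindex_refresh EXTDIR PROJECT_LIST PROJECT_ROOT STAMP_FILE
cindex_refresh() (
	extdir=$1 list=$2 root=$3 stamp=$4
	if [ ! -e "$stamp" ] || [ "$list" -nt "$stamp" ]; then
		# first run, or the list changed: redo the expensive join
		# and record its success in the (hypothetical) stamp file
		$CINDEX_CMD -d "$extdir" --max-size=10m --join \
			--project-list="$list" --project-root="$root" &&
			touch "$stamp"
	else
		# list unchanged since the last join: cheap incremental update
		$CINDEX_CMD -d "$extdir" --max-size=10m --update
	fi
)
```

Web servers still need to be restarted after a "--join" run to pick up
new associations.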
CONTACT
Feedback welcome via plain-text mail to <mailto:meta@public-inbox.org>
The mail archives are hosted at <https://public-inbox.org/meta/> and
<http://4uok3hntl7oi7b4uf4rtfwefqeexfzil2w6kgk2jn5z2f764irre7byd.onion/meta/>
COPYRIGHT
Copyright all contributors <mailto:meta@public-inbox.org>
License: AGPL-3.0+ <https://www.gnu.org/licenses/agpl-3.0.txt>
SEE ALSO
public-inbox-index(1)
public-inbox.git 1993-10-02 PUBLIC-INBOX-CINDEX(1)