about summary refs log tree commit homepage
path: root/script/public-inbox-index
DateCommit message (Collapse)
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2019-11-16index: pass global variables into subs
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script eval'ed into a sub.
2019-11-14inboxwritable: drop {-importer} cyclic reference
InboxWritable caching the result of ->importer leads to a circular references with returned (V2Writable|Import) object holds onto the calling InboxWritable object. With public-inbox-watch, this leads to a memory leak if a user is reloading via SIGHUP after a message is imported (it would only become noticeable with SIGHUPs after every message imported). I would not expect anybody to to notice this in real-world usage. I only noticed this since I was making -xcpdb suitable for long-lived process use (e.g. "mod_perl style") and a flock remained unreleased on v1 inboxes after resharding. WatchMaildir (used by -watch) already handles caching of the importer object itself, and all of our other real-world uses of ->importer are short-lived or designed for batch scripts, so there's no need to cache the importer result internally.
2019-05-29searchidx: store indexlevel=medium as metadata
And use it from Admin. It's easy to tell what indexlevel=basic is from unconfigured inboxes, but distinguishing between 'medium' and 'full' would require stat()-ing position.* files which is fragile and Xapian-implementation-dependent. So use the metadata facility of Xapian and store it in the main partition so Admin tools can deal better with unconfigured inboxes copied using generic tools like cp(1) or rsync(1).
2019-05-29index: remove warning on unconfigured inboxes
It's annoying for people using "git fetch && public-inbox-index" as one user; and running -httpd/-nntpd as a different user (where users see different config files).
2019-05-29index: support --verbose option
It doesn't implement progress of batches, yet, but it wires up the parsing of the command-line while preserving output compatibility. This output is NOT meant to be stable.
2019-05-23doc: various updates to reflect current state
-index documentation avoid redundant v1 information and refers readers to apropriate v1/v2 manpages. Search::Xapian can also be optional, now, as only the PSGI search interface uses it. Favor "INBOX_DIR" where appropriate, since "REPO_DIR" can be confused for code repos which we also support. XAPIAN_FLUSH_THRESHOLD is documented for all relevant bulk commands.
2019-05-23admin: move index_inbox over
We will be reindexing after copydatabase
2019-05-23admin: hoist out resolve_inboxes for -compact and -index
Both of these index-affecting commands should work similarly on the command-line. public-inbox-index no longer complains about unconfigured ~/.public-inbox/config; but often I found myself being annoyed by that, anyways...
2019-05-15admin: improve warnings and errors for missing modules
Since we lazy-load Xapian now, some errors may become more cryptic or buried. Try to improve that by making Admin show better errors.
2019-05-15lazy load Xapian and make it optional for v2
More tests work without Search::Xapian, now. Usability issues still need to be fixed
2019-05-14v2writable: allow setting nproc via creat options
Avoiding reliance on environment variables is a bit cleaner for writing tests
2019-05-06index: warn with info about the message as context
This can help users track down the source of warnings when presented with imperfect emails. While we're at it, make the __WARN__ callback in t/v2writable.t a no-op since we don't check for warnings, there.
2019-01-31Merge remote-tracking branch 'origin/purge'
* origin/purge: implement public-inbox-purge tool v2writable: read epoch on purge v2writable: cleanup processes when done v2writable: purge ignores non-existent git epoch directories v2writable: ->purge returns undef on no-op import: purge: reap fast-export process hoist out resolve_repo_dir from -index
2019-01-15index: allow working on unconfigured inboxes, again
2019-01-11hoist out resolve_repo_dir from -index
We'll be using it in future admin tools, and making this easier-to-test.
2019-01-02use PublicInbox::Config::each_inbox where appropriate
No need to reach into PublicInbox::Config internals and iterate through the hashref by hand
2018-07-18index: avoid false-positive warning on off-by-one
We subtract one from "jobs" to map to "partitions" to account for the overview index and git fast-import jobs.
2018-05-17index: avoid setting NPROC to undef
This quiets a warning inside Spawn.pm
2018-04-07index: allow specifying --jobs=0 to disable multiprocess
Not everybody needs multiprocess support.
2018-04-04v2: support incremental indexing + purge
This is important for people running mirrors via "git fetch", as they need to be kept up-to-date. Purging is also now supported in mirrors. The short-lived "--regenerate" option is gone and is now implicitly enabled as a result. It's still cheap when article number regeneration is unnecessary, as we track the range for each git repository.
2018-03-22v2writable: add NNTP article number regeneration support
Allow best-effort regeneration of NNTP article numbers from cloned git repositories in addition to indexing Xapian Article numbers will not remain consistent when we add purge support, though.
2018-03-22v2writable: support reindexing Xapian
This still requires a msgmap.sqlite3 file to exist, but it allows us to tweak Xapian indexing rules and reindex the Xapian database online while -watch is running.
2018-03-19index: s/GIT_DIR/REPO_DIR/
No functional changes, yet, but this makes future changes easier-to-read.
2018-02-07update copyrights for 2018
Using update-copyrights from gnulib While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2016-11-18index: allow indexing before configuration
One may build the initial index on a powerful host and transfer it to a weaker one for incremental indexing. Thus there is no requirement to have a configured public-inbox for building the index unless a user needs altid support or some such.
2016-08-11search: support alt-ID for mapping legacy serial numbers
For some existing mailing list archives, messages are identified by serial number (such as NNTP article numbers in gmane). Those links may become inaccessible (as is the current case for gmane), so ensure users can still search based on old serial numbers. Now, I run the following periodically to get article numbers from gmane (while news.gmane.org remains): NNTPSERVER=news.gmane.org export NNTPSERVER GROUP=gmane.comp.version-control.git perl -I lib scripts/xhdr-num2mid $GROUP --msgmap=/path/to/gmane.sqlite3 (I might integrate this further with public-inbox-* scripts one day). My ~/.public-inbox/config as an added "altid" snippet which now looks like this: [publicinbox "git"] address = git@vger.kernel.org mainrepo = /path/to/git.vger.git newsgroup = inbox.comp.version-control.git ; relative pathnames expand to $mainrepo/public-inbox/$file altid = serial:gmane:file=gmane.sqlite3 And run "public-inbox-index --reindex /path/to/git.vger.git" periodically. This ought to allow searching for "gmane:12345" to work for Xapian-enabled instances. Disclaimer: while public-inbox supports NNTP and stable article serial numbers, use of those for public links is discouraged since it encourages centralization.
2016-07-31search: support reindexing existing search indices
This should make tweaking the way we search more efficiet by allowing us to avoid doubling destroying the index every time we want to change something. We also give priority to incremental indexing via public-inbox-{watch,mda} and have manual invocations of public-inbox-index perform batch updates while releasing ssoma.lock.
2016-04-28import: run git-update-server-info when done
We should update $GIT_DIR/info/refs for dumb HTTP clients whenever we make changes to the repository. The best place to update is immediately after making commits. This fixes a bug where public-inbox-learn did not properly update $GIT_DIR/info/refs after inserting or removing messages.
2016-02-27move executables to script/ directory
This seems to match more closely with what is expected of Perl packages based on how blib is used. Hopefully makes the top-level source tree less cluttered and things easier-to-find.