about summary refs log tree commit homepage
path: root/script/public-inbox-xcpdb
DateCommit message (Collapse)
2023-05-04xcpdb: support cindex upgrades and resharding
xcpdb is necessary for upgrading Xapian backends (e.g. glass to honey), thus codesearch indices (cindex) must be supported. Resharding is also useful if CPU count is altered on system upgrades or downgrades. cindex Xapian sharding is completely different than anything else we do, so the resharding operation must be a special case based on existing cindex sharding rules.
2021-09-15support -C (chdir) for most non-daemon commands
Because make(1), git(1), tar(1) all support -C in this form, as do our newer commands such as lei, public-inbox-{clone,fetch}.
2021-07-31extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-07-28treewide: s/sequential_shard/sequential-shard/g
The underscore variant was never documented and maintaining the difference between the command-line and internal hash is not worth it.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-09-02script/*: fold $usage into $help, support `-h' instead of -?
`-h' doesn't conflict with anything, and some users (including git users) may be more accustomed to using it rather than the rarely-seen-outside-of-Getopt::Long `-?' switch. We can also rely on the GetOptions() function to emit a proper error message instead of just "bad command-line args".
2020-08-14index|compact|xcpdb: support --all switch
For -index, this is a convenient way to quickly index all inboxes after a grok-pull. Might as well support it for rarely used commands like -compact and -xcpdb, too.
2020-08-13xcpdb: wire up new index options and --help
--sequential-shard also disables the copy parallelism (--jobs), so it can be useful for systems unable to handle parallel random I/O but still want many shards. There was a missing "use strict", too, which is fixed.
2020-08-13xcpdb: support --no-fsync from CLI
This was omitted in 8b1950055d51d436 :x Fixes: 8b1950055d51d436 ("index+xcpdb: rename `--no-sync' to `--no-fsync'")
2020-07-25index+xcpdb: support --no-sync flag
This allows us to speed up indexing operations to SQLite and Xapian. Unfortunately, it doesn't affect operations using `xapian-compact' and the compactor API, since that doesn't seem to support Xapian::DB_NO_SYNC, yet.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2019-06-14xcpdb: support resharding v2 repos
v2 repos are sometimes created on machines where CPU parallelization exceeds the capability of the storage devices. In that case, users may reshard the Xapian DB to any smaller, positive integer to avoid excessive overhead and contention when bottlenecked by slow storage. Resharding can also be used to increase shard count after hardware upgrades.
2019-05-24doc: xcpdb: add switch documentation
In particular, the '--compact' switch is really useful since it works without holding the inbox-wide lock for minutes at a time on giant inboxes (inboxes where copies can take dozens, if not hundreds of minutes).
2019-05-23xcpdb|compact: support some xapian-compact switches
Allow users to specify the --blocksize <B>, --no-full, --fuller options for xapian-compact(1) for fine-tuning compact behavior for low-traffic/inactive inboxes. We also won't support --multipass, since it doesn't seem compatible with our requirement to use --no-renumber. We also won't support --single-file, since it only seems intended for totally dead inboxes; and it doesn't seem worth the support overhead when "totally dead" turns out to be a misdiagnosis.
2019-05-23compact: reuse infrastructure from xcpdb
Since -xcpdb is a superset of -compact, we can reuse much of that code used for driving compact. For compact (only), this is slightly less memory efficient since it requires an extra process per-partition, but we get to prefix the output with the partition name for more readable output.
2019-05-23xcpdb: implement progress reporting
Copying an entire Xapian DB is horribly slow whether it's done via Perl or copydatabase(1). So displaying some progress indication is good for user experience. While we're at it, prefix xapian-compact output, too; since parallel processes end up clobbering each other.
2019-05-23xapcmd: xcpdb supports compaction
To minimize the delay on active inboxes, it's actually ideal to run xapian-compact at the end of the per-partition cpdb process; since the new DB isn't accessible yet and so we don't have to deal with lock contention with -mda or -watch processes. The downside is temporary file overhead (3x instead of 2x) required.
2019-05-23xcpdb: implement using Perl bindings
By avoid copydatabase(1) entirely, we can make further changes to avoid locking the entire inbox for a long operation and switch to fine-grained locking.
2019-05-23xcpdb: new tool which wraps Xapian's copydatabase(1)
copydatabase(1) is an existing Xapian tool which is the recommended way to upgrade existing DBs to the latest Xapian database format (currently "glass" for stable/released versions). Our use of Xapian relies on preserving document IDs, so we'll wrap it like we do xapian-compact(1) and use the "--no-renumber" switch. I could not name the tool "public-inbox-copydatabase" since it would be ambiguous as to which DB it's actually copying. So, I abbreviated the suffix to "xcpdb" (Xapian CoPy DataBase), which I hope is acceptable and unambiguous.