path: root/lib/PublicInbox/Xapcmd.pm
Date Commit message
2020-01-27 xapcmd: increase scope of lock
The old lock scope was only sufficient for protecting against concurrent modifications from the common -mda, -watch, or -learn writers. It was not sufficient for protecting against parallel -compact or -xcpdb invocations from eager admins. Most of the time this only leads to confusing and misleading warning messages, but parallel xcpdb --reshard could lead to errors.
2020-01-27 inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
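A minimal sketch of the idea, assuming the inbox object is a hash ref with an optional {version} field that defaults to 1. The package name Demo::Inbox is hypothetical and is not the project's actual code:

```perl
use strict;
use warnings;
use 5.010; # the "//" defined-or operator needs Perl 5.10+

package Demo::Inbox; # hypothetical stand-in, not PublicInbox::Inbox

sub new {
    my ($class, %opt) = @_;
    return bless { %opt }, $class;
}

# One place applies the default, so callers never repeat "// 1".
sub version { $_[0]->{version} // 1 }

package main;

my $v1 = Demo::Inbox->new;               # no version configured
my $v2 = Demo::Inbox->new(version => 2);
print "v1 inbox\n" if $v1->version == 1;
print "v2 inbox\n" if $v2->version == 2;
```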
2020-01-13 xapcmd: use popen_rd for running xapian-compact
The public-inbox-compact wrapper displays progress by default anyway, and there's not a lot of output, so simplify our code by using popen_rd instead of spawn + optional pipe. While we're at it, use "while (<HANDLE>)" to display progress as it happens, since "foreach (<$HANDLE>)" slurps the contents into an array first.
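The difference between the two loop forms can be sketched as follows. This is an illustration only: an in-memory file handle stands in for the pipe that popen_rd would return, and the sample lines are made up.

```perl
use strict;
use warnings;

# An in-memory handle stands in for the pipe from a child process.
my $output = "line 1\nline 2\nline 3\n";
open(my $rd, '<', \$output) or die "open: $!";

my @seen;
# Scalar context: readline returns one line per iteration, so each
# line can be reported as soon as it is available.
while (my $line = <$rd>) {
    chomp $line;
    push @seen, $line;
    print STDERR "progress: $line\n";
}
close $rd;

# By contrast, "foreach (<$rd>)" evaluates <$rd> in list context and
# reads every remaining line into a temporary list before looping.
```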
2020-01-06 treewide: "require" + "use" cleanup and docs
There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2019-12-30 spawn: allow passing GLOB handles for redirects
We can save callers the trouble of {-hold} and {-dev_null} refs as well as the trouble of calling fileno().
2019-12-24 search: support SWIG-generated Xapian.pm
Xapian upstream is slowly phasing out the XS-based Search::Xapian in favor of the SWIG-generated "Xapian" package. While both Debian and FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian" binding. More information about the status of the "Xapian" Perl module is available here: https://trac.xapian.org/ticket/523
2019-11-24 xapcmd: replace Xtmpdirs with File::Temp->newdir
Since we're using Perl 5.10.1 and File::Temp 0.19+, we don't need Xtmpdirs at all for cleaning up tempdirs on failure and can just rely on the DESTROY handler provided by File::Temp.
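A short sketch of the cleanup behavior being relied on, assuming the default CLEANUP setting of File::Temp->newdir. The template name and the scratch file are hypothetical:

```perl
use strict;
use warnings;
use File::Temp 0.19; # ->newdir first appeared in 0.19

my $path;
{
    # CLEANUP defaults to true for objects returned by ->newdir
    my $tmp = File::Temp->newdir('xapcmd-demo-XXXXXX', TMPDIR => 1);
    $path = $tmp->dirname;
    open(my $fh, '>', "$path/scratch") or die "open: $!";
    close $fh or die "close: $!";
    die "tempdir missing" unless -d $path;
} # $tmp goes out of scope here; its DESTROY handler removes the tree

print "removed\n" unless -e $path;
```

Because the removal is tied to the object's lifetime, no separate bookkeeping of temporary directories is needed for the failure path.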
2019-11-16 xapcmd: do not fire END and DESTROY handlers in child
We need to bypass whatever Test::More does with END/DESTROY handlers for use in long-lived processes. This doesn't affect any of our normal code since we don't use END/DESTROY for Xapcmd and its callers.
2019-11-16 inboxwritable: add ->cleanup method
We've been using this in -edit, and will be using it in some more scripts and tests to optimize for run_mode=2 with run_script. Keeping this in the *Writable modules since I don't see it being useful for the WWW and NNTP read-only interfaces which use PublicInbox::Inbox.
2019-11-14 inboxwritable: drop {-importer} cyclic reference
InboxWritable caching the result of ->importer leads to a circular reference, since the returned (V2Writable|Import) object holds onto the calling InboxWritable object. With public-inbox-watch, this leads to a memory leak if a user is reloading via SIGHUP after a message is imported (it would only become noticeable with SIGHUPs after every message imported). I would not expect anybody to notice this in real-world usage. I only noticed this since I was making -xcpdb suitable for long-lived process use (e.g. "mod_perl style") and a flock remained unreleased on v1 inboxes after resharding. WatchMaildir (used by -watch) already handles caching of the importer object itself, and all of our other real-world uses of ->importer are short-lived or designed for batch scripts, so there's no need to cache the importer result internally.
2019-11-14 xapcmd: localize %SIG changes using "local"
Perl's "local" allows changes to %SIG (and %ENV) to be limited to its enclosing block. This allows us to get rid of a global variable and ad-hoc method for restoring signal handlers.
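A minimal sketch of the scoping behavior, not the module's actual code. The sub name run_with_handler and the choice of SIGUSR1 are assumptions made for illustration:

```perl
use strict;
use warnings;

$SIG{USR1} = 'IGNORE'; # pre-existing, process-wide setting

my $inside;
sub run_with_handler {
    my ($cb) = @_;
    # In effect only until this sub returns, including when it dies.
    local $SIG{USR1} = sub { print STDERR "got USR1\n" };
    $inside = ref($SIG{USR1}); # a CODE ref while the sub is running
    $cb->();
}

run_with_handler(sub { 1 });

# The previous value is restored automatically; no global variable
# or explicit restore step is needed.
print "restored\n" if $SIG{USR1} eq 'IGNORE';
```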
2019-10-16 config: support "inboxdir" in addition to "mainrepo"
"mainrepo" was a bad name and an artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-06-14 xapcmd: favor 'shard' over 'part' in local variables
Yet another step to keeping our naming consistent with Xapian terminology.
2019-06-14 xapcmd: update comments referencing "partitions"
Don't confuse future readers of our code.
2019-06-14 v2writable: rename {partitions} field to {shards}
Our internal data structure should be consistent with Xapian terminology.
2019-06-14 v2writable: count_partitions => count_shards
Another step towards becoming consistent with Xapian terminology.
2019-06-14 admin|xapcmd: user-facing messages say "shard"
We're slowly getting rid of the word "partition" in order to remain consistent with Xapian docs.
2019-06-14 xcpdb: support resharding v2 repos
v2 repos are sometimes created on machines where CPU parallelization exceeds the capability of the storage devices. In that case, users may reshard the Xapian DB to any smaller, positive integer to avoid excessive overhead and contention when bottlenecked by slow storage. Resharding can also be used to increase shard count after hardware upgrades.
2019-06-14 xcpdb: use destination shard as progress prefix
For M:N resharding, we'll want to display the new shard number.
2019-06-14 xapcmd: preserve indexlevel based on the destination
To support M:N resharding, we need to ensure we store the indexlevel in the destination shard, rather than the originating one.
2019-06-04 require ASCII digits for local FS items
In case some BOFH decides to randomly create directories using non-ASCII digits all over the place.
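The underlying Perl behavior can be illustrated as follows: on character strings, "\d" matches any Unicode decimal digit, while "[0-9]" matches only ASCII digits. The helper names is_ascii_num and is_any_num are hypothetical and exist only for this sketch:

```perl
use strict;
use warnings;

# "\d" can match any Unicode decimal digit in a character string;
# "[0-9]" matches only the ten ASCII digits.
my $ascii  = "15";
my $arabic = "\x{0661}\x{0665}"; # ARABIC-INDIC DIGIT ONE and FIVE

sub is_ascii_num { $_[0] =~ /\A[0-9]+\z/ ? 1 : 0 }
sub is_any_num   { $_[0] =~ /\A\d+\z/ ? 1 : 0 }

printf("ascii:  [0-9]=%d \\d=%d\n",
       is_ascii_num($ascii), is_any_num($ascii));
printf("arabic: [0-9]=%d \\d=%d\n",
       is_ascii_num($arabic), is_any_num($arabic));
```

A directory name made of non-ASCII digits would pass the "\d" check but be rejected by the "[0-9]" check.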
2019-05-29 searchidx: store indexlevel=medium as metadata
And use it from Admin. It's easy to tell what indexlevel=basic is from unconfigured inboxes, but distinguishing between 'medium' and 'full' would require stat()-ing position.* files which is fragile and Xapian-implementation-dependent. So use the metadata facility of Xapian and store it in the main partition so Admin tools can deal better with unconfigured inboxes copied using generic tools like cp(1) or rsync(1).
2019-05-29 index: support --verbose option
It doesn't implement progress of batches, yet, but it wires up the parsing of the command-line while preserving output compatibility. This output is NOT meant to be stable.
2019-05-23 xapcmd: do not reset %SIG until last Xtmpdir is done
To properly handle compact tmpdir cleanup in single process situations, we need to carefully account for Xtmpdir not being a singleton and ensure we don't clobber signal handlers which belong to other Xtmpdirs.
2019-05-23 xcpdb|compact: support --jobs/-j flag like gmake(1)
We don't have to be tied to the number of partitions in case we made a bad choice at initialization. This doesn't affect reindexing, but the copying phase is already intensive. And optimize away the extra process when we only have a single job which won't parallelize. The wording for the (v2) reindexing phase could be improved, later. I also plan to allow repartitioning of existing Xapian DBs.
2019-05-23 xapcmd: cleanup on interrupted xcpdb "--compact"
We should not have leftover junk on interrupted invocations.
2019-05-23 xcpdb|compact: support some xapian-compact switches
Allow users to specify the --blocksize <B>, --no-full, and --fuller options for xapian-compact(1) for fine-tuning compact behavior for low-traffic/inactive inboxes. We won't support --multipass, since it doesn't seem compatible with our requirement to use --no-renumber. We also won't support --single-file, since it only seems intended for totally dead inboxes; and it doesn't seem worth the support overhead when "totally dead" turns out to be a misdiagnosis.
2019-05-23 compact: reuse infrastructure from xcpdb
Since -xcpdb is a superset of -compact, we can reuse much of that code used for driving compact. For compact (only), this is slightly less memory efficient since it requires an extra process per-partition, but we get to prefix the output with the partition name for more readable output.
2019-05-23 xcpdb: remove temporary directories on aborts
Clean up temporary directories on common termination signals (INT, HUP, PIPE, TERM), but only if they are not in the process of being committed via a rename() sequence.
2019-05-23 xcpdb: show re-indexing progress
Emit information about reindexing git revision ranges when used with xcpdb. Additionally, distinguish Xapian copy output from v2 git epoch counting by increasing directory context info. For now, v1 batches are emitted. v2 indexing is still missing progress reporting for batches, as the data structures for reindexing would benefit from a refactoring, first. This does not currently affect the use of public-inbox-index, but may in the future.
2019-05-23 xapcmd: use "print STDERR" for progress reporting
`warn' is reserved for actual warnings, as it respects $SIG{__WARN__} and we rely on that override to print message context information when we are indexing.
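The distinction can be demonstrated with a small sketch. The handler and messages below are made up for illustration and do not reflect the project's real override:

```perl
use strict;
use warnings;

my @warnings;
{
    # A handler like this might add context to real warnings.
    local $SIG{__WARN__} = sub { push @warnings, "ctx: $_[0]" };

    warn "something is actually wrong\n"; # goes through the handler
    print STDERR "50% done\n";            # bypasses the handler
}

print scalar(@warnings), " message(s) reached the handler\n";
```

Only the warn call is seen by the handler, so progress output printed directly to STDERR does not pick up the extra message context.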
2019-05-23 xapcmd: avoid EXDEV when finalizing changes
By creating temporary directories as deep as possible, we can allow v2 repositories to have `xap$SCHEMA_VERSION' (e.g. `xap15') reside on a separate FS. We also check st_dev ahead-of-time to avoid doing work which will fail with EXDEV. Of course, another process may still move/change things around.
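An ahead-of-time st_dev comparison might look like the sketch below. The same_fs helper is hypothetical, and the directory layout is only a stand-in for a real inbox:

```perl
use strict;
use warnings;
use File::Temp 0.19;

# True if both paths are on the same device, so a rename(2) between
# them should not fail with EXDEV (barring concurrent changes).
sub same_fs {
    my ($x, $y) = @_;
    my @sx = stat($x) or die "stat($x): $!";
    my @sy = stat($y) or die "stat($y): $!";
    return $sx[0] == $sy[0] ? 1 : 0; # element 0 of stat() is st_dev
}

my $top = File::Temp->newdir;         # stand-in for the inbox dir
my $sub = $top->dirname . '/xap15';   # stand-in for the Xapian dir
mkdir($sub) or die "mkdir($sub): $!";

print "rename within $sub is safe from EXDEV\n"
    if same_fs($top->dirname, $sub);
```

As the commit message notes, such a check is only advisory, since another process may still move things around afterwards.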
2019-05-23 xcpdb: cleanup error handling and diagnosis
Running a full "public-inbox-index --reindex" in parallel with "public-inbox-xcpdb" on the same inbox can still cause problems, though.
2019-05-23 xcpdb: implement progress reporting
Copying an entire Xapian DB is horribly slow whether it's done via Perl or copydatabase(1). So displaying some progress indication is good for user experience. While we're at it, prefix xapian-compact output, too; since parallel processes end up clobbering each other.
2019-05-23 xcpdb: use fine-grained locking
Copying an entire Xapian DB takes a long time, so update our reindexing code to support partial reindexing, snapshot the pre-copydatabase git revisions, perform the lengthy copy, and do a partial reindex when the copy + renames are done.
2019-05-23 xapcmd: xcpdb supports compaction
To minimize the delay on active inboxes, it's actually ideal to run xapian-compact at the end of the per-partition cpdb process, since the new DB isn't accessible yet and so we don't have to deal with lock contention with -mda or -watch processes. The downside is the temporary file overhead required (3x instead of 2x).
2019-05-23 xcpdb: implement using Perl bindings
By avoiding copydatabase(1) entirely, we can make further changes to avoid locking the entire inbox for a long operation and switch to fine-grained locking.
2019-05-23 xapcmd: do not cleanup on errors
We move the old directory into the new directory, so avoid the situation where a bug or error could cause the tempdir cleanup to run and destroy both our old and new directories.
2019-05-23 xapcmd: support spawn options
copydatabase(1) is exceptionally noisy and its output is confusing when run in parallel. Support redirects at least, and env while we're at it to give us future options. We can also stuff a -jobs parameter into the options to limit parallelism since it can be useful for low-priority upgrade jobs.
2019-05-23 xapcmd: new module for wrapping Xapian commands
Port public-inbox-compact(1) over to using it, and we will need to wrap copydatabase(1) to ease glass migrations, too.