about summary refs log tree commit homepage
path: root/lib/PublicInbox/Xapcmd.pm
DateCommit message (Collapse)
2021-09-23xcpdb: avoid race when shards are added
It's possible for the rename() sequence to cause read-only daemons using ->xdb_shards_flat to load an incomplete set of contiguous shards and get invalid docids for search results. With this change, we favor the case where search is momentarily unavailable rather than giving wrong results during the small window where Xapcmd->commit_changes runs.
2021-09-23xcpdb: -R$SHARDS creates new shards with correct perms
"Correct" meaning the permissions match that of the parent xap15 or ei15 directory.
2021-09-22treewide: fix %SIG localization, harder
This fixes the occasional t/lei-sigpipe.t infinite loop under "make check-run". Link: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
2021-08-11treewide: use *nix-specific dirname regexps
None of our code elsewhere accounts for non-*nix pathnames and it's not worth our time to start. So stop wasting CPU cycles giving the illusion that we'd care about non-*nix pathnames.
2021-08-08searchidx: die on Xapian load errors
Xapian bindings may not be installed or be out-of-date w.r.t. the Perl version, improve the visibility of errors in those cases. Cleanup and drop some redundant checks while we're at it. Cc: "Toke Høiland-Jørgensen" <toke@toke.dk> Link: https://public-inbox.org/meta/87k0ky5mbd.fsf@toke.dk/
2021-07-31extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-03-28treewide: shorten temporary filename
File::Temp only requires four 'X' characters (unlike mkstemp(3), which requires six). So only so only give it 4 to avoid an 80-column violation and maybe save metadata space on FSes.
2021-02-07xapcmd: avoid potential die surprise in children
Make some notes about sub usage, this may be converted to use workqueues once the cmsg dependency is dropped.
2021-01-24treewide: reseed RNG in child processes
This prevents name conflicts leading to retries and slowdowns in temporary file name generation. No actual data corruption resulted because all temporary files are opened with O_EXCL anyways. This may increase security for IMAP, NNTP, and HTTPS sessions using TLS, but it's all public data anyways.
2021-01-01update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-25inboxwritable: delay umask_prepare calls
This simplifies all ->with_umask callers and opens the door for further optimizations to delay/elide process spawning.
2020-12-17inbox: simplify v2 epoch counting
Perl readdir detects list context and can return an array suitable for the grep op. From there, we can rely on substr to remove the ".git" suffix and integerize the value to save a few bytes before letting List::Util::max return the value. This is how we detect Xapian shards nowadays, too, and we'll also use defined-or (//) to simplify the return value there. We'll also simplify InboxWritable->git_dir_latest, remove some callers, and consider removing it entirely.
2020-11-07v2: some changes for ExtSearchIdx compatibility
We'll be using per-sync-state {ibx} refs instead, so make parts of the v2 indexing code less-dependent on $self->{ibx} where $self is a V2Writable object.
2020-08-27over: rename ->connect method to ->dbh
`->connect' is confused with the perlfunc for the `connect(2)' syscall, and also `DBI->connect'. Since SQLite doesn't use sockets, the word "connect" needlessly confuses me. Give it a short name to match the field name we use for it, which also matches the variable name used by the DBI(3pm) and DBD::SQLite(3pm) manpages.
2020-08-20xapcmd: simplify {reindex} parameter passing
No need to localize it, here, since we can just refer to it in the `$opt' hashref. Hopefully this improves readability for others like it does for me. I sometimes wonder if the concept of a stack in high-level languages is even necessary...
2020-08-13xcpdb: wire up new index options and --help
--sequential-shard also disables the copy parallelism (--jobs), so it can be useful for systems unable to handle parallel random I/O but still want many shards. There was a missing "use strict", too, which is fixed.
2020-08-13xapcmd: reduce CPU idling when shards exceeds job count
In case there's unbalanced shards AND we're limiting parallelism while using many shards, spawn the next task in the queue ASAP once a task is done, instead of waiting for all tasks to finish before spawning the next batch. Unbalanced shards probably isn't a big issue for most users; however many smaller shards with few jobs can be useful for HDD users to reduce the effect of random writes.
2020-08-13xapcmd: simplify sub reference
We don't need to fully-qualify when referring to subs in the same namespace, nor do we need make a SCALAR ref only to dereference it (Yes, still learning Perl :x)
2020-08-10index+xcpdb: improve SIG{INT,TERM,HUP,PIPE} behavior
-index now invokes ->DESTROY like xcpdb does, which is necessary to cleanup $INBOX_DIR/msgmap-XXXXXXX files. We'll also exit with the expected values for various signals by adding 128 as described in <https://www.tldp.org/LDP/abs/html/exitcodes.html> -xcpdb now terminates worker processes and xapian-compact(1) invocations when prematurely killed, too.
2020-08-08support setting No_COW on Perl <5.22
fileno(DIRHANDLE) only works on Perl 5.22+, so we need to use dirfd(3) ourselves from Inline::C (or rely on chattr(1) being installed). While we're at it, rename `set_nodatacow' to `nodatacow_fd' for consistency with `nodatacow_dir'.
2020-08-07index+xcpdb: rename `--no-sync' to `--no-fsync'
We'll continue supporting `--no-sync' even if its yet-to-make it it into a release, but the term `sync' is overloaded in our codebase which may be confusing to new hackers and users. None of our our code nor dependencies issue the sync(2) syscall, either, only fsync(2) and fdatasync(2).
2020-08-07xapcmd: drop outdated comment
We replaced Xtmpdir with File::Temp->newdir in commit 2a3e3a0469f54f6a4f80bf04614e5ddd794a6c5e ("xapcmd: replace Xtmpdirs with File::Temp->newdir") but forgot to remove the outdated comment.
2020-08-07xapcmd: remove redundant searchidx require
We already "use" it starting with commit cd8dd7b08fddc7c2b5f218c3fcaa5dca5f9ad945 ("search: support SWIG-generated Xapian.pm"), so there's no need to require it redundantly.
2020-08-07xapcmd: quietly no-op on indexlevel=basic
I find myself mindlessly adding "-c" to public-inbox-index, and other users may do the same. Instead of erroring out, we'll just silently ignore it, for now and allow public-inbox-compact to work on SQLite-only inboxes. We'll only check for xapian-compact if search exists, since it won't be needed in case we support SQLite VACUUM.
2020-07-29xapcmd: -xcpdb and -compact disable CoW, too
This gives an opportunity for users already suffering from CoW fragmentation to at least get the Xapian DBs off CoW. Aside from over.sqlite3 in v1, the SQLite DBs remain untouched; though VACUUM support may come in the future.
2020-07-26index: --compact respects --jobs
And -compact supports --jobs=0 like -index to disable parallel execution. Running three xapian-compact processes in parallel on a USB 2.0 HDD is pretty painful.
2020-07-25index+xcpdb: support --no-sync flag
This allows us to speed up indexing operations to SQLite and Xapian. Unfortunately, it doesn't affect operations using `xapian-compact' and the compactor API, since that doesn't seem to support Xapian::DB_NO_SYNC, yet.
2020-07-25xapcmd: set {from} properly for v1 inboxes
This was a bug, but I'm not sure where it matters, yet, but it may matter in the future.
2020-07-17with_umask: pass args to callback
While it makes the code flow slightly less well in some places, it saves us runtime allocations and indentation.
2020-07-14xapcmd: delay over->check_inodes trigger
We must not trigger wakeups on InboxIdle users until after we've renamed all files into place. Otherwise, the InboxIdle caller may just reopen the old (soon-to-be-unlinked) file. This fixes occasional test failures in t/nntpd.t Fixes: f977826a17f8735e ("lock: reduce inotify wakeups")
2020-06-25lock: reduce inotify wakeups
We can reduce the amount of platform-specific code by always relying on IN_MODIFY/NOTE_WRITE notifications from lock release. This reduces the number of times our read-only daemons will need to wake up when -watch sees no-op message changes (e.g. replied, seen, recent flag changes).
2020-04-15testcommon: DESTROY: wait for killed daemon
Otherwise, the waitpid(-1, 0) call in Xapcmd::process_queue() may reap it in a subsequent test when using t/run.perl to reuse processes for testing. While we're at it, make Xapcmd::process_queue warn about unknown PIDs in case other PIDs leak through to us in the future.
2020-03-29index: support --compact / -c on command-line
It's more convenient to specify `-c' / `--compact' on the command-line when reindexing than it is to invoke public-inbox-compact(1) separately. This is especially convenient in low-space situations when public-inbox-index is operating on multiple inboxes sequentially, as compaction can happen immediately after indexing each inbox, instead of waiting until all inboxes are indexed.
2020-02-06treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27xapcmd: increase scope of lock
The old lock scope was only sufficient for protecting against concurrent modifications from the common -mda, -watch, or -learn writers. It was not sufficient for protecting against parallel -compact or -xcpdb invocations from eager admins. Most of the time this only leads to confusing and misleading warning messages, but parallel xcpdb --reshard could lead to errors.
2020-01-27inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2020-01-13xapcmd: use popen_rd for running xapian-compact
public-inbox-compact wrapper displays progress by default, anyways, and there's not a lot of output, so simplify our code by using popen_rd instead of spawn + optional pipe. While we're at it use "while (<HANDLE>)" to display progress as it happens, since "foreach (<$HANDLE>)" slurps the contents into an array, first.
2020-01-06treewide: "require" + "use" cleanup and docs
There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it it gives us extra compile-time checking.
2019-12-30spawn: allow passing GLOB handles for redirects
We can save callers the trouble of {-hold} and {-dev_null} refs as well as the trouble of calling fileno().
2019-12-24search: support SWIG-generated Xapian.pm
Xapian upstream is slowly phasing out the XS-based Search::Xapian in favor of the SWIG-generated "Xapian" package. While Debian and both FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian" binding. More information about the status of the "Xapian" Perl module here: https://trac.xapian.org/ticket/523
2019-11-24xapcmd: replace Xtmpdirs with File::Temp->newdir
Since we're using Perl 5.10.1 and File::Temp 0.19+, we don't need Xtmpdirs at all for cleaning up tempdirs on failure and can just rely on the DESTROY handler provided by File::Temp.
2019-11-16xapcmd: do not fire END and DESTROY handlers in child
We need to bypass whatever Test::More does with END/DESTROY handlers for use in lon-lived process. This doesn't affect any of our normal code since we don't use END/DESTROY for Xapcmd and its callers.
2019-11-16inboxwritable: add ->cleanup method
We've been using this in -edit, and will be using it in some more scripts and tests to optimize for run_mode=2 with run_script. Keeping this in the *Writable modules since I don't see it being useful for the WWW and NNTP read-only interfaces which use PublicInbox::Inbox.
2019-11-14inboxwritable: drop {-importer} cyclic reference
InboxWritable caching the result of ->importer leads to a circular references with returned (V2Writable|Import) object holds onto the calling InboxWritable object. With public-inbox-watch, this leads to a memory leak if a user is reloading via SIGHUP after a message is imported (it would only become noticeable with SIGHUPs after every message imported). I would not expect anybody to to notice this in real-world usage. I only noticed this since I was making -xcpdb suitable for long-lived process use (e.g. "mod_perl style") and a flock remained unreleased on v1 inboxes after resharding. WatchMaildir (used by -watch) already handles caching of the importer object itself, and all of our other real-world uses of ->importer are short-lived or designed for batch scripts, so there's no need to cache the importer result internally.
2019-11-14xapcmd: localize %SIG changes using "local"
Perl's "local" allows changes to %SIG (and %ENV) to be limited to its enclosing block. This allows us to get rid of a global variable and ad-hoc method for restoring signal handlers.
2019-10-16config: support "inboxdir" in addition to "mainrepo"
"mainrepo" ws a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-06-14xapcmd: favor 'shard' over 'part' in local variables
Yet another step to keeping our naming consistent with Xapian terminology.
2019-06-14xapcmd: update comments referencing "partitions"
Don't confuse future readers of our code.
2019-06-14v2writable: rename {partitions} field to {shards}
Our internal data structure should be consistent with Xapian terminology.
2019-06-14v2writable: count_partitions => count_shards
Another step towards becoming consistent with Xapian terminology