Date | Commit message |
|
`->connect' is confused with the perlfunc for the `connect(2)'
syscall, and also `DBI->connect'. Since SQLite doesn't use
sockets, the word "connect" needlessly confuses me. Give
it a short name to match the field name we use for it, which
also matches the variable name used by the DBI(3pm) and
DBD::SQLite(3pm) manpages.
|
|
No need to localize it, here, since we can just refer to it
in the `$opt' hashref. Hopefully this improves readability
for others like it does for me.
I sometimes wonder if the concept of a stack in high-level
languages is even necessary...
|
|
--sequential-shard also disables the copy parallelism (--jobs),
so it can be useful on systems which are unable to handle parallel
random I/O but where many shards are still wanted.
There was a missing "use strict", too, which is fixed.
|
|
In case there are unbalanced shards AND we're limiting parallelism
while using many shards, spawn the next task in the queue ASAP
once a task is done, instead of waiting for all tasks to finish
before spawning the next batch.
Unbalanced shards probably aren't a big issue for most users;
however, many smaller shards with few jobs can be useful for HDD
users to reduce the effect of random writes.
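
To illustrate why spawning ASAP helps with unbalanced shards, here is a
minimal Python simulation (not the project's Perl code). The function
names and the abstract "durations" are hypothetical, and each task is
assumed to occupy exactly one job slot:

```python
import heapq


def batch_makespan(durations, jobs):
    """Total time when each batch of `jobs` tasks must fully finish
    before the next batch is spawned."""
    total = 0
    for i in range(0, len(durations), jobs):
        total += max(durations[i:i + jobs])
    return total


def asap_makespan(durations, jobs):
    """Total time when the next queued task is spawned as soon as
    any job slot becomes free."""
    slots = [0] * jobs  # time at which each slot becomes free (a valid min-heap)
    finish = 0
    for d in durations:
        start = heapq.heappop(slots)  # earliest free slot
        end = start + d
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return finish
```

With one large shard and three small ones (durations 5, 1, 1, 1) and two
jobs, the batch approach idles one slot while the large shard finishes;
the ASAP approach keeps the second slot busy with the small shards.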
|
|
We don't need to fully-qualify when referring to subs in
the same namespace, nor do we need to make a SCALAR ref only
to dereference it.
(Yes, still learning Perl :x)
|
|
-index now invokes ->DESTROY like xcpdb does, which is necessary
to clean up $INBOX_DIR/msgmap-XXXXXXX files. We'll also exit
with the expected values for various signals by adding 128
to the signal number, as described in
<https://www.tldp.org/LDP/abs/html/exitcodes.html>.
-xcpdb now terminates worker processes and xapian-compact(1)
invocations when prematurely killed, too.
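
The "128 + signal number" convention can be sketched in Python as
follows. This is only an illustration of the arithmetic, not the Perl
code in -index or -xcpdb; the function name is hypothetical:

```python
import signal


def exit_status_for_signal(signum):
    """Conventional shell exit status for a process terminated by
    `signum`: 128 plus the signal number."""
    return 128 + int(signum)
```

On Linux and other systems where SIGINT is 2 and SIGTERM is 15, this
gives the familiar 130 and 143.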
|
|
fileno(DIRHANDLE) only works on Perl 5.22+, so we need to use
dirfd(3) ourselves from Inline::C (or rely on chattr(1) being
installed).
While we're at it, rename `set_nodatacow' to `nodatacow_fd'
for consistency with `nodatacow_dir'.
|
|
We'll continue supporting `--no-sync' even though it has yet to
make it into a release, but the term `sync' is overloaded in our
codebase, which may be confusing to new hackers and users.
Neither our code nor our dependencies issue the sync(2) syscall,
either, only fsync(2) and fdatasync(2).
|
|
We replaced Xtmpdir with File::Temp->newdir in
commit 2a3e3a0469f54f6a4f80bf04614e5ddd794a6c5e
("xapcmd: replace Xtmpdirs with File::Temp->newdir")
but forgot to remove the outdated comment.
|
|
We already "use" it starting with commit
cd8dd7b08fddc7c2b5f218c3fcaa5dca5f9ad945
("search: support SWIG-generated Xapian.pm"),
so there's no need to require it redundantly.
|
|
I find myself mindlessly adding "-c" to public-inbox-index,
and other users may do the same. Instead of erroring out,
we'll just silently ignore it for now, and allow
public-inbox-compact to work on SQLite-only inboxes.
We'll only check for xapian-compact if search exists, since
it won't be needed in case we support SQLite VACUUM.
|
|
This gives an opportunity for users already suffering from CoW
fragmentation to at least get the Xapian DBs off CoW. Aside
from over.sqlite3 in v1, the SQLite DBs remain untouched; though
VACUUM support may come in the future.
|
|
And -compact supports --jobs=0 like -index to disable parallel
execution. Running three xapian-compact processes in parallel
on a USB 2.0 HDD is pretty painful.
|
|
This allows us to speed up indexing operations to SQLite
and Xapian.
Unfortunately, it doesn't affect operations using
`xapian-compact' and the compactor API, since that doesn't seem
to support Xapian::DB_NO_SYNC, yet.
|
|
This was a bug. I'm not sure where it matters yet, but it
may matter in the future.
|
|
While it makes the code flow slightly less well in some places,
it saves us runtime allocations and indentation.
|
|
We must not trigger wakeups on InboxIdle users until after we've
renamed all files into place. Otherwise, the InboxIdle caller
may just reopen the old (soon-to-be-unlinked) file.
This fixes occasional test failures in t/nntpd.t.
Fixes: f977826a17f8735e ("lock: reduce inotify wakeups")
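
The ordering requirement (rename first, wake up readers second) can be
sketched in Python. This is a generic illustration and not the project's
lock or InboxIdle code; `replace_then_notify` and the `notify` callback
are hypothetical names:

```python
import os
import tempfile


def replace_then_notify(path, data, notify):
    """Write `data` to a temporary file, rename it over `path`, and
    only then call `notify(path)`, so a reader woken by the
    notification reopens the new file rather than the old one."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)  # don't leave the temporary file behind
        raise
    notify(path)  # wake readers only after the rename is done
```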
|
|
We can reduce the amount of platform-specific code by always
relying on IN_MODIFY/NOTE_WRITE notifications from lock release.
This reduces the number of times our read-only daemons will
need to wake up when -watch sees no-op message changes
(e.g. replied, seen, recent flag changes).
|
|
Otherwise, the waitpid(-1, 0) call in Xapcmd::process_queue()
may reap it in a subsequent test when using t/run.perl to reuse
processes for testing.
While we're at it, make Xapcmd::process_queue warn about unknown
PIDs in case other PIDs leak through to us in the future.
|
|
It's more convenient to specify `-c' / `--compact' on the
command-line when reindexing than it is to invoke
public-inbox-compact(1) separately.
This is especially convenient in low-space situations when
public-inbox-index is operating on multiple inboxes
sequentially, as compaction can happen immediately after
indexing each inbox, instead of waiting until all inboxes are
indexed.
|
|
I didn't wait until September to do it, this year!
|
|
The old lock scope was only sufficient for protecting against
concurrent modifications from the common -mda, -watch, or -learn
writers.
It was not sufficient for protecting against parallel -compact
or -xcpdb invocations from eager admins. Most of the time this
only leads to confusing and misleading warning messages, but
parallel xcpdb --reshard could lead to errors.
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
public-inbox-compact wrapper displays progress by default,
anyways, and there's not a lot of output, so simplify our
code by using popen_rd instead of spawn + optional pipe.
While we're at it use "while (<HANDLE>)" to display
progress as it happens, since "foreach (<$HANDLE>)"
slurps the contents into an array, first.
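
The same distinction exists in other languages. As a rough Python
analogue (not the project's popen_rd, and with a hypothetical function
name), iterating over the pipe yields each line as it arrives, whereas
calling readlines() would wait for the child to finish and slurp
everything first:

```python
import subprocess
import sys


def stream_lines(cmd, on_line):
    """Run `cmd` and pass each line of its stdout to `on_line` as soon
    as it is read, instead of collecting all output first."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:  # lazy: one line at a time
            on_line(line.rstrip("\n"))
    # leaving the `with` block closes the pipe and waits for the child
    return proc.returncode


if __name__ == "__main__":
    stream_lines([sys.executable, "-c", "print('compacting...')"], print)
```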
|
|
There's a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it gives
us extra compile-time checking.
|
|
We can save callers the trouble of {-hold} and {-dev_null}
refs as well as the trouble of calling fileno().
|
|
Xapian upstream is slowly phasing out the XS-based Search::Xapian
in favor of the SWIG-generated "Xapian" package. While both Debian
and FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian"
binding.
More information about the status of the "Xapian" Perl module here:
https://trac.xapian.org/ticket/523
|
|
Since we're using Perl 5.10.1 and File::Temp 0.19+, we don't
need Xtmpdirs at all for cleaning up tempdirs on failure and
can just rely on the DESTROY handler provided by File::Temp.
|
|
We need to bypass whatever Test::More does with END/DESTROY
handlers for use in long-lived processes. This doesn't affect
any of our normal code since we don't use END/DESTROY for
Xapcmd and its callers.
|
|
We've been using this in -edit, and will be using it in some
more scripts and tests to optimize for run_mode=2 with
run_script.
Keeping this in the *Writable modules since I don't see it being
useful for the WWW and NNTP read-only interfaces which use
PublicInbox::Inbox.
|
|
InboxWritable caching the result of ->importer leads to a
circular reference, since the returned (V2Writable|Import) object
holds onto the calling InboxWritable object.
With public-inbox-watch, this leads to a memory leak if a user
is reloading via SIGHUP after a message is imported (it would
only become noticeable with SIGHUPs after every message imported).
I would not expect anybody to notice this in real-world
usage. I only noticed this since I was making -xcpdb suitable
for long-lived process use (e.g. "mod_perl style") and a flock
remained unreleased on v1 inboxes after resharding.
WatchMaildir (used by -watch) already handles caching of the
importer object itself, and all of our other real-world uses of
->importer are short-lived or designed for batch scripts, so
there's no need to cache the importer result internally.
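
The shape of the cycle can be reproduced in Python. The class names
below are hypothetical stand-ins, not the real InboxWritable or Import
code. CPython's cycle collector is disabled for the check so that only
reference counting applies, which is closer to how Perl behaves:

```python
import gc
import weakref


class Importer:
    def __init__(self, owner):
        self.owner = owner  # back-reference to the calling inbox


class CachingInbox:
    def importer(self):
        if not hasattr(self, "_im"):
            self._im = Importer(self)  # inbox -> importer -> inbox
        return self._im


class NonCachingInbox:
    def importer(self):
        return Importer(self)  # caller owns the result; no cycle


def survives_without_gc(cls):
    """Return True if an instance of `cls` is still alive after its
    last external reference is dropped (reference counting only)."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        ibx = cls()
        ibx.importer()
        ref = weakref.ref(ibx)
        del ibx
        alive = ref() is not None
    finally:
        if was_enabled:
            gc.enable()
    gc.collect()  # clean up the cycle created for the demonstration
    return alive
```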
|
|
Perl's "local" allows changes to %SIG (and %ENV) to be limited
to its enclosing block. This allows us to get rid of a global
variable and ad-hoc method for restoring signal handlers.
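
For readers unfamiliar with Perl's "local", a loose Python analogue is a
context manager that installs handlers on entry and restores the
previous ones on exit. This is only a sketch with a hypothetical name;
it must run in the main thread, since signal.signal() requires that:

```python
import signal
from contextlib import contextmanager


@contextmanager
def scoped_handlers(handlers):
    """Temporarily install `handlers` (a dict of signal number to
    handler) and restore the previous handlers when the block exits."""
    saved = {}
    try:
        for signum, handler in handlers.items():
            saved[signum] = signal.signal(signum, handler)
        yield
    finally:
        for signum, old in saved.items():
            # `old` is None if the previous handler was not installed
            # from Python; it cannot be restored in that case.
            if old is not None:
                signal.signal(signum, old)
```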
|
|
"mainrepo" was a bad name and an artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
Yet another step to keeping our naming consistent with Xapian
terminology.
|
|
Don't confuse future readers of our code.
|
|
Our internal data structure should be consistent with Xapian
terminology.
|
|
Another step towards becoming consistent with Xapian terminology.
|
|
We're slowly getting rid of the word "partition" to remain
consistent with Xapian docs.
|
|
v2 repos are sometimes created on machines where CPU
parallelization exceeds the capability of the storage devices.
In that case, users may reshard the Xapian DB to any smaller,
positive integer to avoid excessive overhead and contention when
bottlenecked by slow storage.
Resharding can also be used to increase shard count after
hardware upgrades.
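
A toy model of M:N resharding in Python follows. It assumes documents
are assigned to shards by their number modulo the shard count; that
rule is an assumption of this sketch, not something stated above, and
the function names are hypothetical:

```python
def shard_for(num, nshards):
    """ASSUMED placement rule: document number modulo shard count."""
    return num % nshards


def reshard(old_shards, new_count):
    """Redistribute document numbers from M old shards into
    `new_count` new shards (any positive integer)."""
    if new_count < 1:
        raise ValueError("shard count must be a positive integer")
    new_shards = [[] for _ in range(new_count)]
    for shard in old_shards:
        for num in shard:
            new_shards[shard_for(num, new_count)].append(num)
    for shard in new_shards:
        shard.sort()
    return new_shards
```

Going from three shards to two, or from two back up to three, uses the
same function; only `new_count` changes.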
|
|
For M:N resharding, we'll want to display the new shard
number.
|
|
To support M:N resharding, we need to ensure we store the
indexlevel in the destination shard, rather than the
originating one.
|
|
In case some BOFH decides to randomly create directories
using non-ASCII digits all over the place.
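
The pitfall is not Perl-specific: Unicode-aware digit checks accept
characters outside 0-9. A minimal Python sketch with a hypothetical
helper name, assuming shard directories are named with plain ASCII
decimal numbers:

```python
import re

_ASCII_DIGITS = re.compile(r"[0-9]+")


def is_shard_dirname(name):
    """True only for names made up entirely of ASCII digits 0-9.
    Note that str.isdigit() would also accept non-ASCII digits."""
    return _ASCII_DIGITS.fullmatch(name) is not None
```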
|
|
And use it from Admin.
It's easy to tell what indexlevel=basic is from unconfigured
inboxes, but distinguishing between 'medium' and 'full' would
require stat()-ing position.* files which is fragile and
Xapian-implementation-dependent.
So use the metadata facility of Xapian and store it in the main
partition so Admin tools can deal better with unconfigured
inboxes copied using generic tools like cp(1) or rsync(1).
|
|
It doesn't implement progress of batches, yet, but it wires
up the parsing of the command-line while preserving output
compatibility.
This output is NOT meant to be stable.
|
|
To properly handle compact tmpdir cleanup in single process
situations, we need to carefully account for Xtmpdir not
being a singleton and ensure we don't clobber signal
handlers which belong to other Xtmpdirs.
|
|
We don't have to be tied to the number of partitions in case
we made a bad choice at initialization. This doesn't affect
reindexing, but the copying phase is already intensive.
And optimize away the extra process when we only have a single
job which won't parallelize.
The wording for the (v2) reindexing phase could be improved,
later. I also plan to allow repartitioning of existing
Xapian DBs.
|
|
We should not have leftover junk on interrupted invocations.
|
|
Allow users to specify the --blocksize <B>, --no-full, --fuller
options for xapian-compact(1) for fine-tuning compact behavior
for low-traffic/inactive inboxes.
We also won't support --multipass, since it doesn't seem
compatible with our requirement to use --no-renumber.
We also won't support --single-file, since it only seems
intended for totally dead inboxes; and it doesn't seem
worth the support overhead when "totally dead" turns out
to be a misdiagnosis.
|
|
Since -xcpdb is a superset of -compact, we can reuse much of
that code used for driving compact.
For compact (only), this is slightly less memory efficient since
it requires an extra process per-partition, but we get to prefix
the output with the partition name for more readable output.
|
|
Clean up temporary directories on common termination signals
(INT, HUP, PIPE, TERM), but only if it's not in the process
of being committed via rename() sequence.
|