The old lock scope was only sufficient for protecting against
concurrent modifications from the common -mda, -watch, or -learn
writers.
It was not sufficient for protecting against parallel -compact
or -xcpdb invocations from eager admins. Most of the time this
only leads to confusing and misleading warning messages, but
parallel -xcpdb --reshard invocations could lead to errors.
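A rough Python sketch of the wider lock scope (public-inbox itself is Perl; the helper names and lock-file path are hypothetical). One exclusive flock(2) is held for the whole operation, so a second -compact or -xcpdb blocks instead of racing with the first:

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def inbox_lock(lock_path):
    """Hold an exclusive flock(2) on lock_path for the entire with-block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # waits for any other holder
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def lock_is_free(lock_path):
    """Return True if nobody holds the lock right now (non-blocking probe)."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return False
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)


if __name__ == "__main__":
    lock = os.path.join(tempfile.mkdtemp(), "inbox.lock")
    with inbox_lock(lock):
        # compact/reshard work would happen here, fully serialized
        print("free while held?", lock_is_free(lock))
    print("free after release?", lock_is_free(lock))
```

flock(2) locks belong to the open file description, so a separate open() of the same file, even in the same process, sees the lock as held.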
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
The public-inbox-compact wrapper displays progress by default
anyway, and there's not a lot of output, so simplify our
code by using popen_rd instead of spawn + optional pipe.
While we're at it, use "while (<$HANDLE>)" to display
progress as it happens, since "foreach (<$HANDLE>)"
slurps the contents into an array first.
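The same streaming-versus-slurping distinction, sketched in Python for illustration (stream_lines is a hypothetical helper, not public-inbox's popen_rd). Iterating over the pipe hands each line over as soon as it arrives, while readlines() would wait for the child to finish first:

```python
import subprocess
import sys


def stream_lines(argv):
    """Yield a child's stdout one line at a time, as each line arrives.

    Lazy, like Perl's `while (<$fh>)`; proc.stdout.readlines() would
    slurp everything into a list first, like `foreach (<$fh>)`.
    """
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        status = proc.wait()  # reap the child even if the caller stops early
    if status != 0:
        raise subprocess.CalledProcessError(status, argv)


if __name__ == "__main__":
    child = [sys.executable, "-c", "print('copying shard 0'); print('copying shard 1')"]
    for line in stream_lines(child):
        print("progress:", line, end="")
```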
|
|
There are a bunch of leftover "require" and "use" statements we no
longer need and can get rid of, along with some excessive
imports via "use".
IO::Handle usage isn't always obvious, so add comments
describing why a package loads it. Along the same lines,
document the tmpdir support as the reason we depend on
File::Temp 0.19, even though every Perl 5.10.1+ user has it.
While we're at it, favor "use" over "require", since it gives
us extra compile-time checking.
|
|
We can save callers the trouble of managing {-hold} and {-dev_null}
refs, as well as the trouble of calling fileno().
|
|
Xapian upstream is slowly phasing out the XS-based Search::Xapian
in favor of the SWIG-generated "Xapian" package. While both Debian and
FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian"
binding.
More information about the status of the "Xapian" Perl module is available here:
https://trac.xapian.org/ticket/523
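One way to support both bindings is to try one and fall back to the other. A hedged Python sketch of that pattern follows; load_first is hypothetical, and because the real choice is between the Perl modules Xapian and Search::Xapian, the module names in the example are stand-ins:

```python
import importlib


def load_first(candidates):
    """Import and return the first importable module named in `candidates`.

    Raises ImportError listing every failure if none can be loaded.
    """
    failures = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            failures.append(f"{name}: {exc}")
    raise ImportError("no usable binding: " + "; ".join(failures))


if __name__ == "__main__":
    # "no_such_binding" plays the missing binding; "json" plays the fallback.
    mod = load_first(["no_such_binding", "json"])
    print(mod.__name__)
```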
|
|
Since we're using Perl 5.10.1 and File::Temp 0.19+, we don't
need Xtmpdirs at all for cleaning up tempdirs on failure and
can just rely on the DESTROY handler provided by File::Temp.
|
|
We need to bypass whatever Test::More does with END/DESTROY
handlers for use in long-lived processes. This doesn't affect
any of our normal code since we don't use END/DESTROY for
Xapcmd and its callers.
|
|
We've been using this in -edit, and will be using it in some
more scripts and tests to optimize for run_mode=2 with
run_script.
Keeping this in the *Writable modules since I don't see it being
useful for the WWW and NNTP read-only interfaces which use
PublicInbox::Inbox.
|
|
InboxWritable caching the result of ->importer leads to a
circular reference, since the returned (V2Writable|Import) object
holds onto the calling InboxWritable object.
With public-inbox-watch, this leads to a memory leak if a user
is reloading via SIGHUP after a message is imported (it would
only become noticeable with SIGHUPs after every message imported).
I would not expect anybody to notice this in real-world
usage. I only noticed this since I was making -xcpdb suitable
for long-lived process use (e.g. "mod_perl style") and a flock
remained unreleased on v1 inboxes after resharding.
WatchMaildir (used by -watch) already handles caching of the
importer object itself, and all of our other real-world uses of
->importer are short-lived or designed for batch scripts, so
there's no need to cache the importer result internally.
|
|
Perl's "local" allows changes to %SIG (and %ENV) to be limited
to its enclosing block. This allows us to get rid of a global
variable and ad-hoc method for restoring signal handlers.
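Python has no `local`, but a context manager gives a similar block-scoped effect for signal handlers. A minimal sketch (local_signal is a hypothetical name), assuming it runs in the main thread, where signal.signal() is allowed:

```python
import signal
from contextlib import contextmanager


@contextmanager
def local_signal(signum, handler):
    """Install `handler` for `signum` only inside the with-block.

    As with Perl's `local $SIG{TERM} = ...`, the previous handler comes
    back on exit, even if the block raises.
    """
    previous = signal.signal(signum, handler)
    try:
        yield previous
    finally:
        # signal.signal() returns None when the old handler was not installed
        # from Python; fall back to the default disposition in that case.
        signal.signal(signum, previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    before = signal.getsignal(signal.SIGTERM)
    with local_signal(signal.SIGTERM, signal.SIG_IGN):
        print(signal.getsignal(signal.SIGTERM) == signal.SIG_IGN)
    print(signal.getsignal(signal.SIGTERM) == before)
```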
|
|
"mainrepo" was a bad name and artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
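A sketch of the compatibility lookup in Python, assuming the configuration has already been parsed into a flat dict of git-config style keys (inbox_dir is a hypothetical helper). Preferring "inboxdir" when both keys are present is this sketch's choice, not necessarily what public-inbox does:

```python
def inbox_dir(config, name):
    """Return the directory for inbox `name`, or None if it is not configured.

    The new "inboxdir" key is checked first; the legacy "mainrepo" key is
    still honored so existing config files keep working unchanged.
    """
    prefix = f"publicinbox.{name}."
    for key in ("inboxdir", "mainrepo"):
        value = config.get(prefix + key)
        if value is not None:
            return value
    return None


if __name__ == "__main__":
    old_style = {"publicinbox.meta.mainrepo": "/srv/inboxes/meta"}
    new_style = {"publicinbox.meta.inboxdir": "/srv/inboxes/meta"}
    print(inbox_dir(old_style, "meta") == inbox_dir(new_style, "meta"))
```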
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
Yet another step to keeping our naming consistent with Xapian
terminology.
|
|
Don't confuse future readers of our code.
|
|
Our internal data structure should be consistent with Xapian
terminology.
|
|
Another step towards becoming consistent with Xapian terminology.
|
|
We're slowly getting rid of the word "partition" in favor of
"shard" to remain consistent with Xapian docs.
|
|
v2 repos are sometimes created on machines where CPU
parallelization exceeds the capability of the storage devices.
In that case, users may reshard the Xapian DB to any smaller,
positive integer to avoid excessive overhead and contention when
bottlenecked by slow storage.
Resharding can also be used to increase shard count after
hardware upgrades.
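The redistribution step can be pictured with a small Python model. It assumes, for illustration, that each message's shard is its article number modulo the shard count; the function names are hypothetical, and the real code moves Xapian documents rather than integers:

```python
def shard_of(article_num, nshards):
    """Assumed placement rule: article numbers spread round-robin."""
    return article_num % nshards


def reshard(old_shards, new_count):
    """Redistribute documents from M old shards into `new_count` new shards.

    `old_shards` is a list of lists of article numbers; the result is a
    list of `new_count` sorted lists, each document placed by shard_of().
    """
    if new_count < 1:
        raise ValueError("shard count must be a positive integer")
    new_shards = [[] for _ in range(new_count)]
    for shard in old_shards:
        for num in shard:
            new_shards[shard_of(num, new_count)].append(num)
    for shard in new_shards:
        shard.sort()
    return new_shards


if __name__ == "__main__":
    four = [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]  # 10 messages, 4 shards
    print(reshard(four, 2))  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

Because every document is re-placed from scratch, the same routine covers shrinking (4 to 2) and growing (2 to 4).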
|
|
For M:N resharding, we'll want to display the new shard
number.
|
|
To support M:N resharding, we need to ensure we store the
indexlevel in the destination shard, rather than the
originating one.
|
|
In case some BOFH decides to randomly create directories
using non-ASCII digits all over the place.
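Python's `re` has the same trap as Perl's \d: in a str pattern it matches any Unicode decimal digit. A hedged sketch of listing shard directories with an explicit ASCII class (shard_dirs is a hypothetical helper):

```python
import os
import re
import tempfile

# [0-9] only: a bare \d in a str pattern also matches Unicode digits such as
# "\u0663" (ARABIC-INDIC DIGIT THREE), and int("\u0663") even returns 3.
SHARD_NAME = re.compile(r"[0-9]+")


def shard_dirs(xap_dir):
    """Return shard directory names under xap_dir, sorted numerically."""
    names = [
        name for name in os.listdir(xap_dir)
        if SHARD_NAME.fullmatch(name) and os.path.isdir(os.path.join(xap_dir, name))
    ]
    return sorted(names, key=int)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as xap:
        for name in ("0", "1", "10", "2", "\u0663", "tmp"):
            os.mkdir(os.path.join(xap, name))
        print(shard_dirs(xap))  # ['0', '1', '2', '10']
```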
|
|
And use it from Admin.
It's easy to tell what indexlevel=basic is from unconfigured
inboxes, but distinguishing between 'medium' and 'full' would
require stat()-ing position.* files which is fragile and
Xapian-implementation-dependent.
So use the metadata facility of Xapian and store it in the main
partition so Admin tools can deal better with unconfigured
inboxes copied using generic tools like cp(1) or rsync(1).
|
|
It doesn't implement progress of batches, yet, but it wires
up the parsing of the command-line while preserving output
compatibility.
This output is NOT meant to be stable.
|
|
To properly handle compact tmpdir cleanup in single process
situations, we need to carefully account for Xtmpdir not
being a singleton and ensure we don't clobber signal
handlers which belong to other Xtmpdirs.
|
|
We don't have to be tied to the number of partitions in case
we made a bad choice at initialization. This doesn't affect
reindexing, but the copying phase is already intensive.
And optimize away the extra process when we only have a single
job which won't parallelize.
The wording for the (v2) reindexing phase could be improved,
later. I also plan to allow repartitioning of existing
Xapian DBs.
|
|
We should not have leftover junk on interrupted invocations.
|
|
Allow users to specify the --blocksize <B>, --no-full, --fuller
options for xapian-compact(1) for fine-tuning compact behavior
for low-traffic/inactive inboxes.
We also won't support --multipass, since it doesn't seem
compatible with our requirement to use --no-renumber.
We also won't support --single-file, since it only seems
intended for totally dead inboxes; and it doesn't seem
worth the support overhead when "totally dead" turns out
to be a misdiagnosis.
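A hypothetical Python helper showing how such options might be passed through to xapian-compact(1): only the tuning knobs named above are exposed, --multipass and --single-file are simply not offered, and --no-renumber is always added. Rejecting --fuller together with --no-full is this sketch's own choice:

```python
def compact_argv(src, dst, blocksize=None, full=True, fuller=False):
    """Build an xapian-compact(1) command line from a few tuning options."""
    if fuller and not full:
        raise ValueError("--fuller and --no-full pull in opposite directions")
    argv = ["xapian-compact", "--no-renumber"]  # always required here
    if blocksize is not None:
        argv.append(f"--blocksize={blocksize}")
    if not full:
        argv.append("--no-full")
    if fuller:
        argv.append("--fuller")
    argv += [src, dst]
    return argv


if __name__ == "__main__":
    print(" ".join(compact_argv("xap15/0", "xap15/0.compact", blocksize="8K", full=False)))
```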
|
|
Since -xcpdb is a superset of -compact, we can reuse much of
the code used for driving compact.
For compact (only), this is slightly less memory efficient since
it requires an extra process per-partition, but we get to prefix
the output with the partition name for more readable output.
|
|
Clean up temporary directories on common termination signals
(INT, HUP, PIPE, TERM), but only if it's not in the process
of being committed via rename() sequence.
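A Python sketch of that rule (TmpDirGuard and its committing flag are hypothetical). Termination signals remove the scratch directory, unless the rename() sequence has already started, in which case the directory is left alone:

```python
import os
import shutil
import signal
import tempfile


class TmpDirGuard:
    """Delete a scratch directory on INT/HUP/PIPE/TERM, unless committing."""

    # SIGHUP and SIGPIPE do not exist on every platform; skip missing ones.
    SIGNALS = [getattr(signal, name) for name in ("SIGINT", "SIGHUP", "SIGPIPE", "SIGTERM")
               if hasattr(signal, name)]

    def __init__(self):
        self.path = tempfile.mkdtemp(prefix="xcpdb-")
        self.committing = False  # set to True right before the rename() sequence
        self._previous = {}

    def install(self):
        for sig in self.SIGNALS:
            self._previous[sig] = signal.signal(sig, self._on_signal)

    def restore(self):
        for sig, handler in self._previous.items():
            signal.signal(sig, handler if handler is not None else signal.SIG_DFL)
        self._previous.clear()

    def _on_signal(self, signum, frame):
        if not self.committing:
            shutil.rmtree(self.path, ignore_errors=True)
        raise SystemExit(128 + signum)  # conventional "killed by signal" status


if __name__ == "__main__":
    guard = TmpDirGuard()
    guard.install()
    try:
        # ... long copy/compact into guard.path would run here ...
        guard.committing = True
        # ... rename() sequence would run here; signals no longer delete guard.path ...
    finally:
        guard.restore()
    print("kept after commit:", os.path.isdir(guard.path))
    shutil.rmtree(guard.path)
```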
|
|
Emit information about reindexing git revision ranges when used
with xcpdb. Additionally, distinguish Xapian copy output from
v2 git epoch counting by increasing directory context info.
For now, progress is only emitted for v1 batches. v2 indexing is still
missing progress reporting for batches, as the data structures
for reindexing would benefit from a refactoring, first.
This does not currently affect the use of public-inbox-index,
but may in the future.
|
|
`warn' is reserved for actual warnings, as it respects
$SIG{__WARN__} and we rely on that override to print
message context information when we are indexing.
|
|
By creating temporary directories as deep as possible,
we can allow v2 repositories to have `xap$SCHEMA_VERSION'
(e.g. `xap15') reside on a separate FS.
We also check st_dev ahead-of-time to avoid doing work which
will fail with EXDEV. Of course, another process may still
move/change things around.
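The ahead-of-time check amounts to comparing st_dev of the two locations before starting a long copy, since rename(2) fails with EXDEV across filesystems. A hedged Python sketch (check_same_fs is a hypothetical name; placing the scratch directory inside xap15 is an assumption about what "as deep as possible" means). As noted above, it cannot rule out another process changing things afterwards:

```python
import errno
import os
import tempfile


def check_same_fs(src, dst):
    """Raise OSError(EXDEV) early if src and dst live on different filesystems.

    rename(2) cannot move entries across filesystems, so failing here
    avoids a long copy whose final rename() would fail anyway.
    """
    if os.stat(src).st_dev != os.stat(dst).st_dev:
        raise OSError(errno.EXDEV, os.strerror(errno.EXDEV), dst)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as inbox:
        xap = os.path.join(inbox, "xap15")
        os.mkdir(xap)
        # Assumed layout: scratch dir inside xap15 itself, so a separately
        # mounted xap15 still keeps scratch and target on one filesystem.
        scratch = tempfile.mkdtemp(dir=xap)
        check_same_fs(scratch, xap)
        print("ok to rename within", xap)
```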
|
|
Running a full "public-inbox-index --reindex" in parallel
with "public-inbox-xcpdb" on the same inbox can still cause
problems, though.
|
|
Copying an entire Xapian DB is horribly slow whether it's done
via Perl or copydatabase(1). So displaying some progress
indication is good for user experience.
While we're at it, prefix xapian-compact output, too, since
parallel processes end up clobbering each other's output.
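Prefixing can be sketched in Python by reading each child's combined output line by line and writing whole lines under a shared lock. run_prefixed and run_all are hypothetical names, and the shard labels are illustrative:

```python
import io
import subprocess
import sys
import threading

_write_lock = threading.Lock()


def run_prefixed(prefix, argv, out):
    """Run argv, writing each stdout/stderr line to `out` as "prefix: line"."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    with proc.stdout:
        for line in proc.stdout:
            with _write_lock:  # whole lines only, so jobs never split a line
                out.write(f"{prefix}: {line}")
    return proc.wait()


def run_all(jobs, out=sys.stdout):
    """Run {prefix: argv} jobs in parallel; return {prefix: exit status}."""
    status = {}

    def worker(prefix, argv):
        status[prefix] = run_prefixed(prefix, argv, out)

    threads = [threading.Thread(target=worker, args=job) for job in jobs.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return status


if __name__ == "__main__":
    jobs = {f"xap15/{n}": [sys.executable, "-c", f"print('shard {n} done')"]
            for n in range(3)}
    buf = io.StringIO()
    print(run_all(jobs, out=buf))
    print(buf.getvalue(), end="")
```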
|
|
Copying an entire Xapian DB takes a long time, so update our
reindexing code to support partial reindexing, snapshot the
pre-copydatabase git revisions, perform the lengthy copy,
and do a partial reindex when the copy + renames are done.
|
|
To minimize the delay on active inboxes, it's actually ideal to
run xapian-compact at the end of the per-partition cpdb process,
since the new DB isn't accessible yet, so we don't have to
deal with lock contention with -mda or -watch processes. The
downside is the extra temporary file overhead (3x instead of 2x).
|
|
By avoiding copydatabase(1) entirely, we can make further changes
to avoid locking the entire inbox for a long operation and
switch to fine-grained locking.
|
|
We move the old directory into the new directory, so we must avoid the
situation where a bug or error could cause the tempdir cleanup to run
and destroy both our old and new directories.
|
|
copydatabase(1) is exceptionally noisy and its output is
confusing when run in parallel. Support redirects at least, and
env while we're at it, to give us future options.
We can also stuff a -jobs parameter into the options to limit
parallelism since it can be useful for low-priority upgrade
jobs.
|
|
Port public-inbox-compact(1) over to using it, and we will need
to wrap copydatabase(1) to ease glass migrations, too.
|