Date | Commit message |
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
As with CORE::die and $SIG{__DIE__}, it turns out CORE::warn is
safe to use inside $SIG{__WARN__} handlers without triggering
infinite recursion. So fall back to reusing CORE::warn instead
of creating a new sub.
|
|
We'll count the number of log changes (regardless of index or
unindex) and only attach inboxes to ExtSearchIdx objects when
they get new work. We'll also reduce lock bouncing and only
update external indices after all per-inbox indexing is done.
This also updates existing v2 indexing/unindexing callers
to be more consistent and ensures unindex log entries update
per-inbox last commit information.
|
|
Some of my ancient v1-only scripts called public-inbox-index
to operate on GIT_DIR:
GIT_DIR=/path/to/foo.git public-inbox-index
This change ensures they keep working, otherwise "." will be
passed to the --git-dir= switch of git(1) because that's the
default directory if no inboxes are specified on the
command-line.
Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
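
The fallback described above can be sketched roughly as follows. This is an illustration in Python rather than the project's Perl, and `resolve_inbox_dirs` is a hypothetical helper name, not a function in public-inbox; the exact precedence in the real code may differ.

```python
import os


def resolve_inbox_dirs(args, env=None):
    """Pick the directories to index (hypothetical helper).

    Assumed precedence, based on the commit message:
    1. directories given on the command line
    2. GIT_DIR from the environment (old v1-only scripts)
    3. the current directory
    """
    env = os.environ if env is None else env
    if args:
        return list(args)
    git_dir = env.get("GIT_DIR")
    if git_dir:
        return [git_dir]
    return ["."]
```

With no arguments and `GIT_DIR=/path/to/foo.git` in the environment, the sketch returns that path instead of ".".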
|
|
We need to canonicalize paths for inboxes which do not have
a newsgroup defined, otherwise ->eidx_key matches can fail
in unexpected ways.
|
|
We'll try to avoid calling Cwd::abs_path and use
File::Spec->rel2abs instead, since abs_path will resolve
symlinks the user specified on the command-line.
Unfortunately, ->rel2abs still leaves "/.." and "/../"
uncollapsed, so we still need to fall back to Cwd::abs_path in
those cases.
While we are at it, we'll also resolve inboxdir from deep inside
v2 directories instead of misdetecting them as v1 bare git
repos.
In any case, stop matching directories by name and instead rely
on the unique combination of st_dev + st_ino on stat() as we
started doing in the extindex code.
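
As a minimal sketch of matching directories by st_dev + st_ino instead of by name (written in Python for illustration; the project code is Perl, and `dir_identity` and `same_directory` are hypothetical names):

```python
import os


def dir_identity(path):
    """Return the (st_dev, st_ino) pair identifying what `path` points to."""
    st = os.stat(path)  # follows symlinks, like stat(2)
    return (st.st_dev, st.st_ino)


def same_directory(a, b):
    """True if both pathnames refer to the same directory on disk,
    regardless of how each pathname is spelled."""
    return dir_identity(a) == dir_identity(b)
```

Two spellings of one directory (a symlink, a relative path, a path containing "/..") compare equal this way without either of them being rewritten.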
|
|
-index runs on data that's already frozen in git, so there's
no point in warning users about it.
While we're at it, set the {current_info} prefix for v1 as
we do in v2 inboxes in case new problems show up.
|
|
We've stopped referring to inboxdirs as "repos" a while ago
since v2 inboxes have multiple git repos associated with them.
So update the name to reflect that and avoid an unnecessary
export that's only used by a test case.
|
|
At least not for resolving inboxes, since there's no good way
for a user to specify what is an inbox or extindex directory
without a command-line switch.
Instead of changing the -extindex command, we change the -index
command internals to rely on the new {-use_cwd} flag to avoid
internal use of negation, since double-negatives and the like
are confusing to me.
|
|
When `--all' is passed to -index and similar commands, process
inboxes in the same order as they are given in the config file.
This ensures predictable behavior so admins can ensure certain
inboxes see updated indices before others. For (upcoming)
external indices, this will ensure stable Xref: ordering for
predictable caching/memoization by NNTP clients.
|
|
"inboxes 1 inboxes not supported by ..." was nonsensical.
Now it'll show "-V1 inbox not supported by ...", instead.
|
|
Since we no longer read document data from Xapian, allow users
to opt-out of storing it.
This breaks compatibility with previous releases of
public-inbox, but gives us a ~1.5% space savings on Xapian
storage (and associated I/O and page cache pressure reduction).
|
|
This is helpful with --all, or when multiple inboxes
are being indexed.
|
|
Established tools like make(1), prove(1) and xargs(1) don't warn
when the desired parallelism level can't be met, either.
|
|
Converting v1 inboxes to v2 can be a painful experience
on HDD. Some of the new options in the CLI or config
file make it less painful.
|
|
We parse other options, too, not just --max-size.
|
|
-index now invokes ->DESTROY like xcpdb does, which is necessary
to clean up $INBOX_DIR/msgmap-XXXXXXX files. We'll also exit
with the expected values for various signals by adding 128
to the signal number, as described in
<https://www.tldp.org/LDP/abs/html/exitcodes.html>.
-xcpdb now terminates worker processes and xapian-compact(1)
invocations when prematurely killed, too.
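
The exit status convention mentioned above can be sketched as follows (Python used for illustration only; `exit_status_for_signal` is a hypothetical name, not something in public-inbox):

```python
import signal


def exit_status_for_signal(signum):
    """Shell convention: a process terminated by signal N is reported
    with exit status 128 + N."""
    return 128 + int(signum)
```

On systems where SIGINT is 2 and SIGTERM is 15, this gives 130 and 143 respectively.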
|
|
`--reindex' involves chomping down lots of mail, so it benefits
from parallelization just like the initial indexing. It's
also a bit surprising to specify `--jobs/-j' without parallel
processes, so ensure we turn on parallelization there, too.
We can simplify initialization here, as well, since neither
`eval' nor `V2Writable->new' should be in this code.
|
|
PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
|
|
In normal mail paths, we can rely on MTAs being configured with
reasonable limits in the -watch and -mda mail injection paths.
However, since the MTA is bypassed in a git-only delivery path, a BOFH
could inject a large message and DoS users attempting to mirror
a public-inbox.
This doesn't protect unindexed WWW interfaces from Email::MIME
memory explosions on v1 inboxes. Probably nobody cares about
unindexed WWW interfaces anymore, especially now that Xapian is
optional for indexing.
|
|
It hasn't been needed since commit 089cca37fa036411
("config: ignore missing config files"). And we
actually want to propagate errors when we can't
start new processes or if git(1) is missing.
|
|
I didn't wait until September to do it, this year!
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
No point in lazy-loading these, since they're always loaded
anyways and would not have portability problems on systems with
minimal dependencies.
|
|
This simplifies our admin module a bit and allows solver to be
used with v1 inboxes using git versions prior to v1.8.5 (but
still >= git v1.8.0).
|
|
We can save callers the trouble of {-hold} and {-dev_null}
refs as well as the trouble of calling fileno().
|
|
Xapian upstream is slowly phasing out the XS-based Search::Xapian
in favor of the SWIG-generated "Xapian" package. While both Debian
and FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian"
binding.
More information about the status of the "Xapian" Perl module here:
https://trac.xapian.org/ticket/523
|
|
-mda should not be dealing with broken Date: headers
nowadays, so deprioritize it in our documentation and
internal checks.
|
|
PublicInbox::Admin::config() just adds an extra layer of
indirection which we barely rely on. So get rid of this
global variable and make it easier to run tests in the
future without relying on global state.
|
|
InboxWritable caching the result of ->importer leads to a
circular reference, since the returned (V2Writable|Import) object
holds onto the calling InboxWritable object.
With public-inbox-watch, this leads to a memory leak if a user
is reloading via SIGHUP after a message is imported (it would
only become noticeable with SIGHUPs after every message imported).
I would not expect anybody to notice this in real-world
usage. I only noticed this since I was making -xcpdb suitable
for long-lived process use (e.g. "mod_perl style") and a flock
remained unreleased on v1 inboxes after resharding.
WatchMaildir (used by -watch) already handles caching of the
importer object itself, and all of our other real-world uses of
->importer are short-lived or designed for batch scripts, so
there's no need to cache the importer result internally.
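
The shape of the cycle can be sketched like this. It is a rough analogy in Python, not the project's Perl: the class names are hypothetical, the result assumes CPython's reference counting, and unlike Perl, Python has a cycle collector that eventually frees such objects (it is disabled here to show the refcounting behavior alone).

```python
import gc
import weakref


class Importer:
    def __init__(self, inbox):
        self.inbox = inbox  # back-reference to the calling inbox object


class CachingInbox:
    def importer(self):
        # Caching the importer creates a cycle: inbox -> importer -> inbox
        if not hasattr(self, "_im"):
            self._im = Importer(self)
        return self._im


class NonCachingInbox:
    def importer(self):
        # No cache: the importer is the only side holding a reference
        return Importer(self)


def survives_refcounting(cls):
    """Return True if an inbox object is still alive after its last
    external reference is dropped (cycle collector disabled)."""
    gc.disable()
    try:
        ibx = cls()
        ibx.importer()
        probe = weakref.ref(ibx)
        del ibx
        return probe() is not None
    finally:
        gc.enable()
        gc.collect()  # clean up the leaked cycle, if any
```

Under those assumptions, the caching variant stays alive after `del` while the non-caching one is freed immediately, which is the difference this change relies on.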
|
|
"mainrepo" was a bad name and an artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|
|
Since public-inbox-index may be run against a large list of
(intended) inboxes from the command-line, it's helpful to show
which directory fails the resolution.
|
|
For whatever reason, inbox directories can go missing
temporarily or permanently. Tell the admin about them
and continue on our way.
|
|
It's slightly confusing since we dedicate one job
to dealing with fast-import + SQLite indexing; and
it's not worth complaining about when it happens.
|
|
Our internal data structure should be consistent with Xapian
terminology.
|
|
We're slowly getting rid of the word "partition" in order
to remain consistent with Xapian docs.
|
|
No point in forcing admin programs to reparse the config
themselves; and we won't support multiple instances of it;
unlike the WWW code.
|
|
We'll be using this in -edit, and maybe other admin-oriented
tools for UI-consistency.
|
|
We no longer make -index warn on it, and no other code uses it;
and working on unconfigured inboxes is totally reasonable
for admins who are setting things up.
|
|
And use it from Admin.
It's easy to tell what indexlevel=basic is from unconfigured
inboxes, but distinguishing between 'medium' and 'full' would
require stat()-ing position.* files which is fragile and
Xapian-implementation-dependent.
So use the metadata facility of Xapian and store it in the main
partition so Admin tools can deal better with unconfigured
inboxes copied using generic tools like cp(1) or rsync(1).
|
|
It doesn't implement progress of batches, yet, but it wires
up the parsing of the command-line while preserving output
compatibility.
This output is NOT meant to be stable.
|
|
Copying an entire Xapian DB takes a long time, so update our
reindexing code to support partial reindexing, snapshot the
pre-copydatabase git revisions, perform the lengthy copy,
and do a partial reindex when the copy + renames are done.
|
|
We will be reindexing after copydatabase
|
|
Both of these index-affecting commands should work similarly
on the command-line.
public-inbox-index no longer complains about unconfigured
~/.public-inbox/config; but often I found myself being
annoyed by that, anyways...
|
|
Since we lazy-load Xapian now, some errors may become
more cryptic or buried. Try to improve that by making
Admin show better errors.
|
|
We'll be using it in future admin tools, and making this
easier-to-test.
|