Date | Commit message (Collapse) |
|
The musl strftime(3) implementation on AlpineLinux 3.19.0
doesn't support `%k' and `%k' isn't in POSIX, either. So we
fall back to using the `sprintf' perlop in the user-facing UI
since leading zeroes require needless overhead for my eyes and
brain to parse in the time.
|
|
Absolute pathnames of git coderepos are stored in the cindex,
but we should favor paths relative to $ENV{PWD} since it
respects symlinks in the heirarchy.
Respecting symlinks makes it easier to migrate cindex to
new storage as old storage wears out and to relocate the
storage device onto another machine.
|
|
This saves us some code, and is a small step towards getting
ProcessIO working with stat, fcntl and other perlops that don't
work with tied handles.
|
|
The Xapian SWIG bindings are favored by Xapian upstream for
ease-of-maintenance compared to the XS version. While Debian
lags on this front, the SWIG bindings are widely available
on all *BSDs.
|
|
This is much easier to support than xcpdb since it's 1:1 and
doesn't follow a different sharding scheme than the inboxes and
extindices.
|
|
This bit of common code will be handy for the upcoming
resolve_cidxdir, too.
|
|
I favor leaving the publicinbox.<name>.indexlevel parameter
out of config files to make it easier to alter and reduce
sources of truth. It worked well in most cases, but
public-inbox-watch also needs to detect the indexlevel.
Moving the sub to InboxWritable (from Admin) probably makes
sense since it's a per-inbox attribute and allows -watch
to reuse it.
|
|
We'll also support the $base arg of File::Spec->rel2abs
since it should make codesearch indexing easier.
|
|
We'll be using this for indexing git coderepos, and
switch to Perl 5.12 while we're at it since unicode_strings
doesn't affect this package.
|
|
It's more consistent with TAP output and hopefully puts
users at ease in case they don't understand the meaning
of a message.
|
|
This lets administrators reindex specific time ranges
according to git "approxidate" formats. These arguments
are passed directly to underlying git-log(1) invocations
and may still reach into old epochs.
Since these options rely on git committer dates (which we infer
from the most recent Received: header), they are not guaranteed
to be strictly tied to git history and it's possible to
over/under-reindex some messages. It's probably not a major
problem in practice, though; reindexing a few extra messages
is generally harmless aside from some extra device wear.
Since this currently relies on git-log, these options do not
affect -extindex, yet.
|
|
Neither Inboxes nor ExtSearch objects were retrying correctly
when there are live git processes, but the inboxes were getting
rescanned for search or other reasons. Ensure the scan retries
eventually if there's live processes.
We also need to update the cleanup task to detect Xapian shard
count changes, since Xapian ->reopen is enough to detect any
other Xapian changes. Otherwise, we just issue an inexpensive
->reopen call and let Xapian check whether there's anything
worth reopening.
This also lets us eliminate the Devel::Peek dependency.
|
|
This fixes the occasional t/lei-sigpipe.t infinite loop
under "make check-run".
Link: http://nntp.perl.org/group/perl.perl5.porters/258784
<CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com>
Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
|
|
Setting up and maintaining git-only mirrors of v2 inboxes is
complex since multiple commands are required to clone and fetch
into epochs.
Unlike grokmirror, these commands do not require any
configuration. Instead, they rely on existing git config files
and work like "git clone --mirror" and "git fetch",
respectively.
Like grokmirror, they use manifest.js.gz, but only on a
per-inbox basis so users won't have to clone every inbox of a
large instance nor edit config files to include/exclude inboxes
they're interested in.
|
|
Since extindex uses Xapian shards in a similar way to
v2 inboxes, we'll support -xcpdb (reshard+upgrade) and
-compact all the same to give admins tuning+upgrade
options.
|
|
No callers pass an unblessed pathname to index_inbox,
only Inbox object refs.
|
|
The underscore variant was never documented and maintaining
the difference between the command-line and internal hash
is not worth it.
|
|
This can be useful for users who want to clone and
mirror an existing public-inbox. This doesn't have
update support, yet, so users will need to run
"git fetch && public-inbox-index" for now.
|
|
There's nothing we can do about bad emails in our search
results, so quiet things down and don't fight the MUA for
the terminal.
|
|
Using "make update-copyrights" after setting GNULIB_PATH in my
config.mak
|
|
As with CORE::die and $SIG{__DIE__}, it turns out CORE::warn is
safe to use inside $SIG{__WARN__} handlers without triggering
infinite recursion. So fall back to reusing CORE::warn instead
of creating a new sub.
|
|
We'll count the number of log changes (regardless of index or
unindex) and only attach inboxes to ExtSearchIdx objects when
they get new work. We'll also reduce lock bouncing and only
update external indices after all per-inbox indexing is done.
This also updates existing v2 indexing/unindexing callers
to be more consistent and ensures unindex log entries update
per-inbox last commit information.
|
|
Some of my ancient v1-only scripts called public-inbox-index
to operate on GIT_DIR:
GIT_DIR=/path/to/foo.git public-inbox-index
This change ensures they keep working, otherwise "." will be
passed to the --git-dir= switch of git(1) because that's the
default directory if no inboxes are specified on the
command-line.
Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
|
|
We need to canonicalize paths for inboxes which do not have
a newsgroup defined, otherwise ->eidx_key matches can fail
in unexpected ways.
|
|
We'll try to avoid calling Cwd::abs_path and use
File::Spec->rel2abs instead, since abs_path will resolve
symlinks the user specified on the command-line.
Unfortunately, ->rel2abs still leaves "/.." and "/../"
uncollapsed, so we still need to fall back to Cwd::abs_path in
those cases.
While we are at it, we'll also resolve inboxdir from deep inside
v2 directories instead of misdetecting them as v1 bare git
repos.
In any case, stop matching directories by name and instead rely
on the unique combination of st_dev + st_ino on stat() as we
started doing in the extindex code.
|
|
-index runs on data that's already frozen in git, so there's
no point in warning users about it.
While we're at it, set the {current_info} prefix for v1 as
we do in v2 inboxes in case new problems show up.
|
|
We've stopped referring to inboxdirs as "repos" a while ago
since v2 inboxes have multiple git repos associated with them.
So update the name to reflect that and avoid an unnecessary
export that's only used by a test case.
|
|
At least not for resolving inboxes, since there's no good way
for a user to specify what is an inbox or extindex directory
without a command-line switch.
Instead of changing the -extindex command, we change the -index
command internals to rely on the new {-use_cwd} flag to avoid
internal use of negation, since double-negatives and the like
are confusing to me.
|
|
When `--all' is passed to -index and similar commands, process
them in the same order as what is given in the config file.
This ensures predictable behavior so admins can ensure certain
inboxes see updated indices before others. For (upcoming)
external indices, this will ensure stable Xref: ordering for
predictable caching/memoization by NNTP clients.
|
|
"inboxes 1 inboxes not supported by ..." was non-sensical.
Now it'll show "-V1 inbox not supported by ...", instead.
|
|
Since we no longer read document data from Xapian, allow users
to opt-out of storing it.
This breaks compatibility with previous releases of
public-inbox, but gives us a ~1.5% space savings on Xapian
storage (and associated I/O and page cache pressure reduction).
|
|
This is helpful with --all, or when multiple inboxes
are being indexed.
|
|
Established tools like make(1), prove(1) and xargs(1) don't warn
when the desired parallelism level can't be met, either.
|
|
Converting v1 inboxes from v2 can be a painful experience
on HDD. Some of the new options in the CLI or config
file make it less painful.
|
|
We parse other options, too, not just --max-size
|
|
-index now invokes ->DESTROY like xcpdb does, which is necessary
to cleanup $INBOX_DIR/msgmap-XXXXXXX files. We'll also exit
with the expected values for various signals by adding 128
as described in <https://www.tldp.org/LDP/abs/html/exitcodes.html>
-xcpdb now terminates worker processes and xapian-compact(1)
invocations when prematurely killed, too.
|
|
`--reindex' involves chomping down lots of mail, so it benefits
from parallelization just like the initial indexing. It's
also a bit surprising to specify `--jobs/-j' without parallel
processes, so ensure we turn on parallelization there, too.
We can simplify initialization here, as well, since neither
`eval' nor `V2Writable->new' should be in this code.
|
|
PublicInbox::Eml has enough functionality to replace the
Email::MIME-based PublicInbox::MIME.
|
|
In normal mail paths, we can rely on MTAs being configured with
reasonable limits in the -watch and -mda mail injection paths.
However, the MTA is bypassed in a git-only delivery path, a BOFH
could inject a large message and DoS users attempting to mirror
a public-inbox.
This doesn't protect unindexed WWW interfaces from Email::MIME
memory explosions on v1 inboxes. Probably nobody cares about
unindexed WWW interfaces anymore, especially now that Xapian is
optional for indexing.
|
|
It hasn't been needed since commit 089cca37fa036411
("config: ignore missing config files"). And we
actually want to propagate errors when we can't
start new processes or if git(1) is missing.
|
|
I didn't wait until September to do it, this year!
|
|
This allows us to simplify version checking by avoiding
"//" or "||" operators sprinkled around.
|
|
No point in lazy-loading these, since they're always loaded
anyways and would not have portability problems on systems with
minimal dependencies.
|
|
This simplifies our admin module a bit and allows solver to be
used with v1 inboxes using git versions prior to v1.8.5 (but
still >= git v1.8.0).
|
|
We can save callers the trouble of {-hold} and {-dev_null}
refs as well as the trouble of calling fileno().
|
|
Xapian upstream is slowly phasing out the XS-based Search::Xapian
in favor of the SWIG-generated "Xapian" package. While Debian and
both FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian"
binding.
More information about the status of the "Xapian" Perl module here:
https://trac.xapian.org/ticket/523
|
|
-mda should not be dealing with broken Date: headers
nowadays, and deprioritize it in our documentation and
internal checks.
|
|
PublicInbox::Admin::config() just adds an extra layer of
indirection which we barely rely on. So get rid of this
global variable and make it easier to run tests in the
future without relying on global state.
|
|
InboxWritable caching the result of ->importer leads to a
circular references with returned (V2Writable|Import) object
holds onto the calling InboxWritable object.
With public-inbox-watch, this leads to a memory leak if a user
is reloading via SIGHUP after a message is imported (it would
only become noticeable with SIGHUPs after every message imported).
I would not expect anybody to to notice this in real-world
usage. I only noticed this since I was making -xcpdb suitable
for long-lived process use (e.g. "mod_perl style") and a flock
remained unreleased on v1 inboxes after resharding.
WatchMaildir (used by -watch) already handles caching of the
importer object itself, and all of our other real-world uses of
->importer are short-lived or designed for batch scripts, so
there's no need to cache the importer result internally.
|
|
"mainrepo" ws a bad name and artifact from the early days when I
intended for there to be a "spamrepo" (now just the
ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be
especially confusing, since v2 needs at least two git
repositories (epoch + all.git) to function and we shouldn't
confuse users by having them point to a git repository for v2.
Much of our documentation already references "INBOX_DIR" for
command-line arguments, so use "inboxdir" as the
git-config(1)-friendly variant for that.
"mainrepo" remains supported indefinitely for compatibility.
Users may need to revert to old versions, or may be referring
to old documentation and must not be forced to change config
files to account for this change.
So if you're using "mainrepo" today, I do NOT recommend changing
it right away because other bugs can lurk.
Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
|