path: root/lib/PublicInbox/Admin.pm
Date Commit message
2023-12-13 treewide: avoid strftime %k for portability
The musl strftime(3) implementation on AlpineLinux 3.19.0 doesn't support `%k' and `%k' isn't in POSIX, either. So we fall back to using the `sprintf' perlop in the user-facing UI since leading zeroes require needless overhead for my eyes and brain to parse in the time.
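A minimal sketch of the `sprintf' fallback described above. The helper name `fmt_time_k' is hypothetical and not taken from public-inbox, and gmtime is used here only so the output does not depend on the local timezone:

```perl
use strict;
use warnings;

# Hypothetical helper (not from public-inbox): format a timestamp with a
# space-padded hour, which is what the non-POSIX strftime `%k' would give.
sub fmt_time_k {
    my ($epoch) = @_;
    # gmtime fields: 0=sec 1=min 2=hour 3=mday 4=mon 5=year
    my ($min, $hour, $mday, $mon, $year) = (gmtime($epoch))[1..5];
    # `%2d' pads the hour with a space rather than a leading zero
    sprintf('%d-%02d-%02d %2d:%02d',
        $year + 1900, $mon + 1, $mday, $hour, $min);
}

print fmt_time_k(time), "\n";
```

Only the hour uses `%2d'; the other fields keep their leading zeroes.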
2023-11-29 admin: resolve_git_dir respects symlinks
Absolute pathnames of git coderepos are stored in the cindex, but we should favor paths relative to $ENV{PWD} since it respects symlinks in the hierarchy. Respecting symlinks makes it easier to migrate cindex to new storage as old storage wears out and to relocate the storage device onto another machine.
2023-10-28 treewide: use run_qx where appropriate
This saves us some code, and is a small step towards getting ProcessIO working with stat, fcntl and other perlops that don't work with tied handles.
2023-09-11 treewide: favor Xapian (SWIG binding) over Search::Xapian
The Xapian SWIG bindings are favored by Xapian upstream for ease-of-maintenance compared to the XS version. While Debian lags on this front, the SWIG bindings are widely available on all *BSDs.
2023-05-03 compact: support codesearch indices
This is much easier to support than xcpdb since it's 1:1 and doesn't follow a different sharding scheme than the inboxes and extindices.
2023-05-03 admin: hoist out resolve_any_idxdir from resolve_{inboxdir,eidxdir}
This bit of common code will be handy for the upcoming resolve_cidxdir, too.
2023-04-06 watch: use detect_indexlevel for unconfigured inboxes
I favor leaving the publicinbox.<name>.indexlevel parameter out of config files to make it easier to alter and reduce sources of truth. It worked well in most cases, but public-inbox-watch also needs to detect the indexlevel. Moving the sub to InboxWritable (from Admin) probably makes sense since it's a per-inbox attribute and allows -watch to reuse it.
2023-03-25 admin: ensure resolved GIT_DIR is absolute
We'll also support the $base arg of File::Spec->rel2abs since it should make codesearch indexing easier.
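To illustrate the $base argument of File::Spec->rel2abs mentioned above, here is a small sketch. The wrapper name `abs_git_dir' is hypothetical and is not the actual resolve_git_dir; the paths in the usage note assume a Unix-like system:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical wrapper, not the real resolve_git_dir: make $dir absolute,
# resolving relative paths against $base (the current directory if undef).
sub abs_git_dir {
    my ($dir, $base) = @_;
    File::Spec->rel2abs($dir, $base);
}
```

With this sketch, abs_git_dir('foo.git', '/srv/code') yields '/srv/code/foo.git', while a path that is already absolute ignores $base.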
2023-03-25 admin: hoist out resolve_git_dir
We'll be using this for indexing git coderepos, and switch to Perl 5.12 while we're at it since unicode_strings doesn't affect this package.
2021-10-10 admin: add '# ' prefix for progress messages
It's more consistent with TAP output and hopefully puts users at ease in case they don't understand the meaning of a message.
2021-10-05 index: --reindex w/ --{since,until,before,after}
This lets administrators reindex specific time ranges according to git "approxidate" formats. These arguments are passed directly to underlying git-log(1) invocations and may still reach into old epochs. Since these options rely on git committer dates (which we infer from the most recent Received: header), they are not guaranteed to be strictly tied to git history and it's possible to over/under-reindex some messages. It's probably not a major problem in practice, though; reindexing a few extra messages is generally harmless aside from some extra device wear. Since this currently relies on git-log, these options do not affect -extindex, yet.
2021-09-23 daemons: revamp periodic cleanup task
Neither Inboxes nor ExtSearch objects were retrying correctly when there are live git processes, but the inboxes were getting rescanned for search or other reasons. Ensure the scan retries eventually if there are live processes. We also need to update the cleanup task to detect Xapian shard count changes, since Xapian ->reopen is enough to detect any other Xapian changes. Otherwise, we just issue an inexpensive ->reopen call and let Xapian check whether there's anything worth reopening. This also lets us eliminate the Devel::Peek dependency.
2021-09-22 treewide: fix %SIG localization, harder
This fixes the occasional t/lei-sigpipe.t infinite loop under "make check-run". Link: http://nntp.perl.org/group/perl.perl5.porters/258784 <CAHhgV8hPbcmkzWizp6Vijw921M5BOXixj4+zTh3nRS9vRBYk8w@mail.gmail.com> Followup-to: b552bb9150775fe4 ("daemon+watch: fix localization of %SIG for non-signalfd users")
2021-09-12 new public-inbox-{clone,fetch} commands
Setting up and maintaining git-only mirrors of v2 inboxes is complex since multiple commands are required to clone and fetch into epochs. Unlike grokmirror, these commands do not require any configuration. Instead, they rely on existing git config files and work like "git clone --mirror" and "git fetch", respectively. Like grokmirror, they use manifest.js.gz, but only on a per-inbox basis so users won't have to clone every inbox of a large instance nor edit config files to include/exclude inboxes they're interested in.
2021-07-31 extindex: -xcpdb and -compact support
Since extindex uses Xapian shards in a similar way to v2 inboxes, we'll support -xcpdb (reshard+upgrade) and -compact all the same to give admins tuning+upgrade options.
2021-07-31 admin: index_inbox: drop unnecessary check
No callers pass an unblessed pathname to index_inbox, only Inbox object refs.
2021-07-28 treewide: s/sequential_shard/sequential-shard/g
The underscore variant was never documented and maintaining the difference between the command-line and internal hash is not worth it.
2021-02-07 lei: add-external --mirror support
This can be useful for users who want to clone and mirror an existing public-inbox. This doesn't have update support, yet, so users will need to run "git fetch && public-inbox-index" for now.
2021-02-05 eml: handle warning ignores for lei
There's nothing we can do about bad emails in our search results, so quiet things down and don't fight the MUA for the terminal.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-26 default to CORE::warn in $SIG{__WARN__} handlers
As with CORE::die and $SIG{__DIE__}, it turns out CORE::warn is safe to use inside $SIG{__WARN__} handlers without triggering infinite recursion. So fall back to reusing CORE::warn instead of creating a new sub.
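A sketch of the pattern described above, under stated assumptions: `warn_filter' is a hypothetical name, and the filtering regexp is only an example. The handler forwards to a caller-supplied callback, or to CORE::warn when none is given. Taking a reference to CORE::warn needs Perl 5.16 or later:

```perl
use strict;
use warnings;

# Hypothetical helper: build a __WARN__ handler that drops messages
# matching $ignore_re and forwards everything else to $next.
sub warn_filter {
    my ($ignore_re, $next) = @_;
    # Reuse CORE::warn instead of creating a new sub. Perl disables the
    # __WARN__ hook while a handler runs, so this does not recurse.
    $next //= \&CORE::warn;
    sub {
        return if $_[0] =~ $ignore_re;
        $next->(@_);
    };
}
```

A caller would typically install it with `local $SIG{__WARN__} = warn_filter(qr/^noise:/);' so the handler is restored on scope exit.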
2020-12-26 index: do not attach inbox to extindex unless updated
We'll count the number of log changes (regardless of index or unindex) and only attach inboxes to ExtSearchIdx objects when they get new work. We'll also reduce lock bouncing and only update external indices after all per-inbox indexing is done. This also updates existing v2 indexing/unindexing callers to be more consistent and ensures unindex log entries update per-inbox last commit information.
2020-12-22 admin: resolve inboxes to absolute paths for index
Some of my ancient v1-only scripts called public-inbox-index to operate on GIT_DIR: GIT_DIR=/path/to/foo.git public-inbox-index This change ensures they keep working, otherwise "." will be passed to the --git-dir= switch of git(1) because that's the default directory if no inboxes are specified on the command-line. Fixes: 9fcce78e40b0a7c6 ("script/public-inbox-*: favor caller-provided pathnames")
2020-12-21 use rel2abs_collapsed when loading Inbox objects
We need to canonicalize paths for inboxes which do not have a newsgroup defined, otherwise ->eidx_key matches can fail in unexpected ways.
2020-12-20 script/public-inbox-*: favor caller-provided pathnames
We'll try to avoid calling Cwd::abs_path and use File::Spec->rel2abs instead, since abs_path will resolve symlinks the user specified on the command-line. Unfortunately, ->rel2abs still leaves "/.." and "/../" uncollapsed, so we still need to fall back to Cwd::abs_path in those cases. While we are at it, we'll also resolve inboxdir from deep inside v2 directories instead of misdetecting them as v1 bare git repos. In any case, stop matching directories by name and instead rely on the unique combination of st_dev + st_ino on stat() as we started doing in the extindex code.
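The two ideas above could be sketched roughly as follows. Both helper names are hypothetical and not taken from public-inbox, and the actual implementation may differ:

```perl
use strict;
use warnings;
use Cwd ();
use File::Spec;

# Hypothetical: make a path absolute while keeping user-specified
# symlinks; fall back to Cwd::abs_path (which resolves symlinks) only
# when "/.." components remain uncollapsed.
sub abs_keep_symlinks {
    my ($path) = @_;
    my $abs = File::Spec->rel2abs($path);
    $abs =~ m!/\.\.(?:/|\z)! ? Cwd::abs_path($abs) : $abs;
}

# Hypothetical: two pathnames refer to the same directory if both
# st_dev (field 0) and st_ino (field 1) from stat() match.
sub same_dir {
    my ($x, $y) = @_;
    my @x = stat($x) or return;
    my @y = stat($y) or return;
    $x[0] == $y[0] && $x[1] == $y[1];
}
```

Comparing st_dev + st_ino means a symlinked path and its target are treated as the same directory even though their names differ.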
2020-12-17 index: ignore some warnings, set {current_info} for v1
-index runs on data that's already frozen in git, so there's no point in warning users about it. While we're at it, set the {current_info} prefix for v1 as we do in v2 inboxes in case new problems show up.
2020-12-09 admin: resolve_repo_dir => resolve_inboxdir
We've stopped referring to inboxdirs as "repos" a while ago since v2 inboxes have multiple git repos associated with them. So update the name to reflect that and avoid an unnecessary export that's only used by a test case.
2020-12-09 extindex: do not use current dir like -index does
At least not for resolving inboxes, since there's no good way for a user to specify what is an inbox or extindex directory without a command-line switch. Instead of changing the -extindex command, we change the -index command internals to rely on the new {-use_cwd} flag to avoid internal use of negation, since double-negatives and the like are confusing to me.
2020-10-13 admin: preserve config ordering of `--all' switch
When `--all' is passed to -index and similar commands, process them in the same order as what is given in the config file. This ensures predictable behavior so admins can ensure certain inboxes see updated indices before others. For (upcoming) external indices, this will ensure stable Xref: ordering for predictable caching/memoization by NNTP clients.
2020-09-02 admin: improve minimum version text
"inboxes 1 inboxes not supported by ..." was nonsensical. Now it'll show "-V1 inbox not supported by ...", instead.
2020-08-20 init+index: support --skip-docdata for Xapian
Since we no longer read document data from Xapian, allow users to opt-out of storing it. This breaks compatibility with previous releases of public-inbox, but gives us a ~1.5% space savings on Xapian storage (and associated I/O and page cache pressure reduction).
2020-08-20 admin: progress shows the inbox being indexed
This is helpful with --all, or when multiple inboxes are being indexed.
2020-08-13 admin: don't warn when --jobs exceeds shards
Established tools like make(1), prove(1) and xargs(1) don't warn when the desired parallelism level can't be met, either.
2020-08-10 convert: support new -index options
Converting v1 inboxes to v2 can be a painful experience on HDD. Some of the new options in the CLI or config file make it less painful.
2020-08-10 admin: use a generic variable name
We parse other options, too, not just --max-size
2020-08-10 index+xcpdb: improve SIG{INT,TERM,HUP,PIPE} behavior
-index now invokes ->DESTROY like xcpdb does, which is necessary to cleanup $INBOX_DIR/msgmap-XXXXXXX files. We'll also exit with the expected values for various signals by adding 128 as described in <https://www.tldp.org/LDP/abs/html/exitcodes.html> -xcpdb now terminates worker processes and xapian-compact(1) invocations when prematurely killed, too.
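The 128 + signal number convention mentioned above can be sketched as below. This is only an illustration with a hypothetical helper name, not the code from this commit; it uses the sig_name/sig_num lookup from the core Config module:

```perl
use strict;
use warnings;
use Config;

# Map signal names to numbers using Perl's build-time configuration
my %signo;
@signo{split(' ', $Config{sig_name})} = split(' ', $Config{sig_num});

# Hypothetical helper: conventional shell exit code for a fatal signal
sub sig_exit_code {
    my ($name) = @_;
    defined(my $num = $signo{$name}) or die "unknown signal: $name";
    128 + $num;
}

# Calling exit from the handler ends the process normally, so Perl still
# runs DESTROY on live objects, giving them a chance to clean up
# temporary files.
for my $name (qw(INT TERM HUP PIPE)) {
    $SIG{$name} = sub { exit(sig_exit_code($name)) };
}
```

On Linux, where SIGINT is 2 and SIGTERM is 15, this gives exit codes 130 and 143 respectively.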
2020-05-17 index: v2: parallelize if --reindex or --jobs is specified
`--reindex' involves chomping down lots of mail, so it benefits from parallelization just like the initial indexing. It's also a bit surprising to specify `--jobs/-j' without parallel processes, so ensure we turn on parallelization there, too. We can simplify initialization here, as well, since neither `eval' nor `V2Writable->new' should be in this code.
2020-05-09 replace most uses of PublicInbox::MIME with Eml
PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-04-21 index: support --max-size / publicinbox.indexMaxSize
In normal mail paths, we can rely on MTAs being configured with reasonable limits in the -watch and -mda mail injection paths. However, the MTA is bypassed in a git-only delivery path, so a BOFH could inject a large message and DoS users attempting to mirror a public-inbox. This doesn't protect unindexed WWW interfaces from Email::MIME memory explosions on v1 inboxes. Probably nobody cares about unindexed WWW interfaces anymore, especially now that Xapian is optional for indexing.
2020-04-20 drop needless `eval {}' around Config->new
It hasn't been needed since commit 089cca37fa036411 ("config: ignore missing config files"). And we actually want to propagate errors when we can't start new processes or if git(1) is missing.
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27 inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
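A rough sketch of what such an accessor might look like. The package name `My::Inbox' is a hypothetical stand-in, and defaulting to version 1 when the field is unset is an assumption rather than something stated in this commit message:

```perl
use strict;
use warnings;

package My::Inbox; # hypothetical stand-in for an inbox class

sub new {
    my ($class, %opt) = @_;
    bless { %opt }, $class;
}

# Centralize the default so callers don't need their own "//" or "||";
# the default of 1 is an assumption made for this sketch.
sub version { $_[0]->{version} // 1 }

package main;

my $ibx = My::Inbox->new(version => 2);
print "v", $ibx->version, "\n";
```

Callers can then write `$ibx->version >= 2' without repeating the default at every call site.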
2020-01-06 admin: do not lazy-load Inbox or Config packages
No point in lazy-loading these, since they're always loaded anyways and would not have portability problems on systems with minimal dependencies.
2019-12-30 spawn: support chdir via -C option
This simplifies our admin module a bit and allows solver to be used with v1 inboxes using git versions prior to v1.8.5 (but still >= git v1.8.0).
2019-12-30 spawn: allow passing GLOB handles for redirects
We can save callers the trouble of {-hold} and {-dev_null} refs as well as the trouble of calling fileno().
2019-12-24 search: support SWIG-generated Xapian.pm
Xapian upstream is slowly phasing out the XS-based Search::Xapian in favor of the SWIG-generated "Xapian" package. While both Debian and FreeBSD have Search::Xapian, OpenBSD only includes the "Xapian" binding. More information about the status of the "Xapian" Perl module here: https://trac.xapian.org/ticket/523
2019-12-12 Date::Parse is now optional
-mda should not be dealing with broken Date: headers nowadays, so deprioritize Date::Parse in our documentation and internal checks.
2019-11-16 admin: get rid of singleton $CFG var
PublicInbox::Admin::config() just adds an extra layer of indirection which we barely rely on. So get rid of this global variable and make it easier to run tests in the future without relying on global state.
2019-11-14 inboxwritable: drop {-importer} cyclic reference
InboxWritable caching the result of ->importer leads to a circular reference, since the returned (V2Writable|Import) object holds onto the calling InboxWritable object. With public-inbox-watch, this leads to a memory leak if a user is reloading via SIGHUP after a message is imported (it would only become noticeable with SIGHUPs after every message imported). I would not expect anybody to notice this in real-world usage. I only noticed this since I was making -xcpdb suitable for long-lived process use (e.g. "mod_perl style") and a flock remained unreleased on v1 inboxes after resharding. WatchMaildir (used by -watch) already handles caching of the importer object itself, and all of our other real-world uses of ->importer are short-lived or designed for batch scripts, so there's no need to cache the importer result internally.
2019-10-16 config: support "inboxdir" in addition to "mainrepo"
"mainrepo" was a bad name and an artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/