path: root/lib/PublicInbox/InboxWritable.pm
Date Commit message
2021-06-09 inbox_writable: fix import_maildir
I'm not sure if anybody uses this, but it exists. It'll likely be dropped in the future. Fixes: fa3f0cbcd1af5008 ("use MdirReader in -watch and InboxWritable")
2021-04-05 lei: maildir: move shard support to MdirReader
We'll eventually want lei_input users like "lei import" and "lei tag" to support parallel reads.
2021-03-23 mbox_reader: add ->reads method to avoid nonsensical formats
Relying on UNIVERSAL::can may cause internal helper methods to be used, which can lead to failures or nonsensical results.
2021-02-21 inbox_writable: require PublicInbox::MdirReader
This wasn't causing known failures, but maybe it was or will in the future.
2021-02-12 import_mbox: use MboxReader
It supports more mbox variants and its trailing newline behavior is probably more correct despite the previous change to PublicInbox::Filter::Vger.
2021-02-10 use MdirReader in -watch and InboxWritable
MdirReader now handles files in "$MAILDIR/new" properly and is stricter about what it accepts. eml_from_path is also made robust against FIFOs while eliminating TOCTOU races between stat(2) and open(2) calls.
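As an illustration of the race mentioned in this message (a Python sketch, not the project's Perl code): opening first and then checking the already-open descriptor with fstat(2) leaves no window between a path-based stat(2) and the open(2), and O_NONBLOCK keeps a read-only open of a FIFO from waiting on a writer. The name `read_regular_file` is hypothetical.

```python
import os
import stat


def read_regular_file(path):
    """Return the file's bytes, or None if path is not a regular file.

    Hypothetical helper: open first, then fstat the descriptor, so the
    type check applies to the object that was actually opened.
    """
    try:
        # O_NONBLOCK: a read-only open of a FIFO returns immediately
        # instead of blocking until a writer appears.
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    except OSError:
        return None
    try:
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            return None  # FIFO, directory, device, ...
        with os.fdopen(fd, "rb", closefd=False) as fh:
            return fh.read()
    finally:
        os.close(fd)
```

A caller would treat a None result as "skip this path", whatever the reason the open or the type check failed.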
2021-02-05 eml: handle warning ignores for lei
There's nothing we can do about bad emails in our search results, so quiet things down and don't fight the MUA for the terminal.
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-26 default to CORE::warn in $SIG{__WARN__} handlers
As with CORE::die and $SIG{__DIE__}, it turns out CORE::warn is safe to use inside $SIG{__WARN__} handlers without triggering infinite recursion. So fall back to reusing CORE::warn instead of creating a new sub.
2020-12-25 inboxwritable: delay umask_prepare calls
This simplifies all ->with_umask callers and opens the door for further optimizations to delay/elide process spawning.
2020-12-23 inboxwritable: _init_v1: set created_at ASAP
This ensures we have UIDVALIDITY to index earlier rather than later for v1 inboxes, matching v2 behavior.
2020-12-17 inboxwritable: drop git_dir_n sub
There's only one caller, unlikely to be any more, and should be harmless to open code.
2020-12-17 inbox: simplify v2 epoch counting
Perl readdir detects list context and can return an array suitable for the grep op. From there, we can rely on substr to remove the ".git" suffix and integerize the value to save a few bytes before letting List::Util::max return the value. This is how we detect Xapian shards nowadays, too, and we'll also use defined-or (//) to simplify the return value there. We'll also simplify InboxWritable->git_dir_latest, remove some callers, and consider removing it entirely.
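The approach described in this message (list the directory, keep the "N.git" entries, strip the suffix, integerize, take the maximum) can be sketched in Python as follows. This is only an illustration, not the Perl code from the commit; the function name `max_epoch` and the assumption that epoch directories are named with digits followed by ".git" are mine.

```python
import os
import re


def max_epoch(git_root):
    """Return the highest N among "N.git" entries in git_root, else None."""
    try:
        names = os.listdir(git_root)
    except FileNotFoundError:
        return None
    # Keep all-digit names with a ".git" suffix, drop the 4-character
    # suffix, and convert to int so that 10 compares above 9.
    epochs = [int(n[:-4]) for n in names if re.fullmatch(r"[0-9]+\.git", n)]
    # default=None plays the role of the defined-or fallback for the
    # "no epochs yet" case.
    return max(epochs, default=None)
```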
2020-12-17 inboxwritable: warn_ignore: "Bad UTF7 data escape"
As with the other messages in this callback, there's nothing we can do about invalid messages ending up in our Maildirs for -watch.
2020-12-09 treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-11-24 manifest: support faster generation via [extindex "all"]
For a mirror of lore.kernel.org with >140 inboxes, this speeds up manifest.js.gz generation from ~1s to 40ms on my HW. This is still unacceptable when dealing with thousands of inboxes, but gets us closer to where we need to be.
2020-11-07 inboxwritable: eidx_key for external index
This is preferable to open-coding "newsgroup // inboxdir" everywhere.
2020-11-07 v2: some changes for ExtSearchIdx compatibility
We'll be using per-sync-state {ibx} refs instead, so make parts of the v2 indexing code less-dependent on $self->{ibx} where $self is a V2Writable object.
2020-08-03 watch: quiet some warnings on spam mailboxes
Email::Address::XS and PublicInbox::MsgTime both emit warnings which are likely to trigger from spam messages. Since this can be configured to remove spam, just filter out those warnings to avoid cluttering up stderr with useless information.
2020-08-02 inboxwritable: rename mime_from_path to eml_from_path
This is more accurate given we use PublicInbox::Eml instead of Email::MIME/PublicInbox::MIME, nowadays.
2020-08-02 inboxwritable: mime_from_path: reduce `$/' scope and returns
We don't want `local $/' affecting Eml->new, and we can use implicit returns which may be faster on older Perl.
2020-07-17 with_umask: pass args to callback
While it makes the code flow slightly less well in some places, it saves us runtime allocations and indentation.
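A minimal Python sketch of the calling convention this message describes, assuming the helper temporarily sets the process umask and forwards extra arguments to the callback. The signature below is hypothetical and is not the Perl method's actual interface.

```python
import os


def with_umask(mask, callback, *args):
    """Run callback(*args) under the given umask, then restore the old one."""
    old = os.umask(mask)
    try:
        # Forwarding args lets callers pass an existing function directly
        # instead of wrapping the call in a new closure each time.
        return callback(*args)
    finally:
        os.umask(old)
```

For example, `with_umask(0o077, os.mkdir, "private-dir")` would create the directory without group or world permissions and leave the caller's umask unchanged afterwards.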
2020-07-02 tests: add use/require statements for TEST_RUN_MODE=0
The default (and fast) TEST_RUN_MODE=2 preloads most modules, but TEST_RUN_MODE=0 is more realistic and can catch some problems which may show up in real-world use.
2020-06-28 inboxwritable: ensure ssoma.lock exists on init
This will allow us to use InboxIdle on empty/unindexed v1 inboxes.
2020-06-23 init: add --skip-artnum parameter
For archivists with only newer mail archives, this option allows reserving NNTP article numbers for yet-to-be-archived old messages. Indexers will need to be updated to support this feature in future commits. -V1 inboxes will now be initialized with SQLite and Xapian support if this option is used, or if --indexlevel= is specified.
2020-06-08 index: v2: parallel by default
InboxWritable should only set $v2w->{parallel} if the $parallel flag is defined to 0 or 1. We want indexing a new inbox to utilize SMP, just like --reindex. -index once again allows -j0/--jobs=0 to force single-process use, and we'll be ensuring that works in tests to maintain performance on small systems. Fixes: 61a2fff5b34a3e32 ("admin: move index_inbox over")
2020-05-09 replace most uses of PublicInbox::MIME with Eml
PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-04-19 inboxwritable: mime_from_path: reuse in more places
There's nothing Maildir-specific about the function, so `maildir_path_load' was a bad name. So give it a more appropriate name and use it in our tests. This saves us some code and inconsistency by reusing an existing internal library routine in more places. We can drop the "From_" line in some of our (formerly) mbox sample files.
2020-04-04 inboxwritable: fix From_ line unescaping
We can't rely on Email::MIME noticing the change to our scalar ref after calling `PublicInbox::MIME->new'. This is because Email::MIME::body_set (unlike Email::Simple::body_set) will copy the contents of the body into `->{body_raw}' as a new scalar. Furthermore, we need to escape multiple From lines in the body, not just the first one, using the `g' modifier to `s//'. Reported-by: Kyle Meyer <kyle@kyleam.com>
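The second point in this message, handling every quoted From line in the body rather than only the first, can be illustrated with a short Python sketch. It assumes mboxrd-style quoting (one leading ">" is removed from lines matching ">From " with any number of extra ">"); the pattern and the function name are assumptions, not taken from the commit.

```python
import re


def unescape_from_lines(body):
    """Strip one level of ">" quoting from every quoted From line."""
    # re.MULTILINE makes ^ match at the start of each line, and re.sub
    # replaces every match by default, which corresponds to Perl's /g.
    return re.sub(r"^>(>*From )", r"\1", body, flags=re.MULTILINE)
```

Passing `count=1` to `re.sub` would reproduce the kind of bug described, where only the first matching line is changed.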
2020-03-29 index: support --compact / -c on command-line
It's more convenient to specify `-c' / `--compact' on the command-line when reindexing than it is to invoke public-inbox-compact(1) separately. This is especially convenient in low-space situations when public-inbox-index is operating on multiple inboxes sequentially, as compaction can happen immediately after indexing each inbox, instead of waiting until all inboxes are indexed.
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-27 inbox: add ->version method
This allows us to simplify version checking by avoiding "//" or "||" operators sprinkled around.
2020-01-01 filter/base: export REJECT as a constant
And update callers to use it, as it makes the code a bit cleaner. Probably irrelevant, but it should be faster, too, as "perl -I lib -w -MO=Deparse $FILE" shows REJECT() calls are constant-folded.
2019-11-16 inboxwritable: add ->cleanup method
We've been using this in -edit, and will be using it in some more scripts and tests to optimize for run_mode=2 with run_script. Keeping this in the *Writable modules since I don't see it being useful for the WWW and NNTP read-only interfaces which use PublicInbox::Inbox.
2019-11-14 inboxwritable: drop {-importer} cyclic reference
InboxWritable caching the result of ->importer leads to a circular reference, because the returned (V2Writable|Import) object holds onto the calling InboxWritable object. With public-inbox-watch, this leads to a memory leak if a user is reloading via SIGHUP after a message is imported (it would only become noticeable with SIGHUPs after every message imported). I would not expect anybody to notice this in real-world usage. I only noticed this since I was making -xcpdb suitable for long-lived process use (e.g. "mod_perl style") and a flock remained unreleased on v1 inboxes after resharding. WatchMaildir (used by -watch) already handles caching of the importer object itself, and all of our other real-world uses of ->importer are short-lived or designed for batch scripts, so there's no need to cache the importer result internally.
2019-10-30 inboxwritable: add assert_usable_dir sub
And use it for mda, since "0" could be a usable directory if somebody insists on using relative paths...
2019-10-22 inboxwritable: import_maildir uses maildir_path_load
I'm not sure if this will get used anywhere, but at least call a function which exists in dead code.
2019-10-16 config: support "inboxdir" in addition to "mainrepo"
"mainrepo" was a bad name and an artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-14 inboxwritable: s/partitions/shards/ in local var
More work towards being consistent with Xapian's own terminology
2019-05-23 v1writable: retire in favor of InboxWritable
In retrospect, introducing V1Writable was unnecessary and InboxWritable->importer is in a better position to abstract away differences between v1 and v2 writers. So teach InboxWritable to initialize inboxes and get rid of V1Writable.
2019-01-05 filter/rubylang: fix SQLite DB lifetime problems
Clearly the AltId stuff was never tested for v2. Ensure this tricky filter (which reuses Msgmap to avoid introducing new serial numbers) doesn't trigger deadlocks in SQLite due to opening a DB for writing multiple times. I went through several iterations of this change before going with this one, which is the least intrusive I could find.
2019-01-05 inboxwritable: drop unused variable
2018-07-29 mda: use InboxWritable
It's a convenient wrapper nowadays, so get rid of some legacy code and minimize differences from the -watch code.
2018-05-30 respect umask if core.sharedRepository is not set
This is consistent with git itself and the previous behavior was a result of misunderstanding of how git interprets this. And adjust tests slightly to match the new behavior. Reported-by: Konstantin Ryabitsev <konstantin@linuxfoundation.org> <38873789-ab42-65a1-20c9-12c30b171f4f@linuxfoundation.org>
2018-03-30 v2: respect core.sharedRepository in git configs
Ensure -convert and -compact do not make repositories unreadable on live servers.
2018-03-30 search: move permissions handling to InboxWritable
We'll be making sure V2Writable uses this.
2018-03-20 InboxWritable: add mbox/maildir parsing + import logic
This will make it easier to as well as supporting future Filter API users. It allows simplifying our ad-hoc import_vger_from_mbox script.
2018-03-20 introduce InboxWritable class
This code will be shared with future mass-import tools.