path: root/lib/PublicInbox/WatchMaildir.pm
Date        Commit message
2020-02-06  treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-11  make Filesys::Notify::Simple optional
It's only used by us in public-inbox-watch, and maybe not for long. It is present in most installations because Plack pulls it in, but Plack is no longer required.
2020-01-11  spawn (and thus popen_rd) die on failure
Most spawn and popen_rd callers die on failure to spawn anyway, and some are missing checks entirely. This saves us a bunch of verbose error-checking code in callers. This also makes popen_rd more consistent, since it already dies on pipe creation failures.
2020-01-06  treewide: "require" + "use" cleanup and docs
There's a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2020-01-01  filter/base: export REJECT as a constant
And update callers to use it, as it makes the code a bit cleaner. Probably irrelevant, but it should be faster, too, as "perl -I lib -w -MO=Deparse $FILE" shows REJECT() calls are constant-folded.
2019-11-24  check for File::Temp 0.19 for ->newdir method
This is distributed with Perl 5.10.1 and onwards, so it should not be an installation burden for any users. I'm planning to move away from tempdir() entirely and use File::Temp->newdir to remove dependencies on END{} blocks.
2019-10-22  watchmaildir: remove redundant _path_to_mime
InboxWritable::maildir_path_load exists and we may support it for use with standalone scripts.
2019-10-15  mda, watch: wire up List-ID header support
This also adds watchheader tests for -watch, which we never had before :x
2019-09-09  run update-copyrights from gnulib for 2019
2019-07-06  watch: allow multiple spam watch directories
Given most folks have multiple mail accounts, there's no reason we can't support multiple Maildirs.
2019-07-06  watch: remove some indirectly-used imports
We can drop some unnecessary imports now that we have switched to InboxWritable.
2019-06-27  watchmaildir: show the current path on spamcheck failures
Knowing which message failed a spam check is tough when I have many Maildirs and don't have a search indexing tool set up for spam mail.
2019-01-05  filter/rubylang: fix SQLite DB lifetime problems
Clearly the AltId stuff was never tested for v2. Ensure this tricky filter (which reuses Msgmap to avoid introducing new serial numbers) doesn't trigger SQLite deadlocks by opening a DB for writing multiple times. I went through several iterations of this change before going with this one, which is the least intrusive I could find.
2019-01-05  watchmaildir: normalize Maildir pathnames consistently
Remove redundant slashes while we're at it.
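For illustration only: a minimal Python sketch of the kind of pathname normalization this commit describes, collapsing repeated slashes and dropping a trailing one so that two spellings of the same Maildir compare equal. The actual change is in Perl and may differ; `normalize_maildir` is a hypothetical name.

```python
import re


def normalize_maildir(path: str) -> str:
    """Hypothetical helper: collapse runs of '/' and drop a trailing '/'."""
    path = re.sub(r"/+", "/", path)  # "a//b" -> "a/b"
    if len(path) > 1:                # keep the root directory "/" intact
        path = path.rstrip("/")
    return path
```

With this, "/home//u/Maildir/" and "/home/u/Maildir" map to the same key when looking up which inboxes watch a directory.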
2019-01-05  watchmaildir: get rid of unused spamdir field
Unused since commit 6c2caa791bd5fbf5c4edb1a4a2c1807e527348a7 ("watchmaildir: support v2 repositories")
2019-01-05  watchmaildir: support multiple inboxes in the same Maildir
Not sure what I was smoking when I originally wrote this code. cf. https://public-inbox.org/meta/874li887mp.fsf@vuxu.org/
2019-01-02  use PublicInbox::Config::each_inbox where appropriate
No need to reach into PublicInbox::Config internals and iterate through the hashref by hand.
2018-07-29  mda: allow configuring globally without spamc support
This reuses some of the configuration from -watch, but remains independent since some configurations will use -watch for some inboxes and -mda for others. The default remains "spamc" for -mda users so nothing changes without explicit configuration. Per-inbox configurations may also be supported in the future.
2018-04-19  filter/rubylang: do not set altid on spam training
I suppose it's a bug or inconsistency that altid is write-only and that deletions are not reflected in it. But for now, we do not set it when training spam, so there's no window where an invalid NNTP article number shows up. This should solve the problem of massive gaps in messages caused by spam training for ruby groups: https://public-inbox.org/meta/20180307093754.GA27748@dcvr/
2018-03-20  InboxWritable: add mbox/maildir parsing + import logic
This will make it easier to as well as supporting future Filter API users. It allows simplifying our ad-hoc import_vger_from_mbox script.
2018-03-20  import: discard all the same headers as MDA
Reduce the places where we have duplicate logic for discarding unwanted headers.
2018-03-20  introduce InboxWritable class
This code will be shared with future mass-import tools.
2018-03-19  v2writable: allow disabling parallelization
While parallel processes improve import speed for initial imports, they are probably not necessary for daily mail imports via WatchMaildir and certainly not for public-inbox-init. Disabling them saves some memory for daily use and even helps improve readability of some subroutines by showing which methods they call remotely.
2018-03-19  watchmaildir: support v2 repositories
Unfortunately this gives up some minor performance tweaks we made to avoid reforking import processes.
2018-03-19  import: force Message-ID generation for v1 here
This allows us to share code for generating Message-IDs between v1 and v2 repos. For v1, this introduces a slight incompatibility in message removal iff the original message lacked a Message-ID AND the training request came from a message which did not pass through the public-inbox: The workaround for this would be to reuse the bad message from the archive itself.
2018-03-19  watchmaildir: use content_digest to generate Message-Id
This can probably be moved to Import for code reuse.
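A rough Python sketch of generating a Message-ID from a content digest, assuming SHA-256 over a hypothetical set of headers plus the decoded leaf parts. public-inbox's own content_digest is Perl and covers its own choice of fields, and the `@generated.invalid` domain is made up. The property that matters is determinism: identical content always yields the same Message-ID, which later removal of that message relies on.

```python
import hashlib
from email.message import EmailMessage

# Hypothetical header selection; the real digest may use different fields.
DIGEST_HEADERS = ("From", "To", "Cc", "Subject", "Date")


def digest_message_id(msg: EmailMessage) -> str:
    """Derive a stable Message-ID from headers and body content."""
    h = hashlib.sha256()
    for name in DIGEST_HEADERS:
        for value in msg.get_all(name, []):
            h.update(f"{name}: {value}\n".encode())
    # Hash every non-multipart part so multipart bodies are covered too.
    for part in msg.walk():
        if not part.is_multipart():
            h.update(part.get_payload(decode=True) or b"")
    return f"<{h.hexdigest()}@generated.invalid>"
```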
2018-03-19  import: implement barrier operation for v1 repos
This will allow WatchMaildir to use ->barrier operations instead of reaching inside for nchg. This also ensures dumb HTTP clients can see changes to V2 repos immediately.
2018-02-28  use PublicInbox::MIME consistently
It works around some bugs in older Email::MIME which we'll find useful.
2018-02-08  watch_maildir: allow '-' in mail filename
Hostnames can contain '-' and this allows public-inbox-watch(1) to work on machines which generate Maildir files with '-' in them.
2018-02-07  update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-11-16  watch: use "spam" in commit message for removals
This makes it easy to identify the reason for message removals.
2017-06-26  watch: avoid potential race condition while quitting
We must not trigger future activity when initiating a -watch shutdown.
2017-06-26  watch: commit changes to fast-import sooner
We should make changes visible sooner, even during lengthy scans.
2017-06-26  watch: use "self-inotify-tempfile trick" for quit
This should be more reliable and safer as it'll ensure existing fast-import instances are shut down properly.
2017-06-26  watch: improve fairness during full rescans
We need to ensure new messages are being processed fairly during full rescans, so have the ->scan subroutine yield and reschedule itself. Additionally, having a long-running task inside the signal handler is dangerous and subject to reentrancy bugs. Due to the limitations of the Filesys::Notify::Simple interface, we cannot rely on multiplexing I/O interfaces (select, IO::Poll, Danga::Socket, etc...) for this. Forking a separate process was considered, but it is more expensive for a mostly-idle process. So, we use a variant of the "self-pipe trick" via inotify (or whatever Filesys::Notify::Simple gives us). Instead of writing to our own pipe, we write to a file in our own temporary directory watched by Filesys::Notify::Simple to trigger events in signal handlers.
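The tempfile variant of the self-pipe trick can be sketched in Python as below. This is an illustration under assumptions, not the -watch code: a plain directory listing stands in for Filesys::Notify::Simple, SIGUSR1 is an arbitrary POSIX signal choice, and the `Watcher` class and its methods are hypothetical. The handler only creates a marker file; the actual work runs later from the normal event loop.

```python
import os
import signal
import tempfile


class Watcher:
    """Hypothetical watcher that uses a tempfile instead of a self-pipe."""

    def __init__(self):
        # Private directory that the file watcher also monitors.
        self.tmpdir = tempfile.TemporaryDirectory(prefix="watch-")
        self.quit = False
        signal.signal(signal.SIGUSR1, self._on_signal)

    def _on_signal(self, signum, frame):
        # Keep the handler tiny: only create a marker file, which the
        # watcher then reports like any other filesystem event.
        with open(os.path.join(self.tmpdir.name, "quit"), "w"):
            pass

    def poll_events(self):
        # Stand-in for Filesys::Notify::Simple: list files in the directory.
        d = self.tmpdir.name
        return [os.path.join(d, name) for name in sorted(os.listdir(d))]

    def handle_events(self):
        # Runs from the main loop, outside any signal handler.
        for path in self.poll_events():
            os.unlink(path)
            if os.path.basename(path) == "quit":
                self.quit = True  # a real watcher would begin a clean shutdown here
```

Usage: create a `Watcher`, deliver SIGUSR1 (for example with `os.kill(os.getpid(), signal.SIGUSR1)`), then call `handle_events()` from the loop; `quit` only becomes True at that point, not inside the handler.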
2017-06-26  watch: ensure HUP causes the scanner to be reloaded
Otherwise the old watcher may run indefinitely.
2017-06-23  watchmaildir: deal with rejected (100) messages
The RubyLang filter is strict about what messages it rejects, so the spam learning path will not auto-train or remove messages missing X-Mail-Count headers.
2017-06-22  add filter for RubyLang lists
Unfortunately, it appears we have to reject this and instead add support for filtering at View time(*), due to DKIM signatures in messages from ruby-lang.org. (*) which may not be worth it
2017-05-09  watchmaildir: show $@ in warning message
It should be helpful to know what error happened.
2017-04-04  watchmaildir: do not reject lowercase flags on Maildir files
Dovecot uses 'a'..'z' (lowercase) to designate keywords in Maildir flags. This was preventing certain messages from being marked as spam. https://wiki2.dovecot.org/MailboxFormat/Maildir
2017-01-26  watchmaildir: allow arguments for filters
We'll want to allow some degree of configuration for various mailing lists.
2017-01-19  watchmaildir: limit live importer processes
We don't want to be triggering OOM or swapping on weaker systems when we have dozens of inboxes as potential targets.
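The commit does not spell out how the limit is enforced. One plausible approach, sketched here in Python purely as an assumption, is to keep at most `max_live` importers alive and retire the least recently used one before starting another. `ImporterPool`, `spawn`, and the importer's `done()` method are hypothetical names.

```python
from collections import OrderedDict


class ImporterPool:
    """Hypothetical LRU cap on live importer processes."""

    def __init__(self, max_live, spawn):
        self.max_live = max_live
        self.spawn = spawn          # callable: inbox name -> importer
        self.live = OrderedDict()   # inbox name -> importer, oldest first

    def get(self, inbox):
        imp = self.live.pop(inbox, None)
        if imp is None:
            # Retire the least recently used importers to stay under the cap.
            while len(self.live) >= self.max_live:
                _, old = self.live.popitem(last=False)
                old.done()          # flush and let the process exit
            imp = self.spawn(inbox)
        self.live[inbox] = imp      # (re)insert as most recently used
        return imp
```

An LRU keeps busy inboxes' importers alive while idle ones are shut down, so dozens of configured inboxes cost at most `max_live` processes at a time.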
2017-01-10  introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header (cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head). In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2017-01-02  watch: watchspam affects all configured inboxes
If a message is spam in one mailbox, it is spam in all others a particular user/group will care about.
2016-09-01  watch: use "publicinboxwatch" namespace
We'll keep supporting "publicinboxlearn" indefinitely, but "publicinboxwatch" is probably more appropriate at the moment. Noticed while writing documentation.
2016-08-12  watch: respect altid for incremental watch changes
We need to pass the Inbox object to SearchIdx to get altid mappings properly for incremental imports. TODO: use the Inbox object in more places where it makes sense to do so.
2016-06-26  watch_maildir: warn on spam check failures
It would be nice to know about spamcheck failures.
2016-06-24  watch_maildir: ignore Trash and Drafts, support Dovecot
Trashed messages and drafts are probably not intended for importing, so do not import them. Dovecot uses extra flags via lowercase letters, so we must support those (as that's the server I use).
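A Python sketch of the filename check, assuming "Trash and Drafts" refers to the standard Maildir T (trashed) and D (draft) flags in the ":2," info suffix. Dovecot's lowercase keyword letters are accepted alongside the uppercase flags, and the unique part may contain '-' as in hostnames. `maildir_flags` and `should_import` are hypothetical names, and treating a file with no info suffix (a fresh file in new/) as importable is an assumption.

```python
import re

# "<unique>[:2,<flags>]": flags may mix uppercase Maildir flags with
# Dovecot's lowercase keyword letters; the unique part may contain '-'.
_NAME = re.compile(r"[^:/]+(?::2,([A-Za-z]*))?")


def maildir_flags(filename):
    """Return the flag letters, "" if there is no info part, None if malformed."""
    m = _NAME.fullmatch(filename)
    if m is None:
        return None
    return m.group(1) or ""


def should_import(filename):
    flags = maildir_flags(filename)
    if flags is None:
        return False
    # Skip trashed (T) and draft (D) messages.
    return "T" not in flags and "D" not in flags
```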
2016-06-24  watch_maildir: implement optional spam checking
Mailing lists I watch and mirror may not have the best spam filtering, and an extra layer should not hurt.
2016-06-24  watch_maildir: rename _check_spam => _remove_spam
We do not actually do spam checking here, but we will check for spam before adding a message in the future.