path: root/script/public-inbox-learn
Date Commit message
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2019-11-16 learn: pass global variables into subs
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub.
2019-10-30 mda: support multiple List-ID matches
While it's not RFC2919-conformant, mail software can theoretically set multiple List-ID headers. Deliver to all inboxes which match a given List-ID since that's likely the intended behavior. Cc: Eric W. Biederman <ebiederm@xmission.com> Link: https://public-inbox.org/meta/87pniltscf.fsf@x220.int.ebiederm.org/
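As an illustration of the matching described in this commit, here is a minimal Python sketch (not the project's Perl code). The function name `inboxes_for_list_ids` and the `inbox_by_list_id` mapping are hypothetical, and the parsing only assumes the common `Phrase <list-id>` header form:

```python
import re
from email import message_from_string


def inboxes_for_list_ids(msg, inbox_by_list_id):
    """Return every configured inbox matching any List-Id header of msg.

    inbox_by_list_id is a hypothetical dict: lowercased list-id -> inbox name.
    """
    matches = []
    # get_all() returns one entry per header occurrence, so repeated
    # List-Id headers are all considered rather than only the first.
    for value in msg.get_all("List-Id", []):
        m = re.search(r"<([^>]+)>", value)
        list_id = (m.group(1) if m else value).strip().lower()
        inbox = inbox_by_list_id.get(list_id)
        if inbox is not None and inbox not in matches:
            matches.append(inbox)
    return matches


raw = (
    "List-Id: Foo list <foo.example.com>\n"
    "List-Id: <bar.example.com>\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
config = {"foo.example.com": "foo-inbox", "bar.example.com": "bar-inbox"}
result = inboxes_for_list_ids(message_from_string(raw), config)
```

Collecting matches in a list keeps delivery order stable and avoids delivering twice when two headers name the same list.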
2019-10-30 mda: hoist out List-ID handling and reuse in -learn
It's now possible to inject false-positive ham into an inbox the same way -mda does via List-ID.
2019-10-30 learn: hoist out remove_or_add subroutine
We'll be reusing it for List-ID processing in the next commit.
2019-10-30 learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0"
Users may be zeroes or blanks.
2019-10-30 learn: update usage statement
Use <foo|bar> since that seems to be the favored notation for required command args (taking a hint from git(1) manpage). While we're at it, remove the space after '<' for the redirect to match git.git coding style.
2019-10-30 learn: only map recipient list on "ham" or "rm"
It's assumed that "spam" can end up anywhere due to Bcc:, so we need to scan every single inbox. However, "rm" is usually more targeted and "ham" obviously only belongs in some inboxes.
2019-10-30 learn: support multiple To/Cc headers
It's possible to specify these headers multiple times, and PublicInbox::MDA->precheck takes that into account, so -learn should, too.
2019-09-09 run update-copyrights from gnulib for 2019
2018-05-17 learn: support for v2 repos
Oops, I mainly rely on public-inbox-watch for spam training and completely forgot this tool existed :x
2018-02-28 use PublicInbox::MIME consistently
It works around some bugs in older Email::MIME which we'll find useful.
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-11-16 learn: use "spam" as subject for removal commits (part #2)
We need to use the correct subject when doing global scanning, too. In fact, the per-recipient spam training path is entirely redundant at this point.
2017-11-16 learn: use "spam" as subject for removal commits
Sometimes an email is an innocent removal "rm" for a misdirected, off-topic post, while most removed messages are "spam". Allow anybody to look at history and easily distinguish the reason for removing the message.
2017-04-05 learn: scan all inboxes when learning spam
This matches the behavior of the -watch daemon since 6d534038285ddd760709ba76ea007f9108200097 ("watch: watchspam affects all configured inboxes")
2017-01-19 learn: implement "rm" only functionality
Do not consider this interface stable, but I just needed a way to remove mis-imported multipart messages so public-inbox-watch could pick them up again from my Maildir.
2017-01-10 introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header. cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2016-07-26 learn: fix uninitialized variable
Oops :x
2016-06-26 mda: drop leading "From " lines again
Oops... While we're at it, drop blank lines before the "From ", too, since it could happen.
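The cleanup described here can be sketched in Python as follows. This is an illustrative assumption about the behavior, not the script's actual Perl: the helper name `drop_leading_from` is hypothetical, and it removes a single mbox "From " separator line together with any blank lines before it:

```python
import re

# Optional leading CR/LF characters, then one mbox-style "From " line.
# "From:" headers do not match because the pattern requires a space
# immediately after "From".
_LEADING_FROM = re.compile(rb"\A[\r\n]*From [^\r\n]*\r?\n")


def drop_leading_from(raw: bytes) -> bytes:
    """Remove a leading mbox "From " line and any blank lines before it."""
    return _LEADING_FROM.sub(b"", raw, count=1)


mbox_msg = b"\n\nFrom a@example.com Thu Jan  1 00:00:00 1970\nSubject: hi\n\nbody\n"
cleaned = drop_leading_from(mbox_msg)
```

A message that starts directly with its headers passes through unchanged, so the helper is safe to apply to input that may or may not come from an mbox.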
2016-06-24 split out spamcheck/spamc to its own module.
This should hopefully make it easier to try other anti-spam systems (or none at all) in the future.
2016-06-17 import: auto-update index when done
This prevents multiple update processes from stepping over each other while called under the lock, and also allows the new -watch process to update the index iff indexing was desired.
2016-06-15 mda: hook up new filter functionality
This removes the Email::Filter dependency as well as the signature-breaking scrubber code. We now prefer to reject unacceptable messages and grudgingly (and blindly) mirror messages we're not the primary endpoint for.
2016-06-15 learn: remove IPC::Run dependency
We'll be relying on our spawn implementation for now, since it'll be consistent with the rest of our code and can optionally take advantage of vfork.
2016-05-30 script/*{mda,learn}: no strict params for Email::MIME::ContentType
User input is imperfect, do not pollute our mail logs with warnings we cannot fix. This is documented in the Email::MIME::ContentType manpage so it should remain supported.
2016-05-25 remove Email::Address dependency
git has stricter requirements for ident names (no '<>') which Email::Address allows. Even in 1.908, Email::Address also has an incomplete fix for CVE-2015-7686 with a DoS-able regexp for comments. Since we don't care for or need all the RFC compliance of Email::Address, avoiding it entirely may be preferable. Email::Address will still be installed as a requirement for Email::MIME, but it is only used by Email::MIME::header_str_set, which we do not use.
2016-05-14 rename most instances of "list" to "inbox"
A public-inbox is NOT necessarily a mailing list, but it could serve as an input point for zero, one, or infinite mailing lists :D
2016-04-25 remove ssoma dependency
By converting to using our git-fast-import-based Import module. This should allow us to be more easily installed.
2016-04-09 public-inbox-learn: drop leading "From " line from mboxes
It can confuse Email::MIME if we have it.
2016-02-27 move executables to script/ directory
This seems to match more closely with what is expected of Perl packages based on how blib is used. Hopefully makes the top-level source tree less cluttered and things easier-to-find.