path: root/lib/PublicInbox/Filter
Date       Commit message
2022-11-28 filter/rubylang: adjust filter for new list software
The host serving ruby-core and ruby-dev no longer sets X-Mail-Count, but the serial number is still present in the Subject.
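A minimal Python sketch of reading the serial number from the Subject instead of X-Mail-Count. public-inbox itself is Perl, so this is illustrative only. The `[ruby-core:NNN]` layout follows the `[ruby-*:\d+]` serial numbers mentioned in the 2017-06-22 entry, and the function name and regex are hypothetical. It accepts only ASCII digits, in the spirit of the 2019-06-04 change, because Python's `\d` (and `int()`) also accept other Unicode decimal digits.

```python
import re

# Hypothetical pattern for subjects such as "[ruby-core:12345] Some topic".
# [0-9] (rather than \d) restricts the serial to ASCII digits; on str
# patterns, Python's \d also matches other Unicode decimal digits.
SERIAL_RE = re.compile(r"\[(ruby-[a-z]+):([0-9]+)\]")


def serial_from_subject(subject, listname):
    """Return the integer serial for `listname` found in `subject`, or None."""
    if subject is None:  # the Subject: header may be missing entirely
        return None
    m = SERIAL_RE.search(subject)
    if m is None or m.group(1) != listname:
        return None
    return int(m.group(2))
```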
2021-02-12 filter/vger: kill trailing newlines aggressively
PublicInbox::MboxReader->(mboxrd|mboxo) only deletes the last trailing newline, not every trailing newline as InboxWritable->import_mbox does. While testing PublicInbox::MboxReader->mboxrd (next commit) with scripts/import_vger_from_mbox on the LKML archive I obtained in 2018 for v2 development, I found this difference was responsible for a single spam message(*) out of 2722831 not being filtered correctly and producing a different result.
(*) dated 2014-08-25
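The two trailing-newline behaviours can be contrasted in a short Python sketch. The function names are hypothetical and operate on raw message bytes. A filter that always applies the aggressive form produces the same result whichever reader handed it the message.

```python
def strip_last_newline(raw: bytes) -> bytes:
    """Remove at most one trailing newline (the mboxrd/mboxo reader behaviour)."""
    if raw.endswith(b"\n"):
        return raw[:-1]
    return raw


def strip_all_newlines(raw: bytes) -> bytes:
    """Remove every trailing newline (the import_mbox behaviour)."""
    return raw.rstrip(b"\n")
```

Applying `strip_all_newlines` after either reader yields identical bytes, which is the point of stripping aggressively in the filter.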
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09 treewide: replace {-inbox} with {ibx} for consistency
{ibx} is shorter and is the most prevalent abbreviation in indexing and IMAP code, and the `$ibx' local variable is already prevalent throughout. In general, the codebase favors removal of vowels in variable and field names to denote non-references (because references are "lighter" than non-references). So update WWW and Filter users to use the same code since it reduces confusion and may allow easier code sharing.
2020-05-09 replace most uses of PublicInbox::MIME with Eml
PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-05-09 filter/rubylang: avoid recursing subparts to strip trailers
Mailman only seems to add trailers (or signatures) as attachments at the top-level of MIME messages. So don't bother recursing with ->walk_parts since ->walk_parts is non-trivial to recreate in the Email::MIME replacement I'm working on.
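A rough Python equivalent, using the standard `email` package, of stripping trailers without recursion. Only the immediate children of a multipart message are inspected, so nested multiparts are never walked. The `Unsubscribe:` footer marker is a hypothetical stand-in, since real Mailman footers depend on list configuration.

```python
from email import message_from_string  # convenient for building test input
from email.message import Message

# Hypothetical footer marker; real Mailman footers vary by list configuration.
FOOTER_MARKER = "Unsubscribe:"


def strip_toplevel_trailers(msg: Message) -> Message:
    """Drop top-level text/plain parts that look like a list footer.

    Only the immediate children of a multipart message are examined;
    nested multiparts are kept as-is (no recursive walk).
    """
    if not msg.is_multipart():
        return msg
    kept = []
    for part in msg.get_payload():
        if part.get_content_type() == "text/plain":
            text = part.get_payload(decode=True).decode("utf-8", "replace")
            if FOOTER_MARKER in text:
                continue  # looks like a list trailer: drop it
        kept.append(part)
    msg.set_payload(kept)
    return msg
```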
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2020-01-06 treewide: "require" + "use" cleanup and docs
There are a bunch of leftover "require" and "use" statements we no longer need and can get rid of, along with some excessive imports via "use". IO::Handle usage isn't always obvious, so add comments describing why a package loads it. Along the same lines, document the tmpdir support as the reason we depend on File::Temp 0.19, even though every Perl 5.10.1+ user has it. While we're at it, favor "use" over "require", since it gives us extra compile-time checking.
2020-01-01 filter/base: export REJECT as a constant
And update callers to use it, as it makes the code a bit cleaner. Probably irrelevant, but it should be faster, too, as "perl -I lib -w -MO=Deparse $FILE" shows REJECT() calls are constant-folded.
2019-10-30 filter/base: remove MAX_MID_SIZE constant
We don't need it in the filter, here, since we have one in the MDA package.
2019-09-09 run update-copyrights from gnulib for 2019
2019-06-04 filter/rubylang: require ASCII digit for mailcount
Unlikely to matter, but who knows...
2018-12-28 add filter for gmane archives
Extracted from import_slrnspool, since some spools get converted to mbox or what not.
2018-04-19 filter/rubylang: do not set altid on spam training
I suppose it's a bug or inconsistency that altid is write-only and message deletions are not reflected in it. But for now, we do not set it when training spam, so there's no window where an invalid NNTP article number shows up. This should solve the problem of massive gaps in messages caused by spam training for the ruby groups: https://public-inbox.org/meta/20180307093754.GA27748@dcvr/
2018-04-05 support altid mechanism for v2
There are enough gmane links out there in the wild that it makes sense to maintain support for these mappings.
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib.
While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-06-22 filter/rubylang: reuse altid entry from inbox object
This allows users to DRY up their config a bit and avoid specifying altid twice when reusing the NNTP-centric msgmap for [ruby-*:\d+] serial numbers. My current work-in-progress ~/.public-inbox/config entry for the ruby-core list is:
------8<-------
[publicinbox "ruby-core"]
	address = ruby-core@ruby-lang.org
	url = //public-inbox.org/ruby-core
	mainrepo = /path/to/ruby-core.git
	newsgroup = inbox.comp.lang.ruby.core
	watchheader = List-Id:<ruby-core.ruby-lang.org>
	altid = serial:ruby-core:file=msgmap.sqlite3
	watch = maildir:/path/to/Maildir/.INBOX.ruby
	filter = PublicInbox::Filter::RubyLang
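The following Python sketch shows how a filter might pick the matching entry out of an inbox's existing altid values rather than taking a second copy from its own config. The `serial:<prefix>:file=<path>` layout is taken from the config example above. Both helper names are hypothetical, and this is not the RubyLang filter's actual code.

```python
def parse_altid(spec):
    """Split an altid value like "serial:ruby-core:file=msgmap.sqlite3".

    Assumed layout: <type>:<prefix>:<key>=<value>; maxsplit=2 keeps any
    colons inside the value (e.g. a path) intact.
    """
    kind, prefix, param = spec.split(":", 2)
    key, _, value = param.partition("=")
    return {"type": kind, "prefix": prefix, key: value}


def find_serial_altid(altids, listname):
    """Return the parsed "serial" altid whose prefix matches listname, else None."""
    for spec in altids:
        parsed = parse_altid(spec)
        if parsed["type"] == "serial" and parsed["prefix"] == listname:
            return parsed
    return None
```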
2017-06-22 add filter for RubyLang lists
Unfortunately, it appears we have to reject this and instead add support for filtering at View time(*), due to DKIM signatures in messages from ruby-lang.org.
(*) which may not be worth it
2017-06-07 filter/subjecttag: account for missing Subject: header
This is a strong indicator of spam (but out of scope for this particular module), but sometimes it is not: people legitimately forget to set a Subject: header at all.
2017-01-26 add filter for Subject: tags
Some mailing lists add annoying tags into the Subject line which discourages readers from doing proper mail organization on the client side. They also waste precious screen space and attention span. Remove them from our archives to reduce clutter.
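A minimal Python sketch of removing a list tag from the Subject, which also tolerates a missing Subject: header as the 2017-06-07 fix requires. The function name and the rule of only stripping a leading tag (after optional "Re:" prefixes) are assumptions for illustration.

```python
import re


def strip_subject_tag(subject, tag):
    """Remove a mailing-list tag such as "[foo]" from a Subject value.

    Returns None when there is no Subject header at all.
    """
    if subject is None:
        return None
    # Assumed rule: keep any leading "Re:" prefixes, then drop the tag and
    # the whitespace after it. Tags elsewhere in the subject are left alone.
    pattern = re.compile(r"^((?:re:\s*)*)" + re.escape(tag) + r"\s*", re.IGNORECASE)
    return pattern.sub(r"\1", subject, count=1)
```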
2017-01-10 introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header.
cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head
In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2016-06-17 filter/base: reject more types by default
Try to be descriptive for some of these.
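A Python sketch of rejecting messages by MIME type while giving a descriptive reason for each type. The type list and reason strings are hypothetical examples, not the defaults the filter actually ships with.

```python
from email import message_from_string

# Hypothetical default list; the real filter's types and wording may differ.
REJECT_TYPES = {
    "application/x-msdownload": "Windows executables are not accepted",
    "application/x-shellscript": "send scripts inline as text/plain instead",
}


def rejection_reason(raw):
    """Return a descriptive rejection reason for a raw message, or None."""
    msg = message_from_string(raw)
    for part in msg.walk():  # visits the root and every nested part
        ctype = part.get_content_type()  # always lowercased
        if ctype in REJECT_TYPES:
            return f"{ctype}: {REJECT_TYPES[ctype]}"
    return None
```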
2016-06-17 filter: split out scrub method from delivery
We will scrub for importing archives, so ensure it is usable outside of the delivery routine.
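The split can be pictured as a base class whose `scrub` step is callable on its own, both from the delivery path and from an archive importer. This Python sketch uses hypothetical names and plain-string messages, and `REJECT` is a stand-in sentinel rather than the constant's real value.

```python
REJECT = object()  # hypothetical sentinel standing in for the real REJECT constant


class BaseFilter:
    """Sketch of a filter whose scrub step is usable outside delivery."""

    def scrub(self, msg):
        """Return a cleaned-up message, or None to drop it. Override in subclasses."""
        return msg

    def delivery(self, msg):
        """Mail-delivery entry point: scrub, then accept or reject."""
        cleaned = self.scrub(msg)
        return REJECT if cleaned is None else cleaned


class StripTrailingSpaceFilter(BaseFilter):
    """Hypothetical filter: trims trailing whitespace, drops empty messages."""

    def scrub(self, msg):
        cleaned = msg.rstrip()
        return cleaned or None


def import_archive(flt, messages):
    """Archive import reuses scrub() without going through delivery()."""
    return [m for m in map(flt.scrub, messages) if m is not None]
```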
2016-06-15 filter: begin work on a new filter API
This filter API should be independent of Email::Filter and hopefully less intrusive to long running processes.