path: root/script/public-inbox-mda
Date Commit message
2023-11-13 treewide: update read_all to avoid eof|close checks
read_all can be expanded to support FIFOs/pipes/sockets where read-until-EOF behavior is desired. We can also rely on wantarray to support splitting on EOL markers, but it's hard-coded to support only `$/ eq "\n"' since (AFAIK) it's the only way we use the wantarray form of `readline'.
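The read-until-EOF behavior described above can be sketched in Python (the project itself is Perl, so this is an illustration, not its code). The names read_all and read_all_lines, the 64 KiB chunk size, and the optional length argument are assumptions made for this sketch; the list form splits on LF only, mirroring the `$/ eq "\n"' restriction.

```python
import os

CHUNK = 65536  # arbitrary chunk size chosen for this sketch


def read_all(fd, length=None):
    """Read from fd until EOF, or until `length` bytes when it is given."""
    chunks = []
    remaining = length
    while remaining is None or remaining > 0:
        want = CHUNK if remaining is None else min(remaining, CHUNK)
        chunk = os.read(fd, want)  # may return fewer bytes on pipes/sockets
        if not chunk:  # an empty read means EOF
            break
        chunks.append(chunk)
        if remaining is not None:
            remaining -= len(chunk)
    return b"".join(chunks)


def read_all_lines(fd):
    """List form: split on LF only, keeping the line endings."""
    pieces = read_all(fd).split(b"\n")
    tail = pieces.pop()  # bytes after the final LF (empty if none)
    lines = [piece + b"\n" for piece in pieces]
    if tail:
        lines.append(tail)
    return lines
```

Looping on short reads is what makes this usable on pipes and sockets, where a single read call is not guaranteed to return everything.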
2023-11-11 mda: fix and test some usage problems
-mda now honors `--help' properly and invocations missing ORIGINAL_RECIPIENT now fail with EX_NOUSER. Helped-by: Leah Neukirchen <leah@vuxu.org> Link: https://public-inbox.org/meta/87msvlguqu.fsf@vuxu.org/
2023-11-11 mda|learn|watch: support dropUniqueUnsubscribe config
List-Unsubscribe headers with unique identifiers (such as those generated by our examples/unsubscribe.milter) should not end up in public archives. Add a new config knob to strip List-Unsubscribe headers if they have the `List-Unsubscribe-Post: List-Unsubscribe=One-Click' header. Unfortunately, this breaks DKIM signatures if the signature covers either of these List-Unsubscribe* headers. However, breaking DKIM is the lesser evil compared to any archive reader being able to stop archival by an independent archivist. As much as I would like this to be the default, it probably affects few users at the moment since very few mailing lists use unique identifiers in List-Unsubscribe (but that number has grown, recently).
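A rough Python sketch of the stripping rule described above, using the standard email module (the project itself is Perl). The function name drop_unique_unsubscribe is hypothetical, and removing both List-Unsubscribe and List-Unsubscribe-Post together is an assumption of this sketch rather than a statement about the actual implementation.

```python
from email import message_from_string


def drop_unique_unsubscribe(msg):
    """Strip one-click unsubscribe headers from msg in place.

    Returns True when headers were dropped, False otherwise.
    """
    post_values = msg.get_all("List-Unsubscribe-Post", [])
    one_click = any(
        value.strip().lower() == "list-unsubscribe=one-click"
        for value in post_values
    )
    if one_click:
        # Deleting by name removes every occurrence of the header;
        # any DKIM signature covering these headers will no longer verify.
        del msg["List-Unsubscribe"]
        del msg["List-Unsubscribe-Post"]
    return one_click
```

Messages with a plain List-Unsubscribe header and no one-click marker are left untouched, so their DKIM signatures are unaffected.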
2023-10-11 treewide: consolidate "From " line removal
Aside from our prior import bugs (fixed in a0c07cba0e5d8b6a (mda: drop leading "From " lines again, 2016-06-26)), we'll always have to be dealing with mutt piping messages to us and `git format-patch' output. So just share the regexp so we can use it everywhere. It may be desirable to allow importing messages with a leading "From " line for FUSE, even. Additionally, some instances of this regexp needlessly added optional `\r?' (CR) checks ahead of the `\n' (LF) element; but they're pointless anyways since [^\n]* is enough to exclude all non-LF bytes.
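To illustrate the point about `\r?', here is a Python sketch of a shared "From " line pattern (the project's actual regexp is Perl and may differ). MBOX_FROM_RE and strip_mbox_from are names invented for this example. Because [^\n]* consumes every byte up to the LF, a trailing CR is already covered without a separate `\r?'.

```python
import re

# Optional leading blank lines, then one mbox "From " line up to its LF.
# [^\n]* also matches a trailing CR, so no explicit \r? is needed.
MBOX_FROM_RE = re.compile(rb"\A[\r\n]*From [^\n]*\n")


def strip_mbox_from(raw):
    """Return raw message bytes without a leading mbox "From " line."""
    return MBOX_FROM_RE.sub(b"", raw, count=1)
```

The pattern is anchored at the start of the buffer, so "From " lines inside the body and real "From:" headers are left alone.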
2023-08-28 Fix some typos/grammar/errors in docs and comments
2021-01-01 update copyrights for 2021
Using "make update-copyrights" after setting GNULIB_PATH in my config.mak
2020-12-09 rename {pi_config} fields to {pi_cfg}
{pi_config} may be confused with the documented `PI_CONFIG' environment variable, and we'll favor vowel-removal to be consistent with our usage of object references. The `pi_' prefix may stay in some places, for now; since a separate namespace may come into this codebase for local/private client-tooling. For InboxIdle, we'll also remove an invalid comment about holding a reference to the PublicInbox::Config object, too.
2020-09-02 mda+learn: add --help / -h support
"use Getopt::Long" doesn't seem too slow on a hot page cache, and it's probably used frequently enough to be in cache. We'll also start reducing the amount of markup in the .pod and favoring verbatim text in documentation for readability in source form, since the bold text seems excessive.
2020-08-02 remove unnecessary ->header_obj calls
We used ->header_obj in the past as an optimization with Email::MIME. That optimization is no longer necessary with PublicInbox::Eml. This doesn't make any functional difference even if we were to go back to Email::MIME. However, it reduces the amount of code we have and slightly reduces allocations with PublicInbox::Eml.
2020-05-09 replace most uses of PublicInbox::MIME with Eml
PublicInbox::Eml has enough functionality to replace the Email::MIME-based PublicInbox::MIME.
2020-04-19 favor `do {}' over `eval {}' for localized slurp
I did not know to use the return value of `do' back in the day. There's probably no practical difference in these cases, but `eval' is overkill for these uses and may hide actual errors. We can get rid of a few redundant `scalar' ops and pass scalar refs to Email::MIME->new to avoid copies in a few more places, too.
2020-02-06 treewide: run update-copyrights from gnulib for 2019
I didn't wait until September to do it, this year!
2019-11-16 mda: pass global variables into subs
Avoid 'Variable "%s" will not stay shared' warnings when the contents of this script are eval'ed into a sub.
2019-10-30 mda: support multiple List-ID matches
While it's not RFC2919-conformant, mail software can theoretically set multiple List-ID headers. Deliver to all inboxes which match a given List-ID since that's likely the intent. Cc: Eric W. Biederman <ebiederm@xmission.com> Link: https://public-inbox.org/meta/87pniltscf.fsf@x220.int.ebiederm.org/
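Delivery to every matching inbox can be sketched in Python as follows (again an illustration, not the project's Perl). The function name inboxes_for_list_ids, the listid_map dictionary, and the lowercase comparison are all assumptions of this example.

```python
import re
from email import message_from_string

# The list identifier is the part between angle brackets in List-ID.
LIST_ID_RE = re.compile(r"<([^>]+)>")


def inboxes_for_list_ids(msg, listid_map):
    """Return every configured inbox matched by any List-ID header.

    listid_map is a hypothetical {list-id: inbox name} mapping.
    """
    dests = []
    for value in msg.get_all("List-ID", []):
        match = LIST_ID_RE.search(value)
        if match is None:
            continue
        inbox = listid_map.get(match.group(1).lower())
        if inbox is not None and inbox not in dests:
            dests.append(inbox)  # keep header order, skip duplicates
    return dests
```

A message carrying two known List-ID headers yields two destinations, while unknown identifiers are ignored.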
2019-10-30 mda: prepare for multiple destinations
Multiple List-ID headers will be supported in the next commit
2019-10-30 inboxwritable: add assert_usable_dir sub
And use it for mda, since "0" could be a usable directory if somebody insists on using relative paths...
2019-10-30 mda: skip MIME parsing if spam
We don't want to waste cycles parsing the message for MIME bits if it's spam.
2019-10-30 mda: hoist out mda_filter_adjust
It makes it easier to document that the default -mda behavior is stricter than normal, including "public-inbox-learn ham"
2019-10-30 mda: hoist out List-ID handling and reuse in -learn
It's now possible to inject false-positive ham into an inbox the same way -mda does via List-ID.
2019-10-17 Merge remote-tracking branch 'origin/inboxdir'
* origin/inboxdir:
  config: remove redundant inboxdir check
  config: support "inboxdir" in addition to "mainrepo"
  examples/grok-pull.post_update_hook: use "inbox_dir"
2019-10-16 config: support "inboxdir" in addition to "mainrepo"
"mainrepo" was a bad name and artifact from the early days when I intended for there to be a "spamrepo" (now just the ENV{PI_EMERGENCY} Maildir). With v2, "mainrepo" can be especially confusing, since v2 needs at least two git repositories (epoch + all.git) to function and we shouldn't confuse users by having them point to a git repository for v2. Much of our documentation already references "INBOX_DIR" for command-line arguments, so use "inboxdir" as the git-config(1)-friendly variant for that. "mainrepo" remains supported indefinitely for compatibility. Users may need to revert to old versions, or may be referring to old documentation and must not be forced to change config files to account for this change. So if you're using "mainrepo" today, I do NOT recommend changing it right away because other bugs can lurk. Link: https://public-inbox.org/meta/874l0ice8v.fsf@alyssa.is/
2019-10-16 mda: support --no-precheck option
Since -mda now supports List-ID to better support mirroring of existing mailing lists, it probably makes sense to support disabling the precheck function to provide more accurate (though potentially spammier) mirrors of lists
2019-10-15 mda, watch: wire up List-ID header support
This also adds watchheader tests for -watch, which we never had before :x
2019-09-09 run update-copyrights from gnulib for 2019
2019-01-05 filter/rubylang: fix SQLite DB lifetime problems
Clearly the AltId stuff was never tested for v2. Ensure this tricky filter (which reuses Msgmap to avoid introducing new serial numbers) doesn't trigger deadlocks in SQLite due to opening a DB for writing multiple times. I went through several iterations of this change before going with this one, which is the least intrusive I could find.
2018-07-29 mda: allow configuring globally without spamc support
This reuses some of the configuration from -watch, but remains independent since some configurations will use -watch for some inboxes and -mda for others. The default remains "spamc" for -mda users so nothing changes without explicit configuration. Per-inbox configurations may also be supported in the future.
2018-07-29 mda: v2: ensure message bodies are indexed
We must not clobber the original message string, as Email::MIME(*) still needs it for iterating through parts in SearchIdx (but not when handing it as a raw string to git-fast-import). I've noticed message bodies (especially dfpre/dpost) were not getting indexed when going through -mda (no problems with -watch). This also did not affect v1 repos, since indexing is a separate process for v1 and requires re-reading the data from git. (*) tested Email::MIME 1.937 on Debian stretch
2018-07-29 mda: use InboxWritable
It's a convenient wrapper nowadays, so get rid of some legacy code and minimize differences from the -watch code.
2018-06-12 public-inbox-mda: use <sysexits.h> status codes where applicable
Many MTAs understand these and map them to sensible SMTP error messages. Inability to find an inbox results in "5.1.1 user unknown". Misformatted messages are rejected with "5.6.0 data format error". Unsupported inbox versions are reported as "5.3.5 local configuration error". All of these are interpreted as permanent failures.
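The mapping above can be written down as a small Python table. The numeric values are the standard ones from <sysexits.h>; pairing EX_NOUSER, EX_DATAERR, and EX_CONFIG with the three quoted status texts is an assumption inferred from those texts, and the failure keys and exit_status name are hypothetical.

```python
# Standard values from <sysexits.h>
EX_DATAERR = 65  # input data was incorrect
EX_NOUSER = 67   # addressee unknown
EX_CONFIG = 78   # configuration error

# Hypothetical failure names mapped to (exit status, status text an
# MTA might report). The pairing is an assumption of this sketch.
FAILURES = {
    "no_inbox": (EX_NOUSER, "5.1.1 user unknown"),
    "bad_format": (EX_DATAERR, "5.6.0 data format error"),
    "bad_version": (EX_CONFIG, "5.3.5 local configuration error"),
}


def exit_status(failure):
    """Return the process exit status for a named failure."""
    return FAILURES[failure][0]
```

All three statuses fall in the 5.x.x class, which is why an MTA treats them as permanent failures rather than retrying delivery.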
2018-03-29 mda: support v2 inboxes
I mainly focus on -watch for mirroring busy mailing lists, but using -mda should remain an option.
2018-02-28 use PublicInbox::MIME consistently
It works around some bugs in older Email::MIME which we'll find useful.
2018-02-07 update copyrights for 2018
Using update-copyrights from gnulib. While we're at it, use the SPDX identifier for AGPL-3.0+ to ease mechanical processing.
2017-01-10 introduce PublicInbox::MIME wrapper class
This should fix problems with multipart messages where text/plain parts lack a header. cf. git clone --mirror https://github.com/rjbs/Email-MIME.git refs/pull/28/head In the future, we may still introduce a streaming interface to reduce memory usage on large emails.
2016-07-26 mda: always call Import::done, even on dupes
We don't want to leave fast_import_crash_* dumps around on duplicates.
2016-06-26 mda: drop leading "From " lines again
Oops... While we're at it, drop blank lines before the "From ", too, since it could happen.
2016-06-24 split out spamcheck/spamc to its own module.
This should hopefully make it easier to try other anti-spam systems (or none at all) in the future.
2016-06-21 spawn: improve error checking for fork failures
fork failures are unfortunately common when Xapian has gigabytes and gigabytes mmapped.
2016-06-17 import: auto-update index when done
This prevents multiple update processes from stepping over each other while called under the lock, and also allows the new -watch process to update the index iff indexing was desired.
2016-06-17 mda: support loading arbitrary filters
Give users some rope to do their own filtering.
2016-06-15 mda: hook up new filter functionality
This removes the Email::Filter dependency as well as the signature-breaking scrubber code. We now prefer to reject unacceptable messages and grudgingly (and blindly) mirror messages we're not the primary endpoint for.
2016-06-15 mda: precheck no longer depends on Email::Filter
Email::Filter doesn't offer any functionality we need, here; and our dependency on Email::Filter will gradually be removed since it (and Email::LocalDelivery) seem abandoned and we can have more-fine-grained control by rolling our own Maildir delivery which can work transactionally.
2016-06-15 drop dependency on File::Path::Expand
We still pull it in via Email::LocalDelivery, but that dependency will go away, soon.
2016-05-30 script/*{mda,learn}: no strict params for Email::MIME::ContentType
User input is imperfect, do not pollute our mail logs with warnings we cannot fix. This is documented in the Email::MIME::ContentType manpage so it should remain supported.
2016-05-25 remove Email::Address dependency
git has stricter requirements for ident names (no '<>') which Email::Address allows. Even in 1.908, Email::Address also has an incomplete fix for CVE-2015-7686 with a DoS-able regexp for comments. Since we don't care for or need all the RFC compliance of Email::Address, avoiding it entirely may be preferable. Email::Address will still be installed as a requirement for Email::MIME, but it is only used by the Email::MIME::header_str_set which we do not use
2016-05-16 declare Inbox object for reusability
From the beginning, we've avoided objects here in favor of faster startup time; but it may not be worth it since a persistent httpd/nntpd is faster and -mda isn't hit as often.
2016-05-14 rename most instances of "list" to "inbox"
A public-inbox is NOT necessarily a mailing list, but it could serve as an input point for zero, one, or infinite mailing lists :D
2016-04-28 import: run git-update-server-info when done
We should update $GIT_DIR/info/refs for dumb HTTP clients whenever we make changes to the repository. The best place to update is immediately after making commits. This fixes a bug where public-inbox-learn did not properly update $GIT_DIR/info/refs after inserting or removing messages.
2016-04-25 remove ssoma dependency
By converting to using our git-fast-import-based Import module. This should allow us to be more easily installed.
2016-02-27 move executables to script/ directory
This seems to match more closely with what is expected of Perl packages based on how blib is used. Hopefully makes the top-level source tree less cluttered and things easier-to-find.