From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Subject: [PATCH 02/14] learn: only map recipient list on "ham" or "rm"
Date: Mon, 28 Oct 2019 10:45:16 +0000
Message-ID: <20191028104528.10140-3-e@80x24.org>
In-Reply-To: <20191028104528.10140-1-e@80x24.org>
It's assumed that "spam" can end up anywhere due to Bcc:, so we
need to scan every single inbox.  However, "rm" is usually more
targeted and "ham" obviously only belongs in some inboxes, so
the To/Cc recipient list only needs to be mapped for those two.
---
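For reviewers, here is a rough standalone sketch of the routing
described above.  It is not the code from this patch: %inboxes and
targets() are hypothetical stand-ins for the configured inboxes and
for the $pi_config->lookup / each_inbox calls, and the example.org
addresses are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the configured inboxes (address => name);
# the real script resolves addresses via $pi_config->lookup.
my %inboxes = (
	'meta@example.org' => 'meta',
	'test@example.org' => 'test',
);

# Return the sorted names of the inboxes a training action would touch.
sub targets {
	my ($train, @recipients) = @_;

	# "spam" may have arrived via Bcc:, so every inbox is scanned
	if ($train eq 'spam') {
		my @all = values %inboxes;
		return sort @all;
	}

	# "ham" and "rm": look up each To/Cc address once, caching
	# misses as 0 so they are not looked up again
	my %dests;
	for my $rcpt (@recipients) {
		my $addr = lc($rcpt);
		$dests{$addr} //= $inboxes{$addr} // 0;
	}
	my @hit = grep { $_ } values %dests;
	return sort @hit;
}

print join(',', targets('spam', 'nobody@example.org')), "\n";
# prints "meta,test"
print join(',', targets('rm', 'Meta@Example.org', 'other@example.net')), "\n";
# prints "meta"
```

The "// 0" fallback is what lets "//=" cache a failed lookup: a
plain undef would be looked up again every time the same address
appears in another header.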
script/public-inbox-learn | 71 +++++++++++++++++++--------------------
1 file changed, 35 insertions(+), 36 deletions(-)
diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 8ff1652b..d2d665d5 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -39,17 +39,7 @@ my $mime = PublicInbox::MIME->new(eval {
$data
});
-# get all recipients
-my %dests;
-foreach my $h (qw(Cc To)) {
- my @val = $mime->header($h) or next;
- for (@val) {
- foreach my $email (PublicInbox::Address::emails($_)) {
- $dests{lc($email)} = 1;
- }
- }
-}
-
+# spam is removed from all known inboxes
if ($train eq 'spam') {
$pi_config->each_inbox(sub {
my ($ibx) = @_;
@@ -58,36 +48,45 @@ if ($train eq 'spam') {
$im->remove($mime, 'spam');
$im->done;
});
-}
+} else {
+ require PublicInbox::MDA if $train eq "ham";
-require PublicInbox::MDA if $train eq "ham";
+ # get all recipients
+ my %dests; # address => <PublicInbox::Inbox|0(false)>
+ for ($mime->header('Cc'), $mime->header('To')) {
+ foreach my $addr (PublicInbox::Address::emails($_)) {
+ $addr = lc($addr);
+ $dests{$addr} //= $pi_config->lookup($addr) // 0;
+ }
+ }
-# n.b. message may be cross-posted to multiple public-inboxes
-foreach my $recipient (keys %dests) {
- my $dst = $pi_config->lookup($recipient) or next;
- # We do not touch GIT_COMMITTER_* env here so we can track
- # who trained the message.
- $dst->{name} = $ENV{GIT_COMMITTER_NAME} || $dst->{name};
- $dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} || $recipient;
- $dst = PublicInbox::InboxWritable->new($dst);
- my $im = $dst->importer(0);
+ # n.b. message may be cross-posted to multiple public-inboxes
+ while (my ($addr, $dst) = each %dests) {
+ next unless ref($dst);
+ # We do not touch GIT_COMMITTER_* env here so we can track
+ # who trained the message.
+ $dst->{name} = $ENV{GIT_COMMITTER_NAME} || $dst->{name};
+ $dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} || $addr;
+ $dst = PublicInbox::InboxWritable->new($dst);
+ my $im = $dst->importer(0);
- if ($train eq "spam" || $train eq "rm") {
- # This needs to be idempotent, as my inotify trainer
- # may train for each cross-posted message, and this
- # script already learns for every list in
- # ~/.public-inbox/config
- $im->remove($mime, $train);
- } else { # $train eq "ham"
- # no checking for spam here, we assume the message has
- # been reviewed by a human at this point:
- PublicInbox::MDA->set_list_headers($mime, $dst);
+ if ($train eq "rm") {
+ # This needs to be idempotent, as my inotify trainer
+ # may train for each cross-posted message, and this
+ # script already learns for every list in
+ # ~/.public-inbox/config
+ $im->remove($mime, $train);
+ } elsif ($train eq "ham") {
+ # no checking for spam here, we assume the message has
+ # been reviewed by a human at this point:
+ PublicInbox::MDA->set_list_headers($mime, $dst);
- # Ham messages are trained when they're marked into
- # a SEEN state, so this is idempotent:
- $im->add($mime);
+ # Ham messages are trained when they're marked into
+ # a SEEN state, so this is idempotent:
+ $im->add($mime);
+ }
+ $im->done;
}
- $im->done;
}
if ($err) {