user/dev discussion of public-inbox itself
* [PATCH 02/14] learn: only map recipient list on "ham" or "rm"
From: Eric Wong @ 2019-10-28 10:45 UTC
  To: meta

It's assumed that "spam" can end up anywhere due to Bcc:, so we
need to scan every single inbox.  However, "rm" is usually more
targeted, and "ham" obviously only belongs in some inboxes.
---
 script/public-inbox-learn | 71 +++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 8ff1652b..d2d665d5 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -39,17 +39,7 @@ my $mime = PublicInbox::MIME->new(eval {
 	$data
 });
 
-# get all recipients
-my %dests;
-foreach my $h (qw(Cc To)) {
-	my @val = $mime->header($h) or next;
-	for (@val) {
-		foreach my $email (PublicInbox::Address::emails($_)) {
-			$dests{lc($email)} = 1;
-		}
-	}
-}
-
+# spam is removed from all known inboxes
 if ($train eq 'spam') {
 	$pi_config->each_inbox(sub {
 		my ($ibx) = @_;
@@ -58,36 +48,45 @@ if ($train eq 'spam') {
 		$im->remove($mime, 'spam');
 		$im->done;
 	});
-}
+} else {
+	require PublicInbox::MDA if $train eq "ham";
 
-require PublicInbox::MDA if $train eq "ham";
+	# get all recipients
+	my %dests; # address => <PublicInbox::Inbox|0(false)>
+	for ($mime->header('Cc'), $mime->header('To')) {
+		foreach my $addr (PublicInbox::Address::emails($_)) {
+			$addr = lc($addr);
+			$dests{$addr} //= $pi_config->lookup($addr) // 0;
+		}
+	}
 
-# n.b. message may be cross-posted to multiple public-inboxes
-foreach my $recipient (keys %dests) {
-	my $dst = $pi_config->lookup($recipient) or next;
-	# We do not touch GIT_COMMITTER_* env here so we can track
-	# who trained the message.
-	$dst->{name} = $ENV{GIT_COMMITTER_NAME} || $dst->{name};
-	$dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} || $recipient;
-	$dst = PublicInbox::InboxWritable->new($dst);
-	my $im = $dst->importer(0);
+	# n.b. message may be cross-posted to multiple public-inboxes
+	while (my ($addr, $dst) = each %dests) {
+		next unless ref($dst);
+		# We do not touch GIT_COMMITTER_* env here so we can track
+		# who trained the message.
+		$dst->{name} = $ENV{GIT_COMMITTER_NAME} || $dst->{name};
+		$dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} || $addr;
+		$dst = PublicInbox::InboxWritable->new($dst);
+		my $im = $dst->importer(0);
 
-	if ($train eq "spam" || $train eq "rm") {
-		# This needs to be idempotent, as my inotify trainer
-		# may train for each cross-posted message, and this
-		# script already learns for every list in
-		# ~/.public-inbox/config
-		$im->remove($mime, $train);
-	} else { # $train eq "ham"
-		# no checking for spam here, we assume the message has
-		# been reviewed by a human at this point:
-		PublicInbox::MDA->set_list_headers($mime, $dst);
+		if ($train eq "rm") {
+			# This needs to be idempotent, as my inotify trainer
+			# may train for each cross-posted message, and this
+			# script already learns for every list in
+			# ~/.public-inbox/config
+			$im->remove($mime, $train);
+		} elsif ($train eq "ham") {
+			# no checking for spam here, we assume the message has
+			# been reviewed by a human at this point:
+			PublicInbox::MDA->set_list_headers($mime, $dst);
 
-		# Ham messages are trained when they're marked into
-		# a SEEN state, so this is idempotent:
-		$im->add($mime);
+			# Ham messages are trained when they're marked into
+			# a SEEN state, so this is idempotent:
+			$im->add($mime);
+		}
+		$im->done;
 	}
-	$im->done;
 }
 
 if ($err) {
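
A minimal sketch of the dispatch this patch describes, not the script
itself: "spam" touches every known inbox, while "ham" and "rm" only touch
inboxes matching a To/Cc address.  The %inbox_by_addr table, the addresses
in it and the inboxes_to_touch name are all hypothetical stand-ins for
what $pi_config provides in the real script.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the configured inboxes; the real script
# gets this information from $pi_config.
my %inbox_by_addr = (
	'meta@example.com' => 'meta',
	'test@example.com' => 'test',
);

# Return the sorted names of the inboxes one training run would touch.
# $train is "spam", "ham" or "rm"; @rcpt holds the To/Cc addresses.
sub inboxes_to_touch {
	my ($train, @rcpt) = @_;
	my @names;

	if ($train eq 'spam') {
		# spam may have arrived via Bcc:, so scan every inbox
		@names = values %inbox_by_addr;
	} else {
		# resolve each recipient once, caching misses as 0
		my %dests; # address => inbox name, or 0 if none matches
		for my $rcpt (@rcpt) {
			my $addr = lc($rcpt);
			$dests{$addr} //= $inbox_by_addr{$addr} // 0;
		}
		@names = grep { $_ } values %dests;
	}
	@names = sort @names;
	return @names;
}

# spam ignores the recipient list entirely
print join(',', inboxes_to_touch('spam', 'nobody@example.com')), "\n";

# ham only maps to the inbox whose address appears in To/Cc
print join(',', inboxes_to_touch('ham', 'Meta@Example.com',
				'x@example.com')), "\n";
```

Storing 0 for an address with no matching inbox mirrors the
"$dests{$addr} //= ... // 0" line in the patch: a failed lookup is
remembered, so an address repeated across To and Cc is only looked up once.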


* [PATCH 00/14] learn: sync w/ -mda changes and add manpage
From: Eric Wong @ 2019-10-28 10:45 UTC
  To: meta

What started with adding a manpage for public-inbox-learn,
ended up being a bunch of fixes and improvements to catch
up to -mda changes.

-mda also learned to deal with multiple List-ID headers in the
meantime.

Eric Wong (14):
  learn: support multiple To/Cc headers
  learn: only map recipient list on "ham" or "rm"
  learn: update usage statement
  learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0"
  learn: hoist out remove_or_add subroutine
  mda: hoist out List-ID handling and reuse in -learn
  filter/base: remove MAX_MID_SIZE constant
  mda: hoist out mda_filter_adjust
  mda: skip MIME parsing if spam
  inboxwritable: add assert_usable_dir sub
  mda: prepare for multiple destinations
  mda: support multiple List-ID matches
  learn: allow running without spamc
  doc: add public-inbox-learn(1) manpage

 Documentation/include.mk             |   1 +
 Documentation/public-inbox-learn.pod |  86 +++++++++++++++++++++
 MANIFEST                             |   1 +
 lib/PublicInbox/Filter/Base.pm       |   1 -
 lib/PublicInbox/InboxWritable.pm     |   9 ++-
 lib/PublicInbox/MDA.pm               |  22 ++++++
 lib/PublicInbox/V2Writable.pm        |   5 +-
 script/public-inbox-learn            |  84 +++++++++++---------
 script/public-inbox-mda              | 110 ++++++++++++++++-----------
 t/import.t                           |   8 ++
 t/mda.t                              |  19 +++++
 t/v2writable.t                       |  12 +++
 12 files changed, 275 insertions(+), 83 deletions(-)
 create mode 100644 Documentation/public-inbox-learn.pod
 mode change 100755 => 100644 script/public-inbox-learn




Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
