user/dev discussion of public-inbox itself
* [PATCH 00/14] learn: sync w/ -mda changes and add manpage
@ 2019-10-28 10:45 Eric Wong
  2019-10-28 10:45 ` [PATCH 01/14] learn: support multiple To/Cc headers Eric Wong
                   ` (13 more replies)
  0 siblings, 14 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

What started as adding a manpage for public-inbox-learn
ended up being a series of fixes and improvements to catch
up to -mda changes.

-mda also learned to deal with multiple List-ID headers in the
meantime.

Eric Wong (14):
  learn: support multiple To/Cc headers
  learn: only map recipient list on "ham" or "rm"
  learn: update usage statement
  learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0"
  learn: hoist out remove_or_add subroutine
  mda: hoist out List-ID handling and reuse in -learn
  filter/base: remove MAX_MID_SIZE constant
  mda: hoist out mda_filter_adjust
  mda: skip MIME parsing if spam
  inboxwritable: add assert_usable_dir sub
  mda: prepare for multiple destinations
  mda: support multiple List-ID matches
  learn: allow running without spamc
  doc: add public-inbox-learn(1) manpage

 Documentation/include.mk             |   1 +
 Documentation/public-inbox-learn.pod |  86 +++++++++++++++++++++
 MANIFEST                             |   1 +
 lib/PublicInbox/Filter/Base.pm       |   1 -
 lib/PublicInbox/InboxWritable.pm     |   9 ++-
 lib/PublicInbox/MDA.pm               |  22 ++++++
 lib/PublicInbox/V2Writable.pm        |   5 +-
 script/public-inbox-learn            |  84 +++++++++++---------
 script/public-inbox-mda              | 110 ++++++++++++++++-----------
 t/import.t                           |   8 ++
 t/mda.t                              |  19 +++++
 t/v2writable.t                       |  12 +++
 12 files changed, 275 insertions(+), 83 deletions(-)
 create mode 100644 Documentation/public-inbox-learn.pod
 mode change 100755 => 100644 script/public-inbox-learn


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH 01/14] learn: support multiple To/Cc headers
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 02/14] learn: only map recipient list on "ham" or "rm" Eric Wong
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

It's possible to specify these headers multiple times, and
PublicInbox::MDA->precheck takes that into account, so
-learn should, too.
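
For illustration only, a minimal core-Perl sketch of why calling the
header accessor in list context matters.  header_values() and the
sample addresses are hypothetical stand-ins (this is not the
Email::MIME API), and the address regexp is a naive substitute for
PublicInbox::Address::emails:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a header accessor: returns every value in
# list context, but only the first value in scalar context.
sub header_values {
	my ($hdrs, $name) = @_;
	my @vals = map { $_->[1] } grep { lc($_->[0]) eq lc($name) } @$hdrs;
	return wantarray ? @vals : $vals[0];
}

# Collect lowercased recipients from every To/Cc header.
# The address pattern is deliberately naive (sketch only).
sub all_recipients {
	my ($hdrs) = @_;
	my %dests;
	for my $h (qw(Cc To)) {
		for my $val (header_values($hdrs, $h)) {
			$dests{lc($_)} = 1 for $val =~ /([^\s,<>]+\@[^\s,<>]+)/g;
		}
	}
	my @out = sort keys %dests;
	return @out;
}

my @hdrs = (
	[ To => 'a@example.com' ],
	[ To => 'B@example.com, c@example.com' ],
	[ Cc => 'd@example.com' ],
);
# scalar context only sees the first To: header
my $first = header_values(\@hdrs, 'To');
print "scalar: $first\n";
print "list:   ", join(' ', all_recipients(\@hdrs)), "\n";
```

Under these assumptions, the scalar call misses the second To: header
while the list-context loop picks up all four addresses.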
---
 script/public-inbox-learn | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index c4c4d4b9..8ff1652b 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -42,9 +42,11 @@ my $mime = PublicInbox::MIME->new(eval {
 # get all recipients
 my %dests;
 foreach my $h (qw(Cc To)) {
-	my $val = $mime->header($h) or next;
-	foreach my $email (PublicInbox::Address::emails($val)) {
-		$dests{lc($email)} = 1;
+	my @val = $mime->header($h) or next;
+	for (@val) {
+		foreach my $email (PublicInbox::Address::emails($_)) {
+			$dests{lc($email)} = 1;
+		}
 	}
 }
 


* [PATCH 02/14] learn: only map recipient list on "ham" or "rm"
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
  2019-10-28 10:45 ` [PATCH 01/14] learn: support multiple To/Cc headers Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 03/14] learn: update usage statement Eric Wong
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

It's assumed that "spam" can end up anywhere due to Bcc:, so we
need to scan every single inbox.  However, "rm" is usually more
targeted and "ham" obviously only belongs in some inboxes.
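
A rough sketch of that routing rule, assuming a hypothetical
address => inbox-name hash in place of PublicInbox::Config
(inboxes_to_visit is an illustrative name, not part of -learn):

```perl
use strict;
use warnings;

# Hypothetical config: recipient address => inbox name.
my %config = (
	'meta@example.com' => 'meta',
	'test@example.com' => 'test',
	'misc@example.com' => 'misc',
);

# "spam" visits every known inbox (it may have arrived via Bcc:);
# "ham" and "rm" only visit inboxes matching a To/Cc address.
sub inboxes_to_visit {
	my ($train, $config, @recipients) = @_;
	return sort(values(%$config)) if $train eq 'spam';
	my (%seen, @out);
	for my $addr (@recipients) {
		my $ibx = $config->{lc($addr)};
		next unless defined $ibx; # not one of our inboxes
		push @out, $ibx unless $seen{$ibx}++;
	}
	return @out;
}

print join(',', inboxes_to_visit('spam', \%config)), "\n";
print join(',', inboxes_to_visit('rm', \%config,
		'Meta@example.com', 'nobody@example.com')), "\n";
```

The first call lists every inbox; the second only lists the inbox
matching a known recipient.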
---
 script/public-inbox-learn | 71 +++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 36 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 8ff1652b..d2d665d5 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -39,17 +39,7 @@ my $mime = PublicInbox::MIME->new(eval {
 	$data
 });
 
-# get all recipients
-my %dests;
-foreach my $h (qw(Cc To)) {
-	my @val = $mime->header($h) or next;
-	for (@val) {
-		foreach my $email (PublicInbox::Address::emails($_)) {
-			$dests{lc($email)} = 1;
-		}
-	}
-}
-
+# spam is removed from all known inboxes
 if ($train eq 'spam') {
 	$pi_config->each_inbox(sub {
 		my ($ibx) = @_;
@@ -58,36 +48,45 @@ if ($train eq 'spam') {
 		$im->remove($mime, 'spam');
 		$im->done;
 	});
-}
+} else {
+	require PublicInbox::MDA if $train eq "ham";
 
-require PublicInbox::MDA if $train eq "ham";
+	# get all recipients
+	my %dests; # address => <PublicInbox::Inbox|0(false)>
+	for ($mime->header('Cc'), $mime->header('To')) {
+		foreach my $addr (PublicInbox::Address::emails($_)) {
+			$addr = lc($addr);
+			$dests{$addr} //= $pi_config->lookup($addr) // 0;
+		}
+	}
 
-# n.b. message may be cross-posted to multiple public-inboxes
-foreach my $recipient (keys %dests) {
-	my $dst = $pi_config->lookup($recipient) or next;
-	# We do not touch GIT_COMMITTER_* env here so we can track
-	# who trained the message.
-	$dst->{name} = $ENV{GIT_COMMITTER_NAME} || $dst->{name};
-	$dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} || $recipient;
-	$dst = PublicInbox::InboxWritable->new($dst);
-	my $im = $dst->importer(0);
+	# n.b. message may be cross-posted to multiple public-inboxes
+	while (my ($addr, $dst) = each %dests) {
+		next unless ref($dst);
+		# We do not touch GIT_COMMITTER_* env here so we can track
+		# who trained the message.
+		$dst->{name} = $ENV{GIT_COMMITTER_NAME} || $dst->{name};
+		$dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} || $addr;
+		$dst = PublicInbox::InboxWritable->new($dst);
+		my $im = $dst->importer(0);
 
-	if ($train eq "spam" || $train eq "rm") {
-		# This needs to be idempotent, as my inotify trainer
-		# may train for each cross-posted message, and this
-		# script already learns for every list in
-		# ~/.public-inbox/config
-		$im->remove($mime, $train);
-	} else { # $train eq "ham"
-		# no checking for spam here, we assume the message has
-		# been reviewed by a human at this point:
-		PublicInbox::MDA->set_list_headers($mime, $dst);
+		if ($train eq "rm") {
+			# This needs to be idempotent, as my inotify trainer
+			# may train for each cross-posted message, and this
+			# script already learns for every list in
+			# ~/.public-inbox/config
+			$im->remove($mime, $train);
+		} elsif ($train eq "ham") {
+			# no checking for spam here, we assume the message has
+			# been reviewed by a human at this point:
+			PublicInbox::MDA->set_list_headers($mime, $dst);
 
-		# Ham messages are trained when they're marked into
-		# a SEEN state, so this is idempotent:
-		$im->add($mime);
+			# Ham messages are trained when they're marked into
+			# a SEEN state, so this is idempotent:
+			$im->add($mime);
+		}
+		$im->done;
 	}
-	$im->done;
 }
 
 if ($err) {


* [PATCH 03/14] learn: update usage statement
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
  2019-10-28 10:45 ` [PATCH 01/14] learn: support multiple To/Cc headers Eric Wong
  2019-10-28 10:45 ` [PATCH 02/14] learn: only map recipient list on "ham" or "rm" Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 04/14] learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0" Eric Wong
                   ` (10 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

Use <foo|bar> since that seems to be the favored notation
for required command args (taking a hint from git(1) manpage).
While we're at it, remove the space after '<' for the redirect
to match git.git coding style.
---
 script/public-inbox-learn | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index d2d665d5..ad132985 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -4,7 +4,7 @@
 #
 # Used for training spam (via SpamAssassin) and removing messages from a
 # public-inbox
-my $usage = "$0 (spam|ham) < /path/to/message";
+my $usage = "$0 <spam|ham|rm> </path/to/message";
 use strict;
 use warnings;
 use PublicInbox::Config;
@@ -39,7 +39,7 @@ my $mime = PublicInbox::MIME->new(eval {
 	$data
 });
 
-# spam is removed from all known inboxes
+# spam is removed from all known inboxes since it is often Bcc:-ed
 if ($train eq 'spam') {
 	$pi_config->each_inbox(sub {
 		my ($ibx) = @_;


* [PATCH 04/14] learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0"
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (2 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 03/14] learn: update usage statement Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 05/14] learn: hoist out remove_or_add subroutine Eric Wong
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

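Users may be zeroes or blanks: "0" and "" are valid values for
these variables, so use defined-or (//) instead of ||, which
would discard them in favor of the fallback.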
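
A small sketch of the difference between the two operators
(name_with_or and name_with_dor are hypothetical helper names used
only for this comparison):

```perl
use strict;
use warnings;

# Pick a committer name the old way (||) and the new way (//).
sub name_with_or  { my ($env, $fallback) = @_; $env || $fallback }
sub name_with_dor { my ($env, $fallback) = @_; $env // $fallback }

for my $env (undef, '', '0', 'Alice') {
	printf("%-7s || => %-8s // => %s\n",
		defined($env) ? "'$env'" : 'undef',
		"'" . name_with_or($env, 'inbox') . "'",
		"'" . name_with_dor($env, 'inbox') . "'");
}
```

Only the undef case falls back with //, whereas || also falls back
for '' and '0'.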
---
 script/public-inbox-learn | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index ad132985..299f75a0 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -65,8 +65,8 @@ if ($train eq 'spam') {
 		next unless ref($dst);
 		# We do not touch GIT_COMMITTER_* env here so we can track
 		# who trained the message.
-		$dst->{name} = $ENV{GIT_COMMITTER_NAME} || $dst->{name};
-		$dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} || $addr;
+		$dst->{name} = $ENV{GIT_COMMITTER_NAME} // $dst->{name};
+		$dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} // $addr;
 		$dst = PublicInbox::InboxWritable->new($dst);
 		my $im = $dst->importer(0);
 


* [PATCH 05/14] learn: hoist out remove_or_add subroutine
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (3 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 04/14] learn: GIT_COMMITTER_<NAME|EMAIL> may be "" or "0" Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 06/14] mda: hoist out List-ID handling and reuse in -learn Eric Wong
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

We'll be reusing it for List-ID processing in the next commit.
---
 script/public-inbox-learn | 56 ++++++++++++++++++++++-----------------
 1 file changed, 31 insertions(+), 25 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 299f75a0..56739f88 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -39,6 +39,34 @@ my $mime = PublicInbox::MIME->new(eval {
 	$data
 });
 
+sub remove_or_add ($$$) {
+	my ($ibx, $train, $addr) = @_;
+
+	# We do not touch GIT_COMMITTER_* env here so we can track
+	# who trained the message.
+	$ibx->{name} = $ENV{GIT_COMMITTER_NAME} // $ibx->{name};
+	$ibx->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} // $addr;
+	$ibx = PublicInbox::InboxWritable->new($ibx);
+	my $im = $ibx->importer(0);
+
+	if ($train eq "rm") {
+		# This needs to be idempotent, as my inotify trainer
+		# may train for each cross-posted message, and this
+		# script already learns for every list in
+		# ~/.public-inbox/config
+		$im->remove($mime, $train);
+	} elsif ($train eq "ham") {
+		# no checking for spam here, we assume the message has
+		# been reviewed by a human at this point:
+		PublicInbox::MDA->set_list_headers($mime, $ibx);
+
+		# Ham messages are trained when they're marked into
+		# a SEEN state, so this is idempotent:
+		$im->add($mime);
+	}
+	$im->done;
+}
+
 # spam is removed from all known inboxes since it is often Bcc:-ed
 if ($train eq 'spam') {
 	$pi_config->each_inbox(sub {
@@ -61,31 +89,9 @@ if ($train eq 'spam') {
 	}
 
 	# n.b. message may be cross-posted to multiple public-inboxes
-	while (my ($addr, $dst) = each %dests) {
-		next unless ref($dst);
-		# We do not touch GIT_COMMITTER_* env here so we can track
-		# who trained the message.
-		$dst->{name} = $ENV{GIT_COMMITTER_NAME} // $dst->{name};
-		$dst->{-primary_address} = $ENV{GIT_COMMITTER_EMAIL} // $addr;
-		$dst = PublicInbox::InboxWritable->new($dst);
-		my $im = $dst->importer(0);
-
-		if ($train eq "rm") {
-			# This needs to be idempotent, as my inotify trainer
-			# may train for each cross-posted message, and this
-			# script already learns for every list in
-			# ~/.public-inbox/config
-			$im->remove($mime, $train);
-		} elsif ($train eq "ham") {
-			# no checking for spam here, we assume the message has
-			# been reviewed by a human at this point:
-			PublicInbox::MDA->set_list_headers($mime, $dst);
-
-			# Ham messages are trained when they're marked into
-			# a SEEN state, so this is idempotent:
-			$im->add($mime);
-		}
-		$im->done;
+	while (my ($addr, $ibx) = each %dests) {
+		next unless ref($ibx); # $ibx may be 0
+		remove_or_add($ibx, $train, $addr);
 	}
 }
 


* [PATCH 06/14] mda: hoist out List-ID handling and reuse in -learn
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (4 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 05/14] learn: hoist out remove_or_add subroutine Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 07/14] filter/base: remove MAX_MID_SIZE constant Eric Wong
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

It's now possible for -learn to inject ham which was falsely
flagged as spam into an inbox matched via List-ID, the same way
-mda does.
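
As a sketch of the List-ID matching, using the same pattern as the
patch.  list_id_of and the %by_list_id table are hypothetical; the
table merely stands in for $config->lookup_list_id:

```perl
use strict;
use warnings;

# Extract the identifier between the angle brackets of a List-Id
# value.  Returns undef if nothing usable is found.
sub list_id_of {
	my ($val) = @_;
	return undef unless defined $val;
	return $val =~ /<[ \t]*(.+)?[ \t]*>/ ? $1 : undef;
}

# Hypothetical List-Id => inbox name table.
my %by_list_id = ('meta.public-inbox.org' => 'meta');

for my $hdr ('public-inbox meta <meta.public-inbox.org>',
		'no brackets here') {
	my $lid = list_id_of($hdr);
	my $ibx = defined($lid) ? $by_list_id{$lid} : undef;
	print "$hdr => ", ($ibx // '(no inbox)'), "\n";
}
```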
---
 lib/PublicInbox/MDA.pm    | 15 +++++++++++++++
 script/public-inbox-learn |  8 +++++++-
 script/public-inbox-mda   |  5 +----
 3 files changed, 23 insertions(+), 5 deletions(-)
 mode change 100755 => 100644 script/public-inbox-learn

diff --git a/lib/PublicInbox/MDA.pm b/lib/PublicInbox/MDA.pm
index 9cafda13..ce2c870f 100644
--- a/lib/PublicInbox/MDA.pm
+++ b/lib/PublicInbox/MDA.pm
@@ -83,4 +83,19 @@ sub set_list_headers {
 	}
 }
 
+# TODO: deal with multiple List-ID headers?
+sub inbox_for_list_id ($$) {
+	my ($klass, $config, $simple) = @_;
+
+	# newer Email::Simple allows header_raw, as does Email::MIME:
+	my $list_id = $simple->can('header_raw') ?
+			$simple->header_raw('List-Id') :
+			$simple->header('List-Id');
+	my $ibx;
+	if (defined $list_id && $list_id =~ /<[ \t]*(.+)?[ \t]*>/) {
+		$ibx = $config->lookup_list_id($1);
+	}
+	$ibx;
+}
+
 1;
diff --git a/script/public-inbox-learn b/script/public-inbox-learn
old mode 100755
new mode 100644
index 56739f88..79f3ead5
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -77,7 +77,7 @@ if ($train eq 'spam') {
 		$im->done;
 	});
 } else {
-	require PublicInbox::MDA if $train eq "ham";
+	require PublicInbox::MDA;
 
 	# get all recipients
 	my %dests; # address => <PublicInbox::Inbox|0(false)>
@@ -89,10 +89,16 @@ if ($train eq 'spam') {
 	}
 
 	# n.b. message may be cross-posted to multiple public-inboxes
+	my %seen;
 	while (my ($addr, $ibx) = each %dests) {
 		next unless ref($ibx); # $ibx may be 0
+		next if $seen{"$ibx"}++;
 		remove_or_add($ibx, $train, $addr);
 	}
+	my $ibx = PublicInbox::MDA->inbox_for_list_id($pi_config, $mime);
+	if ($ibx && !$seen{"$ibx"}) {
+		remove_or_add($ibx, $train, $ibx->{-primary_address});
+	}
 }
 
 if ($err) {
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 584218b5..3ff318c9 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -43,10 +43,7 @@ if (defined $recipient) {
 	$dst = $config->lookup($recipient); # first check
 }
 if (!defined $dst) {
-	my $list_id = $simple->header('List-Id');
-	if (defined $list_id && $list_id =~ /<[ \t]*(.+)?[ \t]*>/) {
-		$dst = $config->lookup_list_id($1);
-	}
+	$dst = PublicInbox::MDA->inbox_for_list_id($config, $simple);
 	if (!defined $dst && !defined $recipient) {
 		die "ORIGINAL_RECIPIENT not defined in ENV\n";
 	}


* [PATCH 07/14] filter/base: remove MAX_MID_SIZE constant
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (5 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 06/14] mda: hoist out List-ID handling and reuse in -learn Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 08/14] mda: hoist out mda_filter_adjust Eric Wong
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

We don't need it in the filter, here, since we have
one in the MDA package.
---
 lib/PublicInbox/Filter/Base.pm | 1 -
 1 file changed, 1 deletion(-)

diff --git a/lib/PublicInbox/Filter/Base.pm b/lib/PublicInbox/Filter/Base.pm
index 052cd332..7a0c720f 100644
--- a/lib/PublicInbox/Filter/Base.pm
+++ b/lib/PublicInbox/Filter/Base.pm
@@ -6,7 +6,6 @@ package PublicInbox::Filter::Base;
 use strict;
 use warnings;
 use PublicInbox::MsgIter;
-use constant MAX_MID_SIZE => 244; # max term size - 1 in Xapian
 
 sub No ($) { "*** We only accept plain-text mail, No $_[0] ***" }
 


* [PATCH 08/14] mda: hoist out mda_filter_adjust
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (6 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 07/14] filter/base: remove MAX_MID_SIZE constant Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 09/14] mda: skip MIME parsing if spam Eric Wong
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

This makes it easier to document that the default -mda behavior
is stricter than that of other tools, including
"public-inbox-learn ham".
---
 script/public-inbox-mda | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 3ff318c9..71c5d937 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -75,13 +75,19 @@ if ($spamc) {
 my $mime = PublicInbox::MIME->new(\$str);
 do_exit(0) unless $spam_ok;
 
-my $fcfg = $dst->{filter} || '';
-# -mda defaults to the strict base filter
-if ($fcfg eq '') {
-	$dst->{filter} = 'PublicInbox::Filter::Base';
-} elsif ($fcfg eq 'scrub') { # legacy alias, undocumented, remove?
-	$dst->{filter} = 'PublicInbox::Filter::Mirror';
+# -mda defaults to the strict base filter which we won't use anywhere else
+sub mda_filter_adjust ($) {
+	my ($ibx) = @_;
+	my $fcfg = $ibx->{filter} || '';
+	if ($fcfg eq '') {
+		$ibx->{filter} = 'PublicInbox::Filter::Base';
+	} elsif ($fcfg eq 'scrub') { # legacy alias, undocumented, remove?
+		$ibx->{filter} = 'PublicInbox::Filter::Mirror';
+	}
 }
+
+mda_filter_adjust($dst);
+
 my $filter = $dst->filter;
 my $ret = $filter->delivery($mime);
 if (ref($ret) && $ret->isa('Email::MIME')) { # filter altered message


* [PATCH 09/14] mda: skip MIME parsing if spam
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (7 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 08/14] mda: hoist out mda_filter_adjust Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 10/14] inboxwritable: add assert_usable_dir sub Eric Wong
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

We don't want to waste cycles parsing the message for MIME bits
if it's spam.
---
 script/public-inbox-mda | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 71c5d937..69354616 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -71,9 +71,9 @@ if ($spamc) {
 	my $fh = $emm->fh;
 	read($fh, $str, -s $fh);
 }
+do_exit(0) unless $spam_ok;
 
 my $mime = PublicInbox::MIME->new(\$str);
-do_exit(0) unless $spam_ok;
 
 # -mda defaults to the strict base filter which we won't use anywhere else
 sub mda_filter_adjust ($) {


* [PATCH 10/14] inboxwritable: add assert_usable_dir sub
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (8 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 09/14] mda: skip MIME parsing if spam Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 11/14] mda: prepare for multiple destinations Eric Wong
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

And use it for mda, since "0" could be a usable directory
if somebody insists on using relative paths...
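
To illustrate, a standalone sketch mirroring the check (usable_dir is
a hypothetical free-standing version of the new method):

```perl
use strict;
use warnings;

# Only undef and the empty string are rejected, so a relative
# directory named "0" is accepted, unlike with a plain "or die".
sub usable_dir {
	my ($dir) = @_;
	return $dir if defined($dir) && $dir ne '';
	die "no inboxdir defined\n";
}

for my $dir ('/srv/inbox', '0', '', undef) {
	my $label = defined($dir) ? "'$dir'" : 'undef';
	my $res = eval { usable_dir($dir) };
	print "$label: ", (defined($res) ? 'usable' : 'rejected'), "\n";
}
```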
---
 lib/PublicInbox/InboxWritable.pm |  9 ++++++++-
 lib/PublicInbox/V2Writable.pm    |  5 ++---
 script/public-inbox-mda          |  4 +++-
 t/import.t                       |  8 ++++++++
 t/v2writable.t                   | 12 ++++++++++++
 5 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/lib/PublicInbox/InboxWritable.pm b/lib/PublicInbox/InboxWritable.pm
index ab7b0ed5..9eab394d 100644
--- a/lib/PublicInbox/InboxWritable.pm
+++ b/lib/PublicInbox/InboxWritable.pm
@@ -30,12 +30,19 @@ sub new {
 	$self;
 }
 
+sub assert_usable_dir {
+	my ($self) = @_;
+	my $dir = $self->{inboxdir};
+	return $dir if defined($dir) && $dir ne '';
+	die "no inboxdir defined for $self->{name}\n";
+}
+
 sub init_inbox {
 	my ($self, $shards, $skip_epoch, $skip_artnum) = @_;
 	# TODO: honor skip_artnum
 	my $v = $self->{version} || 1;
 	if ($v == 1) {
-		my $dir = $self->{inboxdir} or die "no inboxdir in inbox\n";
+		my $dir = assert_usable_dir($self);
 		PublicInbox::Import::init_bare($dir);
 	} else {
 		my $v2w = importer($self);
diff --git a/lib/PublicInbox/V2Writable.pm b/lib/PublicInbox/V2Writable.pm
index ad2e8e62..1825da2c 100644
--- a/lib/PublicInbox/V2Writable.pm
+++ b/lib/PublicInbox/V2Writable.pm
@@ -77,7 +77,8 @@ sub new {
 	# $creat may be any true value, or 0/undef.  A hashref is true,
 	# and $creat->{nproc} may be set to an integer
 	my ($class, $v2ibx, $creat) = @_;
-	my $dir = $v2ibx->{inboxdir} or die "no inboxdir in inbox\n";
+	$v2ibx = PublicInbox::InboxWritable->new($v2ibx);
+	my $dir = $v2ibx->assert_usable_dir;
 	unless (-d $dir) {
 		if ($creat) {
 			require File::Path;
@@ -86,8 +87,6 @@ sub new {
 			die "$dir does not exist\n";
 		}
 	}
-
-	$v2ibx = PublicInbox::InboxWritable->new($v2ibx);
 	$v2ibx->umask_prepare;
 
 	my $xpfx = "$dir/xap" . PublicInbox::Search::SCHEMA_VERSION;
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 69354616..c122984f 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -49,8 +49,10 @@ if (!defined $dst) {
 	}
 	defined $dst or do_exit(67); # EX_NOUSER 5.1.1 user unknown
 }
-$dst->{inboxdir} or do_exit(67);
+
 $dst = PublicInbox::InboxWritable->new($dst);
+eval { $dst->assert_usable_dir };
+do_exit(67) if $@;
 
 # pre-check, MDA has stricter rules than an importer might;
 if ($precheck && !PublicInbox::MDA->precheck($simple, $dst->{address})) {
diff --git a/t/import.t b/t/import.t
index 4ec3c4f3..d309eec5 100644
--- a/t/import.t
+++ b/t/import.t
@@ -96,4 +96,12 @@ is(undef, $im->checkpoint, 'checkpoint works before ->done');
 $im->done;
 is(undef, $im->checkpoint, 'checkpoint works after ->done');
 $im->checkpoint;
+
+my $nogit = PublicInbox::Git->new("$dir/non-existent/dir");
+eval {
+	my $nope = PublicInbox::Import->new($nogit, 'nope', 'no@example.com');
+	$nope->add($mime);
+};
+ok($@, 'Import->add fails on non-existent dir');
+
 done_testing();
diff --git a/t/v2writable.t b/t/v2writable.t
index c2daac2f..06dafe98 100644
--- a/t/v2writable.t
+++ b/t/v2writable.t
@@ -260,4 +260,16 @@ EOF
 	$im->done;
 }
 
+my $tmp = {
+	inboxdir => "$inboxdir/non-existent/subdir",
+	name => 'nope',
+	version => 2,
+	-primary_address => 'test@example.com',
+};
+eval {
+	my $nope = PublicInbox::V2Writable->new($tmp);
+	$nope->add($mime);
+};
+ok($@, 'V2Writable fails on non-existent dir');
+
 done_testing();


* [PATCH 11/14] mda: prepare for multiple destinations
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (9 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 10/14] inboxwritable: add assert_usable_dir sub Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 12/14] mda: support multiple List-ID matches Eric Wong
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

Multiple List-ID headers will be supported in the next commit.
---
 script/public-inbox-mda | 92 ++++++++++++++++++++++++-----------------
 1 file changed, 55 insertions(+), 37 deletions(-)

diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index c122984f..821bd9cc 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -37,27 +37,39 @@ my $config = PublicInbox::Config->new;
 my $key = 'publicinboxmda.spamcheck';
 my $default = 'PublicInbox::Spamcheck::Spamc';
 my $spamc = PublicInbox::Spamcheck::get($config, $key, $default);
-my $dst;
+my $dests = [];
 my $recipient = $ENV{ORIGINAL_RECIPIENT};
 if (defined $recipient) {
-	$dst = $config->lookup($recipient); # first check
+	my $ibx = $config->lookup($recipient); # first check
+	push @$dests, $ibx if $ibx;
 }
-if (!defined $dst) {
-	$dst = PublicInbox::MDA->inbox_for_list_id($config, $simple);
-	if (!defined $dst && !defined $recipient) {
+if (!scalar(@$dests)) {
+	my $ibx = PublicInbox::MDA->inbox_for_list_id($config, $simple);
+	if (!defined($ibx) && !defined($recipient)) {
 		die "ORIGINAL_RECIPIENT not defined in ENV\n";
 	}
-	defined $dst or do_exit(67); # EX_NOUSER 5.1.1 user unknown
+	defined($ibx) or do_exit(67); # EX_NOUSER 5.1.1 user unknown
+	push @$dests, $ibx;
 }
 
-$dst = PublicInbox::InboxWritable->new($dst);
-eval { $dst->assert_usable_dir };
-do_exit(67) if $@;
+my $err;
+@$dests = grep {
+	my $ibx = PublicInbox::InboxWritable->new($_);
+	eval { $ibx->assert_usable_dir };
+	if ($@) {
+		warn $@;
+		$err = 1;
+		0;
+	# pre-check, MDA has stricter rules than an importer might;
+	} elsif ($precheck) {
+		!!PublicInbox::MDA->precheck($simple, $ibx->{address});
+	} else {
+		1;
+	}
+} @$dests;
+
+do_exit(67) if $err && scalar(@$dests) == 0;
 
-# pre-check, MDA has stricter rules than an importer might;
-if ($precheck && !PublicInbox::MDA->precheck($simple, $dst->{address})) {
-	do_exit(0);
-}
 $simple = undef;
 my $spam_ok;
 if ($spamc) {
@@ -75,8 +87,6 @@ if ($spamc) {
 }
 do_exit(0) unless $spam_ok;
 
-my $mime = PublicInbox::MIME->new(\$str);
-
 # -mda defaults to the strict base filter which we won't use anywhere else
 sub mda_filter_adjust ($) {
 	my ($ibx) = @_;
@@ -88,30 +98,38 @@ sub mda_filter_adjust ($) {
 	}
 }
 
-mda_filter_adjust($dst);
+my @rejects;
+for my $ibx (@$dests) {
+	mda_filter_adjust($ibx);
+	my $filter = $ibx->filter;
+	my $mime = PublicInbox::MIME->new($str);
+	my $ret = $filter->delivery($mime);
+	if (ref($ret) && $ret->isa('Email::MIME')) { # filter altered message
+		$mime = $ret;
+	} elsif ($ret == PublicInbox::Filter::Base::IGNORE) {
+		next; # nothing, keep looping
+	} elsif ($ret == PublicInbox::Filter::Base::REJECT) {
+		push @rejects, $filter->err;
+		next;
+	}
 
-my $filter = $dst->filter;
-my $ret = $filter->delivery($mime);
-if (ref($ret) && $ret->isa('Email::MIME')) { # filter altered message
-	$mime = $ret;
-} elsif ($ret == PublicInbox::Filter::Base::IGNORE) {
-	do_exit(0); # chuck it to emergency
-} elsif ($ret == PublicInbox::Filter::Base::REJECT) {
-	$! = 65; # EX_DATAERR 5.6.0 data format error
-	die $filter->err, "\n";
-} # else { accept
-$filter = undef;
+	PublicInbox::MDA->set_list_headers($mime, $ibx);
+	my $im = $ibx->importer(0);
+	if (defined $im->add($mime)) {
+		# ->abort is idempotent, no emergency if a single
+		# destination succeeds
+		$emm->abort;
+	} else { # v1-only
+		my $mid = $mime->header_obj->header_raw('Message-ID');
+		# this message is similar to what ssoma-mda shows:
+		print STDERR "CONFLICT: Message-ID: $mid exists\n";
+	}
+	$im->done;
+}
 
-PublicInbox::MDA->set_list_headers($mime, $dst);
-my $im = $dst->importer(0);
-if (defined $im->add($mime)) {
-	$emm = $emm->abort;
-} else {
-	# this message is similar to what ssoma-mda shows:
-	print STDERR "CONFLICT: Message-ID: ",
-			$mime->header_obj->header_raw('Message-ID'),
-			" exists\n";
+if (scalar(@rejects) && scalar(@rejects) == scalar(@$dests)) {
+	$! = 65; # EX_DATAERR 5.6.0 data format error
+	die join("\n", @rejects, '');
 }
 
-$im->done;
 do_exit(0);


* [PATCH 12/14] mda: support multiple List-ID matches
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (10 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 11/14] mda: prepare for multiple destinations Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 18:05   ` Eric W. Biederman
  2019-10-28 10:45 ` [PATCH 13/14] learn: allow running without spamc Eric Wong
  2019-10-28 10:45 ` [PATCH 14/14] doc: add public-inbox-learn(1) manpage Eric Wong
  13 siblings, 1 reply; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta; +Cc: Eric W . Biederman

While it's not RFC2919-conformant, mail software can
theoretically set multiple List-ID headers.  Deliver to all
inboxes which match a given List-ID since that's likely what
was intended.
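
A sketch of the intended behavior, assuming a hypothetical
List-Id => inbox-name hash in place of $config->lookup_list_id
(inboxes_for_list_ids and deliver_once are illustrative names):

```perl
use strict;
use warnings;

# Hypothetical List-Id => inbox name table.
my %by_list_id = (
	'meta.example.org' => 'meta',
	'test.example.org' => 'test',
);

# Return every inbox matching any List-Id value, in header order.
sub inboxes_for_list_ids {
	my ($table, @list_ids) = @_;
	my @dests;
	for my $val (@list_ids) {
		$val =~ /<[ \t]*(.+)?[ \t]*>/ or next;
		defined(my $lid = $1) or next;
		my $ibx = $table->{$lid} or next; # unknown List-Id
		push @dests, $ibx;
	}
	return \@dests;
}

# Handle each inbox at most once, skipping any already handled
# (e.g. via To/Cc matching).
sub deliver_once {
	my ($seen, $dests) = @_;
	return grep { !$seen->{$_}++ } @$dests;
}

my %seen = (meta => 1); # already handled through To/Cc
my $dests = inboxes_for_list_ids(\%by_list_id, '<foo.bar>',
	'Meta <meta.example.org>', '<test.example.org>',
	'<test.example.org>');
print join(',', deliver_once(\%seen, $dests)), "\n";
```

Inboxes which were already seen are skipped, and each remaining match
is handled exactly once.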

Cc: Eric W. Biederman <ebiederm@xmission.com>
Link: https://public-inbox.org/meta/87pniltscf.fsf@x220.int.ebiederm.org/
---
 lib/PublicInbox/MDA.pm    | 19 +++++++++++++------
 script/public-inbox-learn |  5 +++--
 script/public-inbox-mda   |  7 +++----
 t/mda.t                   | 19 +++++++++++++++++++
 4 files changed, 38 insertions(+), 12 deletions(-)

diff --git a/lib/PublicInbox/MDA.pm b/lib/PublicInbox/MDA.pm
index ce2c870f..933d82a8 100644
--- a/lib/PublicInbox/MDA.pm
+++ b/lib/PublicInbox/MDA.pm
@@ -84,18 +84,25 @@ sub set_list_headers {
 }
 
 # TODO: deal with multiple List-ID headers?
-sub inbox_for_list_id ($$) {
+sub inboxes_for_list_id ($$) {
 	my ($klass, $config, $simple) = @_;
 
 	# newer Email::Simple allows header_raw, as does Email::MIME:
-	my $list_id = $simple->can('header_raw') ?
+	my @list_ids = $simple->can('header_raw') ?
 			$simple->header_raw('List-Id') :
 			$simple->header('List-Id');
-	my $ibx;
-	if (defined $list_id && $list_id =~ /<[ \t]*(.+)?[ \t]*>/) {
-		$ibx = $config->lookup_list_id($1);
+	my @dests;
+	for my $list_id (@list_ids) {
+		$list_id =~ /<[ \t]*(.+)?[ \t]*>/ or next;
+		if (my $ibx = $config->lookup_list_id($1)) {
+			push @dests, $ibx;
+		}
+	}
+	if (scalar(@list_ids) > 1) {
+		warn "W: multiple List-IDs in message:\n";
+		warn "W: List-ID: $_\n" for @list_ids
 	}
-	$ibx;
+	\@dests;
 }
 
 1;
diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 79f3ead5..3073294a 100644
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -95,8 +95,9 @@ if ($train eq 'spam') {
 		next if $seen{"$ibx"}++;
 		remove_or_add($ibx, $train, $addr);
 	}
-	my $ibx = PublicInbox::MDA->inbox_for_list_id($pi_config, $mime);
-	if ($ibx && !$seen{"$ibx"}) {
+	my $dests = PublicInbox::MDA->inboxes_for_list_id($pi_config, $mime);
+	for my $ibx (@$dests) {
+		next if $seen{"$ibx"}++;
 		remove_or_add($ibx, $train, $ibx->{-primary_address});
 	}
 }
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index 821bd9cc..dca8a0ea 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -44,12 +44,11 @@ if (defined $recipient) {
 	push @$dests, $ibx if $ibx;
 }
 if (!scalar(@$dests)) {
-	my $ibx = PublicInbox::MDA->inbox_for_list_id($config, $simple);
-	if (!defined($ibx) && !defined($recipient)) {
+	$dests = PublicInbox::MDA->inboxes_for_list_id($config, $simple);
+	if (!scalar(@$dests) && !defined($recipient)) {
 		die "ORIGINAL_RECIPIENT not defined in ENV\n";
 	}
-	defined($ibx) or do_exit(67); # EX_NOUSER 5.1.1 user unknown
-	push @$dests, $ibx;
+	scalar(@$dests) or do_exit(67); # EX_NOUSER 5.1.1 user unknown
 }
 
 my $err;
diff --git a/t/mda.t b/t/mda.t
index 99592b2d..35811ac6 100644
--- a/t/mda.t
+++ b/t/mda.t
@@ -308,6 +308,25 @@ EOF
 	my $cur = `git --git-dir=$maindir diff HEAD~1..HEAD`;
 	like($cur, qr/this message would not be accepted without --no-precheck/,
 		'--no-precheck delivered message anyways');
+
+	# try a message with multiple List-ID headers
+	$in = <<EOF;
+List-ID: <foo.bar>
+List-ID: <$list_id>
+Message-ID: <2lids\@example>
+Subject: two List-IDs
+From: user <user\@example.com>
+To: $addr
+Date: Fri, 02 Oct 1993 00:00:00 +0000
+
+EOF
+	($out, $err) = ('', '');
+	IPC::Run::run([$mda], \$in, \$out, \$err);
+	is($?, 0, 'mda OK with multiple List-Id matches');
+	$cur = `git --git-dir=$maindir diff HEAD~1..HEAD`;
+	like($cur, qr/Message-ID: <2lids\@example>/,
+		'multi List-ID match delivered');
+	like($err, qr/multiple List-ID/, 'warned about multiple List-ID');
 }
 
 done_testing();

* [PATCH 13/14] learn: allow running without spamc
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (11 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 12/14] mda: support multiple List-ID matches Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  2019-10-28 10:45 ` [PATCH 14/14] doc: add public-inbox-learn(1) manpage Eric Wong
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

It's possible that a user will want to disable SpamAssassin
via "publicinboxmda.spamcheck=none" in public-inbox-config(5)
when injecting ham into an inbox.

Fixes: 466df3e029fe ("mda: allow configuring globally without spamc support")
---
 script/public-inbox-learn | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 3073294a..145f41ea 100644
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -11,21 +11,23 @@ use PublicInbox::Config;
 use PublicInbox::InboxWritable;
 use PublicInbox::MIME;
 use PublicInbox::Address;
-use PublicInbox::Spamcheck::Spamc;
+use PublicInbox::Spamcheck;
 my $train = shift or die "usage: $usage\n";
 if ($train !~ /\A(?:ham|spam|rm)\z/) {
 	die "`$train' not recognized.\nusage: $usage\n";
 }
 
-my $spamc = PublicInbox::Spamcheck::Spamc->new;
 my $pi_config = PublicInbox::Config->new;
+my $key = 'publicinboxmda.spamcheck';
+my $default = 'PublicInbox::Spamcheck::Spamc';
+my $spamc = PublicInbox::Spamcheck::get($pi_config, $key, $default);
 my $err;
 my $mime = PublicInbox::MIME->new(eval {
 	local $/;
 	my $data = scalar <STDIN>;
 	$data =~ s/\A[\r\n]*From [^\r\n]*\r?\n//s;
 
-	if ($train ne 'rm') {
+	if ($train ne 'rm' && defined($spamc)) {
 		eval {
 			if ($train eq 'ham') {
 				$spamc->hamlearn(\$data);

* [PATCH 14/14] doc: add public-inbox-learn(1) manpage
  2019-10-28 10:45 [PATCH 00/14] learn: sync w/ -mda changes and add manpage Eric Wong
                   ` (12 preceding siblings ...)
  2019-10-28 10:45 ` [PATCH 13/14] learn: allow running without spamc Eric Wong
@ 2019-10-28 10:45 ` Eric Wong
  13 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-28 10:45 UTC (permalink / raw)
  To: meta

Tools intended for end users need manpages.
---
 Documentation/include.mk             |  1 +
 Documentation/public-inbox-learn.pod | 86 ++++++++++++++++++++++++++++
 MANIFEST                             |  1 +
 3 files changed, 88 insertions(+)
 create mode 100644 Documentation/public-inbox-learn.pod
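
Not part of the patch: since a single message may match the same
inbox through both a To/Cc address and a List-ID, acting once per
inbox needs de-duplication.  A minimal sketch, using hypothetical
inbox names as plain strings and a hypothetical unique_inboxes
helper:

```perl
use strict;
use warnings;

# Returns each inbox once, in first-seen order.
sub unique_inboxes {
	my %seen;
	grep { !$seen{$_}++ } @_;
}

my @by_addr = qw(meta test);        # hypothetical To/Cc matches
my @by_list_id = qw(test announce); # hypothetical List-ID matches
print join(' ', unique_inboxes(@by_addr, @by_list_id)), "\n";
# prints "meta test announce"
```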

diff --git a/Documentation/include.mk b/Documentation/include.mk
index d2357ffc..bb622c1a 100644
--- a/Documentation/include.mk
+++ b/Documentation/include.mk
@@ -41,6 +41,7 @@ m1 += public-inbox-edit
 m1 += public-inbox-httpd
 m1 += public-inbox-index
 m1 += public-inbox-init
+m1 += public-inbox-learn
 m1 += public-inbox-mda
 m1 += public-inbox-nntpd
 m1 += public-inbox-watch
diff --git a/Documentation/public-inbox-learn.pod b/Documentation/public-inbox-learn.pod
new file mode 100644
index 00000000..b8190b59
--- /dev/null
+++ b/Documentation/public-inbox-learn.pod
@@ -0,0 +1,86 @@
+=head1 NAME
+
+public-inbox-learn - spam trainer and remover for public-inbox
+
+=head1 SYNOPSIS
+
+B<public-inbox-learn> <spam|ham|rm> E<lt>MESSAGE
+
+=head1 DESCRIPTION
+
+public-inbox-learn can remove spam or inject ham messages into
+an inbox while training a SpamAssassin instance.
+
+It is intended for users of L<public-inbox-mda(1)> or
+L<public-inbox-watch(1)>, but not users relying on
+L<git-fetch(1)> to mirror inboxes.
+
+It reads one message from standard input and operates on it
+depending on the command given:
+
+=head1 COMMANDS
+
+public-inbox-learn takes one of the following commands as its
+first and only argument:
+
+=over 8
+
+=item spam
+
+Treat the message as spam.  This will mark the message as
+removed so it becomes inaccessible via NNTP or WWW endpoints
+for all configured inboxes.
+
+The message remains accessible in git history.
+
+It will also be fed to L<spamc(1)> for training purposes unless
+C<publicinboxmda.spamcheck> is C<none> in L<public-inbox-config(5)>.
+
+=item ham
+
+Treat standard input as ham.  This is useful for manually injecting
+messages into the archives which failed the spam check run by
+L<public-inbox-mda(1)> or L<public-inbox-watch(1)>.
+
+It relies on the C<To:>, C<Cc:>, and C<List-ID:> headers
+to match configured inbox addresses and C<listid> directives.
+
+It will also be fed to L<spamc(1)> for training purposes unless
+C<publicinboxmda.spamcheck> is C<none> in L<public-inbox-config(5)>.
+
+=item rm
+
+This is identical to the C<spam> command above, but does
+not feed the message to L<spamc(1)>.
+
+=back
+
+=head1 ENVIRONMENT
+
+=over 8
+
+=item PI_CONFIG
+
+Per-user config file parseable by L<git-config(1)>.
+See L<public-inbox-config(5)>.
+
+Default: ~/.public-inbox/config
+
+=back
+
+=head1 CONTACT
+
+Feedback welcome via plain-text mail to L<mailto:meta@public-inbox.org>
+
+The mail archives are hosted at L<https://public-inbox.org/meta/>
+and L<http://hjrcffqmbrq6wope.onion/meta/>
+
+=head1 COPYRIGHT
+
+Copyright 2019 all contributors L<mailto:meta@public-inbox.org>
+
+License: AGPL-3.0+ L<https://www.gnu.org/licenses/agpl-3.0.txt>
+
+=head1 SEE ALSO
+
+L<spamc(1)>, L<public-inbox-mda(1)>, L<public-inbox-watch(1)>
diff --git a/MANIFEST b/MANIFEST
index 7d2ac17c..d1b6749a 100644
--- a/MANIFEST
+++ b/MANIFEST
@@ -22,6 +22,7 @@ Documentation/public-inbox-edit.pod
 Documentation/public-inbox-httpd.pod
 Documentation/public-inbox-index.pod
 Documentation/public-inbox-init.pod
+Documentation/public-inbox-learn.pod
 Documentation/public-inbox-mda.pod
 Documentation/public-inbox-nntpd.pod
 Documentation/public-inbox-overview.pod

* Re: [PATCH 12/14] mda: support multiple List-ID matches
  2019-10-28 10:45 ` [PATCH 12/14] mda: support multiple List-ID matches Eric Wong
@ 2019-10-28 18:05   ` Eric W. Biederman
  2019-10-30 21:32     ` Eric Wong
  0 siblings, 1 reply; 17+ messages in thread
From: Eric W. Biederman @ 2019-10-28 18:05 UTC (permalink / raw)
  To: Eric Wong; +Cc: meta

Eric Wong <e@80x24.org> writes:

> While it's not RFC2919-conformant, mail software can
> theoretically set multiple List-ID headers.  Deliver to all
> inboxes which match a given List-ID since that's likely the
> intent.

There is a todo line you can kill, noted below.


There should probably be a warning about List-IDs you can't
look up.

This can happen with a misconfiguration, or when you subscribe to an
extra mailbox and have not yet configured the List-ID for the list.  I
don't know how to find the List-ID ahead of time, so it seems inevitable
that there will be a couple of messages with an unconfigured List-ID.

If you are not receiving from a mailing list you might get spam or
other unsolicited email from someone's list server.  Knowing the List-ID
of that email is probably also useful.  Knowing that this kind of
nonsense exists guarantees that there will be email whose List-ID won't
be configured.

> Cc: Eric W. Biederman <ebiederm@xmission.com>
> Link: https://public-inbox.org/meta/87pniltscf.fsf@x220.int.ebiederm.org/
> ---
>  lib/PublicInbox/MDA.pm    | 19 +++++++++++++------
>  script/public-inbox-learn |  5 +++--
>  script/public-inbox-mda   |  7 +++----
>  t/mda.t                   | 19 +++++++++++++++++++
>  4 files changed, 38 insertions(+), 12 deletions(-)
>
> diff --git a/lib/PublicInbox/MDA.pm b/lib/PublicInbox/MDA.pm
> index ce2c870f..933d82a8 100644
> --- a/lib/PublicInbox/MDA.pm
> +++ b/lib/PublicInbox/MDA.pm
> @@ -84,18 +84,25 @@ sub set_list_headers {
>  }
>  
>  # TODO: deal with multiple List-ID headers?
   ^^^^^^^^^^^^^^^^^^------ You can kill this line now.
   
> -sub inbox_for_list_id ($$) {
> +sub inboxes_for_list_id ($$) {
>  	my ($klass, $config, $simple) = @_;
>  
>  	# newer Email::Simple allows header_raw, as does Email::MIME:
> -	my $list_id = $simple->can('header_raw') ?
> +	my @list_ids = $simple->can('header_raw') ?
>  			$simple->header_raw('List-Id') :
>  			$simple->header('List-Id');
> -	my $ibx;
> -	if (defined $list_id && $list_id =~ /<[ \t]*(.+)?[ \t]*>/) {
> -		$ibx = $config->lookup_list_id($1);
> +	my @dests;
> +	for my $list_id (@list_ids) {
> +		$list_id =~ /<[ \t]*(.+)?[ \t]*>/ or next;
> +		if (my $ibx = $config->lookup_list_id($1)) {
> +			push @dests, $ibx;
> +		}
> +	}
> +	if (scalar(@list_ids) > 1) {
> +		warn "W: multiple List-IDs in message:\n";
> +		warn "W: List-ID: $_\n" for @list_ids
>  	}
> -	$ibx;
> +	\@dests;
>  }
>  
>  1;
> diff --git a/script/public-inbox-learn b/script/public-inbox-learn
> index 79f3ead5..3073294a 100644
> --- a/script/public-inbox-learn
> +++ b/script/public-inbox-learn
> @@ -95,8 +95,9 @@ if ($train eq 'spam') {
>  		next if $seen{"$ibx"}++;
>  		remove_or_add($ibx, $train, $addr);
>  	}
> -	my $ibx = PublicInbox::MDA->inbox_for_list_id($pi_config, $mime);
> -	if ($ibx && !$seen{"$ibx"}) {
> +	my $dests = PublicInbox::MDA->inboxes_for_list_id($pi_config, $mime);
> +	for my $ibx (@$dests) {
> +		next if $seen{"$ibx"}++;
>  		remove_or_add($ibx, $train, $ibx->{-primary_address});
>  	}
>  }
> diff --git a/script/public-inbox-mda b/script/public-inbox-mda
> index 821bd9cc..dca8a0ea 100755
> --- a/script/public-inbox-mda
> +++ b/script/public-inbox-mda
> @@ -44,12 +44,11 @@ if (defined $recipient) {
>  	push @$dests, $ibx if $ibx;
>  }
>  if (!scalar(@$dests)) {
> -	my $ibx = PublicInbox::MDA->inbox_for_list_id($config, $simple);
> -	if (!defined($ibx) && !defined($recipient)) {
> +	$dests = PublicInbox::MDA->inboxes_for_list_id($config, $simple);
> +	if (!scalar(@$dests) && !defined($recipient)) {
>  		die "ORIGINAL_RECIPIENT not defined in ENV\n";
>  	}
> -	defined($ibx) or do_exit(67); # EX_NOUSER 5.1.1 user unknown
> -	push @$dests, $ibx;
> +	scalar(@$dests) or do_exit(67); # EX_NOUSER 5.1.1 user unknown
>  }
>  
>  my $err;
> diff --git a/t/mda.t b/t/mda.t
> index 99592b2d..35811ac6 100644
> --- a/t/mda.t
> +++ b/t/mda.t
> @@ -308,6 +308,25 @@ EOF
>  	my $cur = `git --git-dir=$maindir diff HEAD~1..HEAD`;
>  	like($cur, qr/this message would not be accepted without --no-precheck/,
>  		'--no-precheck delivered message anyways');
> +
> +	# try a message with multiple List-ID headers
> +	$in = <<EOF;
> +List-ID: <foo.bar>
> +List-ID: <$list_id>
> +Message-ID: <2lids\@example>
> +Subject: two List-IDs
> +From: user <user\@example.com>
> +To: $addr
> +Date: Fri, 02 Oct 1993 00:00:00 +0000
> +
> +EOF
> +	($out, $err) = ('', '');
> +	IPC::Run::run([$mda], \$in, \$out, \$err);
> +	is($?, 0, 'mda OK with multiple List-Id matches');
> +	$cur = `git --git-dir=$maindir diff HEAD~1..HEAD`;
> +	like($cur, qr/Message-ID: <2lids\@example>/,
> +		'multi List-ID match delivered');
> +	like($err, qr/multiple List-ID/, 'warned about multiple List-ID');
>  }
>  
>  done_testing();

* Re: [PATCH 12/14] mda: support multiple List-ID matches
  2019-10-28 18:05   ` Eric W. Biederman
@ 2019-10-30 21:32     ` Eric Wong
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Wong @ 2019-10-30 21:32 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: meta

"Eric W. Biederman" <ebiederm@xmission.com> wrote:
> Eric Wong <e@80x24.org> writes:
> 
> > While it's not RFC2919-conformant, mail software can
> > theoretically set multiple List-ID headers.  Deliver to all
> > inboxes which match a given List-ID since that's likely the
> > intent.
> 
> There is a todo line you can kill, noted below.

Done and pushed

> There should probably be a warning about List-IDs you can't
> look up.
> 
> This can happen with a misconfiguration, or when you subscribe to an
> extra mailbox and have not yet configured the List-ID for the list.  I
> don't know how to find the List-ID ahead of time, so it seems inevitable
> that there will be a couple of messages with an unconfigured List-ID.

I'm not so sure about that...  We don't warn on existing cases
involving ORIGINAL_RECIPIENT/To/Cc.  Instead, the message goes to
~/.public-inbox/emergency/ (or whatever PI_EMERGENCY is set to).
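
For anyone following along, that fallback amounts to roughly the
following.  This is a sketch with a hypothetical emergency_dir
helper that takes a hash ref in place of %ENV, not the -mda source:

```perl
use strict;
use warnings;

# Sketch only: pick the emergency Maildir, preferring PI_EMERGENCY
# and falling back to a directory under HOME.
sub emergency_dir {
	my ($env) = @_; # hash ref standing in for %ENV
	$env->{PI_EMERGENCY} || "$env->{HOME}/.public-inbox/emergency/";
}

print emergency_dir({ HOME => '/home/user' }), "\n";
print emergency_dir({ HOME => '/home/user',
			PI_EMERGENCY => '/var/tmp/em/' }), "\n";
```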

> If you are not receiving from a mailing list you might get spam or
> other unsolicited email from someone's list server.  Knowing the List-ID
> of that email is probably also useful.  Knowing that this kind of
> nonsense exists guarantees that there will be email whose List-ID won't
> be configured.

Given we already toss undeliverables into an emergency/ Maildir,
I don't think training users to look for warnings in noisy (and
potentially access-limited) mail logs is necessary.


> > Cc: Eric W. Biederman <ebiederm@xmission.com>
> > Link: https://public-inbox.org/meta/87pniltscf.fsf@x220.int.ebiederm.org/
> > ---

> > diff --git a/lib/PublicInbox/MDA.pm b/lib/PublicInbox/MDA.pm
> > index ce2c870f..933d82a8 100644
> > --- a/lib/PublicInbox/MDA.pm
> > +++ b/lib/PublicInbox/MDA.pm
> > @@ -84,18 +84,25 @@ sub set_list_headers {
> >  }
> >  
> >  # TODO: deal with multiple List-ID headers?
>    ^^^^^^^^^^^^^^^^^^------ You can kill this line now.

Yup.  I also have lots of TODO comments throughout which
need to be updated/removed :x

Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git
