user/dev discussion of public-inbox itself
From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Cc: Eric Wong <e@80x24.org>
Subject: [PATCH 2/2] -learn: nuke HTML portions when training as ham
Date: Thu, 13 Nov 2014 21:53:01 +0000
Message-ID: <1415915581-2522-2-git-send-email-e@80x24.org>
In-Reply-To: <1415915581-2522-1-git-send-email-e@80x24.org>

Sometimes people send HTML email and I forget to fix it up in
my MUA during moderation.  Automatically strip out HTML portions
when training as ham instead.
---
 public-inbox-learn | 19 ++++++++++---------
 t/mda.t            | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 51 insertions(+), 9 deletions(-)
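
For readers unfamiliar with the filtering step, here is a minimal
sketch of dropping HTML alternatives from a message, assuming
Email::MIME is installed.  It is not PublicInbox::Filter itself
(its rules are not shown in this patch); strip_html_parts is a
hypothetical helper and the sample message is made up, modelled on
the multipart/alternative case added to t/mda.t:

```perl
use strict;
use warnings;
use Email::MIME;

# Hypothetical helper, not part of public-inbox: recursively drop
# text/html subparts from a parsed message, modifying it in place.
sub strip_html_parts {
	my ($mime) = @_;
	my @sub = $mime->subparts;
	return $mime unless @sub; # single-part message: nothing to drop

	my @keep = grep {
		my $ct = $_->content_type // '';
		$ct !~ m!\Atext/html\b!i;
	} @sub;
	strip_html_parts($_) for @keep; # handle nested multiparts
	$mime->parts_set(\@keep);
	return $mime;
}

# Made-up sample, similar in shape to the message built in t/mda.t
my $raw = <<'EOF';
From: a@example.com
Subject: blah
Message-ID: <multipart-html-sucks@11>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="b"

--b
Content-Type: text/html; charset=UTF-8

<html><body>hi</body></html>
--b
Content-Type: text/plain

hi = "bye"
--b--
EOF

my $mime = strip_html_parts(Email::MIME->new($raw));
print $mime->as_string;
```

The sketch only removes text/html subparts of multipart messages;
a message whose sole body is text/html is left untouched, and how
the real filter treats that case is outside what this patch shows.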

diff --git a/public-inbox-learn b/public-inbox-learn
index 13b75b7..db0a1bb 100755
--- a/public-inbox-learn
+++ b/public-inbox-learn
@@ -24,9 +24,16 @@ foreach my $h (qw(Cc To)) {
 	}
 }
 
-my $in = $mime->as_string;
-$mime->body_set('');
+my ($name, $email, $date);
+
+if ($train eq "ham") {
+	require PublicInbox::MDA;
+	require PublicInbox::Filter;
+	PublicInbox::Filter->run($mime);
+	($name, $email, $date) = PublicInbox::MDA->author_info($mime);
+}
 
+my $in = $mime->as_string;
 my $err = 0;
 my @output = qw(> /dev/null > /dev/null);
 
@@ -50,16 +57,10 @@ foreach my $recipient (keys %dests) {
 			}
 		}
 	} else { # $train eq "ham"
-		require PublicInbox::MDA;
-		require PublicInbox::Filter;
-
-		# no checking for errors here, we assume the message has
+		# no checking for spam here, we assume the message has
 		# been reviewed by a human at this point:
-		PublicInbox::Filter->run($mime);
 		PublicInbox::MDA->set_list_headers($mime, $dst);
 
-		my ($name, $email, $date) =
-				PublicInbox::MDA->author_info($mime);
 		local $ENV{GIT_AUTHOR_NAME} = $name;
 		local $ENV{GIT_AUTHOR_EMAIL} = $email;
 		local $ENV{GIT_AUTHOR_DATE} = $date;
diff --git a/t/mda.t b/t/mda.t
index fad96e5..53712a5 100644
--- a/t/mda.t
+++ b/t/mda.t
@@ -205,14 +205,55 @@ EOF
 	my $in = $simple->as_string;
 
 	# now train it
+	# these should be overridden
 	local $ENV{GIT_AUTHOR_EMAIL} = 'trainer@example.com';
 	local $ENV{GIT_COMMITTER_EMAIL} = 'trainer@example.com';
+
 	run([$learn, "ham"], \$in);
 	is($?, 0, "learned ham without failure");
 	my $msg = `ssoma cat $mid $maindir`;
 	like($msg, qr/\Q$mid\E/, "ham message delivered");
 	run([$learn, "ham"], \$in);
 	is($?, 0, "learned ham idempotently ");
+
+	# ensure trained email is filtered, too
+	my $html_body = "<html><body>hi</body></html>";
+	my $parts = [
+		Email::MIME->create(
+			attributes => {
+				content_type => 'text/html; charset=UTF-8',
+				encoding => 'base64',
+			},
+			body => $html_body,
+		),
+		Email::MIME->create(
+			attributes => {
+				content_type => 'text/plain',
+				encoding => 'quoted-printable',
+			},
+			body => 'hi = "bye"',
+		)
+	];
+	$mid = 'multipart-html-sucks@11';
+	my $mime = Email::MIME->create(
+		header_str => [
+		  From => 'a@example.com',
+		  Subject => 'blah',
+		  Cc => $addr,
+		  'Message-ID' => "<$mid>",
+		  'Content-Type' => 'multipart/alternative',
+		],
+		parts => $parts,
+	);
+
+	{
+		$in = $mime->as_string;
+		run([$learn, "ham"], \$in);
+		is($?, 0, "learned ham without failure");
+		$msg = `ssoma cat $mid $maindir`;
+		like($msg, qr/<\Q$mid\E>/, "ham message delivered");
+		unlike($msg, qr/<html>/i, '<html> filtered');
+	}
 }
 
 # faildir - emergency destination is maildir
-- 
EW


Thread overview:
2014-11-13 21:53 [PATCH 1/2] view: account for filter bugs which leak HTML into the repo Eric Wong
2014-11-13 21:53 ` Eric Wong [this message]
