From: Eric Wong <e@80x24.org>
To: meta@public-inbox.org
Cc: Eric Wong <e@80x24.org>
Subject: [PATCH 2/2] -learn: nuke HTML portions when training as ham
Date: Thu, 13 Nov 2014 21:53:01 +0000
Message-ID: <1415915581-2522-2-git-send-email-e@80x24.org>
In-Reply-To: <1415915581-2522-1-git-send-email-e@80x24.org>

Sometimes people send HTML email, and I forget to fix it up in my
MUA during moderation. Automatically strip out the HTML portions
when training a message as ham instead.
---
public-inbox-learn | 19 ++++++++++---------
t/mda.t | 41 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 51 insertions(+), 9 deletions(-)
diff --git a/public-inbox-learn b/public-inbox-learn
index 13b75b7..db0a1bb 100755
--- a/public-inbox-learn
+++ b/public-inbox-learn
@@ -24,9 +24,16 @@ foreach my $h (qw(Cc To)) {
}
}
-my $in = $mime->as_string;
-$mime->body_set('');
+my ($name, $email, $date);
+
+if ($train eq "ham") {
+ require PublicInbox::MDA;
+ require PublicInbox::Filter;
+ PublicInbox::Filter->run($mime);
+ ($name, $email, $date) = PublicInbox::MDA->author_info($mime);
+}
+my $in = $mime->as_string;
my $err = 0;
my @output = qw(> /dev/null > /dev/null);
@@ -50,16 +57,10 @@ foreach my $recipient (keys %dests) {
}
}
} else { # $train eq "ham"
- require PublicInbox::MDA;
- require PublicInbox::Filter;
-
- # no checking for errors here, we assume the message has
+ # no checking for spam here, we assume the message has
# been reviewed by a human at this point:
- PublicInbox::Filter->run($mime);
PublicInbox::MDA->set_list_headers($mime, $dst);
- my ($name, $email, $date) =
- PublicInbox::MDA->author_info($mime);
local $ENV{GIT_AUTHOR_NAME} = $name;
local $ENV{GIT_AUTHOR_EMAIL} = $email;
local $ENV{GIT_AUTHOR_DATE} = $date;
diff --git a/t/mda.t b/t/mda.t
index fad96e5..53712a5 100644
--- a/t/mda.t
+++ b/t/mda.t
@@ -205,14 +205,55 @@ EOF
my $in = $simple->as_string;
# now train it
+ # these should be overridden
local $ENV{GIT_AUTHOR_EMAIL} = 'trainer@example.com';
local $ENV{GIT_COMMITTER_EMAIL} = 'trainer@example.com';
+
run([$learn, "ham"], \$in);
is($?, 0, "learned ham without failure");
my $msg = `ssoma cat $mid $maindir`;
like($msg, qr/\Q$mid\E/, "ham message delivered");
run([$learn, "ham"], \$in);
is($?, 0, "learned ham idempotently ");
+
+ # ensure trained email is filtered, too
+ my $html_body = "<html><body>hi</body></html>";
+ my $parts = [
+ Email::MIME->create(
+ attributes => {
+ content_type => 'text/html; charset=UTF-8',
+ encoding => 'base64',
+ },
+ body => $html_body,
+ ),
+ Email::MIME->create(
+ attributes => {
+ content_type => 'text/plain',
+ encoding => 'quoted-printable',
+ },
+ body => 'hi = "bye"',
+ )
+ ];
+ $mid = 'multipart-html-sucks@11';
+ my $mime = Email::MIME->create(
+ header_str => [
+ From => 'a@example.com',
+ Subject => 'blah',
+ Cc => $addr,
+ 'Message-ID' => "<$mid>",
+ 'Content-Type' => 'multipart/alternative',
+ ],
+ parts => $parts,
+ );
+
+ {
+ $in = $mime->as_string;
+ run([$learn, "ham"], \$in);
+ is($?, 0, "learned ham without failure");
+ $msg = `ssoma cat $mid $maindir`;
+ like($msg, qr/<\Q$mid\E>/, "ham message delivered");
+ unlike($msg, qr/<html>/i, '<html> filtered');
+ }
}
# faildir - emergency destination is maildir
--
EW
Thread overview: 2+ messages
2014-11-13 21:53 [PATCH 1/2] view: account for filter bugs which leak HTML into the repo Eric Wong
2014-11-13 21:53 ` Eric Wong [this message]
Code repositories for project(s) associated with this public inbox
https://80x24.org/public-inbox.git