user/dev discussion of public-inbox itself
* [PATCH 0/9] big mda filter changes
@ 2016-06-15  0:37  7% Eric Wong
  2016-06-15  0:37  2% ` [PATCH 9/9] mda: hook up new filter functionality Eric Wong
  0 siblings, 1 reply; 2+ results
From: Eric Wong @ 2016-06-15  0:37 UTC (permalink / raw)
  To: meta

Eric Wong (9):
      drop dependency on File::Path::Expand
      t/feed.t: make IPC::Run usage optional
      learn: remove IPC::Run dependency
      t/mda.t: remove senseless use of Email::Filter
      t/mda: use only Maildir for testing
      mda: precheck no longer depends on Email::Filter
      filter: begin work on a new filter API
      emergency: implement new emergency Maildir delivery
      mda: hook up new filter functionality

 INSTALL                          |   3 -
 Makefile.PL                      |   2 -
 lib/PublicInbox/Config.pm        |   3 +-
 lib/PublicInbox/Emergency.pm     |  96 +++++++++++
 lib/PublicInbox/Filter.pm        | 232 ---------------------------
 lib/PublicInbox/Filter/Base.pm   | 100 ++++++++++++
 lib/PublicInbox/Filter/Mirror.pm |  12 ++
 lib/PublicInbox/Filter/Vger.pm   |  33 ++++
 lib/PublicInbox/MDA.pm           |  11 +-
 script/public-inbox-learn        |  42 +++--
 script/public-inbox-mda          | 135 ++++++++--------
 t/emergency.t                    |  53 ++++++
 t/feed.t                         |  18 +--
 t/filter.t                       | 337 ---------------------------------------
 t/filter_base.t                  |  81 ++++++++++
 t/filter_mirror.t                |  40 +++++
 t/filter_vger.t                  |  46 ++++++
 t/mda.t                          |  79 ++-------
 t/precheck.t                     |  14 +-
 19 files changed, 586 insertions(+), 751 deletions(-)

 Note to self: get "git apply" to work on --irreversible-delete patches
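
 The emergency Maildir delivery in this series is used by patch 9/9
 through prepare, fh and abort, with object destruction as the
 fallback. public-inbox-mda undefs its Emergency objects in do_exit to
 "trigger DESTROY", and it calls abort only after a successful import.
 Below is a minimal sketch of that guard pattern. The package name
 MaildirGuard, the tmp/-then-new/ layout and the file naming are
 assumptions made for illustration. They are not the contents of
 PublicInbox::Emergency, which this message does not show.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the prepare/fh/abort/DESTROY pattern;
# NOT the actual PublicInbox::Emergency code.
package MaildirGuard {
	use File::Path qw(make_path);
	use IO::Handle;
	use Sys::Hostname qw(hostname);

	my $seq = 0; # keeps file names unique within one process

	sub new {
		my ($class, $dir) = @_;
		make_path(map { "$dir/$_" } qw(tmp new cur));
		return bless { dir => $dir }, $class;
	}

	# Save the raw message under tmp/ before anything parses it.
	sub prepare {
		my ($self, $strref) = @_;
		my $name = join('.', time, $$, $seq++, hostname());
		my $tmp = "$self->{dir}/tmp/$name";
		open(my $fh, '+>', $tmp) or die "open $tmp: $!\n";
		print $fh $$strref or die "write $tmp: $!\n";
		$fh->flush or die "flush $tmp: $!\n";
		seek($fh, 0, 0) or die "seek $tmp: $!\n";
		@$self{qw(fh tmp name)} = ($fh, $tmp, $name);
		return;
	}

	# Read/write handle on the saved copy, rewound to the start.
	sub fh { $_[0]->{fh} }

	# Delivery succeeded elsewhere: drop the tmp/ copy, return undef
	# so callers can write "$obj = $obj->abort".
	sub abort {
		my ($self) = @_;
		if (my $tmp = delete $self->{tmp}) {
			unlink($tmp) or warn "unlink $tmp: $!\n";
		}
		return undef;
	}

	# Destroyed without abort: publish the saved copy into new/.
	sub DESTROY {
		my ($self) = @_;
		my $tmp = delete $self->{tmp} or return;
		my $dst = "$self->{dir}/new/$self->{name}";
		link($tmp, $dst) or warn "link $tmp => $dst: $!\n";
		unlink($tmp);
	}
}
```

 Under these assumptions, a message leaves the emergency area only
 after abort has been called explicitly. Any other way of dropping the
 object publishes the copy into new/.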



* [PATCH 9/9] mda: hook up new filter functionality
From: Eric Wong @ 2016-06-15  0:37 UTC (permalink / raw)
  To: meta

This removes the Email::Filter dependency as well as the
signature-breaking scrubber code.  We now prefer to
reject unacceptable messages and grudgingly (and blindly)
mirror messages we're not the primary endpoint for.
---
 INSTALL                   |   2 -
 Makefile.PL               |   1 -
 lib/PublicInbox/Filter.pm | 232 -------------------------------
 script/public-inbox-learn |   6 +-
 script/public-inbox-mda   | 131 +++++++++---------
 t/filter.t                | 337 ----------------------------------------------
 6 files changed, 69 insertions(+), 640 deletions(-)
 delete mode 100644 lib/PublicInbox/Filter.pm
 delete mode 100644 t/filter.t
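
In the new public-inbox-mda, the value returned by
$filter->delivery($mime) selects one of four outcomes. The sketch
below restates that dispatch as a standalone Perl function so it can
be exercised without a configured inbox. The constant values and the
mda_action name are hypothetical placeholders. The real constants live
in PublicInbox::Filter::Base, which is not part of this patch.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Placeholder codes for illustration only (hypothetical values).
use constant { ACCEPT => 0, IGNORE => 1, REJECT => 100 };

# Map a filter's delivery() result to what public-inbox-mda does:
#   Email::MIME object -> deliver the altered message
#   IGNORE             -> exit 0, leaving the emergency copy in place
#   REJECT             -> die with $filter->err, exit status = REJECT
#   anything else      -> accept and deliver the original message
sub mda_action {
	my ($ret, $orig) = @_;
	if (blessed($ret) && $ret->isa('Email::MIME')) {
		return ('deliver', $ret);
	} elsif ($ret == IGNORE) {
		return ('emergency', $orig);
	} elsif ($ret == REJECT) {
		return ('reject', undef);
	}
	return ('deliver', $orig);
}
```

Using blessed() before ->isa avoids dying on an unblessed reference.
The script itself relies on "$! = $ret; die ..." so that Perl exits
with the REJECT code as its status.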

diff --git a/INSTALL b/INSTALL
index 25cc3c9..03b356a 100644
--- a/INSTALL
+++ b/INSTALL
@@ -25,10 +25,8 @@ Requirements (server MDA)
 * git
 * SpamAssassin (spamc/spamd)
 * MTA - postfix is recommended
-* lynx (for converting HTML messages to text)
 * Perl and several modules:    (Debian package name)
   - Date::Parse                libtimedate-perl
-  - Email::Filter              libemail-filter-perl
   - Email::MIME                libemail-mime-perl
   - Email::MIME::ContentType   libemail-mime-contenttype-perl
   - Encode::MIME::Header       perl
diff --git a/Makefile.PL b/Makefile.PL
index 4f25312..0cba59d 100644
--- a/Makefile.PL
+++ b/Makefile.PL
@@ -18,7 +18,6 @@ WriteMakefile(
 		# We also depend on git.
 		# Keep this sorted and synced to the INSTALL document
 		'Date::Parse' => 0,
-		'Email::Filter' => 0,
 		'Email::MIME' => 0,
 		'Email::MIME::ContentType' => 0,
 		'Email::Simple' => 0,
diff --git a/lib/PublicInbox/Filter.pm b/lib/PublicInbox/Filter.pm
deleted file mode 100644
index 8b78a44..0000000
--- a/lib/PublicInbox/Filter.pm
+++ /dev/null
@@ -1,232 +0,0 @@
-# Copyright (C) 2013-2015 all contributors <meta@public-inbox.org>
-# License: AGPLv3 or later (https://www.gnu.org/licenses/agpl-3.0.txt)
-#
-# Used to filter incoming mail for -mda and importers
-# This only exposes one function: run
-# Note: the settings here are highly opinionated.  Obviously, this is
-# Free Software (AGPLv3), so you may change it if you host yourself.
-package PublicInbox::Filter;
-use strict;
-use warnings;
-use Email::MIME;
-use Email::MIME::ContentType qw/parse_content_type/;
-use Email::Filter;
-use IPC::Run;
-our $VERSION = '0.0.1';
-use constant NO_HTML => '*** We only accept plain-text email, no HTML ***';
-use constant TEXT_ONLY => '*** We only accept plain-text email ***';
-
-# start with the same defaults as mailman
-our $BAD_EXT = qr/\.(exe|bat|cmd|com|pif|scr|vbs|cpl|zip)\s*\z/i;
-our $MIME_HTML = qr!\btext/x?html\b!i;
-our $MIME_TEXT_ANY = qr!\btext/[a-z0-9\+\._-]+\b!i;
-
-# this is highly opinionated delivery
-# returns 0 only if there is nothing to deliver
-sub run {
-	my ($class, $mime, $filter) = @_;
-
-	my $content_type = $mime->header('Content-Type') || 'text/plain';
-
-	if ($content_type =~ m!\btext/plain\b!i) {
-		return 1; # yay, nothing to do
-	} elsif ($content_type =~ $MIME_HTML) {
-		$filter->reject(NO_HTML) if $filter;
-		# HTML-only, non-multipart
-		my $body = $mime->body;
-		my $ct_parsed = parse_content_type($content_type);
-		dump_html(\$body, $ct_parsed->{attributes}->{charset});
-		replace_body($mime, $body);
-		return 1;
-	} elsif ($content_type =~ m!\bmultipart/!i) {
-		return strip_multipart($mime, $content_type, $filter);
-	} else {
-		$filter->reject(TEXT_ONLY) if $filter;
-		replace_body($mime, "$content_type message scrubbed");
-		return 0;
-	}
-}
-
-sub replace_part {
-	my ($mime, $part, $type) = ($_[0], $_[1], $_[3]);
-	# don't copy $_[2], that's the body (it may be huge)
-
-	# Email::MIME insists on setting Date:, so just set it consistently
-	# to avoid conflicts to avoid git merge conflicts in a split brain
-	# situation.
-	unless (defined $part->header('Date')) {
-		my $date = $mime->header('Date') ||
-		           'Thu, 01 Jan 1970 00:00:00 +0000';
-		$part->header_set('Date', $date);
-	}
-
-	$part->charset_set(undef);
-	$part->name_set(undef);
-	$part->filename_set(undef);
-	$part->format_set(undef);
-	$part->encoding_set('8bit');
-	$part->disposition_set(undef);
-	$part->content_type_set($type);
-	$part->body_set($_[2]);
-}
-
-# converts one part of a multipart message to text
-sub html_part_to_text {
-	my ($mime, $part) = @_;
-	my $body = $part->body;
-	my $ct_parsed = parse_content_type($part->content_type);
-	dump_html(\$body, $ct_parsed->{attributes}->{charset});
-	replace_part($mime, $part, $body, 'text/plain');
-}
-
-# modifies $_[0] in place
-sub dump_html {
-	my ($body, $charset) = @_;
-	$charset ||= 'US-ASCII';
-	my @cmd = qw(lynx -stdin -stderr -dump);
-	my $out = "";
-	my $err = "";
-
-	# be careful about remote command injection!
-	if ($charset =~ /\A([A-Za-z0-9\-]+)\z/) {
-		push @cmd, "-assume_charset=$charset";
-	}
-	if (IPC::Run::run(\@cmd, $body, \$out, \$err)) {
-		$out =~ s/\r\n/\n/sg;
-		$$body = $out;
-	} else {
-		# give them an ugly version:
-		$$body = "public-inbox HTML conversion failed: $err\n" .
-			 $$body . "\n";
-	}
-}
-
-# this is to correct old archives during import.
-sub strip_multipart {
-	my ($mime, $content_type, $filter) = @_;
-
-	my (@html, @keep);
-	my $rejected = 0;
-	my $ok = 1;
-
-	# scan through all parts once
-	$mime->walk_parts(sub {
-		my ($part) = @_;
-		return if $part->subparts; # walk_parts already recurses
-
-		# some extensions are just bad, reject them outright
-		my $fn = $part->filename;
-		if (defined($fn) && $fn =~ $BAD_EXT) {
-			$filter->reject("Bad file type: $1") if $filter;
-			$rejected++;
-			return;
-		}
-
-		my $part_type = $part->content_type || '';
-		if ($part_type =~ m!\btext/plain\b!i) {
-			push @keep, $part;
-		} elsif ($part_type =~ $MIME_HTML) {
-			$filter->reject(NO_HTML) if $filter;
-			push @html, $part;
-		} elsif ($part_type =~ $MIME_TEXT_ANY) {
-			# Give other text attachments the benefit of the doubt,
-			# here?  Could be source code or script the user wants
-			# help with.
-
-			push @keep, $part;
-		} elsif ($part_type eq '' ||
-		         $part_type =~ m!\bapplication/octet-stream\b!i) {
-			# unfortunately, some mailers don't set correct types,
-			# let messages of unknown type through but do not
-			# change the sender-specified type
-			if (recheck_type_ok($part)) {
-				push @keep, $part;
-			} elsif ($filter) {
-				$filter->reject("Bad attachment: $part_type ".
-						TEXT_ONLY);
-			} else {
-				$rejected++;
-			}
-		} elsif ($part_type =~ m!\bapplication/pgp-signature\b!i) {
-			# PGP signatures are not huge, we may keep them.
-			# They can only be valid if it's the last element,
-			# so we keep them iff the message is unmodified:
-			if ($rejected == 0 && !@html) {
-				push @keep, $part;
-			}
-		} elsif ($filter) {
-			$filter->reject("unacceptable mime-type: $part_type ".
-					TEXT_ONLY);
-		} else {
-			# reject everything else, including non-PGP signatures
-			$rejected++;
-		}
-	});
-
-	if ($content_type =~ m!\bmultipart/alternative\b!i) {
-		if (scalar @keep == 1) {
-			return collapse($mime, $keep[0]);
-		}
-	} else { # convert HTML parts to plain text
-		foreach my $part (@html) {
-			html_part_to_text($mime, $part);
-			push @keep, $part;
-		}
-	}
-
-	if (@keep == 0) {
-		@keep = (Email::MIME->create(
-			attributes => {
-				content_type => 'text/plain',
-				charset => 'US-ASCII',
-				encoding => '8bit',
-			},
-			body_str => 'all attachments scrubbed by '. __PACKAGE__
-		));
-		$ok = 0;
-	}
-	if (scalar(@html) || $rejected) {
-		$mime->parts_set(\@keep);
-		$mime->body_set($mime->body_raw);
-		mark_changed($mime);
-	} # else: no changes
-
-	return $ok;
-}
-
-sub mark_changed {
-	my ($mime) = @_;
-	$mime->header_set('X-Content-Filtered-By', __PACKAGE__ ." $VERSION");
-}
-
-sub collapse {
-	my ($mime, $part) = @_;
-	$mime->header_set('Content-Type', $part->content_type);
-	$mime->body_set($part->body_raw);
-	my $cte = $part->header('Content-Transfer-Encoding');
-	if (defined($cte) && $cte ne '') {
-		$mime->header_set('Content-Transfer-Encoding', $cte);
-	}
-	mark_changed($mime);
-	return 1;
-}
-
-sub replace_body {
-	my $mime = $_[0];
-	$mime->body_set($_[1]);
-	$mime->header_set('Content-Type', 'text/plain');
-	if ($mime->header('Content-Transfer-Encoding')) {
-		$mime->header_set('Content-Transfer-Encoding', undef);
-	}
-	mark_changed($mime);
-}
-
-# Check for display-able text, no messed up binaries
-# Note: we can not rewrite the message with the detected mime type
-sub recheck_type_ok {
-	my ($part) = @_;
-	my $s = $part->body;
-	((length($s) < 0x10000) && ($s =~ /\A([[:print:]\s]+)\z/s));
-}
-
-1;
diff --git a/script/public-inbox-learn b/script/public-inbox-learn
index 783cf03..817fd5e 100755
--- a/script/public-inbox-learn
+++ b/script/public-inbox-learn
@@ -55,11 +55,7 @@ foreach my $h (qw(Cc To)) {
 	}
 }
 
-if ($train eq "ham") {
-	require PublicInbox::MDA;
-	require PublicInbox::Filter;
-	PublicInbox::Filter->run($mime);
-}
+require PublicInbox::MDA if $train eq "ham";
 
 # n.b. message may be cross-posted to multiple public-inboxes
 foreach my $recipient (keys %dests) {
diff --git a/script/public-inbox-mda b/script/public-inbox-mda
index ff2835d..63096fe 100755
--- a/script/public-inbox-mda
+++ b/script/public-inbox-mda
@@ -6,97 +6,102 @@
 use strict;
 use warnings;
 my $usage = 'public-inbox-mda < rfc2822_message';
+my ($ems, $emm);
 
-use Email::Filter;
+sub do_exit {
+	my ($code) = shift;
+	$emm = $ems = undef; # trigger DESTROY
+	exit $code;
+}
+
+use Email::Simple;
 use Email::MIME;
 use Email::MIME::ContentType;
 $Email::MIME::ContentType::STRICT_PARAMS = 0; # user input is imperfect
-use IPC::Run qw(run);
 use PublicInbox::MDA;
-use PublicInbox::Filter;
 use PublicInbox::Config;
 use PublicInbox::Import;
 use PublicInbox::Git;
+use PublicInbox::Emergency;
+use PublicInbox::Filter::Base;
+use PublicInbox::Spawn qw(popen_rd);
 
 # n.b: hopefully we can setup the emergency path without bailing due to
 # user error, we really want to setup the emergency destination ASAP
 # in case there's bugs in our code or user error.
 my $emergency = $ENV{PI_EMERGENCY} || "$ENV{HOME}/.public-inbox/emergency/";
-
-# this reads the message from stdin
-my $filter = Email::Filter->new(emergency => $emergency);
+$ems = PublicInbox::Emergency->new($emergency);
+my $str = eval { local $/; <STDIN> };
+$ems->prepare(\$str);
+my $simple = Email::Simple->new(\$str);
 my $config = PublicInbox::Config->new;
 
 my $recipient = $ENV{ORIGINAL_RECIPIENT};
 defined $recipient or die "ORIGINAL_RECIPIENT not defined in ENV\n";
 my $dst = $config->lookup($recipient); # first check
-defined $dst or exit(1);
-my $main_repo = $dst->{mainrepo} or exit(1);
-my $filtered; # string dest
+defined $dst or do_exit(1);
+my $main_repo = $dst->{mainrepo} or do_exit(1);
+
+# pre-check, MDA has stricter rules than an importer might;
+do_exit(0) unless PublicInbox::MDA->precheck($simple, $dst->{address});
 
-if (PublicInbox::MDA->precheck($filter->simple, $dst->{address}) &&
-    do_spamc($filter->simple, \$filtered)) {
-	# update our message with SA headers (in case our filter rejects it)
-	my $msg = Email::MIME->new(\$filtered);
-	$filtered = undef;
-	$filter->simple($msg);
+$str = '';
+my $spam_ok = do_spamc($ems->fh, \$str);
+$simple = undef;
+$emm = PublicInbox::Emergency->new($emergency);
+$emm->prepare(\$str);
+$ems = $ems->abort;
+my $mime = Email::MIME->new(\$str);
+$str = '';
+do_exit(0) unless $spam_ok;
 
-	my $filter_arg;
-	my $fcfg = $dst->{filter};
-	if (!defined $fcfg || $filter eq 'reject') {
-		$filter_arg = $filter;
-	} elsif ($fcfg eq 'scrub') {
-		$filter_arg = undef; # the default for legacy versions
-	} else {
-		warn "publicinbox.$dst->{name}.filter=$fcfg invalid\n";
-		warn "must be either 'scrub' or 'reject' (the default)\n";
-	}
+my $fcfg = $dst->{filter} || '';
+my $filter;
+if ($fcfg eq 'scrub') { # TODO:
+	require PublicInbox::Filter::Mirror;
+	$filter = PublicInbox::Filter::Mirror->new;
+} else {
+	$filter = PublicInbox::Filter::Base->new;
+}
 
-	if (PublicInbox::Filter->run($msg, $filter_arg)) {
-		# run spamc again on the HTML-free message
-		if (do_spamc($msg, \$filtered)) {
-			$msg = Email::MIME->new(\$filtered);
-			PublicInbox::MDA->set_list_headers($msg, $dst);
-			$filter->simple($msg);
+my $ret = $filter->delivery($mime);
+if (ref($ret) && $ret->isa('Email::MIME')) { # filter altered message
+	$mime = $ret;
+} elsif ($ret == PublicInbox::Filter::Base::IGNORE) {
+	do_exit(0); # chuck it to emergency
+} elsif ($ret == PublicInbox::Filter::Base::REJECT) {
+	$! = $ret;
+	die $filter->err, "\n";
+} # else { accept
 
-			END {
-				index_sync($main_repo) if ($? == 0);
-			};
-			my $git = PublicInbox::Git->new($main_repo);
-			my $im = PublicInbox::Import->new($git,
-						$dst->{name}, $recipient);
-			if (defined $im->add($msg)) {
-				$im->done;
-				$filter->ignore; # exits
-			}
-			# this message is similar to what ssoma-mda shows:
-			print STDERR "CONFLICT: Message-ID: ",
-				$msg->header_obj->header_raw('Message-ID'),
-				" exists\n";
-		}
-	}
+PublicInbox::MDA->set_list_headers($mime, $dst);
+END { index_sync($main_repo) if $? == 0 };
+my $git = PublicInbox::Git->new($main_repo);
+my $im = PublicInbox::Import->new($git, $dst->{name}, $recipient);
+if (defined $im->add($mime)) {
+	$im->done;
+	$emm = $emm->abort;
 } else {
-	# Ensure emergency spam gets spamassassin headers.
-	# This makes it easier to prioritize obvious spam from less obvious
-	if (defined($filtered) && $filtered ne '') {
-		my $drop = Email::MIME->new(\$filtered);
-		$filtered = undef;
-		$filter->simple($drop);
-	}
+	# this message is similar to what ssoma-mda shows:
+	print STDERR "CONFLICT: Message-ID: ",
+			$mime->header_obj->header_raw('Message-ID'),
+			" exists\n";
 }
-exit 0; # goes to emergency
+do_exit(0);
 
 # we depend on "report_safe 0" in /etc/spamassassin/*.cf with --headers
-# not using Email::Filter->pipe here since we want the stdout of
-# the command even on failure (spamc will set $? on error).
 sub do_spamc {
-	my ($msg, $out) = @_;
-	eval {
-		my $orig = $msg->as_string;
-		run([qw/spamc -E --headers/], \$orig, $out);
-	};
+	my ($in, $out) = @_;
+	my $rdr = { 0 => fileno($in) };
+	my ($fh, $pid) = popen_rd([qw/spamc -E --headers/], undef, $rdr);
+	my $r;
+	do {
+		$r = sysread($fh, $$out, 65536, length($$out));
+	} while (defined($r) && $r != 0);
+	close $fh or die "close failed: $!\n";
+	waitpid($pid, 0);
 
-	return ($@ || $? || !defined($$out) || $$out eq '') ? 0 : 1;
+	($? || $$out eq '') ? 0 : 1;
 }
 
 sub index_sync {
diff --git a/t/filter.t b/t/filter.t
deleted file mode 100644
index 609a192..0000000
--- a/t/filter.t
+++ /dev/null
@@ -1,337 +0,0 @@
-# Copyright (C) 2013-2015 all contributors <meta@public-inbox.org>
-# License: AGPLv3 or later (https://www.gnu.org/licenses/agpl-3.0.txt)
-use strict;
-use warnings;
-use Test::More;
-use Email::MIME;
-use PublicInbox::Filter;
-
-sub count_body_parts {
-	my ($bodies, $part) = @_;
-	my $body = $part->body_raw;
-	$body =~ s/\A\s*//;
-	$body =~ s/\s*\z//;
-	$bodies->{$body} ||= 0;
-	$bodies->{$body}++;
-}
-
-# multipart/alternative: HTML and quoted-printable, keep the plain-text
-{
-	my $html_body = "<html><body>hi</body></html>";
-	my $parts = [
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/html; charset=UTF-8',
-				encoding => 'base64',
-			},
-			body => $html_body,
-		),
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/plain',
-				encoding => 'quoted-printable',
-			},
-			body => 'hi = "bye"',
-		)
-	];
-	my $email = Email::MIME->create(
-		header_str => [
-		  From => 'a@example.com',
-		  Subject => 'blah',
-		  'Content-Type' => 'multipart/alternative'
-		],
-		parts => $parts,
-	);
-	is(1, PublicInbox::Filter->run($email), "run was a success");
-	my $parsed = Email::MIME->new($email->as_string);
-	is("text/plain", $parsed->header("Content-Type"));
-	is(scalar $parsed->parts, 1, "HTML part removed");
-	my %bodies;
-	$parsed->walk_parts(sub {
-		my ($part) = @_;
-		return if $part->subparts; # walk_parts already recurses
-		count_body_parts(\%bodies, $part);
-	});
-	is(scalar keys %bodies, 1, "one bodies");
-	is($bodies{"hi =3D \"bye\"="}, 1, "QP text part unchanged");
-	$parsed->walk_parts(sub {
-		my ($part) = @_;
-		my $b = $part->body;
-		$b =~ s/\s*\z//;
-		is($b, "hi = \"bye\"", "decoded body matches");
-	});
-}
-
-# plain-text email is passed through unchanged
-{
-	my $s = Email::MIME->create(
-		header => [
-			From => 'a@example.com',
-			To => 'b@example.com',
-			'Content-Type' => 'text/plain',
-			Subject => 'this is a subject',
-		],
-		body => "hello world\n",
-	);
-	is(1, PublicInbox::Filter->run($s), "run was a success");
-}
-
-# convert single-part HTML to plain-text
-{
-	my $s = Email::MIME->create(
-		header => [
-			From => 'a@example.com',
-			To => 'b@example.com',
-			'Content-Type' => 'text/html',
-			Subject => 'HTML only badness',
-		],
-		body => "<html><body>bad body\r\n</body></html>\n",
-	);
-	is(1, PublicInbox::Filter->run($s), "run was a success");
-	unlike($s->as_string, qr/<html>/, "HTML removed");
-	is("text/plain", $s->header("Content-Type"),
-		"content-type changed");
-	like($s->body, qr/\A\s*bad body\s*\z/, "body");
-	unlike($s->body, qr/\r/, "body has no cr");
-	like($s->header("X-Content-Filtered-By"),
-		qr/PublicInbox::Filter/, "XCFB header added");
-}
-
-# multipart/alternative: HTML and plain-text, keep the plain-text
-{
-	my $html_body = "<html><body>hi</body></html>";
-	my $parts = [
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/html; charset=UTF-8',
-				encoding => 'base64',
-			},
-			body => $html_body,
-		),
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/plain',
-			},
-			body=> 'hi',
-		)
-	];
-	my $email = Email::MIME->create(
-		header_str => [
-		  From => 'a@example.com',
-		  Subject => 'blah',
-		  'Content-Type' => 'multipart/alternative'
-		],
-		parts => $parts,
-	);
-	is(1, PublicInbox::Filter->run($email), "run was a success");
-	my $parsed = Email::MIME->new($email->as_string);
-	is("text/plain", $parsed->header("Content-Type"));
-	is(scalar $parsed->parts, 1, "HTML part removed");
-	my %bodies;
-	$parsed->walk_parts(sub {
-		my ($part) = @_;
-		return if $part->subparts; # walk_parts already recurses
-		count_body_parts(\%bodies, $part);
-	});
-	is(scalar keys %bodies, 1, "one bodies");
-	is($bodies{"hi"}, 1, "plain text part unchanged");
-}
-
-# multi-part plain-text-only
-{
-	my $parts = [
-		Email::MIME->create(
-			attributes => { content_type => 'text/plain', },
-			body => 'hi',
-		),
-		Email::MIME->create(
-			attributes => { content_type => 'text/plain', },
-			body => 'bye',
-		)
-	];
-	my $email = Email::MIME->create(
-		header_str => [ From => 'a@example.com', Subject => 'blah' ],
-		parts => $parts,
-	);
-	is(1, PublicInbox::Filter->run($email), "run was a success");
-	my $parsed = Email::MIME->new($email->as_string);
-	is(scalar $parsed->parts, 2, "still 2 parts");
-	my %bodies;
-	$parsed->walk_parts(sub {
-		my ($part) = @_;
-		return if $part->subparts; # walk_parts already recurses
-		count_body_parts(\%bodies, $part);
-	});
-	is(scalar keys %bodies, 2, "two bodies");
-	is($bodies{"bye"}, 1, "bye part exists");
-	is($bodies{"hi"}, 1, "hi part exists");
-	is($parsed->header("X-Content-Filtered-By"), undef,
-		"XCFB header unset");
-}
-
-# multi-part HTML, several HTML parts
-{
-	my $parts = [
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/html',
-				encoding => 'base64',
-			},
-			body => '<html><body>b64 body</body></html>',
-		),
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/html',
-				encoding => 'quoted-printable',
-			},
-			body => '<html><body>qp body</body></html>',
-		)
-	];
-	my $email = Email::MIME->create(
-		header_str => [ From => 'a@example.com', Subject => 'blah' ],
-		parts => $parts,
-	);
-	is(1, PublicInbox::Filter->run($email), "run was a success");
-	my $parsed = Email::MIME->new($email->as_string);
-	is(scalar $parsed->parts, 2, "still 2 parts");
-	my %bodies;
-	$parsed->walk_parts(sub {
-		my ($part) = @_;
-		return if $part->subparts; # walk_parts already recurses
-		count_body_parts(\%bodies, $part);
-	});
-	is(scalar keys %bodies, 2, "two body parts");
-	is($bodies{"b64 body"}, 1, "base64 part converted");
-	is($bodies{"qp body"}, 1, "qp part converted");
-	like($parsed->header("X-Content-Filtered-By"), qr/PublicInbox::Filter/,
-	     "XCFB header added");
-}
-
-# plain-text with image attachments, kill images
-{
-	my $parts = [
-		Email::MIME->create(
-			attributes => { content_type => 'text/plain' },
-			body => 'see image',
-		),
-		Email::MIME->create(
-			attributes => {
-				content_type => 'image/jpeg',
-				filename => 'scary.jpg',
-				encoding => 'base64',
-			},
-			body => 'bad',
-		)
-	];
-	my $email = Email::MIME->create(
-		header_str => [ From => 'a@example.com', Subject => 'blah' ],
-		parts => $parts,
-	);
-	is(1, PublicInbox::Filter->run($email), "run was a success");
-	my $parsed = Email::MIME->new($email->as_string);
-	is(scalar $parsed->parts, 1, "image part removed");
-	my %bodies;
-	$parsed->walk_parts(sub {
-		my ($part) = @_;
-		return if $part->subparts; # walk_parts already recurses
-		count_body_parts(\%bodies, $part);
-	});
-	is(scalar keys %bodies, 1, "one body");
-	is($bodies{'see image'}, 1, 'original body exists');
-	like($parsed->header("X-Content-Filtered-By"), qr/PublicInbox::Filter/,
-	     "XCFB header added");
-}
-
-# all bad
-{
-	my $parts = [
-		Email::MIME->create(
-			attributes => {
-				content_type => 'image/jpeg',
-				filename => 'scary.jpg',
-				encoding => 'base64',
-			},
-			body => 'bad',
-		),
-		Email::MIME->create(
-			attributes => {
-				content_type => 'text/plain',
-				filename => 'scary.exe',
-				encoding => 'base64',
-			},
-			body => 'bad',
-		),
-	];
-	my $email = Email::MIME->create(
-		header_str => [ From => 'a@example.com', Subject => 'blah' ],
-		parts => $parts,
-	);
-	is(0, PublicInbox::Filter->run($email),
-		"run signaled to stop delivery");
-	my $parsed = Email::MIME->new($email->as_string);
-	is(scalar $parsed->parts, 1, "bad parts removed");
-	my %bodies;
-	$parsed->walk_parts(sub {
-		my ($part) = @_;
-		return if $part->subparts; # walk_parts already recurses
-		count_body_parts(\%bodies, $part);
-	});
-	is(scalar keys %bodies, 1, "one body");
-	is($bodies{"all attachments scrubbed by PublicInbox::Filter"}, 1,
-	   "attachment scrubber left its mark");
-	like($parsed->header("X-Content-Filtered-By"), qr/PublicInbox::Filter/,
-	     "XCFB header added");
-}
-
-{
-	my $s = Email::MIME->create(
-		header => [
-			From => 'a@example.com',
-			To => 'b@example.com',
-			'Content-Type' => 'test/pain',
-			Subject => 'this is a subject',
-		],
-		body => "hello world\n",
-	);
-	is(0, PublicInbox::Filter->run($s), "run was a failure");
-	like($s->as_string, qr/scrubbed/, "scrubbed message");
-}
-
-# multi-part with application/octet-stream
-{
-	my $os = 'application/octet-stream';
-	my $parts = [
-		Email::MIME->create(
-			attributes => { content_type => $os },
-			body => <<EOF
-#include <stdio.h>
-int main(void)
-{
-	printf("Hello world\\n");
-	return 0;
-}
-\f
-/* some folks like ^L */
-EOF
-		),
-		Email::MIME->create(
-			attributes => {
-				filename => 'zero.data',
-				encoding => 'base64',
-				content_type => $os,
-			},
-			body => ("\0" x 4096),
-		)
-	];
-	my $email = Email::MIME->create(
-		header_str => [ From => 'a@example.com', Subject => 'blah' ],
-		parts => $parts,
-	);
-	is(1, PublicInbox::Filter->run($email), "run was a success");
-	my $parsed = Email::MIME->new($email->as_string);
-	is(scalar $parsed->parts, 1, "only one remaining part");
-	like($parsed->header("X-Content-Filtered-By"),
-		qr/PublicInbox::Filter/, "XCFB header added");
-}
-
-done_testing();
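
The rewritten do_spamc no longer serializes the message a second time.
It passes spamc the emergency copy's file handle as stdin and collects
stdout with a sysread loop that appends at length($$out). Below is a
self-contained sketch of the same loop. It uses Perl's forking open in
place of PublicInbox::Spawn::popen_rd (a substitution), and
run_filter_cmd is a hypothetical name.

```perl
use strict;
use warnings;
use POSIX ();

# Run @$cmd with $in as its stdin and append everything it writes to
# stdout onto $$out.  Returns 1 only if the command exited 0 and
# produced output, mirroring the check at the end of do_spamc.
sub run_filter_cmd {
	my ($cmd, $in, $out) = @_;
	$$out //= '';
	my $pid = open(my $rd, '-|') // die "fork: $!\n";
	if ($pid == 0) { # child: take stdin from the caller's handle
		open(STDIN, '<&', $in)
			or do { warn "dup stdin: $!\n"; POSIX::_exit(127) };
		exec { $cmd->[0] } @$cmd;
		warn "exec $cmd->[0]: $!\n";
		POSIX::_exit(127); # never run the parent's END blocks
	}
	my $r;
	do { # 64 KiB reads, each appended at the current end of $$out
		$r = sysread($rd, $$out, 65536, length($$out));
	} while (defined($r) && $r != 0);
	defined($r) or die "sysread: $!\n";
	close($rd); # for a piped open this also waits and sets $?
	return ($? == 0 && $$out ne '') ? 1 : 0;
}
```

Reading with an explicit offset grows a single buffer without any line
splitting. As in do_spamc, a non-zero exit status or empty output
counts as failure.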



Code repositories for project(s) associated with this public inbox

	https://80x24.org/public-inbox.git

This is a public inbox; see the mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).