From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.3.2 (2011-06-06) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-2.9 required=3.0 tests=ALL_TRUSTED,AWL,BAYES_00
	shortcircuit=no autolearn=unavailable version=3.3.2
X-Original-To: meta@public-inbox.org
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id 12147200EA;
	Thu, 1 Oct 2015 22:23:17 +0000 (UTC)
Date: Thu, 1 Oct 2015 22:23:17 +0000
From: Eric Wong
To: meta@public-inbox.org
Subject: [PATCH 3/3] filter: more consistent labeling of rejections
Message-ID: <20151001222317.GD23598@dcvr.yhbt.net>
References: <20151001222143.GA23598@dcvr.yhbt.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20151001222143.GA23598@dcvr.yhbt.net>
List-Id:

While we're at it, reject non-plain-text top-level messages, too.
They probably do not exist in practice, but we cannot afford to
scrub given policies implemented by overzealous mail providers.

While we're at it, update the comment for strip_multipart.
---
 lib/PublicInbox/Filter.pm | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/lib/PublicInbox/Filter.pm b/lib/PublicInbox/Filter.pm
index 6f28e01..01052d0 100644
--- a/lib/PublicInbox/Filter.pm
+++ b/lib/PublicInbox/Filter.pm
@@ -13,6 +13,7 @@ use Email::Filter;
 use IPC::Run;
 our $VERSION = '0.0.1';
 use constant NO_HTML => '*** We only accept plain-text email, no HTML ***';
+use constant TEXT_ONLY => '*** We only accept plain-text email ***';
 
 # start with the same defaults as mailman
 our $BAD_EXT = qr/\.(exe|bat|cmd|com|pif|scr|vbs|cpl|zip)\s*\z/i;
@@ -49,6 +50,7 @@ sub run {
 	} elsif ($content_type =~ m!\bmultipart/!i) {
 		return strip_multipart($mime, $content_type, $filter);
 	} else {
+		$filter->reject(TEXT_ONLY) if $filter;
 		replace_body($mime, "$content_type message scrubbed");
 		return 0;
 	}
@@ -108,10 +110,7 @@ sub dump_html {
 	}
 }
 
-# this is to correct user errors and not expected to cover all corner cases
-# if users don't want to hit this, they should be sending text/plain messages
-# unfortunately, too many people send HTML mail and we'll attempt to convert
-# it to something safer, smaller and harder-to-spy-on-users-with.
+# this is to correct old archives during import.
 sub strip_multipart {
 	my ($mime, $content_type, $filter) = @_;
 
@@ -152,7 +151,7 @@ sub strip_multipart {
 			if (recheck_type_ok($part)) {
 				push @keep, $part;
 			} elsif ($filter) {
-				$filter->reject('no attachments')
+				$filter->reject(TEXT_ONLY);
 			} else {
 				$rejected++;
 			}
@@ -164,7 +163,7 @@ sub strip_multipart {
 				push @keep, $part;
 			}
 		} else {
-			$filter->reject('no attachments') if $filter;
+			$filter->reject(TEXT_ONLY) if $filter;
 			# reject everything else, including non-PGP signatures
 			$rejected++;
 		}
-- 
EW