From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level:
X-Spam-ASN:
X-Spam-Status: No, score=-4.0 required=3.0 tests=ALL_TRUSTED,BAYES_00
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0
Received: from localhost (dcvr.yhbt.net [127.0.0.1])
	by dcvr.yhbt.net (Postfix) with ESMTP id DF2781FAF0
	for ; Thu, 15 Feb 2018 11:08:45 +0000 (UTC)
From: "Eric Wong (Contractor, The Linux Foundation)"
To: meta@public-inbox.org
Subject: [WIP 14/17] address: extract more characters from email addresses
Date: Thu, 15 Feb 2018 11:08:37 +0000
Message-Id: <20180215110840.30413-15-e@80x24.org>
In-Reply-To: <20180215110840.30413-1-e@80x24.org>
References: <20180215105509.GA22409@dcvr>
	<20180215110840.30413-1-e@80x24.org>
List-Id:

There are a lot of weird characters which show up in LKML archives
and which we did not support before.  Furthermore, allow spaces
before the '>' in the From: line, since at least one non-spam
poster used them.
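To make the effect of the change easier to see, here is a small
standalone sketch that runs the removed and the added pattern side by
side.  The two regexes are copied from the diff in this message; the
sub names emails_old and emails_new and all of the sample From: values
are made up for illustration and are not part of
PublicInbox::Address or of any archive.

```perl
use strict;
use warnings;

# Pattern removed by this patch (copied from the "-" line of the diff).
# emails_old is a hypothetical name used only for this comparison.
sub emails_old {
	($_[0] =~ /([\w\.\+=\-]+\@[\w\.\-]+)>?\s*(?:\(.*?\))?(?:,\s*|\z)/g)
}

# Pattern added by this patch (copied from the "+" lines of the diff).
# emails_new is likewise a hypothetical name.
sub emails_new {
	($_[0] =~ /([\w\.\+=\?"\(\)\-!#\$%&'\*\/\^\`\|\{\}~]+\@[\w\.\-\(\)]+)
		(?:\s[^>]*)?>?\s*(?:\(.*?\))?(?:,\s*|\z)/gx)
}

# Made-up sample From: values, not taken from any archive.
my @samples = (
	'User <e@example.com>, e@example.org',
	'<user@example.com >',
	q{Pat O'Brien <o'brien+list@example.com>},
);

# Print what each pattern extracts from every sample.
for my $from (@samples) {
	my @old = emails_old($from);
	my @new = emails_new($from);
	print "input: $from\n";
	print "  old: @old\n";
	print "  new: @new\n";
}
```

With these samples, the old pattern should extract nothing from the
second value (the space before '>' defeats it) and only
brien+list@example.com from the third, while the new pattern returns
the full address in both cases.  Both patterns agree on the first
value.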
---
 lib/PublicInbox/Address.pm | 3 ++-
 t/address.t                | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/PublicInbox/Address.pm b/lib/PublicInbox/Address.pm
index f334ade..548f417 100644
--- a/lib/PublicInbox/Address.pm
+++ b/lib/PublicInbox/Address.pm
@@ -8,7 +8,8 @@ use warnings;
 # just enough to make thing sanely displayable and pass to git
 
 sub emails {
-	($_[0] =~ /([\w\.\+=\-]+\@[\w\.\-]+)>?\s*(?:\(.*?\))?(?:,\s*|\z)/g)
+	($_[0] =~ /([\w\.\+=\?"\(\)\-!#\$%&'\*\/\^\`\|\{\}~]+\@[\w\.\-\(\)]+)
+		(?:\s[^>]*)?>?\s*(?:\(.*?\))?(?:,\s*|\z)/gx)
 }
 
 sub names {
diff --git a/t/address.t b/t/address.t
index e35e4f8..eced5c4 100644
--- a/t/address.t
+++ b/t/address.t
@@ -9,8 +9,9 @@ is_deeply([qw(e@example.com e@example.org)],
 	[PublicInbox::Address::emails('User , e@example.org')],
 	'address extraction works as expected');
 
-is_deeply([PublicInbox::Address::emails('"ex@example.com" ')],
-	[qw(ex@example.com)]);
+is_deeply(['user@example.com'],
+	[PublicInbox::Address::emails('')],
+	'comment after domain accepted before >');
 
 my @names = PublicInbox::Address::names(
 	'User , e@e, "John A. Doe" , ');
-- 
EW