author: Eric Wong <e@80x24.org> 2021-04-11 05:32:55 +0000
committer: Eric Wong <e@80x24.org> 2021-04-11 06:40:21 +0000
commit: e98c3f01267c810ee214be87d0ee1bd575b23b88
tree: 938f62dce4d4faa792b9f2813b0c6f155d10695b /lib/PublicInbox/Hval.pm
parent: ea4e9025dd14f251996baf724e04fc478375b6a2

As they are likely Message-IDs.   If an email address ends up in
a URL, then it's likely public, so there's even less reason to
obfuscate that particular address.

[km: add xt/perf-obfuscate.t]
[ew: modernize perf test (5.10.1), use diag instead of print]

This version of the patch avoids the massive slowdown noted by Kyle in
<https://public-inbox.org/meta/87wnt9or6t.fsf@kyleam.com/>.
Performance remains roughly the same, if not slightly faster
(which may be due to me testing this on a busy server).  Results
from xt/perf-obfuscate.t against 6078 messages on a local mirror
of <https://public-inbox.org/meta/>:

	before: 6.67 usr + 0.04 sys = 6.71 CPU
	 after: 6.64 usr + 0.04 sys = 6.68 CPU

Reported-by: Kyle Meyer <kyle@kyleam.com>
Helped-by: Kyle Meyer <kyle@kyleam.com>
Link: https://public-inbox.org/meta/87a6q8p5qa.fsf@kyleam.com/
Diffstat (limited to 'lib/PublicInbox/Hval.pm')
-rw-r--r-- lib/PublicInbox/Hval.pm | 21
1 files changed, 14 insertions, 7 deletions
diff --git a/lib/PublicInbox/Hval.pm b/lib/PublicInbox/Hval.pm
index d20f70ae..eab4738e 100644
--- a/lib/PublicInbox/Hval.pm
+++ b/lib/PublicInbox/Hval.pm
@@ -82,15 +82,22 @@ sub obfuscate_addrs ($$;$) {
         my $repl = $_[2] // '&#8226;';
         my $re = $ibx->{-no_obfuscate_re}; # regex of domains
         my $addrs = $ibx->{-no_obfuscate}; # { $address => 1 }
-        $_[1] =~ s/(([\w\.\+=\-]+)\@([\w\-]+\.[\w\.\-]+))/
-                my ($addr, $user, $domain) = ($1, $2, $3);
-                if ($addrs->{$addr} || ((defined $re && $domain =~ $re))) {
-                        $addr;
+        $_[1] =~ s#(\S+)\@([\w\-]+\.[\w\.\-]+)#
+                my ($pfx, $domain) = ($1, $2);
+                if (index($pfx, '://') > 0 || $pfx !~ s/([\w\.\+=\-]+)\z//) {
+                        "$pfx\@$domain";
                 } else {
-                        $domain =~ s!([^\.]+)\.!$1$repl!;
-                        $user . '@' . $domain
+                        my $user = $1;
+                        my $addr = "$user\@$domain";
+                        if ($addrs->{$addr} || ((defined($re) &&
+                                                $domain =~ $re))) {
+                                $pfx.$addr;
+                        } else {
+                                $domain =~ s!([^\.]+)\.!$1$repl!;
+                                $pfx . $user . '@' . $domain
+                        }
                 }
-                /sge;
+                #sge;
 }
 
 # like format_sanitized_subject in git.git pretty.c with '%f' format string
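
The behaviour of the patched substitution can be tried outside of public-inbox with a minimal sketch like the one below. The name obfuscate_sketch and its argument list are hypothetical and exist only for illustration: the real obfuscate_addrs takes an inbox object and modifies $_[1] in place, whereas this helper takes the exempt-address hash and domain regex directly and returns a new string. The body of the substitution is copied from the "+" lines of the hunk above.

```perl
use strict;
use warnings;

# Hypothetical standalone helper (not part of public-inbox).
# $addrs: optional hashref of addresses that must never be obfuscated
# $re:    optional regex of domains that must never be obfuscated
# $repl:  replacement for the first dot of the domain
sub obfuscate_sketch {
	my ($text, $addrs, $re, $repl) = @_;
	$addrs //= {};
	$repl //= '&#8226;';
	# '#' is the s### delimiter, so no comments may appear inside it
	$text =~ s#(\S+)\@([\w\-]+\.[\w\.\-]+)#
		my ($pfx, $domain) = ($1, $2);
		if (index($pfx, '://') > 0 || $pfx !~ s/([\w\.\+=\-]+)\z//) {
			"$pfx\@$domain";
		} else {
			my $user = $1;
			my $addr = "$user\@$domain";
			if ($addrs->{$addr} || ((defined($re) &&
						$domain =~ $re))) {
				$pfx.$addr;
			} else {
				$domain =~ s!([^\.]+)\.!$1$repl!;
				$pfx . $user . '@' . $domain
			}
		}
		#sge;
	$text;
}

# A bare address is obfuscated; an address-like string inside a URL is not.
print obfuscate_sketch('mail user@example.com'), "\n";
# prints: mail user@example&#8226;com
print obfuscate_sketch(
	'see https://public-inbox.org/meta/87wnt9or6t.fsf@kyleam.com/'), "\n";
# prints: see https://public-inbox.org/meta/87wnt9or6t.fsf@kyleam.com/
```

Only tokens whose prefix contains "://" are skipped. A Message-ID written outside of a URL, such as <87wnt9or6t.fsf@kyleam.com>, still goes through the obfuscation branch unless it is exempted via $addrs or $re.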