* [PATCH] quiet "Complex regular subexpression recursion limit" warnings
@ 2020-04-03 21:51 7% Eric Wong
From: Eric Wong @ 2020-04-03 21:51 UTC
To: meta
These seem mostly harmless since Perl will just truncate the
match and start a new one on a newline boundary in our case.
The only downside is we'd end up with redundant <span> tags in
HTML.
Limiting the number of lines matched ourselves with `{1,$NUM}'
doesn't seem prudent since lines vary in length, so we continue
to defer the job of limiting matches to the Perl regexp engine.
I've noticed this warning in practice on 100K+ line patches to
locale data.
---
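A minimal standalone sketch of the split_quotes() behavior described
above, for readers who want to try it outside of public-inbox.  The
pattern and the `no warnings' line are copied from the patch; the
sample message is made up for illustration.  Because the pattern is
wrapped in a capture group, split() returns the quoted runs along
with the text between them, so joining the result restores the
input regardless of where a match happens to end:

```perl
use strict;
use warnings;

# Same pattern as split_quotes() in the patch below.
sub split_quotes {
    # lexically scoped: only silences regexp warnings inside this sub
    no warnings 'regexp';
    split(/((?:^>[^\n]*\n)+)/sm, shift);
}

# Made-up sample: attribution, two quoted lines, one reply line
my $msg = "So and so wrote:\n" . ("> hello world\n" x 2) . "my reply\n";
my @sections = split_quotes($msg);

# Sections alternate between unquoted and quoted text
printf("%d: %s", $_, $sections[$_]) for 0..$#sections;

# Nothing is lost, so callers only see more (and shorter) sections
# if the regexp engine cuts a match short
print join('', @sections) eq $msg ? "lossless\n" : "lossy\n";
```

Callers only need to check /\A>/ on each section to tell quoted
from unquoted text, which is what index_xapian does.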
lib/PublicInbox/MsgIter.pm | 10 ++++++++++
lib/PublicInbox/SearchIdx.pm | 2 +-
lib/PublicInbox/View.pm | 2 +-
lib/PublicInbox/ViewDiff.pm | 11 +++++++++++
t/msg_iter.t | 30 ++++++++++++++++++++++++++++++
5 files changed, 53 insertions(+), 2 deletions(-)
diff --git a/lib/PublicInbox/MsgIter.pm b/lib/PublicInbox/MsgIter.pm
index 6c18d2bf..fa25564a 100644
--- a/lib/PublicInbox/MsgIter.pm
+++ b/lib/PublicInbox/MsgIter.pm
@@ -71,4 +71,14 @@ sub msg_part_text ($$) {
($s, $err);
}
+# returns an array of quoted or unquoted sections
+sub split_quotes {
+ # Quiet "Complex regular subexpression recursion limit" warning
+ # in case an inconsiderate sender quotes 32K lines of text at once.
+ # The warning from Perl is harmless for us since our callers can
+ # tolerate less-than-ideal matches which work within Perl limits.
+ no warnings 'regexp';
+ split(/((?:^>[^\n]*\n)+)/sm, shift);
+}
+
1;
diff --git a/lib/PublicInbox/SearchIdx.pm b/lib/PublicInbox/SearchIdx.pm
index fe00df53..89d8bc2b 100644
--- a/lib/PublicInbox/SearchIdx.pm
+++ b/lib/PublicInbox/SearchIdx.pm
@@ -302,7 +302,7 @@ sub index_xapian { # msg_iter callback
defined $s or return;
# split off quoted and unquoted blocks:
- my @sections = split(/((?:^>[^\n]*\n)+)/sm, $s);
+ my @sections = PublicInbox::MsgIter::split_quotes($s);
$part = $s = undef;
index_body($self, $_, /\A>/ ? 0 : $doc) for @sections;
}
diff --git a/lib/PublicInbox/View.pm b/lib/PublicInbox/View.pm
index c42654b6..70c10604 100644
--- a/lib/PublicInbox/View.pm
+++ b/lib/PublicInbox/View.pm
@@ -576,7 +576,7 @@ sub add_text_body { # callback for msg_iter
$s .= "\n" unless $s =~ /\n\z/s;
# split off quoted and unquoted blocks:
- my @sections = split(/((?:^>[^\n]*\n)+)/sm, $s);
+ my @sections = PublicInbox::MsgIter::split_quotes($s);
$s = '';
my $rv = $ctx->{obuf};
if (defined($fn) || $depth > 0 || $err) {
diff --git a/lib/PublicInbox/ViewDiff.pm b/lib/PublicInbox/ViewDiff.pm
index d22c80b9..5d391a13 100644
--- a/lib/PublicInbox/ViewDiff.pm
+++ b/lib/PublicInbox/ViewDiff.pm
@@ -202,6 +202,17 @@ sub flush_diff ($$$) {
$dctx = diff_header($dst, \$x, $ctx, \@top);
} elsif ($dctx) {
my $after = '';
+
+ # Quiet "Complex regular subexpression recursion limit"
+ # warning. Perl will truncate matches upon hitting
+ # that limit, giving us more (and shorter) scalars than
+ # would be ideal, but otherwise it's harmless.
+ #
+ # We could replace the `+' metacharacter with `{1,100}'
+ # to limit the matches ourselves to 100, but we can
+ # let Perl do it for us, quietly.
+ no warnings 'regexp';
+
for my $s (split(/((?:(?:^\+[^\n]*\n)+)|
(?:(?:^-[^\n]*\n)+)|
(?:^@@ [^\n]+\n))/xsm, $x)) {
diff --git a/t/msg_iter.t b/t/msg_iter.t
index e33bfc69..d303564f 100644
--- a/t/msg_iter.t
+++ b/t/msg_iter.t
@@ -78,5 +78,35 @@ use_ok('PublicInbox::MsgIter');
'got bullet point when X-UNKNOWN assumes UTF-8');
}
+{ # API not finalized
+ my @warn;
+ local $SIG{__WARN__} = sub { push @warn, [ @_ ] };
+ my $attr = "So and so wrote:\n";
+ my $q = "> hello world\n" x 10;
+ my $nq = "hello world\n" x 10;
+ my @sections = PublicInbox::MsgIter::split_quotes($attr . $q . $nq);
+ is($sections[0], $attr, 'attribution matches');
+ is($sections[1], $q, 'quoted section matches');
+ is($sections[2], $nq, 'non-quoted section matches');
+ is(scalar(@sections), 3, 'only three sections for short message');
+ is_deeply(\@warn, [], 'no warnings');
+
+ $q x= 3300;
+ $nq x= 3300;
+ @sections = PublicInbox::MsgIter::split_quotes($attr . $q . $nq);
+ is_deeply(\@warn, [], 'no warnings on giant message');
+ is(join('', @sections), $attr . $q . $nq, 'result matches expected');
+ is(shift(@sections), $attr, 'attribution is first section');
+ my @check = ('', '');
+ while (defined(my $l = shift @sections)) {
+ next if $l eq '';
+ like($l, qr/\n\z/s, 'section ends with newline');
+ my $idx = ($l =~ /\A>/) ? 0 : 1;
+ $check[$idx] .= $l;
+ }
+ is($check[0], $q, 'long quoted section matches');
+ is($check[1], $nq, 'long non-quoted section matches');
+}
+
done_testing();
1;
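
For comparison, a rough sketch of the alternative this patch does not
take: capping matches ourselves with a bounded quantifier.  The cap
($max) and the sample input are hypothetical and not part of
public-inbox.  It shows that the cap limits lines per section, not
bytes, which is why it does not seem prudent when lines vary in
length:

```perl
use strict;
use warnings;

my $max = 100; # hypothetical cap on lines per match
my $re = qr/((?:^>[^\n]*\n){1,$max})/sm;

# Made-up input: 250 quoted lines in one run
my $quoted = "> hello world\n" x 250;

# Adjacent matches leave empty fields between them; drop those
my @sections = grep { $_ ne '' } split($re, $quoted);

# Count newlines per section to see where the cap split the run
my @lines = map { tr/\n// } @sections;
print scalar(@sections), " sections: ", join(',', @lines), "\n";
```

With this input the run is cut into sections of 100, 100 and 50
lines, so the extra <span> tags would appear at a fixed line count
rather than only when Perl hits its own limit.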