From: Jim Meyering <jim@meyering.net>
To: Norihiro Tanaka <noritnk@kcn.ne.jp>,
GNU grep developers <grep-devel@gnu.org>
Cc: fryasu@yahoo.co.jp,
"bug-gnulib@gnu.org List" <bug-gnulib@gnu.org>,
Paul Eggert <eggert@cs.ucla.edu>,
40634@debbugs.gnu.org
Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28.
Date: Fri, 11 Sep 2020 14:47:49 +0200 [thread overview]
Message-ID: <CA+8g5KGQr=L5YdzqGNiR+j_KvaiOja1_pmAtz4__C_7si_CCaQ@mail.gmail.com> (raw)
In-Reply-To: <20200419111025.4326.27F6AC2D@kcn.ne.jp>
On Sun, Apr 19, 2020 at 4:10 AM Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> On Sun, 19 Apr 2020 07:41:49 +0900
> Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> > On Sat, 18 Apr 2020 00:22:26 +0900
> > Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> >
> > >
> > > On Fri, 17 Apr 2020 10:24:42 +0900
> > > Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> > >
> > > >
> > > > On Fri, 17 Apr 2020 09:35:36 +0900
> > > > Norihiro Tanaka <noritnk@kcn.ne.jp> wrote:
> > > >
> > > > >
> > > > > On Thu, 16 Apr 2020 16:00:29 -0700
> > > > > Paul Eggert <eggert@cs.ucla.edu> wrote:
> > > > >
> > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote:
> > > > > >
> > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357
> > > > > > > will come back.
> > > > > >
> > > > > > Yes, I'd rather not revert if we can help it.
> > > > > >
> > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-)
> > > > >
> > > > > Now, I have a idea, it is that we build indexes of epsilon nodes
> > > > > including in follows before remove epsilon nodes.
> > > >
> > > >
> > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet.
> > >
> > > It was improved previous patch.
> >
> > Sorry, correct patch is here.
>
> I made the previous patch even simpler.
>
> before:
>
> $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null
> real 7.24
> user 7.14
> sys 0.09
>
> after:
>
> $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null
> real 0.62
> user 0.52
> sys 0.10
Thank you for this patch. I have rebased and made minor syntactic changes.
I'll push it to gnulib soon, if not today, then by Monday.
I am considering creating a test case in grep, but it feels too tight
to be feasible: I would use a relative perf test, requiring that a
passing test incur a perf cost of less than say 100x. Here's the
beginnings of my attempt (note: this is just an outline -- obviously
would not rely on having "time" in path or as a shell builtin):
gen()
{
local n=$1
local i=1
while : ; do
local pat=$(printf $i | sha1sum | cut -d' ' -f1)
printf '%s\n' "$pat$pat(\$|$pat)"
i=$(expr $i + 1)
test $i = $n && break
done
}
gen 4000 > pats-4000
head -400 pats-4000 > pats-400
# With fixed code, that a 10x input size increase (n=400 to 4000)
# induces a 40x runtime increase: .05 -> 2.0s
# Just prior to this change, it's 150x: 0.2 -> 30s
env LC_ALL=C time -p src/grep -E -v -m1 -f pats-400 /dev/null
env LC_ALL=C time -p src/grep -E -v -m1 -f pats-4000 /dev/null
next prev parent reply other threads:[~2020-09-11 12:48 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <6503eb8e-e6fd-b4dd-aab7-76bb6955d87b@cs.ucla.edu>
[not found] ` <20200417093536.875E.27F6AC2D@kcn.ne.jp>
2020-04-17 1:24 ` bug#40634: Massive pattern list handling with -E format seems very slow since 2.28 Norihiro Tanaka
2020-04-17 15:22 ` Norihiro Tanaka
2020-04-18 22:41 ` Norihiro Tanaka
2020-04-19 2:10 ` Norihiro Tanaka
2020-09-11 12:47 ` Jim Meyering [this message]
2020-09-11 13:17 ` Jim Meyering
2020-09-11 23:01 ` Paul Eggert
2020-09-12 6:41 ` Jim Meyering
2020-09-14 2:03 ` Paul Eggert
2020-09-14 14:14 ` Jim Meyering
2020-09-21 19:22 ` Paul Eggert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://lists.gnu.org/mailman/listinfo/bug-gnulib
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CA+8g5KGQr=L5YdzqGNiR+j_KvaiOja1_pmAtz4__C_7si_CCaQ@mail.gmail.com' \
--to=jim@meyering.net \
--cc=40634@debbugs.gnu.org \
--cc=bug-gnulib@gnu.org \
--cc=eggert@cs.ucla.edu \
--cc=fryasu@yahoo.co.jp \
--cc=grep-devel@gnu.org \
--cc=noritnk@kcn.ne.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).