On Fri, Sep 11, 2020 at 2:47 PM Jim Meyering wrote: > On Sun, Apr 19, 2020 at 4:10 AM Norihiro Tanaka wrote: > > On Sun, 19 Apr 2020 07:41:49 +0900 > > Norihiro Tanaka wrote: > > > On Sat, 18 Apr 2020 00:22:26 +0900 > > > Norihiro Tanaka wrote: > > > > > > > > > > > On Fri, 17 Apr 2020 10:24:42 +0900 > > > > Norihiro Tanaka wrote: > > > > > > > > > > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > > > > Norihiro Tanaka wrote: > > > > > > > > > > > > > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > > > > > Paul Eggert wrote: > > > > > > > > > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > > > > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > > > > > > will come back. > > > > > > > > > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > > > > > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > > > > > > > > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > > > > > including in follows before remove epsilon nodes. > > > > > > > > > > > > > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. > > > > > > > > It was improved previous patch. > > > > > > Sorry, correct patch is here. > > > > I made the previous patch even simpler. > > > > before: > > > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > > real 7.24 > > user 7.14 > > sys 0.09 > > > > after: > > > > $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null > > real 0.62 > > user 0.52 > > sys 0.10 > > Thank you for this patch. I have rebased and made minor syntactic changes. > I'll push it to gnulib soon, if not today, then by Monday. > > I am considering creating a test case in grep, but it feels too tight > to be feasible: I would use a relative perf test, requiring that a > passing test incur a perf cost of less than say 100x. Here's the > beginnings of my attempt (note: this is just an outline -- obviously > would not rely on having "time" in path or as a shell builtin): > > gen() > { > local n=$1 > local i=1 > while : ; do > local pat=$(printf $i | sha1sum | cut -d' ' -f1) > printf '%s\n' "$pat$pat(\$|$pat)" > i=$(expr $i + 1) > test $i = $n && break > done > } > > gen 4000 > pats-4000 > head -400 pats-4000 > pats-400 > > # With fixed code, that a 10x input size increase (n=400 to 4000) > # induces a 40x runtime increase: .05 -> 2.0s > # Just prior to this change, it's 150x: 0.2 -> 30s > > env LC_ALL=C time -p src/grep -E -v -m1 -f pats-400 /dev/null > env LC_ALL=C time -p src/grep -E -v -m1 -f pats-4000 /dev/null And here is the adjusted patch: