From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS22989 209.51.188.0/24 X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id C24A01F45A for ; Sun, 19 Apr 2020 02:10:41 +0000 (UTC) Received: from localhost ([::1]:36450 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jPzPs-0001wM-Kl for normalperson@yhbt.net; Sat, 18 Apr 2020 22:10:40 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:51978) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jPzPo-0001wB-JB for bug-gnulib@gnu.org; Sat, 18 Apr 2020 22:10:37 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.90_1) (envelope-from ) id 1jPzPn-0003uS-2f for bug-gnulib@gnu.org; Sat, 18 Apr 2020 22:10:36 -0400 Received: from mailgw07.kcn.ne.jp ([61.86.7.214]:40246) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jPzPm-0003pq-5h for bug-gnulib@gnu.org; Sat, 18 Apr 2020 22:10:34 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw07.kcn.ne.jp (Postfix) with ESMTP id 4CC4A4119E for ; Sun, 19 Apr 2020 11:10:29 +0900 (JST) X-matriXscan-loop-detect: a204fab572889166595e134212f7ff70eb9900fe Received: from mail14.kcn.ne.jp ([61.86.6.132]) by mxs02-s with ESMTP; Sun, 19 Apr 2020 11:10:26 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail14.kcn.ne.jp (Postfix) with ESMTPA id 5D6464121E1C; Sun, 19 Apr 2020 11:10:26 +0900 (JST) Date: Sun, 19 Apr 2020 11:10:26 +0900 From: Norihiro Tanaka To: 40634@debbugs.gnu.org Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <20200419074109.431A.27F6AC2D@kcn.ne.jp> References: <20200418002153.8771.27F6AC2D@kcn.ne.jp> <20200419074109.431A.27F6AC2D@kcn.ne.jp> Message-Id: <20200419111025.4326.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5E9BB1D8000000004318_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized Received-SPF: pass client-ip=61.86.7.214; envelope-from=noritnk@kcn.ne.jp; helo=mailgw07.kcn.ne.jp X-detected-operating-system: by eggs.gnu.org: Linux 3.1-3.10 [fuzzy] X-Received-From: 61.86.7.214 X-BeenThere: bug-gnulib@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Gnulib discussion list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: fryasu@yahoo.co.jp, Paul Eggert , bug-gnulib@gnu.org Errors-To: bug-gnulib-bounces+normalperson=yhbt.net@gnu.org Sender: "bug-gnulib" --------_5E9BB1D8000000004318_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Sun, 19 Apr 2020 07:41:49 +0900 Norihiro Tanaka wrote: > > On Sat, 18 Apr 2020 00:22:26 +0900 > Norihiro Tanaka wrote: > > > > > On Fri, 17 Apr 2020 10:24:42 +0900 > > Norihiro Tanaka wrote: > > > > > > > > On Fri, 17 Apr 2020 09:35:36 +0900 > > > Norihiro Tanaka wrote: > > > > > > > > > > > On Thu, 16 Apr 2020 16:00:29 -0700 > > > > Paul Eggert wrote: > > > > > > > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > > > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > > > > will come back. > > > > > > > > > > Yes, I'd rather not revert if we can help it. > > > > > > > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > > > > > > > Now, I have a idea, it is that we build indexes of epsilon nodes > > > > including in follows before remove epsilon nodes. > > > > > > > > > I wrote fix for the bug, but it will be slower then at grep 2.27 yet. > > > > It was improved previous patch. > > Sorry, correct patch is here. I made the previous patch even simpler. before: $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 7.24 user 7.14 sys 0.09 after: $ env LC_ALL=C time -p src/grep -E -v -m1 -f grep-patterns.txt /dev/null real 0.62 user 0.52 sys 0.10 --------_5E9BB1D8000000004318_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-use-backword-set-in-removal-of-epsilon-closure.patch" Content-Disposition: attachment; filename="0001-dfa-use-backword-set-in-removal-of-epsilon-closure.patch" Content-Transfer-Encoding: base64 RnJvbSBmZmZlMzI2YTdjZDdkNDUyZjg2ZWE1MWNhZTZmZTA4MTY4YTEzOTFlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBTdW4sIDE5IEFwciAyMDIwIDExOjAxOjE4ICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiB1c2UgYmFja3dvcmQgc2V0IGluIHJlbW92YWwgb2YgZXBzaWxvbiBjbG9zdXJlCgpXaGVuIHJl bW92ZSBlcHNpbG9uIGNsb3N1cmUsIHNvIGZhciBzZWFyY2hlZCBub2RlcyBpbmNsdWRpbmcgZXBz aWxvbiBjbG9zdXJlCmluIGFsbCBub2RlcyBzZXF1ZW50aWFsbHksIGJ1dCBpdCBpcyBzbG93IGZv ciBzb21lIGNhc2VzLiAgTm93IGJ1aWxkIGJhY2t3b3JkCnNldCBiZWZvcmUgc2VhcmNoLCBhbmQg b25seSBjaGVjayBwcmV2aW9zIHBvc2l0aW9uIHdpdGggdGhlIHNldC4KUHJvYmxlbSByZXBvcnRl ZCBpbjogaHR0cHM6Ly9kZWJidWdzLmdudS5vcmcvY2dpL2J1Z3JlcG9ydC5jZ2k/YnVnPTQwNjM0 CgoqIGxpYi9kZmEuYyAoc3RydWN0IGRmYSk6IE5ldyBtZW1iZXIgJ2Vwc2lsb24nLgooYWRkdG9r X21iKTogQ2hlY2sgd2hldGhlciBhIHBhdHRlcm4gaGFzIGVwc2lsb24gbm9kZSBvciBub3QuCihl cHNjbG9zdXJlKTogV2hlbiByZW1vdmUgZXBzaWxvbiBub2RlIGFuZCByZWNvbm5lY3QsIG9ubHkg Y2hlY2sgcHJldmlvcwpwb3NpdGlvbnMuCihkZmFhbmFseXplKTogQnVpbGQgYmFja3dvcmQgc2V0 LgotLS0KIGxpYi9kZmEuYyB8ICAgNjUgKysrKysrKysrKysrKysrKysrKysrKysrKysrKysrKysr KysrKysrKysrKysrKystLS0tLS0tLS0tLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDUxIGluc2VydGlv bnMoKyksIDE0IGRlbGV0aW9ucygtKQoKZGlmZiAtLWdpdCBhL2xpYi9kZmEuYyBiL2xpYi9kZmEu YwppbmRleCA5OTM5ZDIyLi44MTFjNDFlIDEwMDY0NAotLS0gYS9saWIvZGZhLmMKKysrIGIvbGli L2RmYS5jCkBAIC00ODYsNiArNDg2LDcgQEAgc3RydWN0IGRmYQogICBib29sIGZhc3Q7CQkJLyog VGhlIERGQSBpcyBmYXN0LiAgKi8KICAgdG9rZW4gdXRmOF9hbnljaGFyX2NsYXNzZXNbOV07IC8q IFRvIGxvd2VyIEFOWUNIQVIgaW4gVVRGLTggbG9jYWxlcy4gICovCiAgIG1ic3RhdGVfdCBtYnM7 CQkvKiBNdWx0aWJ5dGUgY29udmVyc2lvbiBzdGF0ZS4gICovCisgIGJvb2wgZXBzaWxvbjsKIAog ICAvKiBUaGUgZm9sbG93aW5nIGFyZSB2YWxpZCBvbmx5IGlmIE1CX0NVUl9NQVggPiAxLiAgKi8K IApAQCAtMTYxMywxMyArMTYxNCwyMSBAQCBhZGR0b2tfbWIgKHN0cnVjdCBkZmEgKmRmYSwgdG9r ZW4gdCwgY2hhciBtYnByb3ApCiAgICAgICBkZmEtPnBhcnNlLmRlcHRoLS07CiAgICAgICBicmVh azsKIAotICAgIGNhc2UgQkFDS1JFRjoKLSAgICAgIGRmYS0+ZmFzdCA9IGZhbHNlOworICAgIGNh c2UgQkVHTElORToKKyAgICBjYXNlIEVORExJTkU6CisgICAgY2FzZSBCRUdXT1JEOgorICAgIGNh c2UgRU5EV09SRDoKKyAgICBjYXNlIExJTVdPUkQ6CisgICAgY2FzZSBOT1RMSU1XT1JEOgorICAg IGNhc2UgRU1QVFk6CisgICAgICBkZmEtPmVwc2lsb24gPSB0cnVlOwogICAgICAgRkFMTFRIUk9V R0g7CisKICAgICBkZWZhdWx0OgotICAgICAgZGZhLT5ubGVhdmVzKys7Ci0gICAgICBGQUxMVEhS T1VHSDsKLSAgICBjYXNlIEVNUFRZOgorICAgICAgaWYgKHQgPT0gQkFDS1JFRikKKyAgICAgICAg ZGZhLT5mYXN0ID0gZmFsc2U7CisgICAgICBpZiAodCAhPSBFTVBUWSkKKyAgICAgICAgZGZhLT5u bGVhdmVzKys7CiAgICAgICBkZmEtPnBhcnNlLmRlcHRoKys7CiAgICAgICBicmVhazsKICAgICB9 CkBAIC0yMjk1LDE0ICsyMzA0LDE1IEBAIHN0YXRlX2luZGV4IChzdHJ1Y3QgZGZhICpkLCBwb3Np dGlvbl9zZXQgY29uc3QgKnMsIGludCBjb250ZXh0KQogICAgY29uc3RyYWludC4gIFJlcGVhdCBl eGhhdXN0aXZlbHkgdW50aWwgbm8gZnVubnkgcG9zaXRpb25zIGFyZSBsZWZ0LgogICAgUy0+ZWxl bXMgbXVzdCBiZSBsYXJnZSBlbm91Z2ggdG8gaG9sZCB0aGUgcmVzdWx0LiAgKi8KIHN0YXRpYyB2 b2lkCi1lcHNjbG9zdXJlIChzdHJ1Y3QgZGZhIGNvbnN0ICpkKQorZXBzY2xvc3VyZSAoc3RydWN0 IGRmYSBjb25zdCAqZCwgcG9zaXRpb25fc2V0ICpiYWNrd29yZCkKIHsKICAgcG9zaXRpb25fc2V0 IHRtcDsKICAgYWxsb2NfcG9zaXRpb25fc2V0ICgmdG1wLCBkLT5ubGVhdmVzKTsKICAgZm9yIChp ZHhfdCBpID0gMDsgaSA8IGQtPnRpbmRleDsgaSsrKQogICAgIGlmIChkLT5mb2xsb3dzW2ldLm5l bGVtID4gMCAmJiBkLT50b2tlbnNbaV0gPj0gTk9UQ0hBUgogICAgICAgICAmJiBkLT50b2tlbnNb aV0gIT0gQkFDS1JFRiAmJiBkLT50b2tlbnNbaV0gIT0gQU5ZQ0hBUgotICAgICAgICAmJiBkLT50 b2tlbnNbaV0gIT0gTUJDU0VUICYmIGQtPnRva2Vuc1tpXSA8IENTRVQpCisgICAgICAgICYmIGQt PnRva2Vuc1tpXSAhPSBNQkNTRVQgJiYgZC0+dG9rZW5zW2ldIDwgQ1NFVAorICAgICAgICAmJiBk LT50b2tlbnNbaV0gIT0gQkVHKQogICAgICAgewogICAgICAgICB1bnNpZ25lZCBpbnQgY29uc3Ry YWludDsKICAgICAgICAgc3dpdGNoIChkLT50b2tlbnNbaV0pCkBAIC0yMzI1LDE2ICsyMzM1LDIx IEBAIGVwc2Nsb3N1cmUgKHN0cnVjdCBkZmEgY29uc3QgKmQpCiAgICAgICAgICAgY2FzZSBOT1RM SU1XT1JEOgogICAgICAgICAgICAgY29uc3RyYWludCA9IE5PVExJTVdPUkRfQ09OU1RSQUlOVDsK ICAgICAgICAgICAgIGJyZWFrOwotICAgICAgICAgIGRlZmF1bHQ6CisgICAgICAgICAgY2FzZSBF TVBUWToKICAgICAgICAgICAgIGNvbnN0cmFpbnQgPSBOT19DT05TVFJBSU5UOwogICAgICAgICAg ICAgYnJlYWs7CisgICAgICAgICAgZGVmYXVsdDoKKyAgICAgICAgICAgIGFib3J0ICgpOwogICAg ICAgICAgIH0KIAogICAgICAgICBkZWxldGUgKGksICZkLT5mb2xsb3dzW2ldKTsKIAotICAgICAg ICBmb3IgKGlkeF90IGogPSAwOyBqIDwgZC0+dGluZGV4OyBqKyspCi0gICAgICAgICAgaWYgKGkg IT0gaiAmJiBkLT5mb2xsb3dzW2pdLm5lbGVtID4gMCkKLSAgICAgICAgICAgIHJlcGxhY2UgKCZk LT5mb2xsb3dzW2pdLCBpLCAmZC0+Zm9sbG93c1tpXSwgY29uc3RyYWludCwgJnRtcCk7CisgICAg ICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBiYWNrd29yZFtpXS5uZWxlbTsgaisrKQorICAgICAg ICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW2JhY2t3b3JkW2ldLmVsZW1zW2pdLmluZGV4XSwgaSwg JmQtPmZvbGxvd3NbaV0sCisgICAgICAgICAgICAgICAgICAgY29uc3RyYWludCwgJnRtcCk7Cisg ICAgICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBkLT5mb2xsb3dzW2ldLm5lbGVtOyBqKyspCisg ICAgICAgICAgcmVwbGFjZSAoJmJhY2t3b3JkW2QtPmZvbGxvd3NbaV0uZWxlbXNbal0uaW5kZXhd LCBpLCAmYmFja3dvcmRbaV0sCisgICAgICAgICAgICAgICAgICAgTk9fQ09OU1RSQUlOVCwgJnRt cCk7CiAgICAgICB9CiAgIGZyZWUgKHRtcC5lbGVtcyk7CiB9CkBAIC0yNjQxLDYgKzI2NTYsNyBA QCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZsYWcpCiAgIC8qIEZpcnN0 cG9zIGFuZCBsYXN0cG9zIGVsZW1lbnRzLiAgKi8KICAgcG9zaXRpb24gKmZpcnN0cG9zID0gcG9z YWxsb2M7CiAgIHBvc2l0aW9uICpsYXN0cG9zID0gZmlyc3Rwb3MgKyBkLT5ubGVhdmVzOworICBw b3NpdGlvbl9zZXQgKmJhY2t3b3JkID0gTlVMTDsKICAgcG9zaXRpb24gcG9zOwogICBwb3NpdGlv bl9zZXQgdG1wOwogCkBAIC0yNjczLDYgKzI2ODksOSBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZh ICpkLCBib29sIHNlYXJjaGZsYWcpCiAgIGFsbG9jX3Bvc2l0aW9uX3NldCAoJm1lcmdlZCwgZC0+ bmxlYXZlcyk7CiAgIGQtPmZvbGxvd3MgPSB4Y2FsbG9jIChkLT50aW5kZXgsIHNpemVvZiAqZC0+ Zm9sbG93cyk7CiAKKyAgaWYgKGQtPmVwc2lsb24pCisgICAgYmFja3dvcmQgPSB4Y2FsbG9jIChk LT50aW5kZXgsIHNpemVvZiAqYmFja3dvcmQpOworCiAgIGZvciAoaWR4X3QgaSA9IDA7IGkgPCBk LT50aW5kZXg7IGkrKykKICAgICB7CiAgICAgICBzd2l0Y2ggKGQtPnRva2Vuc1tpXSkKQEAgLTI3 MTAsNiArMjcyOSwxNyBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpkLCBib29sIHNlYXJjaGZs YWcpCiAgICAgICAgIGNhc2UgQ0FUOgogICAgICAgICAgIC8qIEV2ZXJ5IGVsZW1lbnQgaW4gdGhl IGZpcnN0cG9zIG9mIHRoZSBzZWNvbmQgYXJndW1lbnQgaXMgaW4gdGhlCiAgICAgICAgICAgICAg Zm9sbG93IG9mIGV2ZXJ5IGVsZW1lbnQgaW4gdGhlIGxhc3Rwb3Mgb2YgdGhlIGZpcnN0IGFyZ3Vt ZW50LiAgKi8KKyAgICAgICAgICBpZiAoZC0+ZXBzaWxvbikKKyAgICAgICAgICAgIHsKKyAgICAg ICAgICAgICAgdG1wLm5lbGVtID0gc3RrWy0yXS5ubGFzdHBvczsKKyAgICAgICAgICAgICAgdG1w LmVsZW1zID0gbGFzdHBvcyAtIHN0a1stMV0ubmxhc3Rwb3MgLSBzdGtbLTJdLm5sYXN0cG9zOwor ICAgICAgICAgICAgICBwb3NpdGlvbiAqcCA9IGZpcnN0cG9zIC0gc3RrWy0xXS5uZmlyc3Rwb3M7 CisgICAgICAgICAgICAgIGZvciAoaWR4X3QgaiA9IDA7IGogPCBzdGtbLTFdLm5maXJzdHBvczsg aisrKQorICAgICAgICAgICAgICAgIHsKKyAgICAgICAgICAgICAgICAgIG1lcmdlICgmdG1wLCAm YmFja3dvcmRbcFtqXS5pbmRleF0sICZtZXJnZWQpOworICAgICAgICAgICAgICAgICAgY29weSAo Jm1lcmdlZCwgJmJhY2t3b3JkW3Bbal0uaW5kZXhdKTsKKyAgICAgICAgICAgICAgICB9CisgICAg ICAgICAgICB9CiAgICAgICAgICAgewogICAgICAgICAgICAgdG1wLm5lbGVtID0gc3RrWy0xXS5u Zmlyc3Rwb3M7CiAgICAgICAgICAgICB0bXAuZWxlbXMgPSBmaXJzdHBvcyAtIHN0a1stMV0ubmZp cnN0cG9zOwpAQCAtMjc5OSw5ICsyODI5LDE1IEBAIGRmYWFuYWx5emUgKHN0cnVjdCBkZmEgKmQs IGJvb2wgc2VhcmNoZmxhZykKICNlbmRpZgogICAgIH0KIAotICAvKiBGb3IgZWFjaCBmb2xsb3cg c2V0IHRoYXQgaXMgdGhlIGZvbGxvdyBzZXQgb2YgYSByZWFsIHBvc2l0aW9uLCByZXBsYWNlCi0g ICAgIGl0IHdpdGggaXRzIGVwc2lsb24gY2xvc3VyZS4gICovCi0gIGVwc2Nsb3N1cmUgKGQpOwor ICBpZiAoZC0+ZXBzaWxvbikKKyAgICB7CisgICAgICAvKiBGb3IgZWFjaCBmb2xsb3cgc2V0IHRo YXQgaXMgdGhlIGZvbGxvdyBzZXQgb2YgYSByZWFsIHBvc2l0aW9uLAorICAgICAgICAgcmVwbGFj ZSBpdCB3aXRoIGl0cyBlcHNpbG9uIGNsb3N1cmUuICAqLworICAgICAgZXBzY2xvc3VyZSAoZCwg YmFja3dvcmQpOworCisgICAgICBmb3IgKGlkeF90IGkgPSAwOyBpIDwgZC0+dGluZGV4OyBpKysp CisgICAgICAgIGZyZWUgKGJhY2t3b3JkW2ldLmVsZW1zKTsKKyAgICB9CiAKICAgZGZhb3B0aW1p emUgKGQpOwogCkBAIC0yODYzLDYgKzI4OTksNyBAQCBkZmFhbmFseXplIChzdHJ1Y3QgZGZhICpk LCBib29sIHNlYXJjaGZsYWcpCiAgIGQtPm1pbl90cmNvdW50Kys7CiAgIGQtPnRyY291bnQgPSAw OwogCisgIGZyZWUgKGJhY2t3b3JkKTsKICAgZnJlZSAocG9zYWxsb2MpOwogICBmcmVlIChzdGth bGxvYyk7CiAgIGZyZWUgKG1lcmdlZC5lbGVtcyk7Ci0tIAoxLjcuMQoK --------_5E9BB1D8000000004318_MULTIPART_MIXED_--