From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS22989 209.51.188.0/24 X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_LOW, SPF_HELO_NONE,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id C628F1F45A for ; Fri, 17 Apr 2020 01:25:03 +0000 (UTC) Received: from localhost ([::1]:40998 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jPFkc-0005vI-Io for normalperson@yhbt.net; Thu, 16 Apr 2020 21:25:02 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:33792) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jPFkQ-0005hh-Kj for bug-gnulib@gnu.org; Thu, 16 Apr 2020 21:24:56 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jPFkP-0007SP-2f for bug-gnulib@gnu.org; Thu, 16 Apr 2020 21:24:50 -0400 Received: from mailgw04.kcn.ne.jp ([61.86.7.211]:51474) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1jPFkO-0007RP-LD for bug-gnulib@gnu.org; Thu, 16 Apr 2020 21:24:49 -0400 Received: from mxs02-s (mailgw2.kcn.ne.jp [61.86.15.234]) by mailgw04.kcn.ne.jp (Postfix) with ESMTP id 509768077D for ; Fri, 17 Apr 2020 10:24:44 +0900 (JST) X-matriXscan-loop-detect: 720f28862985e530c57a51f59961a2758d61e2bb Received: from mail10.kcn.ne.jp ([61.86.6.128]) by mxs02-s with ESMTP; Fri, 17 Apr 2020 10:24:43 +0900 (JST) Received: from [10.120.1.110] (i118-21-128-66.s30.a048.ap.plala.or.jp [118.21.128.66]) by mail10.kcn.ne.jp (Postfix) with ESMTPA id 15DA440E3754; Fri, 17 Apr 2020 10:24:43 +0900 (JST) Date: Fri, 17 Apr 2020 10:24:42 +0900 From: Norihiro Tanaka To: <40634@debbugs.gnu.org> Subject: Re: bug#40634: Massive pattern list handling with -E format seems very slow since 2.28. In-Reply-To: <20200417093536.875E.27F6AC2D@kcn.ne.jp> References: <6503eb8e-e6fd-b4dd-aab7-76bb6955d87b@cs.ucla.edu> <20200417093536.875E.27F6AC2D@kcn.ne.jp> Message-Id: <20200417102441.8766.27F6AC2D@kcn.ne.jp> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="------_5E99047C000000008756_MULTIPART_MIXED_" Content-Transfer-Encoding: 7bit X-Mailer: Becky! ver. 2.74.02 [ja] X-matriXscan-msec-AV: Clean X-matriXscan-Action: Approve X-matriXscan: Uncategorized X-detected-operating-system: by eggs.gnu.org: GNU/Linux 3.x [fuzzy] X-Received-From: 61.86.7.211 X-BeenThere: bug-gnulib@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Gnulib discussion list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: fryasu@yahoo.co.jp, Paul Eggert , bug-gnulib@gnu.org Errors-To: bug-gnulib-bounces+normalperson=yhbt.net@gnu.org Sender: "bug-gnulib" --------_5E99047C000000008756_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII" Content-Transfer-Encoding: 7bit On Fri, 17 Apr 2020 09:35:36 +0900 Norihiro Tanaka wrote: > > On Thu, 16 Apr 2020 16:00:29 -0700 > Paul Eggert wrote: > > > On 4/16/20 3:53 PM, Norihiro Tanaka wrote: > > > > > I have had no idea to solve the problem yet. If we revert it, bug#33357 > > > will come back. > > > > Yes, I'd rather not revert if we can help it. > > > > My own thought was to not analyze the regular expression if we discover that the input is empty. :-) > > Now, I have a idea, it is that we build indexes of epsilon nodes > including in follows before remove epsilon nodes. I wrote fix for the bug, but it will be slower then at grep 2.27 yet. --------_5E99047C000000008756_MULTIPART_MIXED_ Content-Type: text/plain; charset="US-ASCII"; name="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl.patch" Content-Disposition: attachment; filename="0001-dfa-build-auxiliary-indexes-before-remove-epsilon-cl.patch" Content-Transfer-Encoding: base64 RnJvbSBlNTY4MGE0YzQzYjE4NWRmNmI4YzFiODA0MjNkNjYzZGU2ZjZlMzdlIE1vbiBTZXAgMTcg MDA6MDA6MDAgMjAwMQpGcm9tOiBOb3JpaGlybyBUYW5ha2EgPG5vcml0bmtAa2NuLm5lLmpwPgpE YXRlOiBGcmksIDE3IEFwciAyMDIwIDEwOjEyOjAxICswOTAwClN1YmplY3Q6IFtQQVRDSF0gZGZh OiBidWlsZCBhdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUgcmVtb3ZlIGVwc2lsb24gY2xvc3VyZQoK V2hlbiByZW1vdmUgZXBzaWxvbiBjbG9zdXJlLCBzbyBmYXIgc2VhcmNoZWQgbm9kZXMgaW5jbHVk aW5nIGVwc2lsb24gY2xvc3VyZQppbiBhbGwgbm9kZXMgc2VxdWVudGlhbGx5LCBidXQgaXQgaXMg c2xvdyBmb3Igc29tZSBjYXNlcy4gIE5vdyBidWlsZAphdXhpbGlhcnkgaW5kZXhlcyBiZWZvcmUg c2VhcmNoLgoKUHJvYmxlbSByZXBvcnRlZCBpbjogaHR0cHM6Ly9kZWJidWdzLmdudS5vcmcvY2dp L2J1Z3JlcG9ydC5jZ2k/YnVnPTQwNjM0CgoqIGxpYi9kZmEuYyAob3ZlcndyYXApOiBOZXcgZnVu Y3Rpb24uCihlcHNjbG9zdXJlKTogYnVpbGQgYXV4aWxpYXJ5IGluZGV4ZXMgYmVmb3JlIHJlbW92 ZSBlcHNpbG9uIGNsb3N1cmUuCi0tLQogbGliL2RmYS5jIHwgICA1MiArKysrKysrKysrKysrKysr KysrKysrKysrKysrKysrKysrKysrKysrKysrKystLS0tLS0tCiAxIGZpbGVzIGNoYW5nZWQsIDQ1 IGluc2VydGlvbnMoKyksIDcgZGVsZXRpb25zKC0pCgpkaWZmIC0tZ2l0IGEvbGliL2RmYS5jIGIv bGliL2RmYS5jCmluZGV4IDk5MzlkMjIuLjk1ODA2OWUgMTAwNjQ0Ci0tLSBhL2xpYi9kZmEuYwor KysgYi9saWIvZGZhLmMKQEAgLTIyMDEsNiArMjIwMSwyMiBAQCByZXBsYWNlIChwb3NpdGlvbl9z ZXQgKmRzdCwgaWR4X3QgZGVsLCBwb3NpdGlvbl9zZXQgKmFkZCwKICAgICB9CiB9CiAKK3N0YXRp YyBib29sCitvdmVyd3JhcCAocG9zaXRpb25fc2V0IGNvbnN0ICpzLCBpZHhfdCBjb25zdCAqZWxl bXMsIGlkeF90IG5lbGVtKQoreworICBpZHhfdCBpID0gMCwgaiA9IDA7CisKKyAgd2hpbGUgKGkg PCBzLT5uZWxlbSAmJiBqIDwgbmVsZW0pCisgICAgaWYgKHMtPmVsZW1zW2ldLmluZGV4IDwgZWxl bXNbal0pCisgICAgICBpKys7CisgICAgZWxzZSBpZiAocy0+ZWxlbXNbaV0uaW5kZXggPiBlbGVt c1tqXSkKKyAgICAgIGorKzsKKyAgICBlbHNlCisgICAgICByZXR1cm4gdHJ1ZTsKKworICByZXR1 cm4gZmFsc2U7Cit9CisKIC8qIEZpbmQgdGhlIGluZGV4IG9mIHRoZSBzdGF0ZSBjb3JyZXNwb25k aW5nIHRvIHRoZSBnaXZlbiBwb3NpdGlvbiBzZXQgd2l0aAogICAgdGhlIGdpdmVuIHByZWNlZGlu ZyBjb250ZXh0LCBvciBjcmVhdGUgYSBuZXcgc3RhdGUgaWYgdGhlcmUgaXMgbm8gc3VjaAogICAg c3RhdGUuICBDb250ZXh0IHRlbGxzIHdoZXRoZXIgd2UgZ290IGhlcmUgb24gYSBuZXdsaW5lIG9y IGxldHRlci4gICovCkBAIC0yMjk4LDE0ICsyMzE0LDMzIEBAIHN0YXRpYyB2b2lkCiBlcHNjbG9z dXJlIChzdHJ1Y3QgZGZhIGNvbnN0ICpkKQogewogICBwb3NpdGlvbl9zZXQgdG1wOworICBpZHhf dCAqY3VycnMsICpuZXh0czsKKyAgaWR4X3QgbmN1cnIgPSAwOworICBpZHhfdCBubmV4dCA9IDA7 CisKICAgYWxsb2NfcG9zaXRpb25fc2V0ICgmdG1wLCBkLT5ubGVhdmVzKTsKLSAgZm9yIChpZHhf dCBpID0gMDsgaSA8IGQtPnRpbmRleDsgaSsrKQotICAgIGlmIChkLT5mb2xsb3dzW2ldLm5lbGVt ID4gMCAmJiBkLT50b2tlbnNbaV0gPj0gTk9UQ0hBUgorICBjdXJycyA9IHhubWFsbG9jIChkLT50 aW5kZXgsIHNpemVvZiAqY3VycnMpOworICBuZXh0cyA9IHhubWFsbG9jIChkLT50aW5kZXgsIHNp emVvZiAqbmV4dHMpOworCisgIGZvciAoaWR4X3QgaSA9IDA7IGkgPCBkLT50aW5kZXg7IGkrKykg eworICAgICBpZiAoZC0+Zm9sbG93c1tpXS5uZWxlbSA+IDAgJiYgZC0+dG9rZW5zW2ldID49IE5P VENIQVIKICAgICAgICAgJiYgZC0+dG9rZW5zW2ldICE9IEJBQ0tSRUYgJiYgZC0+dG9rZW5zW2ld ICE9IEFOWUNIQVIKICAgICAgICAgJiYgZC0+dG9rZW5zW2ldICE9IE1CQ1NFVCAmJiBkLT50b2tl bnNbaV0gPCBDU0VUKQorICAgICAgY3VycnNbbmN1cnIrK10gPSBpOworICB9CisKKyAgZm9yIChp ZHhfdCBpID0gMCwgaiA9IDA7IGkgPCBkLT50aW5kZXg7IGkrKykKKyAgICB7CisgICAgICB3aGls ZSAoaiA8IG5jdXJyICYmIGN1cnJzW2pdIDwgaSkKKyAgICAgICAgaisrOworICAgICAgaWYgKG92 ZXJ3cmFwICgmZC0+Zm9sbG93c1tpXSwgY3VycnMsIG5jdXJyKSkKKyAgICAgICAgbmV4dHNbbm5l eHQrK10gPSBpOworICAgIH0KKworICBmb3IgKGlkeF90IGkgPSAwOyBpIDwgbmN1cnI7IGkrKykK ICAgICAgIHsKICAgICAgICAgdW5zaWduZWQgaW50IGNvbnN0cmFpbnQ7Ci0gICAgICAgIHN3aXRj aCAoZC0+dG9rZW5zW2ldKQorICAgICAgICBzd2l0Y2ggKGQtPnRva2Vuc1tjdXJyc1tpXV0pCiAg ICAgICAgICAgewogICAgICAgICAgIGNhc2UgQkVHTElORToKICAgICAgICAgICAgIGNvbnN0cmFp bnQgPSBCRUdMSU5FX0NPTlNUUkFJTlQ7CkBAIC0yMzMwLDEzICsyMzY1LDE2IEBAIGVwc2Nsb3N1 cmUgKHN0cnVjdCBkZmEgY29uc3QgKmQpCiAgICAgICAgICAgICBicmVhazsKICAgICAgICAgICB9 CiAKLSAgICAgICAgZGVsZXRlIChpLCAmZC0+Zm9sbG93c1tpXSk7CisgICAgICAgIGRlbGV0ZSAo aSwgJmQtPmZvbGxvd3NbY3VycnNbaV1dKTsKIAotICAgICAgICBmb3IgKGlkeF90IGogPSAwOyBq IDwgZC0+dGluZGV4OyBqKyspCi0gICAgICAgICAgaWYgKGkgIT0gaiAmJiBkLT5mb2xsb3dzW2pd Lm5lbGVtID4gMCkKLSAgICAgICAgICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW2pdLCBpLCAmZC0+ Zm9sbG93c1tpXSwgY29uc3RyYWludCwgJnRtcCk7CisgICAgICAgIGZvciAoaWR4X3QgaiA9IDA7 IGogPCBubmV4dDsgaisrKQorICAgICAgICAgIHJlcGxhY2UgKCZkLT5mb2xsb3dzW25leHRzW2pd XSwgY3VycnNbaV0sICZkLT5mb2xsb3dzW2N1cnJzW2ldXSwKKyAgICAgICAgICAgICAgICAgIGNv bnN0cmFpbnQsICZ0bXApOwogICAgICAgfQorCiAgIGZyZWUgKHRtcC5lbGVtcyk7CisgIGZyZWUg KGN1cnJzKTsKKyAgZnJlZSAobmV4dHMpOwogfQogCiAvKiBSZXR1cm5zIHRoZSBzZXQgb2YgY29u dGV4dHMgZm9yIHdoaWNoIHRoZXJlIGlzIGF0IGxlYXN0IG9uZQotLSAKMS43LjEKCg== --------_5E99047C000000008756_MULTIPART_MIXED_--