Date: Sat, 24 Oct 2020 14:51:50 +0000 (UTC)
From: eregontp@gmail.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:100524] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations

Issue #17206 has been updated by Eregon (Benoit Daloze).

I took a quick look; the logic to set `$~` is here:
https://github.com/ruby/ruby/blob/148961adcd0704d964fce920330a6301b9704c25/re.c#L1608-L1623

It does not seem so expensive, but the region is allocated with `xmalloc()`, which is probably not so cheap (there is also a `rb_gc()` call in there; hopefully it's not hit in practice).
`rb_backref_set()` goes through a few indirections (it typically needs to reach the caller frame), but it does not seem too expensive either.

I think it would be valuable to investigate further what's actually expensive about setting `$~` and how that can be optimized.

A hacky Regexp flag to manually optimize `match/=~/===` calls doesn't seem like a good way to me.
The caller code knows whether it needs `$~`, etc., not the Regexp literal.

----------------------------------------
Feature #17206: Introduce new Regexp option to avoid global MatchData allocations
https://bugs.ruby-lang.org/issues/17206#change-88146

* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
----------------------------------------
Originates from https://bugs.ruby-lang.org/issues/17030

When this option is specified, Ruby will not create global `MatchData` objects when they are not explicitly needed by the method.
If the new option is named `f`, we can write as `/o/f`, and `grep(/o/f)` is faster than `grep(/o/)`.

This speeds up not only `grep`, but also `all?`, `any?`, `case` and so on.

Many people have written code like this:
```ruby
IO.foreach("foo.txt") do |line|
  case line
  when /^#/
    # do nothing
  when /^(\d+)/
    # using $1
  when /xxx/
    # using $&
  when /yyy/
    # not using $&
  else
    # ...
  end
end
```

This is slow, because of the above mentioned problem. Replacing `/^#/` with `/^#/f`, and `/yyy/` with `/yyy/f` will make it faster.

Some benchmarks - https://bugs.ruby-lang.org/issues/17030#note-9 which show `2.5x` to `5x` speedup.

PR: https://github.com/ruby/ruby/pull/3455

-- 
https://bugs.ruby-lang.org/
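
For readers unfamiliar with the "global `MatchData`" being discussed, here is a minimal sketch of the behaviour in question. It does not use the proposed `f` flag, which is only a proposal in this issue; instead it assumes the existing `Regexp#match?` (Ruby 2.4+) as a stand-in for a match that skips setting `$~`. The two method names are hypothetical and exist only for illustration.

```ruby
# Illustration only: the /f flag from this issue is a proposal and is not used here.
# Regexp#match? is an existing method that matches without setting $~.

# Hypothetical helper: matches with =~ and returns whatever $~ holds afterwards.
def backref_after_match_operator(line)
  line =~ /^(\d+)/ # =~ allocates a MatchData and stores it in $~ for this frame
  $~
end

# Hypothetical helper: matches with match? and returns whatever $~ holds afterwards.
def backref_after_match_p(line)
  /^(\d+)/.match?(line) # returns true or false and leaves $~ untouched
  $~
end

with_operator = backref_after_match_operator("42 apples")
with_match_p  = backref_after_match_p("42 apples")

puts with_operator.class   # MatchData
puts with_operator[1]      # the captured digits
puts with_match_p.inspect  # nil, since no MatchData was stored
```

Because `$~` is local to the calling frame, each helper starts with it unset, so the second call shows that nothing was stored. This is the caller-side choice mentioned in the reply above: the call site, not the Regexp literal, decides whether the back-reference is needed.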