Date: Tue, 25 Aug 2020 22:09:46 +0000 (UTC)
From: fatkodima123@gmail.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:99690] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp

Issue #17030 has been updated by fatkodima (Dima Fatko).


I have implemented a simple PoC - https://github.com/ruby/ruby/pull/3455.

I got the following results.

## Enumerable#grep

```ruby
require "benchmark/ips"
require "benchmark/memory"

ARR = %w[foobar foobaz bazquux hello world just making this array longer]

REGEXP = /o/
# The `f` flag comes from the PoC build; stock Ruby rejects it as an unknown regexp option.
FAST_REGEXP = /o/f

Benchmark.ips do |x|
  x.report("select.match?") { ARR.select { |e| e.match?(REGEXP) } }
  x.report("grep")          { ARR.grep(REGEXP) }
  x.report("fast_grep")     { ARR.grep(FAST_REGEXP) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("select.match?") { ARR.select { |e| e.match?(REGEXP) } }
  x.report("grep")          { ARR.grep(REGEXP) }
  x.report("fast_grep")     { ARR.grep(FAST_REGEXP) }
  x.compare!
end
```

```
Warming up --------------------------------------
       select.match?    57.956k i/100ms
                grep    22.715k i/100ms
           fast_grep    59.434k i/100ms
Calculating -------------------------------------
       select.match?    580.339k (± 0.5%) i/s -      2.956M in   5.093260s
                grep    225.854k (± 0.6%) i/s -      1.136M in   5.028890s
           fast_grep    532.658k (± 9.0%) i/s -      2.675M in   5.067008s

Comparison:
       select.match?:   580338.8 i/s
           fast_grep:   532658.1 i/s - same-ish: difference falls within error
                grep:   225853.7 i/s - 2.57x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
       select.match?   120.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
                grep   536.000  memsize (   168.000  retained)
                         3.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)
           fast_grep   200.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
       select.match?:        120 allocated
           fast_grep:        200 allocated - 1.67x more
                grep:        536 allocated - 4.47x more
```

## case-when

```ruby
require "benchmark/ips"
require "benchmark/memory"

REGEXP = /z/
FAST_REGEXP = /z/f

def case_when(str)
  case str
  when REGEXP
    true
  end
end

def fast_case_when(str)
  case str
  when FAST_REGEXP
    true
  end
end

STR = 'foobarbaz'

Benchmark.ips do |x|
  x.report("case_when")      { case_when(STR) }
  x.report("fast_case_when") { fast_case_when(STR) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("case_when")      { case_when(STR) }
  x.report("fast_case_when") { fast_case_when(STR) }
  x.compare!
end
```

```
Warming up --------------------------------------
           case_when    95.463k i/100ms
      fast_case_when   456.981k i/100ms
Calculating -------------------------------------
           case_when    964.438k (± 0.8%) i/s -      4.869M in   5.048469s
      fast_case_when      4.571M (± 0.6%) i/s -     23.306M in   5.098414s

Comparison:
      fast_case_when:  4571379.8 i/s
           case_when:   964438.3 i/s - 4.74x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
           case_when   168.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
      fast_case_when     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
      fast_case_when:          0 allocated
           case_when:        168 allocated - Infx more
```

## Enumerable#any?

```ruby
require "benchmark/ips"
require "benchmark/memory"

REGEXP = /longer/
FAST_REGEXP = /longer/f

ARR = %w[foobar foobaz bazquux hello world just making this array longer]

Benchmark.ips do |x|
  x.report("any?")      { ARR.any?(REGEXP) }
  x.report("fast_any?") { ARR.any?(FAST_REGEXP) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("any?")      { ARR.any?(REGEXP) }
  x.report("fast_any?") { ARR.any?(FAST_REGEXP) }
  x.compare!
end
```

```
Warming up --------------------------------------
                any?    25.840k i/100ms
           fast_any?    95.381k i/100ms
Calculating -------------------------------------
                any?    261.095k (± 1.0%) i/s -      1.318M in   5.047859s
           fast_any?    893.676k (±13.2%) i/s -      4.388M in   5.070820s

Comparison:
           fast_any?:   893675.9 i/s
                any?:   261095.0 i/s - 3.42x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
                any?   168.000  memsize (   168.000  retained)
                         1.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)
           fast_any?     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
           fast_any?:          0 allocated
                any?:        168 allocated - Infx more
```

If that seems OK, I will update and finish my PR with tests/docs/etc.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87181

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }
# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`.
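
For readers who want to see the `MatchData` difference directly, here is a minimal sketch that runs on stock Ruby and does not need the PoC flag. The two helper methods are hypothetical names introduced only for this illustration. Each one clears `$~`, performs one kind of match, and returns whatever `$~` holds afterwards.

```ruby
# Hypothetical helper: what $~ holds after Regexp#===
# (the call that case/when makes, and that the issue says grep makes).
def last_match_after_case_eq(regexp, string)
  $~ = nil
  regexp === string
  $~
end

# Hypothetical helper: what $~ holds after String#match?,
# which only answers true/false and leaves $~ untouched.
def last_match_after_match_p(regexp, string)
  $~ = nil
  string.match?(regexp)
  $~
end

p last_match_after_case_eq(/o/, "foobar").class  # prints MatchData
p last_match_after_match_p(/o/, "foobar")        # prints nil
```

This only shows where the extra object comes from; it says nothing about timing, which is what the benchmarks above measure.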