ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: fatkodima123@gmail.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:99690] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
Date: Tue, 25 Aug 2020 22:09:46 +0000 (UTC)	[thread overview]
Message-ID: <redmine.journal-87181.20200825220945.182@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-17030.20200713202650.182@ruby-lang.org

Issue #17030 has been updated by fatkodima (Dima Fatko).


I have implemented a simple PoC - https://github.com/ruby/ruby/pull/3455.

I got the following results.

## Enumerable#grep
```ruby
ARR = %w[foobar foobaz bazquux hello world just making this array longer]

REGEXP = /o/
FAST_REGEXP = /o/f

Benchmark.ips do |x|
  x.report("select.match?")  { ARR.select { |e| e.match?(REGEXP) } }
  x.report("grep")           { ARR.grep(REGEXP) }
  x.report("fast_grep")      { ARR.grep(FAST_REGEXP) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("select.match?")  { ARR.select { |e| e.match?(REGEXP) } }
  x.report("grep")           { ARR.grep(REGEXP) }
  x.report("fast_grep")      { ARR.grep(FAST_REGEXP) }
  x.compare!
end
```

```
Warming up --------------------------------------
       select.match?    57.956k i/100ms
                grep    22.715k i/100ms
           fast_grep    59.434k i/100ms
Calculating -------------------------------------
       select.match?    580.339k (± 0.5%) i/s -      2.956M in   5.093260s
                grep    225.854k (± 0.6%) i/s -      1.136M in   5.028890s
           fast_grep    532.658k (± 9.0%) i/s -      2.675M in   5.067008s

Comparison:
       select.match?:   580338.8 i/s
           fast_grep:   532658.1 i/s - same-ish: difference falls within error
                grep:   225853.7 i/s - 2.57x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
       select.match?   120.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
                grep   536.000  memsize (   168.000  retained)
                         3.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)
           fast_grep   200.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
       select.match?:        120 allocated
           fast_grep:        200 allocated - 1.67x more
                grep:        536 allocated - 4.47x more
```

## case-when
```ruby
REGEXP = /z/
FAST_REGEXP = /z/f

def case_when(str)
  case str
  when REGEXP
    true
  end
end

def fast_case_when(str)
  case str
  when FAST_REGEXP
    true
  end
end

STR = 'foobarbaz'

Benchmark.ips do |x|
  x.report("case_when")       { case_when(STR) }
  x.report("fast_case_when")  { fast_case_when(STR) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("case_when")       { case_when(STR) }
  x.report("fast_case_when")  { fast_case_when(STR) }
  x.compare!
end
```

```
Warming up --------------------------------------
           case_when    95.463k i/100ms
      fast_case_when   456.981k i/100ms
Calculating -------------------------------------
           case_when    964.438k (± 0.8%) i/s -      4.869M in   5.048469s
      fast_case_when      4.571M (± 0.6%) i/s -     23.306M in   5.098414s

Comparison:
      fast_case_when:  4571379.8 i/s
           case_when:   964438.3 i/s - 4.74x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
           case_when   168.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
      fast_case_when     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
      fast_case_when:          0 allocated
           case_when:        168 allocated - Infx more
```

## Enumerable#any?
```ruby
REGEXP = /longer/
FAST_REGEXP = /longer/f
ARR = %w[foobar foobaz bazquux hello world just making this array longer]

Benchmark.ips do |x|
  x.report("any?")        { ARR.any?(REGEXP) }
  x.report("fast_any?")   { ARR.any?(FAST_REGEXP) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("any?")        { ARR.any?(REGEXP) }
  x.report("fast_any?")   { ARR.any?(FAST_REGEXP) }
  x.compare!
end
```

```
Warming up --------------------------------------
                any?    25.840k i/100ms
           fast_any?    95.381k i/100ms
Calculating -------------------------------------
                any?    261.095k (± 1.0%) i/s -      1.318M in   5.047859s
           fast_any?    893.676k (±13.2%) i/s -      4.388M in   5.070820s

Comparison:
           fast_any?:   893675.9 i/s
                any?:   261095.0 i/s - 3.42x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
                any?   168.000  memsize (   168.000  retained)
                         1.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)
           fast_any?     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
           fast_any?:          0 allocated
                any?:        168 allocated - Infx more
```

If that seems OK, I will update and finish my PR with tests/docs/etc.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87181

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

  parent reply	other threads:[~2020-08-25 22:10 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
2020-07-13 20:27 ` [ruby-core:99157] " marcandre-ruby-core
2020-07-14  1:24 ` [ruby-core:99161] " shyouhei
2020-07-14  3:37 ` [ruby-core:99163] " marcandre-ruby-core
2020-07-14 10:25 ` [ruby-core:99164] " nobu
2020-07-14 13:28 ` [ruby-core:99165] " eregontp
2020-07-14 13:46 ` [ruby-core:99166] " marcandre-ruby-core
2020-07-21 11:27 ` [ruby-core:99248] " scivola20
2020-08-25 17:23 ` [ruby-core:99687] " fatkodima123
2020-08-25 22:09 ` fatkodima123 [this message]
2020-08-25 23:37 ` [ruby-core:99692] " sawadatsuyoshi
2020-08-26  8:31 ` [ruby-core:99702] " fatkodima123
2020-08-26 15:01 ` [ruby-core:99705] " marcandre-ruby-core
2020-08-26 17:01 ` [ruby-core:99706] " fatkodima123
2020-08-26 17:30 ` [ruby-core:99708] " marcandre-ruby-core
2020-08-26 19:17 ` [ruby-core:99713] " daniel
2020-08-26 19:31 ` [ruby-core:99714] " marcandre-ruby-core
2020-08-26 20:43 ` [ruby-core:99716] " daniel
2020-08-27  2:10 ` [ruby-core:99722] " marcandre-ruby-core
2020-08-27  8:03 ` [ruby-core:99729] " fatkodima123
2020-08-27 16:58 ` [ruby-core:99737] " daniel
2020-08-27 17:16 ` [ruby-core:99738] " merch-redmine
2020-08-28 15:01 ` [ruby-core:99751] " eregontp
2020-08-28 15:30 ` [ruby-core:99752] " eregontp
2020-09-25  6:53 ` [ruby-core:100127] " matz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=redmine.journal-87181.20200825220945.182@ruby-lang.org \
    --to=ruby-core@ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).