ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
@ 2020-07-13 20:26 marcandre-ruby-core
  2020-07-13 20:27 ` [ruby-core:99157] " marcandre-ruby-core
                   ` (23 more replies)
  0 siblings, 24 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-07-13 20:26 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been reported by marcandre (Marc-Andre Lafortune).

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99157] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
@ 2020-07-13 20:27 ` marcandre-ruby-core
  2020-07-14  1:24 ` [ruby-core:99161] " shyouhei
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-07-13 20:27 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by marcandre (Marc-Andre Lafortune).


Code to reproduce:

```ruby
require 'benchmark-ips'
require 'benchmark-memory'

arr = %w[foobar foobaz bazquux hello world just making this array longer]

REGEXP = /o/

def select_match(arr)
  arr.select { |e| e.match?(REGEXP) }
end

def grep(arr)
  arr.grep(REGEXP)
end

Benchmark.ips do |x|
  x.report("select.match?")  { select_match(arr) }
  x.report("grep")           { grep(arr) }
  x.compare!
end

puts "********* MEMORY *********"

Benchmark.memory do |x|
  x.report("select.match?")  { select_match(arr) }
  x.report("grep")           { grep(arr) }
  x.compare!
end
```

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-86535

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99161] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
  2020-07-13 20:27 ` [ruby-core:99157] " marcandre-ruby-core
@ 2020-07-14  1:24 ` shyouhei
  2020-07-14  3:37 ` [ruby-core:99163] " marcandre-ruby-core
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: shyouhei @ 2020-07-14  1:24 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by shyouhei (Shyouhei Urabe).


Yes but E#grep's allocating MatchData is by spec.  You can observe `$&` etc. by passing a block to it.

```ruby
p %w[q w e r].grep(/./) { $~ }
```

So this is at least a breaking change.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-86539

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99163] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
  2020-07-13 20:27 ` [ruby-core:99157] " marcandre-ruby-core
  2020-07-14  1:24 ` [ruby-core:99161] " shyouhei
@ 2020-07-14  3:37 ` marcandre-ruby-core
  2020-07-14 10:25 ` [ruby-core:99164] " nobu
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-07-14  3:37 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by marcandre (Marc-Andre Lafortune).


You are right for `grep` with a block, we can't necessarily optimize, but we should optimize `grep` without a block, no?

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-86541

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99164] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (2 preceding siblings ...)
  2020-07-14  3:37 ` [ruby-core:99163] " marcandre-ruby-core
@ 2020-07-14 10:25 ` nobu
  2020-07-14 13:28 ` [ruby-core:99165] " eregontp
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: nobu @ 2020-07-14 10:25 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by nobu (Nobuyoshi Nakada).


Even without a block, `grep` sets `$~` to the last match result.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-86542

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99165] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (3 preceding siblings ...)
  2020-07-14 10:25 ` [ruby-core:99164] " nobu
@ 2020-07-14 13:28 ` eregontp
  2020-07-14 13:46 ` [ruby-core:99166] " marcandre-ruby-core
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: eregontp @ 2020-07-14 13:28 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by Eregon (Benoit Daloze).


nobu (Nobuyoshi Nakada) wrote in #note-4:
> Even without a block, `grep` sets `$~` to the last match result.

I guess cases using `$~` after the call to `grep` are very rare (notably because only the last match of the Enumerable would be accessible).

So I would suggest not setting `$~` for `grep` without block.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-86543

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99166] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (4 preceding siblings ...)
  2020-07-14 13:28 ` [ruby-core:99165] " eregontp
@ 2020-07-14 13:46 ` marcandre-ruby-core
  2020-07-21 11:27 ` [ruby-core:99248] " scivola20
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-07-14 13:46 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by marcandre (Marc-Andre Lafortune).


nobu (Nobuyoshi Nakada) wrote in #note-4:
> Even without a block, `grep` sets `$~` to the last match result.

I agree with @eregon, doesn't seem like it makes much sense to use that.

There's also no valid reason it should set `$~` for all the matches tested before the last one.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-86544

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99248] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (5 preceding siblings ...)
  2020-07-14 13:46 ` [ruby-core:99166] " marcandre-ruby-core
@ 2020-07-21 11:27 ` scivola20
  2020-08-25 17:23 ` [ruby-core:99687] " fatkodima123
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: scivola20 @ 2020-07-21 11:27 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by scivola20 (sciv ola).


I have an idea to solve it without any compatibility problem.

[1] Introduce such a Regexp object that `===` method is same as `match?`.
[2] Introduce regexp literal option that makes the Regexp object as [1].

If the option is `'f'`, we can write as `/o/f`, and `grep(/o/f)` is faster than `grep(/o/)`.

This speed up not only `grep` but also `all?`, `any?`, `case` and so on.

Many people have written like this:

```rb
IO.foreach("foo.txt") do |line|
  case line
  when /^#/
    # do nothing 
  when /^(\d+)/
    # using $1
  when /xxx/
    # using $&
  when /yyy/
    # not using $&
  else
    # ...
  end
end
```

This is slow because of the above mentioned problem.
Replacing `/^#/` with `/^#/f`, and `/yyy/` with `/yyy/f` will make it faster.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-86632

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99687] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (6 preceding siblings ...)
  2020-07-21 11:27 ` [ruby-core:99248] " scivola20
@ 2020-08-25 17:23 ` fatkodima123
  2020-08-25 22:09 ` [ruby-core:99690] " fatkodima123
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fatkodima123 @ 2020-08-25 17:23 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by fatkodima (Dima Fatko).


I like @scivola20 's idea and would like to try to implement it.
Before starting, @nobu, wdyt, is this something that will be merged?

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87177

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99690] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (7 preceding siblings ...)
  2020-08-25 17:23 ` [ruby-core:99687] " fatkodima123
@ 2020-08-25 22:09 ` fatkodima123
  2020-08-25 23:37 ` [ruby-core:99692] " sawadatsuyoshi
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fatkodima123 @ 2020-08-25 22:09 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by fatkodima (Dima Fatko).


I have implemented a simple PoC - https://github.com/ruby/ruby/pull/3455.

I got the following results.

## Enumerable#grep
```ruby
ARR = %w[foobar foobaz bazquux hello world just making this array longer]

REGEXP = /o/
FAST_REGEXP = /o/f

Benchmark.ips do |x|
  x.report("select.match?")  { ARR.select { |e| e.match?(REGEXP) } }
  x.report("grep")           { ARR.grep(REGEXP) }
  x.report("fast_grep")      { ARR.grep(FAST_REGEXP) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("select.match?")  { ARR.select { |e| e.match?(REGEXP) } }
  x.report("grep")           { ARR.grep(REGEXP) }
  x.report("fast_grep")      { ARR.grep(FAST_REGEXP) }
  x.compare!
end
```

```
Warming up --------------------------------------
       select.match?    57.956k i/100ms
                grep    22.715k i/100ms
           fast_grep    59.434k i/100ms
Calculating -------------------------------------
       select.match?    580.339k (± 0.5%) i/s -      2.956M in   5.093260s
                grep    225.854k (± 0.6%) i/s -      1.136M in   5.028890s
           fast_grep    532.658k (± 9.0%) i/s -      2.675M in   5.067008s

Comparison:
       select.match?:   580338.8 i/s
           fast_grep:   532658.1 i/s - same-ish: difference falls within error
                grep:   225853.7 i/s - 2.57x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
       select.match?   120.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
                grep   536.000  memsize (   168.000  retained)
                         3.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)
           fast_grep   200.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
       select.match?:        120 allocated
           fast_grep:        200 allocated - 1.67x more
                grep:        536 allocated - 4.47x more
```

## case-when
```ruby
REGEXP = /z/
FAST_REGEXP = /z/f

def case_when(str)
  case str
  when REGEXP
    true
  end
end

def fast_case_when(str)
  case str
  when FAST_REGEXP
    true
  end
end

STR = 'foobarbaz'

Benchmark.ips do |x|
  x.report("case_when")       { case_when(STR) }
  x.report("fast_case_when")  { fast_case_when(STR) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("case_when")       { case_when(STR) }
  x.report("fast_case_when")  { fast_case_when(STR) }
  x.compare!
end
```

```
Warming up --------------------------------------
           case_when    95.463k i/100ms
      fast_case_when   456.981k i/100ms
Calculating -------------------------------------
           case_when    964.438k (± 0.8%) i/s -      4.869M in   5.048469s
      fast_case_when      4.571M (± 0.6%) i/s -     23.306M in   5.098414s

Comparison:
      fast_case_when:  4571379.8 i/s
           case_when:   964438.3 i/s - 4.74x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
           case_when   168.000  memsize (     0.000  retained)
                         1.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)
      fast_case_when     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
      fast_case_when:          0 allocated
           case_when:        168 allocated - Infx more
```

## Enumerable#any?
```ruby
REGEXP = /longer/
FAST_REGEXP = /longer/f
ARR = %w[foobar foobaz bazquux hello world just making this array longer]

Benchmark.ips do |x|
  x.report("any?")        { ARR.any?(REGEXP) }
  x.report("fast_any?")   { ARR.any?(FAST_REGEXP) }
  x.compare!
end

puts "********* MEMORY *********\n"

Benchmark.memory do |x|
  x.report("any?")        { ARR.any?(REGEXP) }
  x.report("fast_any?")   { ARR.any?(FAST_REGEXP) }
  x.compare!
end
```

```
Warming up --------------------------------------
                any?    25.840k i/100ms
           fast_any?    95.381k i/100ms
Calculating -------------------------------------
                any?    261.095k (± 1.0%) i/s -      1.318M in   5.047859s
           fast_any?    893.676k (±13.2%) i/s -      4.388M in   5.070820s

Comparison:
           fast_any?:   893675.9 i/s
                any?:   261095.0 i/s - 3.42x  (± 0.00) slower

********* MEMORY *********
Calculating -------------------------------------
                any?   168.000  memsize (   168.000  retained)
                         1.000  objects (     1.000  retained)
                         0.000  strings (     0.000  retained)
           fast_any?     0.000  memsize (     0.000  retained)
                         0.000  objects (     0.000  retained)
                         0.000  strings (     0.000  retained)

Comparison:
           fast_any?:          0 allocated
                any?:        168 allocated - Infx more
```

If that seems OK, I will update and finish my PR with tests/docs/etc.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87181

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99692] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (8 preceding siblings ...)
  2020-08-25 22:09 ` [ruby-core:99690] " fatkodima123
@ 2020-08-25 23:37 ` sawadatsuyoshi
  2020-08-26  8:31 ` [ruby-core:99702] " fatkodima123
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: sawadatsuyoshi @ 2020-08-25 23:37 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by sawa (Tsuyoshi Sawada).


I feel scivola20 (sciv ola)'s idea promising, but have a feeling that it is going to introduce the same kind of mess as when `"string"f` notation was introduced. People are going to need to write `/regex/f` all over the place.

Just by analogy from the situation with strings, what about introducing the following pragma, which will make all regex literals on that page fast regex literals (i.e., `===` becomes `match?`)?

```ruby
# boolean_regex_literal: true
```

And perhaps in the long run, Matz might want to make all regex literals work like that.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87183

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99702] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (9 preceding siblings ...)
  2020-08-25 23:37 ` [ruby-core:99692] " sawadatsuyoshi
@ 2020-08-26  8:31 ` fatkodima123
  2020-08-26 15:01 ` [ruby-core:99705] " marcandre-ruby-core
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fatkodima123 @ 2020-08-26  8:31 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by fatkodima (Dima Fatko).


sawa (Tsuyoshi Sawada) wrote in #note-10:
> I feel scivola20 (sciv ola)'s idea promising, but have a concern that it is going to introduce the same kind of mess as when `"string"f` notation was introduced (same "f" used due to frozen and fast, but this is just coincidental). People are going to need to write `/regex/f` all over the place.
> 
> Just by analogy from the situation with strings, what about introducing the following pragma, which will make all regex literals on that page fast regex literals (i.e., `===` becomes `match?`)?
> 
> ```ruby
> # boolean_regex_literal: true
> ```
> 
> And perhaps in the long run, Matz might want to make all regexes work like that, or simply change the definition of `Regexp#===`.


Yes, seems like this will solve the problem of typing `regex/f` all over. Does it also mean that we then should have something like in frozen string world (`String#@+`) to manually change to the old behavior where we need it, like

```ruby
# boolean_regex_literal: true

case var
when /foo/
  # does not set $~, etc
when +/bar/
  # sets $~, etc
end
```
?

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87192

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99705] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (10 preceding siblings ...)
  2020-08-26  8:31 ` [ruby-core:99702] " fatkodima123
@ 2020-08-26 15:01 ` marcandre-ruby-core
  2020-08-26 17:01 ` [ruby-core:99706] " fatkodima123
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-08-26 15:01 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by marcandre (Marc-Andre Lafortune).


Couldn't static analysis of the code determine in most cases if match data need be generated or not?

This is Ruby, so I can think of some corner cases where things like `const_get(:Regexp).last_match` would be impacted (in theory), what other limitations would this have?

Maybe it would be best to start a different thread as none of these proposals have a relation to `grep{_v}` without block not being optimized.


----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87196

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99706] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (11 preceding siblings ...)
  2020-08-26 15:01 ` [ruby-core:99705] " marcandre-ruby-core
@ 2020-08-26 17:01 ` fatkodima123
  2020-08-26 17:30 ` [ruby-core:99708] " marcandre-ruby-core
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fatkodima123 @ 2020-08-26 17:01 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by fatkodima (Dima Fatko).


marcandre (Marc-Andre Lafortune) wrote in #note-12:
> Maybe it would be best to start a different thread as none of these proposals have a relation to `grep{_v}` without block not being optimized.

I have already implemented a patch to make `grep{_v}` faster and right before submitting the `Create Pull Request` button, I realized (with the help of scivola20's comment), that this case can be generalized. Because we already have, at least, `Enumerable#{all?,any?,none?}` and many future methods to be added (like `grep`), which can benefit from this generalized solution. 


> This is Ruby, so I can think of some corner cases where things like `const_get(:Regexp).last_match` would be impacted (in theory), what other limitations would this have?
> 

`case-when`? 

> Couldn't static analysis of the code determine in most cases if match data need be generated or not?
> 

In many cases, probably yes, but again, `case-when`.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87197

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99708] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (12 preceding siblings ...)
  2020-08-26 17:01 ` [ruby-core:99706] " fatkodima123
@ 2020-08-26 17:30 ` marcandre-ruby-core
  2020-08-26 19:17 ` [ruby-core:99713] " daniel
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-08-26 17:30 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by marcandre (Marc-Andre Lafortune).


fatkodima (Dima Fatko) wrote in #note-13:
> In many cases, probably yes, but again, `case-when`, when arguments/consts/etc instead of local vars are used - it is hard to tell if them are regexes or not.

That's not really what I'm proposing. I'm proposing something like an internal `Regexp.needs_last_match?` that would return `true` or `false` depending on the Ruby code, and that could be used to optimize methods. It would return `true` if any subsequent code could be impacted by `$~` and al.

```ruby
def foo
  /x/ =~ 'x' # needs_last_match? # => false
  case method
  when /(foo)/ # needs_last_match? # => false
    do_something
  when /(bar)/ # needs_last_match? # => true
    puts $2
    # ... # needs_last_match? # => false
  end
end

def bar
  # ... # needs_last_match? # => true
  case x
  when /(foo)/ # needs_last_match? # => true
    do_something
  end
  Regexp.last_match
  # ... # needs_last_match? # => false (false negative)
  Regexp.send :last_match
  # ... # needs_last_match? # => false (false negative)
  const_get(:Regexp).last_match
end
```

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87199

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99713] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (13 preceding siblings ...)
  2020-08-26 17:30 ` [ruby-core:99708] " marcandre-ruby-core
@ 2020-08-26 19:17 ` daniel
  2020-08-26 19:31 ` [ruby-core:99714] " marcandre-ruby-core
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: daniel @ 2020-08-26 19:17 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by Dan0042 (Daniel DeLorme).


> Couldn't static analysis of the code determine in most cases if match data need be generated or not?

scivola20 had a good idea, but this is even better. We can automatically get the best performance without having to manually optimize each case.

But static analysis has other limits besides `const_get(:Regexp).last_match`

```ruby
def foo(v)
  /x/ =~ 'x' # needs_last_match? depends on whether 'v' is regexp
  case method
  when v
    $1
  end
end
```

So a simpler approach would be to check if the match operation's scope (in this case the method body) contains any of the regexp-related pseudo-globals.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87203

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99714] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (14 preceding siblings ...)
  2020-08-26 19:17 ` [ruby-core:99713] " daniel
@ 2020-08-26 19:31 ` marcandre-ruby-core
  2020-08-26 20:43 ` [ruby-core:99716] " daniel
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-08-26 19:31 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by marcandre (Marc-Andre Lafortune).


Dan0042 (Daniel DeLorme) wrote in #note-15:
> But static analysis has other limits

Good example but it is easily resolved: assume `v` isn't a Regexp and we may get a false positive, which is not a big issue. There will be other false positives: `str.gsub(regexp, &block)`. That's not a real issue, simply assume that `block` will want access to `Regexp.last_match`. I'm really only worried about false negatives... Any other example comes to mind?

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87204

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99716] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (15 preceding siblings ...)
  2020-08-26 19:31 ` [ruby-core:99714] " marcandre-ruby-core
@ 2020-08-26 20:43 ` daniel
  2020-08-27  2:10 ` [ruby-core:99722] " marcandre-ruby-core
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: daniel @ 2020-08-26 20:43 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by Dan0042 (Daniel DeLorme).


What about this?

```ruby
2.times do
  p $~  # depends on match *below*
  rx =~ str
end
```

Now imagine if `2.times` is replaced by `foo`; a priori we can't know if or how many times the block will be executed. So what I was trying to say is that flow control can lead to all kinds of code paths where it's extremely difficult to know which matching operations a pseudo-global may depend on. Maybe not impossible, but personally I wouldn't want to code that kind of analysis when a simple approach is enough for >90% of cases, and guaranteed to be bug-free.


> There will be other false positives: `str.gsub(regexp, &block)`. That's not a real issue, simply assume that `block` will want access to `Regexp.last_match`.

Actually... `block` does not have access to `Regexp.last_match` (unless you created the block in the same scope as the gsub operation, but that would be unusual)

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87206

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently:

```ruby
array.select { |e| e.match?(REGEXP) }

# about 3x faster and 6x more memory efficient than
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===` which creates useless `MatchData`



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99722] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (16 preceding siblings ...)
  2020-08-26 20:43 ` [ruby-core:99716] " daniel
@ 2020-08-27  2:10 ` marcandre-ruby-core
  2020-08-27  8:03 ` [ruby-core:99729] " fatkodima123
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: marcandre-ruby-core @ 2020-08-27  2:10 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by marcandre (Marc-Andre Lafortune).


Dan0042 (Daniel DeLorme) wrote in #note-17:
> Maybe not impossible, but personally I wouldn't want to code that kind of analysis when a simple approach is enough for >90% of cases, and guaranteed to be bug-free.

Agreed

> > There will be other false positives: `str.gsub(regexp, &block)`. That's not a real issue, simply assume that `block` will want access to `Regexp.last_match`.
> 
> Actually... `block` does not have access to `Regexp.last_match` (unless you created the block in the same scope as the gsub operation, but that would be unusual)

You are right, my bad.



----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87213

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently,

```ruby
array.select { |e| e.match?(REGEXP) }
```

is about three times faster and six times more memory efficient than

```ruby
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===`, which creates useless `MatchData`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99729] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (17 preceding siblings ...)
  2020-08-27  2:10 ` [ruby-core:99722] " marcandre-ruby-core
@ 2020-08-27  8:03 ` fatkodima123
  2020-08-27 16:58 ` [ruby-core:99737] " daniel
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: fatkodima123 @ 2020-08-27  8:03 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by fatkodima (Dima Fatko).


Dan0042 (Daniel DeLorme) wrote in #note-15:
> 
> So a simpler approach would be to check if the match operation's scope (in this case the method body) contains any of the regexp-related pseudo-globals.

I didn't quite get it. So, to summarize, how this new approach should work? Can you elaborate in few more sentences?

Does ruby already do some kind of static analysis that you can point me to?



----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87222

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently,

```ruby
array.select { |e| e.match?(REGEXP) }
```

is about three times faster and six times more memory efficient than

```ruby
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===`, which creates useless `MatchData`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99737] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (18 preceding siblings ...)
  2020-08-27  8:03 ` [ruby-core:99729] " fatkodima123
@ 2020-08-27 16:58 ` daniel
  2020-08-27 17:16 ` [ruby-core:99738] " merch-redmine
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: daniel @ 2020-08-27 16:58 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by Dan0042 (Daniel DeLorme).


Yeah ok, that sentence wasn't very clear, sorry.

The first thing is that when compiling a method to an iseq, you have to set a flag on the iseq if the method contains any of the "last_match" pseudo-globals (`$~`, `$&`, `$1`, `Regexp.last_match`, ...)

Then in `rb_reg_match` (aka `Regexp#=~`), you check if the current iseq has the flag set. This is similar to how `rb_backref_get` gets the last_match object from execution context > control frame > normal control frame > ep > svar > backref. If the flag is not set it means you can use a variant of `reg_match_pos` that only returns the position without using `rb_reg_search` to set the last_match, in the same vein as `rb_reg_match_m_p` (aka `Regexp#match?`).

But I may be missing a few details here, as I don't have a full understanding of the VM.

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87236

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently,

```ruby
array.select { |e| e.match?(REGEXP) }
```

is about three times faster and six times more memory efficient than

```ruby
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===`, which creates useless `MatchData`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99738] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (19 preceding siblings ...)
  2020-08-27 16:58 ` [ruby-core:99737] " daniel
@ 2020-08-27 17:16 ` merch-redmine
  2020-08-28 15:01 ` [ruby-core:99751] " eregontp
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 25+ messages in thread
From: merch-redmine @ 2020-08-27 17:16 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by jeremyevans0 (Jeremy Evans).


Dan0042 (Daniel DeLorme) wrote in #note-21:
> Yeah ok, that sentence wasn't very clear, sorry.
> 
> The first thing is that when compiling a method to an iseq, you have to set a flag on the iseq if the method contains any of the "last_match" pseudo-globals (`$~`, `$&`, `$1`, `Regexp.last_match`, ...)
> 
> Then in `rb_reg_match` (aka `Regexp#=~`), you check if the current iseq has the flag set. This is similar to how `rb_backref_get` gets the last_match object from execution context > control frame > normal control frame > ep > svar > backref. If the flag is not set it means you can use a variant of `reg_match_pos` that only returns the position without using `rb_reg_search` to set the last_match, in the same vein as `rb_reg_match_m_p` (aka `Regexp#match?`).
> 
> But I may be missing a few details here, as I don't have a full understanding of the VM.

Unfortunately, you can't take this approach for VM optimizations without breaking backwards compatibility unless you also have a deoptimization approach that will handle code such as:

```ruby
def a; /(a)/ =~ 'a'; binding; end; a.eval('$1')

def a; /(a)/ =~ 'a'; proc{}; end; a.binding.eval('$1')

def a(c, m); /(a)/ =~ 'a'; c.send(m); end; a(Regexp, :last_match)
```

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87237

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently,

```ruby
array.select { |e| e.match?(REGEXP) }
```

is about three times faster and six times more memory efficient than

```ruby
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===`, which creates useless `MatchData`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99751] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (20 preceding siblings ...)
  2020-08-27 17:16 ` [ruby-core:99738] " merch-redmine
@ 2020-08-28 15:01 ` eregontp
  2020-08-28 15:30 ` [ruby-core:99752] " eregontp
  2020-09-25  6:53 ` [ruby-core:100127] " matz
  23 siblings, 0 replies; 25+ messages in thread
From: eregontp @ 2020-08-28 15:01 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by Eregon (Benoit Daloze).


This is something that a JIT with inlining and escape analysis can optimize and always be correct.
Static analysis doesn't cut it for Ruby.

On TruffleRuby (master + a fix I'll merge soon) for the benchmark above:
```
Calculating -------------------------------------
       select.match?      2.503M (± 2.9%) i/s -     12.677M in   5.068796s
                grep      2.502M (± 2.8%) i/s -     12.558M in   5.022837s
       select.match?      2.519M (± 2.6%) i/s -     12.677M in   5.036105s
                grep      2.498M (± 2.2%) i/s -     12.558M in   5.030485s

Comparison:
       select.match?:  2518943.6 i/s
                grep:  2497618.0 i/s - same-ish: difference falls within error

```

MRI 2.6 for comparison:
```
Calculating -------------------------------------
       select.match?    943.017k (± 0.6%) i/s -      4.770M in   5.058962s
                grep    470.844k (± 0.8%) i/s -      2.389M in   5.074917s
       select.match?    944.326k (± 0.7%) i/s -      4.770M in   5.052020s
                grep    471.122k (± 2.5%) i/s -      2.389M in   5.074969s

Comparison:
       select.match?:   944325.5 i/s
                grep:   471122.3 i/s - 2.00x  (± 0.00) slower
```

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87251

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently,

```ruby
array.select { |e| e.match?(REGEXP) }
```

is about three times faster and six times more memory efficient than

```ruby
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===`, which creates useless `MatchData`.



-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:99752] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (21 preceding siblings ...)
  2020-08-28 15:01 ` [ruby-core:99751] " eregontp
@ 2020-08-28 15:30 ` eregontp
  2020-09-25  6:53 ` [ruby-core:100127] " matz
  23 siblings, 0 replies; 25+ messages in thread
From: eregontp @ 2020-08-28 15:30 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by Eregon (Benoit Daloze).


I'm surprised at the performance difference on MRI though. Just allocating the MatchData shouldn't be so expensive. Maybe `Regexp#match?` has additional optimizations?

----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87252

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently,

```ruby
array.select { |e| e.match?(REGEXP) }
```

is about three times faster and six times more memory efficient than

```ruby
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===`, which creates useless `MatchData`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [ruby-core:100127] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp
  2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
                   ` (22 preceding siblings ...)
  2020-08-28 15:30 ` [ruby-core:99752] " eregontp
@ 2020-09-25  6:53 ` matz
  23 siblings, 0 replies; 25+ messages in thread
From: matz @ 2020-09-25  6:53 UTC (permalink / raw)
  To: ruby-core

Issue #17030 has been updated by matz (Yukihiro Matsumoto).


As far as we measured, there are still plenty of room for the optimization (for example, we don't need to allocate `MatchObject` for `grep_v`). We will investigate to improve the performance for those methods.

Besides that, `/f` flag for regexp may be a useful idea (though little ugly). Could you resubmit the independent issue for the feature, if you like.

Matz.


----------------------------------------
Bug #17030: Enumerable#grep{_v} should be optimized for Regexp
https://bugs.ruby-lang.org/issues/17030#change-87696

* Author: marcandre (Marc-Andre Lafortune)
* Status: Open
* Priority: Normal
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
Currently,

```ruby
array.select { |e| e.match?(REGEXP) }
```

is about three times faster and six times more memory efficient than

```ruby
array.grep(REGEXP)
```

This is because `grep` calls `Regexp#===`, which creates useless `MatchData`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2020-09-25  6:54 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-13 20:26 [ruby-core:99156] [Ruby master Bug#17030] Enumerable#grep{_v} should be optimized for Regexp marcandre-ruby-core
2020-07-13 20:27 ` [ruby-core:99157] " marcandre-ruby-core
2020-07-14  1:24 ` [ruby-core:99161] " shyouhei
2020-07-14  3:37 ` [ruby-core:99163] " marcandre-ruby-core
2020-07-14 10:25 ` [ruby-core:99164] " nobu
2020-07-14 13:28 ` [ruby-core:99165] " eregontp
2020-07-14 13:46 ` [ruby-core:99166] " marcandre-ruby-core
2020-07-21 11:27 ` [ruby-core:99248] " scivola20
2020-08-25 17:23 ` [ruby-core:99687] " fatkodima123
2020-08-25 22:09 ` [ruby-core:99690] " fatkodima123
2020-08-25 23:37 ` [ruby-core:99692] " sawadatsuyoshi
2020-08-26  8:31 ` [ruby-core:99702] " fatkodima123
2020-08-26 15:01 ` [ruby-core:99705] " marcandre-ruby-core
2020-08-26 17:01 ` [ruby-core:99706] " fatkodima123
2020-08-26 17:30 ` [ruby-core:99708] " marcandre-ruby-core
2020-08-26 19:17 ` [ruby-core:99713] " daniel
2020-08-26 19:31 ` [ruby-core:99714] " marcandre-ruby-core
2020-08-26 20:43 ` [ruby-core:99716] " daniel
2020-08-27  2:10 ` [ruby-core:99722] " marcandre-ruby-core
2020-08-27  8:03 ` [ruby-core:99729] " fatkodima123
2020-08-27 16:58 ` [ruby-core:99737] " daniel
2020-08-27 17:16 ` [ruby-core:99738] " merch-redmine
2020-08-28 15:01 ` [ruby-core:99751] " eregontp
2020-08-28 15:30 ` [ruby-core:99752] " eregontp
2020-09-25  6:53 ` [ruby-core:100127] " matz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).