* [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation
@ 2020-09-30 15:42 fatkodima123
2020-09-30 15:58 ` [ruby-core:100240] " zn
` (6 more replies)
0 siblings, 7 replies; 8+ messages in thread
From: fatkodima123 @ 2020-09-30 15:42 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been reported by fatkodima (Dima Fatko).
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid MatchData allocation
https://bugs.ruby-lang.org/issues/17206
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
----------------------------------------
Originates from https://bugs.ruby-lang.org/issues/17030
When this option is specified, Ruby will not create the global `MatchData` object (`$~`) unless the method being called explicitly needs it.
If the new option is named `f`, we can write `/o/f`, and `grep(/o/f)` would be faster than `grep(/o/)`.
This speeds up not only `grep`, but also `all?`, `any?`, `case` and so on.
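For comparison, here is a minimal sketch of how the same filtering can already be written with `match?`, which does not set `$~`. The proposed `/f` option is not used here because it does not exist in released Ruby; the sample array `lines` is made up for illustration.

```ruby
# Hypothetical sample data; any array of strings would do.
lines = ["foo", "bar", "boo", "baz"]

# Filtering with grep and a plain Regexp, as in the proposal text.
with_grep = lines.grep(/o/)

# The same filtering via String#match?, which never sets $~.
with_match_p = lines.select { |line| line.match?(/o/) }

# any? and all? can be written the same way.
any_o = lines.any? { |line| line.match?(/o/) }
all_o = lines.all? { |line| line.match?(/o/) }

p with_grep, with_match_p, any_o, all_o
```

The block form is more verbose than `grep(/o/)`, which is the inconvenience the proposed option is meant to remove.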
Many people have written code like this:
```ruby
IO.foreach("foo.txt") do |line|
  case line
  when /^#/
    # do nothing
  when /^(\d+)/
    # using $1
  when /xxx/
    # using $&
  when /yyy/
    # not using $&
  else
    # ...
  end
end
```
This is slow because each matching `when` clause creates a `MatchData` object, whether or not the branch uses it.
Replacing `/^#/` with `/^#/f`, and `/yyy/` with `/yyy/f` will make it faster.
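Without the proposed option, a similar effect can be approximated today by calling `match?` in the branches that do not need `$~`. The following is only a sketch under that assumption: the helper name `classify` and its return values are hypothetical and exist purely to make the example checkable.

```ruby
# Hypothetical helper: classifies one line, using match? where the
# branch does not need $~ and =~ where it does.
def classify(line)
  if line.match?(/^#/)      # no $~ needed: comment line
    :comment
  elsif line =~ /^(\d+)/    # needs $1, so $~ must be set
    [:number, $1.to_i]
  elsif line =~ /xxx/       # needs $&, so $~ must be set
    [:xxx, $&]
  elsif line.match?(/yyy/)  # no $~ needed
    :yyy
  else
    :other
  end
end

p classify("# a comment")
p classify("42 apples")
p classify("axxxb")
p classify("yyy")
```

This trades the compact `case`/`when` form for an `if`/`elsif` chain, which is the readability cost the proposal tries to avoid.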
Benchmarks at https://bugs.ruby-lang.org/issues/17030#note-9 show a `2.5x` to `5x` speedup.
PR: https://github.com/ruby/ruby/pull/3455
--
https://bugs.ruby-lang.org/
* [ruby-core:100240] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
@ 2020-09-30 15:58 ` zn
2020-09-30 16:22 ` [ruby-core:100241] " fatkodima123
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: zn @ 2020-09-30 15:58 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been updated by znz (Kazuhiro NISHIYAMA).
What does `regexp_without_matchdata.match(string)` return when matched?
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid MatchData allocation
https://bugs.ruby-lang.org/issues/17206#change-87826
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
* [ruby-core:100241] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
2020-09-30 15:58 ` [ruby-core:100240] " zn
@ 2020-09-30 16:22 ` fatkodima123
2020-09-30 18:18 ` [ruby-core:100242] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations eregontp
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: fatkodima123 @ 2020-09-30 16:22 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been updated by fatkodima (Dima Fatko).
znz (Kazuhiro NISHIYAMA) wrote in #note-1:
> What does `regexp_without_matchdata.match(string)` return when matched?
That's what the "when not explicitly needed by the method" part was about: `match` returns a `MatchData` in this case, as requested.
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid MatchData allocation
https://bugs.ruby-lang.org/issues/17206#change-87827
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
* [ruby-core:100242] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
2020-09-30 15:58 ` [ruby-core:100240] " zn
2020-09-30 16:22 ` [ruby-core:100241] " fatkodima123
@ 2020-09-30 18:18 ` eregontp
2020-10-24 1:34 ` [ruby-core:100519] " scivola20
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: eregontp @ 2020-09-30 18:18 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been updated by Eregon (Benoit Daloze).
IMHO hardcoding such knowledge in the pattern feels wrong (vs in the matching method like `Regexp#match?` which is fine).
It seems to me that it could cause confusing bugs, e.g. when using `/f` in the `case` above if a `when` clause starts to use one of the `$~`-derived variables.
Then it would unexpectedly always be `nil`, causing a potentially very subtle bug.
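The kind of bug described above can be illustrated with existing Ruby, since `match?` already skips setting `$~`. This is a sketch by analogy only: the proposed `/f` option is not available, and both method names below are hypothetical.

```ruby
# Buggy: match? does not set $~, so $1 is nil inside this method.
def first_number_buggy(line)
  if line.match?(/(\d+)/)
    $1
  end
end

# Correct: =~ sets $~ in this method's frame, so $1 holds the capture.
def first_number(line)
  if line =~ /(\d+)/
    $1
  end
end

p first_number_buggy("abc 123")
p first_number("abc 123")
```

With `match?` the missing `$~` is at least visible at the call site; with a flag on the literal, the same `=~` or `when` would behave differently depending on the pattern.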
I have a hard time believing that allocating the MatchData is so expensive.
If that's the case, then there must be a lot of optimization potential for faster allocation of MatchData in CRuby.
Rather, I think this is due to having to set `$~` in the caller, and maybe to computing group offsets.
I think it would be worth investigating in more detail where the performance overhead from `$~` & friends comes from in CRuby.
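As a starting point for such an investigation, a rough timing comparison between `=~` (which sets `$~`) and `match?` (which does not) could look like the sketch below. The data set, the pattern, and the helper `time_it` are assumptions made for illustration; the numbers will vary by Ruby version and machine, and this measures the combined cost rather than isolating its source.

```ruby
# Hypothetical helper: runs the block and returns [result, elapsed seconds].
def time_it
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = yield
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0]
end

# Made-up input: half the lines start with "#".
lines = Array.new(100_000) { |i| i.even? ? "# comment #{i}" : "value #{i}" }
pattern = /^#/

# =~ returns an index (truthy, even when 0) or nil, and sets $~.
count_eq, secs_eq = time_it { lines.count { |l| l =~ pattern } }

# match? returns true or false and leaves $~ untouched.
count_mp, secs_mp = time_it { lines.count { |l| l.match?(pattern) } }

puts format("=~     : %d matches in %.4fs", count_eq, secs_eq)
puts format("match? : %d matches in %.4fs", count_mp, secs_mp)
```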
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid global MatchData allocations
https://bugs.ruby-lang.org/issues/17206#change-87829
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
* [ruby-core:100519] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
` (2 preceding siblings ...)
2020-09-30 18:18 ` [ruby-core:100242] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations eregontp
@ 2020-10-24 1:34 ` scivola20
2020-10-24 14:30 ` [ruby-core:100523] " eregontp
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: scivola20 @ 2020-10-24 1:34 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been updated by scivola20 (sciv ola).
I believe that people who can use `match?` and `match` methods properly, can use this new Regexp option properly.
By the way, the total size of ``$` ``, `$&`, `$'` equals the size of the target string. Therefore a huge amount of String garbage will be generated if the text is very large.
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid global MatchData allocations
https://bugs.ruby-lang.org/issues/17206#change-88142
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
* [ruby-core:100523] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
` (3 preceding siblings ...)
2020-10-24 1:34 ` [ruby-core:100519] " scivola20
@ 2020-10-24 14:30 ` eregontp
2020-10-24 14:51 ` [ruby-core:100524] " eregontp
2020-10-28 22:43 ` [ruby-core:100626] " scivola20
6 siblings, 0 replies; 8+ messages in thread
From: eregontp @ 2020-10-24 14:30 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been updated by Eregon (Benoit Daloze).
scivola20 (sciv ola) wrote in #note-5:
> I believe that people who can use `match?` and `match` methods properly, can use this new Regexp option properly.
I disagree: `match?` is clear, but I think `=~` suddenly not setting `$~` would be a frequent source of bugs.
> By the way, the total size of ``$` ``, `$&`, `$'` equals to the size of the target string. Therefore a huge amount of String garbage will be generated, if the text is very large.
They are all based on `$~`, aren't they?
I think they only need a copy-on-write copy of the source string (to avoid later mutations affecting them) + the matched offsets.
At least that's what happens in TruffleRuby.
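The relationship between these variables and `$~` can be checked from Ruby itself. The sketch below uses a made-up string and pattern; it shows only the observable behavior (the pieces come from the `MatchData` and survive mutation of the source), not how any particular implementation stores them internally.

```ruby
source = +"hello world"   # unary plus gives an unfrozen string
source =~ /o w/
m = $~                    # the MatchData behind $`, $& and $'

pre     = m.pre_match     # same value as $`
matched = m[0]            # same value as $&
post    = m.post_match    # same value as $'
same_as_global = (matched == $&)

# The three pieces together cover the whole source string.
sizes_add_up = pre.size + matched.size + post.size == source.size

# The match is described by offsets into the string.
offsets = [m.begin(0), m.end(0)]

# Mutating the source afterwards does not change the MatchData.
source.upcase!
after_mutation = [m.pre_match, m[0], m.post_match]

p pre, matched, post, offsets, after_mutation
```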
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid global MatchData allocations
https://bugs.ruby-lang.org/issues/17206#change-88145
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
* [ruby-core:100524] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
` (4 preceding siblings ...)
2020-10-24 14:30 ` [ruby-core:100523] " eregontp
@ 2020-10-24 14:51 ` eregontp
2020-10-28 22:43 ` [ruby-core:100626] " scivola20
6 siblings, 0 replies; 8+ messages in thread
From: eregontp @ 2020-10-24 14:51 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been updated by Eregon (Benoit Daloze).
I took a quick look; the logic to set `$~` is here:
https://github.com/ruby/ruby/blob/148961adcd0704d964fce920330a6301b9704c25/re.c#L1608-L1623
It does not seem so expensive, but the region is allocated with `xmalloc()`, which is probably not so cheap (there is also a `rb_gc()` call in there; hopefully it's not hit in practice).
`rb_backref_set()` goes through a few indirections (it typically needs to reach the caller frame), but it does not seem too expensive either.
I think it would be valuable to investigate further what's actually expensive about setting `$~` and how that can be optimized.
A hacky Regexp flag to manually optimize `match/=~/===` calls doesn't seem like a good approach to me.
The caller code knows whether it needs `$~` etc., not the Regexp literal.
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid global MatchData allocations
https://bugs.ruby-lang.org/issues/17206#change-88146
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
* [ruby-core:100626] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
` (5 preceding siblings ...)
2020-10-24 14:51 ` [ruby-core:100524] " eregontp
@ 2020-10-28 22:43 ` scivola20
6 siblings, 0 replies; 8+ messages in thread
From: scivola20 @ 2020-10-28 22:43 UTC (permalink / raw)
To: ruby-core
Issue #17206 has been updated by scivola20 (sciv ola).
Sorry. “a huge amount of String garbage” is my misunderstanding.
But I don’t know under what situation this option may cause a bug.
----------------------------------------
Feature #17206: Introduce new Regexp option to avoid global MatchData allocations
https://bugs.ruby-lang.org/issues/17206#change-88261
* Author: fatkodima (Dima Fatko)
* Status: Open
* Priority: Normal
end of thread, other threads:[~2020-10-28 22:43 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-30 15:42 [ruby-core:100239] [Ruby master Feature#17206] Introduce new Regexp option to avoid MatchData allocation fatkodima123
2020-09-30 15:58 ` [ruby-core:100240] " zn
2020-09-30 16:22 ` [ruby-core:100241] " fatkodima123
2020-09-30 18:18 ` [ruby-core:100242] [Ruby master Feature#17206] Introduce new Regexp option to avoid global MatchData allocations eregontp
2020-10-24 1:34 ` [ruby-core:100519] " scivola20
2020-10-24 14:30 ` [ruby-core:100523] " eregontp
2020-10-24 14:51 ` [ruby-core:100524] " eregontp
2020-10-28 22:43 ` [ruby-core:100626] " scivola20