ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts
@ 2021-04-27 23:39 sam.saffron
  2021-04-28  5:23 ` [ruby-core:103636] " mame
                   ` (43 more replies)
  0 siblings, 44 replies; 45+ messages in thread
From: sam.saffron @ 2021-04-27 23:39 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been reported by sam.saffron (Sam Saffron).

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
### Background

ReDoS is a very common security issue. At Discourse we have seen a few over the years. https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

In a nutshell, there are hundreds of ways this can happen in production apps; the key is for an attacker (or possibly an innocent person) to supply either a problematic Regexp or a bad string to test it with.

```
/A(B|C+)+D/ =~ "A" + "C" * 100 + "X"
```
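
To see how quickly this gets out of hand, here is a minimal timing sketch around the pattern above. The helper name `time_match` and the input sizes are illustrative assumptions; actual timings depend heavily on the Ruby version and machine, and some interpreters apply optimizations that make this particular input fast, so no expected timings are given.

```ruby
# Pattern from the report: nested quantifiers that can backtrack heavily.
PATTERN = /A(B|C+)+D/

# Time a single failed match against "A" + "C" * n + "X".
# Returns [match_result, elapsed_seconds].
def time_match(n)
  input = "A" + "C" * n + "X"
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = PATTERN =~ input
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [result, elapsed]
end

# Keep n small: on a backtracking engine the work can roughly double
# with every extra "C".
[8, 12, 16, 20].each do |n|
  result, elapsed = time_match(n)
  puts format("n=%2d result=%p elapsed=%.4fs", n, result, elapsed)
end
```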

Having a problem Regexp somewhere in a large app is a universal constant; it will happen as long as you are using Regexps.


Currently the only feasible way of supplying a consistent safeguard is to use `Thread#raise` and manage all execution. This kind of pattern requires a third-party implementation. There may be issues with JRuby and TruffleRuby when taking approaches like this.
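
For reference, a minimal sketch of that kind of safeguard using the standard library's `Timeout.timeout`, which interrupts the running block by raising into its thread. The wrapper name `with_deadline` and the `:timed_out` return value are assumptions made for illustration; every call site has to opt in, which is exactly the drawback described above.

```ruby
require "timeout"

# Run the block, giving up after `seconds`.
# Returns the block's value, or :timed_out if the deadline passed.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

input = "A" + "C" * 12 + "X"
p with_deadline(1) { /A(B|C+)+D/ =~ input }
```

Because the interruption can land at an arbitrary point in the block, this is only reasonably safe around code that holds no locks or half-updated state.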

### Prior art

.NET provides a `MatchTimeout` property per: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.matchtimeout?view=net-5.0

Java has nothing built in as far as I can tell: https://stackoverflow.com/questions/910740/cancelling-a-long-running-regex-match

Node has nothing built in as far as I can tell: https://stackoverflow.com/questions/38859506/cancel-regex-match-if-timeout


Go and Rust use RE2-style engines (Go's `regexp` package and Rust's `regex` crate), which avoid this class of DoS by limiting features. RE2 itself is available in Ruby through the re2 gem:

```
irb(main):003:0> r = RE2::Regexp.new('A(B|C+)+D')
=> #<RE2::Regexp /A(B|C+)+D/>
irb(main):004:0> r.match("A" + "C" * 100 + "X")
=> nil
```

### Proposal

Implement `Regexp.timeout`, which allows us to specify a global timeout for all Regexp operations in Ruby.

A per-Regexp setting would require massive application changes; almost all web apps would do just fine with a 1 second Regexp timeout.

If `timeout` is set to `nil`, everything would work as it does today; when set to a number of seconds, a "monitor" thread would track running regexps and time them out according to the global value.
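
A sketch of how an application might opt in at boot. The helper name `configure_regexp_timeout` is hypothetical; the `respond_to?` guard is there because a `Regexp.timeout=` setter only exists on interpreters that implement this feature (Ruby 3.2 and later, as far as I know), so the sketch is a no-op elsewhere.

```ruby
# Set a process-wide Regexp timeout if the interpreter supports it.
# Returns true when the timeout was applied, false otherwise.
def configure_regexp_timeout(seconds)
  return false unless Regexp.respond_to?(:timeout=)

  Regexp.timeout = seconds
  true
end

# Typically called once, early in application start-up.
applied = configure_regexp_timeout(1.0)
puts(applied ? "Regexp timeout set to 1s" : "Regexp timeout not supported here")
```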

### Alternatives 

I recommend against a "per Regexp" API as this decision is at the application level. You want to apply it to all regular expressions in all the gems you are consuming.

I recommend against a move to RE2 at the moment, as way too much would break.


### See also: 

https://people.cs.vt.edu/davisjam/downloads/publications/Davis-Dissertation-2020.pdf
https://levelup.gitconnected.com/the-regular-expression-denial-of-service-redos-cheat-sheet-a78d0ed7d865





-- 
https://bugs.ruby-lang.org/


* [ruby-core:103636] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
@ 2021-04-28  5:23 ` mame
  2021-04-28  7:25 ` [ruby-core:103639] " sam.saffron
                   ` (42 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame @ 2021-04-28  5:23 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


I like this idea. It might be better to use SIGVTALRM instead of a monitor thread. However, it may affect the performance of a program that repeatedly uses a small and trivial regexp. Can anyone try to implement it and evaluate the performance?

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91730


* [ruby-core:103639] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
  2021-04-28  5:23 ` [ruby-core:103636] " mame
@ 2021-04-28  7:25 ` sam.saffron
  2021-04-29  5:23 ` [ruby-core:103652] " mame
                   ` (41 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: sam.saffron @ 2021-04-28  7:25 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by sam.saffron (Sam Saffron).


I wonder if even a lightweight SIGVTALRM may be too much of a performance hit? On the upside, not needing to think about a background thread is nice!

If we are doing a background thread implementation, I would recommend dropping fidelity.

That way, every 100ms (or some other reasonable interval) we would walk all threads checking for a particular regexp execution. If you see the same globally increasing "regexp run number" twice, you know it has been running for at least 100ms.

That way ultra-short regexps pay almost no cost (recording `regexp_run_number++` in thread-local storage is about all).
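
The bookkeeping for this low-fidelity scheme could be sketched as below. This is a pure-Ruby model of the idea, not interpreter code: the class name `RunMonitor`, its methods, and the use of plain worker ids instead of real threads are all assumptions made for illustration. A real monitor would call `sweep` from a background thread every 100ms or so.

```ruby
# Models the "same run number seen twice" check.
class RunMonitor
  def initialize
    @next_run = 0
    @current = {} # worker id => run number of the match in progress
    @seen = {}    # worker id => run number observed at the previous sweep
  end

  # Called when a worker starts a regexp match: one counter bump.
  def start_run(worker)
    @current[worker] = (@next_run += 1)
  end

  # Called when the match finishes.
  def finish_run(worker)
    @current.delete(worker)
  end

  # Called periodically. Returns the workers whose current run number
  # was already present at the previous sweep, i.e. matches that have
  # been running for at least one full sweep interval.
  def sweep
    stuck = @current.select { |worker, run| @seen[worker] == run }.keys
    @seen = @current.dup
    stuck
  end
end

monitor = RunMonitor.new
monitor.start_run(:worker_a)
p monitor.sweep # first sighting of this run
p monitor.sweep # same run seen again, so :worker_a is reported
```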

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91735


* [ruby-core:103652] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
  2021-04-28  5:23 ` [ruby-core:103636] " mame
  2021-04-28  7:25 ` [ruby-core:103639] " sam.saffron
@ 2021-04-29  5:23 ` mame
  2021-04-29 13:05 ` [ruby-core:103655] " daniel
                   ` (40 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame @ 2021-04-29  5:23 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


Invoking a thread implicitly in the interpreter is troublesome. Previously, Ruby had a timer thread, but as far as I know, it was (almost) abandoned by @normalperson. If you try to revive a timer thread, you should learn the complex history first.

Another idea suggested by @naruse: simply record the start time of onig_search, calculate the elapsed time at `CHECK_INTERRUPT_IN_MATCH_AT`, and raise an exception if the time limit is exceeded. This approach is very tractable because it does not use anything asynchronous like SIGVTALRM or a thread.
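
The shape of that synchronous check can be modelled in plain Ruby as follows. This is only an illustration of the control flow, not the C implementation: the class `MatchDeadline`, its `check!` method, and the injectable `clock:` parameter are hypothetical names, and `check!` stands in for the interrupt check point inside the matcher.

```ruby
# Records a start time and raises once the limit has been exceeded.
class MatchDeadline
  class Exceeded < StandardError; end

  # `clock` is injectable so the behaviour can be exercised without waiting.
  def initialize(limit, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit = limit
    @clock = clock
    @started = clock.call
  end

  # Called at each check point in the matching loop.
  def check!
    elapsed = @clock.call - @started
    raise Exceeded, "match exceeded #{@limit}s" if elapsed > @limit
  end
end

# Stand-in for a long-running matching loop with periodic check points.
deadline = MatchDeadline.new(0.05)
begin
  loop { deadline.check! }
rescue MatchDeadline::Exceeded => e
  puts e.message
end
```

The cost concern raised later in the thread is visible here: every check point reads the clock, so how often the check runs matters.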

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91748


* [ruby-core:103655] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (2 preceding siblings ...)
  2021-04-29  5:23 ` [ruby-core:103652] " mame
@ 2021-04-29 13:05 ` daniel
  2021-04-30 20:29 ` [ruby-core:103676] " eregontp
                   ` (39 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: daniel @ 2021-04-29 13:05 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Dan0042 (Daniel DeLorme).


+1 wonderful and useful idea

Even without the DoS aspect, it's all too easy to create regexps with pathological performance that only manifests in certain edge cases, usually in production. It would be very useful if some kind of timeout exception was raised. Ideally that exception should have references (attr_reader) to both the regexp and string that caused the timeout.
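
A sketch of what such an exception might look like. The class name `RegexpTimeoutSketch` and the message format are hypothetical; only the two readers come from the suggestion above.

```ruby
# Timeout error that remembers which regexp and input were involved.
class RegexpTimeoutSketch < StandardError
  attr_reader :regexp, :string

  def initialize(regexp, string)
    @regexp = regexp
    @string = string
    # Report the input size rather than the input itself, which may be huge.
    super("regexp #{regexp.inspect} timed out on a #{string.bytesize}-byte string")
  end
end

begin
  raise RegexpTimeoutSketch.new(/A(B|C+)+D/, "A" + "C" * 100 + "X")
rescue RegexpTimeoutSketch => e
  puts e.message
  puts e.regexp.source
end
```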

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91751


* [ruby-core:103676] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (3 preceding siblings ...)
  2021-04-29 13:05 ` [ruby-core:103655] " daniel
@ 2021-04-30 20:29 ` eregontp
  2021-05-03  7:58 ` [ruby-core:103696] " sam.saffron
                   ` (38 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: eregontp @ 2021-04-30 20:29 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Eregon (Benoit Daloze).


Shouldn't an app have a global timeout per request anyway, and that would catch regexps and other things too?
Capturing the time in regexp interrupt checks is easy but sounds fairly expensive.

Could such regexps emit a warning since they can result in pathological backtracking?
Or is it too difficult to detect them / the problematic patterns evolve over time?

FWIW it's possible to use a non-backtracking regexp engine for many but not all Ruby regular expressions (`--use-truffle-regex` in TruffleRuby).
So that would be one way: warn for those regexps that cannot be implemented without backtracking, but that's probably more restrictive than needed.
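
To make the "warn for regexps that need backtracking" idea concrete, here is a deliberately rough sketch. The feature list (numbered and named backreferences, lookahead, lookbehind) and the names `BACKTRACKING_ONLY`, `needs_backtracking?` and `warn_if_backtracking` are assumptions for illustration; this is a textual heuristic on `Regexp#source`, not the check TruffleRuby performs, and it can misfire on escaped backslashes.

```ruby
# Constructs that automata-based engines typically cannot support:
# \1..\9, \k<name>, (?=...), (?!...), (?<=...), (?<!...).
BACKTRACKING_ONLY = /\\[1-9]|\\k<|\(\?<?[=!]/

# True if the pattern text contains one of the constructs above.
def needs_backtracking?(regexp)
  regexp.source.match?(BACKTRACKING_ONLY)
end

# Emit a warning for flagged patterns and return the regexp unchanged.
def warn_if_backtracking(regexp)
  warn "regexp #{regexp.inspect} may need a backtracking engine" if needs_backtracking?(regexp)
  regexp
end

warn_if_backtracking(/(a)\1/)
warn_if_backtracking(/A(B|C+)+D/)
```

Note that the pattern from the original report uses none of these constructs, so this kind of warning says nothing about it; the check is about which engine can run a pattern, not about whether the pattern is pathological on a backtracking one.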

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91771


* [ruby-core:103696] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (4 preceding siblings ...)
  2021-04-30 20:29 ` [ruby-core:103676] " eregontp
@ 2021-05-03  7:58 ` sam.saffron
  2021-05-03 13:38 ` [ruby-core:103700] " mame
                   ` (37 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: sam.saffron @ 2021-05-03  7:58 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by sam.saffron (Sam Saffron).


> Shouldn't an app have a global timeout per request anyway

Sort of, it gets complicated. Unicorn is easy because it is single-threaded. Killing off threads in Puma is much more fraught. In Sidekiq, the old pattern of killing off threads was nuked by Mike because he saw it as way too risky: https://github.com/mperham/sidekiq/commit/7e094567a585578fad0bfd0c8669efb46643f853


> Or is it too difficult to detect them / the problematic patterns evolve over time?

Sadly I think they are very hard to predict upfront. 

I do hear you though: zero cost when no timeout is defined and a very cheap cost when a timeout is defined seems non-trivial to implement.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91788


* [ruby-core:103700] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (5 preceding siblings ...)
  2021-05-03  7:58 ` [ruby-core:103696] " sam.saffron
@ 2021-05-03 13:38 ` mame
  2021-05-03 14:57 ` [ruby-core:103701] " jean.boussier
                   ` (36 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame @ 2021-05-03 13:38 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


Eregon (Benoit Daloze) wrote in #note-5:
> Shouldn't an app have a global timeout per request anyway, and that would catch regexps and other things too?

I agree that it is much better to have. Still, I think this proposal is good-to-have because, IMO, it mitigates ReDoS generally. But I admit that this is less important than per-request timeout.

> Capturing the time in regexp interrupt checks is easy but sounds fairly expensive.

Agreed, this is my main concern. Thus I think this proposal requires careful performance evaluation.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91791


* [ruby-core:103701] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (6 preceding siblings ...)
  2021-05-03 13:38 ` [ruby-core:103700] " mame
@ 2021-05-03 14:57 ` jean.boussier
  2021-05-04  0:12 ` [ruby-core:103705] " duerst
                   ` (35 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: jean.boussier @ 2021-05-03 14:57 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by byroot (Jean Boussier).


> I admit that this is less important than per-request timeout.

I'd like to second what Sam said on this. With what is now the most common way of deploying Ruby in production, namely a threaded web server like Puma, there is no good way to have a global request timeout. The only mechanism that semi-works is `Timeout.timeout`, which ultimately relies on `Thread#raise` and is very likely to leave the process in a corrupted state. Only forking servers can actually provide a reliable request timeout through `SIGTERM / SIGKILL`.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91792


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103705] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (7 preceding siblings ...)
  2021-05-03 14:57 ` [ruby-core:103701] " jean.boussier
@ 2021-05-04  0:12 ` duerst
  2021-05-04 10:31 ` [ruby-core:103710] " eregontp
                   ` (34 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: duerst @ 2021-05-04  0:12 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by duerst (Martin Dürst).


sam.saffron (Sam Saffron) wrote in #note-6:

> > Or is it too difficult to detect them / the problematic patterns evolve over time?
> 
> Sadly I think they are very hard to predict upfront. 

In general, yes. But for an extremely large set of regular expressions, it's easy to predict that they are harmless. And some specific patterns in regular expressions are clear signs that something might go wrong.
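As a rough illustration of such a warning sign, the sketch below flags a quantified group that itself contains a quantifier, which is the shape of the `/A(B|C+)+D/` example from the issue. The constant and method names are hypothetical, and the check is a deliberately crude textual heuristic: it ignores escapes, character classes, and nested groups, so it can produce both false positives and false negatives.

```ruby
# Crude heuristic: a parenthesised group containing + or * that is itself
# followed by + or *. Works on the pattern source text only.
NESTED_QUANTIFIER = /\([^()]*[+*][^()]*\)[+*]/

def suspicious?(regexp)
  NESTED_QUANTIFIER.match?(regexp.source)
end

flagged   = suspicious?(/A(B|C+)+D/) # nested quantifier
unflagged = suspicious?(/A(B|C)D/)   # plain alternation
```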


> I do hear you though, a zero cost when no timeout is defined and very cheap cost when a timeout is defined seems non trivial to implement.

I very strongly suggest that this feature be voluntary, e.g. as an additional flag on the regular expression.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91798


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103710] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (8 preceding siblings ...)
  2021-05-04  0:12 ` [ruby-core:103705] " duerst
@ 2021-05-04 10:31 ` eregontp
  2021-05-04 12:55 ` [ruby-core:103711] " daniel
                   ` (33 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: eregontp @ 2021-05-04 10:31 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Eregon (Benoit Daloze).


sam.saffron (Sam Saffron) wrote in #note-6:
> Sort of, it gets complicated. Unicorn is easy because it is single-threaded. Killing off threads in Puma is much more fraught; in Sidekiq the old pattern of killing off threads was nuked by Mike because he saw it as way too risky: https://github.com/mperham/sidekiq/commit/7e094567a585578fad0bfd0c8669efb46643f853

I think fixing Timeout.timeout might be possible.
The main/major issue is it can trigger within `ensure`, right? Is there anything else?
We could automatically mask `Thread#raise` within `ensure` so it only happens after the `ensure` body completes.
And we could still have a larger "hard timeout" if an `ensure` takes way too long (shouldn't happen, but one cannot be sure).
I recall discussing this with @schneems some time ago on Twitter.
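The masking idea can already be tried by hand with the existing `Thread.handle_interrupt` API. The sketch below is only an approximation of the proposal: the masking is written out explicitly rather than applied automatically to every `ensure`, and `SoftTimeout` and `run_with_masked_cleanup` are hypothetical names chosen for the example. Two queues are used purely to make the timing deterministic.

```ruby
class SoftTimeout < StandardError; end

def run_with_masked_cleanup
  log = []
  cleanup_started = Queue.new
  raise_sent = Queue.new

  worker = Thread.new do
    Thread.current.report_on_exception = false
    begin
      log << :work
    ensure
      # Defer SoftTimeout until this block has finished.
      Thread.handle_interrupt(SoftTimeout => :never) do
        cleanup_started << true # tell the main thread cleanup is running
        raise_sent.pop          # wait until Thread#raise has been called
        log << :cleanup_done    # cleanup still completes
      end
    end
    log << :after_ensure # never reached: the deferred exception fires first
  end

  cleanup_started.pop
  worker.raise(SoftTimeout, "request took too long")
  raise_sent << true

  begin
    worker.join # re-raises the exception that terminated the worker
  rescue SoftTimeout
    log << :timed_out
  end
  log
end

result = run_with_masked_cleanup # => [:work, :cleanup_done, :timed_out]
```

The exception is delivered as soon as the masked block exits, so the cleanup body runs to completion but the code after `ensure` does not. A "hard timeout" for an `ensure` that never finishes is not covered by this sketch.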

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91803


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103711] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (9 preceding siblings ...)
  2021-05-04 10:31 ` [ruby-core:103710] " eregontp
@ 2021-05-04 12:55 ` daniel
  2021-05-04 14:22 ` [ruby-core:103713] " jean.boussier
                   ` (32 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: daniel @ 2021-05-04 12:55 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Dan0042 (Daniel DeLorme).


duerst (Martin Dürst) wrote in #note-9:
> I very strongly suggest that this feature be voluntary, e.g. as an additional flag on the regular expression.

If you have to turn it on for each regexp, that would make the feature kinda useless. I agree with the OP that this decision is at the application level. You want it either on or off for all/most regexps. Although maybe it would make sense to override the default timeout for a few specific regexps that are known to be time-consuming or performance-critical.

Rather than `CHECK_INTERRUPT_IN_MATCH_AT` would it be feasible to check for timeouts only when backtracking occurs?

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91804


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103713] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (10 preceding siblings ...)
  2021-05-04 12:55 ` [ruby-core:103711] " daniel
@ 2021-05-04 14:22 ` jean.boussier
  2021-05-05  2:02 ` [ruby-core:103725] " duerst
                   ` (31 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: jean.boussier @ 2021-05-04 14:22 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by byroot (Jean Boussier).


> The main/major issue is it can trigger within ensure, right? Is there anything else?

That would fix most issues, but not all. It can also trigger in places where exceptions are entirely unexpected, so there's just no `ensure`.

Also I'm not clear on the details, but some C extensions (often various clients) can't be interrupted by `Thread#raise`. 

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91807


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103725] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (11 preceding siblings ...)
  2021-05-04 14:22 ` [ruby-core:103713] " jean.boussier
@ 2021-05-05  2:02 ` duerst
  2021-05-05  5:28 ` [ruby-core:103730] " Eric Wong
                   ` (30 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: duerst @ 2021-05-05  2:02 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by duerst (Martin Dürst).


Eregon (Benoit Daloze) wrote in #note-10:

> I think fixing Timeout.timeout might be possible.
> The main/major issue is it can trigger within `ensure`, right? Is there anything else?
> We could automatically mask `Thread#raise` within `ensure` so it only happens after the `ensure` body completes.
> And we could still have a larger "hard timeout" if an `ensure` takes way too long (shouldn't happen, but one cannot be sure).
> I recall discussing this with @schneems some time ago on Twitter.

I created a separate issue for the improvement of Timeout.timeout: #17849. Please feel free to discuss there. My guess is that there are all kinds of other issues that can happen in a Web application, so it would be better to solve this for the general case.

Dan0042 (Daniel DeLorme) wrote in #note-11:
> duerst (Martin Dürst) wrote in #note-9:
> > I very strongly suggest that this feature be voluntary, e.g. as an additional flag on the regular expression.
> 
> If you have to turn it on for each regexp, that would make the feature kinda useless. I agree with the OP that this decision is at the application level.

I have no problems with making it possible to switch this on at the application level.

> You want it either on or off for all/most regexps. Although it would make sense to be able to override the default timeout for a few specific regexps that are known to be time-consuming or performance-critical.

Yes. My assumption is that when writing a regular expression, the writer should make sure it's well behaved. So in general, timeouts would only be needed for regular expressions that come from the outside.

> Rather than `CHECK_INTERRUPT_IN_MATCH_AT` would it be feasible to check for timeouts only when backtracking occurs?

In a backtracking regular expression engine, backtracking occurs very often. There are many cases of backtracking that are still totally harmless.

Ideally, a regular expression engine would deal with most regular expressions in a way similar to what RE2 (or any DFA-based implementation) does, and only use a timeout for those that a DFA-based strategy cannot handle (backreferences,...). But that would require quite a bit of implementation work.

(Of course all the above discussion is predicated on the assumption that timeouts cannot be added to regular expressions with negligible speed loss.) 

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91820


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103730] Re: [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (12 preceding siblings ...)
  2021-05-05  2:02 ` [ruby-core:103725] " duerst
@ 2021-05-05  5:28 ` Eric Wong
  2021-05-06 13:30 ` [ruby-core:103760] " daniel
                   ` (29 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Eric Wong @ 2021-05-05  5:28 UTC (permalink / raw)
  To: ruby-core

sam.saffron@gmail.com wrote:
> Feature #17837: Add support for Regexp timeouts
> https://bugs.ruby-lang.org/issues/17837

> I recommend against a "per Regexp" API as this decision is at
> the application level. You want to apply it to all regular
> expressions in all the gems you are consuming.

The syscall costs are higher nowadays and this will penalize
good regexps.  IME with unicorn, global timeouts of this type
mean problems go unfixed for too long and fester into worse
problems.

Ultimately many Ruby problems come from tolerating excessively
deep/complex dependency stacks(*) and developers having too
much crap to manage.

Anecdotally, my experience with Perl5 RE is better than with
Onig*.  I know Perl5 has the same underlying problems as Onig*,
however Perl5 RE seems less bad in practice.

Again, Perl5 RE does have underlying problems, but they don't
manifest nearly as much as they do with Ruby (I've as much
or more Perl experience than I have in Ruby).

One example I remember off the top of my head is
[ruby-core:74030].   I just tested that again after 5 years:
Ruby still infinite loops; Perl still terminates as it should.

Your example translated to Perl5 also stops fine for me:

	("A" . "C" x 100 . "X") =~ /A(B|C+)+D/;

Onig* might be able to learn a thing or three from Perl5 when it
comes to common real-world cases.  Again, I know Perl5 RE has
underlying problems just like Onig*, they do not manifest as
easily.


(*) and I apologize for letting crap like unicorn become too
    popular and perpetuating the existence of buggy/broken
    code; I'll try to find more time to scare users away from it.

> I recommend against a move to RE2 at the moment as way too
> much would break 

RE2 could be done gradually, like frozen strings: %re2[foo]
Or a magic comment: "# regexp-engine: re2"

Perl supports pluggable re::engine since 2007, so more things
Ruby can learn from Perl :>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103760] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (13 preceding siblings ...)
  2021-05-05  5:28 ` [ruby-core:103730] " Eric Wong
@ 2021-05-06 13:30 ` daniel
  2021-05-06 15:20 ` [ruby-core:103761] " eregontp
                   ` (28 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: daniel @ 2021-05-06 13:30 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Dan0042 (Daniel DeLorme).


duerst (Martin Dürst) wrote in #note-14:
> In a backtracking regular expression engine, backtracking occurs very often. There are many cases of backtracking that are still totally harmless.

Even if backtracking occurs very often, my thinking was that it occurs less often than `CHECK_INTERRUPT_IN_MATCH_AT`. But now that I'm looking at regexec.c I'm not so sure anymore; I can't make heads or tails of that code. Still, the slowness of a regexp is directly correlated with how much backtracking occurs, so it would make sense to tie the timeout to that: for example, check the timeout at every 256th backtrack, if anyone can figure out what constitutes a "backtrack" in the regexec code.
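A toy model of the "check every 256th backtrack" idea, written in Ruby rather than in regexec.c: everything here (`BacktrackClock`, `MatchTimeout`, `toy_match?`) is hypothetical and only mimics a backtracking engine for the single fixed pattern `/\A(a+)+b\z/`. The point is that the clock is read once per 256 backtracks instead of on every step.

```ruby
class MatchTimeout < StandardError; end

# Counts backtracks and reads the monotonic clock only every CHECK_EVERY-th one.
class BacktrackClock
  CHECK_EVERY = 256
  attr_reader :backtracks, :clock_reads

  def initialize(timeout)
    @deadline = now + timeout
    @backtracks = 0
    @clock_reads = 0
  end

  def backtrack!
    @backtracks += 1
    return unless (@backtracks % CHECK_EVERY).zero?

    @clock_reads += 1
    raise MatchTimeout, "gave up after #{@backtracks} backtracks" if now > @deadline
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

# Toy backtracking matcher equivalent to /\A(a+)+b\z/.
def toy_match?(str, clock, pos = 0)
  run = 0
  run += 1 while str[pos + run] == "a"
  # Try every length for the next a+ group, longest first.
  run.downto(1) do |len|
    nxt = pos + len
    return true if str[nxt] == "b" && nxt + 1 == str.length
    return true if toy_match?(str, clock, nxt)

    clock.backtrack! # this choice failed; a real engine would pop its stack here
  end
  false
end

toy_match?("aaab", BacktrackClock.new(1.0)) # matches without backtracking
begin
  toy_match?("a" * 40 + "X", BacktrackClock.new(0.05))
rescue MatchTimeout => e
  e.message # the pathological input is abandoned once the deadline passes
end
```

In this toy, harmless matches never touch the clock at all, while a pathological input pays for one clock read per 256 failed choices. Whether the same ratio is a good trade-off inside Onigmo is an open question this sketch does not answer.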

> Ideally, a regular expression engine would deal with most regular expressions in a way similar to what RE2 (or any DFA-based implementation) does

That's rather out of scope for this ticket, isn't it?

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91870


^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103761] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (14 preceding siblings ...)
  2021-05-06 13:30 ` [ruby-core:103760] " daniel
@ 2021-05-06 15:20 ` eregontp
  2021-05-07  6:03 ` [ruby-core:103769] " nobu
                   ` (27 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: eregontp @ 2021-05-06 15:20 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Eregon (Benoit Daloze).


I noticed there is a reply in [ruby-core:103730] by @normalperson that unfortunately doesn't seem to be mirrored on Redmine:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/103730

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91872

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103769] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (15 preceding siblings ...)
  2021-05-06 15:20 ` [ruby-core:103761] " eregontp
@ 2021-05-07  6:03 ` nobu
  2021-05-07  7:57 ` [ruby-core:103770] " mame
                   ` (26 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: nobu @ 2021-05-07  6:03 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by nobu (Nobuyoshi Nakada).


I think that a backtracking limit would be better than a timeout.


----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91880

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103770] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (16 preceding siblings ...)
  2021-05-07  6:03 ` [ruby-core:103769] " nobu
@ 2021-05-07  7:57 ` mame
  2021-05-07  8:14 ` [ruby-core:103771] " sam.saffron
                   ` (25 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame @ 2021-05-07  7:57 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


I've created a simple prototype of `Regexp.timeout=` using a polling approach.

Conclusion first: unfortunately, it brings about 5% overhead in a micro-benchmark.
I guess it is unlikely to be significant in a real application, but it is not good anyway.

---

The following is about my patch, just for the record.

https://github.com/ruby/ruby/compare/master...mame:regexp-timeout-prototype

Implementation approach:

1. When starting regexp matching, the current time is recorded by using `clock_gettime(CLOCK_MONOTONIC)`
2. At `CHECK_INTERRUPT_IN_MATCH_AT`, the elapsed time is calculated and an exception is raised if expired
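
The two steps above can be sketched in plain Ruby. This is only a toy model of the polling idea, not the C patch itself: `DeadlineExceeded`, `monotonic_now`, `with_deadline` and `busy_loop` are hypothetical names, and the loop merely stands in for the matcher's interrupt check points.

```ruby
# Toy model of the polling approach: record a deadline when the work starts,
# then compare the monotonic clock against it at every check point.
class DeadlineExceeded < RuntimeError; end

def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Step 1: record the deadline before the work begins.
# The block receives a checker to call at each "interrupt check point".
def with_deadline(limit_seconds)
  started = monotonic_now
  deadline = limit_seconds && (started + limit_seconds)
  check = lambda do
    # Step 2: raise once the deadline has passed (no-op when the limit is nil).
    raise DeadlineExceeded, "match timeout" if deadline && monotonic_now > deadline
  end
  yield check
end

# Stand-in for an expensive matching loop that polls the checker.
def busy_loop(iterations, check)
  iterations.times { check.call }
  :finished
end

# With no limit set, the work simply runs to completion.
result = with_deadline(nil) { |check| busy_loop(1_000, check) }

# With a 10 ms limit, an endless loop is stopped by the deadline check.
timed_out =
  begin
    with_deadline(0.01) { |check| loop { check.call } }
    false
  rescue DeadlineExceeded
    true
  end
```

The cost of this design is the clock read at every check point, which is what the benchmark numbers below are measuring.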

Example:

```
Regexp.timeout = 1 # one second
/^(([a-z])+)+$/ =~ "abcdefghijklmnopqrstuvwxyz@" #=> regexp match timeout (RuntimeError)
```

Benchmark:

The following simple regexp matching becomes 4.8% slower.

```
10000000.times { /(abc)+/ =~ "abcabcabc" }

# The minimum time in 10 executions
# before: 1.962 s
# after:  2.056 s
```

The following complex regexp matching becomes 5.2% slower.

```
/^(([a-z])+)+$/ =~ "abcdefghijklmnopqrstuvwxyz@"

# The minimum time in 10 executions
# before: 2.237 s
# after:  2.353 s
```
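
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a "minimum time over N executions" harness. The helper name `min_time` is an assumption of this sketch, and the iteration count is reduced from the 10,000,000 used above so that it finishes quickly; absolute numbers will differ from machine to machine.

```ruby
# Run the given block `runs` times and return the fastest wall-clock time
# in seconds, measured with the monotonic clock.
def min_time(runs: 10)
  Array.new(runs) do
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end.min
end

# Reduced iteration count (assumption of this sketch, not the original setup).
fastest = min_time { 100_000.times { /(abc)+/ =~ "abcabcabc" } }
puts format("fastest of 10 runs: %.3f s", fastest)
```

Taking the minimum rather than the mean reduces the influence of scheduling noise on the comparison.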

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91882

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103771] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (17 preceding siblings ...)
  2021-05-07  7:57 ` [ruby-core:103770] " mame
@ 2021-05-07  8:14 ` sam.saffron
  2021-05-11  7:33 ` [ruby-core:103780] " sam.saffron
                   ` (24 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: sam.saffron @ 2021-05-07  8:14 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by sam.saffron (Sam Saffron).


@mame Not sure if the compiler takes care of this, but maybe we can avoid the calls to `GET_THREAD` when the static `reg_match_time_limit` is not set, and simply bypass all of this in that case?

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91883

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103780] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (18 preceding siblings ...)
  2021-05-07  8:14 ` [ruby-core:103771] " sam.saffron
@ 2021-05-11  7:33 ` sam.saffron
  2021-05-11 11:25 ` [ruby-core:103784] " mame
                   ` (23 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: sam.saffron @ 2021-05-11  7:33 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by sam.saffron (Sam Saffron).


I tested with: 

```
diff --git a/thread.c b/thread.c
index 47e43ecb63..811b6e88a8 100644
--- a/thread.c
+++ b/thread.c
@@ -1573,25 +1573,29 @@ rb_thread_reg_match_time_limit_get()
 void
 rb_thread_reg_match_start(void)
 {
-    rb_thread_t *th = GET_THREAD();
     if (reg_match_time_limit) {
-        th->reg_match_end_time = rb_hrtime_add(reg_match_time_limit, rb_hrtime_now());
-    }
-    else {
-        th->reg_match_end_time = 0;
+       rb_thread_t *th = GET_THREAD();
+       if (reg_match_time_limit) {
+           th->reg_match_end_time = rb_hrtime_add(reg_match_time_limit, rb_hrtime_now());
+       }
+       else {
+           th->reg_match_end_time = 0;
+       }
     }
 }
 
 void
 rb_thread_reg_check_ints(void)
 {
-    rb_thread_t *th = GET_THREAD();
+    if (reg_match_time_limit) {
+       rb_thread_t *th = GET_THREAD();
 
-    if (th->reg_match_end_time && th->reg_match_end_time < rb_hrtime_now()) {
-        VALUE argv[2];
-        argv[0] = rb_eRuntimeError;
-        argv[1] = rb_str_new2("regexp match timeout");
-        rb_threadptr_raise(th, 2, argv);
+       if (th->reg_match_end_time && th->reg_match_end_time < rb_hrtime_now()) {
+           VALUE argv[2];
+           argv[0] = rb_eRuntimeError;
+           argv[1] = rb_str_new2("regexp match timeout");
+           rb_threadptr_raise(th, 2, argv);
+       }
     }
 
     rb_thread_check_ints();
```

'10000000.times { /(abc)+/ =~ "abcabcabc" }'

Before (min over 10 runs): 1.590 s, after: 1.610 s, ~1.2% slower


I can't figure out, though, how to win back the performance on the big regex.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91893

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103784] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (19 preceding siblings ...)
  2021-05-11  7:33 ` [ruby-core:103780] " sam.saffron
@ 2021-05-11 11:25 ` mame
  2021-05-11 12:23 ` [ruby-core:103785] " nobu
                   ` (22 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame @ 2021-05-11 11:25 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


Interesting. I've tested your patch, but the overhead is not so small on my machine.

'10000000.times { /(abc)+/ =~ "abcabcabc" }'
Before (min over 10 runs): 1.962 s, after: 2.037 s, ~3.8% slower

It may depend on the CPU's branch prediction performance.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91896

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103785] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (20 preceding siblings ...)
  2021-05-11 11:25 ` [ruby-core:103784] " mame
@ 2021-05-11 12:23 ` nobu
  2021-05-11 22:22 ` [ruby-core:103789] " sam.saffron
                   ` (21 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: nobu @ 2021-05-11 12:23 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by nobu (Nobuyoshi Nakada).


I made a patch for `Regexp#backtrack_limit=`.
It seems to make no significant performance difference.

https://github.com/ruby/ruby/compare/master...nobu:Regexp.backtrack_limit?expand=1
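
To illustrate what a backtrack limit means in principle, here is a toy sketch. It is not the patch linked above and does not touch Ruby's regexp engine: `BacktrackLimitExceeded` and `groups_then_b?` are hypothetical names, and the hand-written matcher only handles the single pattern `/\A(a+)+b\z/`. It counts every choice point it tries and gives up once the count exceeds the limit.

```ruby
class BacktrackLimitExceeded < RuntimeError; end

# Naive backtracking matcher for /\A(a+)+b\z/.
# `counter` accumulates the number of choice points tried so far.
def groups_then_b?(str, pos, counter, limit)
  # Success: at least one group was consumed and only a final "b" remains.
  return true if pos > 0 && pos == str.length - 1 && str[pos] == "b"

  # Find the end of the run of "a" characters starting at pos.
  run_end = pos
  run_end += 1 while str[run_end] == "a"

  # Try every possible length for one (a+) group, longest first.
  run_end.downto(pos + 1) do |stop|
    counter[:steps] += 1
    if limit && counter[:steps] > limit
      raise BacktrackLimitExceeded, "backtrack limit #{limit} exceeded"
    end
    return true if groups_then_b?(str, stop, counter, limit)
  end
  false
end

# A matching input needs only one step.
ok = groups_then_b?("aaab", 0, { steps: 0 }, 10_000)

# A non-matching input with 30 "a"s would need about 2**30 steps;
# the limit stops the search early instead.
stopped =
  begin
    groups_then_b?("a" * 30 + "c", 0, { steps: 0 }, 10_000)
    false
  rescue BacktrackLimitExceeded
    true
  end
```

Unlike a wall-clock timeout, a step count like this is deterministic and needs no clock reads, which may be one reason such a limit shows little overhead.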

I don't think a per-regexp timeout is a good idea.
To avoid DoS on the server side, the timeout should be per-session, I think.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91897

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103789] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (21 preceding siblings ...)
  2021-05-11 12:23 ` [ruby-core:103785] " nobu
@ 2021-05-11 22:22 ` sam.saffron
  2021-05-11 22:23 ` [ruby-core:103790] " sam.saffron
                   ` (20 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: sam.saffron @ 2021-05-11 22:22 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by sam.saffron (Sam Saffron).


@nobu I follow but unfortunately there are many ways in which `thread.raise` can corrupt internal state. 

See: https://github.com/mperham/sidekiq/issues/852 

Discussion goes back to 2008 on this one: http://blog.headius.com/2008/02/ruby-threadraise-threadkill-timeoutrb.html

Not providing any mechanism for safe timeouts on certain operations means you would need to drain+recycle entire processes on timeout for multi-threaded services (like the very popular Sidekiq and Puma). 

Databases provide timeouts on operations, as do many other tools. They allow for partial mitigation. Regexps are very commonly overlooked.


Just in the last few weeks we had a whole Rails security issue on this exact problem, plus 3 items on Discourse. It is incredibly common.

Thanks so much for the backtrack_limit patch, I will test it out. Honestly, most apps would do just fine with a 30 second timeout, which in practice may simply mean setting the backtrack limit to an outrageously high number.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91902

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

* [ruby-core:103790] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (22 preceding siblings ...)
  2021-05-11 22:22 ` [ruby-core:103789] " sam.saffron
@ 2021-05-11 22:23 ` sam.saffron
  2021-05-12 15:13 ` [ruby-core:103800] " get.codetriage
                   ` (19 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: sam.saffron @ 2021-05-11 22:23 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by sam.saffron (Sam Saffron).


An alternative may be something like: 

`Thread.safe_raise` which allows for raising in places we consider "safe" like mid-regex. Not sure... 
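
For comparison, Ruby already has `Thread.handle_interrupt`, which lets the *target* thread defer an asynchronous `Thread#raise` until it leaves a protected region. The sketch below is not the `Thread.safe_raise` idea above (no such method exists); it only shows the existing opt-in mechanism. `SoftTimeout` and the queue names are assumptions of this sketch.

```ruby
class SoftTimeout < RuntimeError; end

log = []
ready = Queue.new   # worker -> main: "I am inside the protected region"
go = Queue.new      # main -> worker: "the exception has been sent"

worker = Thread.new do
  begin
    # Inside this block, SoftTimeout raised from another thread is deferred.
    Thread.handle_interrupt(SoftTimeout => :never) do
      ready << true
      go.pop
      log << :critical_section_completed
      log << (Thread.pending_interrupt? ? :interrupt_pending : :no_interrupt)
    end
    # The deferred exception is delivered when the :never block ends.
  rescue SoftTimeout
    log << :interrupted_after_critical_section
  end
end

ready.pop                              # wait until the worker is protected
worker.raise(SoftTimeout, "too slow")  # asynchronous raise, deferred by the worker
go << :continue
worker.join
```

The limitation is that every piece of code that needs protection has to opt in, which is different from marking a few known-safe places where raising is allowed.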

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91903

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103800] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (23 preceding siblings ...)
  2021-05-11 22:23 ` [ruby-core:103790] " sam.saffron
@ 2021-05-12 15:13 ` get.codetriage
  2021-05-12 22:18 ` [ruby-core:103809] " daniel
                   ` (18 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: get.codetriage @ 2021-05-12 15:13 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by schneems (Richard Schneeman).


I commented on https://bugs.ruby-lang.org/issues/17849 which I would LOVE to see some movement on. I support being able to have high-level "safe" timeouts. I also support a separate effort to improve this pathological regex DoS problem though I don't have specific opinions on low-level details on implementation yet.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91915

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103809] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (24 preceding siblings ...)
  2021-05-12 15:13 ` [ruby-core:103800] " get.codetriage
@ 2021-05-12 22:18 ` daniel
  2021-05-21 18:00 ` [ruby-core:103953] " mame
                   ` (17 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: daniel @ 2021-05-12 22:18 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Dan0042 (Daniel DeLorme).


nobu (Nobuyoshi Nakada) wrote in #note-22:
> I made a patch for `Regexp#backtrack_limit=`.
> This seems no significant performance difference.

This is really perfect, isn't it? It's much better than a wall-clock timeout. If you have 2 active threads then a 1s timeout really means 0.5s of execution time; this approach is closer to a CPU-time-based timeout, and there's no syscall overhead.
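
To make the wall-clock versus CPU-time distinction concrete, here is a minimal sketch using `Process.clock_gettime`. The `sleep` call is only a stand-in (an assumption made for illustration) for time during which this thread is not running; it is not part of nobu's patch and says nothing about how the patch counts backtracks.

```ruby
# Compare wall-clock time with CPU time consumed by this process.
# `sleep` stands in for time spent waiting while other threads run.
wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
cpu_start  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

sleep 0.2                    # waiting: wall clock advances, CPU time barely moves
100_000.times { |i| i * i }  # working: both clocks advance

wall_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
cpu_elapsed  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_start

puts format("wall: %.3fs, cpu: %.3fs", wall_elapsed, cpu_elapsed)
```

A wall-clock timeout would charge the match for the whole elapsed interval, whereas a backtrack counter only advances while the matcher is actually doing work.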

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91924

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:103953] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (25 preceding siblings ...)
  2021-05-12 22:18 ` [ruby-core:103809] " daniel
@ 2021-05-21 18:00 ` mame
  2021-07-14 11:59 ` [ruby-core:104598] " duerst
                   ` (16 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame @ 2021-05-21 18:00 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


We discussed this issue in today's dev-meeting. We agreed that, if we can find a good enough threshold, `Regexp#backtrack_limit=` is better than `Regexp#timeout=`. For example,

* the threshold should stop almost all practical Regexps that run in about one minute, and
* the threshold should NOT stop almost all practical Regexps that end within a few seconds at most.

Of course, it depends on the input and regexps, so we need to evaluate this in practical settings. We will wait for @sam.saffron's experiment.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-92085

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:104598] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (26 preceding siblings ...)
  2021-05-21 18:00 ` [ruby-core:103953] " mame
@ 2021-07-14 11:59 ` duerst
  2021-10-17 13:55 ` [ruby-core:105656] " Dan0042 (Daniel DeLorme)
                   ` (15 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: duerst @ 2021-07-14 11:59 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by duerst (Martin Dürst).


nobu (Nobuyoshi Nakada) wrote in #note-22:
> I made a patch for `Regexp#backtrack_limit=`.
> This seems no significant performance difference.
> 
> https://github.com/ruby/ruby/compare/master...nobu:Regexp.backtrack_limit?expand=1

I have looked at this patch. I think this is the general direction to go. I also think that the interface/API looks good; a keyword argument on `Regexp.new` might be a good addition, too.

I installed the patch and ran some experiments. I started with a very slow Regexp that I use to show my students. It can be made of any length n, but gets really slow when n grows, to the order of O(2**n). I realized that it actually may be some kind of worst case, because it's not really doing much except for backtracking. That means that the overhead of counting the backtracks will show very clearly. Any more 'average' example should slow down quite a bit less.

So here is the program I used:
```Ruby
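# True only on a Ruby built with the experimental backtrack_limit patch.
HAS_BACKTRACK_LIMIT = Regexp.respond_to? :backtrack_limit

# Match 'a' * n against the pattern 'a?' * n + 'a' * n ten times and
# print the total time taken.
def slow_find(n)
  s = 'a' * n
  r = Regexp.compile('a?' * n + s)
  m = nil
  t1 = Time.now
  10.times { m = s.match(r) }
  t2 = Time.now
  print "n: #{n}, time: #{t2-t1}/10"
  # MatchData#backtrack_count is provided by the patch, not by stock Ruby.
  print ", backtrack_count: #{m.backtrack_count}" if HAS_BACKTRACK_LIMIT
  puts
end

(25..29).each do |i|
  slow_find i
end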
```

You can easily adjust this by changing the range `(25..29)` (the values of n used) and the two instances of `10` (the number of times a match is run in a row).

Here are the results for the patch:
```
duerst@Kloentalersee:~/nobuRuby$ ./ruby try_backtrack_limit.rb
n: 25, time: 2.8453695/10, backtrack_count: 33554431
n: 26, time: 5.6392941/10, backtrack_count: 67108863
n: 27, time: 11.3532755/10, backtrack_count: 134217727
n: 28, time: 24.1388335/10, backtrack_count: 268435455
n: 29, time: 49.084651/10, backtrack_count: 536870911
duerst@Kloentalersee:~/nobuRuby$ ./ruby try_backtrack_limit.rb
n: 25, time: 2.7971486/10, backtrack_count: 33554431
n: 26, time: 5.9609293/10, backtrack_count: 67108863
n: 27, time: 12.126138/10, backtrack_count: 134217727
n: 28, time: 24.7895166/10, backtrack_count: 268435455
n: 29, time: 49.6923646/10, backtrack_count: 536870911
duerst@Kloentalersee:~/nobuRuby$ ./ruby try_backtrack_limit.rb
n: 25, time: 2.8213545/10, backtrack_count: 33554431
n: 26, time: 6.1295964/10, backtrack_count: 67108863
n: 27, time: 12.1948968/10, backtrack_count: 134217727
n: 28, time: 24.6284841/10, backtrack_count: 268435455
n: 29, time: 48.6898231/10, backtrack_count: 536870911
```

And here are the results without the patch:
```
duerst@Kloentalersee:~/ruby3$ ./ruby ../nobuRuby/try_backtrack_limit.rb
n: 25, time: 2.6384167/10
n: 26, time: 5.2395088/10
n: 27, time: 11.3225276/10
n: 28, time: 23.289667/10
n: 29, time: 45.9637488/10
duerst@Kloentalersee:~/ruby3$ ./ruby ../nobuRuby/try_backtrack_limit.rb
n: 25, time: 2.5845849/10
n: 26, time: 5.2094378/10
n: 27, time: 10.5159888/10
n: 28, time: 22.5549276/10
n: 29, time: 45.600226/10
duerst@Kloentalersee:~/ruby3$ ./ruby ../nobuRuby/try_backtrack_limit.rb
n: 25, time: 2.5993792/10
n: 26, time: 5.2897985/10
n: 27, time: 11.2203586/10
n: 28, time: 23.1157868/10
n: 29, time: 45.0094087/10
```

These results were obtained on a WSL2/Ubuntu install on Windows 10. All other user programs were switched off, which on Windows doesn't mean there's nothing else running, of course. It should be clear from the above results that the difference is around 5%, maybe a bit higher, but not 10%.
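
For readers who want to check the estimate against the numbers, the following sketch averages the three runs for each n and prints the relative slowdown of the patched build. The timings are copied from the output above; the constant and helper names are made up for this illustration.

```ruby
# Timings (seconds per 10 matches) copied from the three runs reported above.
PATCHED = {
  25 => [2.8453695, 2.7971486, 2.8213545],
  26 => [5.6392941, 5.9609293, 6.1295964],
  27 => [11.3532755, 12.126138, 12.1948968],
  28 => [24.1388335, 24.7895166, 24.6284841],
  29 => [49.084651, 49.6923646, 48.6898231],
}.freeze

UNPATCHED = {
  25 => [2.6384167, 2.5845849, 2.5993792],
  26 => [5.2395088, 5.2094378, 5.2897985],
  27 => [11.3225276, 10.5159888, 11.2203586],
  28 => [23.289667, 22.5549276, 23.1157868],
  29 => [45.9637488, 45.600226, 45.0094087],
}.freeze

def mean(values)
  values.sum / values.size
end

# Relative slowdown of the patched build for each n, as a fraction.
overheads = PATCHED.to_h do |n, times|
  [n, mean(times) / mean(UNPATCHED.fetch(n)) - 1.0]
end

overheads.each do |n, overhead|
  puts format("n: %d, overhead: %.1f%%", n, overhead * 100)
end
```

With only three runs per build on a machine with background activity, the per-n figures should be read as rough indications rather than precise measurements.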

As I already said, that's for what I think is pretty much the worst case. All this Regexp does is backtrack in a binary tree of depth n, testing out all the combinations of choosing 'a' or not 'a' in the first half of the Regexp (which is just "a?a?a?a?...."). Every time it looks for an 'a' in that part, it finds one. But then (except for the very last try) it cannot match the second part of the Regexp (just n "a"s) to the rest of the string (which is also just n "a"s). For that, it actually doesn't need time, because this part is optimized with a Boyer-Moore algorithm, which means it just checks that the last "a" in the Regexp is beyond the actual string and so there's no match. This can be seen from the debug output when compiling Ruby with
```
#define ONIG_DEBUG_PARSE_TREE
#define ONIG_DEBUG_COMPILE
```
in regint.h, which results in the following:

```
$ ./ruby -e 'Regexp.new("a?a?a?aaa")'
`RubyGems' were not loaded.
`error_highlight' was not loaded.
`did_you_mean' was not loaded.

PATTERN: /a?a?a?aaa/ (US-ASCII)
<list:55e37d443dc0>
   <quantifier:55e37d443d80>{0,1}
      <string:55e37d443d40>a
   <quantifier:55e37d443e40>{0,1}
      <string:55e37d443e00>a
   <quantifier:55e37d445030>{0,1}
      <string:55e37d444ff0>a
   <string:55e37d4450b0>aaa
optimize: EXACT_BM
  anchor: []
  sub anchor: []

exact: [aaa]: length: 3
code length: 26
0:[push:(+2)] 5:[exact1:a] 7:[push:(+2)] 12:[exact1:a] 14:[push:(+2)]
19:[exact1:a] 21:[exact3:aaa] 25:[end] 
```

So this should give everybody some indication of the worst slowdown with this new backtrack_limit feature. Results for some more "average" scenarios would also help.
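
The binary-tree picture can be cross-checked against the reported `backtrack_count` values: a full binary tree of depth n has 2**n - 1 branching nodes, and the sketch below compares that figure with the counts printed by the patched build. The counts are copied from the output above; the helper name is hypothetical.

```ruby
# backtrack_count values copied from the patched runs reported above.
REPORTED_COUNTS = {
  25 => 33_554_431,
  26 => 67_108_863,
  27 => 134_217_727,
  28 => 268_435_455,
  29 => 536_870_911,
}.freeze

# A full binary tree of depth n has 2**n - 1 internal (branching) nodes.
def branching_nodes(depth)
  2**depth - 1
end

REPORTED_COUNTS.each do |n, reported|
  puts "n: #{n}, reported: #{reported}, 2**n - 1: #{branching_nodes(n)}"
end
```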

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-92884

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:105656] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (27 preceding siblings ...)
  2021-07-14 11:59 ` [ruby-core:104598] " duerst
@ 2021-10-17 13:55 ` Dan0042 (Daniel DeLorme)
  2021-10-25  3:09 ` [ruby-core:105770] " mame (Yusuke Endoh)
                   ` (14 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Dan0042 (Daniel DeLorme) @ 2021-10-17 13:55 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Dan0042 (Daniel DeLorme).


So if we have 536870911 backtracks per 48.6898231/10 seconds, that comes out to roughly 110M backtracks per second.
How about fixing a safe limit of 60s -> 6600M backtracks?
Since it only stops the most pathological regexp after 60s, that means it will definitely NOT stop all practical Regexps that end within a few seconds at most.
If this was incorporated in ruby 3.1 it would allow testing in real-world applications in order to find a lower threshold that should stop almost all practical Regexps that run in about one minute.
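
The arithmetic behind those two figures, as a short sketch. The inputs are the n = 29 numbers from the last patched run in Martin's message; the variable names are only for illustration.

```ruby
# Figures taken from the last patched run reported earlier (n = 29).
backtracks        = 536_870_911
seconds_per_10    = 48.6898231
seconds_per_match = seconds_per_10 / 10

rate  = backtracks / seconds_per_match   # backtracks per second
limit = rate * 60                        # backtracks in 60 seconds at that rate

puts format("rate: %.0fM backtracks/s", rate / 1e6)
puts format("60 s budget: %.0fM backtracks", limit / 1e6)
```

This rate comes from a single worst-case pattern on one machine, so a limit derived from it is a starting point for experiments rather than a universal constant.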

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-94156

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:105770] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (28 preceding siblings ...)
  2021-10-17 13:55 ` [ruby-core:105656] " Dan0042 (Daniel DeLorme)
@ 2021-10-25  3:09 ` mame (Yusuke Endoh)
  2021-10-25  4:26 ` [ruby-core:105772] " mame (Yusuke Endoh)
                   ` (13 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2021-10-25  3:09 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


Discussed at dev-meeting today.

In summary, there are two proposals, `Regexp.timeout=` and `Regexp.backtrack_limit=`, which have a trade-off.

* `Regexp.timeout=` is easy to use in practical applications, but makes the regexp matching slow (especially in simple regexp cases)
* `Regexp.backtrack_limit=` introduces little runtime overhead, but it is difficult to decide on a good de facto limit.

@ko1 suggested mixing the two ideas: enabling the time limit only after 10,000 backtracks have occurred. This will introduce no overhead for many simple regexps, and provide an easy-to-use time-based API.

I will try to implement this and experiment with it later. (Contributions are welcome.)
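
A rough sketch of how the mixed approach could behave, written in plain Ruby rather than inside the regexp engine. Everything here is hypothetical: `BacktrackGuard`, `MatchTimeout`, and `toy_match?` are names invented for this illustration, and the toy matcher only models the `'a?' * n + 'a' * n` pattern from Martin's benchmark. It is not how Onigmo or any actual patch implements the idea.

```ruby
# Hypothetical names throughout; this is not Ruby's or Onigmo's implementation.
class MatchTimeout < StandardError; end

class BacktrackGuard
  attr_reader :backtracks

  def initialize(timeout:, threshold: 10_000)
    @timeout = timeout       # seconds of wall-clock time allowed
    @threshold = threshold   # backtracks allowed before the clock is consulted
    @backtracks = 0
    @started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end

  # Call once per backtrack. Below the threshold this is only a counter bump;
  # the clock is read only once the match has proven to be backtrack-heavy.
  def backtrack!
    @backtracks += 1
    return if @backtracks <= @threshold

    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started
    raise MatchTimeout, "gave up after #{@backtracks} backtracks" if elapsed > @timeout
  end
end

# Toy matcher for the pattern 'a?' * n + 'a' * n against the string 'a' * len.
# Each optional 'a' first tries to consume a character, then backtracks to
# matching the empty string.
def toy_match?(n, len, guard, opt_index = 0, pos = 0)
  # All optionals decided: the literal tail needs n more characters.
  return len - pos >= n if opt_index == n

  # First choice: the optional 'a' consumes one character.
  return true if pos < len && toy_match?(n, len, guard, opt_index + 1, pos + 1)

  guard.backtrack!
  # Second choice: the optional 'a' matches the empty string.
  toy_match?(n, len, guard, opt_index + 1, pos)
end

guard = BacktrackGuard.new(timeout: 1.0)
puts toy_match?(10, 10, guard)
puts guard.backtracks
```

A match that stays under the threshold never reads the clock, which is the property that keeps simple regexps free of timing overhead; a pathological match crosses the threshold quickly and is then bounded by the time limit.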

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-94284

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:105772] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (29 preceding siblings ...)
  2021-10-25  3:09 ` [ruby-core:105770] " mame (Yusuke Endoh)
@ 2021-10-25  4:26 ` mame (Yusuke Endoh)
  2021-10-25  4:39 ` [ruby-core:105773] " mame (Yusuke Endoh)
                   ` (12 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2021-10-25  4:26 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


mame (Yusuke Endoh) wrote in #note-31:
> @ko1 suggested mixing the two ideas

According to @ko1, @knu suggested it first. Sorry for my wrong credit.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-94286

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:105773] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (30 preceding siblings ...)
  2021-10-25  4:26 ` [ruby-core:105772] " mame (Yusuke Endoh)
@ 2021-10-25  4:39 ` mame (Yusuke Endoh)
  2021-10-25 11:17 ` [ruby-core:105787] " Eregon (Benoit Daloze)
                   ` (11 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2021-10-25  4:39 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


For the record: this API will not interrupt the execution of a regexp that contains no interrupt check. Typically, a regexp that is simply very long, like `/xxxxxxxxx....{so long}...xxx/`, may not be interrupted even after the time limit is exceeded. The documentation for this API should note this.

In short, `/#{untrusted_input}/ =~ something` will remain vulnerable to DoS even after this API is used.
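
To illustrate the last point, here is a minimal sketch contrasting direct interpolation with escaped interpolation. `Regexp.escape` is a standard method; the helper name `literal_pattern` and the sample input are made up for this example, and escaping only helps when the untrusted text is meant to be matched literally. It is a separate mitigation, not part of this proposal.

```ruby
# Hypothetical helper: build a Regexp that matches the untrusted text literally.
def literal_pattern(untrusted_input)
  Regexp.new(Regexp.escape(untrusted_input))
end

untrusted_input = "(a+)+$"          # attacker-chosen text that is also a valid pattern

as_pattern = /#{untrusted_input}/   # the attacker controls what the engine executes
as_literal = literal_pattern(untrusted_input)

p as_pattern.match?("aaa")          # the input is interpreted as a regexp
p as_literal.match?("aaa")          # the input is only ever matched as plain text
p as_literal.match?("x(a+)+$")
```

When the application really needs user-supplied patterns, escaping is not an option, and the caveat above about the timeout still applies.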

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-94287

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:105787] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (31 preceding siblings ...)
  2021-10-25  4:39 ` [ruby-core:105773] " mame (Yusuke Endoh)
@ 2021-10-25 11:17 ` Eregon (Benoit Daloze)
  2021-10-25 16:42 ` [ruby-core:105791] " mame (Yusuke Endoh)
                   ` (10 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Eregon (Benoit Daloze) @ 2021-10-25 11:17 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Eregon (Benoit Daloze).


What if the time between two backtracks is much larger for some Regexp? Isn't that possible when many characters are matched before a possible backtrack at the end (e.g., something like `/(a{100000}|b{100000})*/`)?
If so, 10000 backtracks could take either microseconds or seconds, i.e., the count is not necessarily related to time, and the approach would not work for some Regexps that backtrack.

IMHO a better solution to this is https://youtu.be/DYPCkR7Ngx8?t=1231 / https://eregon.me/blog/assets/research/just-in-time-compiling-ruby-regexps-on-truffleruby.pdf slide 18 (which is what TruffleRuby does),
i.e., use an automaton-based regexp engine (which always matches in linear time) and warn for regexps which can't be run by it (called "slow regexps").
Those slow regexps should then be reviewed and ideally rewritten so they can be matched by the automaton-based regexp engine.
They could also have a timeout if needed, with much less impact than on all regexps.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-94304

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:105791] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (32 preceding siblings ...)
  2021-10-25 11:17 ` [ruby-core:105787] " Eregon (Benoit Daloze)
@ 2021-10-25 16:42 ` mame (Yusuke Endoh)
  2021-10-25 17:21 ` [ruby-core:105793] " Dan0042 (Daniel DeLorme)
                   ` (9 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2021-10-25 16:42 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


In my understanding, this feature is just a practical workaround to prevent Regexp DoS. One can craft a regexp that skirts this feature, but it would have worked well against all the regexps that I have seen reported as DoS issues.

I agree that using an automaton-based regexp engine is a smarter solution, but it requires changes in Ruby code. This feature, on the other hand, is not the best solution, but it will mitigate many Regexp DoS issues with minimal incompatibility.

I hope we can introduce this feature as a short-term DoS mitigation in the near future (if possible, in Ruby 3.1).

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-94309

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:105793] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (33 preceding siblings ...)
  2021-10-25 16:42 ` [ruby-core:105791] " mame (Yusuke Endoh)
@ 2021-10-25 17:21 ` Dan0042 (Daniel DeLorme)
  2022-03-22  6:45 ` [ruby-core:108015] " mame (Yusuke Endoh)
                   ` (8 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Dan0042 (Daniel DeLorme) @ 2021-10-25 17:21 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Dan0042 (Daniel DeLorme).


There are other tradeoffs to consider:
* `Regexp.backtrack_limit=` is deterministic, and will stop execution after a certain amount of "processing" regardless of how many threads are busy
* `Regexp.timeout=` will stop a regexp after a certain time regardless of how many other threads are busy or the nature/composition of the regexp

Personally I don't care much for the `Regexp.timeout` approach; I consider `backtrack_limit` a better indicator of ReDoS (e.g., 1M backtracks in 1s may be ok, but 10M backtracks in 1s is not).
So if we're mixing the two approaches I would like some control over this, such as `Regexp.backtrack_limit = a..b` where the time limit is enabled after `a` backtracks and `b` is the hard backtrack limit.



Eregon (Benoit Daloze) wrote in #note-34:
> What if the time between two backtracks is much larger for some Regexp, isn't that possible with many characters being matched and then at the end a possible backtrack? (e.g., something like `/(a{100000}|b{100000})*/`)
> If so, it sounds like 10000 backtracks could be either microseconds or seconds, i.e., not necessarily related to time, and the approach would not work for some Regexps which backtrack.

I don't think we need to worry that much about a regexp custom-made to be slow. ReDoS is about custom-made _strings_ that trigger backtracking in very plain, regular-looking regexps. In CVE-2021-22880, a regexp as simple as `/^-?\D+[\d,]+\.\d{2}$/` was the source of the trouble. I think it's ok to think of ReDoS protection in terms of real-life regexps like that one, and not the realm of all possible weird regexps. And I think these real-life regexps will have a predictable relationship between number of backtracks and time.
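
As a rough sketch of what such a crafted string can look like for that regexp: the input below is a hypothetical one built for illustration (not the string from the CVE report), and `crafted_input` and `timed_match` are made-up helper names. Commas satisfy both `\D+` and `[\d,]+`, and the required `.NN` tail is missing, so a backtracking engine may retry many split points before giving up. How the time grows depends on the Ruby version and engine, so no timings are claimed here.

```ruby
MONEY = /^-?\D+[\d,]+\.\d{2}$/

# Hypothetical input: "$" followed by n commas and no decimal part.
def crafted_input(n)
  "$" + "," * n
end

# Returns [matched?, elapsed_seconds] for an input of size n.
def timed_match(n)
  input = crafted_input(n)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  matched = MONEY.match?(input)
  [matched, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

[500, 1000, 2000].each do |n|
  matched, seconds = timed_match(n)
  puts format("n=%d matched=%s seconds=%.6f", n, matched, seconds)
end
```

Measuring a few sizes like this is one way to check whether the relationship between input size and time is as predictable as I expect.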

> IMHO a better solution to this is use an automaton-based regexp engine (which always matches in linear time)

It may indeed be "better", but when will it be available? `Regexp.backtrack_limit=` is available right now, which makes it "better" by default, IMHO.

The `Regexp.backtrack_limit=` approach is
* simple
* deterministic
* almost no overhead
* available now

`Regexp.timeout=` sounds "easy to use in practical applications" but it's also a bit arbitrary. What timeout to use? 5 seconds? Why 5? In reality we should measure how long regexps take to execute and then fix a limit based on the largest valid measured value. And at that point there's no reason why time is easier to measure than backtracks.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-94311

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:108015] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (34 preceding siblings ...)
  2021-10-25 17:21 ` [ruby-core:105793] " Dan0042 (Daniel DeLorme)
@ 2022-03-22  6:45 ` mame (Yusuke Endoh)
  2022-03-22 10:12 ` [ruby-core:108017] " Eregon (Benoit Daloze)
                   ` (7 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2022-03-22  6:45 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


I discussed this issue with some committers including @matz, @nobu, @akr, and @naruse. In light of the recent increase in ReDoS reports, we agreed as follows.

We will introduce the following new APIs.

* `Regexp.timeout` and `Regexp.timeout=` which get and set the process-global timeout configuration for Regexp matching, and
* `Regexp.new(src, timeout: Integer)` and `Regexp#timeout` which set and get the per-Regexp timeout configuration. This takes priority over the global configuration.

Regexp matching methods (`=~`, `Regexp#match`, etc?) will raise a `Regexp::TimeoutError` exception when they hit the timeout. To reuse code that `rescue`s `Timeout::Error`, `Regexp::TimeoutError` should inherit from `Timeout::Error`. For that purpose, we need to make the timeout gem built-in.
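
A rough sketch of how the agreed API might be used, assuming it lands as described above (hence the `respond_to?` guard, which lets the code load on a Ruby without it). `match_with_limit` is a hypothetical helper, and passing a Float number of seconds is an assumption.

```ruby
# Sketch only: true when the running Ruby provides Regexp.timeout=.
REGEXP_TIMEOUT_AVAILABLE = Regexp.respond_to?(:timeout=)

# Process-global default for every Regexp in the application.
Regexp.timeout = 1.0 if REGEXP_TIMEOUT_AVAILABLE

# Hypothetical helper: match with a per-Regexp limit and report the outcome.
def match_with_limit(source, input, limit)
  return :unsupported unless REGEXP_TIMEOUT_AVAILABLE

  re = Regexp.new(source, timeout: limit) # takes priority over the global value
  re.match?(input) ? :matched : :no_match
rescue Regexp::TimeoutError
  :timed_out
end

# The pattern from the issue description; the outcome depends on the engine.
p match_with_limit("A(B|C+)+D", "A" + "C" * 100 + "X", 0.5)
```

The result for the last line is deliberately not stated: depending on the engine it may time out or simply fail to match quickly.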

I'll try creating a PR and will share any details.

BTW, we agreed not to introduce `Regexp.backtrack_limit=`. It would be "deterministic" for one Ruby version, which is indeed good. However, it would not be "deterministic" across multiple Ruby versions. It is difficult to define the number of "backtracks". It depends highly on the implementation details and optimizations of the regular expression engine. In the future we may replace onigmo with a newer version, or even with another regexp implementation such as oniguruma. We cannot guarantee compatibility.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-96974

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:108017] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (35 preceding siblings ...)
  2022-03-22  6:45 ` [ruby-core:108015] " mame (Yusuke Endoh)
@ 2022-03-22 10:12 ` Eregon (Benoit Daloze)
  2022-03-22 13:55 ` [ruby-core:108023] " Dan0042 (Daniel DeLorme)
                   ` (6 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Eregon (Benoit Daloze) @ 2022-03-22 10:12 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Eregon (Benoit Daloze).


mame (Yusuke Endoh) wrote in #note-37:
> To reuse code that `rescue`s `Timeout::Error`, `Regexp::TimeoutError` should inherit from `Timeout::Error`. For that purpose, we need to make the timeout gem built-in.

I think it's not a good idea to have `Regexp::TimeoutError < Timeout::Error`.
Existing usages of `Timeout.timeout` don't expect that Regexp matching can raise it (and `Timeout.timeout` will still not affect Regexps, as I understand it), so it could cause some breakage (i.e., control would reach the `rescue Timeout::Error` for a Regexp timeout, while before this would only happen for a timeout from `Timeout.timeout`).

I think it is best for it to be its own separate exception; if someone wants to rescue a `Regexp::TimeoutError` (which should be pretty rare), it seems best to spell that out.
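
A small sketch of the breakage I have in mind. `SketchRegexpTimeout` is a hypothetical stand-in for the proposed class (it is not the real `Regexp::TimeoutError`), and `fetch_with_fallback` is a made-up example of existing application code written with only `Timeout.timeout` in mind.

```ruby
require "timeout"

# Hypothetical stand-in that inherits from Timeout::Error, as proposed.
class SketchRegexpTimeout < Timeout::Error; end

# Existing-style code: the rescue was meant only for Timeout.timeout expiring.
def fetch_with_fallback
  Timeout.timeout(5) do
    yield
  end
rescue Timeout::Error
  :fallback
end

p fetch_with_fallback { :ok }
# A regexp timeout raised inside the block is silently turned into the fallback.
p fetch_with_fallback { raise SketchRegexpTimeout, "regexp match timed out" }
```

With a separate exception class, the second call would propagate the error instead of taking the fallback path.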

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-96975

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:108023] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (36 preceding siblings ...)
  2022-03-22 10:12 ` [ruby-core:108017] " Eregon (Benoit Daloze)
@ 2022-03-22 13:55 ` Dan0042 (Daniel DeLorme)
  2022-03-23  0:21 ` [ruby-core:108029] " mame (Yusuke Endoh)
                   ` (5 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Dan0042 (Daniel DeLorme) @ 2022-03-22 13:55 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Dan0042 (Daniel DeLorme).


mame (Yusuke Endoh) wrote in #note-37:
> BTW, we agreed not to introduce `Regexp.backtrack_limit=`. It would be "deterministic" for one Ruby version, which is indeed good. However, it would not be "deterministic" across multiple Ruby versions. It is difficult to define the number of "backtracks". It depends highly on the implementation details and optimizations of the regular expression engine. In the future we may replace onigmo with a newer version, or even with another regexp implementation such as oniguruma. We cannot guarantee compatibility.

I find this unfortunate. `Regexp.timeout` is not even close to deterministic or predictable even for a single Ruby version. It depends on the CPU type, the CPU load, and the number of threads. The "cannot guarantee compatibility" argument applies to `timeout` at least as much as to `backtrack_limit`.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-96982

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [ruby-core:108029] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (37 preceding siblings ...)
  2022-03-22 13:55 ` [ruby-core:108023] " Dan0042 (Daniel DeLorme)
@ 2022-03-23  0:21 ` mame (Yusuke Endoh)
  2022-03-23 14:38 ` [ruby-core:108041] " Eregon (Benoit Daloze)
                   ` (4 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2022-03-23  0:21 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


Eregon (Benoit Daloze) wrote in #note-38:
> I think it's not a good idea to have Regexp::TimeoutError < Timeout::Error.

@naruse conceived the idea. TBH, I am unsure if it will work well. But I think it is good to try it first, and we can consider removing the inheritance if we discover any actual problems.

Dan0042 (Daniel DeLorme) wrote in #note-39:
> The "cannot guarantee compatibility" argument applies to timeout at least just as much as backtrack_limit.

Yes, both `timeout` and `backtrack_limit` are nondeterministic. Given that, `timeout` is the better choice because its value can usually be derived from application requirements. We also took into account that .NET, the predecessor of this API, provides a timeout and nothing like `backtrack_limit`.
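To illustrate what "determined from application requirement" could look like, here is a minimal sketch. It assumes the API lands as proposed in this thread (`Regexp.timeout=` and `Regexp::TimeoutError`); the helper name `match_with_budget` is hypothetical, and on a Ruby without the feature it simply falls back to an unguarded match.

```ruby
# Sketch only: assumes Regexp.timeout= and Regexp::TimeoutError exist as
# proposed in this thread. match_with_budget is a hypothetical helper name.
def match_with_budget(pattern, input, seconds)
  # Fall back to a plain, unguarded match when the feature is unavailable.
  return pattern.match?(input) unless Regexp.respond_to?(:timeout=)

  previous = Regexp.timeout
  Regexp.timeout = seconds # process-global, so restore it afterwards
  begin
    pattern.match?(input)
  rescue Regexp::TimeoutError
    :timed_out
  ensure
    Regexp.timeout = previous
  end
end

# A request handler with a 1 second budget might call:
match_with_budget(/\Auser-\d+\z/, "user-42", 1.0)
```

Because the setting is global, a real application would more likely set it once at boot rather than toggle it around each match as this sketch does.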

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-96988


* [ruby-core:108041] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (38 preceding siblings ...)
  2022-03-23  0:21 ` [ruby-core:108029] " mame (Yusuke Endoh)
@ 2022-03-23 14:38 ` Eregon (Benoit Daloze)
  2022-03-28  5:15 ` [ruby-core:108094] " mame (Yusuke Endoh)
                   ` (3 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: Eregon (Benoit Daloze) @ 2022-03-23 14:38 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Eregon (Benoit Daloze).


mame (Yusuke Endoh) wrote in #note-40:
> @naruse conceived the idea. TBH, I am unsure if it will work well.

@naruse Could you explain why you think Regexp::TimeoutError should inherit from Timeout::Error?
And give an example from existing code where this is useful?
I think there is no good use case for this inheritance.

> But I think it is good to try it first, and we can consider removing the inheritance if we discover any actual problems.

I don't think that is going to work: if we ship it first, we will likely never be able to undo it.
We need to decide this when introducing the feature, because changing the inheritance after the fact would cause compatibility issues.



----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-97001


* [ruby-core:108094] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (39 preceding siblings ...)
  2022-03-23 14:38 ` [ruby-core:108041] " Eregon (Benoit Daloze)
@ 2022-03-28  5:15 ` mame (Yusuke Endoh)
  2022-03-28  6:04 ` [ruby-core:108096] " mame (Yusuke Endoh)
                   ` (2 subsequent siblings)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2022-03-28  5:15 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


Eregon (Benoit Daloze) wrote in #note-41:
> @naruse Could you explain why you think Regexp::TimeoutError should inherit from Timeout::Error?
> And give an example from existing code where this is useful?
> I think there is no good use case for this inheritance.

As far as I understand, this would allow reusing code that does `rescue Timeout::Error`.
Because Timeout::Error may be raised at any point inside a `Timeout.timeout` block, such a `rescue` clause has to be robust already, so jumping to it from Regexp matching would be considered safe.

However, @ko1 and I found another problem with the inheritance: `Thread.handle_interrupt`.
Until now, Timeout::Error has traditionally been an asynchronous exception.
Thus, it is expected not to be raised inside a `Thread.handle_interrupt(Timeout::Error => :never)` block.
However, Regexp::TimeoutError will be raised synchronously, so it won't be masked.
Instead, it will be raised immediately even inside `Thread.handle_interrupt`, which may cause confusion.

I spoke to @matz about this issue, and he said it would be good for Regexp::TimeoutError not to inherit from Timeout::Error.
@naruse Do you have an opinion?

Alternatively, we may raise Regexp::TimeoutError asynchronously.
However, this means it does not stop Regexp matching in `Thread.handle_interrupt`.
I think this will be against the original motivation, ReDoS countermeasures.
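
The difference described above can be sketched with existing APIs only. `SketchRegexpTimeout` below is a hypothetical stand-in for a `Regexp::TimeoutError` that inherits from `Timeout::Error`; it is not the class from the PR. The first method raises it synchronously inside a masked block, the second delivers a `Timeout::Error` asynchronously with `Thread#raise`.

```ruby
require "timeout"

# Hypothetical stand-in for a Regexp::TimeoutError < Timeout::Error.
class SketchRegexpTimeout < Timeout::Error; end

# Synchronous raise: the mask only defers asynchronous interrupts,
# so this exception leaves the block immediately.
def sync_raise_escapes_mask?
  Thread.handle_interrupt(Timeout::Error => :never) do
    raise SketchRegexpTimeout, "regexp match timed out"
  end
  false
rescue Timeout::Error
  true
end

# Asynchronous raise: Thread#raise is deferred until the masked block ends.
def async_raise_log
  log = []
  ready = Queue.new
  go = Queue.new
  worker = Thread.new do
    begin
      Thread.handle_interrupt(Timeout::Error => :never) do
        ready << true
        go.pop # wait until the main thread has called Thread#raise
        log << :block_finished
      end
    rescue Timeout::Error
      log << :rescued_after_block
    end
  end
  ready.pop
  worker.raise(Timeout::Error, "async timeout")
  go << true
  worker.join
  log
end
```

Code that masks `Timeout::Error` to protect a critical section would therefore still be interrupted in the first case, which is the confusion mentioned above.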

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-97056


* [ruby-core:108096] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (40 preceding siblings ...)
  2022-03-28  5:15 ` [ruby-core:108094] " mame (Yusuke Endoh)
@ 2022-03-28  6:04 ` mame (Yusuke Endoh)
  2022-03-28 10:19 ` [ruby-core:108098] " Eregon (Benoit Daloze)
  2022-03-30  1:22 ` [ruby-core:108116] " mame (Yusuke Endoh)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2022-03-28  6:04 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


I created a PR: https://github.com/ruby/ruby/pull/5703

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-97058


* [ruby-core:108098] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (41 preceding siblings ...)
  2022-03-28  6:04 ` [ruby-core:108096] " mame (Yusuke Endoh)
@ 2022-03-28 10:19 ` Eregon (Benoit Daloze)
  2022-03-30  1:22 ` [ruby-core:108116] " mame (Yusuke Endoh)
  43 siblings, 0 replies; 45+ messages in thread
From: Eregon (Benoit Daloze) @ 2022-03-28 10:19 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by Eregon (Benoit Daloze).


Good point about `Timeout::Error` being an asynchronous (i.e., `Thread#raise`) exception. Of course `Regexp::TimeoutError` should be a regular "synchronous" exception (like `Kernel#raise`), because it can only happen inside Regexp matching.
Reusing the rescue handler of `Timeout::Error` does not seem useful to me; that rescue handler likely only deals correctly with a `Timeout.timeout` timeout.
For robust exception handling, one would likely use something like the following, which covers both:
```ruby
begin
  ...
rescue StandardError => e
  # rescue StandardError and not Exception, otherwise we'd need to immediately reraise "fatal exceptions" like NoMemoryError, SystemStackError, SignalException, SystemExit and more
  log e
  # potentially retry up to N times
end
```
near the start/bottom of the stack.
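
A runnable variant of the pattern above might look like the following sketch. The helper name `with_retries`, the `attempts:` default and the array used as a log are all hypothetical, chosen only for illustration. `Timeout::Error` is a `RuntimeError`, so it is covered by `rescue StandardError`, and a `Regexp::TimeoutError` would be covered as well as long as it stays under `StandardError`.

```ruby
# Hypothetical helper: name, retry count and log format are illustrative.
def with_retries(attempts: 3, log: [])
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError => e
    # StandardError, not Exception: fatal errors propagate untouched.
    log << "#{e.class}: #{e.message}"
    retry if tries < attempts
    raise # give up and re-raise the last error
  end
end

log = []
result = with_retries(attempts: 3, log: log) do |attempt|
  raise ArgumentError, "transient failure" if attempt < 3
  :ok
end
```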

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-97062


* [ruby-core:108116] [Ruby master Feature#17837] Add support for Regexp timeouts
  2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
                   ` (42 preceding siblings ...)
  2022-03-28 10:19 ` [ruby-core:108098] " Eregon (Benoit Daloze)
@ 2022-03-30  1:22 ` mame (Yusuke Endoh)
  43 siblings, 0 replies; 45+ messages in thread
From: mame (Yusuke Endoh) @ 2022-03-30  1:22 UTC (permalink / raw)
  To: ruby-core

Issue #17837 has been updated by mame (Yusuke Endoh).


@naruse said "let's try it with Ruby 3.2.0-preview1" so I'll merge my PR soon. 

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-97084


end of thread, other threads:[~2022-03-30  1:22 UTC | newest]

Thread overview: 45+ messages
2021-04-27 23:39 [ruby-core:103631] [Ruby master Feature#17837] Add support for Regexp timeouts sam.saffron
2021-04-28  5:23 ` [ruby-core:103636] " mame
2021-04-28  7:25 ` [ruby-core:103639] " sam.saffron
2021-04-29  5:23 ` [ruby-core:103652] " mame
2021-04-29 13:05 ` [ruby-core:103655] " daniel
2021-04-30 20:29 ` [ruby-core:103676] " eregontp
2021-05-03  7:58 ` [ruby-core:103696] " sam.saffron
2021-05-03 13:38 ` [ruby-core:103700] " mame
2021-05-03 14:57 ` [ruby-core:103701] " jean.boussier
2021-05-04  0:12 ` [ruby-core:103705] " duerst
2021-05-04 10:31 ` [ruby-core:103710] " eregontp
2021-05-04 12:55 ` [ruby-core:103711] " daniel
2021-05-04 14:22 ` [ruby-core:103713] " jean.boussier
2021-05-05  2:02 ` [ruby-core:103725] " duerst
2021-05-05  5:28 ` [ruby-core:103730] " Eric Wong
2021-05-06 13:30 ` [ruby-core:103760] " daniel
2021-05-06 15:20 ` [ruby-core:103761] " eregontp
2021-05-07  6:03 ` [ruby-core:103769] " nobu
2021-05-07  7:57 ` [ruby-core:103770] " mame
2021-05-07  8:14 ` [ruby-core:103771] " sam.saffron
2021-05-11  7:33 ` [ruby-core:103780] " sam.saffron
2021-05-11 11:25 ` [ruby-core:103784] " mame
2021-05-11 12:23 ` [ruby-core:103785] " nobu
2021-05-11 22:22 ` [ruby-core:103789] " sam.saffron
2021-05-11 22:23 ` [ruby-core:103790] " sam.saffron
2021-05-12 15:13 ` [ruby-core:103800] " get.codetriage
2021-05-12 22:18 ` [ruby-core:103809] " daniel
2021-05-21 18:00 ` [ruby-core:103953] " mame
2021-07-14 11:59 ` [ruby-core:104598] " duerst
2021-10-17 13:55 ` [ruby-core:105656] " Dan0042 (Daniel DeLorme)
2021-10-25  3:09 ` [ruby-core:105770] " mame (Yusuke Endoh)
2021-10-25  4:26 ` [ruby-core:105772] " mame (Yusuke Endoh)
2021-10-25  4:39 ` [ruby-core:105773] " mame (Yusuke Endoh)
2021-10-25 11:17 ` [ruby-core:105787] " Eregon (Benoit Daloze)
2021-10-25 16:42 ` [ruby-core:105791] " mame (Yusuke Endoh)
2021-10-25 17:21 ` [ruby-core:105793] " Dan0042 (Daniel DeLorme)
2022-03-22  6:45 ` [ruby-core:108015] " mame (Yusuke Endoh)
2022-03-22 10:12 ` [ruby-core:108017] " Eregon (Benoit Daloze)
2022-03-22 13:55 ` [ruby-core:108023] " Dan0042 (Daniel DeLorme)
2022-03-23  0:21 ` [ruby-core:108029] " mame (Yusuke Endoh)
2022-03-23 14:38 ` [ruby-core:108041] " Eregon (Benoit Daloze)
2022-03-28  5:15 ` [ruby-core:108094] " mame (Yusuke Endoh)
2022-03-28  6:04 ` [ruby-core:108096] " mame (Yusuke Endoh)
2022-03-28 10:19 ` [ruby-core:108098] " Eregon (Benoit Daloze)
2022-03-30  1:22 ` [ruby-core:108116] " mame (Yusuke Endoh)
