ruby-core@ruby-lang.org archive (unofficial mirror)
From: "mame (Yusuke Endoh)" <noreply@ruby-lang.org>
To: ruby-core@ruby-lang.org
Subject: [ruby-core:108015] [Ruby master Feature#17837] Add support for Regexp timeouts
Date: Tue, 22 Mar 2022 06:45:27 +0000 (UTC)
Message-ID: <redmine.journal-96974.20220322064527.5660@ruby-lang.org>
In-Reply-To: redmine.issue-17837.20210427233953.5660@ruby-lang.org

Issue #17837 has been updated by mame (Yusuke Endoh).


I discussed this issue with some committers including @matz, @nobu, @akr, and @naruse. In light of the recent increase in ReDoS reports, we agreed as follows.

We will introduce the following new APIs.

* `Regexp.timeout` and `Regexp.timeout=` which get and set the process-global timeout configuration for Regexp matching, and
* `Regexp.new(src, timeout: Integer)` and `Regexp#timeout` which set and get the per-Regexp timeout configuration. This takes priority over the global configuration.

Regexp matching methods (`=~`, `Regexp#match`, etc?) will raise a `Regexp::TimeoutError` exception when they hit the timeout. To reuse existing code that `rescue`s `Timeout::Error`, `Regexp::TimeoutError` should inherit from `Timeout::Error`. For that purpose, we need to make the timeout gem built-in.
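To make the shape of this API concrete, here is a minimal sketch based only on the names listed above. `match_with_limit`, `TIMEOUTS_AVAILABLE`, and `REGEXP_TIMEOUT_ERROR` are hypothetical names introduced for illustration. The feature check is there because a given Ruby may not provide these methods, and the sketch makes no assumption about the superclass of `Regexp::TimeoutError`.

```ruby
# Feature check so the sketch also loads on Rubies without this API.
TIMEOUTS_AVAILABLE = Regexp.respond_to?(:timeout=)

# Stand-in class so the rescue clause below is valid either way.
REGEXP_TIMEOUT_ERROR =
  defined?(Regexp::TimeoutError) ? Regexp::TimeoutError : Class.new(StandardError)

# Hypothetical helper: compile `source` with a per-Regexp timeout (when
# available) and return the MatchData, nil, or :timed_out.
def match_with_limit(source, input, seconds)
  regexp = TIMEOUTS_AVAILABLE ? Regexp.new(source, timeout: seconds) : Regexp.new(source)
  regexp.match(input)
rescue REGEXP_TIMEOUT_ERROR
  :timed_out
end

if TIMEOUTS_AVAILABLE
  Regexp.timeout = 1   # process-global default, in seconds
  # The per-Regexp value is the one that applies to this match.
  match_with_limit("A(B|C+)+D", "A" + "C" * 100 + "X", 1)
  Regexp.timeout = nil # back to "no global timeout"
end
```

Whether the last call returns `nil` or `:timed_out` depends on how the engine handles that pattern, so the sketch does not assume either outcome.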

I'll try creating a PR, and share details if any.

BTW, we agreed not to introduce `Regexp.backtrack_limit=`. It would be "deterministic" for one Ruby version, which is indeed good. However, it would not be "deterministic" across multiple Ruby versions. It is difficult to define the number of "backtracks", because it depends highly on the implementation details and optimizations of the regular expression engine. In the future we may replace Onigmo with a newer version of it, or even with another regexp implementation such as Oniguruma. We cannot guarantee compatibility.

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-96974

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
### Background

ReDoS is a very common security issue. At Discourse we have seen a few over the years. https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

In a nutshell, there are hundreds of ways this can happen in production apps. The key is for an attacker (or possibly an innocent person) to supply either a problematic Regexp or a bad string to test it with.

```
/A(B|C+)+D/ =~ "A" + "C" * 100 + "X"
```

Having a problematic Regexp somewhere in a large app is a universal constant; it will happen as long as you are using Regexps.


Currently the only feasible way of supplying a consistent safeguard is to use `Thread#raise` and manage all execution. This kind of pattern requires a third-party implementation. There are possibly issues with JRuby and TruffleRuby when taking approaches like this.
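For a single call site, the general idea can be sketched with the standard `timeout` library, which interrupts the block by raising an exception in the calling thread. This is only an illustration of the technique, not the application-wide safeguard described above, and `guarded_match` is a hypothetical helper name.

```ruby
require "timeout"

# Hypothetical helper: run one match under a wall-clock limit.
# Returns the usual result of =~ (an Integer index or nil), or :timed_out.
def guarded_match(regexp, input, seconds)
  Timeout.timeout(seconds) { regexp =~ input }
rescue Timeout::Error
  :timed_out
end

guarded_match(/A(B|C+)+D/, "ACCD", 1) # a fast input that matches at index 0
```

Every call site has to opt in to a wrapper like this, which is part of why a consistent safeguard is hard to retrofit onto a large app and the gems it uses.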

### Prior art

.NET provides a `MatchTimeout` property per: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.matchtimeout?view=net-5.0

Java has nothing built in as far as I can tell: https://stackoverflow.com/questions/910740/cancelling-a-long-running-regex-match

Node has nothing built in as far as I can tell: https://stackoverflow.com/questions/38859506/cancel-regex-match-if-timeout


Go and Rust use RE2-style engines, which avoid ReDoS by limiting features (RE2 itself is available in Ruby through the RE2 gem):

```
irb(main):003:0> r = RE2::Regexp.new('A(B|C+)+D')
=> #<RE2::Regexp /A(B|C+)+D/>
irb(main):004:0> r.match("A" + "C" * 100 + "X")
=> nil
```

### Proposal

Implement `Regexp.timeout`, which allows us to specify a global timeout for all Regexp operations in Ruby.

A per-Regexp setting would require massive application changes. Almost all web apps would do just fine with a 1 second Regexp timeout.

If `timeout` is set to `nil`, everything would work as it does today. When set to a number of seconds, a "monitor" thread would track running regexps and time them out according to the global value.

### Alternatives 

I recommend against a "per Regexp" API as this decision is at the application level. You want to apply it to all regular expressions in all the gems you are consuming.

I recommend against a move to RE2 at the moment, as way too much would break.


### See also: 

https://people.cs.vt.edu/davisjam/downloads/publications/Davis-Dissertation-2020.pdf
https://levelup.gitconnected.com/the-regular-expression-denial-of-service-redos-cheat-sheet-a78d0ed7d865





-- 
https://bugs.ruby-lang.org/
