ruby-core@ruby-lang.org archive (unofficial mirror)
From: duerst@it.aoyama.ac.jp
To: ruby-core@ruby-lang.org
Subject: [ruby-core:103725] [Ruby master Feature#17837] Add support for Regexp timeouts
Date: Wed, 05 May 2021 02:02:43 +0000 (UTC)
Message-ID: <redmine.journal-91820.20210505020242.5660@ruby-lang.org>
In-Reply-To: redmine.issue-17837.20210427233953.5660@ruby-lang.org

Issue #17837 has been updated by duerst (Martin Dürst).


Eregon (Benoit Daloze) wrote in #note-10:

> I think fixing Timeout.timeout might be possible.
> The main/major issue is it can trigger within `ensure`, right? Is there anything else?
> We could automatically mask `Thread#raise` within `ensure` so it only happens after the `ensure` body completes.
> And we could still have a larger "hard timeout" if an `ensure` takes way too long (shouldn't happen, but one cannot be sure).
> I recall discussing this with @schneems some time ago on Twitter.

I created a separate issue for the improvement of Timeout.timeout: #17849. Please feel free to discuss there. My guess is that there are all kinds of other issues that can happen in a Web application, so it would be better to solve this for the general case.

Dan0042 (Daniel DeLorme) wrote in #note-11:
> duerst (Martin Dürst) wrote in #note-9:
> > I very strongly suggest that this feature be voluntary, e.g. as an additional flag on the regular expression.
> 
> If you have to turn it on for each regexp, that would make the feature kinda useless. I agree with the OP that this decision is at the application level.

I have no problems with making it possible to switch this on at the application level.

> You want it either on or off for all/most regexps. Although it would make sense to be able to override the default timeout for a few specific regexps that are known to be time-consuming or performance-critical.

Yes. My assumption is that when writing a regular expression, the writer should make sure it's well behaved. So in general, timeouts would only be needed for regular expressions that come from the outside.

> Rather than `CHECK_INTERRUPT_IN_MATCH_AT` would it be feasible to check for timeouts only when backtracking occurs?

In a backtracking regular expression engine, backtracking occurs very often. There are many cases of backtracking that are still totally harmless.

Ideally, a regular expression engine would handle most regular expressions the way RE2 (or any DFA-based implementation) does, and only use a timeout for those that a DFA-based strategy cannot handle (backreferences, ...). But that would require quite a bit of implementation work.

(Of course all the above discussion is predicated on the assumption that timeouts cannot be added to regular expressions with negligible speed loss.) 

----------------------------------------
Feature #17837: Add support for Regexp timeouts
https://bugs.ruby-lang.org/issues/17837#change-91820

* Author: sam.saffron (Sam Saffron)
* Status: Open
* Priority: Normal
----------------------------------------
### Background

ReDoS is a very common security issue. At Discourse we have seen a few over the years. https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS

In a nutshell, there are hundreds of ways this can happen in production apps. The key is for an attacker (or possibly an innocent person) to supply either a problematic Regexp or a bad string to test it with.

```
/A(B|C+)+D/ =~ "A" + "C" * 100 + "X"
```
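The line above can take an impractically long time on a backtracking engine, so a scaled-down probe is easier to experiment with. The following is a minimal sketch, not part of the original report: the helper name `probe` and the input sizes are assumptions chosen so that the script finishes quickly. On engines that backtrack naively, the elapsed time should grow sharply as `n` increases; engines with additional optimizations may show no such growth.

```ruby
require "benchmark"

# Hypothetical helper: matches the problematic pattern against an input
# containing n "C" characters and returns [match_result, elapsed_seconds].
def probe(n, pattern = /A(B|C+)+D/)
  input = "A" + "C" * n + "X"
  result = nil
  elapsed = Benchmark.realtime { result = pattern =~ input }
  [result, elapsed]
end

# Small sizes only; each extra "C" can roughly double the work
# on a naively backtracking engine.
[10, 14, 18].each do |n|
  result, elapsed = probe(n)
  puts format("n=%2d  result=%p  %.4fs", n, result, elapsed)
end
```

The match result is `nil` in every case because the input never contains the trailing `D`; only the time needed to reach that answer changes.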

Having a problematic Regexp somewhere in a large app is a universal constant; it will happen as long as you are using Regexps.


Currently the only feasible way of supplying a consistent safeguard is to use `Thread#raise` and manage all execution yourself. This kind of pattern requires a third-party implementation. There are possibly issues with JRuby and TruffleRuby when taking approaches like this.
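For illustration, the kind of safeguard described here can be approximated with the standard library's `Timeout.timeout`, which interrupts the running block by raising an exception in it from another thread. This is only a sketch under assumptions: the helper name `match_with_timeout` and the 0.5 second limit are made up for the example, and the approach inherits the known weakness that the exception can fire at arbitrary points, including inside `ensure` blocks.

```ruby
require "timeout"

# Hypothetical helper: returns the boolean result of match?, or :timeout
# if the match does not finish within the given number of seconds.
def match_with_timeout(pattern, input, seconds)
  Timeout.timeout(seconds) { pattern.match?(input) }
rescue Timeout::Error
  :timeout
end

# A benign input finishes immediately.
p match_with_timeout(/A(B|C+)+D/, "ABD", 0.5)

# The problematic input either hits the limit or, on engines that
# optimize this pattern, simply fails to match.
p match_with_timeout(/A(B|C+)+D/, "A" + "C" * 100 + "X", 0.5)
```

Every call site has to go through the helper, which is the "managing all execution" burden the paragraph above refers to.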

### Prior art

.NET provides a `MatchTimeout` property per: https://docs.microsoft.com/en-us/dotnet/api/system.text.regularexpressions.regex.matchtimeout?view=net-5.0

Java has nothing built in as far as I can tell: https://stackoverflow.com/questions/910740/cancelling-a-long-running-regex-match

Node has nothing built in as far as I can tell: https://stackoverflow.com/questions/38859506/cancel-regex-match-if-timeout


Golang and Rust use RE2-style engines, which are not vulnerable to DoS because they limit features (RE2 itself is available in Ruby via the RE2 gem):

```
irb(main):003:0> r = RE2::Regexp.new('A(B|C+)+D')
=> #<RE2::Regexp /A(B|C+)+D/>
irb(main):004:0> r.match("A" + "C" * 100 + "X")
=> nil
```

### Proposal

Implement `Regexp.timeout`, which allows us to specify a global timeout for all Regexp operations in Ruby.

A per-Regexp setting would require massive application changes; almost all web apps would do just fine with a 1 second Regexp timeout.

If `timeout` is set to `nil`, everything would work as it does today. When set to a number of seconds, a "monitor" thread would track running regexps and time them out according to the global value.
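As a rough sketch of how an application might use such a setting: Ruby 3.2 and later provide a process-global `Regexp.timeout=` and raise `Regexp::TimeoutError` when a match exceeds it, so the example below relies on that API and reports `:unsupported` on older interpreters. The helper name `regexp_timeout_probe`, its return symbols, and the 0.5 second value are assumptions for illustration only. Whether the problematic pattern actually times out depends on the engine, since newer versions optimize some patterns.

```ruby
# Hypothetical helper: runs one match under a temporary global Regexp timeout.
# Returns :unsupported, :timed_out, or :finished.
def regexp_timeout_probe(source, input, seconds)
  # Regexp.timeout= only exists on Ruby 3.2 and later.
  return :unsupported unless Regexp.respond_to?(:timeout=)

  previous = Regexp.timeout
  Regexp.timeout = seconds          # global default, in seconds (nil disables it)
  Regexp.new(source).match?(input)
  :finished
rescue Regexp::TimeoutError
  :timed_out
ensure
  # Restore the previous global value so other code is unaffected.
  Regexp.timeout = previous if Regexp.respond_to?(:timeout=)
end

p regexp_timeout_probe("A(B|C+)+D", "A" + "C" * 100 + "X", 0.5)
```

In a real application the global value would normally be set once at boot rather than toggled around individual matches; the save-and-restore here only keeps the example self-contained.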

### Alternatives 

I recommend against a "per Regexp" API as this decision is at the application level. You want to apply it to all regular expressions in all the gems you are consuming.

I recommend against a move to RE2 at the moment, as way too much would break.


### See also: 

https://people.cs.vt.edu/davisjam/downloads/publications/Davis-Dissertation-2020.pdf
https://levelup.gitconnected.com/the-regular-expression-denial-of-service-redos-cheat-sheet-a78d0ed7d865





-- 
https://bugs.ruby-lang.org/

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>
