From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS4713 221.184.0.0/13 X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from neon.ruby-lang.org (neon.ruby-lang.org [221.186.184.75]) by dcvr.yhbt.net (Postfix) with ESMTP id 2CD971F8C6 for ; Wed, 14 Jul 2021 11:59:09 +0000 (UTC) Received: from neon.ruby-lang.org (localhost [IPv6:::1]) by neon.ruby-lang.org (Postfix) with ESMTP id A4821120993; Wed, 14 Jul 2021 20:57:53 +0900 (JST) Received: from o1678948x4.outbound-mail.sendgrid.net (o1678948x4.outbound-mail.sendgrid.net [167.89.48.4]) by neon.ruby-lang.org (Postfix) with ESMTPS id 008B0120970 for ; Wed, 14 Jul 2021 20:57:50 +0900 (JST) Received: by filterdrecv-6655cd48f5-rfvwb with SMTP id filterdrecv-6655cd48f5-rfvwb-1-60EED186-70 2021-07-14 11:59:02.751851577 +0000 UTC m=+138851.435574106 Received: from herokuapp.com (unknown) by geopod-ismtpd-4-0 (SG) with ESMTP id XNxUV6eSTVGbqtq2_C7jMw for ; Wed, 14 Jul 2021 11:59:02.668 +0000 (UTC) Date: Wed, 14 Jul 2021 11:59:02 +0000 (UTC) From: duerst@it.aoyama.ac.jp Message-ID: References: Mime-Version: 1.0 X-Redmine-Project: ruby-master X-Redmine-Issue-Tracker: Feature X-Redmine-Issue-Id: 17837 X-Redmine-Issue-Author: sam.saffron X-Redmine-Sender: duerst X-Mailer: Redmine X-Redmine-Host: bugs.ruby-lang.org X-Redmine-Site: Ruby Issue Tracking System X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-Redmine-MailingListIntegration-Message-Ids: 80725 X-SG-EID: =?us-ascii?Q?uQY=2F2xNrNfHHTWbKn6MBvvzfU5Pqk9I4lnOVb0CFDuvKNo0WCbO0cuIA9vqgz7?= =?us-ascii?Q?OVjiEZHQStI4y0e71R33sFT=2F=2FyAYmtyarISWXTd?= =?us-ascii?Q?WEOHEm13s2b0Rezkh4Gi+rJqz4mAYTUrlzC9C51?= =?us-ascii?Q?i1EtTzAiWm8Np=2Fa9zgHz0U8M1VbgNKYixNQfoY1?= =?us-ascii?Q?SlrGk+ctfA61yrEr1XuC1Ng2v3Sp6HA4aVQ=3D=3D?= To: ruby-core@ruby-lang.org X-Entity-ID: b/2+PoftWZ6GuOu3b0IycA== X-ML-Name: ruby-core X-Mail-Count: 104598 Subject: [ruby-core:104598] [Ruby master Feature#17837] Add support for Regexp timeouts X-BeenThere: ruby-core@ruby-lang.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Ruby developers List-Id: Ruby developers List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable Errors-To: ruby-core-bounces@ruby-lang.org Sender: "ruby-core" Issue #17837 has been updated by duerst (Martin D=FCrst). nobu (Nobuyoshi Nakada) wrote in #note-22: > I made a patch for `Regexp#backtrack_limit=3D`. > This seems no significant performance difference. > = > https://github.com/ruby/ruby/compare/master...nobu:Regexp.backtrack_limit= ?expand=3D1 I have looked at this patch. I think this is the general direction to go. I= also think that the interface/API looks good, maybe having a keyword argum= ent on Regexp.new, too, would be a good addition. I installed the patch and ran some very experiments. I started with a very = slow Regexp that I use to show my students. It can be made of any length n,= but gets really slow when n grows, to the order of O(2**n). I realized tha= t it actually may be some kind of worst case, because it's not really doing= much except for backtracking. That means that the overhead of counting the= backtracks will show very clearly. Any more 'average' example should slow = down quite a bit less. So here is the program I used: ```Ruby HAS_BACKTRACK_LIMIT =3D Regexp.respond_to? :backtrack_limit def slow_find (n) s =3D 'a' * n r =3D Regexp.compile('a?' * n + s) m =3D nil t1 =3D Time.now 10.times { m =3D s.match(r) } t2 =3D Time.now print "n: #{n}, time: #{t2-t1}/10" print ", backtrack_count: #{m.backtrack_count}" if HAS_BACKTRACK_LIMIT puts end (25..29).each do |i| slow_find i end ``` You can easily adjust this by changing the part (25..29) (the range of s n'= s used) and the two instances of '10' (the number of times a match is run i= n a row). Here are the results for the patch: ``` duerst@Kloentalersee:~/nobuRuby$ ./ruby try_backtrack_limit.rb n: 25, time: 2.8453695/10, backtrack_count: 33554431 n: 26, time: 5.6392941/10, backtrack_count: 67108863 n: 27, time: 11.3532755/10, backtrack_count: 134217727 n: 28, time: 24.1388335/10, backtrack_count: 268435455 n: 29, time: 49.084651/10, backtrack_count: 536870911 duerst@Kloentalersee:~/nobuRuby$ ./ruby try_backtrack_limit.rb n: 25, time: 2.7971486/10, backtrack_count: 33554431 n: 26, time: 5.9609293/10, backtrack_count: 67108863 n: 27, time: 12.126138/10, backtrack_count: 134217727 n: 28, time: 24.7895166/10, backtrack_count: 268435455 n: 29, time: 49.6923646/10, backtrack_count: 536870911 duerst@Kloentalersee:~/nobuRuby$ ./ruby try_backtrack_limit.rb n: 25, time: 2.8213545/10, backtrack_count: 33554431 n: 26, time: 6.1295964/10, backtrack_count: 67108863 n: 27, time: 12.1948968/10, backtrack_count: 134217727 n: 28, time: 24.6284841/10, backtrack_count: 268435455 n: 29, time: 48.6898231/10, backtrack_count: 536870911 ``` And here are the results without the patch: ``` duerst@Kloentalersee:~/ruby3$ ./ruby ../nobuRuby/try_backtrack_limit.rb n: 25, time: 2.6384167/10 n: 26, time: 5.2395088/10 n: 27, time: 11.3225276/10 n: 28, time: 23.289667/10 n: 29, time: 45.9637488/10 duerst@Kloentalersee:~/ruby3$ ./ruby ../nobuRuby/try_backtrack_limit.rb n: 25, time: 2.5845849/10 n: 26, time: 5.2094378/10 n: 27, time: 10.5159888/10 n: 28, time: 22.5549276/10 n: 29, time: 45.600226/10 duerst@Kloentalersee:~/ruby3$ ./ruby ../nobuRuby/try_backtrack_limit.rb n: 25, time: 2.5993792/10 n: 26, time: 5.2897985/10 n: 27, time: 11.2203586/10 n: 28, time: 23.1157868/10 n: 29, time: 45.0094087/10 ``` These results where obtained on a WSL2/Ubuntu install on Windows 10. All ot= her user programs were switched off, which on Windows doesn't mean there's = nothing else running, of course. It should be clear from the above results = that the difference is around 5%, maybe a bit higher, but not 10%. As I already said, that's for what I think is pretty much the worst case. A= ll this Regexp does in backtrack in a binary tree of depth n, testing out a= ll the combinations of choosing 'a' or not 'a' in the first half of the Reg= exp (which is just "a?a?a?a?...."). Every time it looks for an 'a' in that = part, it finds one. But then (except for the very very last try) it cannot = match the second part of the Regexp (just n "a"s) to the rest of the string= (which is also just n "a"s). For that, it actually doesn't need time, beca= use this part is optimized with a Boyer-Moor algorithm, which means it just= checks that the last "a" in the Regexp is beyond the actual string and so = there's no match. This can be seen from the debug output when compiling Rub= y with ``` #define ONIG_DEBUG_PARSE_TREE #define ONIG_DEBUG_COMPILE ``` in regint.h, which results in the following: ``` $ ./ruby -e 'Regexp.new("a?a?a?aaa")' `RubyGems' were not loaded. `error_highlight' was not loaded. `did_you_mean' was not loaded. PATTERN: /a?a?a?aaa/ (US-ASCII) {0,1} a {0,1} a {0,1} a aaa optimize: EXACT_BM anchor: [] sub anchor: [] exact: [aaa]: length: 3 code length: 26 0:[push:(+2)] 5:[exact1:a] 7:[push:(+2)] 12:[exact1:a] 14:[push:(+2)] 19:[exact1:a] 21:[exact3:aaa] 25:[end] = ``` So this should give everybody some indication of the worst slowdown with th= is new backtrack_limit feature. Results for some more "average" scenarios w= ould also help. ---------------------------------------- Feature #17837: Add support for Regexp timeouts https://bugs.ruby-lang.org/issues/17837#change-92884 * Author: sam.saffron (Sam Saffron) * Status: Open * Priority: Normal ---------------------------------------- ### Background ReDoS are a very common security issue. At Discourse we have seen a few thr= ough the years. https://owasp.org/www-community/attacks/Regular_expression_= Denial_of_Service_-_ReDoS In a nutshell there are 100s of ways this can happen in production apps, th= e key is for an attacker (or possibly innocent person) to supply either a p= roblematic Regexp or a bad string to test it with. ``` /A(B|C+)+D/ =3D~ "A" + "C" * 100 + "X" ``` Having a problem Regexp somewhere in a large app is a universal constant, i= t will happen as long as you are using Regexps. = Currently the only feasible way of supplying a consistent safeguard is by u= sing `Thread.raise` and managing all execution. This kind of pattern requir= es usage of a third party implementation. There are possibly issues with jR= uby and Truffle when taking approaches like this. ### Prior art .NET provides a `MatchTimeout` property per: https://docs.microsoft.com/en-= us/dotnet/api/system.text.regularexpressions.regex.matchtimeout?view=3Dnet-= 5.0 Java has nothing built in as far as I can tell: https://stackoverflow.com/q= uestions/910740/cancelling-a-long-running-regex-match Node has nothing built in as far as I can tell: https://stackoverflow.com/q= uestions/38859506/cancel-regex-match-if-timeout Golang and Rust uses RE2 which is not vulnerable to DoS by limiting feature= s (available in Ruby RE2 gem) ``` irb(main):003:0> r =3D RE2::Regexp.new('A(B|C+)+D') =3D> # irb(main):004:0> r.match("A" + "C" * 100 + "X") =3D> nil ``` ### Proposal Implement `Regexp.timeout` which allow us to specify a global timeout for a= ll Regexp operations in Ruby. = Per Regexp would require massive application changes, almost all web apps w= ould do just fine with a 1 second Regexp timeout. If `timeout` is set to `nil` everything would work as it does today, when s= et to second a "monitor" thread would track running regexps and time them o= ut according to the global value. ### Alternatives = I recommend against a "per Regexp" API as this decision is at the applicati= on level. You want to apply it to all regular expressions in all the gems y= ou are consuming. I recommend against a move to RE2 at the moment as way too much would break = ### See also: = https://people.cs.vt.edu/davisjam/downloads/publications/Davis-Dissertation= -2020.pdf https://levelup.gitconnected.com/the-regular-expression-denial-of-service-r= edos-cheat-sheet-a78d0ed7d865 -- = https://bugs.ruby-lang.org/ Unsubscribe: