ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:61810] [ruby-trunk - Bug #9694] [Open] Bad regexp hangs ruby
       [not found] <redmine.issue-9694.20140402093611@ruby-lang.org>
@ 2014-04-02  9:36 ` nikolai.markov
  2014-04-02 22:21 ` [ruby-core:61812] [ruby-trunk - Bug #9694] " rafaelmfranca
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: nikolai.markov @ 2014-04-02  9:36 UTC (permalink / raw
  To: ruby-core

Issue #9694 has been reported by Nikolay Markov.

----------------------------------------
Bug #9694: Bad regexp hangs ruby
https://bugs.ruby-lang.org/issues/9694

* Author: Nikolay Markov
* Status: Open
* Priority: Normal
* Assignee: 
* Category: regexp
* Target version: 
* ruby -v: ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
Here is a reduced version of a problem I ran into recently:
~~~
$ cat test.rb

str = ('a' * ARGV[0].to_i) + '?'
re = /(\w*)*$/
re.match(str)
~~~

With only a few characters the match returns quickly, but here's what happens with 14 or more 'a's:
~~~
$ time RBENV_VERSION=2.1.1 ruby test.rb 14

real 1.392	user 1.364	sys 0.026	pcpu 99.83

$ time RBENV_VERSION=2.1.1 ruby test.rb 15

real 3.979	user 3.949	sys 0.026	pcpu 99.89

$ time RBENV_VERSION=2.1.1 ruby test.rb 16

real 11.995	user 11.954	sys 0.031	pcpu 99.92
~~~

Ruby versions 1.9.3 and 2.0 behave similarly. 
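
The timings above roughly triple with each extra 'a' (3.979 / 1.392 ≈ 2.9, 11.995 / 3.979 ≈ 3.0), which points at exponential backtracking from the nested quantifier in `(\w*)*`. A minimal sketch for measuring this, assuming only the standard `benchmark` library; the names `build_input`, `time_match` and the rewritten pattern `FLAT` are illustrative and not from the report. Newer Ruby releases may handle the nested pattern much faster, so the numbers will differ.

```ruby
require 'benchmark'

NESTED = /(\w*)*$/  # pattern from the report: a quantified group inside another quantifier
FLAT   = /\w*$/     # accepts the same strings, without the nested quantifier

# Build the test string from the report: n 'a's followed by '?'.
def build_input(n)
  ('a' * n) + '?'
end

# Return the wall-clock seconds that a single match attempt takes.
def time_match(re, str)
  Benchmark.realtime { re.match(str) }
end

# Keep n small so the sketch finishes quickly even on affected versions.
(8..12).each do |n|
  str = build_input(n)
  printf("n=%2d nested=%.4fs flat=%.4fs\n", n, time_match(NESTED, str), time_match(FLAT, str))
end
```

Both patterns find the same (empty) match at the end of the string; only the time spent getting there differs.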

I ran into the problem because one of my colleagues copy-pasted this regexp for validating URLs from somewhere on Stack Overflow:
/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-:]*)*\/?$/

I know the regexp is useless; however, I think it's still a problem if a bad regexp can hang Ruby.
Python (2.7) says that this regexp is bad and does not compile it.
Perl matches without any performance issues.
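
The slow part of the URL pattern above appears to be the same construct, `([\/\w \.-:]*)*`. A sketch of one possible workaround, assuming the outer `*` is not needed for the intended matches: dropping it leaves a single quantifier over the same character class. The constant names and sample URLs below are made up for illustration.

```ruby
# Pattern from the report, with the nested quantifier `(...*)*` on the path group.
URL_NESTED = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-:]*)*\/?$/

# Same pattern with the outer `*` removed from the path group (an assumed fix).
URL_FLAT = /^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-:]*)\/?$/

good_url = 'http://example.com/some/path'
bad_url  = 'http://example.com/' + ('a' * 30) + '!'  # trailing '!' forces the match to fail

puts URL_FLAT.match(good_url) ? 'good_url: match' : 'good_url: no match'
puts URL_FLAT.match(bad_url)  ? 'bad_url: match'  : 'bad_url: no match'
```

The failing input is the interesting one: it is only run against `URL_FLAT` here, since the nested form is the pattern the report describes as hanging.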





-- 
https://bugs.ruby-lang.org/


* [ruby-core:61812] [ruby-trunk - Bug #9694] Bad regexp hangs ruby
       [not found] <redmine.issue-9694.20140402093611@ruby-lang.org>
  2014-04-02  9:36 ` [ruby-core:61810] [ruby-trunk - Bug #9694] [Open] Bad regexp hangs ruby nikolai.markov
@ 2014-04-02 22:21 ` rafaelmfranca
  2014-04-02 22:34 ` [ruby-core:61814] " gergo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: rafaelmfranca @ 2014-04-02 22:21 UTC (permalink / raw
  To: ruby-core

Issue #9694 has been updated by Rafael França.


I tried to reproduce with your script and could not:

```
$ ruby -v
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-darwin13.0]

$ cat foo.rb
str = ('a' * ARGV[0].to_i) + '?'
re = /(\w)$/
re.match(str)

$ time ruby foo.rb 14

real	0m0.134s
user	0m0.075s
sys	0m0.050s

$ time ruby foo.rb 15

real	0m0.145s
user	0m0.082s
sys	0m0.054s

$ time ruby foo.rb 16

real	0m0.137s
user	0m0.076s
sys	0m0.052s

$ time ruby foo.rb 50

real	0m0.142s
user	0m0.080s
sys	0m0.053s
```

----------------------------------------
Bug #9694: Bad regexp hangs ruby
https://bugs.ruby-lang.org/issues/9694#change-46045


* [ruby-core:61814] [ruby-trunk - Bug #9694] Bad regexp hangs ruby
       [not found] <redmine.issue-9694.20140402093611@ruby-lang.org>
  2014-04-02  9:36 ` [ruby-core:61810] [ruby-trunk - Bug #9694] [Open] Bad regexp hangs ruby nikolai.markov
  2014-04-02 22:21 ` [ruby-core:61812] [ruby-trunk - Bug #9694] " rafaelmfranca
@ 2014-04-02 22:34 ` gergo
  2014-04-03  2:42 ` [ruby-core:61816] " shyouhei
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 6+ messages in thread
From: gergo @ 2014-04-02 22:34 UTC (permalink / raw
  To: ruby-core

Issue #9694 has been updated by Gergo Erdosi.


For some reason the regex is not displayed correctly. I'm able to reproduce the reported issue (see the correct regex below):

~~~
$ cat test.rb 
str = ('a' * ARGV[0].to_i) + '?'
re = /(\w*)*$/
re.match(str)
$ time ruby test.rb 14

real	0m1.179s
user	0m1.128s
sys	0m0.004s
$ time ruby test.rb 15

real	0m3.568s
user	0m3.419s
sys	0m0.020s
$ time ruby test.rb 16

real	0m10.767s
user	0m10.276s
sys	0m0.067s
~~~

----------------------------------------
Bug #9694: Bad regexp hangs ruby
https://bugs.ruby-lang.org/issues/9694#change-46046


* [ruby-core:61816] [ruby-trunk - Bug #9694] Bad regexp hangs ruby
       [not found] <redmine.issue-9694.20140402093611@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2014-04-02 22:34 ` [ruby-core:61814] " gergo
@ 2014-04-03  2:42 ` shyouhei
  2014-04-03 15:59 ` [ruby-core:61842] " nikolai.markov
  2014-04-04 11:19 ` [ruby-core:61855] " shyouhei
  5 siblings, 0 replies; 6+ messages in thread
From: shyouhei @ 2014-04-03  2:42 UTC (permalink / raw
  To: ruby-core

Issue #9694 has been updated by Shyouhei Urabe.


Matching with Ruby's regexp engine is NP-complete.  It's ultimately impossible to
guarantee that regexp matches run fast (if you don't think so please send us a proof).
It might be possible to warn about your specific bad regexp, but in general it's also
impossible to tell which regexps are bad and which aren't.  That's the Halting
problem http://en.wikipedia.org/wiki/Halting_problem .
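
A warning for this one specific shape (a quantified group whose body contains its own quantifier) can be sketched with a crude textual check. This is only a heuristic and not part of Ruby or Onigmo; the method name `nested_quantifier?` is hypothetical, and the check will produce both false positives and false negatives (it ignores escapes, nested groups, and parentheses inside character classes).

```ruby
# Heuristic: a parenthesised group with no nested parentheses whose body
# contains `*` or `+`, immediately followed by another `*` or `+`.
NESTED_QUANTIFIER = /\([^()]*[*+][^()]*\)[*+]/

# Return true when the regexp's source text contains that shape.
def nested_quantifier?(regexp)
  !!(regexp.source =~ NESTED_QUANTIFIER)
end

puts nested_quantifier?(/(\w*)*$/)  # the pattern from this issue
puts nested_quantifier?(/(\w*)$/)   # same pattern without the outer `*`
```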

So, in short, there is (or at least is believed to be) no ultimate solution.  All that
might be possible is to find a reasonable compromise.  For instance, Python does
not detect that shorter one:

~~~python
zsh % python
Python 2.7.5+ (default, Feb 27 2014, 19:37:08) 
[GCC 4.8.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compile(r'(\w*)*$')
<_sre.SRE_Pattern object at 0x1dee030>
>>> 
~~~
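
One compromise on the caller's side, offered here as an assumption rather than anything proposed in this thread, is to bound the time a match may take using the standard `timeout` library. A minimal sketch; the helper name `with_deadline` and the 0.5 second limit are illustrative, and how promptly the interrupt lands depends on the Ruby implementation.

```ruby
require 'timeout'

# Run the block, but give up after `seconds`.
# Returns the block's value, or :timed_out if the deadline passed first.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  :timed_out
end

str = ('a' * 10) + '?'
result = with_deadline(0.5) { /(\w*)*$/.match(str) }
puts result == :timed_out ? 'gave up' : 'finished'
```

This does not make the pattern any faster; it only turns an unbounded wait into a value the caller can handle.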

----------------------------------------
Bug #9694: Bad regexp hangs ruby
https://bugs.ruby-lang.org/issues/9694#change-46048


* [ruby-core:61842] [ruby-trunk - Bug #9694] Bad regexp hangs ruby
       [not found] <redmine.issue-9694.20140402093611@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2014-04-03  2:42 ` [ruby-core:61816] " shyouhei
@ 2014-04-03 15:59 ` nikolai.markov
  2014-04-04 11:19 ` [ruby-core:61855] " shyouhei
  5 siblings, 0 replies; 6+ messages in thread
From: nikolai.markov @ 2014-04-03 15:59 UTC (permalink / raw
  To: ruby-core

Issue #9694 has been updated by Nikolay Markov.


Rafael, I'm sorry, the bad regexp is not displaying properly; something is obviously wrong with my formatting. Gergo reproduced it exactly as I have it.

Urabe, do you know how Perl does that?  Also, I'd be grateful for a link to the regexp sources in Ruby.

----------------------------------------
Bug #9694: Bad regexp hangs ruby
https://bugs.ruby-lang.org/issues/9694#change-46064


* [ruby-core:61855] [ruby-trunk - Bug #9694] Bad regexp hangs ruby
       [not found] <redmine.issue-9694.20140402093611@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2014-04-03 15:59 ` [ruby-core:61842] " nikolai.markov
@ 2014-04-04 11:19 ` shyouhei
  5 siblings, 0 replies; 6+ messages in thread
From: shyouhei @ 2014-04-04 11:19 UTC (permalink / raw
  To: ruby-core

Issue #9694 has been updated by Shyouhei Urabe.


Nikolay Markov wrote:
> Urabe, do you know how Perl does that?  Also, I'd be grateful for a link to the regexp sources in Ruby.

I don't know anything about Perl's.  Ruby's regexp engine is called Onigmo: https://github.com/k-takata/Onigmo

----------------------------------------
Bug #9694: Bad regexp hangs ruby
https://bugs.ruby-lang.org/issues/9694#change-46071

