ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:103200] [Ruby master Bug#17774] Quantified empty group causes regex to fail
@ 2021-04-04  8:08 davidell
  2021-04-05  6:16 ` [ruby-core:103236] " mame
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: davidell @ 2021-04-04  8:08 UTC (permalink / raw)
  To: ruby-core

Issue #17774 has been reported by Davidebyzero (David Ellsworth).

----------------------------------------
Bug #17774: Quantified empty group causes regex to fail
https://bugs.ruby-lang.org/issues/17774

* Author: Davidebyzero (David Ellsworth)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The regex `^((x*)(?=\2$))*x$` matches powers of 2 in unary, expressed as strings of `x` characters whose length is the number.

Adding an empty group `()` in the middle of it should have no effect on its operation, and indeed it does not. `^((x*)()(?=\2$))*x$` still matches powers of 2 just fine.  
Quantifying that empty group, `(){4}`, should still have no effect. And indeed, `^((x*)(){4}(?=\2$))*x$` still matches powers of 2. But quantify that to `(){5}`, and suddenly it fails.

The following command line should print `1`, but instead prints nothing:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){5}(?=\2$))*x$/'
```
However this one does print `1`:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){4}(?=\2$))*x$/'
```

Bug found to occur on [Try It Online](https://tio.run/): `ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]`
Bug confirmed to happen on my own machine: `ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]`

Solving the challenge [Is that number a Two Bit Number™️?](https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792) on Code Golf Stack Exchange is what led me to discover this bug.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:103236] [Ruby master Bug#17774] Quantified empty group causes regex to fail
  2021-04-04  8:08 [ruby-core:103200] [Ruby master Bug#17774] Quantified empty group causes regex to fail davidell
@ 2021-04-05  6:16 ` mame
  2021-05-01 11:06 ` [ruby-core:103681] " s.wanabe
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: mame @ 2021-04-05  6:16 UTC (permalink / raw)
  To: ruby-core

Issue #17774 has been updated by mame (Yusuke Endoh).


Thank you, I can reproduce the issue.

The issue is in the code from [onigmo](https://github.com/k-takata/Onigmo), so it would be helpful if you could report this issue to the upstream.

By a quick investigation, an optimization expands `(){4}`, and does not expand `(){5}`, which makes the difference of the behavior.
Enabling debug output suggests that the bug is caused by `USE_MONOMANIAC_CHECK_CAPTURES_IN_ENDLESS_REPEAT` option. The `(){5}` case works great by the following change that disables the option, but I'm unsure the performance impact.

```
diff --git a/regint.h b/regint.h
index 0740429688..968ea6cde8 100644
--- a/regint.h
+++ b/regint.h
@@ -71,7 +71,6 @@
 #define USE_PERL_SUBEXP_CALL
 #define USE_CAPITAL_P_NAMED_GROUP
 #define USE_BACKREF_WITH_LEVEL        /* \k<name+n>, \k<name-n> */
-#define USE_MONOMANIAC_CHECK_CAPTURES_IN_ENDLESS_REPEAT  /* /(?:()|())*\2/ */
 #define USE_NEWLINE_AT_END_OF_STRING_HAS_EMPTY_LINE     /* /\n$/ =~ "\n" */
 #define USE_WARNING_REDUNDANT_NESTED_REPEAT_OPERATOR
 /* !!! moved to regenc.h. */ /* #define USE_CRNL_AS_LINE_TERMINATOR */
```

----------------------------------------
Bug #17774: Quantified empty group causes regex to fail
https://bugs.ruby-lang.org/issues/17774#change-91317

* Author: Davidebyzero (David Ellsworth)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The regex `^((x*)(?=\2$))*x$` matches powers of 2 in unary, expressed as strings of `x` characters whose length is the number.

Adding an empty group `()` in the middle of it should have no effect on its operation, and indeed it does not. `^((x*)()(?=\2$))*x$` still matches powers of 2 just fine.  
Quantifying that empty group, `(){4}`, should still have no effect. And indeed, `^((x*)(){4}(?=\2$))*x$` still matches powers of 2. But quantify that to `(){5}`, and suddenly it fails.

The following command line should print `1`, but instead prints nothing:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){5}(?=\2$))*x$/'
```
However this one does print `1`:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){4}(?=\2$))*x$/'
```

Bug found to occur on [Try It Online](https://tio.run/): `ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]`
Bug confirmed to happen on my own machine: `ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]`

Solving the challenge [Is that number a Two Bit Number™️?](https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792) on Code Golf Stack Exchange is what led me to discover this bug.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [ruby-core:103681] [Ruby master Bug#17774] Quantified empty group causes regex to fail
  2021-04-04  8:08 [ruby-core:103200] [Ruby master Bug#17774] Quantified empty group causes regex to fail davidell
  2021-04-05  6:16 ` [ruby-core:103236] " mame
@ 2021-05-01 11:06 ` s.wanabe
  2021-05-01 12:50 ` [ruby-core:103682] " sawadatsuyoshi
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: s.wanabe @ 2021-05-01 11:06 UTC (permalink / raw)
  To: ruby-core

Issue #17774 has been updated by wanabe (_ wanabe).


The reproduction example could be a bit shorter.
```
$ ruby -ve 'p "xxxx" =~ /(?:x(){5})*$/, "xxxx" =~ /(?:x(){4})*$/'
ruby 3.1.0dev (2021-05-01T02:04:17Z origin/master 121fa24a34) [x86_64-linux]
3
0
```

This problem has already been fixed in Oniguruma, a derivative of Onigmo.
https://github.com/kkos/oniguruma/commit/ca64663ca8bb34ca7dc219d18ec6e475cca9dec8

```
$ (git checkout ca64663ca8bb34ca7dc219d18ec6e475cca9dec8~ && autoreconf -vfi && ./configure && make -j6 && sed -i sample/simple.c -e 's/\(pattern *= [^"]*\)"[^"]*"/\1"(?:x(){5})*$"/' -e 's/\(str *= [^"]*\)"[^"]*"/\1"xxxx"/' && (cd sample; make simple)) > build.log 2>&1 && ./sample/simple
match at 3
0: (3-4)
1: (4-4)

$ (git checkout ca64663ca8bb34ca7dc219d18ec6e475cca9dec8 && autoreconf -vfi && ./configure && make -j6 && sed -i sample/simple.c -e 's/\(pattern *= [^"]*\)"[^"]*"/\1"(?:x(){5})*$"/' -e 's/\(str *= [^"]*\)"[^"]*"/\1"xxxx"/' && (cd sample; make simple)) > build.log 2>&1 && ./sample/simple
match at 0
0: (0-4)
1: (4-4)

```

I think that introducing a mechanism that exists in Oniguruma 6.x, such as empty_status_mem and set_empty_status_check_trav, may solve the problem.

----------------------------------------
Bug #17774: Quantified empty group causes regex to fail
https://bugs.ruby-lang.org/issues/17774#change-91776

* Author: Davidebyzero (David Ellsworth)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The regex `^((x*)(?=\2$))*x$` matches powers of 2 in unary, expressed as strings of `x` characters whose length is the number.

Adding an empty group `()` in the middle of it should have no effect on its operation, and indeed it does not. `^((x*)()(?=\2$))*x$` still matches powers of 2 just fine.  
Quantifying that empty group, `(){4}`, should still have no effect. And indeed, `^((x*)(){4}(?=\2$))*x$` still matches powers of 2. But quantify that to `(){5}`, and suddenly it fails.

The following command line should print `1`, but instead prints nothing:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){5}(?=\2$))*x$/'
```
However this one does print `1`:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){4}(?=\2$))*x$/'
```

Bug found to occur on [Try It Online](https://tio.run/): `ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]`
Bug confirmed to happen on my own machine: `ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]`

Solving the challenge [Is that number a Two Bit Number™️?](https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792) on Code Golf Stack Exchange is what led me to discover this bug.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:103682] [Ruby master Bug#17774] Quantified empty group causes regex to fail
  2021-04-04  8:08 [ruby-core:103200] [Ruby master Bug#17774] Quantified empty group causes regex to fail davidell
  2021-04-05  6:16 ` [ruby-core:103236] " mame
  2021-05-01 11:06 ` [ruby-core:103681] " s.wanabe
@ 2021-05-01 12:50 ` sawadatsuyoshi
  2021-05-01 20:06 ` [ruby-core:103689] " s.wanabe
  2021-10-13 16:43 ` [ruby-core:105633] " jeremyevans0 (Jeremy Evans)
  4 siblings, 0 replies; 6+ messages in thread
From: sawadatsuyoshi @ 2021-05-01 12:50 UTC (permalink / raw)
  To: ruby-core

Issue #17774 has been updated by sawa (Tsuyoshi Sawada).


wanabe (_ wanabe) wrote in #note-2:
> ... Oniguruma, a derivative of Onigmo

I believe it is the other way around.


----------------------------------------
Bug #17774: Quantified empty group causes regex to fail
https://bugs.ruby-lang.org/issues/17774#change-91777

* Author: Davidebyzero (David Ellsworth)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The regex `^((x*)(?=\2$))*x$` matches powers of 2 in unary, expressed as strings of `x` characters whose length is the number.

Adding an empty group `()` in the middle of it should have no effect on its operation, and indeed it does not. `^((x*)()(?=\2$))*x$` still matches powers of 2 just fine.  
Quantifying that empty group, `(){4}`, should still have no effect. And indeed, `^((x*)(){4}(?=\2$))*x$` still matches powers of 2. But quantify that to `(){5}`, and suddenly it fails.

The following command line should print `1`, but instead prints nothing:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){5}(?=\2$))*x$/'
```
However this one does print `1`:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){4}(?=\2$))*x$/'
```

Bug found to occur on [Try It Online](https://tio.run/): `ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]`
Bug confirmed to happen on my own machine: `ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]`

Solving the challenge [Is that number a Two Bit Number™️?](https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792) on Code Golf Stack Exchange is what led me to discover this bug.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:103689] [Ruby master Bug#17774] Quantified empty group causes regex to fail
  2021-04-04  8:08 [ruby-core:103200] [Ruby master Bug#17774] Quantified empty group causes regex to fail davidell
                   ` (2 preceding siblings ...)
  2021-05-01 12:50 ` [ruby-core:103682] " sawadatsuyoshi
@ 2021-05-01 20:06 ` s.wanabe
  2021-10-13 16:43 ` [ruby-core:105633] " jeremyevans0 (Jeremy Evans)
  4 siblings, 0 replies; 6+ messages in thread
From: s.wanabe @ 2021-05-01 20:06 UTC (permalink / raw)
  To: ruby-core

Issue #17774 has been updated by wanabe (_ wanabe).


sawa (Tsuyoshi Sawada) wrote in #note-3:
> wanabe (_ wanabe) wrote in #note-2:
> > ... Oniguruma, a derivative of Onigmo
> 
> I believe it is the other way around.

Oh I'm very sorry, I wrote it wrong.
I was aware of it, but I simply used the wrong word.


----------------------------------------
Bug #17774: Quantified empty group causes regex to fail
https://bugs.ruby-lang.org/issues/17774#change-91782

* Author: Davidebyzero (David Ellsworth)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The regex `^((x*)(?=\2$))*x$` matches powers of 2 in unary, expressed as strings of `x` characters whose length is the number.

Adding an empty group `()` in the middle of it should have no effect on its operation, and indeed it does not. `^((x*)()(?=\2$))*x$` still matches powers of 2 just fine.  
Quantifying that empty group, `(){4}`, should still have no effect. And indeed, `^((x*)(){4}(?=\2$))*x$` still matches powers of 2. But quantify that to `(){5}`, and suddenly it fails.

The following command line should print `1`, but instead prints nothing:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){5}(?=\2$))*x$/'
```
However this one does print `1`:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){4}(?=\2$))*x$/'
```

Bug found to occur on [Try It Online](https://tio.run/): `ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]`
Bug confirmed to happen on my own machine: `ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]`

Solving the challenge [Is that number a Two Bit Number™️?](https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792) on Code Golf Stack Exchange is what led me to discover this bug.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [ruby-core:105633] [Ruby master Bug#17774] Quantified empty group causes regex to fail
  2021-04-04  8:08 [ruby-core:103200] [Ruby master Bug#17774] Quantified empty group causes regex to fail davidell
                   ` (3 preceding siblings ...)
  2021-05-01 20:06 ` [ruby-core:103689] " s.wanabe
@ 2021-10-13 16:43 ` jeremyevans0 (Jeremy Evans)
  4 siblings, 0 replies; 6+ messages in thread
From: jeremyevans0 (Jeremy Evans) @ 2021-10-13 16:43 UTC (permalink / raw)
  To: ruby-core

Issue #17774 has been updated by jeremyevans0 (Jeremy Evans).


I looked into fixing this by removing the define of `USE_MONOMANIAC_CHECK_CAPTURES_IN_ENDLESS_REPEAT`, as @mame indicated: https://github.com/ruby/ruby/commit/018922ba15eb7aea86957789d7defae9ffc43688

It ends up breaking a few specs.  For example, it changes the behavior of:

```ruby
/(a|\2b|())*/.match("aaabbb").to_a

# Before:
# => ["aaabbb", "", ""]

# After:
# => ["aaa", "", ""]
```

For this example, Ruby 1.8 returns `["aaa", "a", nil]`.  The equivalent in Perl returns `["aaa", "", ""]`. The equivalent in Python 2 and 3 returns `["aaabbb", "", ""]`.  I think the `["aaabbb", "", ""]` result seems best for a greedy match since it matches the most characters. However, I can also see where an implementation would return one of the other results if a scan terminates when no forward progress is made during an iteration.

Anyway, if we are OK with this behavior change for empty capture groups, I can submit the commit as a pull request.  However, I think it would be better to wait for a fix in Onigmo.

----------------------------------------
Bug #17774: Quantified empty group causes regex to fail
https://bugs.ruby-lang.org/issues/17774#change-94123

* Author: Davidebyzero (David Ellsworth)
* Status: Open
* Priority: Normal
* ruby -v: ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN, 3.0: UNKNOWN
----------------------------------------
The regex `^((x*)(?=\2$))*x$` matches powers of 2 in unary, expressed as strings of `x` characters whose length is the number.

Adding an empty group `()` in the middle of it should have no effect on its operation, and indeed it does not. `^((x*)()(?=\2$))*x$` still matches powers of 2 just fine.  
Quantifying that empty group, `(){4}`, should still have no effect. And indeed, `^((x*)(){4}(?=\2$))*x$` still matches powers of 2. But quantify that to `(){5}`, and suddenly it fails.

The following command line should print `1`, but instead prints nothing:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){5}(?=\2$))*x$/'
```
However this one does print `1`:
```
ruby -e 'print 1 if "x"*32 =~ /^((x*)(){4}(?=\2$))*x$/'
```

Bug found to occur on [Try It Online](https://tio.run/): `ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-linux]`
Bug confirmed to happen on my own machine: `ruby 2.7.2p137 (2020-10-01 revision 5445e04352) [x86_64-msys]`

Solving the challenge [Is that number a Two Bit Number™️?](https://codegolf.stackexchange.com/questions/211840/is-that-number-a-two-bit-number%ef%b8%8f/222792#222792) on Code Golf Stack Exchange is what led me to discover this bug.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-10-13 16:43 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-04  8:08 [ruby-core:103200] [Ruby master Bug#17774] Quantified empty group causes regex to fail davidell
2021-04-05  6:16 ` [ruby-core:103236] " mame
2021-05-01 11:06 ` [ruby-core:103681] " s.wanabe
2021-05-01 12:50 ` [ruby-core:103682] " sawadatsuyoshi
2021-05-01 20:06 ` [ruby-core:103689] " s.wanabe
2021-10-13 16:43 ` [ruby-core:105633] " jeremyevans0 (Jeremy Evans)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).