ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: "make_now_just (Hiroya Fujinami) via ruby-core" <ruby-core@ml.ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Cc: "make_now_just (Hiroya Fujinami)" <noreply@ruby-lang.org>
Subject: [ruby-core:116491] [Ruby master Bug#20225] Inconsistent behavior of regex matching for a regex has a null loop
Date: Tue, 30 Jan 2024 02:53:03 +0000 (UTC)	[thread overview]
Message-ID: <redmine.issue-20225.20240130025303.9532@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-20225.20240130025303.9532@ruby-lang.org

Issue #20225 has been reported by make_now_just (Hiroya Fujinami).

----------------------------------------
Bug #20225: Inconsistent behavior of regex matching for a regex has a null loop
https://bugs.ruby-lang.org/issues/20225

* Author: make_now_just (Hiroya Fujinami)
* Status: Open
* Priority: Normal
* Assignee: make_now_just (Hiroya Fujinami)
* Backport: 3.0: UNKNOWN, 3.1: UNKNOWN, 3.2: UNKNOWN, 3.3: UNKNOWN
----------------------------------------
Usually, in Ruby (Onigmo), when a null loop (a loop consuming no characters) occurs on regex matching, this loop is terminated. But, if a loop has a capture and some complex condition is satisfied, this causes backtracking. This behavior invokes unexpected results, for example,

```ruby
p /(?:.B.(?<a>(?:[C-Z]|.)*)+){2}/ =~ "ABCABC" # => nil
p /(?:.B.(?:(?:[C-Z]|.)*)+){2}/ =~ "ABCABC"   # => 0
```

Because the above regex has a capture and the below does not, different matching results are returned. It is not very intuitive that the presence of a capture changes the matching result.

The detailed condition for changing the null-loop behavior is 1) a previous capture in this loop holds the empty string, and 2) this capture's position is different from the current matching position. This condition is checked in `STACK_NULL_CHECK_MEMST` (https://github.com/ruby/ruby/blob/bbb7ab906ec64b963bd4b5d37e47b14796d64371/regexec.c#L1766-L1778).

Perhaps, you cannot understand what this condition means. Don't worry, I also cannot understand. This condition has been introduced for at least 20 years, and no one may remember the reason for this necessity. (If you know, please tell me!) Even if there is a reason, I believe that there is no reasonable authority for allowing counter-intuitive behavior, such as the above example.

This behavior can also cause memoization to be buggy. Memoization relies on the fact that backtracking only depends on positions and states (byte-code offsets of a regex). However, this condition additionally refers to captures, and the memoization is broken.

My proposal is to **correct this inconsistent behavior**. Specifically, a null loop should be determined solely on the basis of whether the matching position has changed, without referring to captures.

This fix changes the behavior of regex matching, but I believe that the probability that this will actually cause backward compatibility problems is remarkably low. This is because I have never seen any mention of this puzzling behavior before.



-- 
https://bugs.ruby-lang.org/
 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/

       reply	other threads:[~2024-01-30  2:53 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-30  2:53 make_now_just (Hiroya Fujinami) via ruby-core [this message]
2024-02-01  5:59 ` [ruby-core:116544] [Ruby master Bug#20225] Inconsistent behavior of regex matching for a regex has a null loop make_now_just (Hiroya Fujinami) via ruby-core
2024-02-01  6:05 ` [ruby-core:116545] " make_now_just (Hiroya Fujinami) via ruby-core
2024-02-03 14:25 ` [ruby-core:116563] " Eregon (Benoit Daloze) via ruby-core
2024-02-03 14:29 ` [ruby-core:116564] " Eregon (Benoit Daloze) via ruby-core
2024-02-03 22:55 ` [ruby-core:116570] " jirkamarsik (Jirka Marsik) via ruby-core
2024-02-03 23:04 ` [ruby-core:116571] " jirkamarsik (Jirka Marsik) via ruby-core
2024-03-14  9:42 ` [ruby-core:117149] " mame (Yusuke Endoh) via ruby-core
2024-03-14 14:40 ` [ruby-core:117172] " Eregon (Benoit Daloze) via ruby-core
2024-03-15  0:37 ` [ruby-core:117191] " make_now_just (Hiroya Fujinami) via ruby-core

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=redmine.issue-20225.20240130025303.9532@ruby-lang.org \
    --to=ruby-core@ruby-lang.org \
    --cc=noreply@ruby-lang.org \
    --cc=ruby-core@ml.ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).