ruby-core@ruby-lang.org archive (unofficial mirror)
From: duerst@it.aoyama.ac.jp
To: ruby-core@ruby-lang.org
Subject: [ruby-core:88729] [Ruby trunk Bug#13671] Regexp with lookbehind and case-insensitivity raises RegexpError only on strings with certain characters
Date: Wed, 29 Aug 2018 10:20:11 +0000 (UTC)
Message-ID: <redmine.journal-73785.20180829102010.e9c3c1539f4884b4@ruby-lang.org>
In-Reply-To: redmine.issue-13671.20170622232858@ruby-lang.org

Issue #13671 has been updated by duerst (Martin Dürst).


gotoken (Kentaro Goto) wrote:

> For example, `"ss"` and `"st"` are mapped to `"ß"` (`"\u00DF"`) and `"ﬆ"` (`"\uFB06"`).
> Those combinations are listed in ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt
> 
> By the way, this expansion by the `//i` option looks like overkill to me.
> I wish case sensitivity and SpecialCasing mapping were separated...

I still have to verify this, but currently I strongly suspect that the problem is NOT in SpecialCasing, but in how Onigmo (/Oniguruma?) implements it.
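
To make the mappings mentioned in the quote concrete, here is a small sketch using Ruby's own `String` case mapping (Ruby 2.4 or later assumed). The helper name `fold_info` is made up for illustration, and the sketch does not show whether Onigmo consults the same data internally.

```ruby
# Hypothetical helper: report how a single character upcases and case-folds.
def fold_info(char)
  folded = char.downcase(:fold) # full Unicode case folding
  { upcase: char.upcase, folded: folded, folded_length: folded.length }
end

# U+00DF (sharp s) and U+FB06 (st ligature), the two characters from the quote.
["\u00DF", "\uFB06"].each do |char|
  puts "U+%04X %p" % [char.ord, fold_info(char)]
end
```

In both cases a single character folds to two characters, so a case-insensitive match may have to consider alternatives of different lengths. Whether that is what trips up the look-behind is still unverified.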

----------------------------------------
Bug #13671: Regexp with lookbehind and case-insensitivity raises RegexpError only on strings with certain characters
https://bugs.ruby-lang.org/issues/13671#change-73785

* Author: dschweisguth (Dave Schweisguth)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 2.4.1
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
Here is a test program:

~~~ ruby
def test(description)
  begin
    yield
    puts "#{description} is OK"
  rescue RegexpError
    puts "#{description} raises RegexpError"
  end
end

test("ass, case-insensitive, special") { /(?<!ass)/i =~ '✨' }
test("bss, case-insensitive, special") { /(?<!bss)/i =~ '✨' }
test("as,  case-insensitive, special") { /(?<!as)/i  =~ '✨' }
test("ss,  case-insensitive, special") { /(?<!ss)/i  =~ '✨' }
test("ass, case-sensitive,   special") { /(?<!ass)/  =~ '✨' }
test("ass, case-insensitive, regular") { /(?<!ass)/i =~ 'x' }

~~~

Running the test program with Ruby 2.4.1 (macOS) gives

~~~
ass, case-insensitive, special raises RegexpError
bss, case-insensitive, special raises RegexpError
as,  case-insensitive, special is OK
ss,  case-insensitive, special is OK
ass, case-sensitive,   special is OK
ass, case-insensitive, regular is OK

~~~

The RegexpError is "invalid pattern in look-behind: /(?<!ass)/i (RegexpError)"
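
To compare the exact message across Ruby versions, a variant of the test helper can return the error text instead of printing a fixed string. This is only a sketch; the method name `lookbehind_error` is an assumption, not part of the attached test.rb.

```ruby
# Hypothetical helper: return the RegexpError message raised while matching,
# or nil if the match attempt completes without an error.
def lookbehind_error(pattern, subject)
  pattern =~ subject
  nil
rescue RegexpError => e
  e.message
end

puts "Ruby #{RUBY_VERSION}"
p lookbehind_error(/(?<!ass)/i, "\u2728") # the case reported above
p lookbehind_error(/(?<!ass)/, "\u2728")  # case-sensitive control
```

The result of the first call depends on the Ruby version in use; the second is the control case that was reported as OK.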

Side note: in the real code in which I found this error I was able to work around the error by using (?i) after the lookbehind instead of //i.
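
A minimal sketch of that workaround, assuming a made-up trailing pattern `foo` (the real code is not shown here). Note that it changes the meaning slightly: the look-behind itself becomes case-sensitive, and only the part after `(?i)` ignores case.

```ruby
# Hypothetical pattern: the look-behind stays case-sensitive, and (?i)
# turns on case-insensitivity only for the part of the pattern after it.
WORKAROUND = /(?<!ass)(?i)foo/

def workaround_match(subject)
  WORKAROUND =~ subject
end

p workaround_match("\u2728FOO") # non-ASCII subject, as in the failing cases
p workaround_match("assFOO")    # preceded by lowercase "ass"
p workaround_match("ASSFOO")    # preceded by uppercase "ASS"
```

If the look-behind also has to ignore case, this form is not an exact replacement for `//i`.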

Running the test program with Ruby 2.3.4 does not report any RegexpErrors.

I think this is a regression, although I might be wrong and it might be saving me from an incorrect result with certain strings.

---Files--------------------------------
test.rb (531 Bytes)


-- 
https://bugs.ruby-lang.org/
