[ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex

ruby-core@ruby-lang.org archive (unofficial mirror)

* [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex
@ 2013-04-03  6:22 sawa (Tsuyoshi Sawada)
  2013-04-03  6:27 ` [ruby-core:53945] [ruby-trunk - Bug #8210] " sawa (Tsuyoshi Sawada)
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: sawa (Tsuyoshi Sawada) @ 2013-04-03  6:22 UTC
  To: ruby-core


Issue #8210 has been reported by sawa (Tsuyoshi Sawada).

----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.0


=begin
With this regex:

    regex1 = /\z/

the following strings match as expected:

    "hello" =~ regex1 # => 5
    "こんにちは" =~ regex1 # => 5

but with these regexes:

    regex2 = /#$/?\z/
    regex3 = /\n?\z/

the results differ:

    "hello" =~ regex2 # => 5
    "hello" =~ regex3 # => 5
    "こんにちは" =~ regex2 # => nil
    "こんにちは" =~ regex3 # => nil

The string encoding is UTF-8, and the OS is Linux (i.e., `$/` is `"\n"`). I expect them to behave the same, and believe this is a bug.
=end
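
For anyone who wants to check the report on their own interpreter, here is a minimal sketch. The helper name `end_match_indexes` and the `PATTERNS` table are hypothetical (not from the report); `regex2` is omitted because it is equivalent to `regex3` when `$/` is `"\n"`. Results depend on the Ruby version, so no expected output is given.

```ruby
# Hypothetical reproduction helper; names are illustrative only.
PATTERNS = {
  "\\z"     => /\z/,
  "\\n?\\z" => /\n?\z/,
}.freeze

# Returns a hash of pattern label => character index of the match (or nil).
def end_match_indexes(str)
  PATTERNS.each_with_object({}) do |(label, re), out|
    out[label] = (str =~ re)
  end
end

%w[hello こんにちは].each do |s|
  # Print encoding and byte size too, since the report concerns multibyte input.
  puts "#{s.inspect} (#{s.encoding}, #{s.bytesize} bytes): #{end_match_indexes(s)}"
end
```

Note that `=~` returns a character index, not a byte index, so 5 is the end of both strings even though the second one is 15 bytes long in UTF-8.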


-- 
http://bugs.ruby-lang.org/


* [ruby-core:53945] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
@ 2013-04-03  6:27 ` sawa (Tsuyoshi Sawada)
  2013-04-03  6:37 ` [ruby-core:53946] " sawa (Tsuyoshi Sawada)
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: sawa (Tsuyoshi Sawada) @ 2013-04-03  6:27 UTC
  To: ruby-core


Issue #8210 has been updated by sawa (Tsuyoshi Sawada).


=begin
A different regex:

    regex4 = /[[:space:]]?\z/

seems to work as expected:

    "hello" =~ regex4 # => 5
    "こんにちは" =~ regex4 # => 5
=end
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38154

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.0



* [ruby-core:53946] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
  2013-04-03  6:27 ` [ruby-core:53945] [ruby-trunk - Bug #8210] " sawa (Tsuyoshi Sawada)
@ 2013-04-03  6:37 ` sawa (Tsuyoshi Sawada)
  2013-04-06  0:58 ` [ruby-core:54046] " sawa (Tsuyoshi Sawada)
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: sawa (Tsuyoshi Sawada) @ 2013-04-03  6:37 UTC
  To: ruby-core


Issue #8210 has been updated by sawa (Tsuyoshi Sawada).


=begin
Yet another regex:

    regex5 = /\n?$/

seems to work as expected:

    "hello" =~ regex5 # => 5
    "こんにちは" =~ regex5 # => 5
=end
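
As background for why `regex5` is not a drop-in substitute for `regex3`: `$` matches at the end of any line, while `\z` matches only at the very end of the string. A small sketch follows; the helper name `anchor_positions` is hypothetical.

```ruby
# Hypothetical helper: reports where `$` and `\z` first match in a string.
def anchor_positions(str)
  { dollar: str =~ /$/, z: str =~ /\z/ }
end

["hello", "こんにちは", "a\nb"].each do |s|
  puts "#{s.inspect}: #{anchor_positions(s)}"
end
```

The two anchors agree on single-line strings, but on a string with an embedded newline `$` can match before the end, so the choice of anchor changes the meaning of the pattern.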
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38155

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.0



* [ruby-core:54046] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
  2013-04-03  6:27 ` [ruby-core:53945] [ruby-trunk - Bug #8210] " sawa (Tsuyoshi Sawada)
  2013-04-03  6:37 ` [ruby-core:53946] " sawa (Tsuyoshi Sawada)
@ 2013-04-06  0:58 ` sawa (Tsuyoshi Sawada)
  2013-04-06 10:34 ` [ruby-core:54058] " sawa (Tsuyoshi Sawada)
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: sawa (Tsuyoshi Sawada) @ 2013-04-06  0:58 UTC
  To: ruby-core


Issue #8210 has been updated by sawa (Tsuyoshi Sawada).


=begin
The problem seems to occur when certain tokens are combined with `?` and `\z`:

    "こんにちは" =~ /a?\z/ # => nil
    "こんにちは" =~ / ?\z/ # => nil
    "こんにちは" =~ /\t?\z/ # => nil
    "こんにちは" =~ /\n?\z/ # => nil
    "こんにちは" =~ /\s?\z/ # => nil
    "こんにちは" =~ /.?\z/ # => 4
    "こんにちは" =~ /\S?\z/ # => 4
    "こんにちは" =~ /\W?\z/ # => 5
    "こんにちは" =~ /あ?\z/ # => 5
    "こんにちは" =~ /\w?\z/ # => 5
=end
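
The survey above can be regenerated with a short loop, which makes it easier to compare interpreters. This is only a sketch: the `TOKENS` list mirrors the tokens tried above, and the `survey` helper is a hypothetical name. Output is deliberately not shown, since it differs between affected and fixed versions.

```ruby
# Hypothetical survey script; the token list mirrors the cases listed above.
# Single-quoted strings keep the backslashes so the regex engine sees \t, \n, etc.
TOKENS = ['a', ' ', '\t', '\n', '\s', '.', '\S', '\W', 'あ', '\w'].freeze

# Builds /<token>?\z/ for each token and returns [pattern source, match index] pairs.
def survey(str, tokens = TOKENS)
  tokens.map do |tok|
    re = Regexp.new("#{tok}?\\z")
    [re.source, str =~ re]
  end
end

survey("こんにちは").each do |source, index|
  puts format("%-8s => %p", source, index)
end
```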

----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38278

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.0



* [ruby-core:54058] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (2 preceding siblings ...)
  2013-04-06  0:58 ` [ruby-core:54046] " sawa (Tsuyoshi Sawada)
@ 2013-04-06 10:34 ` sawa (Tsuyoshi Sawada)
  2013-04-06 11:15 ` [ruby-core:54060] [ruby-trunk - Bug #8210][Assigned] " naruse (Yui NARUSE)
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: sawa (Tsuyoshi Sawada) @ 2013-04-06 10:34 UTC
  To: ruby-core


Issue #8210 has been updated by sawa (Tsuyoshi Sawada).


Is this bug report wrong? If so, please note so.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38290

Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee: 
Category: 
Target version: 
ruby -v: 2.0



* [ruby-core:54060] [ruby-trunk - Bug #8210][Assigned] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (3 preceding siblings ...)
  2013-04-06 10:34 ` [ruby-core:54058] " sawa (Tsuyoshi Sawada)
@ 2013-04-06 11:15 ` naruse (Yui NARUSE)
  2013-04-08 19:05 ` [ruby-core:54118] [ruby-trunk - Bug #8210] " acheong87 (Andrew Cheong)
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: naruse (Yui NARUSE) @ 2013-04-06 11:15 UTC
  To: ruby-core


Issue #8210 has been updated by naruse (Yui NARUSE).

Category set to M17N
Status changed from Open to Assigned
Assignee set to naruse (Yui NARUSE)
Target version set to current: 2.1.0

sawa (Tsuyoshi Sawada) wrote:
> Is this bug report wrong? If so, please note so.

This really looks like a bug in Oniguruma/Onigmo.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38292

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0



* [ruby-core:54118] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (4 preceding siblings ...)
  2013-04-06 11:15 ` [ruby-core:54060] [ruby-trunk - Bug #8210][Assigned] " naruse (Yui NARUSE)
@ 2013-04-08 19:05 ` acheong87 (Andrew Cheong)
  2013-04-09  5:42 ` [ruby-core:54127] " rondinif (Franco Rondini)
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: acheong87 (Andrew Cheong) @ 2013-04-08 19:05 UTC
  To: ruby-core


Issue #8210 has been updated by acheong87 (Andrew Cheong).


Contributing notes regarding this bug can be found here: http://stackoverflow.com/a/15885857/925913.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38369

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0



* [ruby-core:54127] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (5 preceding siblings ...)
  2013-04-08 19:05 ` [ruby-core:54118] [ruby-trunk - Bug #8210] " acheong87 (Andrew Cheong)
@ 2013-04-09  5:42 ` rondinif (Franco Rondini)
  2013-04-09 15:41 ` [ruby-core:54145] " k_takata (Ken Takata)
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: rondinif (Franco Rondini) @ 2013-04-09  5:42 UTC
  To: ruby-core


Issue #8210 has been updated by rondinif (Franco Rondini).


Just edited the [answer](http://stackoverflow.com/a/15885857/1657028) and [test code available](https://gist.github.com/anonymous/5339185)
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38378

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0



* [ruby-core:54145] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (6 preceding siblings ...)
  2013-04-09  5:42 ` [ruby-core:54127] " rondinif (Franco Rondini)
@ 2013-04-09 15:41 ` k_takata (Ken Takata)
  2013-04-11  3:46 ` [ruby-core:54166] " sawa (Tsuyoshi Sawada)
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: k_takata (Ken Takata) @ 2013-04-09 15:41 UTC
  To: ruby-core


Issue #8210 has been updated by k_takata (Ken Takata).

File fix-8210-1.diff added
File fix-8210-2.diff added

This problem was caused by optimization of \z.
I wrote two patches to fix this problem.

Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
but the former one tries to do backward search when 'start==range'
after 'start' is adjusted. This behavior is a little bit confusing.
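
To illustrate why an adjusted 'start' can end up equal to the end of a multibyte string, here is a rough Ruby sketch. It is NOT Onigmo's code and does not reproduce either patch; it only assumes, for illustration, that a search start is first computed as a byte offset near the end of the string and then moved right to the next character boundary. The helpers `char_boundary?` and `adjust_right` are hypothetical names.

```ruby
# Illustration only: NOT Onigmo's implementation. Assumes UTF-8 input and a
# start position expressed as a byte offset.
def char_boundary?(str, byte_offset)
  return true if byte_offset == str.bytesize
  # In UTF-8, continuation bytes have the bit pattern 10xxxxxx.
  (str.getbyte(byte_offset) & 0xC0) != 0x80
end

# Moves a byte offset right until it sits on a character boundary.
def adjust_right(str, byte_offset)
  byte_offset += 1 until char_boundary?(str, byte_offset)
  byte_offset
end

["hello", "こんにちは"].each do |s|
  start = s.bytesize - 1 # one byte before the end of the string
  puts "#{s.inspect}: start=#{start}, adjusted=#{adjust_right(s, start)}, end=#{s.bytesize}"
end
```

Under these assumptions, an offset one byte before the end of an ASCII string is already a boundary, while in a string ending with a 3-byte character it falls inside that character and the adjustment pushes it to the end of the string.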
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38399

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0



* [ruby-core:54166] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (7 preceding siblings ...)
  2013-04-09 15:41 ` [ruby-core:54145] " k_takata (Ken Takata)
@ 2013-04-11  3:46 ` sawa (Tsuyoshi Sawada)
  2013-04-11 14:31 ` [ruby-core:54179] " naruse (Yui NARUSE)
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: sawa (Tsuyoshi Sawada) @ 2013-04-11  3:46 UTC
  To: ruby-core


Issue #8210 has been updated by sawa (Tsuyoshi Sawada).


Is either of k_takata's bug fixes going to be incorporated?
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38430

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0



* [ruby-core:54179] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (8 preceding siblings ...)
  2013-04-11  3:46 ` [ruby-core:54166] " sawa (Tsuyoshi Sawada)
@ 2013-04-11 14:31 ` naruse (Yui NARUSE)
  2013-04-13 10:31 ` [ruby-core:54251] " k_takata (Ken Takata)
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 14+ messages in thread
From: naruse (Yui NARUSE) @ 2013-04-11 14:31 UTC
  To: ruby-core


Issue #8210 has been updated by naruse (Yui NARUSE).


k_takata (Ken Takata) wrote:
> This problem was caused by optimization of \z.
> I wrote two patches to fix this problem.
> 
> Maybe fix-8210-1.diff is more efficient than fix-8210-2.diff,
> but the former one tries to do backward search when 'start==range'
> after 'start' is adjusted. This behavior is a little bit confusing.

I think -1 is suitable because it seems to preserve the original intention better than -2.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38450

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0



* [ruby-core:54251] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (9 preceding siblings ...)
  2013-04-11 14:31 ` [ruby-core:54179] " naruse (Yui NARUSE)
@ 2013-04-13 10:31 ` k_takata (Ken Takata)
  2013-04-13 13:17 ` [ruby-core:54252] [Backport 200 - Backport " k_takata (Ken Takata)
  2013-05-14  9:23 ` [ruby-core:54979] [Backport93 " k_takata (Ken Takata)
  12 siblings, 0 replies; 14+ messages in thread
From: k_takata (Ken Takata) @ 2013-04-13 10:31 UTC
  To: ruby-core


Issue #8210 has been updated by k_takata (Ken Takata).

File fix-8210-1-update.diff added

> I think -1 is suitable because it seems to preserve the original intention better than -2.

Thanks for your comment.
I have updated onigmo's tmp/ruby-2.0.x branch.
https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee

I have also attached an updated patch that can be applied to Ruby 1.9.3.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38513

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0



* [ruby-core:54252] [Backport 200 - Backport #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (10 preceding siblings ...)
  2013-04-13 10:31 ` [ruby-core:54251] " k_takata (Ken Takata)
@ 2013-04-13 13:17 ` k_takata (Ken Takata)
  2013-05-14  9:23 ` [ruby-core:54979] [Backport93 " k_takata (Ken Takata)
  12 siblings, 0 replies; 14+ messages in thread
From: k_takata (Ken Takata) @ 2013-04-13 13:17 UTC
  To: ruby-core


Issue #8210 has been updated by k_takata (Ken Takata).


I think it's better to backport this patch to Ruby 1.9.3 too.
----------------------------------------
Backport #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38516

Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: nagachika (Tomoyuki Chikanaga)
Category: 
Target version: 



* [ruby-core:54979] [Backport93 - Backport #8210] Multibyte character interfering with end-line character within a regex
  2013-04-03  6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
                   ` (11 preceding siblings ...)
  2013-04-13 13:17 ` [ruby-core:54252] [Backport 200 - Backport " k_takata (Ken Takata)
@ 2013-05-14  9:23 ` k_takata (Ken Takata)
  12 siblings, 0 replies; 14+ messages in thread
From: k_takata (Ken Takata) @ 2013-05-14  9:23 UTC
  To: ruby-core


Issue #8210 has been updated by k_takata (Ken Takata).


Hi usa,

> * regexec.c (onig_search): fix problem with optimization of \z.
>   [Backport #8210]
>   patched by k_tanaka at [ruby-core:54251].

Thank you for merging my patch.
BTW, my name is not tanaka...
----------------------------------------
Backport #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-39329

Author: sawa (Tsuyoshi Sawada)
Status: Closed
Priority: Normal
Assignee: usa (Usaku NAKAMURA)
Category: 
Target version: 


