* [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex
@ 2013-04-03 6:22 sawa (Tsuyoshi Sawada)
2013-04-03 6:27 ` [ruby-core:53945] [ruby-trunk - Bug #8210] " sawa (Tsuyoshi Sawada)
` (12 more replies)
0 siblings, 13 replies; 14+ messages in thread
From: sawa (Tsuyoshi Sawada) @ 2013-04-03 6:22 UTC
To: ruby-core
Issue #8210 has been reported by sawa (Tsuyoshi Sawada).
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210
Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0
=begin
With this regex:
regex1 = /\z/
the following strings match as expected:
"hello" =~ regex1 # => 5
"こんにちは" =~ regex1 # => 5
but with these regexes:
regex2 = /#$/?\z/
regex3 = /\n?\z/
the results differ:
"hello" =~ regex2 # => 5
"hello" =~ regex3 # => 5
"こんにちは" =~ regex2 # => nil
"こんにちは" =~ regex3 # => nil
The string encoding is UTF-8, and the OS is Linux (i.e., `$/` is `"\n"`). I expect them to behave the same, and believe this is a bug.
=end
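Note for readers reproducing the report: `=~` returns a character index, not a byte offset, which is why both strings are expected to yield 5 even though the UTF-8 string occupies 15 bytes. A minimal sketch, assuming a UTF-8 source file; the helper name `end_index` is hypothetical and not part of the report:

```ruby
# Hypothetical helper (not from the report): character index at which
# the given regex matches, or nil if it does not match.
def end_index(str, regex)
  str =~ regex
end

ascii     = "hello"
multibyte = "こんにちは"

[ascii, multibyte].each do |s|
  # length counts characters; bytesize counts bytes in the string's encoding.
  puts "#{s}: length=#{s.length} bytesize=#{s.bytesize} " \
       "index=#{end_index(s, /\z/).inspect}"
end
```

Passing `/\n?\z/` instead of `/\z/` to the same helper gives a quick way to check whether a given interpreter is affected.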
--
http://bugs.ruby-lang.org/
* [ruby-core:53945] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: sawa (Tsuyoshi Sawada) @ 2013-04-03 6:27 UTC
To: ruby-core
Issue #8210 has been updated by sawa (Tsuyoshi Sawada).
=begin
A different regex:
regex4 = /[[:space:]]?\z/
seems to work as expected:
"hello" =~ regex4 # => 5
"こんにちは" =~ regex4 # => 5
=end
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38154
Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0
* [ruby-core:53946] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: sawa (Tsuyoshi Sawada) @ 2013-04-03 6:37 UTC
To: ruby-core
Issue #8210 has been updated by sawa (Tsuyoshi Sawada).
=begin
Yet another regex:
regex5 = /\n?$/
seems to work as expected:
"hello" =~ regex5 # => 5
"こんにちは" =~ regex5 # => 5
=end
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38155
Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0
* [ruby-core:54046] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: sawa (Tsuyoshi Sawada) @ 2013-04-06 0:58 UTC
To: ruby-core
Issue #8210 has been updated by sawa (Tsuyoshi Sawada).
=begin
The problem seems to occur when certain tokens are combined with `?` and `\z`:
"こんにちは" =~ /a?\z/ # => nil
"こんにちは" =~ / ?\z/ # => nil
"こんにちは" =~ /\t?\z/ # => nil
"こんにちは" =~ /\n?\z/ # => nil
"こんにちは" =~ /\s?\z/ # => nil
"こんにちは" =~ /.?\z/ # => 4
"こんにちは" =~ /\S?\z/ # => 4
"こんにちは" =~ /\W?\z/ # => 5
"こんにちは" =~ /あ?\z/ # => 5
"こんにちは" =~ /\w?\z/ # => 5
=end
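The checks above can be run in one pass. This is a minimal sketch, assuming a UTF-8 source file; the constant `TOKENS` and the helper `end_match_indices` are hypothetical names introduced for illustration, and the token list is only a subset of the one above:

```ruby
# Tokens to make optional in front of \z (a subset of the list above).
# Single quotes keep the backslashes so Regexp.new sees \t, \n, \s, \S.
TOKENS = ['a', ' ', '\t', '\n', '\s', '.', '\S', 'あ']

# Hypothetical helper: map each token to the character index at which
# /<token>?\z/ matches str (nil when there is no match).
def end_match_indices(str, tokens)
  tokens.each_with_object({}) do |token, result|
    result[token] = str =~ Regexp.new("#{token}?\\z")
  end
end

end_match_indices("こんにちは", TOKENS).each do |token, index|
  puts "/#{token}?\\z/ => #{index.inspect}"
end
```

On an interpreter without the bug, tokens that cannot match `は` (such as `a` or `\n`) should report 5, the character length of the string, rather than nil.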
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38278
Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0
* [ruby-core:54058] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: sawa (Tsuyoshi Sawada) @ 2013-04-06 10:34 UTC
To: ruby-core
Issue #8210 has been updated by sawa (Tsuyoshi Sawada).
Is this bug report wrong? If so, please note so.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38290
Author: sawa (Tsuyoshi Sawada)
Status: Open
Priority: Normal
Assignee:
Category:
Target version:
ruby -v: 2.0
* [ruby-core:54060] [ruby-trunk - Bug #8210][Assigned] Multibyte character interfering with end-line character within a regex
From: naruse (Yui NARUSE) @ 2013-04-06 11:15 UTC
To: ruby-core
Issue #8210 has been updated by naruse (Yui NARUSE).
Category set to M17N
Status changed from Open to Assigned
Assignee set to naruse (Yui NARUSE)
Target version set to current: 2.1.0
sawa (Tsuyoshi Sawada) wrote:
> Is this bug report wrong? If so, please note so.
This really looks like a bug in Oniguruma/Onigmo.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38292
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0
* [ruby-core:54118] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: acheong87 (Andrew Cheong) @ 2013-04-08 19:05 UTC
To: ruby-core
Issue #8210 has been updated by acheong87 (Andrew Cheong).
Contributing notes regarding this bug can be found here: http://stackoverflow.com/a/15885857/925913.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38369
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0
* [ruby-core:54127] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: rondinif (Franco Rondini) @ 2013-04-09 5:42 UTC
To: ruby-core
Issue #8210 has been updated by rondinif (Franco Rondini).
I just edited the [answer](http://stackoverflow.com/a/15885857/1657028); [test code is available](https://gist.github.com/anonymous/5339185).
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38378
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0
* [ruby-core:54145] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: k_takata (Ken Takata) @ 2013-04-09 15:41 UTC
To: ruby-core
Issue #8210 has been updated by k_takata (Ken Takata).
File fix-8210-1.diff added
File fix-8210-2.diff added
This problem was caused by an optimization of \z.
I wrote two patches to fix it.
fix-8210-1.diff is probably more efficient than fix-8210-2.diff,
but it attempts a backward search when 'start==range'
after 'start' is adjusted, which is a little confusing.
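To illustrate how a 'start==range' situation can arise with multibyte text, here is a rough sketch of the general idea behind an end-anchor optimization: skip ahead to the last few bytes that could still take part in a match, then move that offset onto a character boundary. It is written in Ruby for readability and is not Onigmo's code; the helper names, the `max_bytes` parameter, and the choice to adjust forward are all assumptions of this sketch.

```ruby
# Hypothetical helper: advance a byte offset to the next UTF-8 character
# boundary, or to the end of the string. Continuation bytes look like
# 10xxxxxx, i.e. (byte & 0xC0) == 0x80.
def right_adjust(bytes, offset)
  offset += 1 while offset < bytes.size && (bytes[offset] & 0xC0) == 0x80
  offset
end

# Hypothetical helper: earliest byte offset worth trying when the part of
# the pattern before \z can consume at most max_bytes bytes.
def candidate_start(str, max_bytes)
  bytes = str.bytes
  right_adjust(bytes, [bytes.size - max_bytes, 0].max)
end

puts candidate_start("hello", 1)      # last byte is a full character
puts candidate_start("こんにちは", 1)  # last byte is a continuation byte
```

For "hello" the candidate start is byte 4, the final character. For "こんにちは" the raw offset 14 falls inside the three-byte character は, so the adjusted start becomes 15, the end of the string. A search that treats a start equal to the end of its range as "nothing left to search" would then miss the empty match at the end.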
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38399
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0
* [ruby-core:54166] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: sawa (Tsuyoshi Sawada) @ 2013-04-11 3:46 UTC
To: ruby-core
Issue #8210 has been updated by sawa (Tsuyoshi Sawada).
Is either of k_takata's bug fixes going to be incorporated?
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38430
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0
* [ruby-core:54179] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: naruse (Yui NARUSE) @ 2013-04-11 14:31 UTC
To: ruby-core
Issue #8210 has been updated by naruse (Yui NARUSE).
k_takata (Ken Takata) wrote:
> This problem was caused by an optimization of \z.
> I wrote two patches to fix it.
>
> fix-8210-1.diff is probably more efficient than fix-8210-2.diff,
> but it attempts a backward search when 'start==range'
> after 'start' is adjusted, which is a little confusing.
I think -1 is suitable because it seems to preserve the original intention better than -2.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38450
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0
* [ruby-core:54251] [ruby-trunk - Bug #8210] Multibyte character interfering with end-line character within a regex
From: k_takata (Ken Takata) @ 2013-04-13 10:31 UTC
To: ruby-core
Issue #8210 has been updated by k_takata (Ken Takata).
File fix-8210-1-update.diff added
> I think -1 is suitable because it seems to preserve the original intention better than -2.
Thanks for your comment.
I have updated Onigmo's tmp/ruby-2.0.x branch.
https://github.com/k-takata/Onigmo/tree/f22cf2e566712cace60d17f84d63119d7c5764ee
I have also attached an updated patch that can be applied to Ruby 1.9.3.
----------------------------------------
Bug #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38513
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: naruse (Yui NARUSE)
Category: M17N
Target version: current: 2.1.0
ruby -v: 2.0
* [ruby-core:54252] [Backport 200 - Backport #8210] Multibyte character interfering with end-line character within a regex
From: k_takata (Ken Takata) @ 2013-04-13 13:17 UTC
To: ruby-core
Issue #8210 has been updated by k_takata (Ken Takata).
I think it's better to backport this patch to Ruby 1.9.3 too.
----------------------------------------
Backport #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-38516
Author: sawa (Tsuyoshi Sawada)
Status: Assigned
Priority: Normal
Assignee: nagachika (Tomoyuki Chikanaga)
Category:
Target version:
* [ruby-core:54979] [Backport93 - Backport #8210] Multibyte character interfering with end-line character within a regex
From: k_takata (Ken Takata) @ 2013-05-14 9:23 UTC
To: ruby-core
Issue #8210 has been updated by k_takata (Ken Takata).
Hi usa,
> * regexec.c (onig_search): fix problem with optimization of \z.
> [Backport #8210]
> patched by k_tanaka at [ruby-core:54251].
Thank you for merging my patch.
BTW, my name is not tanaka...
----------------------------------------
Backport #8210: Multibyte character interfering with end-line character within a regex
https://bugs.ruby-lang.org/issues/8210#change-39329
Author: sawa (Tsuyoshi Sawada)
Status: Closed
Priority: Normal
Assignee: usa (Usaku NAKAMURA)
Category:
Target version:
Thread overview: 14+ messages
2013-04-03 6:22 [ruby-core:53944] [ruby-trunk - Bug #8210][Open] Multibyte character interfering with end-line character within a regex sawa (Tsuyoshi Sawada)
2013-04-03 6:27 ` [ruby-core:53945] [ruby-trunk - Bug #8210] " sawa (Tsuyoshi Sawada)
2013-04-03 6:37 ` [ruby-core:53946] " sawa (Tsuyoshi Sawada)
2013-04-06 0:58 ` [ruby-core:54046] " sawa (Tsuyoshi Sawada)
2013-04-06 10:34 ` [ruby-core:54058] " sawa (Tsuyoshi Sawada)
2013-04-06 11:15 ` [ruby-core:54060] [ruby-trunk - Bug #8210][Assigned] " naruse (Yui NARUSE)
2013-04-08 19:05 ` [ruby-core:54118] [ruby-trunk - Bug #8210] " acheong87 (Andrew Cheong)
2013-04-09 5:42 ` [ruby-core:54127] " rondinif (Franco Rondini)
2013-04-09 15:41 ` [ruby-core:54145] " k_takata (Ken Takata)
2013-04-11 3:46 ` [ruby-core:54166] " sawa (Tsuyoshi Sawada)
2013-04-11 14:31 ` [ruby-core:54179] " naruse (Yui NARUSE)
2013-04-13 10:31 ` [ruby-core:54251] " k_takata (Ken Takata)
2013-04-13 13:17 ` [ruby-core:54252] [Backport 200 - Backport " k_takata (Ken Takata)
2013-05-14 9:23 ` [ruby-core:54979] [Backport93 " k_takata (Ken Takata)