ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:68254] [Ruby trunk - Bug #10891] [Open] /[[:punct:]]/ POSIX group broken (with string literals?)
From: lord.thom @ 2015-02-23 12:29 UTC
  To: ruby-core

Issue #10891 has been reported by Tom Lord.

----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891

* Author: Tom Lord
* Status: Open
* Priority: Normal
* Assignee: ruby-core
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------
The regular expression: /[[:punct:]]/ should match the following characters:

    ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~

However, it only works for these characters:

    ! " # % & ' ( ) * , - . / : ; ? @ [ \ ] _ { }

And does not work for these characters:

    $ + < = > ^ ` | ~

However, this is where it gets really weird... Consider the following:

    60.chr == "<"            # => true
    60.chr =~ /[[:punct:]]/  # => 0
    "<" =~ /[[:punct:]]/     # => nil

So, it seems that the regular expression only fails for string literals!
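
For anyone wanting to reproduce the three lists above, a minimal sketch follows. `ASCII_PUNCT` and `partition_punct` are illustrative names, not part of Ruby, and the characters are tagged as UTF-8 on the assumption that this is the encoding a string literal gets in a typical source file. Which list each character lands in will depend on the Ruby version, so no output is shown.

```ruby
# The 32 printable ASCII characters that are neither letters nor digits.
ASCII_PUNCT = (33..126).map(&:chr).reject { |c| c =~ /[A-Za-z0-9]/ }.freeze

# Hypothetical helper: tags every character with the given encoding and
# splits them into [matched, unmatched] for /[[:punct:]]/.
def partition_punct(encoding)
  ASCII_PUNCT
    .map { |c| c.dup.force_encoding(encoding) }
    .partition { |c| c =~ /[[:punct:]]/ }
end

matched, unmatched = partition_punct(Encoding::UTF_8)
puts "matched:   #{matched.join(' ')}"
puts "unmatched: #{unmatched.join(' ')}"
```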



-- 
https://bugs.ruby-lang.org/


* [ruby-core:68260] [Ruby trunk - Bug #10891] /[[:punct:]]/ POSIX group broken (with string literals?)
From: nobu @ 2015-02-23 14:45 UTC
  To: ruby-core

Issue #10891 has been updated by Nobuyoshi Nakada.

Description updated

It occurs with UTF-8 encoding only.
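
A small sketch of what this means in practice: the same byte is tagged once as US-ASCII and once as UTF-8, and each copy is matched against the same regexp. The constant names are made up for illustration, and the printed results are version-dependent, so none are listed here.

```ruby
# Same byte (0x3C), two different encoding tags.
LT_ASCII = 60.chr                                  # Integer#chr yields US-ASCII below 128
LT_UTF8  = 60.chr.force_encoding(Encoding::UTF_8)  # assumed to mirror a literal "<"

[LT_ASCII, LT_UTF8].each do |s|
  puts "#{s.encoding}: #{(s =~ /[[:punct:]]/).inspect}"
end
```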

----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-51615

* Author: Tom Lord
* Status: Open
* Priority: Normal
* Assignee: ruby-core
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------

* [ruby-core:68263] [Ruby trunk - Bug #10891] /[[:punct:]]/ POSIX group broken (with string literals?)
From: lord.thom @ 2015-02-23 15:53 UTC
  To: ruby-core

Issue #10891 has been updated by Tom Lord.


Nobuyoshi Nakada wrote:
> It occurs with UTF-8 encoding only.

Ahhhhh, of course - *that's* what the difference between `60.chr` and `"<"` is!

Like you said, the issue only affects UTF-8 encodings:

    #<Encoding:UTF-8>, #<Encoding:UTF8-MAC>, #<Encoding:UTF8-DoCoMo>, #<Encoding:UTF8-KDDI>, #<Encoding:UTF8-SoftBank>
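
One way to arrive at a list like this is to try every ASCII-compatible encoding in turn. The sketch below assumes that tagging a lone `<` byte with each encoding is a fair test; `encodings_missing_lt` is a hypothetical helper name.

```ruby
# Hypothetical helper: returns the ASCII-compatible encodings in which
# /[[:punct:]]/ fails to match a lone "<".
def encodings_missing_lt
  Encoding.list.select do |enc|
    # Dummy and non-ASCII-compatible encodings cannot be matched this way.
    next false if enc.dummy? || !enc.ascii_compatible?

    60.chr.force_encoding(enc) !~ /[[:punct:]]/
  end
end

p encodings_missing_lt
```

On an affected Ruby this should print the UTF-8 family; on a fixed one, presumably an empty array.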

----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-51617

* Author: Tom Lord
* Status: Open
* Priority: Normal
* Assignee: ruby-core
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------

* [ruby-core:68280] [Ruby trunk - Bug #10891] /[[:punct:]]/ POSIX group broken (with string literals?)
From: lord.thom @ 2015-02-24 10:03 UTC
  To: ruby-core

Issue #10891 has been updated by Tom Lord.


On further investigation, this is a known issue in Onigmo (the regexp engine used by Ruby 2.x).

However, it was apparently "fixed" way back in 2006: https://github.com/k-takata/Onigmo/blob/d0b3173893b9499a4e53ae1da16ba76c06d85571/HISTORY#L584-585 (Note: I can't find a reference to any Oniguruma/Onigmo source control dating back this far, to see the actual commit)

...And yet, it remains an open issue: https://github.com/k-takata/Onigmo/issues/42

----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-51638

* Author: Tom Lord
* Status: Open
* Priority: Normal
* Assignee: ruby-core
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------

* [ruby-core:71742] [Ruby trunk - Bug #10891] /[[:punct:]]/ POSIX group broken (with string literals?)
From: shugo @ 2015-11-30  3:14 UTC
  To: ruby-core

Issue #10891 has been updated by Shugo Maeda.

Assignee changed from ruby-core to Yui NARUSE

How about interpreting `[[:punct:]]` as `[\p{P}\p{S}]` for Unicode strings, so that `[[:punct:]]` becomes a superset of the POSIX definition?
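
To illustrate the proposal, here is a rough sketch of what `[\p{P}\p{S}]` accepts. `PUNCT_OR_SYMBOL`, `POSIX_PUNCT_CHARS` and `punct_or_symbol?` are names invented for this example; the pattern is built from a UTF-8 string so that the Unicode properties resolve regardless of the script's own source encoding.

```ruby
# Built from a UTF-8 string so \p{P} and \p{S} are looked up as Unicode properties.
PUNCT_OR_SYMBOL = Regexp.new('[\p{P}\p{S}]'.encode(Encoding::UTF_8))

# The 32 characters POSIX assigns to the punct class in the C locale.
POSIX_PUNCT_CHARS = (33..126).map(&:chr).reject { |c| c =~ /[A-Za-z0-9]/ }.freeze

# Hypothetical predicate: does the proposed class accept this character?
def punct_or_symbol?(char)
  !(char.encode(Encoding::UTF_8) =~ PUNCT_OR_SYMBOL).nil?
end

# Every POSIX punctuation character is covered ...
puts POSIX_PUNCT_CHARS.all? { |c| punct_or_symbol?(c) }
# ... and so are non-ASCII characters, which is what makes it a superset.
puts punct_or_symbol?("\u20AC")  # EURO SIGN, general category Sc
puts punct_or_symbol?("\u00BF")  # INVERTED QUESTION MARK, general category Po
```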

----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-55146

* Author: Tom Lord
* Status: Open
* Priority: Normal
* Assignee: Yui NARUSE
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------

* [ruby-core:71746] [Ruby trunk - Bug #10891] [Feedback] /[[:punct:]]/ POSIX group broken (with string literals?)
From: naruse @ 2015-11-30  9:42 UTC
  To: ruby-core

Issue #10891 has been updated by Yui NARUSE.

Status changed from Open to Feedback

It follows UTR#18's Standard Recommendation.
http://www.unicode.org/reports/tr18/#punct
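
As far as the linked annex reads, the Standard Recommendation defines `punct` as general category Punctuation only, while its POSIX Compatible column also adds Symbol. Assuming that reading is right, the sketch below sorts printable ASCII by general category to show which characters fall on each side; all constant names are illustrative.

```ruby
# Built from UTF-8 strings so the property names resolve as Unicode properties.
SYMBOL_RE = Regexp.new('\p{S}'.encode(Encoding::UTF_8))
PUNCT_RE  = Regexp.new('\p{P}'.encode(Encoding::UTF_8))

PRINTABLE_ASCII = (33..126).map { |n| n.chr.encode(Encoding::UTF_8) }.freeze

ASCII_SYMBOLS     = PRINTABLE_ASCII.select { |c| c =~ SYMBOL_RE }.freeze
ASCII_PUNCTUATION = PRINTABLE_ASCII.select { |c| c =~ PUNCT_RE }.freeze

puts "gc=Symbol:      #{ASCII_SYMBOLS.join(' ')}"
puts "gc=Punctuation: #{ASCII_PUNCTUATION.join(' ')}"
```

The characters reported under gc=Symbol should be the same nine that the report lists as failing.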

----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-55149

* Author: Tom Lord
* Status: Feedback
* Priority: Normal
* Assignee: Yui NARUSE
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------

* [ruby-core:71756] [Ruby trunk - Bug #10891] /[[:punct:]]/ POSIX group broken (with string literals?)
From: shugo @ 2015-11-30 13:48 UTC
  To: ruby-core

Issue #10891 has been updated by Shugo Maeda.


Yui NARUSE wrote:
> It follows UTR#18's Standard Recommendation.
> http://www.unicode.org/reports/tr18/#punct

In general, it would be a reasonable choice.

However, in Ruby, the problem is that it's hard to guess the programmer's intention from the code,
because the behavior is decided not by the regular expression but by the target string.

```
def do_something(s)
  # ...
  if /[[:punct:]]/ =~ s  # should "<" match, or shouldn't it?
    # ...
  end
  # ...
end
```

If you want to reject symbols, `/\p{P}/` can be used instead, and it's more readable.
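
A minimal sketch of that alternative, with the intent spelled out in the pattern rather than left to the target string. `PUNCTUATION_ONLY` and `contains_punctuation?` are hypothetical names; the pattern is built from a UTF-8 string so the property is resolved as a Unicode one.

```ruby
PUNCTUATION_ONLY = Regexp.new('\p{P}'.encode(Encoding::UTF_8))

# Hypothetical predicate: true when +s+ contains punctuation in the Unicode
# sense. Symbols such as "<" are deliberately rejected.
def contains_punctuation?(s)
  !(PUNCTUATION_ONLY =~ s).nil?
end

# The answer is the same whichever encoding tag the ASCII string carries.
%w[! <].each do |ch|
  [Encoding::US_ASCII, Encoding::UTF_8].each do |enc|
    s = ch.dup.force_encoding(enc)
    puts "#{ch.inspect} as #{enc}: #{contains_punctuation?(s)}"
  end
end
```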


----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-55167

* Author: Tom Lord
* Status: Feedback
* Priority: Normal
* Assignee: Yui NARUSE
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------

* [ruby-core:93600] [Ruby master Bug#10891] /[[:punct:]]/ POSIX group broken (with string literals?)
From: merch-redmine @ 2019-07-08  1:07 UTC
  To: ruby-core

Issue #10891 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Feedback to Closed

This was apparently fixed between Ruby 2.3 and 2.4:

```
$ ruby23 -e 'p("<".force_encoding("UTF-8") =~ /[[:punct:]]/)' 
nil
$ ruby24 -e 'p("<".force_encoding("UTF-8") =~ /[[:punct:]]/)' 
0
```
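
The same check can be extended to all nine characters from the original report. This is only a sketch: `FORMERLY_FAILING`, `matches_punct_in_utf8?` and `punct_fix_expected?` are made-up names, and the 2.4 cut-off is taken from the observation above rather than from a changelog.

```ruby
# The nine characters the report lists as not matching.
FORMERLY_FAILING = %w[$ + < = > ^ ` | ~].freeze

# Hypothetical helper: does /[[:punct:]]/ match the character once it is tagged UTF-8?
def matches_punct_in_utf8?(char)
  !(char.dup.force_encoding(Encoding::UTF_8) =~ /[[:punct:]]/).nil?
end

# Hypothetical guard: true on Ruby 2.4 or newer, where the fix is assumed present.
def punct_fix_expected?
  major, minor = RUBY_VERSION.split(".").first(2).map(&:to_i)
  ([major, minor] <=> [2, 4]) >= 0
end

still_failing = FORMERLY_FAILING.reject { |c| matches_punct_in_utf8?(c) }
puts "Ruby #{RUBY_VERSION}: still failing = #{still_failing.inspect}"
```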


----------------------------------------
Bug #10891: /[[:punct:]]/ POSIX group broken (with string literals?)
https://bugs.ruby-lang.org/issues/10891#change-79193

* Author: tom-lord (Tom Lord)
* Status: Closed
* Priority: Normal
* Assignee: naruse (Yui NARUSE)
* Target version: 
* ruby -v: ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN, 2.2: UNKNOWN
----------------------------------------