ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:66835] [ruby-trunk - Bug #10598] [Open] Cannot make two symbols with same bytes and different encodings
       [not found] <redmine.issue-10598.20141215045658@ruby-lang.org>
@ 2014-12-15  4:56 ` davidegrayson
  2014-12-15  7:19 ` [ruby-core:66840] [ruby-trunk - Bug #10598] [Closed] " nobu
  1 sibling, 0 replies; 2+ messages in thread
From: davidegrayson @ 2014-12-15  4:56 UTC
  To: ruby-core

Issue #10598 has been reported by David Grayson.

----------------------------------------
Bug #10598: Cannot make two symbols with same bytes and different encodings
https://bugs.ruby-lang.org/issues/10598

* Author: David Grayson
* Status: Open
* Priority: Normal
* Assignee: 
* Category: core
* Target version: 
* ruby -v: ruby 2.2.0preview2 (2014-11-28 trunk 48628) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
It looks like Ruby 2.1.1 introduced a bug that makes it impossible to create two different symbols with the same bytes but different encodings.  Here is a simple script that reproduces the bug:

```{ruby}
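# Intern the UTF-16 symbol first, then the plain one with the same bytes.
sym1 = "ab".force_encoding("UTF-16").to_sym
sym2 = "ab".to_sym
puts sym2.encoding

# Same bytes again, but intern the plain symbol first this time.
sym3 = "cd".to_sym
sym4 = "cd".force_encoding("UTF-16").to_sym
puts sym4.encoding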
```

I would expect the output of this script to be:

```
US-ASCII
UTF-16
```

The script behaves as expected in Ruby 2.1.0, but in Ruby 2.1.1 and every later version that I tested, it gives incorrect results.  Here is a shell session showing the output of the script when I run it in Ruby 2.1.0, 2.1.1, and 2.2.0-preview2:

```
$ chruby 2.1.0 && ruby -v && ruby symbol_encoding_bug.rb
ruby 2.1.0p0 (2013-12-25 revision 44422) [x86_64-linux]
US-ASCII
UTF-16

$ chruby 2.1.1 && ruby -v && ruby symbol_encoding_bug.rb
ruby 2.1.1p76 (2014-02-24 revision 45161) [x86_64-linux]
UTF-16
US-ASCII

$ chruby 2.2.0-preview2 && ruby -v && ruby symbol_encoding_bug.rb
ruby 2.2.0preview2 (2014-11-28 trunk 48628) [x86_64-linux]
UTF-16
US-ASCII
```

It looks like `String#to_sym` is not properly accounting for the encoding of the string when it searches the symbol table.

This is definitely a bug: the value of `"ab".to_sym.encoding` should be predictable and should not depend on the state of the symbol table.
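
The order-independence property described above can be written as a small check. This is only a sketch under stated assumptions: `interned_encoding` is a hypothetical helper name, and it uses UTF-16LE (a fixed-endian, non-dummy encoding) as a stand-in for the second encoding, since the behavior of BOM-less UTF-16 strings has varied between Ruby versions. On a Ruby without the bug, both orders should report the encoding each string was labelled with.

```ruby
# Hypothetical helper: relabel the bytes of `text` as `encoding_name`,
# intern the result, and report the encoding of the resulting symbol.
def interned_encoding(text, encoding_name)
  text.dup.force_encoding(encoding_name).to_sym.encoding
end

# Same two bytes, wide encoding interned first.
wide_first = [
  interned_encoding("ab", "UTF-16LE"),
  interned_encoding("ab", "US-ASCII"),
]

# Same two bytes, ASCII encoding interned first.
ascii_first = [
  interned_encoding("cd", "US-ASCII"),
  interned_encoding("cd", "UTF-16LE"),
]

puts wide_first.map(&:name).join(", ")
puts ascii_first.map(&:name).join(", ")
```

If symbol lookup ignored the encoding, the second entry of each pair would repeat the first instead of matching the requested encoding.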

By the way, JRuby has a similar bug:  https://github.com/jruby/jruby/issues/1348



-- 
https://bugs.ruby-lang.org/


* [ruby-core:66840] [ruby-trunk - Bug #10598] [Closed] Cannot make two symbols with same bytes and different encodings
       [not found] <redmine.issue-10598.20141215045658@ruby-lang.org>
  2014-12-15  4:56 ` [ruby-core:66835] [ruby-trunk - Bug #10598] [Open] Cannot make two symbols with same bytes and different encodings davidegrayson
@ 2014-12-15  7:19 ` nobu
  1 sibling, 0 replies; 2+ messages in thread
From: nobu @ 2014-12-15  7:19 UTC
  To: ruby-core

Issue #10598 has been updated by Nobuyoshi Nakada.

Status changed from Open to Closed
% Done changed from 0 to 100

Applied in changeset r48845.

----------
string.c: fix coderange for non-endianness string

* string.c (rb_enc_str_coderange): dummy wchar, non-endianness
  encoding string cannot be ascii only.
  [ruby-core:66835] [Bug #10598]
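
A rough way to observe the property this changeset is about, from Ruby rather than from `string.c`: the sketch below only uses public `Encoding` and `String` predicates and does not reproduce the C-level change. It assumes a Ruby that includes the fix; the variable names are illustrative.

```ruby
utf16 = Encoding::UTF_16

# UTF-16 without a fixed byte order is a dummy encoding with two-byte
# code units, so it is not ASCII-compatible.
puts utf16.dummy?             # true
puts utf16.ascii_compatible?  # false

# Bytes that happen to look like ASCII must not be reported as
# ASCII-only once the string is labelled with such an encoding.
str = "ab".dup.force_encoding(utf16)
puts str.ascii_only?
```

With the fix in place the last line is expected to print `false`, which keeps such a string from being treated as interchangeable with the plain ASCII string that has the same bytes.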

----------------------------------------
Bug #10598: Cannot make two symbols with same bytes and different encodings
https://bugs.ruby-lang.org/issues/10598#change-50403

* Author: David Grayson
* Status: Closed
* Priority: Normal
* Assignee: 
* Category: core
* Target version: 
* ruby -v: ruby 2.2.0preview2 (2014-11-28 trunk 48628) [x86_64-linux]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------