ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:97461] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7)
       [not found] <redmine.issue-16497.20200110111831.710@ruby-lang.org>
@ 2020-03-12 13:04 ` jean.boussier
  2020-03-15 13:07 ` [ruby-core:97506] " naruse
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: jean.boussier @ 2020-03-12 13:04 UTC (permalink / raw)
  To: ruby-core

Issue #16497 has been updated by byroot (Jean Boussier).


I have a potential fix for this issue: https://github.com/ruby/ruby/pull/2960



----------------------------------------
Bug #16497: StringIO#internal_encoding is broken (more severely in 2.7)
https://bugs.ruby-lang.org/issues/16497#change-84605

* Author: zverok (Victor Shepelev)
* Status: Assigned
* Priority: Normal
* Assignee: nobu (Nobuyoshi Nakada)
* Backport: 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED
----------------------------------------
To the best of my understanding from [Encoding](https://docs.ruby-lang.org/en/master/Encoding.html) docs, the following is true:

* external encoding (explicitly specified or taken from `Encoding.default_external`) specifies how the IO understands input and stores it internally
* internal encoding (explicitly specified or taken from `Encoding.default_internal`) specifies how the IO converts what it reads.

Demonstration with regular files:

```ruby
# prepare data
File.write('test.txt', 'Україна'.encode('KOI8-U'), encoding: 'KOI8-U') #=> 7

def test(io)
  str = io.read
  [io.external_encoding, io.internal_encoding, str, str.encoding]
end

# read it:
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# We can specify internal encoding when opening the file:
test(File.open('test.txt', 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or when it is already opened
test(File.open('test.txt').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or with Encoding.default_internal
Encoding.default_internal = 'UTF-8'
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
```

But with StringIO, **internal encoding can't be set** in Ruby **2.6**:

```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Simplest form:
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via set_encoding:
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal:
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
```

So, in 2.6, any attempt to do something with StringIO's internal encoding are **just ignored**.

In **2.7**, though, matters became much worse:
```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Behaves same as 2.6
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode: WEIRD behavior starts
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]

# Try to set via set_encoding: still just ignored
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal: WEIRD behavior again
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
```

So, **2.7** not just ignores attempts to set **internal** encoding, but erroneously sets it to **external** one, so strings are not recoded, but their encoding is forced to change.

I believe it is severe bug (more severe than 2.6's "just ignoring").

[This Reddit thread](https://www.reddit.com/r/ruby/comments/emd6q4/is_this_a_stringio_bug_in_ruby_270/) shows how it breaks existing code:

* the author uses `StringIO` to work with `ASCII-8BIT` strings;
* the code is performed in Rails environment (which sets `internal_encoding` to `UTF-8` by default);
* under **2.7**, `StringIO#read` returns `ASCII-8BIT` content in Strings saying their encoding is `UTF-8`.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:97506] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7)
       [not found] <redmine.issue-16497.20200110111831.710@ruby-lang.org>
  2020-03-12 13:04 ` [ruby-core:97461] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7) jean.boussier
@ 2020-03-15 13:07 ` naruse
  2020-03-15 17:27 ` [ruby-core:97511] " zverok.offline
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: naruse @ 2020-03-15 13:07 UTC (permalink / raw)
  To: ruby-core

Issue #16497 has been updated by naruse (Yui NARUSE).

Backport changed from 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED to 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONE

ruby_2_7 47b08728cf3d0441a3da4dc1dcdd578817b0e036.

----------------------------------------
Bug #16497: StringIO#internal_encoding is broken (more severely in 2.7)
https://bugs.ruby-lang.org/issues/16497#change-84655

* Author: zverok (Victor Shepelev)
* Status: Closed
* Priority: Normal
* Assignee: nobu (Nobuyoshi Nakada)
* Backport: 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONE
----------------------------------------
To the best of my understanding from [Encoding](https://docs.ruby-lang.org/en/master/Encoding.html) docs, the following is true:

* external encoding (explicitly specified or taken from `Encoding.default_external`) specifies how the IO understands input and stores it internally
* internal encoding (explicitly specified or taken from `Encoding.default_internal`) specifies how the IO converts what it reads.

Demonstration with regular files:

```ruby
# prepare data
File.write('test.txt', 'Україна'.encode('KOI8-U'), encoding: 'KOI8-U') #=> 7

def test(io)
  str = io.read
  [io.external_encoding, io.internal_encoding, str, str.encoding]
end

# read it:
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# We can specify internal encoding when opening the file:
test(File.open('test.txt', 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or when it is already opened
test(File.open('test.txt').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or with Encoding.default_internal
Encoding.default_internal = 'UTF-8'
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
```

But with StringIO, **internal encoding can't be set** in Ruby **2.6**:

```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Simplest form:
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via set_encoding:
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal:
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
```

So, in 2.6, any attempt to do something with StringIO's internal encoding are **just ignored**.

In **2.7**, though, matters became much worse:
```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Behaves same as 2.6
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode: WEIRD behavior starts
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]

# Try to set via set_encoding: still just ignored
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal: WEIRD behavior again
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
```

So, **2.7** not just ignores attempts to set **internal** encoding, but erroneously sets it to **external** one, so strings are not recoded, but their encoding is forced to change.

I believe it is severe bug (more severe than 2.6's "just ignoring").

[This Reddit thread](https://www.reddit.com/r/ruby/comments/emd6q4/is_this_a_stringio_bug_in_ruby_270/) shows how it breaks existing code:

* the author uses `StringIO` to work with `ASCII-8BIT` strings;
* the code is performed in Rails environment (which sets `internal_encoding` to `UTF-8` by default);
* under **2.7**, `StringIO#read` returns `ASCII-8BIT` content in Strings saying their encoding is `UTF-8`.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:97511] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7)
       [not found] <redmine.issue-16497.20200110111831.710@ruby-lang.org>
  2020-03-12 13:04 ` [ruby-core:97461] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7) jean.boussier
  2020-03-15 13:07 ` [ruby-core:97506] " naruse
@ 2020-03-15 17:27 ` zverok.offline
  2020-12-23  6:35 ` [ruby-core:101643] " zverok.offline
  2021-10-26 16:31 ` [ruby-core:105810] " zverok (Victor Shepelev)
  4 siblings, 0 replies; 5+ messages in thread
From: zverok.offline @ 2020-03-15 17:27 UTC (permalink / raw)
  To: ruby-core

Issue #16497 has been updated by zverok (Victor Shepelev).


@naruse one of my two "weird" cases is not fixed yet:

```ruby
def test(io)
  str = io.read
  [io.external_encoding, io.internal_encoding, str, str.encoding]
end

str = 'Україна'.encode('KOI8-U')

test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
```
(Tried just now on the freshest `master`)


----------------------------------------
Bug #16497: StringIO#internal_encoding is broken (more severely in 2.7)
https://bugs.ruby-lang.org/issues/16497#change-84660

* Author: zverok (Victor Shepelev)
* Status: Closed
* Priority: Normal
* Assignee: nobu (Nobuyoshi Nakada)
* Backport: 2.5: DONTNEED, 2.6: DONTNEED, 2.7: DONE
----------------------------------------
To the best of my understanding from [Encoding](https://docs.ruby-lang.org/en/master/Encoding.html) docs, the following is true:

* external encoding (explicitly specified or taken from `Encoding.default_external`) specifies how the IO understands input and stores it internally
* internal encoding (explicitly specified or taken from `Encoding.default_internal`) specifies how the IO converts what it reads.

Demonstration with regular files:

```ruby
# prepare data
File.write('test.txt', 'Україна'.encode('KOI8-U'), encoding: 'KOI8-U') #=> 7

def test(io)
  str = io.read
  [io.external_encoding, io.internal_encoding, str, str.encoding]
end

# read it:
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# We can specify internal encoding when opening the file:
test(File.open('test.txt', 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or when it is already opened
test(File.open('test.txt').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or with Encoding.default_internal
Encoding.default_internal = 'UTF-8'
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
```

But with StringIO, **internal encoding can't be set** in Ruby **2.6**:

```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Simplest form:
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via set_encoding:
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal:
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
```

So, in 2.6, any attempt to do something with StringIO's internal encoding are **just ignored**.

In **2.7**, though, matters became much worse:
```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Behaves same as 2.6
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode: WEIRD behavior starts
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]

# Try to set via set_encoding: still just ignored
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal: WEIRD behavior again
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
```

So, **2.7** not just ignores attempts to set **internal** encoding, but erroneously sets it to **external** one, so strings are not recoded, but their encoding is forced to change.

I believe it is severe bug (more severe than 2.6's "just ignoring").

[This Reddit thread](https://www.reddit.com/r/ruby/comments/emd6q4/is_this_a_stringio_bug_in_ruby_270/) shows how it breaks existing code:

* the author uses `StringIO` to work with `ASCII-8BIT` strings;
* the code is performed in Rails environment (which sets `internal_encoding` to `UTF-8` by default);
* under **2.7**, `StringIO#read` returns `ASCII-8BIT` content in Strings saying their encoding is `UTF-8`.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:101643] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7)
       [not found] <redmine.issue-16497.20200110111831.710@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2020-03-15 17:27 ` [ruby-core:97511] " zverok.offline
@ 2020-12-23  6:35 ` zverok.offline
  2021-10-26 16:31 ` [ruby-core:105810] " zverok (Victor Shepelev)
  4 siblings, 0 replies; 5+ messages in thread
From: zverok.offline @ 2020-12-23  6:35 UTC (permalink / raw)
  To: ruby-core

Issue #16497 has been updated by zverok (Victor Shepelev).


@nobu What's the status of this?

----------------------------------------
Bug #16497: StringIO#internal_encoding is broken (more severely in 2.7)
https://bugs.ruby-lang.org/issues/16497#change-89434

* Author: zverok (Victor Shepelev)
* Status: Assigned
* Priority: Normal
* Assignee: nobu (Nobuyoshi Nakada)
* Backport: 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED
----------------------------------------
To the best of my understanding from [Encoding](https://docs.ruby-lang.org/en/master/Encoding.html) docs, the following is true:

* external encoding (explicitly specified or taken from `Encoding.default_external`) specifies how the IO understands input and stores it internally
* internal encoding (explicitly specified or taken from `Encoding.default_internal`) specifies how the IO converts what it reads.

Demonstration with regular files:

```ruby
# prepare data
File.write('test.txt', 'Україна'.encode('KOI8-U'), encoding: 'KOI8-U') #=> 7

def test(io)
  str = io.read
  [io.external_encoding, io.internal_encoding, str, str.encoding]
end

# read it:
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# We can specify internal encoding when opening the file:
test(File.open('test.txt', 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or when it is already opened
test(File.open('test.txt').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or with Encoding.default_internal
Encoding.default_internal = 'UTF-8'
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
```

But with StringIO, **internal encoding can't be set** in Ruby **2.6**:

```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Simplest form:
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via set_encoding:
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal:
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
```

So, in 2.6, any attempt to do something with StringIO's internal encoding are **just ignored**.

In **2.7**, though, matters became much worse:
```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Behaves same as 2.6
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode: WEIRD behavior starts
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]

# Try to set via set_encoding: still just ignored
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal: WEIRD behavior again
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
```

So, **2.7** not just ignores attempts to set **internal** encoding, but erroneously sets it to **external** one, so strings are not recoded, but their encoding is forced to change.

I believe it is severe bug (more severe than 2.6's "just ignoring").

[This Reddit thread](https://www.reddit.com/r/ruby/comments/emd6q4/is_this_a_stringio_bug_in_ruby_270/) shows how it breaks existing code:

* the author uses `StringIO` to work with `ASCII-8BIT` strings;
* the code is performed in Rails environment (which sets `internal_encoding` to `UTF-8` by default);
* under **2.7**, `StringIO#read` returns `ASCII-8BIT` content in Strings saying their encoding is `UTF-8`.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:105810] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7)
       [not found] <redmine.issue-16497.20200110111831.710@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2020-12-23  6:35 ` [ruby-core:101643] " zverok.offline
@ 2021-10-26 16:31 ` zverok (Victor Shepelev)
  4 siblings, 0 replies; 5+ messages in thread
From: zverok (Victor Shepelev) @ 2021-10-26 16:31 UTC (permalink / raw)
  To: ruby-core

Issue #16497 has been updated by zverok (Victor Shepelev).


@nobu @naruse @byroot Year after, this is still broken in the recent head.

```ruby
RUBY_DESCRIPTION
# => "ruby 3.1.0dev (2021-10-26T11:17:00Z master afdca0e780) [x86_64-linux]"
str = 'Україна'.encode('KOI8-U')
# => "\xF5\xCB\xD2\xC1\xA7\xCE\xC1"
io = StringIO.new(str, 'r:KOI8-U:UTF-8')
io.internal_encoding
# => nil  -- expected UTF-8
io.external_encoding
# => #<Encoding:UTF-8> -- expected KOI8-U
out = io.read
# => "\xF5\xCB\xD2\xC1\xA7\xCE\xC1" -- expected 'Україна' in UTF-8, but it seems to be still KOI8-U?
out.encoding
# => #<Encoding:UTF-8> -- but it can't even report it properly
```


----------------------------------------
Bug #16497: StringIO#internal_encoding is broken (more severely in 2.7)
https://bugs.ruby-lang.org/issues/16497#change-94329

* Author: zverok (Victor Shepelev)
* Status: Assigned
* Priority: Normal
* Assignee: nobu (Nobuyoshi Nakada)
* Backport: 2.5: DONTNEED, 2.6: DONTNEED, 2.7: REQUIRED
----------------------------------------
To the best of my understanding from [Encoding](https://docs.ruby-lang.org/en/master/Encoding.html) docs, the following is true:

* external encoding (explicitly specified or taken from `Encoding.default_external`) specifies how the IO understands input and stores it internally
* internal encoding (explicitly specified or taken from `Encoding.default_internal`) specifies how the IO converts what it reads.

Demonstration with regular files:

```ruby
# prepare data
File.write('test.txt', 'Україна'.encode('KOI8-U'), encoding: 'KOI8-U') #=> 7

def test(io)
  str = io.read
  [io.external_encoding, io.internal_encoding, str, str.encoding]
end

# read it:
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# We can specify internal encoding when opening the file:
test(File.open('test.txt', 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or when it is already opened
test(File.open('test.txt').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]

# ...or with Encoding.default_internal
Encoding.default_internal = 'UTF-8'
test(File.open('test.txt', 'r:KOI8-U'))
# => [#<Encoding:KOI8-U>, #<Encoding:UTF-8>, "Україна", #<Encoding:UTF-8>]
```

But with StringIO, **internal encoding can't be set** in Ruby **2.6**:

```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Simplest form:
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via set_encoding:
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal:
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]
```

So, in 2.6, any attempt to do something with StringIO's internal encoding are **just ignored**.

In **2.7**, though, matters became much worse:
```ruby
require 'stringio'
Encoding.default_internal = nil
str = 'Україна'.encode('KOI8-U')

# Behaves same as 2.6
test(StringIO.new(str))
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via mode: WEIRD behavior starts
test(StringIO.new(str, 'r:KOI8-U:UTF-8'))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]

# Try to set via set_encoding: still just ignored
test(StringIO.new(str, 'r:KOI8-U:UTF-8').tap { |f| f.set_encoding('KOI8-U', 'UTF-8') })
# => [#<Encoding:KOI8-U>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:KOI8-U>]

# Try to set via Enoding.default_internal: WEIRD behavior again
Encoding.default_internal = 'UTF-8'
test(StringIO.new(str))
# => [#<Encoding:UTF-8>, nil, "\xF5\xCB\xD2\xC1\xA7\xCE\xC1", #<Encoding:UTF-8>]
```

So, **2.7** not just ignores attempts to set **internal** encoding, but erroneously sets it to **external** one, so strings are not recoded, but their encoding is forced to change.

I believe it is severe bug (more severe than 2.6's "just ignoring").

[This Reddit thread](https://www.reddit.com/r/ruby/comments/emd6q4/is_this_a_stringio_bug_in_ruby_270/) shows how it breaks existing code:

* the author uses `StringIO` to work with `ASCII-8BIT` strings;
* the code is performed in Rails environment (which sets `internal_encoding` to `UTF-8` by default);
* under **2.7**, `StringIO#read` returns `ASCII-8BIT` content in Strings saying their encoding is `UTF-8`.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-10-26 16:32 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-16497.20200110111831.710@ruby-lang.org>
2020-03-12 13:04 ` [ruby-core:97461] [Ruby master Bug#16497] StringIO#internal_encoding is broken (more severely in 2.7) jean.boussier
2020-03-15 13:07 ` [ruby-core:97506] " naruse
2020-03-15 17:27 ` [ruby-core:97511] " zverok.offline
2020-12-23  6:35 ` [ruby-core:101643] " zverok.offline
2021-10-26 16:31 ` [ruby-core:105810] " zverok (Victor Shepelev)

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).