ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
@ 2022-07-06 12:34 javanthropus (Jeremy Bopp)
  2022-08-21 14:24 ` [ruby-core:109614] " javanthropus (Jeremy Bopp)
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: javanthropus (Jeremy Bopp) @ 2022-07-06 12:34 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been reported by javanthropus (Jeremy Bopp).

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
`IO#set_encoding` behaves differently when processing a single String argument than it does when processing 2 arguments (whether Strings or Encodings) in the case where the external encoding is being set to binary and the internal encoding is being set to any other encoding.

This script shows the resulting external and internal encodings of an IO instance for several calls to `#set_encoding` that should be equivalent:

```ruby
#!/usr/bin/env ruby


def show(io, args)
  printf(
    "args: %-50s  external encoding: %-25s  internal encoding: %-25s\n",
    args.inspect,
    io.external_encoding.inspect,
    io.internal_encoding.inspect
  )
end

File.open('/dev/null') do |f|
  args = ['binary:utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = ['binary', 'utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = [Encoding.find('binary'), Encoding.find('utf-8')]
  f.set_encoding(*args)
  show(f, args)
end
```

This behavior is the same from Ruby 2.7.0 to 3.1.2.



-- 
https://bugs.ruby-lang.org/


* [ruby-core:109614] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
@ 2022-08-21 14:24 ` javanthropus (Jeremy Bopp)
  2022-08-23 22:20 ` [ruby-core:109651] " jeremyevans0 (Jeremy Evans)
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: javanthropus (Jeremy Bopp) @ 2022-08-21 14:24 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by javanthropus (Jeremy Bopp).


Can anyone confirm if this is a bug or intended behavior?  I've taken a look at the code that implements this, and there are 2 pretty independent code paths for handling the single string argument case and the multiple argument case.  If this is confirmed to be a bug, I would like to write a patch to unify the behavior.

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-98798


* [ruby-core:109651] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
  2022-08-21 14:24 ` [ruby-core:109614] " javanthropus (Jeremy Bopp)
@ 2022-08-23 22:20 ` jeremyevans0 (Jeremy Evans)
  2022-08-26 13:01 ` [ruby-core:109709] " javanthropus (Jeremy Bopp)
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jeremyevans0 (Jeremy Evans) @ 2022-08-23 22:20 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by jeremyevans0 (Jeremy Evans).


I think it is a bug.  I submitted a pull request to fix it: https://github.com/ruby/ruby/pull/6280.  Not sure if the approach taken is the best way, though.

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-98874


* [ruby-core:109709] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
  2022-08-21 14:24 ` [ruby-core:109614] " javanthropus (Jeremy Bopp)
  2022-08-23 22:20 ` [ruby-core:109651] " jeremyevans0 (Jeremy Evans)
@ 2022-08-26 13:01 ` javanthropus (Jeremy Bopp)
  2022-11-21 12:22 ` [ruby-core:110832] " naruse (Yui NARUSE)
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: javanthropus (Jeremy Bopp) @ 2022-08-26 13:01 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by javanthropus (Jeremy Bopp).


I ran my test against your branch, and it addresses this issue.  I hope it can be incorporated soon.  Thanks!

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-98943


* [ruby-core:110832] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
                   ` (2 preceding siblings ...)
  2022-08-26 13:01 ` [ruby-core:109709] " javanthropus (Jeremy Bopp)
@ 2022-11-21 12:22 ` naruse (Yui NARUSE)
  2022-11-21 13:53 ` [ruby-core:110836] " javanthropus (Jeremy Bopp)
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: naruse (Yui NARUSE) @ 2022-11-21 12:22 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by naruse (Yui NARUSE).


I think your example needs to be as follows:
```ruby
#!/usr/bin/env ruby

def show(io, args)
  printf(
    "args: %-50s  external encoding: %-25s  internal encoding: %-25s\n",
    args.inspect,
    io.external_encoding.inspect,
    io.internal_encoding.inspect
  )
end

File.open('/dev/null', 'r:binary:utf-8') do |f|
  args = ['r:binary:utf-8']
  show(f, args)

  args = ['binary:utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = ['binary', 'utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = [Encoding.find('binary'), Encoding.find('utf-8')]
  f.set_encoding(*args)
  show(f, args)
end
```
The result will be
```
args: ["r:binary:utf-8"]                                  external encoding: #<Encoding:ASCII-8BIT>     internal encoding: nil
args: ["binary:utf-8"]                                    external encoding: #<Encoding:ASCII-8BIT>     internal encoding: nil
args: ["binary", "utf-8"]                                 external encoding: #<Encoding:ASCII-8BIT>     internal encoding: #<Encoding:UTF-8>
args: [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>]         external encoding: #<Encoding:ASCII-8BIT>     internal encoding: #<Encoding:UTF-8>
```

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100190


* [ruby-core:110836] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
                   ` (3 preceding siblings ...)
  2022-11-21 12:22 ` [ruby-core:110832] " naruse (Yui NARUSE)
@ 2022-11-21 13:53 ` javanthropus (Jeremy Bopp)
  2022-11-25 17:55 ` [ruby-core:111012] " jeremyevans0 (Jeremy Evans)
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: javanthropus (Jeremy Bopp) @ 2022-11-21 13:53 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by javanthropus (Jeremy Bopp).


Thank you for your response.  How do the changes to the example make a difference?  The results with the original example are:
```
args: ["binary:utf-8"]                                    external encoding: #<Encoding:ASCII-8BIT>     internal encoding: nil
args: ["binary", "utf-8"]                                 external encoding: #<Encoding:ASCII-8BIT>     internal encoding: #<Encoding:UTF-8>
args: [#<Encoding:ASCII-8BIT>, #<Encoding:UTF-8>]         external encoding: #<Encoding:ASCII-8BIT>     internal encoding: #<Encoding:UTF-8>
```

Unless I'm mistaken, these are exactly the same as the last 3 lines of the modified example's output.  The question remains as to why the single string argument case results in a `nil` internal encoding while the 2 argument cases do not.

Before investigating this, I thought that the logic would first split `"binary:utf-8"` into `"binary"` and `"utf-8"` and then proceed as in the 2 string argument case.  In other words, I expected that all cases would result in the internal encoding being set to the same value, either `nil` or `Encoding::UTF-8`.
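
The expectation described above can be sketched in Ruby. This is only an illustration of the assumed "split first, then proceed as in the two-argument case" model; `split_encoding_args` is a hypothetical helper, not anything in Ruby's implementation, and it ignores option hashes and prefixes such as `BOM|`.

```ruby
# Hypothetical helper (not part of Ruby): normalizes the arguments so that a
# single "ext:int" String is treated exactly like the two-argument form.
def split_encoding_args(*args)
  if args.size == 1 && args[0].is_a?(String) && args[0].include?(':')
    # Split on the first colon only: "binary:utf-8" becomes ["binary", "utf-8"].
    args[0].split(':', 2)
  else
    args
  end
end

one_arg = split_encoding_args('binary:utf-8')
two_arg = split_encoding_args('binary', 'utf-8')
puts one_arg == two_arg
```

Under this model both call styles reach the same code with the same pair of names, so they could not produce different internal encodings.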

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100193


* [ruby-core:111012] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
                   ` (4 preceding siblings ...)
  2022-11-21 13:53 ` [ruby-core:110836] " javanthropus (Jeremy Bopp)
@ 2022-11-25 17:55 ` jeremyevans0 (Jeremy Evans)
  2022-11-26 12:44 ` [ruby-core:111019] " Eregon (Benoit Daloze)
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: jeremyevans0 (Jeremy Evans) @ 2022-11-25 17:55 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by jeremyevans0 (Jeremy Evans).


After more research, it appears the current behavior is expected.  Parsing the single string with embedded colon is already handled correctly.  However, if the external encoding is binary/ASCII-8BIT, then the internal encoding is deliberately set to `nil`:

```c
// in rb_io_ext_int_to_encs
    if (ext == rb_ascii8bit_encoding()) {
        /* If external is ASCII-8BIT, no transcoding */
        intern = NULL;
    }
```

Basically, the `'binary:utf-8'` encoding doesn't make sense.  Providing two encodings is done to transcode from one encoding to the other.  There is no transcoding if the external encoding is binary.  If you want the internal encoding to be UTF-8, then just use `'utf-8'`.

That still leaves us with inconsistent behavior between `'binary:utf-8'` and `'binary', 'utf-8'`.  So I propose to make the `'binary', 'utf-8'` behavior the same as `'binary:utf-8'`.  I updated my pull request to do that: https://github.com/ruby/ruby/pull/6280

An alternative approach would be to remove the above code, so that a binary external encoding is no longer treated specially.
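
For readers who do not follow the C source, the rule quoted above can be modeled in a few lines of Ruby. This is a rough sketch of the rule only, not a port of `rb_io_ext_int_to_encs`; the method name `resolve_encodings` is hypothetical, and default external/internal encodings are deliberately left out.

```ruby
# Hypothetical Ruby model of the quoted C check (names are illustrative only).
def resolve_encodings(ext, int)
  ext = Encoding.find(ext) unless ext.is_a?(Encoding)
  int = Encoding.find(int) unless int.nil? || int.is_a?(Encoding)
  # Mirrors the quoted check: with a binary external encoding there is
  # no transcoding, so the internal encoding is dropped.
  int = nil if ext == Encoding::BINARY
  [ext, int]
end

p resolve_encodings('binary', 'utf-8')
p resolve_encodings('iso-8859-1', 'utf-8')
```

Applying one such function to both the single-string and the two-argument forms corresponds to the first proposal; deleting the `int = nil` line corresponds to the alternative.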

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100263


* [ruby-core:111019] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
                   ` (5 preceding siblings ...)
  2022-11-25 17:55 ` [ruby-core:111012] " jeremyevans0 (Jeremy Evans)
@ 2022-11-26 12:44 ` Eregon (Benoit Daloze)
  2022-11-26 23:20 ` [ruby-core:111026] " javanthropus (Jeremy Bopp)
  2022-11-30 18:21 ` [ruby-core:111095] " Dan0042 (Daniel DeLorme)
  8 siblings, 0 replies; 10+ messages in thread
From: Eregon (Benoit Daloze) @ 2022-11-26 12:44 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by Eregon (Benoit Daloze).


I took a look at `IO#set_encoding` recently, and it is such an unreadable mess that I doubt anybody could explain its full semantics.
So anything to simplify it would IMHO be welcome.
I think `IO#set_encoding` should simply set the internal/external encodings for that IO, with no special cases and not caring about the default external/internal encodings.
If some cases don't make any sense they should raise an exception.
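
One possible reading of "raise an exception" is sketched below. It is an assumption-laden illustration, not a proposed patch: `check_encoding_pair!` is a hypothetical name, and which combinations count as nonsensical (here, any internal encoding paired with a binary external encoding) is exactly the open design question.

```ruby
# Hypothetical strict check: raises instead of silently dropping an argument.
def check_encoding_pair!(ext, int)
  ext = Encoding.find(ext) unless ext.is_a?(Encoding)
  int = Encoding.find(int) unless int.nil? || int.is_a?(Encoding)
  # Assumption for this sketch: binary data cannot be transcoded, so asking
  # for an internal encoding alongside a binary external encoding is an error.
  if ext == Encoding::BINARY && int
    raise ArgumentError, "cannot transcode from #{ext.name} to #{int.name}"
  end
  [ext, int]
end

begin
  check_encoding_pair!('binary', 'utf-8')
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```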

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100274

 ______________________________________________
 ruby-core mailing list -- ruby-core@ml.ruby-lang.org
 To unsubscribe send an email to ruby-core-leave@ml.ruby-lang.org
 ruby-core info -- https://ml.ruby-lang.org/mailman3/postorius/lists/ruby-core.ml.ruby-lang.org/


* [ruby-core:111026] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
                   ` (6 preceding siblings ...)
  2022-11-26 12:44 ` [ruby-core:111019] " Eregon (Benoit Daloze)
@ 2022-11-26 23:20 ` javanthropus (Jeremy Bopp)
  2022-11-30 18:21 ` [ruby-core:111095] " Dan0042 (Daniel DeLorme)
  8 siblings, 0 replies; 10+ messages in thread
From: javanthropus (Jeremy Bopp) @ 2022-11-26 23:20 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by javanthropus (Jeremy Bopp).


Please also see #18995 for another example of the intricate implementation behaving unexpectedly.  During my own investigation, I discovered that using `"-"` for the internal encoding name is silently ignored.  According to the comments in the code, `"-"` is used to indicate no conversion, but it's completely undocumented for the method.  If you use `"-"` for the external encoding name, the behavior diverges much as reported for this issue: `"-:utf-8"` and `"-"`, `"utf-8"` give different results.

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100280


* [ruby-core:111095] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
  2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
                   ` (7 preceding siblings ...)
  2022-11-26 23:20 ` [ruby-core:111026] " javanthropus (Jeremy Bopp)
@ 2022-11-30 18:21 ` Dan0042 (Daniel DeLorme)
  8 siblings, 0 replies; 10+ messages in thread
From: Dan0042 (Daniel DeLorme) @ 2022-11-30 18:21 UTC (permalink / raw)
  To: ruby-core

Issue #18899 has been updated by Dan0042 (Daniel DeLorme).


Naively, I would have expected "binary:utf-8" to take arbitrary input and force the encoding to UTF-8, and "utf-8:utf-8" to read and validate the input as UTF-8.
Neither does what I expected. `¯\_(ツ)_/¯`
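
The "take arbitrary input, force it to UTF-8, and validate" behavior can be spelled out explicitly with existing String methods. This is a minimal sketch of that idea; `read_as_utf8` is a hypothetical helper, and a `StringIO` stands in for a real file.

```ruby
require 'stringio'

# Hypothetical helper: reads raw bytes, relabels them as UTF-8 without
# transcoding, and rejects input that is not valid UTF-8.
def read_as_utf8(io)
  io.set_encoding(Encoding::BINARY)               # read bytes as-is
  text = io.read.force_encoding(Encoding::UTF_8)  # relabel, no conversion
  raise EncodingError, 'input is not valid UTF-8' unless text.valid_encoding?
  text
end

text = read_as_utf8(StringIO.new("caf\xC3\xA9".b))
puts text.encoding
puts text.valid_encoding?
```

`force_encoding` only changes the label on the bytes, which is why the separate `valid_encoding?` check is needed.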

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100360


Thread overview: 10+ messages
2022-07-06 12:34 [ruby-core:109152] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding javanthropus (Jeremy Bopp)
2022-08-21 14:24 ` [ruby-core:109614] " javanthropus (Jeremy Bopp)
2022-08-23 22:20 ` [ruby-core:109651] " jeremyevans0 (Jeremy Evans)
2022-08-26 13:01 ` [ruby-core:109709] " javanthropus (Jeremy Bopp)
2022-11-21 12:22 ` [ruby-core:110832] " naruse (Yui NARUSE)
2022-11-21 13:53 ` [ruby-core:110836] " javanthropus (Jeremy Bopp)
2022-11-25 17:55 ` [ruby-core:111012] " jeremyevans0 (Jeremy Evans)
2022-11-26 12:44 ` [ruby-core:111019] " Eregon (Benoit Daloze)
2022-11-26 23:20 ` [ruby-core:111026] " javanthropus (Jeremy Bopp)
2022-11-30 18:21 ` [ruby-core:111095] " Dan0042 (Daniel DeLorme)
