ruby-core@ruby-lang.org archive (unofficial mirror)
From: "jeremyevans0 (Jeremy Evans)" <noreply@ruby-lang.org>
To: ruby-core@ml.ruby-lang.org
Subject: [ruby-core:111012] [Ruby master Bug#18899] Inconsistent argument handling in IO#set_encoding
Date: Fri, 25 Nov 2022 17:55:52 +0000 (UTC)
Message-ID: <redmine.journal-100263.20221125175551.692@ruby-lang.org>
In-Reply-To: redmine.issue-18899.20220706123432.692@ruby-lang.org

Issue #18899 has been updated by jeremyevans0 (Jeremy Evans).


After more research, it appears the current behavior is expected.  Parsing a single string with an embedded colon is already handled correctly.  However, if the external encoding is binary/ASCII-8BIT, then the internal encoding is deliberately set to `nil`:

```c
// in rb_io_ext_int_to_encs
    if (ext == rb_ascii8bit_encoding()) {
        /* If external is ASCII-8BIT, no transcoding */
        intern = NULL;
    }
```

Basically, a `'binary:utf-8'` encoding specification doesn't make sense.  Two encodings are provided in order to transcode from one encoding to the other, and there is no transcoding if the external encoding is binary.  If you want the internal encoding to be UTF-8, then just use `'utf-8'`.
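
To illustrate the difference between a transcoding pair and a binary external encoding, here is a minimal sketch.  The helper name `read_e9_with` and the temp file prefix are made up for the example; it writes the single byte 0xE9 (`é` in ISO-8859-1) and reads it back after calling `set_encoding` with a single string:

```ruby
require 'tempfile'

# Hypothetical helper: write the single byte 0xE9 to a temporary file,
# then read it back after calling set_encoding with one String argument.
def read_e9_with(enc_spec)
  Tempfile.create('set_encoding_demo') do |tmp|
    tmp.binmode
    tmp.write([0xE9].pack('C'))
    tmp.flush

    File.open(tmp.path) do |f|
      f.set_encoding(enc_spec)
      f.read
    end
  end
end

# External ISO-8859-1, internal UTF-8: the byte is transcoded on read.
transcoded = read_e9_with('iso-8859-1:utf-8')
p transcoded.encoding  # UTF-8
p transcoded.bytes     # two bytes, the UTF-8 form of U+00E9

# External binary: the internal encoding is dropped, so nothing is transcoded.
raw = read_e9_with('binary:utf-8')
p raw.encoding         # ASCII-8BIT
p raw.bytes            # the original single byte
```
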

That still leaves us with inconsistent behavior between `'binary:utf-8'` and `'binary', 'utf-8'`.  So I propose to make the `'binary', 'utf-8'` behavior the same as `'binary:utf-8'`.  I updated my pull request to do that: https://github.com/ruby/ruby/pull/6280
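
To check which behavior a given Ruby build has, something like the following sketch can compare the two call forms.  `encodings_after` is a hypothetical helper name, and `File::NULL` stands in for the `/dev/null` path.  No expected result is shown for the two-argument form because it depends on whether the proposed change is present in the interpreter being run:

```ruby
# Hypothetical helper: return [external, internal] encodings of a fresh
# IO after calling set_encoding with the given arguments.
def encodings_after(*args)
  File.open(File::NULL) do |f|
    f.set_encoding(*args)
    [f.external_encoding, f.internal_encoding]
  end
end

single = encodings_after('binary:utf-8')
double = encodings_after('binary', 'utf-8')

puts "single string: #{single.inspect}"
puts "two arguments: #{double.inspect}"
puts(single == double ? 'consistent' : 'inconsistent')
```
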

An alternative approach would be to remove the above code that treats a binary external encoding specially.

----------------------------------------
Bug #18899: Inconsistent argument handling in IO#set_encoding
https://bugs.ruby-lang.org/issues/18899#change-100263

* Author: javanthropus (Jeremy Bopp)
* Status: Open
* Priority: Normal
* ruby -v: ruby 3.1.2p20 (2022-04-12 revision 4491bb740a) [x86_64-linux]
* Backport: 2.7: UNKNOWN, 3.0: UNKNOWN, 3.1: UNKNOWN
----------------------------------------
`IO#set_encoding` behaves differently when processing a single String argument than it does when processing two arguments (whether Strings or Encodings) in the case where the external encoding is being set to binary and the internal encoding is being set to any other encoding.

This script demonstrates the resulting values of the external and internal encodings for an IO instance given different ways to equivalently call `#set_encoding`:

```ruby
#!/usr/bin/env ruby


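# Print the arguments passed to set_encoding alongside the resulting
# external and internal encodings of the IO.
def show(io, args)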
  printf(
    "args: %-50s  external encoding: %-25s  internal encoding: %-25s\n",
    args.inspect,
    io.external_encoding.inspect,
    io.internal_encoding.inspect
  )
end

File.open('/dev/null') do |f|
  args = ['binary:utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = ['binary', 'utf-8']
  f.set_encoding(*args)
  show(f, args)

  args = [Encoding.find('binary'), Encoding.find('utf-8')]
  f.set_encoding(*args)
  show(f, args)
end
```

This behavior is the same from Ruby 2.7.0 to 3.1.2.



-- 
https://bugs.ruby-lang.org/
