ruby-core@ruby-lang.org archive (unofficial mirror)
From: "Eregon (Benoit Daloze)" <noreply@ruby-lang.org>
To: ruby-core@ruby-lang.org
Subject: [ruby-core:107532] [Ruby master Feature#18576] Rename `ASCII-8BIT` encoding to `BINARY`
Date: Wed, 09 Feb 2022 17:34:28 +0000 (UTC)
Message-ID: <redmine.journal-96444.20220209173428.7941@ruby-lang.org>
In-Reply-To: redmine.issue-18576.20220208090805.7941@ruby-lang.org

Issue #18576 has been updated by Eregon (Benoit Daloze).


naruse (Yui NARUSE) wrote in #note-6:
> I want to ask you that how often you can actually distinguish them.

I think in many cases it is possible to distinguish.
For instance, an HTTP header might initially be in the binary encoding, which then means "unknown encoding" (the real encoding can often be found through the `Content-Type` charset, but not always, and the declared charset could be invalid).
Another example is `socket.read(N)`, which might return actual binary data (e.g. for a binary protocol) or text, in which case the actual encoding depends on what is communicated on that socket.
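
To illustrate the "unknown encoding" case, here is a minimal sketch of how a program might give binary-tagged bytes a real encoding once a charset is known. The helper name `tag_with_charset` is hypothetical and not part of Ruby or any library; it only uses `Encoding.find`, `String#force_encoding` and `String#valid_encoding?`.

```ruby
# Hypothetical helper (an assumption for illustration): reinterpret bytes
# read in the binary encoding using a charset declared elsewhere, e.g. in
# a Content-Type header. Falls back to the untouched binary string when
# the charset name is unknown or the bytes are invalid in that encoding.
def tag_with_charset(bytes, charset)
  encoding = begin
    Encoding.find(charset)
  rescue ArgumentError
    return bytes # unknown charset name: keep the bytes as binary
  end

  text = bytes.dup.force_encoding(encoding)
  text.valid_encoding? ? text : bytes
end

raw = "f\xC3\xA9e".b                      # bytes as they might come off a socket
text = tag_with_charset(raw, "utf-8")     # same bytes, now tagged as UTF-8
still_binary = tag_with_charset("\xFF".b, "utf-8") # invalid UTF-8, stays binary
```

Whether to fall back silently or raise on an invalid charset is a design choice; the sketch keeps the binary string so the caller can still decide what to do with it.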

And I would think most Ruby programs need to handle the binary encoding somehow, and can only leave a String as binary if it contains only bytes < 128; otherwise things break.

> If so, Ruby may not provide two objects.

I don't think two different "binary" Encodings are useful; one seems enough in practice and can be used for both meanings, which are very close (a binary byte array, or a marker for unknown encoding).

> This is very good question. Ruby's answer is "yes, ASCII-8BIT is similar to ISO-8859-*". As you say, an ASCII-8BIT string's 8-bit range is undefined. But Ruby doesn't matter that. In the real world such phenomenon is sometimes discovered.

I think such situations need to be handled somehow and given a real encoding.
"ASCII-8BIT" feels confusing because there is no such thing as an "8th" bit of ASCII without a more specific encoding that defines it.
So it really means unknown, and "ASCII-8BIT" seems far from "unknown encoding".

Also "ASCII-8BIT" sounds clearly wrong if it's actual binary data (which might not use any ASCII concept at all).
The behavior that this pseudo-encoding is ASCII-compatible and e.g. shows byte 65 as `A` is fine; after all, hexdump utilities typically do the same for bytes < 128, and it's helpful if there is text in the middle of binary data.
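
A small sketch of that behavior: `String#inspect` on a binary string shows ASCII bytes as characters and escapes the rest. The `printable_preview` helper is hypothetical, written here only to mimic the text column of a hexdump.

```ruby
data = "A\xFFB".b          # byte 65, byte 255, byte 66
shown = data.inspect       # ASCII bytes appear as characters, others escaped

# Hypothetical helper (an assumption for illustration): render printable
# ASCII bytes as characters and every other byte as ".", similar to the
# right-hand column of a hexdump.
def printable_preview(bytes)
  bytes.each_byte.map { |b| (0x20..0x7E).cover?(b) ? b.chr : "." }.join
end

preview = printable_preview("\x89PNG\r\n".b)
```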

> Anyway Rails programmers don't need such understanding usually. If renaming cares people who just hit the surface of this chaos, it might be worth considered, though changing encoding.name may hit the compatibility issue.

It's not just Rails programmers: I think most Ruby programmers are confused when they see ASCII-8BIT, and not only the first time.
I believe renaming to BINARY would help them understand the meaning much better.

@tenderlovemaking One issue is that e.g. error messages in CRuby are encoded in the binary encoding (probably for the legacy reason of using `rb_str_new()`), so I think that would be a wide-reaching change with a high chance of causing real compatibility issues; it seems too incompatible to me.
As an example, the encoding negotiation rules (e.g. for concatenation) in Ruby are all based on whether one side is `#ascii_only?`; if so, the result just uses the other side's encoding. Preventing e.g. concatenation with an ASCII-only binary string would break lots of programs.
Anyway, I think that's a separate issue indeed.
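
The negotiation rule described above can be observed with `Encoding.compatible?` and plain concatenation. This is only a sketch of the current behavior, with variable names chosen for illustration:

```ruby
utf8         = "f\u00E9e"   # UTF-8 string containing a non-ASCII character
ascii_binary = "abc".b      # binary string, ASCII-only
high_binary  = "\xFF".b     # binary string with a byte >= 128

# An ASCII-only side defers to the other side's encoding.
ok_encoding = Encoding.compatible?(utf8, ascii_binary)
combined    = utf8 + ascii_binary

# Two non-ASCII sides with different encodings are incompatible.
no_encoding = Encoding.compatible?(utf8, high_binary)
error = begin
  utf8 + high_binary
  nil
rescue Encoding::CompatibilityError => e
  e
end
```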

----------------------------------------
Feature #18576: Rename `ASCII-8BIT` encoding to `BINARY`
https://bugs.ruby-lang.org/issues/18576#change-96444

* Author: byroot (Jean Boussier)
* Status: Open
* Priority: Normal
----------------------------------------
### Context

I'm now used to it, but something that confused me for years was errors such as:

```ruby
>> "fée" + "\xFF".b
(irb):3:in `+': incompatible character encodings: UTF-8 and ASCII-8BIT (Encoding::CompatibilityError)
```

When you aren't that familiar with Ruby, it's really not evident that `ASCII-8BIT` basically means "no encoding" or "binary".

And even when you know it, if you don't read carefully it's very easily confused with `US-ASCII`.

The `Encoding::BINARY` alias is much more telling IMHO.
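
As a quick check of the alias relationship, both constants refer to the same `Encoding` object, and either name can be looked up with `Encoding.find`:

```ruby
same_object = Encoding::BINARY.equal?(Encoding::ASCII_8BIT)
names       = Encoding::BINARY.names          # all registered names for this encoding
by_name     = Encoding.find("BINARY")         # lookup through the alias name
from_b      = "".b.encoding                   # String#b returns a binary copy
```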

### Proposal

Since `Encoding::ASCII_8BIT` has been aliased as `Encoding::BINARY` for years, I think renaming it to `BINARY` and then making `ASCII_8BIT` the alias would significantly improve usability without backward compatibility concerns.

The only concern I could see would be the consistency with a handful of C API functions:

  - `rb_encoding *rb_ascii8bit_encoding(void)`
  - `int rb_ascii8bit_encindex(void)`
  - `VALUE rb_io_ascii8bit_binmode(VALUE io)`

But that's for much more advanced users, so I don't think it's much of a concern.




-- 
https://bugs.ruby-lang.org/

Thread overview: 47+ messages
2022-02-08  9:08 [ruby-core:107514] [Ruby master Feature#18576] Rename `ASCII-8BIT` encoding to `BINARY` byroot (Jean Boussier)
2022-02-08  9:21 ` [ruby-core:107515] " duerst
2022-02-08  9:28 ` [ruby-core:107516] " byroot (Jean Boussier)
2022-02-08  9:41 ` [ruby-core:107517] " naruse (Yui NARUSE)
2022-02-08 10:15 ` [ruby-core:107518] " Eregon (Benoit Daloze)
2022-02-08 10:21 ` [ruby-core:107519] " Eregon (Benoit Daloze)
2022-02-09  9:51 ` [ruby-core:107527] " naruse (Yui NARUSE)
2022-02-09 16:52 ` [ruby-core:107531] " tenderlovemaking (Aaron Patterson)
2022-02-09 17:34 ` Eregon (Benoit Daloze) [this message]
2022-02-09 17:49 ` [ruby-core:107533] " jeremyevans0 (Jeremy Evans)
2022-02-09 23:35 ` [ruby-core:107537] " tenderlovemaking (Aaron Patterson)
2022-02-10  7:53 ` [ruby-core:107549] " duerst
2022-02-10  9:11 ` [ruby-core:107550] " byroot (Jean Boussier)
2022-02-10 14:15 ` [ruby-core:107553] " Eregon (Benoit Daloze)
2022-02-17  9:14 ` [ruby-core:107619] " matz (Yukihiro Matsumoto)
2022-02-17  9:16 ` [ruby-core:107620] " byroot (Jean Boussier)
2022-02-17  9:24 ` [ruby-core:107621] " matz (Yukihiro Matsumoto)
2022-02-17  9:27 ` [ruby-core:107622] " byroot (Jean Boussier)
2022-02-17 13:30 ` [ruby-core:107634] " Eregon (Benoit Daloze)
2022-02-17 13:58 ` [ruby-core:107636] " matz (Yukihiro Matsumoto)
2022-02-17 14:00 ` [ruby-core:107637] " byroot (Jean Boussier)
2022-02-17 15:34 ` [ruby-core:107640] " byroot (Jean Boussier)
2022-02-19 10:59 ` [ruby-core:107666] " byroot (Jean Boussier)
2022-02-21  8:23 ` [ruby-core:107680] " byroot (Jean Boussier)
2022-03-17  9:03 ` [ruby-core:107943] " matz (Yukihiro Matsumoto)
2022-03-17 11:08 ` [ruby-core:107944] " Eregon (Benoit Daloze)
2022-03-17 15:06 ` [ruby-core:107956] " larskanis (Lars Kanis)
2023-12-06 12:36 ` [ruby-core:115604] " Eregon (Benoit Daloze) via ruby-core
2023-12-20  8:44 ` [ruby-core:115813] " naruse (Yui NARUSE) via ruby-core
2024-01-11 10:26 ` [ruby-core:116170] " Eregon (Benoit Daloze) via ruby-core
2024-01-11 10:30 ` [ruby-core:116172] " Eregon (Benoit Daloze) via ruby-core
2024-01-11 10:35 ` [ruby-core:116173] " byroot (Jean Boussier) via ruby-core
2024-01-17  8:26 ` [ruby-core:116266] " naruse (Yui NARUSE) via ruby-core
2024-01-17  8:36 ` [ruby-core:116268] " byroot (Jean Boussier) via ruby-core
2024-01-17  9:19 ` [ruby-core:116269] " zverok (Victor Shepelev) via ruby-core
2024-01-18  0:58 ` [ruby-core:116280] " Dan0042 (Daniel DeLorme) via ruby-core
2024-01-18 15:19 ` [ruby-core:116298] " Eregon (Benoit Daloze) via ruby-core
2024-01-21  9:46 ` [ruby-core:116355] " byroot (Jean Boussier) via ruby-core
2024-01-22 10:15 ` [ruby-core:116363] " Eregon (Benoit Daloze) via ruby-core
2024-01-24  6:47 ` [ruby-core:116393] " shyouhei (Shyouhei Urabe) via ruby-core
2024-02-14  9:32 ` [ruby-core:116738] " naruse (Yui NARUSE) via ruby-core
2024-02-19 12:38 ` [ruby-core:116845] " byroot (Jean Boussier) via ruby-core
2024-02-19 23:02 ` [ruby-core:116855] " Dan0042 (Daniel DeLorme) via ruby-core
2024-02-19 23:23 ` [ruby-core:116857] " duerst via ruby-core
2024-02-20  7:51 ` [ruby-core:116868] " byroot (Jean Boussier) via ruby-core
2024-02-20 11:16 ` [ruby-core:116875] " Eregon (Benoit Daloze) via ruby-core
2024-04-13 19:29 ` [ruby-core:117508] " alexander-s (Alexander S) via ruby-core
