ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:108843] [Ruby master Bug#10584] String.valid_encoding?, String.ascii_only? fails to account for BOM.
From: mame (Yusuke Endoh) @ 2022-06-10  6:15 UTC
  To: ruby-core

Issue #10584 has been updated by mame (Yusuke Endoh).

Status changed from Open to Rejected

For the third and fourth examples, you can use the `BOM|UTF-8` encoding.

```
$ ruby -e 'p File.read("utf-8-with-bom-file", encoding: "BOM|UTF-8").ascii_only?'
true
$ ruby -e 'p File.read("utf-8-with-bom-file", encoding: "BOM|UTF-8")[0]'
"#"
```
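
For readers without the attached file, here is a minimal self-contained sketch of the same comparison. The helper name `read_both` and the sample bytes are hypothetical; only `File.read` with `encoding: "BOM|UTF-8"` comes from the commands above.

```ruby
require "tempfile"

# Hypothetical helper: writes raw bytes to a temporary file and reads them
# back twice, once as plain UTF-8 and once as BOM|UTF-8.
def read_both(bytes)
  Tempfile.create("bom-demo") do |f|
    f.binmode
    f.write(bytes)
    f.flush
    [
      File.read(f.path, encoding: "UTF-8"),
      File.read(f.path, encoding: "BOM|UTF-8")
    ]
  end
end

# Assumed sample content: a UTF-8 BOM (EF BB BF) followed by ASCII text.
plain, stripped = read_both("\xEF\xBB\xBF#!/bin/sh\n".b)

p plain.ascii_only?                  # => false (the BOM is kept as U+FEFF)
p plain.codepoints.first == 0xFEFF   # => true
p stripped.ascii_only?               # => true  (the BOM was consumed on read)
p stripped[0]                        # => "#"
```

With plain `UTF-8` the BOM stays in the string as an ordinary code point, which is why the third and fourth examples in the report behave as they do.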

For the first and second examples, I think the problem lies in the definition of `String#valid_encoding?` rather than in the BOM. Currently, `"\uFFFE".valid_encoding?` returns true. (Note that `U+FFFE` is not a character.) So I think the current behavior is considered to be the spec. If we were to change it as a new feature, we would need to evaluate its value and estimate the compatibility impact.
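
To illustrate the point about `U+FFFE`, the following sketch builds a small UTF-16LE string in memory and relabels it as UTF-16BE, as in the first example of the report. The helper `swapped_bom?` is hypothetical and not part of Ruby; it shows one way a caller could detect a byte-swapped BOM on their own, assuming a UTF-16 string.

```ruby
# A BOM followed by "hi", encoded as UTF-16LE: bytes FF FE 68 00 69 00.
le = "\uFEFFhi".encode("UTF-16LE")

# Relabel the same bytes as UTF-16BE without transcoding them.
swapped = le.dup.force_encoding("UTF-16BE")

p swapped.valid_encoding?             # => true
p swapped.codepoints.first.to_s(16)   # => "fffe"
p "\uFFFE".valid_encoding?            # => true

# Hypothetical helper (not a Ruby API): reports whether a UTF-16 string
# starts with U+FFFE, which is what a BOM looks like when it is read with
# the wrong endianness.
def swapped_bom?(str)
  str.valid_encoding? && str.codepoints.first == 0xFFFE
end

p swapped_bom?(swapped)   # => true
p swapped_bom?(le)        # => false
```

The swapped bytes decode to U+FFFE, U+6800 and U+6900, all of which are well-formed UTF-16BE code units, so `valid_encoding?` has nothing to reject.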

----------------------------------------
Bug #10584: String.valid_encoding?, String.ascii_only? fails to account for BOM.
https://bugs.ruby-lang.org/issues/10584#change-97921

* Author: geoff-codes (Geoff Nixon)
* Status: Rejected
* Priority: Normal
* ruby -v: ruby 2.2.0preview2 (2014-11-28 trunk 48628) [x86_64-darwin14]
* Backport: 2.0.0: UNKNOWN, 2.1: UNKNOWN
----------------------------------------
IMO:

- A Unicode (UTF-16, UTF-32) string with a valid BOM should not be considered validly encoded if it is read with the opposite endianness.

- A UTF-8 string with a BOM should not count the BOM as a codepoint.

~~~sh
> file utf-16be-file
utf-16be-file: POSIX shell script, Big-endian UTF-16 Unicode text executable

> file utf-16le-file
utf-16le-file: POSIX shell script, Little-endian UTF-16 Unicode text executable

> file utf-8-with-bom-file
utf-8-with-bom-file: POSIX shell script, UTF-8 Unicode (with BOM) text executable
~~~

~~~sh
> ruby -e "p File.binread('utf-16le-file').force_encoding('UTF-16BE').valid_encoding?"
true # false

> ruby -e "p File.binread('utf-16be-file').force_encoding('UTF-16LE').valid_encoding?"
true # false

> ruby -e "p File.read('utf-8-with-bom-file').ascii_only?"
false # true

> ruby -e "p File.read('utf-8-with-bom-file')[0]"
"" # '#'
~~~

No?

---Files--------------------------------
utf-8-with-bom-file (14 Bytes)
utf-16le-file (2.46 KB)
utf-16be-file (2.45 KB)


-- 
https://bugs.ruby-lang.org/
