ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:93024] [Ruby trunk Bug#15908] Detecting BOM with non-UTF encoding
       [not found] <redmine.issue-15908.20190608124445@ruby-lang.org>
@ 2019-06-08 12:44 ` nobu
  2019-08-29  6:50 ` [ruby-core:94652] [Ruby master " duerst
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: nobu @ 2019-06-08 12:44 UTC
  To: ruby-core

Issue #15908 has been reported by nobu (Nobuyoshi Nakada).

----------------------------------------
Bug #15908: Detecting BOM with non-UTF encoding
https://bugs.ruby-lang.org/issues/15908

* Author: nobu (Nobuyoshi Nakada)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: 
* Backport: 2.4: UNKNOWN, 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Currently, the "bom|" encoding prefix passed to `File.open` is ignored if the encoding name is not a UTF encoding.
But one use of a BOM is to tell whether the stream is UTF or not; this is especially common on Windows, where a file may be e.g. UTF-16LE or OEMCP.
So I think this restriction should be removed.
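
For readers unfamiliar with the prefix, the following is a minimal sketch of the existing behaviour with a UTF encoding, which is the case that already works. The helper name `read_with_bom_prefix` and the constant `UTF8_BOM` are made up for this illustration. The non-UTF case the report is about (for example a prefix combined with a Windows code page) is deliberately not exercised here.

```ruby
require "tempfile"

# UTF-8 byte order mark as raw bytes (name chosen for this sketch).
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: write the given bytes to a temporary file, then read
# them back through File.open with the "BOM|" prefix in the mode string.
def read_with_bom_prefix(bytes)
  Tempfile.create("bom-demo") do |tmp|
    tmp.binmode
    tmp.write(bytes)
    tmp.flush
    # If the file starts with a UTF-8 BOM it is consumed; otherwise the
    # encoding named after the "|" is used as-is.
    File.open(tmp.path, "r:BOM|UTF-8") { |f| f.read }
  end
end

with_bom    = read_with_bom_prefix(UTF8_BOM + "hello".b)
without_bom = read_with_bom_prefix("hello".b)
```

In both calls the returned string is tagged as UTF-8 and the BOM, when present, is not part of the content.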

---Files--------------------------------
0001-Enable-BOM-detection-with-non-UTF-encodings.patch (4.27 KB)


-- 
https://bugs.ruby-lang.org/

* [ruby-core:94652] [Ruby master Bug#15908] Detecting BOM with non-UTF encoding
       [not found] <redmine.issue-15908.20190608124445@ruby-lang.org>
  2019-06-08 12:44 ` [ruby-core:93024] [Ruby trunk Bug#15908] Detecting BOM with non-UTF encoding nobu
@ 2019-08-29  6:50 ` duerst
  2019-08-29  6:54 ` [ruby-core:94653] " naruse
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: duerst @ 2019-08-29  6:50 UTC
  To: ruby-core

Issue #15908 has been updated by duerst (Martin Dürst).

Status changed from Open to Closed

Depending on usage, distinguishing UTF-8 (with or without BOM), UTF-16LE without BOM, UTF-16BE with or without BOM, and so on may also be necessary. Also, for Japanese, distinguishing between EUC-JP, Shift_JIS, and ISO-2022-JP has traditionally been necessary as well.

For more complex cases, heuristics are needed. On the other hand, applications may not want to (or may not be allowed to, as e.g. in the bootstrap phase of an XML parser) accept more than a well-defined subset.
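
As one illustration of such a heuristic restricted to a fixed subset, an application might try an ordered list of candidate encodings and take the first one under which the bytes are valid. This is only a sketch: `guess_encoding` and `JAPANESE_CANDIDATES` are hypothetical names, ISO-2022-JP is left out, and validity alone cannot resolve every ambiguity (many EUC-JP byte sequences are also valid Windows-31J, so the order of candidates decides).

```ruby
# Hypothetical helper: return the first candidate encoding under which the
# bytes form a valid string, or nil if none fits. The order of `candidates`
# expresses the caller's priority, since several encodings may accept the
# same bytes.
def guess_encoding(bytes, candidates)
  candidates.find do |enc|
    bytes.dup.force_encoding(enc).valid_encoding?
  end
end

# An example allowlist; an application would choose its own subset.
JAPANESE_CANDIDATES = [
  Encoding::UTF_8,
  Encoding::EUC_JP,
  Encoding::Windows_31J,
].freeze

utf8_guess = guess_encoding("\xE3\x81\x82".b, JAPANESE_CANDIDATES)
euc_guess  = guess_encoding("\xA4\xA2".b, JAPANESE_CANDIDATES)
sjis_guess = guess_encoding("\x82\xA0".b, JAPANESE_CANDIDATES)
```

The three sample inputs are the same hiragana character encoded in UTF-8, EUC-JP, and Shift_JIS respectively.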

This kind of processing is therefore better left to applications.
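
A minimal sketch of what this could look like at the application level, assuming the application only wants to honour a BOM and otherwise fall back to one known legacy encoding. `BOMS` and `decode_with_bom` are hypothetical names for this example, not part of Ruby's API.

```ruby
# BOM byte sequences and the encodings they indicate. The UTF-32LE entry
# must come before UTF-16LE because its BOM begins with the same two bytes.
# (This means UTF-16LE text starting with U+0000 would be misread.)
BOMS = [
  ["\xEF\xBB\xBF".b,     Encoding::UTF_8],
  ["\xFF\xFE\x00\x00".b, Encoding::UTF_32LE],
  ["\x00\x00\xFE\xFF".b, Encoding::UTF_32BE],
  ["\xFF\xFE".b,         Encoding::UTF_16LE],
  ["\xFE\xFF".b,         Encoding::UTF_16BE],
].freeze

# Hypothetical helper: if the bytes start with a known BOM, strip it and tag
# the rest with the matching UTF encoding; otherwise tag everything with the
# caller-supplied fallback (for example a Windows code page).
def decode_with_bom(bytes, fallback)
  data = bytes.b
  bom, enc = BOMS.find { |prefix, _| data.start_with?(prefix) }
  if bom
    data.byteslice(bom.bytesize, data.bytesize).force_encoding(enc)
  else
    data.force_encoding(fallback)
  end
end

utf16_text  = decode_with_bom("\xFF\xFEa\x00".b, Encoding::Windows_31J)
legacy_text = decode_with_bom("\x82\xA0".b, Encoding::Windows_31J)
```

The helper only tags the bytes with an encoding; it does not transcode or validate them, which the caller can do afterwards with `encode` or `valid_encoding?`.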

I'm closing this issue to not leave it dangling, but please feel free to reopen if you disagree.

----------------------------------------
Bug #15908: Detecting BOM with non-UTF encoding
https://bugs.ruby-lang.org/issues/15908#change-81251

Unsubscribe: <mailto:ruby-core-request@ruby-lang.org?subject=unsubscribe>
<http://lists.ruby-lang.org/cgi-bin/mailman/options/ruby-core>

* [ruby-core:94653] [Ruby master Bug#15908] Detecting BOM with non-UTF encoding
       [not found] <redmine.issue-15908.20190608124445@ruby-lang.org>
  2019-06-08 12:44 ` [ruby-core:93024] [Ruby trunk Bug#15908] Detecting BOM with non-UTF encoding nobu
  2019-08-29  6:50 ` [ruby-core:94652] [Ruby master " duerst
@ 2019-08-29  6:54 ` naruse
  2019-08-30  2:46 ` [ruby-core:94675] " nobu
  2019-08-30  8:07 ` [ruby-core:94680] " duerst
  4 siblings, 0 replies; 5+ messages in thread
From: naruse @ 2019-08-29  6:54 UTC
  To: ruby-core

Issue #15908 has been updated by naruse (Yui NARUSE).


I understand that, theoretically, there exist situations where this feature is useful.
But I think they don't exist in practice.
I object to providing an additional utility to support legacy encodings.

----------------------------------------
Bug #15908: Detecting BOM with non-UTF encoding
https://bugs.ruby-lang.org/issues/15908#change-81252


* [ruby-core:94675] [Ruby master Bug#15908] Detecting BOM with non-UTF encoding
       [not found] <redmine.issue-15908.20190608124445@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2019-08-29  6:54 ` [ruby-core:94653] " naruse
@ 2019-08-30  2:46 ` nobu
  2019-08-30  8:07 ` [ruby-core:94680] " duerst
  4 siblings, 0 replies; 5+ messages in thread
From: nobu @ 2019-08-30  2:46 UTC
  To: ruby-core

Issue #15908 has been updated by nobu (Nobuyoshi Nakada).


I had UTF-16LE and CP932 in mind as the main use case; however, I'm a bit surprised that such texts are already extinct on Windows. :tada:


----------------------------------------
Bug #15908: Detecting BOM with non-UTF encoding
https://bugs.ruby-lang.org/issues/15908#change-81280


* [ruby-core:94680] [Ruby master Bug#15908] Detecting BOM with non-UTF encoding
       [not found] <redmine.issue-15908.20190608124445@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2019-08-30  2:46 ` [ruby-core:94675] " nobu
@ 2019-08-30  8:07 ` duerst
  4 siblings, 0 replies; 5+ messages in thread
From: duerst @ 2019-08-30  8:07 UTC
  To: ruby-core

Issue #15908 has been updated by duerst (Martin Dürst).


nobu (Nobuyoshi Nakada) wrote:
> I had UTF-16LE and CP932 in mind as the main use case; however, I'm a bit surprised that such texts are already extinct on Windows. :tada:

They are not yet extinct, unfortunately :-(. In Japan, there may be quite a few cases where this would work, but even in Japan, there are many other cases where a larger and/or different selection of encodings is needed.

----------------------------------------
Bug #15908: Detecting BOM with non-UTF encoding
https://bugs.ruby-lang.org/issues/15908#change-81286

