ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:76668] [Ruby trunk Bug#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
@ 2016-08-03  0:53 ` davispuh
  2016-08-03  5:36 ` [ruby-core:76677] " usa
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: davispuh @ 2016-08-03  0:53 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been reported by Dāvis Mosāns.

----------------------------------------
Bug #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650

* Author: Dāvis Mosāns
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: ruby 2.4.0dev (2016-08-02 trunk 55799) [x64-mswin64_140]
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------
Windows environment variables support Unicode (via the same wide WinAPI), so there's no reason to limit ourselves to any codepage.
Currently ENV uses the locale's encoding (the console's codepage), which cannot correctly represent characters outside that codepage.

I've attached a patch which implements this and fixes bug #9715
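
To check which encoding a particular Ruby build attaches to environment strings, a small inspection script can help. This is only a sketch and is not part of the attached patch; `env_encoding_report` is a hypothetical helper name introduced here for illustration.

```ruby
# Hypothetical helper: map each environment variable name to the
# Encoding object Ruby attached to its value.
def env_encoding_report(env = ENV)
  env.to_h.transform_values(&:encoding)
end

report = env_encoding_report
puts "locale encoding: #{Encoding.find('locale')}"
# Show a handful of variables and the encoding of their values.
report.first(5).each { |name, enc| puts "#{name}: #{enc}" }
```

Comparing the reported encodings with the locale encoding shows whether values are being tagged with a codepage or with UTF-8 on the machine at hand.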


---Files--------------------------------
0001-Always-use-UTF-8-encoded-environment-on-Windows.patch (3.64 KB)


-- 
https://bugs.ruby-lang.org/


* [ruby-core:76677] [Ruby trunk Bug#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
  2016-08-03  0:53 ` [ruby-core:76668] [Ruby trunk Bug#12650] Use UTF-8 encoding for ENV on Windows davispuh
@ 2016-08-03  5:36 ` usa
  2016-08-03  6:06 ` [ruby-core:76678] [Ruby trunk Feature#12650] " nobu
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: usa @ 2016-08-03  5:36 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by Usaku NAKAMURA.


We don't want to break compatibility.
Please wait for Ruby 3.

----------------------------------------
Bug #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-59894

* Author: Dāvis Mosāns
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: ruby 2.4.0dev (2016-08-02 trunk 55799) [x64-mswin64_140]
* Backport: 2.1: UNKNOWN, 2.2: UNKNOWN, 2.3: UNKNOWN
----------------------------------------

* [ruby-core:76678] [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
  2016-08-03  0:53 ` [ruby-core:76668] [Ruby trunk Bug#12650] Use UTF-8 encoding for ENV on Windows davispuh
  2016-08-03  5:36 ` [ruby-core:76677] " usa
@ 2016-08-03  6:06 ` nobu
  2016-08-03 19:46 ` [ruby-core:76691] " billk
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: nobu @ 2016-08-03  6:06 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by Nobuyoshi Nakada.

Tracker changed from Bug to Feature

----------------------------------------
Feature #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-59895

* Author: Dāvis Mosāns
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------

* [ruby-core:76691] [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2016-08-03  6:06 ` [ruby-core:76678] [Ruby trunk Feature#12650] " nobu
@ 2016-08-03 19:46 ` billk
  2016-10-08  1:25 ` [ruby-core:77523] " ethan_j_brown
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: billk @ 2016-08-03 19:46 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by B Kelly.


Hi,

Usaku NAKAMURA wrote:
> We don't want to break compatibility.
> Please wait for Ruby 3.

We always invoke ruby with -EUTF-8:UTF-8.

Would it make sense to enable this patch in Ruby 2.x in such situations,
where UTF-8 behavior has been requested explicitly?
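
For readers unfamiliar with the flag, the sketch below shows what -EUTF-8:UTF-8 requests: it runs a child interpreter with that option and prints the child's default external and internal encodings. `default_encodings_with` is a hypothetical helper name; the sketch says nothing about how ENV itself is encoded.

```ruby
require "rbconfig"

# Hypothetical helper: start a child Ruby with the given command-line flag
# and return its default external and internal encodings as "ext,int".
def default_encodings_with(flag)
  script = "print [Encoding.default_external, Encoding.default_internal].map(&:to_s).join(',')"
  IO.popen([RbConfig.ruby, flag, "-e", script], &:read)
end

puts default_encodings_with("-EUTF-8:UTF-8")
```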




----------------------------------------
Feature #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-59907

* Author: Dāvis Mosāns
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------

* [ruby-core:77523] [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2016-08-03 19:46 ` [ruby-core:76691] " billk
@ 2016-10-08  1:25 ` ethan_j_brown
  2016-10-10  2:44   ` [ruby-core:77535] " RRRoy BBBean
  2017-01-04 14:15 ` [ruby-core:78968] " thomas
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 10+ messages in thread
From: ethan_j_brown @ 2016-10-08  1:25 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by Ethan Brown.


If you could rethink the plan to wait until Ruby 3, that would be great.

I would expect Ruby to normalize on UTF-8 strings everywhere internally, and only convert to the local codepage at the boundary (such as when writing to the console, a file, etc.).

We are tracking a number of issues in Puppet that we believe are caused by the current behavior:

* [Puppet Throws Exception when Running Under Unicode Windows User](https://tickets.puppetlabs.com/browse/PUP-6035)
* [Bundler Fails when Running Under a Unicode Windows User](https://tickets.puppetlabs.com/browse/PUP-6034)
* [Puppet Crashes when Unicode User Applies Manifest](https://tickets.puppetlabs.com/browse/PUP-5822)

----------------------------------------
Feature #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-60787

* Author: Dāvis Mosāns
* Status: Open
* Priority: Normal
* Assignee: 
----------------------------------------

* [ruby-core:77535] Re: [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
  2016-10-08  1:25 ` [ruby-core:77523] " ethan_j_brown
@ 2016-10-10  2:44   ` RRRoy BBBean
  0 siblings, 0 replies; 10+ messages in thread
From: RRRoy BBBean @ 2016-10-10  2:44 UTC (permalink / raw
  To: ruby-core

PIPES: I wrote a small gem several years ago that handled a problem with 
UTF-8 I/O. The key parts, extracted from their containing module & 
class, are below. This is how I dealt with Hangeul (Korean) characters 
used as data for a non-web application.

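     require 'open3'

     # Assumed name: run reads @stdio_encoding below, so the encoding is
     # stored under that name here.
     @stdio_encoding = 'UTF-8'

     def run
             validate_and_configure
             # Start the child process in @cd_path and keep its three pipes.
             @stdin, @stdout, @stderr, @wait_thread = Open3.popen3( @cmd_text, :chdir=>@cd_path )
             # Treat all three pipes as UTF-8 text.
             @stdin.set_encoding @stdio_encoding
             @stdout.set_encoding @stdio_encoding
             @stderr.set_encoding @stdio_encoding
             @running =  monitor_stdout && monitor_stderr && attend_thread
     end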

DIRECTORY LISTINGS: From some other code, I use this trick to read 
filenames in Hangeul.

Dir.entries(@titles_path, :encoding=>'UTF-8').each {|thing_in_directory| ... }

FILE I/O with BOM: For file I/O with Hangeul, I use crazy stuff like this.

BOM = "\xEF\xBB\xBF".force_encoding("UTF-8")

Note that some applications (Firefox, Notepad++) recognize the Byte 
Order Mark, and other applications are befuddled when they encounter it. 
I, personally, prefer to use the Byte Order Mark because it immediately 
identifies the file format as UTF-8 (for applications that recognize the 
BOM).

         def strip_bom line
             return nil if line.nil? || line.empty?
             line.force_encoding 'UTF-8'
             line.gsub( BOM, '' )
         end

Also note that when files containing the BOM are concatenated or pasted 
into one-another by BOM-befuddled applications, one or more Byte Order 
Marks can easily become embedded within the data. That's why I use the 
above method.

Anyway, I learned to cope with some of the UTF-8 issues in Ruby because 
of my work with Korean. I like the way Ruby handles UTF-8 now, although 
it would be nice if everyone could adopt UTF-8 as the de facto standard.

I'm not claiming that my coding techniques are any good, but maybe this 
will help someone.



On 10/07/2016 08:25 PM, ethan_j_brown@hotmail.com wrote:
> Issue #12650 has been updated by Ethan Brown.
>
>
> If you could rethink the plan to wait until Ruby 3, that would be great.
>
> I would expect Ruby to normalize on UTF-8 strings everywhere internally, and only convert to the local codepage at the boundary (such as when writing to the console, a file, etc.).
>
> We are tracking a number of issues in Puppet that we believe are caused by the current behavior:
>
> * [Puppet Throws Exception when Running Under Unicode Windows User](https://tickets.puppetlabs.com/browse/PUP-6035)
> * [Bundler Fails when Running Under a Unicode Windows User](https://tickets.puppetlabs.com/browse/PUP-6034)
> * [Puppet Crashes when Unicode User Applies Manifest](https://tickets.puppetlabs.com/browse/PUP-5822)



* [ruby-core:78968] [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2016-10-08  1:25 ` [ruby-core:77523] " ethan_j_brown
@ 2017-01-04 14:15 ` thomas
  2017-03-13 15:02 ` [ruby-core:80136] " shyouhei
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: thomas @ 2017-01-04 14:15 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by Thomas Thomassen.


B Kelly wrote:
> Hi,
> 
> Usaku NAKAMURA wrote:
> > We don't want to break compatibility.
> > Please wait for Ruby 3.
> 
> We always invoke ruby with -EUTF-8:UTF-8.
> 
> Would it make sense to enable this patch in Ruby 2.x in such situations,
> where UTF-8 behavior has been requested explicitly?

I would like to second this request. We are also troubled by the encoding issues under Windows. I am not sure when Ruby 3 is planned to be released, but we would prefer a more immediate solution.

----------------------------------------
Feature #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-62388

* Author: Dāvis Mosāns
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:80136] [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2017-01-04 14:15 ` [ruby-core:78968] " thomas
@ 2017-03-13 15:02 ` shyouhei
  2017-03-13 16:33 ` [ruby-core:80143] " thomas
  2019-12-26  5:56 ` [ruby-core:96487] [Ruby master " naruse
  8 siblings, 0 replies; 10+ messages in thread
From: shyouhei @ 2017-03-13 15:02 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by shyouhei (Shyouhei Urabe).


We looked at this issue in today's developer meeting.

First off, attendees' understanding: ENV in Windows is managed by its kernel, and is provided to a userland process as an array of wide characters.  Tell me if it's wrong.  Also, we already support writing UTF_8 strings into ENV because that has no backwards compatibility problem.  The problem is reading from it.

Now, because of our long tradition of using the OEM codepage on Windows, it has been difficult to change the encoding of ENV to UTF_8.  The tragedy is that Windows does have chcp 65001, which is practically not used anywhere.  So Windows users are left in their code pages.

I understand you want to use UTF_8.  Changing the default encoding is not practically possible now because of backwards compatibility.  I advise you to propose other ways, for instance having some sort of "UTF_8 mode"-like thing.  Would it make sense for you to set the default_internal encoding (which is set to nil by default)?

----------------------------------------
Feature #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-63564

* Author: davispuh (Dāvis Mosāns)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:80143] [Ruby trunk Feature#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2017-03-13 15:02 ` [ruby-core:80136] " shyouhei
@ 2017-03-13 16:33 ` thomas
  2019-12-26  5:56 ` [ruby-core:96487] [Ruby master " naruse
  8 siblings, 0 replies; 10+ messages in thread
From: thomas @ 2017-03-13 16:33 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by thomthom (Thomas Thomassen).


I would be OK with it not being the default, as long as it can be configured for the whole interpreter and not through some magic comment that would have to be in each source file.
In our particular scenario we are embedding Ruby into our application and we would like to configure the Ruby interpreter to use this "UTF-8 mode".
People who write Ruby extensions for our application already have to use hacks such as force_encoding to correct this, and it's a constant source of bugs and problems. If we could force ENV strings to be UTF-8 by default for the embedded environment we provide, that would be a great relief for us.
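
To illustrate why force_encoding workarounds are fragile, here is a minimal sketch using made-up sample bytes rather than a real ENV value. force_encoding only relabels bytes, so it is correct only when the bytes really are in the named encoding; transcoding needs the true source encoding. `to_utf8` is a hypothetical helper, and Windows-1252 is assumed purely for the example.

```ruby
# Sample bytes for "café" in Windows-1252, where é is the single byte 0xE9.
raw = "caf\xE9".b

# force_encoding only relabels the bytes; 0xE9 alone is not valid UTF-8.
relabeled = raw.dup.force_encoding(Encoding::UTF_8)
puts relabeled.valid_encoding?

# Hypothetical helper: declare the real source encoding, then transcode.
def to_utf8(bytes, source_encoding)
  bytes.dup.force_encoding(source_encoding).encode(Encoding::UTF_8)
end

converted = to_utf8(raw, Encoding::Windows_1252)
puts converted
puts converted.valid_encoding?
```

The catch for extension authors is that the source encoding has to be known at every call site, which is the burden an interpreter-wide setting would remove.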

shyouhei (Shyouhei Urabe) wrote:
> We looked at this issue in today's developer meeting.
> 
> First off, attendees' understanding: ENV in Windows is managed by its kernel, and is provided to a userland process as an array of wide characters.  Tell me if it's wrong.  Also, we already support writing UTF_8 strings into ENV because that has no backwards compatibility problem.  The problem is reading from it.
> 
> Now, because of our long tradition of using the OEM codepage on Windows, it has been difficult to change the encoding of ENV to UTF_8.  The tragedy is that Windows does have chcp 65001, which is practically not used anywhere.  So Windows users are left in their code pages.
> 
> I understand you want to use UTF_8.  Changing the default encoding is not practically possible now because of backwards compatibility.  I advise you to propose other ways, for instance having some sort of "UTF_8 mode"-like thing.  Would it make sense for you to set the default_internal encoding (which is set to nil by default)?



----------------------------------------
Feature #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-63571

* Author: davispuh (Dāvis Mosāns)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------

* [ruby-core:96487] [Ruby master Feature#12650] Use UTF-8 encoding for ENV on Windows
       [not found] <redmine.issue-12650.20160803005342@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2017-03-13 16:33 ` [ruby-core:80143] " thomas
@ 2019-12-26  5:56 ` naruse
  8 siblings, 0 replies; 10+ messages in thread
From: naruse @ 2019-12-26  5:56 UTC (permalink / raw
  To: ruby-core

Issue #12650 has been updated by naruse (Yui NARUSE).

Target version set to 3.0
Assignee set to cruby-windows

----------------------------------------
Feature #12650: Use UTF-8 encoding for ENV on Windows
https://bugs.ruby-lang.org/issues/12650#change-83415

* Author: davispuh (Dāvis Mosāns)
* Status: Open
* Priority: Normal
* Assignee: cruby-windows
* Target version: 3.0
----------------------------------------
