ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:70214] [Ruby trunk - Bug #11410] [Open] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
       [not found] <redmine.issue-11410.20150802183657@ruby-lang.org>
@ 2015-08-02 18:36 ` ethan_j_brown
  2015-08-02 19:27 ` [ruby-core:70216] [Ruby trunk - Bug #11410] " ethan_j_brown
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: ethan_j_brown @ 2015-08-02 18:36 UTC (permalink / raw
  To: ruby-core

Issue #11410 has been reported by Ethan Brown.

----------------------------------------
Bug #11410: Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
https://bugs.ruby-lang.org/issues/11410

* Author: Ethan Brown
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: ruby 2.1.5p273 (2014-11-13 revision 48405) [x64-mingw32]
* Backport: 2.0.0: DONTNEED, 2.1: REQUIRED, 2.2: REQUIRED
----------------------------------------
When enumerating keys with `Win32::Registry#each_key` / `Win32::Registry#keys` or values with `Win32::Registry#each_value` / `Win32::Registry#values`, Ruby will take a `UTF-16LE` string returned from the Windows API and convert it to the local codepage.  In the case of `each_value`, the string is then immediately converted back to `UTF16-LE` before being used in subsequent Windows API calls.  Not only is this conversion unnecessary, but it may cause encoding exceptions when the local codepage does not support all of the characters present in the original Unicode string.

One such example of this is when a Unicode en-dash `U+2013` appears in a string, and the local codepage is `IBM437`, which has no equivalent character. But this is just one of many examples that may trigger this behavior. 


~~~
[1] pry(main)> RUBY_VERSION
=> "2.1.5"
[2] pry(main)> ENDASH_UTF_16 = [0x2013]
=> [8211]
[3] pry(main)> utf_16_str = ENDASH_UTF_16.pack('s*').force_encoding(Encoding::UTF_16LE)
=> "\u2013"
[4] pry(main)> utf_16_str.encode(Encoding::IBM437)
Encoding::UndefinedConversionError: U+2013 to IBM437 in conversion from UTF-16LE to UTF-8 to IBM437
from (pry):4:in `encode'
~~~


NOTE: Normal registry reads of a value at a particular key are not problematic - the bad behavior is triggered specifically during enumeration. 

This is primarily as a result of the `export_string` function which re-encodes strings
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L894-L896


It is used by `each_value` and `each_key`, which return `UTF-16LE` strings:

https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L561
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L598


In the `each_value` method, this LOCALE re-encoded string is then passed to the `read` method, where it is turned back into a `UTF16-LE` string to be passed to `RegQueryValueExW`

https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L563
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L631
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L307


Inside Puppet, we employed a solution that avoids Ruby's `Win32::Registry` when performing enumeration, and relies on internal helpers instead (avoiding unnecessary string encodings).  This was unfortunate, but necessary:
https://github.com/puppetlabs/puppet/commit/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4

Note also that we typically convert `UTF-16LE` strings to `UTF-8` internally (since this is almost always guaranteed to be a lossless conversion), until we reach an end-user boundary where they absolutely need a specific encoding rendered. For instance, our version of `read` converts to `UTF8`:
https://github.com/puppetlabs/puppet/blob/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4/lib/puppet/util/windows/registry.rb#L211-L214



I suggest that other locations where strings are re-encoded be examined for potential issues, as locale codepage conversions are generally considered dangerous given Win32 APIs use `UTF-16LE`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:70216] [Ruby trunk - Bug #11410] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
       [not found] <redmine.issue-11410.20150802183657@ruby-lang.org>
  2015-08-02 18:36 ` [ruby-core:70214] [Ruby trunk - Bug #11410] [Open] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions ethan_j_brown
@ 2015-08-02 19:27 ` ethan_j_brown
  2015-08-03  1:49 ` [ruby-core:70221] [Ruby trunk - Bug #11410] [Feedback] " nobu
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: ethan_j_brown @ 2015-08-02 19:27 UTC (permalink / raw
  To: ruby-core

Issue #11410 has been updated by Ethan Brown.


I realized that I should have included some sample code demonstrating the problem:

~~~ruby
require 'win32/registry'

ENDASH_UTF_16 = [0x2013]
TM_UTF_16 = [0x2122]

endash_utf_16_str = ENDASH_UTF_16.pack('s*').force_encoding(Encoding::UTF_16LE)
tm_utf_16_str = TM_UTF_16.pack('s*').force_encoding(Encoding::UTF_16LE)

def test_with_encoding(root, key_name, encoding)
  Encoding::default_internal = encoding

  puts "\n\nTesting with #{encoding.to_s}"

  puts "- Reading value #{root.parent.keyname}\\#{root.keyname}\\#{key_name}"
  begin
    value = root[key_name]
    puts "    - read value #{key_name} as #{value}"
  rescue Exception => e
    puts "    x failed to read from #{key_name}\n\t\t#{e}\n"
  end

  puts " - Reading value #{root.parent.keyname}\\#{root.keyname}\\#{key_name}"
  begin
    type, value = root.read(key_name)
    puts "    - read value #{key_name} as type: #{type}, value: #{value}"
  rescue Exception => e
    puts "    x failed to read from #{key_name}\n\t\t#{e}\n"
  end

  puts " - Enumerating Keys for #{root.parent.keyname}\\#{root.keyname}"
  begin
    root.each_key do |key, wtime|
      puts "    - read each_key #{key}"
    end
  rescue Exception => e
    puts "    x failed to each_key from #{root.parent.keyname}\\#{root.keyname}\n\t\t#{e}\n"
  end

  puts " - Enumerating Values for #{root.parent.keyname}\\#{root.keyname}"
  begin
    root.each_value do |name, type, value|
      puts "    - read each_value #{name} as type: #{type}, value: #{value}"
    end
  rescue Exception => e
    puts "    x failed to each_value from #{root.parent.keyname}\\#{root.keyname}\n\t\t#{e}\n"
  end

end


root = Win32::Registry::HKEY_CURRENT_USER
root.create('SOFTWARE\rubyfail') do |reg|
  # create subkey with trademark symbol
  reg.create(tm_utf_16_str)

  # create endash value named foo
  reg.write('foo', Win32::Registry::REG_SZ, endash_utf_16_str)

  test_with_encoding(reg, 'foo', Encoding::WINDOWS_1252)

  # failures with both enumeration of keys and values
  test_with_encoding(reg, 'foo', Encoding::IBM437)
end
~~~


The important part is that you will failures in calling `each_key` and `each_value` when either contains characters that cannot be converted to the current codepage.

----------------------------------------
Bug #11410: Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
https://bugs.ruby-lang.org/issues/11410#change-53639

* Author: Ethan Brown
* Status: Open
* Priority: Normal
* Assignee: 
* ruby -v: ruby 2.1.5p273 (2014-11-13 revision 48405) [x64-mingw32]
* Backport: 2.0.0: DONTNEED, 2.1: REQUIRED, 2.2: REQUIRED
----------------------------------------
When enumerating keys with `Win32::Registry#each_key` / `Win32::Registry#keys` or values with `Win32::Registry#each_value` / `Win32::Registry#values`, Ruby will take a `UTF-16LE` string returned from the Windows API and convert it to the local codepage.  In the case of `each_value`, the string is then immediately converted back to `UTF16-LE` before being used in subsequent Windows API calls.  Not only is this conversion unnecessary, but it may cause encoding exceptions when the local codepage does not support all of the characters present in the original Unicode string.

One such example of this is when a Unicode en-dash `U+2013` appears in a string, and the local codepage is `IBM437`, which has no equivalent character. But this is just one of many examples that may trigger this behavior. 


~~~
[1] pry(main)> RUBY_VERSION
=> "2.1.5"
[2] pry(main)> ENDASH_UTF_16 = [0x2013]
=> [8211]
[3] pry(main)> utf_16_str = ENDASH_UTF_16.pack('s*').force_encoding(Encoding::UTF_16LE)
=> "\u2013"
[4] pry(main)> utf_16_str.encode(Encoding::IBM437)
Encoding::UndefinedConversionError: U+2013 to IBM437 in conversion from UTF-16LE to UTF-8 to IBM437
from (pry):4:in `encode'
~~~


NOTE: Normal registry reads of a value at a particular key are not problematic - the bad behavior is triggered specifically during enumeration. 

This is primarily as a result of the `export_string` function which re-encodes strings
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L894-L896


It is used by `each_value` and `each_key`, which return `UTF-16LE` strings:

https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L561
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L598


In the `each_value` method, this LOCALE re-encoded string is then passed to the `read` method, where it is turned back into a `UTF16-LE` string to be passed to `RegQueryValueExW`

https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L563
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L631
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L307


Inside Puppet, we employed a solution that avoids Ruby's `Win32::Registry` when performing enumeration, and relies on internal helpers instead (avoiding unnecessary string encodings).  This was unfortunate, but necessary:
https://github.com/puppetlabs/puppet/commit/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4

Note also that we typically convert `UTF-16LE` strings to `UTF-8` internally (since this is almost always guaranteed to be a lossless conversion), until we reach an end-user boundary where they absolutely need a specific encoding rendered. For instance, our version of `read` converts to `UTF8`:
https://github.com/puppetlabs/puppet/blob/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4/lib/puppet/util/windows/registry.rb#L211-L214



I suggest that other locations where strings are re-encoded be examined for potential issues, as locale codepage conversions are generally considered dangerous given Win32 APIs use `UTF-16LE`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:70221] [Ruby trunk - Bug #11410] [Feedback] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
       [not found] <redmine.issue-11410.20150802183657@ruby-lang.org>
  2015-08-02 18:36 ` [ruby-core:70214] [Ruby trunk - Bug #11410] [Open] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions ethan_j_brown
  2015-08-02 19:27 ` [ruby-core:70216] [Ruby trunk - Bug #11410] " ethan_j_brown
@ 2015-08-03  1:49 ` nobu
  2015-08-03  2:08 ` [ruby-core:70223] [Ruby trunk - Bug #11410] " nobu
  2015-08-13  6:57 ` [ruby-core:70363] " ethan_j_brown
  4 siblings, 0 replies; 5+ messages in thread
From: nobu @ 2015-08-03  1:49 UTC (permalink / raw
  To: ruby-core

Issue #11410 has been updated by Nobuyoshi Nakada.

Status changed from Open to Feedback

I agree that unnecessary conversions should be removed, but your code won't work yet, since the results will be expected in the locale encoding.

What do you want?

1. it's OK
2. return everything in UTF-8
3. add optional parameter to specify the result encoding
4. or others

----------------------------------------
Bug #11410: Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
https://bugs.ruby-lang.org/issues/11410#change-53649

* Author: Ethan Brown
* Status: Feedback
* Priority: Normal
* Assignee: 
* ruby -v: ruby 2.1.5p273 (2014-11-13 revision 48405) [x64-mingw32]
* Backport: 2.0.0: DONTNEED, 2.1: REQUIRED, 2.2: REQUIRED
----------------------------------------
When enumerating keys with `Win32::Registry#each_key` / `Win32::Registry#keys` or values with `Win32::Registry#each_value` / `Win32::Registry#values`, Ruby will take a `UTF-16LE` string returned from the Windows API and convert it to the local codepage.  In the case of `each_value`, the string is then immediately converted back to `UTF16-LE` before being used in subsequent Windows API calls.  Not only is this conversion unnecessary, but it may cause encoding exceptions when the local codepage does not support all of the characters present in the original Unicode string.

One such example of this is when a Unicode en-dash `U+2013` appears in a string, and the local codepage is `IBM437`, which has no equivalent character. But this is just one of many examples that may trigger this behavior. 


~~~
[1] pry(main)> RUBY_VERSION
=> "2.1.5"
[2] pry(main)> ENDASH_UTF_16 = [0x2013]
=> [8211]
[3] pry(main)> utf_16_str = ENDASH_UTF_16.pack('s*').force_encoding(Encoding::UTF_16LE)
=> "\u2013"
[4] pry(main)> utf_16_str.encode(Encoding::IBM437)
Encoding::UndefinedConversionError: U+2013 to IBM437 in conversion from UTF-16LE to UTF-8 to IBM437
from (pry):4:in `encode'
~~~


NOTE: Normal registry reads of a value at a particular key are not problematic - the bad behavior is triggered specifically during enumeration. 

This is primarily as a result of the `export_string` function which re-encodes strings
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L894-L896


It is used by `each_value` and `each_key`, which return `UTF-16LE` strings:

https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L561
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L598


In the `each_value` method, this LOCALE re-encoded string is then passed to the `read` method, where it is turned back into a `UTF16-LE` string to be passed to `RegQueryValueExW`

https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L563
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L631
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L307


Inside Puppet, we employed a solution that avoids Ruby's `Win32::Registry` when performing enumeration, and relies on internal helpers instead (avoiding unnecessary string encodings).  This was unfortunate, but necessary:
https://github.com/puppetlabs/puppet/commit/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4

Note also that we typically convert `UTF-16LE` strings to `UTF-8` internally (since this is almost always guaranteed to be a lossless conversion), until we reach an end-user boundary where they absolutely need a specific encoding rendered. For instance, our version of `read` converts to `UTF8`:
https://github.com/puppetlabs/puppet/blob/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4/lib/puppet/util/windows/registry.rb#L211-L214



I suggest that other locations where strings are re-encoded be examined for potential issues, as locale codepage conversions are generally considered dangerous given Win32 APIs use `UTF-16LE`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:70223] [Ruby trunk - Bug #11410] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
       [not found] <redmine.issue-11410.20150802183657@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2015-08-03  1:49 ` [ruby-core:70221] [Ruby trunk - Bug #11410] [Feedback] " nobu
@ 2015-08-03  2:08 ` nobu
  2015-08-13  6:57 ` [ruby-core:70363] " ethan_j_brown
  4 siblings, 0 replies; 5+ messages in thread
From: nobu @ 2015-08-03  2:08 UTC (permalink / raw
  To: ruby-core

Issue #11410 has been updated by Nobuyoshi Nakada.


https://github.com/nobu/ruby/tree/bug/11410-win32-registry-encoding

----------------------------------------
Bug #11410: Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
https://bugs.ruby-lang.org/issues/11410#change-53651

* Author: Ethan Brown
* Status: Feedback
* Priority: Normal
* Assignee: 
* ruby -v: ruby 2.1.5p273 (2014-11-13 revision 48405) [x64-mingw32]
* Backport: 2.0.0: DONTNEED, 2.1: REQUIRED, 2.2: REQUIRED
----------------------------------------
When enumerating keys with `Win32::Registry#each_key` / `Win32::Registry#keys` or values with `Win32::Registry#each_value` / `Win32::Registry#values`, Ruby will take a `UTF-16LE` string returned from the Windows API and convert it to the local codepage.  In the case of `each_value`, the string is then immediately converted back to `UTF16-LE` before being used in subsequent Windows API calls.  Not only is this conversion unnecessary, but it may cause encoding exceptions when the local codepage does not support all of the characters present in the original Unicode string.

One such example of this is when a Unicode en-dash `U+2013` appears in a string, and the local codepage is `IBM437`, which has no equivalent character. But this is just one of many examples that may trigger this behavior. 


~~~
[1] pry(main)> RUBY_VERSION
=> "2.1.5"
[2] pry(main)> ENDASH_UTF_16 = [0x2013]
=> [8211]
[3] pry(main)> utf_16_str = ENDASH_UTF_16.pack('s*').force_encoding(Encoding::UTF_16LE)
=> "\u2013"
[4] pry(main)> utf_16_str.encode(Encoding::IBM437)
Encoding::UndefinedConversionError: U+2013 to IBM437 in conversion from UTF-16LE to UTF-8 to IBM437
from (pry):4:in `encode'
~~~


NOTE: Normal registry reads of a value at a particular key are not problematic - the bad behavior is triggered specifically during enumeration. 

This is primarily as a result of the `export_string` function which re-encodes strings
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L894-L896


It is used by `each_value` and `each_key`, which return `UTF-16LE` strings:

https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L561
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L598


In the `each_value` method, this LOCALE re-encoded string is then passed to the `read` method, where it is turned back into a `UTF16-LE` string to be passed to `RegQueryValueExW`

https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L563
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L631
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L307


Inside Puppet, we employed a solution that avoids Ruby's `Win32::Registry` when performing enumeration, and relies on internal helpers instead (avoiding unnecessary string encodings).  This was unfortunate, but necessary:
https://github.com/puppetlabs/puppet/commit/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4

Note also that we typically convert `UTF-16LE` strings to `UTF-8` internally (since this is almost always guaranteed to be a lossless conversion), until we reach an end-user boundary where they absolutely need a specific encoding rendered. For instance, our version of `read` converts to `UTF8`:
https://github.com/puppetlabs/puppet/blob/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4/lib/puppet/util/windows/registry.rb#L211-L214



I suggest that other locations where strings are re-encoded be examined for potential issues, as locale codepage conversions are generally considered dangerous given Win32 APIs use `UTF-16LE`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:70363] [Ruby trunk - Bug #11410] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
       [not found] <redmine.issue-11410.20150802183657@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2015-08-03  2:08 ` [ruby-core:70223] [Ruby trunk - Bug #11410] " nobu
@ 2015-08-13  6:57 ` ethan_j_brown
  4 siblings, 0 replies; 5+ messages in thread
From: ethan_j_brown @ 2015-08-13  6:57 UTC (permalink / raw
  To: ruby-core

Issue #11410 has been updated by Ethan Brown.


I think the best solution here is to use UTF-8 strings wherever possible.  If a program needs to use locale, then let the program decide to do that.  I don't think Ruby should be making encoding decisions for a user like this, given Ruby is using wide character APIs and UTF-16LE strings.

While your proposed solution should work, I don't think the burden should be put on the calling code to always set an encoding value everywhere `#each_value` or `#each_key` is used, to prevent an exception.  Keep in mind that `#keys` and `#values` call `#each_keys` and `#each_value`, but with your solution, there is no way to override the encoding when using those methods.

----------------------------------------
Bug #11410: Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions
https://bugs.ruby-lang.org/issues/11410#change-53771

* Author: Ethan Brown
* Status: Feedback
* Priority: Normal
* Assignee: 
* ruby -v: ruby 2.1.5p273 (2014-11-13 revision 48405) [x64-mingw32]
* Backport: 2.0.0: DONTNEED, 2.1: REQUIRED, 2.2: REQUIRED
----------------------------------------
When enumerating keys with `Win32::Registry#each_key` / `Win32::Registry#keys` or values with `Win32::Registry#each_value` / `Win32::Registry#values`, Ruby will take a `UTF-16LE` string returned from the Windows API and convert it to the local codepage.  In the case of `each_value`, the string is then immediately converted back to `UTF16-LE` before being used in subsequent Windows API calls.  Not only is this conversion unnecessary, but it may cause encoding exceptions when the local codepage does not support all of the characters present in the original Unicode string.

One such example of this is when a Unicode en-dash `U+2013` appears in a string, and the local codepage is `IBM437`, which has no equivalent character. But this is just one of many examples that may trigger this behavior. 


~~~
[1] pry(main)> RUBY_VERSION
=> "2.1.5"
[2] pry(main)> ENDASH_UTF_16 = [0x2013]
=> [8211]
[3] pry(main)> utf_16_str = ENDASH_UTF_16.pack('s*').force_encoding(Encoding::UTF_16LE)
=> "\u2013"
[4] pry(main)> utf_16_str.encode(Encoding::IBM437)
Encoding::UndefinedConversionError: U+2013 to IBM437 in conversion from UTF-16LE to UTF-8 to IBM437
from (pry):4:in `encode'
~~~


NOTE: Normal registry reads of a value at a particular key are not problematic - the bad behavior is triggered specifically during enumeration. 

This is primarily as a result of the `export_string` function which re-encodes strings
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L894-L896


It is used by `each_value` and `each_key`, which return `UTF-16LE` strings:

https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L561
https://github.com/ruby/ruby/blob/ruby_2_1/ext/win32/lib/win32/registry.rb#L598


In the `each_value` method, this LOCALE re-encoded string is then passed to the `read` method, where it is turned back into a `UTF16-LE` string to be passed to `RegQueryValueExW`

https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L563
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L631
https://github.com/ruby/ruby/blob/v2_1_5/ext/win32/lib/win32/registry.rb#L307


Inside Puppet, we employed a solution that avoids Ruby's `Win32::Registry` when performing enumeration, and relies on internal helpers instead (avoiding unnecessary string encodings).  This was unfortunate, but necessary:
https://github.com/puppetlabs/puppet/commit/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4

Note also that we typically convert `UTF-16LE` strings to `UTF-8` internally (since this is almost always guaranteed to be a lossless conversion), until we reach an end-user boundary where they absolutely need a specific encoding rendered. For instance, our version of `read` converts to `UTF8`:
https://github.com/puppetlabs/puppet/blob/c610cd01eeef3fafa7aa2761a3435dd6c1b0d8d4/lib/puppet/util/windows/registry.rb#L211-L214



I suggest that other locations where strings are re-encoded be examined for potential issues, as locale codepage conversions are generally considered dangerous given Win32 APIs use `UTF-16LE`.



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-08-13  6:22 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <redmine.issue-11410.20150802183657@ruby-lang.org>
2015-08-02 18:36 ` [ruby-core:70214] [Ruby trunk - Bug #11410] [Open] Win32 Registry enumeration performs unnecessary string re-encoding which cause UndefinedConversionError exceptions ethan_j_brown
2015-08-02 19:27 ` [ruby-core:70216] [Ruby trunk - Bug #11410] " ethan_j_brown
2015-08-03  1:49 ` [ruby-core:70221] [Ruby trunk - Bug #11410] [Feedback] " nobu
2015-08-03  2:08 ` [ruby-core:70223] [Ruby trunk - Bug #11410] " nobu
2015-08-13  6:57 ` [ruby-core:70363] " ethan_j_brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).