ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:81681] [Ruby trunk Bug#13660] rb_str_hash_m discards bits from the hash
       [not found] <redmine.issue-13660.20170614215607@ruby-lang.org>
@ 2017-06-14 21:56 ` eregontp
  2017-06-14 21:58 ` [ruby-core:81682] " eregontp
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: eregontp @ 2017-06-14 21:56 UTC (permalink / raw)
  To: ruby-core

Issue #13660 has been reported by Eregon (Benoit Daloze).

----------------------------------------
Bug #13660: rb_str_hash_m discards bits from the hash
https://bugs.ruby-lang.org/issues/13660

* Author: Eregon (Benoit Daloze)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.3.3p222 (2016-11-21 revision 56859) [x64-mingw32]
* Backport: 2.2: UNKNOWN, 2.3: UNKNOWN, 2.4: UNKNOWN
----------------------------------------
I believe rb_str_hash_m might discard some bits from the hash value in some situations.

It computes the hash as an st_index_t, which is either an unsigned long or an unsigned long long.
But the st_index_t value is converted to a VALUE with:
#define ST2FIX(h) LONG2FIX((long)(h))

Note that for instance on x64-mingw32, SIZEOF_LONG is 4, but SIZEOF_LONG_LONG and SIZEOF_VOIDP are 8 bytes.
So that truncates half the bits of the hash on such a platform if my understanding is correct.
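
The truncation described above can be sketched in isolation. This is only an illustration, not MRI code: low_half and collide_after_cast are hypothetical names, and fixed-width types stand in for a 64-bit st_index_t and a 32-bit long so that the sketch behaves the same on any host.

```c
#include <stdint.h>

/* Hypothetical stand-in for the (long) cast in ST2FIX on a platform where
 * long is 32 bits wide but st_index_t is 64 bits wide (assumed LLP64). */
static uint32_t low_half(uint64_t h)
{
    return (uint32_t)h;  /* keeps bits 0..31, drops bits 32..63 */
}

/* Two 64-bit hashes become indistinguishable after the cast
 * whenever their low 32 bits agree. */
static int collide_after_cast(uint64_t a, uint64_t b)
{
    return low_half(a) == low_half(b);
}
```

Under these assumptions, 0x11111111DEADBEEF and 0x22222222DEADBEEF both reduce to 0xDEADBEEF, so they would look identical after the cast.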

Even if SIZEOF_LONG is 8, LONG2FIX loses the MSB I think, given that not all long values fit in the Fixnum range on MRI (should it be LONG2NUM?).
Also, I am not sure whether the cast from an unsigned value to a signed value is intended.
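
To illustrate the lost MSB, here is a simplified model of Fixnum tagging on a 64-bit VALUE. It assumes the usual scheme (shift left by one, set the low tag bit, and sign-extend on the way back); fixnum_round_trip is a hypothetical name and the code uses only unsigned arithmetic, so it is a sketch of the effect rather than the actual MRI macros.

```c
#include <stdint.h>

/* Model of tagging a 64-bit hash as a Fixnum and reading it back.
 * Assumption: tagging is (h << 1) | 1, untagging is an arithmetic
 * shift right by one (modelled here as explicit sign extension). */
static uint64_t fixnum_round_trip(uint64_t h)
{
    uint64_t tagged = (h << 1) | 1u;       /* bit 63 of h is shifted out */
    uint64_t payload = tagged >> 1;        /* bits 0..62 of h survive */
    if (payload & (UINT64_C(1) << 62))     /* sign-extend from bit 62 */
        payload |= UINT64_C(1) << 63;
    return payload;
}
```

In this model, any two hashes that differ only in bit 63 come back as the same value, which is the one-bit loss mentioned above.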

I tried many things while debugging the rb_str_hash spec on ruby/spec and eventually gave up.
This computation looks wrong to me in MRI.

For info, here is my debug code:
https://github.com/eregon/rubyspec/blob/d62189450c0a56bfcd379e5e505ad097892d2bc7/optional/capi/string_spec.rb#L501-L518
https://github.com/eregon/rubyspec/blob/d62189450c0a56bfcd379e5e505ad097892d2bc7/optional/capi/ext/string_spec.c#L361-L381
and the build result on AppVeyor:
https://ci.appveyor.com/project/eregon/spec-x948i/build/629



-- 
https://bugs.ruby-lang.org/


* [ruby-core:81682] [Ruby trunk Bug#13660] rb_str_hash_m discards bits from the hash
       [not found] <redmine.issue-13660.20170614215607@ruby-lang.org>
  2017-06-14 21:56 ` [ruby-core:81681] [Ruby trunk Bug#13660] rb_str_hash_m discards bits from the hash eregontp
@ 2017-06-14 21:58 ` eregontp
  2017-06-15  5:13 ` [ruby-core:81688] " duerst
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: eregontp @ 2017-06-14 21:58 UTC (permalink / raw)
  To: ruby-core

Issue #13660 has been updated by Eregon (Benoit Daloze).


What is particularly puzzling in that AppVeyor log is that both hash values look the same at the Ruby level and have the same object_id, yet they are not equal according to Fixnum#==.

----------------------------------------
Bug #13660: rb_str_hash_m discards bits from the hash
https://bugs.ruby-lang.org/issues/13660#change-65374


* [ruby-core:81688] [Ruby trunk Bug#13660] rb_str_hash_m discards bits from the hash
       [not found] <redmine.issue-13660.20170614215607@ruby-lang.org>
  2017-06-14 21:56 ` [ruby-core:81681] [Ruby trunk Bug#13660] rb_str_hash_m discards bits from the hash eregontp
  2017-06-14 21:58 ` [ruby-core:81682] " eregontp
@ 2017-06-15  5:13 ` duerst
  2017-07-15  2:22 ` [ruby-core:82071] " shyouhei
  2017-07-17 16:11 ` [ruby-core:82089] " eregontp
  4 siblings, 0 replies; 5+ messages in thread
From: duerst @ 2017-06-15  5:13 UTC (permalink / raw)
  To: ruby-core

Issue #13660 has been updated by duerst (Martin Dürst).


I don't think there is any guarantee for the length of a hash value in Ruby. It's just assumed to be long enough not to lead to too many collisions.

Also, if the calculation of the original value (before throwing away bits) is really good (i.e. all bits of the input affect all bits of the output,...), then when there is a need to shorten a hash value, which bits are being thrown away shouldn't make any difference. (sorry, quite a few ifs)

----------------------------------------
Bug #13660: rb_str_hash_m discards bits from the hash
https://bugs.ruby-lang.org/issues/13660#change-65378


* [ruby-core:82071] [Ruby trunk Bug#13660] rb_str_hash_m discards bits from the hash
       [not found] <redmine.issue-13660.20170614215607@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2017-06-15  5:13 ` [ruby-core:81688] " duerst
@ 2017-07-15  2:22 ` shyouhei
  2017-07-17 16:11 ` [ruby-core:82089] " eregontp
  4 siblings, 0 replies; 5+ messages in thread
From: shyouhei @ 2017-07-15  2:22 UTC (permalink / raw)
  To: ruby-core

Issue #13660 has been updated by shyouhei (Shyouhei Urabe).


We looked at this issue at yesterday's developer meeting.

The attendees agreed that the current behaviour is intentional.  Creating Bignums every time would be too slow for what experience shows to be a hot spot, so we want to avoid that and stick to Fixnum.

If you (or other cryptography experts) have any information-security concerns about cutting bits from hash values, please tell us in a bit more detail what is wrong.

----------------------------------------
Bug #13660: rb_str_hash_m discards bits from the hash
https://bugs.ruby-lang.org/issues/13660#change-65803


* [ruby-core:82089] [Ruby trunk Bug#13660] rb_str_hash_m discards bits from the hash
       [not found] <redmine.issue-13660.20170614215607@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2017-07-15  2:22 ` [ruby-core:82071] " shyouhei
@ 2017-07-17 16:11 ` eregontp
  4 siblings, 0 replies; 5+ messages in thread
From: eregontp @ 2017-07-17 16:11 UTC (permalink / raw)
  To: ruby-core

Issue #13660 has been updated by Eregon (Benoit Daloze).


I think the case where half the bits are lost could become a potential security issue.
Essentially, all strings whose hashes share the same retained half will collide in a Hash, and such strings are likely trivial to generate
(the same prefix/suffix of the right length is likely to produce the same half).

In that case (sizeof(long) < sizeof(void*)), I think at least the two parts should be combined with something like (long)(value ^ (value >> 32)).
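
A minimal sketch of that folding idea, assuming a 64-bit hash and a 32-bit result; fold64 is a hypothetical name and this is not a proposed patch:

```c
#include <stdint.h>

/* XOR the upper 32 bits into the lower 32 bits before narrowing,
 * so that every input bit influences the 32-bit result. */
static uint32_t fold64(uint64_t value)
{
    return (uint32_t)(value ^ (value >> 32));
}
```

With this fold, 0x11111111DEADBEEF and 0x22222222DEADBEEF map to different 32-bit values even though their low halves are equal. The result is still only 32 bits wide, so this spreads the input bits but does not by itself make collisions hard to construct.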

But I am not a security expert.

----------------------------------------
Bug #13660: rb_str_hash_m discards bits from the hash
https://bugs.ruby-lang.org/issues/13660#change-65820

