[ruby-core:89985] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals

ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed

* [ruby-core:89985] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
@ 2018-11-22 22:06 ` alanwucanada
  2018-11-23  1:46 ` [ruby-core:89994] " matz
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2018-11-22 22:06 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been reported by alanwu (Alan Wu).

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_str.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
string_key_vs_symbol_key.rb (594 Bytes)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:89994] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
  2018-11-22 22:06 ` [ruby-core:89985] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals alanwucanada
@ 2018-11-23  1:46 ` matz
  2018-11-23 12:00 ` [ruby-core:90005] " nobu
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: matz @ 2018-11-23  1:46 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by matz (Yukihiro Matsumoto).

I like the idea!

Matz.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75099

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_str.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90005] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
  2018-11-22 22:06 ` [ruby-core:89985] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals alanwucanada
  2018-11-23  1:46 ` [ruby-core:89994] " matz
@ 2018-11-23 12:00 ` nobu
  2018-11-23 18:52 ` [ruby-core:90013] " alanwucanada
                   ` (12 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: nobu @ 2018-11-23 12:00 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by nobu (Nobuyoshi Nakada).

```c
#define CAN_MEMO_HASH(str) \
    (RB_FL_TEST_RAW((str), RSTRING_NOEMBED | RUBY_FL_FREEZE) == RUBY_FL_FREEZE && RSTRING_EMBED_LEN(str) < MEMO_HASH_LEN_MAX)
```

Maybe, the last comparison should be `<=`, as the size of `ary` is `MEMO_HASH_LEN_MAX + 1`?
And wide-char encodings would need consideration.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75109

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_str.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90013] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-11-23 12:00 ` [ruby-core:90005] " nobu
@ 2018-11-23 18:52 ` alanwucanada
  2018-11-24  5:35 ` [ruby-core:90022] " alanwucanada
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2018-11-23 18:52 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by alanwu (Alan Wu).

File 0001-Hash-code-memoization-for-short-fstrings-v2.patch added

> Maybe, the last comparison should be `<=`, as the size of `ary` is `MEMO_HASH_LEN_MAX + 1`?

Yes! I was one byte too conservative.

> And wide-char encodings would need consideration.

I don't see why we need to special-case any encoding. `RSTRING_EMBED_LEN` gets the size in bytes and is used in `String#bytesize`. Do we write/read past `arr[RSTRING_EMBED_LEN(str)]` or modify frozen strings in-place somewhere?

I attached an updated patch that has one additional benchmark that test the code path which writes the memo. No visible penalty there either.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75117

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_str.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90022] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-11-23 18:52 ` [ruby-core:90013] " alanwucanada
@ 2018-11-24  5:35 ` alanwucanada
  2018-11-25  3:22 ` [ruby-core:90054] " nobu
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2018-11-24  5:35 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by alanwu (Alan Wu).

Ah I see the problem with wide-char encodings now. There are some C APIs that write a codepoint-wide terminator at arr[RSTRING_EMBED_LEN], which can clobber the hash memo.
I will benchmark to see if fetching the encoding struct is slow; I think it's a cache miss every time.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75127

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90054] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2018-11-24  5:35 ` [ruby-core:90022] " alanwucanada
@ 2018-11-25  3:22 ` nobu
  2018-11-26  4:39 ` [ruby-core:90066] " alanwucanada
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: nobu @ 2018-11-25  3:22 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by nobu (Nobuyoshi Nakada).

One idea is to shrink `MEMO_HASH_LEN_MAX` for a wide-char terminators.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75163

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90066] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2018-11-25  3:22 ` [ruby-core:90054] " nobu
@ 2018-11-26  4:39 ` alanwucanada
  2018-11-26 13:47 ` [ruby-core:90076] " nobu
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2018-11-26  4:39 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by alanwu (Alan Wu).

File 0001-Hash-code-memoization-for-short-fstrings-v5.patch added

I've taken multi-byte terminators into account and updated my implementation. I've also included a test that should warn us about hash clobbering.
I also noticed that the benchmark that creates many unique fstrings is actually 5% slower on v4 of this patch with a higher iteration count.
I had to change the hash table interaction for fstring creation a bit to make up the performance difference.
Apologies for the bigger patch size.

```

For short enough fstrings, we have extra space inside the
RString struct we could use to remember the hash code.

Applications that work with JSON or YAML benefit from this
optimization as they often use string literals as hash keys.
The popular ORM ActiveRecord also uses fstring internally and
thus would benefit from this optimization.

This also provides yet another incentive for users to turn on
frozen string literals.

On a 64-bit machine, for strings that use ASCII and UTF-8
encoding, "short enough" means len < 16. For 32 bit machines,
"short enough" means len < 8.

* st.c: introduce a new function: st_update_with_hash() that
  allows users to avoid unncessary re-hashing when doing
  multiple updates.

* string.c: memoize hash code for short, embeded fstrings after
  the embedded buffer. Make use of st_update_with_hash() to
  avoid hashing the same string multiple times.

* string.c: harden str_fill_term() to make sure we don't zero
  more than 4 bytes of memory. No encoding in the world needs
  a terminator longer than 4 bytes at the moment.

* hash_aref_fstr.rb: benchmark demonstrating the best case
  scenario. About 20% faster with this commit.

* hash_aref_long_str.rb: benchamrk demonstrating the worst case
  scenario for this optimization. It shows that there is no
  visible performance impact for strings that don't benefit
  from the optimization.

* freeze_unique_strings.yml: benchmark that creates many unique
  fstrings. We memoize the hash for each fstring we create, but
  there is no visible performance impact.
```

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75177

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90076] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2018-11-26  4:39 ` [ruby-core:90066] " alanwucanada
@ 2018-11-26 13:47 ` nobu
  2018-11-26 16:32 ` [ruby-core:90077] " alanwucanada
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: nobu @ 2018-11-26 13:47 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by nobu (Nobuyoshi Nakada).

alanwu (Alan Wu) wrote:
> I've taken multi-byte terminators into account and updated my implementation. I've also included a test that should warn us about hash clobbering.

Seems nice.

About the macro `BUILTIN_SINGLE_BYTE_ENC_IDX_P`, its name is misleading as it is true for UTF-8.
In common, single-byte encoding means that it uses just one byte per character, 0..255 range, e.g., ASCII-8BIT, US-ASCII, and ISO-8859 family.
UTF-8 is one of mutli-byte encodings.
Also it doesn't cover all non-wchar encodings, so it doesn't seem generic enough to be shared in encindex.h.

> I had to change the hash table interaction for fstring creation a bit to make up the performance difference.
> Apologies for the bigger patch size.

It may be nice to separate `st_update_with_hash` patch.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75200

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90077] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2018-11-26 13:47 ` [ruby-core:90076] " nobu
@ 2018-11-26 16:32 ` alanwucanada
  2018-12-05 21:30 ` [ruby-core:90329] " alanwucanada
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2018-11-26 16:32 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by alanwu (Alan Wu).

File 0002-Hash-code-memoization-for-short-fstrings.patch added
File 0001-Add-st_update_with_hash.patch added

nobu (Nobuyoshi Nakada) wrote:
> About the macro `BUILTIN_SINGLE_BYTE_ENC_IDX_P`, its name is misleading as it is true for UTF-8.

I agree. I removed it.

> It may be nice to separate `st_update_with_hash` patch.

Done.

I moved away from using +1 in the buffer size, since it could be +1 or +4 depending on the encoding.
I also expanded the test to test a variety of string lengths.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75202

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:90329] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2018-11-26 16:32 ` [ruby-core:90077] " alanwucanada
@ 2018-12-05 21:30 ` alanwucanada
  2019-02-07 10:44 ` [ruby-core:91468] " matz
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2018-12-05 21:30 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by alanwu (Alan Wu).

Is there anything I can do to move this forward?

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-75437

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:91468] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (9 preceding siblings ...)
  2018-12-05 21:30 ` [ruby-core:90329] " alanwucanada
@ 2019-02-07 10:44 ` matz
  2019-02-07 16:35 ` [ruby-core:91473] " alanwucanada
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: matz @ 2019-02-07 10:44 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by matz (Yukihiro Matsumoto).

Status changed from Open to Closed

@nobu benchmarked the result from the patch, and it does not improve the performance as we expected. Probably when the hash value is memoized by this patch, the string is short by definition, so the cost of hash value calculation is not big at all. So (even though it doesn't harm the performance at all), I don't think it's worth adding.

Matz.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-76729

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:91473] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (10 preceding siblings ...)
  2019-02-07 10:44 ` [ruby-core:91468] " matz
@ 2019-02-07 16:35 ` alanwucanada
  2019-02-08  6:28 ` [ruby-core:91488] " nobu
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2019-02-07 16:35 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by alanwu (Alan Wu).

I proposed this change because every attribute access in Active Record go through a hash lookup.
When we profile our app in production, Active Record attribute access shows up as a clear hot spot.

When I used this patch to run response time benchmark against https://github.com/codetriage/codetriage,
there was a 1% improvement. A 1% improvement would be amazing for us in production.

I'm disappointed that the patch doesn't provide enough speed up to justify the maintenance burden,
but I thank you both for your time.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-76732

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:91488] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (11 preceding siblings ...)
  2019-02-07 16:35 ` [ruby-core:91473] " alanwucanada
@ 2019-02-08  6:28 ` nobu
  2019-02-16  9:33 ` [ruby-core:91576] " duerst
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 16+ messages in thread
From: nobu @ 2019-02-08  6:28 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by nobu (Nobuyoshi Nakada).

Isn't 1% a measurement error?
Does it always occur constantly?

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-76747

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:91576] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (12 preceding siblings ...)
  2019-02-08  6:28 ` [ruby-core:91488] " nobu
@ 2019-02-16  9:33 ` duerst
  2019-02-17  3:32 ` [ruby-core:91578] " alanwucanada
  2019-02-17 21:11 ` [ruby-core:91580] " ko1
  15 siblings, 0 replies; 16+ messages in thread
From: duerst @ 2019-02-16  9:33 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by duerst (Martin Dürst).

nobu (Nobuyoshi Nakada) wrote:
> Isn't 1% a measurement error?
> Does it always occur constantly?

If a 1% improvement is consistently reproducible, then it's worth reconsidering.

alanwu (Alan Wu) wrote:
> I proposed this change because every attribute access in Active Record goes through a hash lookup.
> When we profile our app in production, Active Record attribute access shows up as a clear hot spot.

That doesn't come as a surprise to me.

> When I used this patch to run response time benchmark against https://github.com/codetriage/codetriage,
> there was a 1% improvement. A 1% improvement would be translate to a great deal of cost savings for us in production.

Can you give more information? What 'response time benchmark' did you use (I didn't find any such benchmark in codetriage, but I didn't look very hard and am not familiar with that application)? Did you try other benchmarks, or can you try them?

> I'm disappointed that the patch doesn't provide enough speed-up to justify the maintenance burden,
> but I thank you both for your time.

I don't think we say that 1% isn't enough. If it's reproducible, it's very easily worth it. But if we can't reproduce the speedup, it's difficult for us to include your code.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-76840

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:91578] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (13 preceding siblings ...)
  2019-02-16  9:33 ` [ruby-core:91576] " duerst
@ 2019-02-17  3:32 ` alanwucanada
  2019-02-17 21:11 ` [ruby-core:91580] " ko1
  15 siblings, 0 replies; 16+ messages in thread
From: alanwucanada @ 2019-02-17  3:32 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by alanwu (Alan Wu).

After running some benchmarks on rubygems.org with wrk, I do not think this is patch is a good idea anymore.
It seems to decrease throughput instead of increase it, despite the theoretical improvement shown in micro benchmarks.
The 1% improvement is there, but it's due to a change I made to ActiveRecord to make it benefit from hash memoization, not due to the hash memorization itself.

The better place to do this optimization is probably at compile time, after doing inlining, I think.
Thank you again for taking a look at this patch. I learned a lot trying to put it together.

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-76842

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [ruby-core:91580] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals
       [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
                   ` (14 preceding siblings ...)
  2019-02-17  3:32 ` [ruby-core:91578] " alanwucanada
@ 2019-02-17 21:11 ` ko1
  15 siblings, 0 replies; 16+ messages in thread
From: ko1 @ 2019-02-17 21:11 UTC (permalink / raw)
  To: ruby-core

Issue #15331 has been updated by ko1 (Koichi Sasada).

My opinion: For short strings, I don't think it doesn't improve.
For long strings, it can be (not sure how many long string's hash value are used). So allocating hash value (+8 bytes) in external allocated area can help (but it should be applied to enough long strings).

----------------------------------------
Feature #15331: [PATCH] Faster hashing for short string literals
https://bugs.ruby-lang.org/issues/15331#change-76845

* Author: alanwu (Alan Wu)
* Status: Closed
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
## Background

It's common for applications to use string literals as hash keys, especially
for applications that work with YAML or JSON:

```
paylod['name']
```

At the moment (r65895), hash lookups using a string key is about 30% slower
than using a symbol key.

## Proposal

We memoize the hash code for short fstrings. There is extra, currently unused
space at the end of the RString struct we can use to store the hash code.
The unique nature of fstrings makes it so that every user of each fstring benefit
from the memoized hash.

## Evaluation

The included benchmark hash_aref_fstr.rb is about 20% faster with this optimization.
hash_aref_long_str.rb shows that for long strings which we cannot memoize, there is
no visible performance penalty.

vm2_freezestring.yml is also not visibly slower after this optimization.

I have also attached a bechmark (string_key_vs_symbol_key.rb) that compares hash
lookups with symbols against hash lookups with strings. With this optimization, the
gap in performance is smaller. (10% slower post patch vs 30% slower on trunk)

---Files--------------------------------
string_key_vs_symbol_key.rb (594 Bytes)
0001-Hash-code-memoization-for-short-fstrings.patch (3.62 KB)
0001-Hash-code-memoization-for-short-fstrings-v2.patch (4.29 KB)
0001-Hash-code-memoization-for-short-fstrings-v3.patch (4.36 KB)
0001-Hash-code-memoization-for-short-fstrings-v4.patch (4.31 KB)
0001-Hash-code-memoization-for-short-fstrings-v5.patch (12.9 KB)
0001-Add-st_update_with_hash.patch (3.09 KB)
0002-Hash-code-memoization-for-short-fstrings.patch (10 KB)

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2019-02-17 21:11 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-15331.20181122220623@ruby-lang.org>
2018-11-22 22:06 ` [ruby-core:89985] [Ruby trunk Feature#15331] [PATCH] Faster hashing for short string literals alanwucanada
2018-11-23  1:46 ` [ruby-core:89994] " matz
2018-11-23 12:00 ` [ruby-core:90005] " nobu
2018-11-23 18:52 ` [ruby-core:90013] " alanwucanada
2018-11-24  5:35 ` [ruby-core:90022] " alanwucanada
2018-11-25  3:22 ` [ruby-core:90054] " nobu
2018-11-26  4:39 ` [ruby-core:90066] " alanwucanada
2018-11-26 13:47 ` [ruby-core:90076] " nobu
2018-11-26 16:32 ` [ruby-core:90077] " alanwucanada
2018-12-05 21:30 ` [ruby-core:90329] " alanwucanada
2019-02-07 10:44 ` [ruby-core:91468] " matz
2019-02-07 16:35 ` [ruby-core:91473] " alanwucanada
2019-02-08  6:28 ` [ruby-core:91488] " nobu
2019-02-16  9:33 ` [ruby-core:91576] " duerst
2019-02-17  3:32 ` [ruby-core:91578] " alanwucanada
2019-02-17 21:11 ` [ruby-core:91580] " ko1

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).