* [ruby-core:94507] [Ruby master Feature#16121] Stop making a redundant hash copy in Hash#dup
[not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
@ 2019-08-23 17:11 ` dylan.smith
2019-08-23 20:16 ` [ruby-core:94510] [Ruby master Bug#16121] " dylan.smith
` (5 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: dylan.smith @ 2019-08-23 17:11 UTC (permalink / raw)
To: ruby-core
Issue #16121 has been reported by dylants (Dylan Thacker-Smith).
----------------------------------------
Feature #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
## Problem
I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.
## Solution
I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.
## Benchmark
```ruby
require 'benchmark'
N = 100000
def report(x, name)
x.report(name) do
N.times do
yield
end
end
end
hashes = {
small_hash: { a: 1 },
larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}
Benchmark.bmbm do |x|
hashes.each do |name, hash|
report(x, "#{name}.dup") do
hash.dup
end
end
end
```
results on master
```
user system total real
small_hash.dup 0.401350 0.001638 0.402988 ( 0.404608)
larger_hash.dup 7.218548 0.433616 7.652164 ( 7.695990)
```
results with the attached patch
```
user system total real
small_hash.dup 0.336733 0.002425 0.339158 ( 0.341760)
larger_hash.dup 6.617343 0.398407 7.015750 ( 7.070282)
```
---Files--------------------------------
0001-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.93 KB)
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 7+ messages in thread
* [ruby-core:94510] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
[not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
2019-08-23 17:11 ` [ruby-core:94507] [Ruby master Feature#16121] Stop making a redundant hash copy in Hash#dup dylan.smith
@ 2019-08-23 20:16 ` dylan.smith
2019-09-19 8:20 ` [ruby-core:94983] " ko1
` (4 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: dylan.smith @ 2019-08-23 20:16 UTC (permalink / raw)
To: ruby-core
Issue #16121 has been updated by dylants (Dylan Thacker-Smith).
Backport set to 2.5: UNKNOWN, 2.6: UNKNOWN
ruby -v set to ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
Tracker changed from Feature to Bug
File 0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt added
File 0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt added
File 0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt added
File 0001-Remove-redundant-Check_Type-after-to_hash.diff.txt added
I split up the patch into the 4 attached patches, because I realized there were a few changes that could be emphasized independently, each with their own description of the change. Hopefully that will make the optimization itself in the last patch easier to review.
Technically that makes this a bug fix, since "Patch 2/4" does fix a memory leak in the following contrived code
```ruby
h = 9.times.map { |i| [i, i] }.to_h
h.send(:initialize_copy, {})
```
where the st_table for the hash `h` won't get freed with `st_free_table` because it doesn't try to call that on the `if (RHASH_AR_TABLE_P(hash2)) {` conditional block in `rb_hash_initialize_copy`. Mostly I didn't want to ignore it and turn into undefined behaviour from the optimization.
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-80942
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.
## Solution
I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.
## Benchmark
```ruby
require 'benchmark'
N = 100000
def report(x, name)
x.report(name) do
N.times do
yield
end
end
end
hashes = {
small_hash: { a: 1 },
larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}
Benchmark.bmbm do |x|
hashes.each do |name, hash|
report(x, "#{name}.dup") do
hash.dup
end
end
end
```
results on master
```
user system total real
small_hash.dup 0.401350 0.001638 0.402988 ( 0.404608)
larger_hash.dup 7.218548 0.433616 7.652164 ( 7.695990)
```
results with the attached patch
```
user system total real
small_hash.dup 0.336733 0.002425 0.339158 ( 0.341760)
larger_hash.dup 6.617343 0.398407 7.015750 ( 7.070282)
```
---Files--------------------------------
0001-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.93 KB)
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 7+ messages in thread
* [ruby-core:94983] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
[not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
2019-08-23 17:11 ` [ruby-core:94507] [Ruby master Feature#16121] Stop making a redundant hash copy in Hash#dup dylan.smith
2019-08-23 20:16 ` [ruby-core:94510] [Ruby master Bug#16121] " dylan.smith
@ 2019-09-19 8:20 ` ko1
2019-09-20 7:45 ` [ruby-core:95000] " ko1
` (3 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: ko1 @ 2019-09-19 8:20 UTC (permalink / raw)
To: ruby-core
Issue #16121 has been updated by ko1 (Koichi Sasada).
Assignee set to ko1 (Koichi Sasada)
I'll review it.
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-81602
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.
## Solution
I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.
## Benchmark
```ruby
require 'benchmark'
N = 100000
def report(x, name)
x.report(name) do
N.times do
yield
end
end
end
hashes = {
small_hash: { a: 1 },
larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}
Benchmark.bmbm do |x|
hashes.each do |name, hash|
report(x, "#{name}.dup") do
hash.dup
end
end
end
```
results on master
```
user system total real
small_hash.dup 0.401350 0.001638 0.402988 ( 0.404608)
larger_hash.dup 7.218548 0.433616 7.652164 ( 7.695990)
```
results with the attached patch
```
user system total real
small_hash.dup 0.336733 0.002425 0.339158 ( 0.341760)
larger_hash.dup 6.617343 0.398407 7.015750 ( 7.070282)
```
---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 7+ messages in thread
* [ruby-core:95000] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
[not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
` (2 preceding siblings ...)
2019-09-19 8:20 ` [ruby-core:94983] " ko1
@ 2019-09-20 7:45 ` ko1
2019-09-20 8:35 ` [ruby-core:95002] " ko1
` (2 subsequent siblings)
6 siblings, 0 replies; 7+ messages in thread
From: ko1 @ 2019-09-20 7:45 UTC (permalink / raw)
To: ruby-core
Issue #16121 has been updated by ko1 (Koichi Sasada).
Thank you. Completely fine.
Do you want to merge via github PR or attached patch by my commit?
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-81628
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.
## Solution
I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.
## Benchmark
```ruby
require 'benchmark'
N = 100000
def report(x, name)
x.report(name) do
N.times do
yield
end
end
end
hashes = {
small_hash: { a: 1 },
larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}
Benchmark.bmbm do |x|
hashes.each do |name, hash|
report(x, "#{name}.dup") do
hash.dup
end
end
end
```
results on master
```
user system total real
small_hash.dup 0.401350 0.001638 0.402988 ( 0.404608)
larger_hash.dup 7.218548 0.433616 7.652164 ( 7.695990)
```
results with the attached patch
```
user system total real
small_hash.dup 0.336733 0.002425 0.339158 ( 0.341760)
larger_hash.dup 6.617343 0.398407 7.015750 ( 7.070282)
```
---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 7+ messages in thread
* [ruby-core:95002] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
[not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
` (3 preceding siblings ...)
2019-09-20 7:45 ` [ruby-core:95000] " ko1
@ 2019-09-20 8:35 ` ko1
2019-09-27 22:49 ` [ruby-core:95137] " dylan.smith
2019-10-21 8:29 ` [ruby-core:95452] " ko1
6 siblings, 0 replies; 7+ messages in thread
From: ko1 @ 2019-09-20 8:35 UTC (permalink / raw)
To: ruby-core
Issue #16121 has been updated by ko1 (Koichi Sasada).
I found that
> ` rb_define_method(rb_cArray, "initialize_copy", rb_ary_replace, 1);`
`Array#initialize_copy` == `Array#replace`. Can we use same technique?
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-81630
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.
## Solution
I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.
## Benchmark
```ruby
require 'benchmark'
N = 100000
def report(x, name)
x.report(name) do
N.times do
yield
end
end
end
hashes = {
small_hash: { a: 1 },
larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}
Benchmark.bmbm do |x|
hashes.each do |name, hash|
report(x, "#{name}.dup") do
hash.dup
end
end
end
```
results on master
```
user system total real
small_hash.dup 0.401350 0.001638 0.402988 ( 0.404608)
larger_hash.dup 7.218548 0.433616 7.652164 ( 7.695990)
```
results with the attached patch
```
user system total real
small_hash.dup 0.336733 0.002425 0.339158 ( 0.341760)
larger_hash.dup 6.617343 0.398407 7.015750 ( 7.070282)
```
---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 7+ messages in thread
* [ruby-core:95137] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
[not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
` (4 preceding siblings ...)
2019-09-20 8:35 ` [ruby-core:95002] " ko1
@ 2019-09-27 22:49 ` dylan.smith
2019-10-21 8:29 ` [ruby-core:95452] " ko1
6 siblings, 0 replies; 7+ messages in thread
From: dylan.smith @ 2019-09-27 22:49 UTC (permalink / raw)
To: ruby-core
Issue #16121 has been updated by dylants (Dylan Thacker-Smith).
ko1 (Koichi Sasada) wrote:
> I found that
>
> > ` rb_define_method(rb_cArray, "initialize_copy", rb_ary_replace, 1);`
>
> `Array#initialize_copy` == `Array#replace`. Can we use same technique?
I tried doing that and it caused test failures because Hash#replace stop rehashing keys in ruby 2.6. I've opened a bug for that: https://bugs.ruby-lang.org/issues/16187
I've opened https://github.com/ruby/ruby/pull/2489 which replaces the implementation of both of these methods with a single optimized implementation that always rehashes.
I benchmarked it with the following benchmark/hash_dup.yml file
```
prelude: |
small_hash = { a: 1 }
larger_hash = 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
benchmark:
dup_small: small_hash.dup
dup_larger: larger_hash.dup
replace_small: '{}.replace(small_hash)'
replace_larger: '{}.replace(larger_hash)'
loop_count: 100000
```
and here are the results against the latest ruby master commit
```
Comparison:
dup_small
built-ruby: 4105090.4 i/s
compare-ruby: 3089471.1 i/s - 1.33x slower
dup_larger
built-ruby: 682873.5 i/s
compare-ruby: 518623.8 i/s - 1.32x slower
replace_small
compare-ruby: 10697475.5 i/s
built-ruby: 9399379.9 i/s - 1.14x slower
replace_larger
built-ruby: 807767.5 i/s
compare-ruby: 358383.1 i/s - 2.25x slower
```
The `replace_small` case is slower due to rehashing, but is still much faster than on ruby version 2.5.6 that did rehash keys of small hashes:
```
replace_small
built-ruby: 8847991.6 i/s
compare-ruby: 2178174.7 i/s - 4.06x slower
```
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-81776
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.
## Solution
I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.
## Benchmark
```ruby
require 'benchmark'
N = 100000
def report(x, name)
x.report(name) do
N.times do
yield
end
end
end
hashes = {
small_hash: { a: 1 },
larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}
Benchmark.bmbm do |x|
hashes.each do |name, hash|
report(x, "#{name}.dup") do
hash.dup
end
end
end
```
results on master
```
user system total real
small_hash.dup 0.401350 0.001638 0.402988 ( 0.404608)
larger_hash.dup 7.218548 0.433616 7.652164 ( 7.695990)
```
results with the attached patch
```
user system total real
small_hash.dup 0.336733 0.002425 0.339158 ( 0.341760)
larger_hash.dup 6.617343 0.398407 7.015750 ( 7.070282)
```
---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 7+ messages in thread
* [ruby-core:95452] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
[not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
` (5 preceding siblings ...)
2019-09-27 22:49 ` [ruby-core:95137] " dylan.smith
@ 2019-10-21 8:29 ` ko1
6 siblings, 0 replies; 7+ messages in thread
From: ko1 @ 2019-10-21 8:29 UTC (permalink / raw)
To: ruby-core
Issue #16121 has been updated by ko1 (Koichi Sasada).
Thank you, I merged it!
----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-82202
* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: ko1 (Koichi Sasada)
* Target version:
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem
I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected. I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast. Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy. However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.
## Solution
I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.
## Benchmark
```ruby
require 'benchmark'
N = 100000
def report(x, name)
x.report(name) do
N.times do
yield
end
end
end
hashes = {
small_hash: { a: 1 },
larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}
Benchmark.bmbm do |x|
hashes.each do |name, hash|
report(x, "#{name}.dup") do
hash.dup
end
end
end
```
results on master
```
user system total real
small_hash.dup 0.401350 0.001638 0.402988 ( 0.404608)
larger_hash.dup 7.218548 0.433616 7.652164 ( 7.695990)
```
results with the attached patch
```
user system total real
small_hash.dup 0.336733 0.002425 0.339158 ( 0.341760)
larger_hash.dup 6.617343 0.398407 7.015750 ( 7.070282)
```
---Files--------------------------------
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)
--
https://bugs.ruby-lang.org/
^ permalink raw reply [flat|nested] 7+ messages in thread