ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
From: dylan.smith@shopify.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:94510] [Ruby master Bug#16121] Stop making a redundant hash copy in Hash#dup
Date: Fri, 23 Aug 2019 20:16:19 +0000 (UTC)	[thread overview]
Message-ID: <redmine.journal-80942.20190823201618.80aa3343b32c7298@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-16121.20190823171108@ruby-lang.org

Issue #16121 has been updated by dylants (Dylan Thacker-Smith).

Backport set to 2.5: UNKNOWN, 2.6: UNKNOWN
ruby -v set to ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
Tracker changed from Feature to Bug
File 0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt added
File 0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt added
File 0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt added
File 0001-Remove-redundant-Check_Type-after-to_hash.diff.txt added

I split up the patch into the 4 attached patches, because I realized there were a few changes that could be emphasized independently, each with their own description of the change.  Hopefully that will make the optimization itself in the last patch easier to review.

Technically that makes this a bug fix, since "Patch 2/4" does fix a memory leak in the following contrived code

```ruby
h = 9.times.map { |i| [i, i] }.to_h
h.send(:initialize_copy, {})
```

where the st_table for the hash `h` won't get freed with `st_free_table` because it doesn't try to call that on the `if (RHASH_AR_TABLE_P(hash2)) {` conditional block in `rb_hash_initialize_copy`.  Mostly I didn't want to ignore it and turn into undefined behaviour from the optimization.

----------------------------------------
Bug #16121: Stop making a redundant hash copy in Hash#dup
https://bugs.ruby-lang.org/issues/16121#change-80942

* Author: dylants (Dylan Thacker-Smith)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.7.0dev (2019-08-23T16:41:09Z master b38ab0a3a9) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
## Problem

I noticed while profiling object allocations that Hash#dup was allocating 2 objects instead of only 1 as expected.  I looked for alternatives for comparison and found that `Hash[hash]` created a copy with only a single object allocation and seemed to be more than twice as fast.  Reading the source code revealed the difference was that Hash#dup creates a copy of the Hash, then rehashes the copy.   However, rehashing is done by making a copy of the hash, so the first copy before rehashing was unnecessary.

## Solution

I changed the code to just use rehashing to make the copy of the hash to improve performance while also preserving the existing behaviour.

## Benchmark

```ruby
require 'benchmark'

N = 100000

def report(x, name)
  x.report(name) do
    N.times do
      yield
    end
  end
end

hashes = {
  small_hash: { a: 1 },
  larger_hash: 20.times.map { |i| [('a'.ord + i).chr.to_sym, i] }.to_h
}

Benchmark.bmbm do |x|
  hashes.each do |name, hash|
    report(x, "#{name}.dup") do
      hash.dup
    end
  end
end
```

results on master

```
                      user     system      total        real
small_hash.dup    0.401350   0.001638   0.402988 (  0.404608)
larger_hash.dup   7.218548   0.433616   7.652164 (  7.695990)
```

results with the attached patch

```
                      user     system      total        real
small_hash.dup    0.336733   0.002425   0.339158 (  0.341760)
larger_hash.dup   6.617343   0.398407   7.015750 (  7.070282)
```

---Files--------------------------------
0001-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.93 KB)
0001-Remove-redundant-Check_Type-after-to_hash.diff.txt (624 Bytes)
0002-Fix-freeing-and-clearing-destination-hash-in-Hash.diff.txt (1.57 KB)
0003-Remove-dead-code-paths-in-rb_hash_initialize_copy.diff.txt (1.12 KB)
0004-Stop-making-a-redundant-hash-copy-in-Hash-dup.diff.txt (1.35 KB)


-- 
https://bugs.ruby-lang.org/

  parent reply	other threads:[~2019-08-23 20:16 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <redmine.issue-16121.20190823171108@ruby-lang.org>
2019-08-23 17:11 ` [ruby-core:94507] [Ruby master Feature#16121] Stop making a redundant hash copy in Hash#dup dylan.smith
2019-08-23 20:16 ` dylan.smith [this message]
2019-09-19  8:20 ` [ruby-core:94983] [Ruby master Bug#16121] " ko1
2019-09-20  7:45 ` [ruby-core:95000] " ko1
2019-09-20  8:35 ` [ruby-core:95002] " ko1
2019-09-27 22:49 ` [ruby-core:95137] " dylan.smith
2019-10-21  8:29 ` [ruby-core:95452] " ko1

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-list from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.ruby-lang.org/en/community/mailing-lists/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=redmine.journal-80942.20190823201618.80aa3343b32c7298@ruby-lang.org \
    --to=ruby-core@ruby-lang.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).