ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:95519] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
@ 2019-10-23 23:25 ` cristian
  2019-10-23 23:43 ` [ruby-core:95520] " mame
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: cristian @ 2019-10-23 23:25 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been reported by cristiangreco (Cristian Greco).

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95520] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
  2019-10-23 23:25 ` [ruby-core:95519] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash cristian
@ 2019-10-23 23:43 ` mame
  2019-10-24  7:55 ` [ruby-core:95528] " cristian
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: mame @ 2019-10-23 23:43 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by mame (Yusuke Endoh).

Status changed from Open to Rejected

Ruby uses so-called conservative GC, which is inexact.  It is not guaranteed that an object is collected when it becomes unreachable.

You can confirm that it does not cause leak by the following code:

```ruby
def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

loop { create }
```

This code uses constant memory.  If it caused memory leak, the memory usage would continue to increase.

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82293

* Author: cristiangreco (Cristian Greco)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95528] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
  2019-10-23 23:25 ` [ruby-core:95519] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash cristian
  2019-10-23 23:43 ` [ruby-core:95520] " mame
@ 2019-10-24  7:55 ` cristian
  2019-10-25  6:07 ` [ruby-core:95539] " ko1
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: cristian @ 2019-10-24  7:55 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by cristiangreco (Cristian Greco).

Status changed from Rejected to Open

mame (Yusuke Endoh) wrote: 
> This code uses constant memory.  If it caused memory leak, the memory usage would continue to increase.

Thank for your answer! 

If an application exercises this pattern very frequently during lifetime and across multiple processes then it’s definitely going to bloat memory, at the very least. As a real-world example, this is causing high memory usage for the [Prometheus client gem](https://github.com/prometheus/client_ruby), where such pattern is heavily used when passing around metric labels.

As Ruby allows any object to be potentially used as hash key, this behaviour is easily going to cause troubles in long running application and to confuse developers.

I understand it might not be easy or cheap to change the way it works now, but hope you agree with me that application code should not be designed to work around these implementation-specific details. Do you think there’s any viable approach to alleviate this type of issues in future releases? 

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82303

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95539] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2019-10-24  7:55 ` [ruby-core:95528] " cristian
@ 2019-10-25  6:07 ` ko1
  2019-10-25 21:18 ` [ruby-core:95553] " cristian
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: ko1 @ 2019-10-25  6:07 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by ko1 (Koichi Sasada).


> If an application exercises this pattern very frequently during lifetime and across multiple processes then it’s definitely going to bloat memory, at the very least. As a real-world example, this is causing high memory usage for the Prometheus client gem, where such pattern is heavily used when passing around metric labels.

Could you give us good example with that gem? As mame-san said, your example is not a problem because "This code uses constant memory".





----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82315

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95553] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2019-10-25  6:07 ` [ruby-core:95539] " ko1
@ 2019-10-25 21:18 ` cristian
  2019-10-25 21:31 ` [ruby-core:95554] " merch-redmine
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: cristian @ 2019-10-25 21:18 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by cristiangreco (Cristian Greco).


ko1 (Koichi Sasada) wrote:
> > If an application exercises this pattern very frequently during lifetime and across multiple processes then it’s definitely going to bloat memory, at the very least. As a real-world example, this is causing high memory usage for the Prometheus client gem, where such pattern is heavily used when passing around metric labels.
> 
> Could you give us good example with that gem? As mame-san said, your example is not a problem because "This code uses constant memory".

Hi Koichi, thanks for taking the time to look into this!

This example creates a metric of type `counter`, updates it 3 times using different labels and prints out an export of the metric registry:

```ruby
# frozen_string_literal: true

require "prometheus/client"
require "prometheus/client/formats/text"

require "prometheus/client/data_stores/synchronized"
# require "prometheus/client/data_stores/single_threaded"
# require "prometheus/client/data_stores/direct_file_store"

Prometheus::Client.config.data_store = Prometheus::Client::DataStores::Synchronized.new
# Prometheus::Client.config.data_store = Prometheus::Client::DataStores::SingleThreaded.new
# Prometheus::Client.config.data_store = Prometheus::Client::DataStores::DirectFileStore.new(dir: ".")

def test
  registry = Prometheus::Client::Registry.new
  counter = registry.counter(
    :counter_metric,
    docstring: "a counter",
    labels: %i[label1 label2],
  )

  labels1 = { label1: "foo", label2: "bar" }
  labels2 = { label1: 1, label2: 2 }
  labels3 = { label1: :a, label2: :b }

  counter.increment(by: 1, labels: labels1)
  counter.increment(by: 1, labels: labels2)
  counter.increment(by: 1, labels: labels3)

  puts Prometheus::Client::Formats::Text.marshal(registry)

  [labels1.object_id, labels2.object_id, labels3.object_id]
end

def find(id)
  ObjectSpace.each_object(Hash).each { |h| id == h.object_id }
end

retained_ids = test

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

retained_ids.each { |id| puts "found #{id}" if find(id) }
```

Using each store I found that the 3 labels hashes are retained after garbage collection (all the stores use a similar pattern of storing labels hashes within another hash).

What do you think is going on here?

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82331

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95554] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2019-10-25 21:18 ` [ruby-core:95553] " cristian
@ 2019-10-25 21:31 ` merch-redmine
  2019-10-25 22:53 ` [ruby-core:95555] " cristian
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: merch-redmine @ 2019-10-25 21:31 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by jeremyevans0 (Jeremy Evans).


cristiangreco (Cristian Greco) wrote:
> Using each store I found that the 3 labels hashes are retained after garbage collection (all the stores use a similar pattern of storing labels hashes within another hash).
> 
> What do you think is going on here?

`object_id` is only unique for the life of the object.  After the object is garbage collected, the same `object_id` could be used for a different object.  So measuring using `object_id` is not a good idea.  Maybe use `ObjectSpace.define_finalizer` in your testing?  I'm not sure this affects your particular test program, it is just a good general principle.

Can you remove the usage of `Prometheus` in your test script, and design a test script using only core Ruby classes, so we can be sure that `Prometheus` is not holding any references to the hashes?

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82332

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95555] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2019-10-25 21:31 ` [ruby-core:95554] " merch-redmine
@ 2019-10-25 22:53 ` cristian
  2019-10-26  2:21 ` [ruby-core:95557] " merch-redmine
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: cristian @ 2019-10-25 22:53 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by cristiangreco (Cristian Greco).


jeremyevans0 (Jeremy Evans) wrote:

> `object_id` is only unique for the life of the object.  After the object is garbage collected, the same `object_id` could be used for a different object.  So measuring using `object_id` is not a good idea.  Maybe use `ObjectSpace.define_finalizer` in your testing?  I'm not sure this affects your particular test program, it is just a good general principle.
> 
> Can you remove the usage of `Prometheus` in your test script, and design a test script using only core Ruby classes, so we can be sure that `Prometheus` is not holding any references to the hashes?

Hi Jeremy!

I reported an example code using the prometheus gem as requested by Koichi.

The usage of `ObjectSpace.define_finalizer` seems to always force objects to be retained in my tests. However, it seems very unlikely that object_ids are re-used in such short code.

I've put together another snippet that uses again `object_id` for comparison. In my tests `h3` and `h4` are always present on heap after GC:

```ruby
# frozen_string_literal: true

require 'objspace'

class Klass; end

def create
  h1 = { :a => 1 }
  h2 = { Klass.new => 2 }
  h3 = { [Klass.new] => 3 }
  h4 = { { :a => Klass.new } => 4 }

  $id_h1 = h1.object_id
  $id_h2 = h2.object_id
  $id_h3 = h3.object_id
  $id_h4 = h3.object_id

  nil
end

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

ObjectSpace.each_object(Hash) do |h|
  puts "found h1" if h.object_id == $id_h1
  puts "found h2" if h.object_id == $id_h2
  puts "found h3" if h.object_id == $id_h3
  puts "found h4" if h.object_id == $id_h4
end
```

For the sake of completeness, this bug report originates from a strange behaviour I've observed using the MemoryProfiler gem: https://github.com/SamSaffron/memory_profiler/issues/81


----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82333

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95557] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2019-10-25 22:53 ` [ruby-core:95555] " cristian
@ 2019-10-26  2:21 ` merch-redmine
  2019-10-26  4:13 ` [ruby-core:95559] " nobu
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: merch-redmine @ 2019-10-26  2:21 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by jeremyevans0 (Jeremy Evans).


Here's a modified version of your script, fixing the issue where `$id_h4 = h3.object_id`, and showing the actual contents of the objects found with those ids:

```ruby
require 'objspace'

class Klass; end

def create
  h1 = { :a => 1 }
  h2 = { Klass.new => 2 }
  h3 = { [Klass.new] => 3 }
  h4 = { { :a => Klass.new } => 4 }

  $id_h1 = h1.object_id
  $id_h2 = h2.object_id
  $id_h3 = h3.object_id
  $id_h4 = h4.object_id

  nil
end

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

ObjectSpace.each_object(Hash) do |h|
  puts "found h1: #{ObjectSpace._id2ref($id_h1)}" if h.object_id == $id_h1
  puts "found h2: #{ObjectSpace._id2ref($id_h2)}" if h.object_id == $id_h2
  puts "found h3: #{ObjectSpace._id2ref($id_h3)}" if h.object_id == $id_h3
  puts "found h4: #{ObjectSpace._id2ref($id_h4)}" if h.object_id == $id_h4
end
```

Here's the output from running it a few times

```
# first run
found h4: {{:a=>#<Klass:0x00000615fa257108>}=>4}
found h1: {:a=>1}

# second run
found h1: {:a=>1}
found h4: {{:a=>#<Klass:0x0000032a2e2b2c20>}=>4}
found h3: {}
found h2: {:full_mark=>true, :immediate_sweep=>true}

# third run
found h4: {{:a=>#<Klass:0x0000067d0e3a2578>}=>4}
found h1: {:a=>1}

# fourth run (this when compiled with optimizations)
found h4: {{:a=>#<Klass:0x0000042b0e833c30>}=>4}
```

It does seem to always retain `h4`, but that could be due to the conservative GC. When compiled without optimizations (`-O0`), `h1` seems to always be retained as well.  On my system, `h4` seems to always be retained in ruby 2.1 and 2.3-2.7, not always retained in 2.2 and the branch I have to fully separate keyword arguments from positional arguments (https://github.com/jeremyevans/ruby/tree/r3).  As those changes seem unrelated, I'm going to assume those versions just change the memory layout such that the conservative GC no longer retains `h4`.  The fact that `h1` is always retained when compiled without optimizations lends weight to this theory.

I agree with @mame that unless you are seeing unbounded memory growth, this is not a bug, just a result of the conservative GC.

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82335

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95559] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2019-10-26  2:21 ` [ruby-core:95557] " merch-redmine
@ 2019-10-26  4:13 ` nobu
  2019-10-26 12:44 ` [ruby-core:95565] " cristian
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: nobu @ 2019-10-26  4:13 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by nobu (Nobuyoshi Nakada).


Adding this method and calling it after `create` clears the output.

```ruby
def garbage
  h1 = h2 = h3 = h4 = h5 = h6 = h7 = h8 = h9 = h10 = nil
end
```

So a “shadow” seems staying on the VM stack.

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82337

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95565] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2019-10-26  4:13 ` [ruby-core:95559] " nobu
@ 2019-10-26 12:44 ` cristian
  2019-10-26 15:23 ` [ruby-core:95566] " merch-redmine
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 13+ messages in thread
From: cristian @ 2019-10-26 12:44 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by cristiangreco (Cristian Greco).


Hi Jeremy, thanks for these details!

I don’t know the details of ruby’s GC, seems to me it might behave unpredictably sometimes. I guess what confuses me now is that although that object is retained we don’t observe unbounded memory growth when calling the method in a loop: is it eventually collected? Is it somehow re-used, or what? 

Another thing that intrigues me is that this behaviour is not always consistent: it might vary across different ruby builds (you mentioned that memory layout and compiler optimisations might affect the output) and is also subtly influenced by code changes. For example, if I add this line `puts ObjectSpace.count_objects[:T_HASH]` just after the call to `create` then `h4` is almost never found after GC (or its reference is found associated to another object). Not sure why this happens though!


nobu (Nobuyoshi Nakada) wrote:
> Adding this method and calling it after `create` clears the output.
> 
> ```ruby
> def garbage
>   h1 = h2 = h3 = h4 = h5 = h6 = h7 = h8 = h9 = h10 = nil
> end
> ```
> 
> So a “shadow” seems staying on the VM stack.

Hey Nobu, what is going on here? What means that a shadow is retained on the stack? 


----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82346

* Author: cristiangreco (Cristian Greco)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95566] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (9 preceding siblings ...)
  2019-10-26 12:44 ` [ruby-core:95565] " cristian
@ 2019-10-26 15:23 ` merch-redmine
  2019-10-26 16:31 ` [ruby-core:95567] " XrXr
  2019-10-27 20:41 ` [ruby-core:95571] " sacrogemini
  12 siblings, 0 replies; 13+ messages in thread
From: merch-redmine @ 2019-10-26 15:23 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Rejected

cristiangreco (Cristian Greco) wrote:
> Hi Jeremy, thanks for these details!
> 
> I don’t know the details of ruby’s GC, seems to me it might behave unpredictably sometimes. I guess what confuses me now is that although that object is retained we don’t observe unbounded memory growth when calling the method in a loop: is it eventually collected? Is it somehow re-used, or what? 

It is eventually collected when Ruby's GC can no longer find a reference to it.  The reason it may be retained even though there is no direct reference to it in Ruby is that pointers to the object may still exist somewhere on the the C/Ruby stack.  Ruby's GC is conservative, part of the scan for objects is just looking at at stack memory, and if any memory could be considered a pointer to a Ruby object, that Ruby object is retained during that GC pass.

This conservative GC is needed to make C extensions work.  C local variables may hold pointers to Ruby objects, and collecting the Ruby objects while there is a C local variable pointing to them can cause undefined behavior (often a segfault).

> Another thing that intrigues me is that this behaviour is not always consistent: it might vary across different ruby builds (you mentioned that memory layout and compiler optimisations might affect the output) and is also subtly influenced by code changes. For example, if I add this line `puts ObjectSpace.count_objects[:T_HASH]` just after the call to `create` then `h4` is almost never found after GC (or its reference is found associated to another object). Not sure why this happens though!

It happens because the memory layout changed, and there are no longer locations in memory on the stack that point to the objects.

> nobu (Nobuyoshi Nakada) wrote:
> > Adding this method and calling it after `create` clears the output.
> > 
> > ```ruby
> > def garbage
> >   h1 = h2 = h3 = h4 = h5 = h6 = h7 = h8 = h9 = h10 = nil
> > end
> > ```
> > 
> > So a “shadow” seems staying on the VM stack.
> 
> Hey Nobu, what is going on here? What means that a shadow is retained on the stack?

It just means there are locations in memory on the stack that point to the Ruby objects.  By calling another method that sets a bunch of local variables, the locations in memory that previously pointed to the hashes now are modified to contain `nil` (`Qnil` in C, 0x4 or 0x8 depending on processor type).  Since no locations in memory point to the hashes, the hashes can then be GCed.

Hopefully that provides an adequate explanation for Ruby's GC.  For more background, please review some conference presentations by @ko1 :  http://www.atdot.net/~ko1/activities/.  I'm going to close this now.  Please only reopen if you can show a case with unbounded memory growth.

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82347

* Author: cristiangreco (Cristian Greco)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95567] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (10 preceding siblings ...)
  2019-10-26 15:23 ` [ruby-core:95566] " merch-redmine
@ 2019-10-26 16:31 ` XrXr
  2019-10-27 20:41 ` [ruby-core:95571] " sacrogemini
  12 siblings, 0 replies; 13+ messages in thread
From: XrXr @ 2019-10-26 16:31 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by alanwu (Alan Wu).


The GC scans the native stack for Ruby's C code to find values to retain.
Pointers to Ruby's heap objects can end up on the native stack for a variety
of reasons, and this is mostly up to the C compiler. Whether a pointer ends up
on the native stack depends on how Ruby is built, inlining decisions, register
alloation and a bunch of other things -- this is not something Ruby's code has
direct control over.

The generated machine code might interact in a way such that certain sequence
of Ruby code overwrite heap pointers on the native stack, letting the GC
collect those objects.

To illustrate, here is a modified version of your prometheus script:


```ruby
# frozen_string_literal: true

require "prometheus/client"
require "prometheus/client/formats/text"

require "prometheus/client/data_stores/synchronized"

Prometheus::Client.config.data_store = Prometheus::Client::DataStores::Synchronized.new

def test
  registry = Prometheus::Client::Registry.new
  counter = registry.counter(
    :counter_metric,
    docstring: "a counter",
    labels: %i[label1 label2],
  )

  labels1 = { label1: "foo", label2: "bar" }
  labels2 = { label1: 1, label2: 2 }
  labels3 = { label1: :a, label2: :b }

  counter.increment(by: 1, labels: labels1)
  counter.increment(by: 1, labels: labels2)
  counter.increment(by: 1, labels: labels3)

  puts Prometheus::Client::Formats::Text.marshal(registry)

  ret = [labels1.object_id, labels2.object_id, labels3.object_id]
  return ret
  3.itself
  ret
end

def find(id)
  ObjectSpace.each_object(Hash).find { |h| id == h.object_id }
end

retained_ids = test

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

retained_ids.each { |id|
  obj = find(id)
  puts "found #{id} #{obj}" if obj
}

```

Tracing with GDB, you can find that `labels3` is retained because the compiler
chose to spill the receiver onto the native stack whenever a call without block
happens:


```
insns.def:
763     calling.block_handler = VM_BLOCK_HANDLER_NONE;
   0x00005555557284ee <+174>:   mov    %r15,%rbx
   0x00005555557284f1 <+177>:   movq   $0x0,0x70(%rsp)

764     vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
   0x00005555557284fa <+186>:   movslq 0xc(%rcx),%rax
   0x00005555557284fe <+190>:   mov    %eax,0x80(%rsp)
   0x0000555555728505 <+197>:   shl    $0x3,%rax
   0x0000555555728509 <+201>:   sub    %rax,%rdx
   0x000055555572850c <+204>:   mov    -0x8(%rdx),%rax
   0x0000555555728510 <+208>:   mov    %rax,0x78(%rsp) <<<< Stores pointer to Ruby object on the native stack <<<<<<<

./include/ruby/ruby.h:
2058        if (RB_IMMEDIATE_P(obj)) {
=> 0x0000555555728515 <+213>:   test   $0x7,%al
   0x0000555555728517 <+215>:   je     0x55555572b24b <vm_exec_core+11787>

2059        if (RB_FIXNUM_P(obj)) return rb_cInteger;
   0x000055555572851d <+221>:   mov    0x1b56a4(%rip),%r12        # 0x5555558ddbc8 <rb_cInteger>
   0x0000555555728524 <+228>:   test   $0x1,%al
   0x0000555555728526 <+230>:   jne    0x555555728555 <vm_exec_core+277>
```

`labels3` happens to be the last receiver in the method so it is left on the
stack and retained. We can allow the GC to collect `labels3` by performing
another method call on some other object before the end of the `test` method.
Commenting out `return ret`, the VM spills `3` onto the same location that
`labels3` occupied, allowing the GC to collect labels3.

Again, all this is up to the exact machine code output of your compiler, so
what I'm seeing here on my setup might be completely different from yours.
This kind of object retainment can happen anywhere in the Ruby's C code and
can easily be introduced or removed by changes to the VM or the Ruby code in
question.

This kind of extraneous retainment is an inherent property of Ruby's GC.
There is no memory growth since the fixed size native stack puts a cap on
the number of objects that could be retained in this manner.

This could be the reason to the memory bloat problems in your app, but I think
other explanations are more plausible. I would suggest taking another look at
your heap profile. There should be other, more actionable instances of
object retainment in your app.


----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82348

* Author: cristiangreco (Cristian Greco)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [ruby-core:95571] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
       [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
                   ` (11 preceding siblings ...)
  2019-10-26 16:31 ` [ruby-core:95567] " XrXr
@ 2019-10-27 20:41 ` sacrogemini
  12 siblings, 0 replies; 13+ messages in thread
From: sacrogemini @ 2019-10-27 20:41 UTC (permalink / raw)
  To: ruby-core

Issue #16278 has been updated by cristiangreco (Cristian Greco).


Thanks both @jeremyevans0 @alanwu for the rich and clear explanations! 

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82352

* Author: cristiangreco (Cristian Greco)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-10-27 20:41 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
2019-10-23 23:25 ` [ruby-core:95519] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash cristian
2019-10-23 23:43 ` [ruby-core:95520] " mame
2019-10-24  7:55 ` [ruby-core:95528] " cristian
2019-10-25  6:07 ` [ruby-core:95539] " ko1
2019-10-25 21:18 ` [ruby-core:95553] " cristian
2019-10-25 21:31 ` [ruby-core:95554] " merch-redmine
2019-10-25 22:53 ` [ruby-core:95555] " cristian
2019-10-26  2:21 ` [ruby-core:95557] " merch-redmine
2019-10-26  4:13 ` [ruby-core:95559] " nobu
2019-10-26 12:44 ` [ruby-core:95565] " cristian
2019-10-26 15:23 ` [ruby-core:95566] " merch-redmine
2019-10-26 16:31 ` [ruby-core:95567] " XrXr
2019-10-27 20:41 ` [ruby-core:95571] " sacrogemini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).