From: XrXr@users.noreply.github.com
To: ruby-core@ruby-lang.org
Subject: [ruby-core:95567] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
Date: Sat, 26 Oct 2019 16:31:01 +0000 (UTC) [thread overview]
Message-ID: <redmine.journal-82348.20191026163100.396e061f001b8218@ruby-lang.org> (raw)
In-Reply-To: redmine.issue-16278.20191023232506@ruby-lang.org
Issue #16278 has been updated by alanwu (Alan Wu).
The GC scans the native stack for Ruby's C code to find values to retain.
Pointers to Ruby's heap objects can end up on the native stack for a variety
of reasons, and this is mostly up to the C compiler. Whether a pointer ends up
on the native stack depends on how Ruby is built, inlining decisions, register
alloation and a bunch of other things -- this is not something Ruby's code has
direct control over.
The generated machine code might interact in a way such that certain sequence
of Ruby code overwrite heap pointers on the native stack, letting the GC
collect those objects.
To illustrate, here is a modified version of your prometheus script:
```ruby
# frozen_string_literal: true
require "prometheus/client"
require "prometheus/client/formats/text"
require "prometheus/client/data_stores/synchronized"
Prometheus::Client.config.data_store = Prometheus::Client::DataStores::Synchronized.new
def test
registry = Prometheus::Client::Registry.new
counter = registry.counter(
:counter_metric,
docstring: "a counter",
labels: %i[label1 label2],
)
labels1 = { label1: "foo", label2: "bar" }
labels2 = { label1: 1, label2: 2 }
labels3 = { label1: :a, label2: :b }
counter.increment(by: 1, labels: labels1)
counter.increment(by: 1, labels: labels2)
counter.increment(by: 1, labels: labels3)
puts Prometheus::Client::Formats::Text.marshal(registry)
ret = [labels1.object_id, labels2.object_id, labels3.object_id]
return ret
3.itself
ret
end
def find(id)
ObjectSpace.each_object(Hash).find { |h| id == h.object_id }
end
retained_ids = test
10.times do
GC.start(full_mark: true, immediate_sweep: true)
end
retained_ids.each { |id|
obj = find(id)
puts "found #{id} #{obj}" if obj
}
```
Tracing with GDB, you can find that `labels3` is retained because the compiler
chose to spill the receiver onto the native stack whenever a call without block
happens:
```
insns.def:
763 calling.block_handler = VM_BLOCK_HANDLER_NONE;
0x00005555557284ee <+174>: mov %r15,%rbx
0x00005555557284f1 <+177>: movq $0x0,0x70(%rsp)
764 vm_search_method(ci, cc, calling.recv = TOPN(calling.argc = ci->orig_argc));
0x00005555557284fa <+186>: movslq 0xc(%rcx),%rax
0x00005555557284fe <+190>: mov %eax,0x80(%rsp)
0x0000555555728505 <+197>: shl $0x3,%rax
0x0000555555728509 <+201>: sub %rax,%rdx
0x000055555572850c <+204>: mov -0x8(%rdx),%rax
0x0000555555728510 <+208>: mov %rax,0x78(%rsp) <<<< Stores pointer to Ruby object on the native stack <<<<<<<
./include/ruby/ruby.h:
2058 if (RB_IMMEDIATE_P(obj)) {
=> 0x0000555555728515 <+213>: test $0x7,%al
0x0000555555728517 <+215>: je 0x55555572b24b <vm_exec_core+11787>
2059 if (RB_FIXNUM_P(obj)) return rb_cInteger;
0x000055555572851d <+221>: mov 0x1b56a4(%rip),%r12 # 0x5555558ddbc8 <rb_cInteger>
0x0000555555728524 <+228>: test $0x1,%al
0x0000555555728526 <+230>: jne 0x555555728555 <vm_exec_core+277>
```
`labels3` happens to be the last receiver in the method so it is left on the
stack and retained. We can allow the GC to collect `labels3` by performing
another method call on some other object before the end of the `test` method.
Commenting out `return ret`, the VM spills `3` onto the same location that
`labels3` occupied, allowing the GC to collect labels3.
Again, all this is up to the exact machine code output of your compiler, so
what I'm seeing here on my setup might be completely different from yours.
This kind of object retainment can happen anywhere in the Ruby's C code and
can easily be introduced or removed by changes to the VM or the Ruby code in
question.
This kind of extraneous retainment is an inherent property of Ruby's GC.
There is no memory growth since the fixed size native stack puts a cap on
the number of objects that could be retained in this manner.
This could be the reason to the memory bloat problems in your app, but I think
other explanations are more plausible. I would suggest taking another look at
your heap profile. There should be other, more actionable instances of
object retainment in your app.
----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82348
* Author: cristiangreco (Cristian Greco)
* Status: Rejected
* Priority: Normal
* Assignee:
* Target version:
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,
I've been hitting what seems to be a memory leak.
When an hash is used as key for another hash, the former object will be retained even after multiple GC runs.
The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` will look for an object with a given `object_id` on heap).
```ruby
# frozen_string_literal: true
def create
h = {{:a => 1} => 2}
h.keys.first.object_id
end
def find(object_id)
ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end
leaked = create
10.times do
GC.start(full_mark: true, immediate_sweep: true)
end
exit find(leaked)
```
This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).
Can you please help understand what's going on here? Thanks!
--
https://bugs.ruby-lang.org/
next prev parent reply other threads:[~2019-10-26 16:31 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
2019-10-23 23:25 ` [ruby-core:95519] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash cristian
2019-10-23 23:43 ` [ruby-core:95520] " mame
2019-10-24 7:55 ` [ruby-core:95528] " cristian
2019-10-25 6:07 ` [ruby-core:95539] " ko1
2019-10-25 21:18 ` [ruby-core:95553] " cristian
2019-10-25 21:31 ` [ruby-core:95554] " merch-redmine
2019-10-25 22:53 ` [ruby-core:95555] " cristian
2019-10-26 2:21 ` [ruby-core:95557] " merch-redmine
2019-10-26 4:13 ` [ruby-core:95559] " nobu
2019-10-26 12:44 ` [ruby-core:95565] " cristian
2019-10-26 15:23 ` [ruby-core:95566] " merch-redmine
2019-10-26 16:31 ` XrXr [this message]
2019-10-27 20:41 ` [ruby-core:95571] " sacrogemini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-list from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.ruby-lang.org/en/community/mailing-lists/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=redmine.journal-82348.20191026163100.396e061f001b8218@ruby-lang.org \
--to=ruby-core@ruby-lang.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).