ruby-core@ruby-lang.org archive (unofficial mirror)
From: merch-redmine@jeremyevans.net
To: ruby-core@ruby-lang.org
Subject: [ruby-core:95566] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash
Date: Sat, 26 Oct 2019 15:23:30 +0000 (UTC)
Message-ID: <redmine.journal-82347.20191026152329.466cafc144278192@ruby-lang.org>
In-Reply-To: redmine.issue-16278.20191023232506@ruby-lang.org

Issue #16278 has been updated by jeremyevans0 (Jeremy Evans).

Status changed from Open to Rejected

cristiangreco (Cristian Greco) wrote:
> Hi Jeremy, thanks for these details!
> 
> I don’t know the details of ruby’s GC, seems to me it might behave unpredictably sometimes. I guess what confuses me now is that although that object is retained we don’t observe unbounded memory growth when calling the method in a loop: is it eventually collected? Is it somehow re-used, or what? 

It is eventually collected when Ruby's GC can no longer find a reference to it.  The reason it may be retained even though there is no direct reference to it in Ruby is that pointers to the object may still exist somewhere on the C/Ruby stack.  Ruby's GC is conservative: part of the scan for objects is just looking at stack memory, and if any value in that memory could be interpreted as a pointer to a Ruby object, that Ruby object is retained during that GC pass.

This conservative GC is needed to make C extensions work.  C local variables may hold pointers to Ruby objects, and collecting the Ruby objects while there is a C local variable pointing to them can cause undefined behavior (often a segfault).
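
A minimal sketch of how this can be observed from Ruby, assuming CRuby and the standard `weakref` library.  `make_weak` and `survived_gc?` are hypothetical helper names used only for illustration.  Because the outcome depends on what happens to be on the stack, either result is possible on a given run:

```ruby
require "weakref"

# Hypothetical helper: builds a hash inside its own method frame and
# returns only a weak reference, so no Ruby variable keeps the hash alive.
def make_weak
  WeakRef.new({ a: 1 })
end

# Hypothetical helper: runs one full GC pass and reports whether the
# weakly referenced hash survived it.
def survived_gc?(ref)
  GC.start(full_mark: true, immediate_sweep: true)
  ref.weakref_alive? ? true : false
end

ref = make_weak
alive = survived_gc?(ref)
# Either line may be printed; a stale pointer on the stack is enough
# for the conservative scan to keep the hash for this pass.
puts(alive ? "still retained after one GC pass" : "collected")
```

If it prints "still retained", that only means some stack slot still looked like a pointer to the hash during that pass, not that the hash can never be collected.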

> Another thing that intrigues me is that this behaviour is not always consistent: it might vary across different ruby builds (you mentioned that memory layout and compiler optimisations might affect the output) and is also subtly influenced by code changes. For example, if I add this line `puts ObjectSpace.count_objects[:T_HASH]` just after the call to `create` then `h4` is almost never found after GC (or its reference is found associated to another object). Not sure why this happens though!

It happens because the memory layout changed, and there are no longer locations in memory on the stack that point to the objects.

> nobu (Nobuyoshi Nakada) wrote:
> > Adding this method and calling it after `create` clears the output.
> > 
> > ```ruby
> > def garbage
> >   h1 = h2 = h3 = h4 = h5 = h6 = h7 = h8 = h9 = h10 = nil
> > end
> > ```
> > 
> > So a “shadow” seems staying on the VM stack.
> 
> Hey Nobu, what is going on here? What means that a shadow is retained on the stack?

It just means there are locations in memory on the stack that point to the Ruby objects.  When you call another method that sets a bunch of local variables, the stack locations that previously pointed to the hashes are overwritten with `nil` (`Qnil` in C, 0x4 or 0x8 depending on processor type).  Since no locations in memory point to the hashes any longer, the hashes can then be GCed.
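
As a rough sketch, this is how the `garbage` method from nobu's comment could be combined with the script from the original report.  The number of local variables needed to overwrite the stale stack slots is an assumption that depends on the Ruby build, so this is not guaranteed to clear them on every platform:

```ruby
# Same as the original report: the inner hash is only referenced
# inside this method.
def create
  h = { { a: 1 } => 2 }
  h.keys.first.object_id
end

# From nobu's comment: assigning nil to several locals reuses the stack
# slots of the previous frame and overwrites any stale pointers there.
def garbage
  h1 = h2 = h3 = h4 = h5 = h6 = h7 = h8 = h9 = h10 = nil
end

# Returns true if a live Hash with the given object_id is on the heap.
def find(object_id)
  ObjectSpace.each_object(Hash).any? { |h| h.object_id == object_id }
end

leaked = create
garbage
GC.start(full_mark: true, immediate_sweep: true)
puts(find(leaked) ? "hash still found on the heap" : "hash was collected")
```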

Hopefully that provides an adequate explanation of this GC behavior.  For more background, please review some conference presentations by @ko1: http://www.atdot.net/~ko1/activities/.  I'm going to close this now.  Please only reopen if you can show a case with unbounded memory growth.
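
One way to check for unbounded growth, sketched under the assumption that counting live `T_HASH` slots with `ObjectSpace.count_objects` is a good enough proxy on CRuby.  `live_hashes_after` is a hypothetical helper and the iteration count is arbitrary:

```ruby
# Same as the original report.
def create
  h = { { a: 1 } => 2 }
  h.keys.first.object_id
end

# Hypothetical helper: calls `create` the given number of times, runs a
# full GC, and returns the number of hash slots still in use.
def live_hashes_after(iterations)
  iterations.times { create }
  GC.start(full_mark: true, immediate_sweep: true)
  ObjectSpace.count_objects[:T_HASH]
end

baseline = live_hashes_after(0)
after    = live_hashes_after(100_000)
puts "T_HASH before: #{baseline}, after: #{after}"
```

With a real leak, the second number would grow roughly in proportion to the number of iterations.  If only a few hashes are retained conservatively, the two numbers should stay close.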

----------------------------------------
Bug #16278: Potential memory leak when an hash is used as a key for another hash
https://bugs.ruby-lang.org/issues/16278#change-82347

* Author: cristiangreco (Cristian Greco)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
* ruby -v: ruby 2.6.5p114 (2019-10-01 revision 67812) [x86_64-darwin18]
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN
----------------------------------------
Hi,

I've been hitting what seems to be a memory leak.

When a hash is used as a key for another hash, the former object is retained even after multiple GC runs.

The following code snippet demonstrates how the hash `{:a => 1}` (which is never used outside the scope of `create`) is retained even after 10 GC runs (`find` looks for an object with a given `object_id` on the heap).


```ruby
# frozen_string_literal: true

def create
  h = {{:a => 1} => 2}
  h.keys.first.object_id
end

def find(object_id)
  ObjectSpace.each_object(Hash).any?{|h| h.object_id == object_id} ? 1 : 0
end


leaked = create

10.times do
  GC.start(full_mark: true, immediate_sweep: true)
end

exit find(leaked)
```

This code snippet is expected to exit with `0` while it exits with `1` in my tests. I've tested this on multiple recent ruby versions and OSs, either locally (OSX with homebrew) or in different CIs (e.g. [here](https://github.com/cristiangreco/ruby-hash-leak/commit/285e586b7193104989f59b92579fe8f25770141e/checks?check_suite_id=278711566)).

Can you please help me understand what's going on here? Thanks!



-- 
https://bugs.ruby-lang.org/

Thread overview: 13+ messages
     [not found] <redmine.issue-16278.20191023232506@ruby-lang.org>
2019-10-23 23:25 ` [ruby-core:95519] [Ruby master Bug#16278] Potential memory leak when an hash is used as a key for another hash cristian
2019-10-23 23:43 ` [ruby-core:95520] " mame
2019-10-24  7:55 ` [ruby-core:95528] " cristian
2019-10-25  6:07 ` [ruby-core:95539] " ko1
2019-10-25 21:18 ` [ruby-core:95553] " cristian
2019-10-25 21:31 ` [ruby-core:95554] " merch-redmine
2019-10-25 22:53 ` [ruby-core:95555] " cristian
2019-10-26  2:21 ` [ruby-core:95557] " merch-redmine
2019-10-26  4:13 ` [ruby-core:95559] " nobu
2019-10-26 12:44 ` [ruby-core:95565] " cristian
2019-10-26 15:23 ` merch-redmine [this message]
2019-10-26 16:31 ` [ruby-core:95567] " XrXr
2019-10-27 20:41 ` [ruby-core:95571] " sacrogemini
