ruby-core@ruby-lang.org archive (unofficial mirror)
From: matz@ruby.or.jp
To: ruby-core@ruby-lang.org
Subject: [ruby-core:100557] [Ruby master Feature#17176] GC.auto_compact / GC.auto_compact=(flag)
Date: Mon, 26 Oct 2020 06:40:47 +0000 (UTC)
Message-ID: <redmine.journal-88186.20201026064047.73@ruby-lang.org>
In-Reply-To: redmine.issue-17176.20200916183914.73@ruby-lang.org

Issue #17176 has been updated by matz (Yukihiro Matsumoto).


Accepted as long as it's turned off by default.

Matz.


----------------------------------------
Feature #17176: GC.auto_compact / GC.auto_compact=(flag)
https://bugs.ruby-lang.org/issues/17176#change-88186

* Author: tenderlovemaking (Aaron Patterson)
* Status: Open
* Priority: Normal
----------------------------------------
Hi,

I'd like to make compaction automatic eventually.  As a first step, I would like to introduce two functions:

* GC.enable_autocompact
* GC.disable_autocompact

One function enables auto compaction, the other one disables it.  Automatic compaction is *disabled* by default.  When it is enabled, compaction runs only during major GCs, once per major GC.
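
For readers trying this on a released Ruby: the subject line of this thread uses the flag-style names `GC.auto_compact` / `GC.auto_compact=(flag)` rather than the two functions listed above. The following is only a sketch under the assumption that those flag-style names are the ones available; `try_auto_compact` is a hypothetical helper name, and the fallback covers builds where the setter is missing or not implemented.

```ruby
# Hypothetical helper: turn auto compaction on for one full GC, then
# restore the previous setting. Assumes the flag-style accessor names
# from the thread subject; returns :unavailable where they are missing.
def try_auto_compact
  return :unavailable unless GC.respond_to?(:auto_compact=)

  previous = GC.auto_compact   # expected to be false unless changed earlier
  GC.auto_compact = true
  GC.start                     # requests a full (major) GC
  GC.auto_compact = previous
  :ran
rescue NotImplementedError
  # Some builds define the method but cannot compact on this platform.
  :unavailable
end

p try_auto_compact
```

Restoring the previous value keeps the "off by default" behaviour intact for the rest of the process.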

I've made a pull request here: https://github.com/ruby/ruby/pull/3547

This patch makes _object movement_ happen at the same time as page sweep.  When one page finishes sweeping, that page is filled.

## Sweep + Move Phase

During sweep, we keep a pointer to the current sweeping page.  This pointer is kept in [`heap->sweeping_page`](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4817).  At the beginning of sweep, this is the *first* element of the heap's linked list.

At the same time, the compaction process points at the *last* page in the heap, and that is stored in `heap->compact_cursor` [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L5023).

Incremental sweeping sweeps one page at a time in the [`gc_page_sweep` function](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4624).  At the end of that function, we call [`gc_fill_swept_page`](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4738-L4742).  `gc_fill_swept_page` fills the page that was just swept and moves the movement cursor towards the sweeping cursor.

When the sweeping cursor and the movement cursor meet, sweeping is paused, and references are updated.  This can happen in two ways: either the sweeping cursor runs into the moving cursor, which is handled [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4634-L4644), or the moving cursor runs into the sweeping cursor, which is handled [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4425-L4430).

Either way, the sweep step is paused and references are updated.
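
The two-cursor scheme described above can be illustrated with a toy model in plain Ruby. This is not the code from `gc.c`: pages are modelled as arrays, `nil` stands for a free slot, and the method name `compact` and the `moved` table are made up for the illustration. The `sweep` index plays the role of `heap->sweeping_page` and `cursor` plays the role of `heap->compact_cursor`.

```ruby
# Toy model: fill free slots on the page being swept with live objects
# taken from the page under the compact cursor, until the cursors meet.
def compact(pages)
  sweep  = 0                 # starts at the first page
  cursor = pages.size - 1    # starts at the last page
  moved  = {}                # old [page, slot] => new [page, slot]

  while sweep < cursor
    pages[sweep].each_index do |slot|
      next unless pages[sweep][slot].nil?   # occupied slot, nothing to fill

      # Look for a live object under the compact cursor; step the cursor
      # toward the sweep cursor whenever its page has nothing left.
      src = nil
      while sweep < cursor
        src = pages[cursor].rindex { |obj| !obj.nil? }
        break if src
        cursor -= 1
      end
      break if src.nil?                     # the cursors met

      pages[sweep][slot] = pages[cursor][src]
      pages[cursor][src] = nil
      moved[[cursor, src]] = [sweep, slot]
    end
    sweep += 1
  end

  moved
end

pages = [[:a, nil, :b], [nil, nil, nil], [:c, nil, :d]]
moved = compact(pages)
p pages
p moved
```

The returned table records where each object went, which is the information a later reference-updating step would need.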

## Reference Updating

Reference updating hasn't changed, but since reference updating now happens before the GC finishes a cycle, it must take into account garbage objects [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L8971-L8977).
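
As a rough illustration of why garbage matters at this point: when references are rewritten mid-cycle, some objects in the heap are dead but not yet swept. The sketch below assumes that such objects are simply skipped. The linked lines are not reproduced in this message, so that handling is an assumption, and the hash layout, the `:garbage` flag, and the `update_references` name are all invented for the example.

```ruby
# Toy model: objects are hashes with a list of referenced ids.
# forwarding maps an old id to the id the object moved to.
def update_references(objects, forwarding)
  objects.each_value do |obj|
    next if obj[:garbage]   # dead but unswept: leave its fields alone
    obj[:refs] = obj[:refs].map { |ref| forwarding.fetch(ref, ref) }
  end
  objects
end

objects = {
  1 => { refs: [2, 3], garbage: false },
  2 => { refs: [3],    garbage: true  }   # will be freed when sweeping resumes
}
p update_references(objects, { 3 => 7 })
```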

## Read Barrier

During the sweep phase, some objects may touch other objects.  For example, `T_CLASS` [must remove itself from a parent class](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L2769-L2770).

```ruby
class A; end
class B < A; end

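# Drop the constant's reference to class B so it can be collected.
# (const_set needs an explicit receiver at the top level.)
Object.const_set(:B, nil)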
```

When `B` is freed, it must remove itself from `A`'s subclasses.  But what if `A` moved?  To fix this, I've introduced a read barrier.  The read barrier protects `heap_page_body` using `mprotect`.  If something tries to read from the page, an exception will occur and we can move all objects back to the page (invalidate the movement).

The lock function is [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4321-L4335).
The unlock function is [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4337-L4351).

It uses `sigaction` to catch the exception [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4514-L4530).
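
The barrier itself relies on `mprotect` and a signal handler, which cannot be shown in plain Ruby, but the "invalidate the movement" idea can be modelled. In this sketch a set of locked page indexes stands in for protected memory, and a check inside `read` stands in for the fault handler. `ToyHeap` and all of its methods are hypothetical names for illustration only.

```ruby
require "set"

# Toy model of the read barrier: pages that objects were moved out of are
# "locked"; reading from a locked page undoes every move out of that page.
class ToyHeap
  attr_reader :pages, :faults

  def initialize(pages)
    @pages  = pages      # array of pages, nil marks a free slot
    @moves  = []         # [[from_page, from_slot], [to_page, to_slot]]
    @locked = Set.new    # indexes of "protected" pages
    @faults = 0
  end

  def move(from, to)
    fp, fs = from
    tp, ts = to
    @pages[tp][ts] = @pages[fp][fs]
    @pages[fp][fs] = nil
    @moves << [from, to]
    @locked << fp
  end

  def read(page, slot)
    invalidate(page) if @locked.include?(page)   # stands in for the fault
    @pages[page][slot]
  end

  private

  # Move every object that left this page back to where it came from.
  def invalidate(page)
    @faults += 1
    undone, @moves = @moves.partition { |from, _| from[0] == page }
    undone.each do |(fp, fs), (tp, ts)|
      @pages[fp][fs] = @pages[tp][ts]
      @pages[tp][ts] = nil
    end
    @locked.delete(page)
  end
end

heap = ToyHeap.new([[nil, nil], [:a, :b]])
heap.move([1, 0], [0, 0])
heap.move([1, 1], [0, 1])
p heap.read(1, 0)   # touching the source page undoes both moves
p heap.faults
```

Counting faults here mirrors the purpose of the `read_barrier_faults` statistic mentioned later in this message.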

## Cross Platform

`mprotect` and `sigaction` are not cross platform; they don't work on Windows.  On Windows the read barrier uses the exception handlers that are built into Windows.  I implemented them [here](https://github.com/ruby/ruby/blob/8a4d8fa0ea463d44486bf2447ea9830593768fd7/gc.c#L4471-L4503).

The read barrier seems to work on all platforms we're testing.

## Statistics

`GC.stat(:compact_count)` contains the number of times compaction has happened, so we can write things like this:

```ruby
GC.enable_autocompact

cc = GC.stat(:compact_count)
list = []
loop do
  500.times { list << Object.new }
  break if cc < GC.stat(:compact_count)
end

p GC.stat(:compact_count)
```

We can check how often the read barrier is triggered with `GC.stat(:read_barrier_faults)`.

I've also added `GC.latest_compact_info` so you can see what types of objects moved and how many.  For example:

```
[aaron@tc-lan-adapter ~/g/ruby (autocompact)]$ cat test.rb
list = []
500.times {
  list << Object.new
  Object.new
  Object.new
}

GC.enable_autocompact
count = GC.stat :compact_count
loop do
  list << Object.new
  break if GC.stat(:compact_count) > count
end

p GC.latest_compact_info
[aaron@tc-lan-adapter ~/g/ruby (autocompact)]$ make runruby
./miniruby -I./lib -I. -I.ext/common  ./tool/runruby.rb --extout=.ext  -- --disable-gems ./test.rb
{:considered=>{:T_OBJECT=>408}, :moved=>{:T_OBJECT=>408}}
[aaron@tc-lan-adapter ~/g/ruby (autocompact)]$
```

## Recap

New methods:

* GC.enable_autocompact
* GC.disable_autocompact
* GC.latest_compact_info

New statistics in `GC.stat`:

* GC.stat(:read_barrier_faults)

Diff is here: https://github.com/ruby/ruby/pull/3547



-- 
https://bugs.ruby-lang.org/
