ruby-core@ruby-lang.org archive (unofficial mirror)
 help / color / mirror / Atom feed
* [ruby-core:98935] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS
@ 2020-06-25  1:03 XrXr
  2020-06-27 19:12 ` [ruby-core:98988] " ko1
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: XrXr @ 2020-06-25  1:03 UTC (permalink / raw)
  To: ruby-core

Issue #16984 has been reported by alanwu (Alan Wu).

----------------------------------------
Feature #16984: Remove write barrier examption for T_ICLASS
https://bugs.ruby-lang.org/issues/16984

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
----------------------------------------
Consider the following code:

```ruby
module M
  def foo; end
  def bar; end
end

class C
  include M
end
```

The object reference graph from running the code looks like this:

```
+---+              +-----+
| M |--------------| foo |-+
+---+              +-----+ |
  |                +-----+ |
  +----------------| bar | |
                   +-----+ |
+-----------+         |    |
| iclass(M) |---------+    |
+-----------+--------------+
```

Applying the proposed patch, the graph becomes

```
+---+         +--------------+   +-----+
| M |---------| method table |---| foo |
+---+         +--------------+   +-----+
+-----------+         |    |     +-----+
| iclass(M) |---------+    +-----| bar |
+-----------+                    +-----+

```

This change has a similar effect on the constant table. In addition to this, T_ICLASS no longer
holds a reference to a ivar table. Code that access the ivar table through iclasses
are changed to access it through the object from which the iclass was made. This change
impacts autoload and class variable lookup.

## Why?

The main goal of this change is to make iclasses and modules write barrier protected. At the moment, they are
"shady", which means the GC has to do extra work to handle them. In code bases that use modules a lot,
iclasses can easily take up a significant portion of the heap and impact GC time.

In the old setup, because of the way `M` and `iclass(M)` share the method table, adding a single method
to `M` would create multiple edges on the object reference graph. To safely make `M` and `iclass(M)`
write barrier protected, one would need to trigger a write barrier for each new edge. This would
make the amount of work it takes to add a method a function of the number of times the target module is
included.

The new setup also factors the edges in the graph. If the number of methods in a module is `M` and the number
of times the module is prepended or included is `N`, the old setup had `M * (N+1)` edges. The new setup has
`M + N + 1` edges instead. For large enough `M` and `N`, the new setup produces fewer edges. Having fewer
edges is better since the GC's work is proportional to the number of edges.

## Impact to GC time

I measured the impact to minor GC time with the following steps:
 - load an application
 - run `GC::Profile.enable`
 - allocate 50 million objects
 - run `GC::Profile.report`

Here is the impact to average minor GC time on various apps:

|Application             |     Before    |  After  | Speedup ratio |
|------------------------|---------------|---------|---------------|
|CRuby's test-all suite  |  2.438ms      | 2.289ms |   1.06        |
|`rails new` app         |  1.911ms      | 1.798ms |   1.06        |
|Private app A           |  5.182ms      | 5.168ms |   1.00        |
|Private app B           |  185.7ms      | 107.9ms |   1.72        |

Private app A's heap size is about 22 MiB compared to B's 250 MiB.
App B boots up about 15% faster with this change.

## Impact to class variable lookup

I included a benchmark in the patch to measure the impact to class variable lookup performance.
The difference seems negligible.

## Conclusion

This change seems to reduce minor GC time for real-world applications.


---

Code: https://github.com/ruby/ruby/pull/3238
Credits to @tenderlovemaking for coming up with the idea for this change.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:98988] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS
  2020-06-25  1:03 [ruby-core:98935] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS XrXr
@ 2020-06-27 19:12 ` ko1
  2020-06-30  7:13 ` [ruby-core:98996] " XrXr
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: ko1 @ 2020-06-27 19:12 UTC (permalink / raw)
  To: ruby-core

Issue #16984 has been updated by ko1 (Koichi Sasada).


Thank you for great work. This kind of hack can cause BUGs easily.

> Private app A's heap size is about 22 MiB compared to B's 250 MiB.

Could you measure the memory/objects consumption before and after this patch if it is not difficult?
Maybe no problem, but I want to confirm.

----------------------------------------
Feature #16984: Remove write barrier examption for T_ICLASS
https://bugs.ruby-lang.org/issues/16984#change-86368

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
----------------------------------------
Consider the following code:

```ruby
module M
  def foo; end
  def bar; end
end

class C
  include M
end
```

The object reference graph from running the code looks like this:

```
+---+              +-----+
| M |--------------| foo |-+
+---+              +-----+ |
  |                +-----+ |
  +----------------| bar | |
                   +-----+ |
+-----------+         |    |
| iclass(M) |---------+    |
+-----------+--------------+
```

Applying the proposed patch, the graph becomes

```
+---+         +--------------+   +-----+
| M |---------| method table |---| foo |
+---+         +--------------+   +-----+
+-----------+         |    |     +-----+
| iclass(M) |---------+    +-----| bar |
+-----------+                    +-----+

```

This change has a similar effect on the constant table. In addition to this, T_ICLASS no longer
holds a reference to a ivar table. Code that access the ivar table through iclasses
are changed to access it through the object from which the iclass was made. This change
impacts autoload and class variable lookup.

## Why?

The main goal of this change is to make iclasses and modules write barrier protected. At the moment, they are
"shady", which means the GC has to do extra work to handle them. In code bases that use modules a lot,
iclasses can easily take up a significant portion of the heap and impact GC time.

In the old setup, because of the way `M` and `iclass(M)` share the method table, adding a single method
to `M` would create multiple edges on the object reference graph. To safely make `M` and `iclass(M)`
write barrier protected, one would need to trigger a write barrier for each new edge. This would
make the amount of work it takes to add a method a function of the number of times the target module is
included.

The new setup also factors the edges in the graph. If the number of methods in a module is `M` and the number
of times the module is prepended or included is `N`, the old setup had `M * (N+1)` edges. The new setup has
`M + N + 1` edges instead. For large enough `M` and `N`, the new setup produces fewer edges. Having fewer
edges is better since the GC's work is proportional to the number of edges.

## Impact to GC time

I measured the impact to minor GC time with the following steps:
 - load an application
 - run `GC::Profile.enable`
 - allocate 50 million objects
 - run `GC::Profile.report`

Here is the impact to average minor GC time on various apps:

|Application             |     Before    |  After  | Speedup ratio |
|------------------------|---------------|---------|---------------|
|CRuby's test-all suite  |  2.438ms      | 2.289ms |   1.06        |
|`rails new` app         |  1.911ms      | 1.798ms |   1.06        |
|Private app A           |  5.182ms      | 5.168ms |   1.00        |
|Private app B           |  185.7ms      | 107.9ms |   1.72        |

Private app A's heap size is about 22 MiB compared to B's 250 MiB.
App B boots up about 15% faster with this change.

## Impact to class variable lookup

I included a benchmark in the patch to measure the impact to class variable lookup performance.
The difference seems negligible.

## Conclusion

This change seems to reduce minor GC time for real-world applications.


---

Code: https://github.com/ruby/ruby/pull/3238
Credits to @tenderlovemaking for coming up with the idea for this change.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:98996] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS
  2020-06-25  1:03 [ruby-core:98935] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS XrXr
  2020-06-27 19:12 ` [ruby-core:98988] " ko1
@ 2020-06-30  7:13 ` XrXr
  2020-07-13 21:51 ` [ruby-core:99159] " XrXr
  2020-08-12  6:28 ` [ruby-core:99563] " ko1
  3 siblings, 0 replies; 5+ messages in thread
From: XrXr @ 2020-06-30  7:13 UTC (permalink / raw)
  To: ruby-core

Issue #16984 has been updated by alanwu (Alan Wu).


> Could you measure the memory/objects consumption before and after this patch if it is not difficult?

I took measurements on app B. It's a large Rails app with lots of classes and modules.
The amount of retained memory is not deterministic unfortunately, so I can only give a rough summary.
                           
                          | Change to median   | Change to average   |
--------------------------|--------------------|---------------------|
GC::Profiler "Total Size" |      7 MiB         |         1 MiB       |
VmRSS from the /proc      |      4 MiB         |        -5 MiB       |

The increase to GC heap size makes sense, I expect about 5 MiB more objects given the number of
classes and modules in the app. There is not much change to RSS I guess because the
patch moves what used to be on the malloc heap to the GC heap.

----------------------------------------
Feature #16984: Remove write barrier examption for T_ICLASS
https://bugs.ruby-lang.org/issues/16984#change-86374

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
----------------------------------------
Consider the following code:

```ruby
module M
  def foo; end
  def bar; end
end

class C
  include M
end
```

The object reference graph from running the code looks like this:

```
+---+              +-----+
| M |--------------| foo |-+
+---+              +-----+ |
  |                +-----+ |
  +----------------| bar | |
                   +-----+ |
+-----------+         |    |
| iclass(M) |---------+    |
+-----------+--------------+
```

Applying the proposed patch, the graph becomes

```
+---+         +--------------+   +-----+
| M |---------| method table |---| foo |
+---+         +--------------+   +-----+
+-----------+         |    |     +-----+
| iclass(M) |---------+    +-----| bar |
+-----------+                    +-----+

```

This change has a similar effect on the constant table. In addition to this, T_ICLASS no longer
holds a reference to a ivar table. Code that access the ivar table through iclasses
are changed to access it through the object from which the iclass was made. This change
impacts autoload and class variable lookup.

## Why?

The main goal of this change is to make iclasses and modules write barrier protected. At the moment, they are
"shady", which means the GC has to do extra work to handle them. In code bases that use modules a lot,
iclasses can easily take up a significant portion of the heap and impact GC time.

In the old setup, because of the way `M` and `iclass(M)` share the method table, adding a single method
to `M` would create multiple edges on the object reference graph. To safely make `M` and `iclass(M)`
write barrier protected, one would need to trigger a write barrier for each new edge. This would
make the amount of work it takes to add a method a function of the number of times the target module is
included.

The new setup also factors the edges in the graph. If the number of methods in a module is `M` and the number
of times the module is prepended or included is `N`, the old setup had `M * (N+1)` edges. The new setup has
`M + N + 1` edges instead. For large enough `M` and `N`, the new setup produces fewer edges. Having fewer
edges is better since the GC's work is proportional to the number of edges.

## Impact to GC time

I measured the impact to minor GC time with the following steps:
 - load an application
 - run `GC::Profile.enable`
 - allocate 50 million objects
 - run `GC::Profile.report`

Here is the impact to average minor GC time on various apps:

|Application             |     Before    |  After  | Speedup ratio |
|------------------------|---------------|---------|---------------|
|CRuby's test-all suite  |  2.438ms      | 2.289ms |   1.06        |
|`rails new` app         |  1.911ms      | 1.798ms |   1.06        |
|Private app A           |  5.182ms      | 5.168ms |   1.00        |
|Private app B           |  185.7ms      | 107.9ms |   1.72        |

Private app A's heap size is about 22 MiB compared to B's 250 MiB.
App B boots up about 15% faster with this change.

## Impact to class variable lookup

I included a benchmark in the patch to measure the impact to class variable lookup performance.
The difference seems negligible.

## Conclusion

This change seems to reduce minor GC time for real-world applications.


---

Code: https://github.com/ruby/ruby/pull/3238
Credits to @tenderlovemaking for coming up with the idea for this change.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:99159] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS
  2020-06-25  1:03 [ruby-core:98935] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS XrXr
  2020-06-27 19:12 ` [ruby-core:98988] " ko1
  2020-06-30  7:13 ` [ruby-core:98996] " XrXr
@ 2020-07-13 21:51 ` XrXr
  2020-08-12  6:28 ` [ruby-core:99563] " ko1
  3 siblings, 0 replies; 5+ messages in thread
From: XrXr @ 2020-07-13 21:51 UTC (permalink / raw)
  To: ruby-core

Issue #16984 has been updated by alanwu (Alan Wu).

Description updated

Edit: I noticed that T_ICLASS wasn't marking the shared method and constant table on master. My notes
about reducing the number of `gc_mark` references on the heap was incorrect.



----------------------------------------
Feature #16984: Remove write barrier examption for T_ICLASS
https://bugs.ruby-lang.org/issues/16984#change-86537

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
----------------------------------------
Consider the following code:

```ruby
module M
  def foo; end
  def bar; end
end

class C
  include M
end
```

The object reference graph from running the code looks like this:

```
+---+              +-----+
| M |--------------| foo |-+
+---+              +-----+ |
  |                +-----+ |
  +----------------| bar | |
                   +-----+ |
+-----------+         |    |
| iclass(M) |---------+    |
+-----------+--------------+
```

Applying the proposed patch, the graph becomes

```
+---+         +--------------+   +-----+
| M |---------| method table |---| foo |
+---+         +--------------+   +-----+
+-----------+         |    |     +-----+
| iclass(M) |---------+    +-----| bar |
+-----------+                    +-----+

```

This change has a similar effect on the constant table. In addition to this, T_ICLASS no longer
holds a reference to a ivar table. Code that access the ivar table through iclasses
are changed to access it through the object from which the iclass was made. This change
impacts autoload and class variable lookup.

## Why?

The main goal of this change is to make iclasses and modules write barrier protected. At the moment, they are
"shady", which means the GC has to do extra work to handle them. In code bases that use modules a lot,
iclasses can easily take up a significant portion of the heap and impact GC time. Inserting write barriers was
tricky in the old setup, because of the way `M` and `iclass(M)` share the method table.

Having write barrier for iclasses mean they can age in the generational GC.
Once aged, the GC can sometimes skip subgraphs rooted at these objects, improving performance.

## Impact to GC time

I measured the impact to minor GC time with the following steps:
 - load an application
 - run `GC::Profile.enable`
 - allocate 50 million objects
 - run `GC::Profile.report`

Here is the impact to average minor GC time on various apps:

|Application             |     Before    |  After  | Speedup ratio |
|------------------------|---------------|---------|---------------|
|CRuby's test-all suite  |  2.438ms      | 2.289ms |   1.06        |
|`rails new` app         |  1.911ms      | 1.798ms |   1.06        |
|Private app A           |  5.182ms      | 5.168ms |   1.00        |
|Private app B           |  185.7ms      | 107.9ms |   1.72        |

Private app A's heap size is about 22 MiB compared to B's 250 MiB.
App B boots up about 15% faster with this change.

## Impact to class variable lookup

I included a benchmark in the patch to measure the impact to class variable lookup performance.
The difference seems negligible.

## Conclusion

This change seems to reduce minor GC time for real-world applications.


---

Code: https://github.com/ruby/ruby/pull/3238
Credits to @tenderlovemaking for coming up with the idea for this change.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [ruby-core:99563] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS
  2020-06-25  1:03 [ruby-core:98935] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS XrXr
                   ` (2 preceding siblings ...)
  2020-07-13 21:51 ` [ruby-core:99159] " XrXr
@ 2020-08-12  6:28 ` ko1
  3 siblings, 0 replies; 5+ messages in thread
From: ko1 @ 2020-08-12  6:28 UTC (permalink / raw)
  To: ruby-core

Issue #16984 has been updated by ko1 (Koichi Sasada).


sorry I didn't check it.

Thank you, ~10 MB in "B's 250 MiB." is not problem I think.
Could you merge it if you don't have any trouble more?


----------------------------------------
Feature #16984: Remove write barrier examption for T_ICLASS
https://bugs.ruby-lang.org/issues/16984#change-87029

* Author: alanwu (Alan Wu)
* Status: Open
* Priority: Normal
----------------------------------------
Consider the following code:

```ruby
module M
  def foo; end
  def bar; end
end

class C
  include M
end
```

The object reference graph from running the code looks like this:

```
+---+              +-----+
| M |--------------| foo |-+
+---+              +-----+ |
  |                +-----+ |
  +----------------| bar | |
                   +-----+ |
+-----------+         |    |
| iclass(M) |---------+    |
+-----------+--------------+
```

Applying the proposed patch, the graph becomes

```
+---+         +--------------+   +-----+
| M |---------| method table |---| foo |
+---+         +--------------+   +-----+
+-----------+         |    |     +-----+
| iclass(M) |---------+    +-----| bar |
+-----------+                    +-----+

```

This change has a similar effect on the constant table. In addition to this, T_ICLASS no longer
holds a reference to a ivar table. Code that access the ivar table through iclasses
are changed to access it through the object from which the iclass was made. This change
impacts autoload and class variable lookup.

## Why?

The main goal of this change is to make iclasses and modules write barrier protected. At the moment, they are
"shady", which means the GC has to do extra work to handle them. In code bases that use modules a lot,
iclasses can easily take up a significant portion of the heap and impact GC time. Inserting write barriers was
tricky in the old setup, because of the way `M` and `iclass(M)` share the method table.

Having write barrier for iclasses mean they can age in the generational GC.
Once aged, the GC can sometimes skip subgraphs rooted at these objects, improving performance.

## Impact to GC time

I measured the impact to minor GC time with the following steps:
 - load an application
 - run `GC::Profile.enable`
 - allocate 50 million objects
 - run `GC::Profile.report`

Here is the impact to average minor GC time on various apps:

|Application             |     Before    |  After  | Speedup ratio |
|------------------------|---------------|---------|---------------|
|CRuby's test-all suite  |  2.438ms      | 2.289ms |   1.06        |
|`rails new` app         |  1.911ms      | 1.798ms |   1.06        |
|Private app A           |  5.182ms      | 5.168ms |   1.00        |
|Private app B           |  185.7ms      | 107.9ms |   1.72        |

Private app A's heap size is about 22 MiB compared to B's 250 MiB.
App B boots up about 15% faster with this change.

## Impact to class variable lookup

I included a benchmark in the patch to measure the impact to class variable lookup performance.
The difference seems negligible.

## Conclusion

This change seems to reduce minor GC time for real-world applications.


---

Code: https://github.com/ruby/ruby/pull/3238
Credits to @tenderlovemaking for coming up with the idea for this change.




-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-08-12  6:28 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-25  1:03 [ruby-core:98935] [Ruby master Feature#16984] Remove write barrier examption for T_ICLASS XrXr
2020-06-27 19:12 ` [ruby-core:98988] " ko1
2020-06-30  7:13 ` [ruby-core:98996] " XrXr
2020-07-13 21:51 ` [ruby-core:99159] " XrXr
2020-08-12  6:28 ` [ruby-core:99563] " ko1

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).