* [ruby-core:88887] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
From: s.wanabe @ 2018-09-06 23:25 UTC
To: ruby-core
Issue #15085 has been reported by wanabe (_ wanabe).
----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085
* Author: wanabe (_ wanabe)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
MJIT ordinarily makes Ruby methods faster, but I have observed some exceptional cases.
I guess one of them is caused by the `invokesuper` instruction.
I also guess it is related to CPU memory caching, especially the iTLB.
The attached "export-big-func.patch" makes the MJIT binary code for `invokesuper` smaller.
"super.rb" is a benchmark script for benchmark_driver.
"benchmark.log" is the result of running super.rb.
"benchmark-with-perf.log" is another result, taken with the `PERF_STAT` environment variable set.
The results are from my environment only and depend largely on machine specs.
`invokesuper` can get faster with an exported `vm_search_super_method()`, but I think that is not enough,
because `perf stat` shows that there are still many iTLB-load-misses.
I believe MJIT can become faster with careful attention to CPU memory caches, not only the iTLB but also L1, L2, and so on.
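For readers without the attachments, the call path under discussion can be sketched roughly as follows. This is not the attached super.rb: the class names `Base` and `Derived` and the helper `measure_ips` are hypothetical, and the timing uses a plain monotonic clock instead of benchmark_driver. It only illustrates a method whose body is a bare `super`, which the VM executes through `invokesuper`.

```ruby
# Hypothetical classes; the attached super.rb is not reproduced in this thread.
class Base
  def foo
    1
  end
end

class Derived < Base
  def foo
    super # executed through the `invokesuper` instruction
  end
end

# Call a.foo `iterations` times and return the rate in iterations per second.
def measure_ips(iterations)
  a = Derived.new
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { a.foo }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  iterations / elapsed
end

if $PROGRAM_NAME == __FILE__
  puts format("a.foo %.3fk i/s", measure_ips(1_000_000) / 1000.0)
end
```

Running a script like this once with plain `ruby` and once with `ruby --jit` gives the kind of VM versus MJIT comparison described here, although absolute numbers will differ by machine.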
---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)
--
https://bugs.ruby-lang.org/
* [ruby-core:88896] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
From: takashikkbn @ 2018-09-08 6:50 UTC
To: ruby-core
Issue #15085 has been updated by k0kubun (Takashi Kokubun).
As far as I can see from the benchmark result for the improved case, it looks good. But I would at least like to see micro benchmarks for `opt_send_without_block` and `send`. Because of `_mjit_compile_send`, they may not be affected much, though. Also, what were the results for larger benchmarks (optcarrot, discourse, ...)?
> And I guess it is related to memory caching, especially iTLB.
> invokesuper can get faster with exported vm_search_super_method(), but I think it is not enough.
My assumption in exporting only `rb_vm_search_method_slowpath` was that we should inline as much as possible to exploit compiler optimizations, but compiling (the `rb_vm_search_method_slowpath` part of) `vm_search_method` was too slow to compile many methods within the default Optcarrot measurement period. I did not consider the CPU cache when deciding not to compile it, and I assume we should inline everything if compilation finished in 0 seconds.
Why do you think not inlining `vm_search_method` is friendlier to the iTLB? Is the generated code for `vm_search_method` too big, or is loading instructions from `vm_search_method` efficient when its code is shared with the VM?
----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-73937
* Author: wanabe (_ wanabe)
* Status: Open
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
* [ruby-core:88965] [Ruby trunk Feature#15085][Rejected] Decrease memory cache usage of MJIT
From: s.wanabe @ 2018-09-13 0:10 UTC
To: ruby-core
Issue #15085 has been updated by wanabe (_ wanabe).
File export-vm_call_super_method.patch added
Status changed from Open to Rejected
I apologize in advance: I have decided to withdraw this ticket and its patch.
I tried to figure out what is going on and explain it, but ended up getting nowhere.
I also tried to explain how I had arrived at `vm_search_super_method` and the iTLB,
but I had forgotten to take notes and I can't remember now.
To make matters worse, my assumption "Big function makes iTLB-load-count bad" is totally wrong.
For example, the attached "export-vm_call_super_method.patch" shows almost the same result in my environment,
even though `vm_call_super_method` is a very small function.
(Note that the JIT compile time is the same as on trunk.)
```
Warming up --------------------------------------
               a.foo  344.327k i/s
Calculating -------------------------------------
                         trunk  trunk,--jit  export-big-func  export-big-func,--jit  export-vm_call_super_method  export-vm_call_super_method,--jit
               a.foo  353.285k     224.659k         340.091k               368.875k                     343.167k                           386.849k i/s - 1.033M times in 2.923926s 4.597989s 3.037365s 2.800350s 3.010142s 2.670239s
Comparison:
                            a.foo
export-vm_call_super_method,--jit: 386849.3 i/s
            export-big-func,--jit: 368875.3 i/s - 1.05x slower
                            trunk: 353285.2 i/s - 1.10x slower
      export-vm_call_super_method: 343166.5 i/s - 1.13x slower
                  export-big-func: 340090.8 i/s - 1.14x slower
                      trunk,--jit: 224659.1 i/s - 1.72x slower
```
So I am giving up on this ticket, at least until I can explain the behavior.
I'm sorry for the confusion.
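
The "x slower" figures in the Comparison output above can be rechecked from the i/s values alone. The sketch below copies those values from the output; the constant `RESULTS` and the helpers `slowdowns` and `jit_ratio` are hypothetical names introduced only for this illustration. It also derives the `--jit` to VM ratio per build: about 0.64 on trunk, against about 1.08 and 1.13 with the two patches, which is consistent with the remark that both patches give almost the same result.

```ruby
# Iterations per second copied from the "Comparison" output above.
RESULTS = {
  "export-vm_call_super_method,--jit" => 386_849.3,
  "export-big-func,--jit"             => 368_875.3,
  "trunk"                             => 353_285.2,
  "export-vm_call_super_method"       => 343_166.5,
  "export-big-func"                   => 340_090.8,
  "trunk,--jit"                       => 224_659.1,
}.freeze

# How many times slower each entry is than the fastest entry.
def slowdowns(results)
  fastest = results.values.max
  results.transform_values { |ips| (fastest / ips).round(2) }
end

# Throughput of a build with --jit relative to the same build without it.
def jit_ratio(results, build)
  (results.fetch("#{build},--jit") / results.fetch(build)).round(2)
end

if $PROGRAM_NAME == __FILE__
  slowdowns(RESULTS).each { |name, x| puts format("%-34s %.2fx", name, x) }
  %w[trunk export-big-func export-vm_call_super_method].each do |build|
    puts format("%-34s --jit/VM = %.2f", build, jit_ratio(RESULTS, build))
  end
end
```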
----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-73997
* Author: wanabe (_ wanabe)
* Status: Rejected
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)
export-vm_call_super_method.patch (2.66 KB)
* [ruby-core:88966] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
From: takashikkbn @ 2018-09-13 1:34 UTC
To: ruby-core
Issue #15085 has been updated by k0kubun (Takashi Kokubun).
I see. Thank you for the experiment and for taking the time to investigate.
----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-73998
* Author: wanabe (_ wanabe)
* Status: Rejected
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
* [ruby-core:89761] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
From: s.wanabe @ 2018-11-09 0:17 UTC
To: ruby-core
Issue #15085 has been updated by wanabe (_ wanabe).
File benchmark-with-perf-on-preview3.log added
The issue is almost gone on v2_6_0_preview3.
`invokesuper` with MJIT runs about as fast as on the normal VM.
The attached "benchmark-with-perf-on-preview3.log" is the benchmark result.
----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-74815
* Author: wanabe (_ wanabe)
* Status: Rejected
* Priority: Normal
* Assignee:
* Target version:
----------------------------------------
---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)
export-vm_call_super_method.patch (2.66 KB)
benchmark-with-perf-on-preview3.log (9.99 KB)
Thread overview: 5+ messages
[not found] <redmine.issue-15085.20180906232533@ruby-lang.org>
2018-09-06 23:25 ` [ruby-core:88887] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT s.wanabe
2018-09-08 6:50 ` [ruby-core:88896] " takashikkbn
2018-09-13 0:10 ` [ruby-core:88965] [Ruby trunk Feature#15085][Rejected] " s.wanabe
2018-09-13 1:34 ` [ruby-core:88966] [Ruby trunk Feature#15085] " takashikkbn
2018-11-09 0:17 ` [ruby-core:89761] " s.wanabe