ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:88887] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
       [not found] <redmine.issue-15085.20180906232533@ruby-lang.org>
@ 2018-09-06 23:25 ` s.wanabe
  2018-09-08  6:50 ` [ruby-core:88896] " takashikkbn
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: s.wanabe @ 2018-09-06 23:25 UTC
  To: ruby-core

Issue #15085 has been reported by wanabe (_ wanabe).

----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085

* Author: wanabe (_ wanabe)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
MJIT ordinarily makes Ruby methods faster, but I have observed some exceptional cases.
I guess one of them is caused by the `invokesuper` instruction.
And I guess it is related to memory caching, especially iTLB.

The attached "export-big-func.patch" makes the binary code that MJIT generates for `invokesuper` smaller.
"super.rb" is a benchmark script with benchmark_driver.
"benchmark.log" is a result of super.rb.
"benchmark-with-perf.log" is another result with `PERF_STAT` environment variable.
The results come only from my environment and depend largely on machine specs.

`invokesuper` can get faster with exported `vm_search_super_method()`, but I think it is not enough,
because `perf stat` shows that there are still many iTLB-load-misses.
I believe MJIT can become faster by taking good care of the CPU's caches: not only the iTLB but also L1, L2 and so on.
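
For readers without the attachments, the kind of call being measured can be sketched roughly as follows. This is not the attached super.rb (which uses benchmark_driver); the class names `Base` and `Derived`, the helper `measure_super_calls`, and the iteration count are assumptions made only for illustration.

```ruby
require "benchmark"

# Hypothetical classes; not the contents of the attached super.rb.
class Base
  def foo
    1
  end
end

class Derived < Base
  def foo
    super # on CRuby this call compiles to the `invokesuper` instruction
  end
end

# Call a.foo `n` times and return the elapsed wall-clock time in seconds.
def measure_super_calls(n)
  a = Derived.new
  Benchmark.realtime { n.times { a.foo } }
end

# On CRuby, the disassembly of Derived#foo should list `invokesuper`.
disasm = RubyVM::InstructionSequence.of(Derived.new.method(:foo)).disasm
puts "uses invokesuper: #{disasm.include?('invokesuper')}"

n = 1_000_000
elapsed = measure_super_calls(n)
puts format("a.foo: %.0f i/s", n / elapsed)
```

Running such a script once with and once without `--jit`, on the Ruby versions discussed in this thread, would give a rough with/without-MJIT comparison. It does not replace benchmark_driver's warm-up and repeated measurement.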

---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)


-- 
https://bugs.ruby-lang.org/

* [ruby-core:88896] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
       [not found] <redmine.issue-15085.20180906232533@ruby-lang.org>
  2018-09-06 23:25 ` [ruby-core:88887] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT s.wanabe
@ 2018-09-08  6:50 ` takashikkbn
  2018-09-13  0:10 ` [ruby-core:88965] [Ruby trunk Feature#15085][Rejected] " s.wanabe
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 5+ messages in thread
From: takashikkbn @ 2018-09-08  6:50 UTC
  To: ruby-core

Issue #15085 has been updated by k0kubun (Takashi Kokubun).


As far as I can see from the benchmark result for the improved case, it looks good. But I would at least like to see micro benchmarks for `opt_send_without_block` and `send` as well. Because of `_mjit_compile_send`, they may not be affected much, though. Also, what were the results for larger benchmarks (optcarrot, discourse, ...)?

> And I guess it is related to memory caching, especially iTLB.
> invokesuper can get faster with exported vm_search_super_method(), but I think it is not enough.

My reasoning for exporting only `rb_vm_search_method_slowpath` was that we should inline as much as possible to exploit compiler optimizations, but compiling `vm_search_method` (specifically its `rb_vm_search_method_slowpath` part) was too slow to compile many methods within the default Optcarrot measurement period. The CPU cache was not my reason for leaving it out, and I assume we should inline everything if compilation finished in 0 seconds.

Why do you think not inlining `vm_search_method` is friendlier to the iTLB? Is the generated code for `vm_search_method` too big, or is loading instructions from `vm_search_method` more efficient when its code is shared with the VM?
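
The two possibilities in this question can be made concrete with a deliberately crude page-count estimate. This is a sketch of the hypothesis only, not a measurement: the 4 KiB page size, the per-copy code size and the number of JIT-ed methods are all hypothetical, and it ignores how MJIT actually lays out generated code in memory. Later messages in this thread do not confirm the hypothesis.

```ruby
PAGE_SIZE = 4096 # bytes; a common default page size, assumed here

# Number of pages needed to hold `bytes` of machine code (ceiling division).
def pages_for(bytes, page_size = PAGE_SIZE)
  (bytes + page_size - 1) / page_size
end

# Hypothetical sizes, chosen only for illustration.
inlined_copy_bytes = 2_000 # code added to each JIT-ed method when the callee is inlined
jit_methods        = 100   # number of JIT-ed methods that contain such a copy

# Every JIT-ed method carries its own copy of the callee.
inlined_pages = pages_for(inlined_copy_bytes * jit_methods)
# All JIT-ed methods call one exported copy that lives in the VM binary.
shared_pages = pages_for(inlined_copy_bytes)

puts "inlined copies: #{inlined_pages} pages, shared copy: #{shared_pages} page(s)"
```

The more distinct code pages the hot path touches, the more iTLB entries it competes for. That is the intuition behind preferring a shared, exported function, and it is what the `perf stat` counters are meant to check.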

----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-73937

* Author: wanabe (_ wanabe)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)


-- 
https://bugs.ruby-lang.org/

* [ruby-core:88965] [Ruby trunk Feature#15085][Rejected] Decrease memory cache usage of MJIT
       [not found] <redmine.issue-15085.20180906232533@ruby-lang.org>
  2018-09-06 23:25 ` [ruby-core:88887] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT s.wanabe
  2018-09-08  6:50 ` [ruby-core:88896] " takashikkbn
@ 2018-09-13  0:10 ` s.wanabe
  2018-09-13  1:34 ` [ruby-core:88966] [Ruby trunk Feature#15085] " takashikkbn
  2018-11-09  0:17 ` [ruby-core:89761] " s.wanabe
  4 siblings, 0 replies; 5+ messages in thread
From: s.wanabe @ 2018-09-13  0:10 UTC
  To: ruby-core

Issue #15085 has been updated by wanabe (_ wanabe).

File export-vm_call_super_method.patch added
Status changed from Open to Rejected

I apologize in advance: I've decided to withdraw this ticket and its patch.
I tried to find out what is going on and explain it, but ended up getting nowhere.

I also tried to explain how I had arrived at `vm_search_super_method` and the iTLB,
but I did not keep notes and can no longer remember.

To make matters worse, my assumption that "a big function makes the iTLB-load count worse" is totally wrong.
For example, the attached "export-vm_call_super_method.patch" shows almost the same result in my environment,
even though `vm_call_super_method` is a very small function.
(Note that the JIT compile time is the same as on trunk.)

```
Warming up --------------------------------------
              a.foo    344.327k i/s
Calculating -------------------------------------
                          trunk  trunk,--jit  export-big-func  export-big-func,--jit  export-vm_call_super_method  export-vm_call_super_method,--jit 
              a.foo    353.285k     224.659k         340.091k               368.875k                     343.167k                           386.849k i/s -      1.033M times in 2.923926s 4.597989s 3.037365s 2.800350s 3.010142s 2.670239s

Comparison:
                           a.foo 
export-vm_call_super_method,--jit:    386849.3 i/s 
export-big-func,--jit:    368875.3 i/s - 1.05x  slower
               trunk:    353285.2 i/s - 1.10x  slower
export-vm_call_super_method:    343166.5 i/s - 1.13x  slower
     export-big-func:    340090.8 i/s - 1.14x  slower
         trunk,--jit:    224659.1 i/s - 1.72x  slower

```
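
The "x slower" column can be rechecked from the reported i/s values with a short script. This is only a sketch for verifying the arithmetic; the constant `RESULTS` and the helper `slowdowns` are names introduced here, and the numbers are copied from the comparison output above.

```ruby
# Iterations per second, copied from the "Comparison" output above.
RESULTS = {
  "export-vm_call_super_method,--jit" => 386_849.3,
  "export-big-func,--jit"             => 368_875.3,
  "trunk"                             => 353_285.2,
  "export-vm_call_super_method"       => 343_166.5,
  "export-big-func"                   => 340_090.8,
  "trunk,--jit"                       => 224_659.1,
}.freeze

# Return [name, ips, slowdown] rows sorted fastest first, where slowdown
# is the best i/s divided by that row's i/s, rounded to two decimals.
def slowdowns(results)
  best = results.values.max
  results.sort_by { |_, ips| -ips }
         .map { |name, ips| [name, ips, (best / ips).round(2)] }
end

slowdowns(RESULTS).each do |name, ips, ratio|
  puts format("%-36s %10.1f i/s  %.2fx", name, ips, ratio)
end
```

The same numbers show that on trunk, `--jit` reaches only about 0.64 times the throughput of the plain VM (224,659.1 / 353,285.2), while both patched builds with `--jit` are slightly faster than trunk without it.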

So I am giving up on this ticket, at least until I can explain the results.
I'm sorry for the confusion.

----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-73997

* Author: wanabe (_ wanabe)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)
export-vm_call_super_method.patch (2.66 KB)


-- 
https://bugs.ruby-lang.org/

* [ruby-core:88966] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
       [not found] <redmine.issue-15085.20180906232533@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-09-13  0:10 ` [ruby-core:88965] [Ruby trunk Feature#15085][Rejected] " s.wanabe
@ 2018-09-13  1:34 ` takashikkbn
  2018-11-09  0:17 ` [ruby-core:89761] " s.wanabe
  4 siblings, 0 replies; 5+ messages in thread
From: takashikkbn @ 2018-09-13  1:34 UTC
  To: ruby-core

Issue #15085 has been updated by k0kubun (Takashi Kokubun).


I see. Thank you for the experiment and for taking the time to investigate.

----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-73998

* Author: wanabe (_ wanabe)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)
export-vm_call_super_method.patch (2.66 KB)


-- 
https://bugs.ruby-lang.org/

* [ruby-core:89761] [Ruby trunk Feature#15085] Decrease memory cache usage of MJIT
       [not found] <redmine.issue-15085.20180906232533@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-09-13  1:34 ` [ruby-core:88966] [Ruby trunk Feature#15085] " takashikkbn
@ 2018-11-09  0:17 ` s.wanabe
  4 siblings, 0 replies; 5+ messages in thread
From: s.wanabe @ 2018-11-09  0:17 UTC
  To: ruby-core

Issue #15085 has been updated by wanabe (_ wanabe).

File benchmark-with-perf-on-preview3.log added

The issue is almost gone on v2_6_0_preview3.
`invokesuper` under MJIT now runs about as fast as on the normal VM.

The attached "benchmark-with-perf-on-preview3.log" is the benchmark result.

----------------------------------------
Feature #15085: Decrease memory cache usage of MJIT
https://bugs.ruby-lang.org/issues/15085#change-74815

* Author: wanabe (_ wanabe)
* Status: Rejected
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
---Files--------------------------------
export-big-func.patch (934 Bytes)
super.rb (897 Bytes)
benchmark.log (624 Bytes)
benchmark-with-perf.log (7.05 KB)
export-vm_call_super_method.patch (2.66 KB)
benchmark-with-perf-on-preview3.log (9.99 KB)


-- 
https://bugs.ruby-lang.org/
