[ruby-core:86879] [Ruby trunk Feature#14739] Improve fiber yield/resume performance

ruby-core@ruby-lang.org archive (unofficial mirror)

* [ruby-core:86879] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
@ 2018-05-05  0:59 ` samuel
  2018-05-05  1:25 ` [ruby-core:86880] " shyouhei
                   ` (47 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  0:59 UTC
  To: ruby-core

Issue #14739 has been reported by ioquatix (Samuel Williams).

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
I am interested in improving Fiber yield/resume performance.

I've used this library before: http://software.schmorp.de/pkg/libcoro.html and handled millions of HTTP requests using it.

I'd suggest using that library.

As fibers are used in many places in Ruby (e.g. Enumerator), this could be a big performance win across the board.
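
For illustration (this is not part of the original report): external enumeration through `Enumerator#next` is one place where CRuby relies on a fiber behind the scenes, so each `next` call involves a fiber switch. A minimal sketch, with `squares` and `first_three` as made-up names:

```ruby
# External enumeration: in CRuby, Enumerator#next resumes an internal fiber
# that runs the block until the next value is yielded.
squares = Enumerator.new do |yielder|
  n = 1
  loop do
    yielder << n * n # suspends here until the consumer asks for another value
    n += 1
  end
end

first_three = 3.times.map { squares.next }
p first_three # => [1, 4, 9]
```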

Here is a nice summary of what was done for RethinkDB: https://rethinkdb.com/blog/making-coroutines-fast/

Does Ruby currently reuse stacks? This is also a big performance win if it's not being done already.

-- 
https://bugs.ruby-lang.org/


* [ruby-core:86880] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
  2018-05-05  0:59 ` [ruby-core:86879] [Ruby trunk Feature#14739] Improve fiber yield/resume performance samuel
@ 2018-05-05  1:25 ` shyouhei
  2018-05-05  2:48 ` [ruby-core:86884] " samuel
                   ` (46 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: shyouhei @ 2018-05-05  1:25 UTC
  To: ruby-core

Issue #14739 has been updated by shyouhei (Shyouhei Urabe).


ioquatix (Samuel Williams) wrote:
> Does Ruby currently reuse stacks?

Yes.

Not sure how fast libcoro is, though.


----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71841


* [ruby-core:86884] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
  2018-05-05  0:59 ` [ruby-core:86879] [Ruby trunk Feature#14739] Improve fiber yield/resume performance samuel
  2018-05-05  1:25 ` [ruby-core:86880] " shyouhei
@ 2018-05-05  2:48 ` samuel
  2018-05-05  2:50 ` [ruby-core:86885] " samuel
                   ` (45 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  2:48 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Here is the code https://github.com/ioquatix/ruby/tree/fiber-libcoro

I did some preliminary testing.

I don't know if there is a standard fiber benchmark/test in Ruby.

I used a simple message passing ring benchmark I found.

```
^_^ > ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.159232
execution time for 1000 messages:  13.581069

^_^ > ./fiber_benchmark.rb 10000 1000 
setup time for 10000 fibers:   0.153677
execution time for 1000 messages:  14.630562
```

It was about 8% faster.

I feel like this is still slow, and I will need to investigate further. I didn't exercise the stack reuse code path because I was more interested in yield/resume performance, so all fibers are allocated in the first step of the benchmark.
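
The benchmark script itself is not included in the thread. As a rough idea of the shape of such a test, here is a minimal sketch, assuming a chain of fibers that is closed through the caller; `build_chain` and `run_benchmark` are hypothetical names, and this is not the `fiber_benchmark.rb` used for the numbers above. All fibers are created up front, so the timed loop mostly measures resume/yield switches.

```ruby
require "benchmark"

# Hypothetical sketch, NOT the fiber_benchmark.rb from this thread.
# Builds `count` fibers; each one increments the message, passes it to the
# next fiber with #resume, then hands the result back with Fiber.yield.
def build_chain(count)
  head = nil
  count.times do
    downstream = head
    head = Fiber.new do |message|
      loop do
        message += 1
        message = downstream.resume(message) if downstream
        message = Fiber.yield(message)
      end
    end
  end
  head
end

# Times the setup (creating the fibers) separately from message passing.
def run_benchmark(fiber_count, message_count)
  head = nil
  setup = Benchmark.realtime { head = build_chain(fiber_count) }
  last = nil
  execution = Benchmark.realtime do
    message_count.times { last = head.resume(0) }
  end
  { setup: setup, execution: execution, last_result: last }
end

result = run_benchmark(1_000, 100)
printf("setup: %f s, execution: %f s\n", result[:setup], result[:execution])
```

Each message costs one resume and one yield per fiber, so the sizes here (1000 fibers, 100 messages) are kept small; they are arbitrary and can be raised to approach the workload quoted above.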

That being said, I would like to know if this is of interest to Ruby, otherwise I won't invest more time into it.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71846


* [ruby-core:86885] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (2 preceding siblings ...)
  2018-05-05  2:48 ` [ruby-core:86884] " samuel
@ 2018-05-05  2:50 ` samuel
  2018-05-05  2:54 ` [ruby-core:86887] " samuel
                   ` (44 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  2:50 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

> Not sure how fast libcoro is, though.

In my experience, the `libcoro` ASM implementation is the fastest implementation I have used.

It's not much slower than a (normal) C function call.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71847


* [ruby-core:86887] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (3 preceding siblings ...)
  2018-05-05  2:50 ` [ruby-core:86885] " samuel
@ 2018-05-05  2:54 ` samuel
  2018-05-05  2:57 ` [ruby-core:86888] " samuel
                   ` (43 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  2:54 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

It would definitely make sense to have a good quality thread-local fiber cache.

The current fiber cache `machine_stack_cache_struct` looks a bit limited. Does it require locking? Or does it assume the GVL?

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71849


* [ruby-core:86888] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (4 preceding siblings ...)
  2018-05-05  2:54 ` [ruby-core:86887] " samuel
@ 2018-05-05  2:57 ` samuel
  2018-05-05  3:17 ` [ruby-core:86889] " samuel
                   ` (42 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  2:57 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


I think there are some bugs: it compiles without warnings or errors on macOS, but bombs on Linux :p

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71850


* [ruby-core:86889] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (5 preceding siblings ...)
  2018-05-05  2:57 ` [ruby-core:86888] " samuel
@ 2018-05-05  3:17 ` samuel
  2018-05-05  3:38 ` [ruby-core:86890] " samuel
                   ` (41 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  3:17 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Oh, I think I stuffed up somewhere. Hold on, the above benchmark might not be correct.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71851


* [ruby-core:86890] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (6 preceding siblings ...)
  2018-05-05  3:17 ` [ruby-core:86889] " samuel
@ 2018-05-05  3:38 ` samuel
  2018-05-05  3:54 ` [ruby-core:86891] " samuel
                   ` (40 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  3:38 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


```
  # Without libcoro
  koyoko% ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
  setup time for 10000 fibers:   0.099961
  execution time for 1000 messages:  19.505909

  # With libcoro
  koyoko% ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
  setup time for 10000 fibers:   0.099268
  execution time for 1000 messages:   8.491746
```

Execution time drops by about 56%, i.e. it's roughly 2.3x faster.

That's about what I was expecting.

Can someone else confirm? Thanks. The benefit is predominantly on Linux.
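
As a quick check of the arithmetic, the two Linux execution times quoted above can be turned into a speedup ratio and a relative time reduction. This is only a sketch; the variable names are made up and the timings are copied from the output above.

```ruby
# Execution times in seconds for 1000 messages, copied from the Linux run above.
without_libcoro = 19.505909
with_libcoro    = 8.491746

speedup        = without_libcoro / with_libcoro             # how many times faster
time_reduction = (1 - with_libcoro / without_libcoro) * 100 # percent less time

puts format("speedup: %.2fx", speedup)                # about 2.3x
puts format("time reduction: %.1f%%", time_reduction) # about 56%
```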

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71852


* [ruby-core:86891] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (7 preceding siblings ...)
  2018-05-05  3:38 ` [ruby-core:86890] " samuel
@ 2018-05-05  3:54 ` samuel
  2018-05-05  3:56 ` [ruby-core:86892] " samuel
                   ` (39 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  3:54 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


```
# Without libcoro (macOS)

^_^ > ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.298039
execution time for 1000 messages:  35.248941

# With libcoro (macOS)

^_^ > ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.167117
execution time for 1000 messages:  15.460046
```

On macOS, it's about the same: roughly 2.3x faster.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71853


* [ruby-core:86892] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (8 preceding siblings ...)
  2018-05-05  3:54 ` [ruby-core:86891] " samuel
@ 2018-05-05  3:56 ` samuel
  2018-05-05  8:41 ` [ruby-core:86895] " v.ondruch
                   ` (38 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05  3:56 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

I don't know how to run a full benchmark of Ruby. Can someone help me with that? It would be interesting to get a more general idea of the performance.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71854


* [ruby-core:86895] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (9 preceding siblings ...)
  2018-05-05  3:56 ` [ruby-core:86892] " samuel
@ 2018-05-05  8:41 ` v.ondruch
  2018-05-05 10:13 ` [ruby-core:86896] " nobu
                   ` (37 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: v.ondruch @ 2018-05-05  8:41 UTC
  To: ruby-core

Issue #14739 has been updated by vo.x (Vit Ondruch).

I wonder which architectures libcoro supports. It seems to support x86 and probably some ARM, but what about s390x and ppc64?

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71857


* [ruby-core:86896] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (10 preceding siblings ...)
  2018-05-05  8:41 ` [ruby-core:86895] " v.ondruch
@ 2018-05-05 10:13 ` nobu
  2018-05-05 10:14 ` [ruby-core:86897] " samuel
                   ` (36 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: nobu @ 2018-05-05 10:13 UTC
  To: ruby-core

Issue #14739 has been updated by nobu (Nobuyoshi Nakada).

It also seems to require gcc (or a variant) and a non-Windows platform:
coro.c can't be compiled with either Visual C or mingw gcc.
Also, `asm` needed to be replaced with `__asm__` to compile with Apple clang, and the result is only about 3% faster.

```
$ ruby fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.227721
execution time for 1000 messages:  74.540142

$ ./ruby fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.293740
execution time for 1000 messages:  72.180107
```
https://github.com/nobu/ruby/tree/feature/fiber-libcoro

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71858


* [ruby-core:86897] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (11 preceding siblings ...)
  2018-05-05 10:13 ` [ruby-core:86896] " nobu
@ 2018-05-05 10:14 ` samuel
  2018-05-05 10:18 ` [ruby-core:86898] " samuel
                   ` (35 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 10:14 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

You can see the supported methods here.

https://github.com/ioquatix/ruby/blob/4a9c12d94aae1cf3a52ca5f026432cd03e9817bc/libcoro/coro.h#L99-L164

For the proof of concept, I forced it to use the ASM method, which supports 32-bit and 64-bit x86 CPUs and ARM (which I've never tested).

It would make sense to set up some configure tests to detect which one is available.
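
To make the idea concrete, here is a rough sketch of such a selection policy. It is purely illustrative: Ruby's real build uses autoconf rather than a Ruby script, `pick_coro_backend` is a hypothetical helper, and falling back to `CORO_UCONTEXT` on other CPUs is an assumption that has not been tested here. `CORO_ASM` and `CORO_UCONTEXT` are backend macros from `coro.h`.

```ruby
require "rbconfig"

# Hypothetical policy, not an actual configure test:
# - Windows toolchains: nil (keep Ruby's existing implementation, since
#   coro.c was reported in this thread not to compile there)
# - x86, x86-64 and 32-bit ARM: the hand-written assembly backend
# - anything else (e.g. s390x, ppc64): assume the ucontext backend
def pick_coro_backend(host_cpu, host_os)
  return nil if host_os.match?(/mswin|mingw/)

  case host_cpu
  when /\A(x86_64|amd64|x64|i[3-6]86)\z/, /\Aarm(v\d.*)?\z/
    "CORO_ASM"
  else
    "CORO_UCONTEXT"
  end
end

backend = pick_coro_backend(RbConfig::CONFIG["host_cpu"], RbConfig::CONFIG["host_os"])
puts "selected backend: #{backend.inspect}"
```

A real configure test would try to compile and run a small program per backend instead of matching on names, which would also answer the portability questions raised earlier in the thread.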

I'd also suggest that, if we move forward with this, we remove most of the native implementations of coroutines in Ruby, because they are slower and clutter up the code.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71859


* [ruby-core:86898] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (12 preceding siblings ...)
  2018-05-05 10:14 ` [ruby-core:86897] " samuel
@ 2018-05-05 10:18 ` samuel
  2018-05-05 10:48 ` [ruby-core:86900] " samuel
                   ` (34 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 10:18 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

I've compiled this on both LLVM and GCC just fine.

I've never tried compiling it on Windows. It should work, but it might require some effort.

> Also, asm needed to be replaced with __asm__ to compile with Apple clang

I didn't have this problem. What version of the developer tools are you using?

> and it is 3% faster.

If you get that, something is wrong; it's definitely a much bigger improvement than that. Did you try it on Linux?

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71860


* [ruby-core:86900] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (13 preceding siblings ...)
  2018-05-05 10:18 ` [ruby-core:86898] " samuel
@ 2018-05-05 10:48 ` samuel
  2018-05-05 10:59 ` [ruby-core:86902] " samuel
                   ` (33 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 10:48 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

I am trying out your branch, and will report back. 3% is within the margin of error, so it sounds like nothing changed for some reason. There must be some explanation.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71862


* [ruby-core:86902] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (14 preceding siblings ...)
  2018-05-05 10:48 ` [ruby-core:86900] " samuel
@ 2018-05-05 10:59 ` samuel
  2018-05-05 11:32 ` [ruby-core:86904] " samuel
                   ` (32 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 10:59 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

For some reason I can't reproduce the results with your branch, but I retested on my Linux desktop and I can reproduce the results there. I will investigate further.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71864


* [ruby-core:86904] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (15 preceding siblings ...)
  2018-05-05 10:59 ` [ruby-core:86902] " samuel
@ 2018-05-05 11:32 ` samuel
  2018-05-05 12:00 ` [ruby-core:86905] " nobu
                   ` (31 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 11:32 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Sometimes I can reproduce the results and sometimes I can't. Something seems very flakey. Sometimes the build system doesn't seem to rebuild the right files.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71866


* [ruby-core:86905] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (16 preceding siblings ...)
  2018-05-05 11:32 ` [ruby-core:86904] " samuel
@ 2018-05-05 12:00 ` nobu
  2018-05-05 12:03 ` [ruby-core:86906] " samuel
                   ` (30 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: nobu @ 2018-05-05 12:00 UTC
  To: ruby-core

Issue #14739 has been updated by nobu (Nobuyoshi Nakada).


ioquatix (Samuel Williams) wrote:
> > Also, asm needed to be replaced with `__asm__` to compile with Apple clang
> 
> I didn't have this problem. What version of the developer tools are you using?

```
$ clang --version
Apple LLVM version 8.0.0 (clang-800.0.42.1)
Target: x86_64-apple-darwin15.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```

> If you get that, something is wrong, it's definitely a much bigger improvement than that. Did you try it on Linux?

On Ubuntu 18.04, it does have the effect with `gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)`.

### trunk
```
$ ./x86_64-linux/exe/ruby src/fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.153903
execution time for 1000 messages:  25.395488
```

### fiber-libcoro
```
$ make -C x86_64-linux prog > /dev/null && ./x86_64-linux/exe/ruby src/fiber_benchmark.rb 10000 1000
In file included from ../src/libcoro/coro.c:41:0,
                 from ../src/cont.c:51:
../src/cont.c: In function ‘cont_free’:
../src/libcoro/coro.h:401:28: warning: statement with no effect [-Wunused-value]
 # define coro_destroy(ctx) (void *)(ctx)
                            ^~~~~~~~~~~~~
../src/cont.c:370:2: note: in expansion of macro ‘coro_destroy’
  coro_destroy((coro_context *)&fib->context);
  ^~~~~~~~~~~~
../src/cont.c: In function ‘fiber_initialize_machine_stack_context’:
../src/cont.c:862:32: warning: passing argument 2 of ‘coro_create’ from incompatible pointer type [-Wincompatible-pointer-types]
     coro_create(&fib->context, rb_fiber_start, NULL, fib->ss_sp, fib->ss_size);
                                ^~~~~~~~~~~~~~
In file included from ../src/cont.c:51:0:
../src/libcoro/coro.c:331:1: note: expected ‘coro_func {aka void (*)(void *)}’ but argument is of type ‘__attribute__((noreturn)) void (*)(void)’
 coro_create (coro_context *ctx, coro_func coro, void *arg, void *sptr, size_t ssize)
 ^~~~~~~~~~~
In file included from ../src/libcoro/coro.c:41:0,
                 from ../src/cont.c:51:
../src/cont.c: In function ‘rb_fiber_terminate’:
../src/libcoro/coro.h:401:28: warning: statement with no effect [-Wunused-value]
 # define coro_destroy(ctx) (void *)(ctx)
                            ^~~~~~~~~~~~~
../src/cont.c:1799:5: note: in expansion of macro ‘coro_destroy’
     coro_destroy(&fib->context);
     ^~~~~~~~~~~~
../src/cont.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-self-assign’
cc1: warning: unrecognized command line option ‘-Wno-constant-logical-operand’
cc1: warning: unrecognized command line option ‘-Wno-parentheses-equality’
setup time for 10000 fibers:   0.146823
execution time for 1000 messages:   7.855211
```
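
The `fiber_benchmark.rb` script itself is not reproduced in this message. As a rough sketch only, a benchmark with the same shape (create N fibers, then pass M messages through each one via resume/yield) could look like the following. The method name `run_fiber_benchmark` and the echo-style fiber body are assumptions made for illustration, not the actual script.

```ruby
require "benchmark"

# Hypothetical stand-in for fiber_benchmark.rb (the real script is not shown here).
# Creates +fiber_count+ fibers, then sends +message_count+ messages to each of them.
def run_fiber_benchmark(fiber_count, message_count)
  fibers = nil

  # Setup phase: only allocates the fibers, nothing is resumed yet.
  setup_time = Benchmark.realtime do
    fibers = Array.new(fiber_count) do
      Fiber.new do |message|
        # Echo every message back to the caller, forever.
        loop { message = Fiber.yield(message) }
      end
    end
  end

  # Execution phase: every message costs one resume and one yield per fiber.
  delivered = 0
  execution_time = Benchmark.realtime do
    message_count.times do |i|
      fibers.each do |fiber|
        delivered += 1 if fiber.resume(i) == i
      end
    end
  end

  { setup_time: setup_time, execution_time: execution_time, delivered: delivered }
end

# Small sizes so the sketch finishes quickly; the runs above used 10000 and 1000.
result = run_fiber_benchmark(100, 10)
puts format("setup time for %d fibers: %10.6f", 100, result[:setup_time])
puts format("execution time for %d messages: %10.6f", 10, result[:execution_time])
```

Since the execution phase does almost nothing except switch between fibers, a benchmark of this shape is close to a best case for any change to the context-switch path.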



----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71867

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86906] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (17 preceding siblings ...)
  2018-05-05 12:00 ` [ruby-core:86905] " nobu
@ 2018-05-05 12:03 ` samuel
  2018-05-05 12:17 ` [ruby-core:86907] " samuel
                   ` (29 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 12:03 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Yes, that supports my own test as well.

Okay, well, I'm finding this really confusing so far, but here are the results I get on my Linux box:

```
koyoko% ruby --version
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
koyoko% ruby ./fiber_benchmark.rb 10000 1000 
setup time for 10000 fibers:   0.094309
execution time for 1000 messages:  22.248827

koyoko% ./build/bin/ruby --version
ruby 2.6.0dev (2018-05-03 fiber-libcoro 63333) [x86_64-linux]
last_commit=Use libcoro for Fiber implementation to improve performance.
koyoko% ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.104364
execution time for 1000 messages:  19.717851

koyoko% ./build/bin/ruby --version
ruby 2.6.0dev (2018-05-03 fiber-libcoro 63333) [x86_64-linux]
last_commit=Use libcoro for Fiber implementation to improve performance.
koyoko% ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.104798
execution time for 1000 messages:   8.988672
```

However, on macOS, I can't reproduce my original results. I apologise. I was playing around with stack allocation. I tried to revert back to that state, but couldn't reproduce the results I gave earlier.

I will continue to investigate.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71868

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86907] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (18 preceding siblings ...)
  2018-05-05 12:03 ` [ruby-core:86906] " samuel
@ 2018-05-05 12:17 ` samuel
  2018-05-05 12:30 ` [ruby-core:86908] " samuel
                   ` (28 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 12:17 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Okay, I found out what happened.

On macOS, you need to set

```
#include "libcoro/coro.c"
#define FIBER_USE_NATIVE 1
```

Otherwise it won't take the optimal code path. My apologies: I think I made that change while playing with the code but didn't commit it once I started patching it to work on Linux, since on Linux that seems to be the default.

Here is the performance improvement.

```
^_^ > ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.165381
execution time for 1000 messages:  14.267517

^_^ > ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.160629
execution time for 1000 messages:   6.307580
```

So it's a similar speed-up.
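
To put a number on "similar", the ratio of execution times can be worked out directly from the figures quoted in this thread. This is only a sketch: the `speedup` helper is made up for illustration, and for the macOS pair it assumes the first run is the baseline and the second is the libcoro build with `FIBER_USE_NATIVE` set, which the output above does not label.

```ruby
# Speed-up = baseline time / patched time, using the "execution time" lines
# quoted in this thread.
def speedup(baseline_seconds, patched_seconds)
  baseline_seconds / patched_seconds
end

runs = {
  # Labelled in nobu's message: trunk vs fiber-libcoro on Ubuntu 18.04.
  "Linux (trunk vs fiber-libcoro)" => [25.395488, 7.855211],
  # Assumption: first macOS run is the baseline, second is the patched build.
  "macOS (first run vs second run)" => [14.267517, 6.307580],
}

runs.each do |label, (baseline, patched)|
  puts format("%-35s %.2fx", label, speedup(baseline, patched))
end
```

Under those assumptions the Linux run comes out a little over 3x and the macOS run a little over 2x, so the two are in the same ballpark rather than identical.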

I tried to compile without libcoro but with `#define FIBER_USE_NATIVE 1`; the build fails because `swapcontext`/`makecontext` are deprecated on macOS.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71869

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86908] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (19 preceding siblings ...)
  2018-05-05 12:17 ` [ruby-core:86907] " samuel
@ 2018-05-05 12:30 ` samuel
  2018-05-05 12:55 ` [ruby-core:86909] " samuel
                   ` (27 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 12:30 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

I updated my branch with a few changes.

I'm sorry I didn't rebase on your branch.

I think once we decide if this is a good idea or not, we can decide how best to integrate it with Ruby. I just wanted to make a proof of concept to show it was a good improvement to performance.

My suggestion would be to remove the implementations from `cont.c` and update libcoro to support all required platforms. The API provided by libcoro is really great and a nice wrapper.

It should be possible to build libcoro on Windows. I do have Windows with Visual Studio set up but I really have no idea how to use it :) However, it wouldn't be silly to update libcoro to make it compile without problems on all supported platforms. It's quite an "old" implementation, but it does work really well. There are some other implementations available too, some are more modern, but I found this one was pretty good.

It might make sense to fork libcoro into a separate repo. I don't mind maintaining it; I actually already have a fork, and it's a bit different from the one here. Either way, it would make sense to update it a bit.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71870

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86909] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (20 preceding siblings ...)
  2018-05-05 12:30 ` [ruby-core:86908] " samuel
@ 2018-05-05 12:55 ` samuel
  2018-05-05 20:28 ` [ruby-core:86913] " shevegen
                   ` (26 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 12:55 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

I was reading https://sourceware.org/ml/libc-help/2016-01/msg00008.html and noticed the following regarding `*context` functions:

>> these functions are deprecated/dead -- they no longer exist in the latest
>> POSIX specification.  the preference would be to stop using them.  i think
>> we might consider dropping them in a future glibc version.

Of course they still exist, but yes, they are deprecated and absent from the latest POSIX standard. I might even remove them from my fork of `libcoro`.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71871

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86913] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (21 preceding siblings ...)
  2018-05-05 12:55 ` [ruby-core:86909] " samuel
@ 2018-05-05 20:28 ` shevegen
  2018-05-05 22:21 ` [ruby-core:86914] " samuel
                   ` (25 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: shevegen @ 2018-05-05 20:28 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by shevegen (Robert A. Heiler).

> However, it wouldn't be silly to update libcoro to make it
> compile without problems on all supported platforms.

I can't speak for matz and the ruby core team, but in the past
there were (feature-)proposals that were rejected since they 
were only specific for e. g. Linux - that is, improvements 
pertaining to Linux, but not other OS. I think matz wants to have
ruby be as OS-agnostic as possible; in other words to work on
as many OS as possible, too. And there are quite some people 
who use ruby on windows as well, for one reason or another.

As for benchmarks, I think any noticeable improvement is a
win and may fit into the "ruby 3 is 3x as fast as ruby 2.0",
but to get to that, it may be more important to verify that
the improvements could also work on windows. Even 3% would
be considerable. :)

By the way, I think there are some ruby-devs who use windows
too ... greg I think. May take a little before the issue here
is seen by them; they could probably help. (I use linux
myself so I won't be of much help.)

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71875

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86914] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (22 preceding siblings ...)
  2018-05-05 20:28 ` [ruby-core:86913] " shevegen
@ 2018-05-05 22:21 ` samuel
  2018-05-06  9:48 ` [ruby-core:86917] " shyouhei
                   ` (24 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-05 22:21 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

The Windows code path for fibers is relatively trivial both in libcoro and in cont.c, so I wouldn't be too concerned about Windows support. It shouldn't be much effort either to make it work well in libcoro or to keep the existing Windows code path.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71876

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86917] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (23 preceding siblings ...)
  2018-05-05 22:21 ` [ruby-core:86914] " samuel
@ 2018-05-06  9:48 ` shyouhei
  2018-05-06 10:09 ` [ruby-core:86918] " samuel
                   ` (23 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: shyouhei @ 2018-05-06  9:48 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by shyouhei (Shyouhei Urabe).

I'm neutral.  This is filed as a feature request, but the "feature" being discussed is the speed of execution, which is different in nature from an ordinary feature.  If this improvement can be truly transparent (and it currently seems to be), I think there is a chance of acceptance. Wider support for different OSes is definitely nice to have, of course.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71879

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86918] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (24 preceding siblings ...)
  2018-05-06  9:48 ` [ruby-core:86917] " shyouhei
@ 2018-05-06 10:09 ` samuel
  2018-05-06 11:07 ` [ruby-core:86919] " samuel
                   ` (22 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-06 10:09 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Thanks for your feedback. When I made this issue, I could only select "Bug", "Feature" or "Misc". Should I have selected "Misc" instead?

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71880

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86919] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (25 preceding siblings ...)
  2018-05-06 10:09 ` [ruby-core:86918] " samuel
@ 2018-05-06 11:07 ` samuel
  2018-05-06 11:17 ` [ruby-core:86920] " samuel
                   ` (21 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-06 11:07 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


I tested some real-world applications today. The first is async, which has a performance test for read context-switch overhead: https://github.com/socketry/async/blob/master/spec/async/performance_spec.rb

This isn't a direct comparison, since I'm using rvm with ruby head and my branch, but it's pretty close.

```
# Without libcoro fibers
Async::Wrapper
Warming up --------------------------------------
Wrapper#wait_readable
                         1.801k i/100ms
    Reactor#register     2.087k i/100ms
Calculating -------------------------------------
Wrapper#wait_readable
                        176.789k (± 5.7%) i/s -    880.689k in   5.004582s
    Reactor#register    227.882k (± 2.9%) i/s -      1.140M in   5.004740s

Comparison:
    Reactor#register:   227882.2 i/s
Wrapper#wait_readable:   176789.3 i/s - 1.29x  slower

# With libcoro fibers (12% more context switch for read operations)
Async::Wrapper
Warming up --------------------------------------
Wrapper#wait_readable
                         2.217k i/100ms
    Reactor#register     2.380k i/100ms
Calculating -------------------------------------
Wrapper#wait_readable
                        197.116k (± 2.7%) i/s -    986.565k in   5.008582s
    Reactor#register    256.078k (± 4.4%) i/s -      1.278M in   5.003710s

Comparison:
    Reactor#register:   256077.8 i/s
Wrapper#wait_readable:   197115.9 i/s - 1.30x  slower
```

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71881

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86920] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (26 preceding siblings ...)
  2018-05-06 11:07 ` [ruby-core:86919] " samuel
@ 2018-05-06 11:17 ` samuel
  2018-05-06 12:17 ` [ruby-core:86921] " samuel
                   ` (20 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-06 11:17 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


Compare async-dns with bind9 for the same workload:

```
# Without libcoro-fiber
                                     user     system      total        real
Async::DNS::Server               0.000345   0.000029   0.000374 (  0.000381)
Bind9                            0.000294   0.000025   0.000319 (  0.000328)

# With libcoro-fiber (no significant difference)
                                     user     system      total        real
Async::DNS::Server               0.000320   0.000048   0.000368 (  0.000371)
Bind9                            0.000218   0.000033   0.000251 (  0.000258)
```

This one was a toss-up; I'd say there was no significant difference.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71882

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86921] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (27 preceding siblings ...)
  2018-05-06 11:17 ` [ruby-core:86920] " samuel
@ 2018-05-06 12:17 ` samuel
  2018-05-06 12:47 ` [ruby-core:86922] " samuel
                   ` (19 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-06 12:17 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


I tested async-http, a web server that has a basic performance spec using `wrk` as the client.

I ran it several times and report the best result of each below. It's difficult to make a judgement. I'd like to say performance improved, but if so, by less than 5% (the best runs below differ by about 0.6% in requests/sec). However, this benchmark exercises an entire web server stack, and context switching only happens a few times per request. If I had to guess, not more than 4 times (accept, read request, write response). In many cases, we only context switch if the operation would block.

```
# Without libcoro-fiber
Async::HTTP::Server
  simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   110.06us  647.25us  67.72ms   99.33%
    Req/Sec    12.58k     3.07k   26.94k    70.77%
  12021990 requests in 2.00m, 401.28MB read
Requests/sec: 100100.72
Transfer/sec:      3.34MB

# With libcoro-fiber
Async::HTTP::Server
  simple response
Running 2m test @ http://127.0.0.1:9292/
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   106.47us  834.32us  99.45ms   99.46%
    Req/Sec    12.66k     2.95k   17.61k    71.12%
  12093398 requests in 2.00m, 403.66MB read
Requests/sec: 100694.76
Transfer/sec:      3.36MB
```

This result surprised me a little bit, but now that I think about it, it makes sense: the cost of network I/O (read/write) and processing (parsing, generating the response, buffers, GC) far outweighs the fiber yield/resume, which is already minimised. In real world situations, the results should lean more in favour of libcoro.
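
The point about only switching context when an operation would block can be illustrated with a small sketch. The helpers `read_or_yield` and `count_switches` are hypothetical and written just for this example; they are not the async gem's API, and a pipe stands in for a network socket.

```ruby
# Hypothetical helper: read from +io+, yielding the current fiber only when
# the read would block. If data is already buffered, no context switch occurs.
def read_or_yield(io, size)
  loop do
    result = io.read_nonblock(size, exception: false)
    return result unless result == :wait_readable
    Fiber.yield(:wait_readable) # hand control back until the io is readable
  end
end

# Drive one read to completion and count how often the fiber had to yield.
def count_switches(reader, writer, payload)
  fiber = Fiber.new { read_or_yield(reader, payload.bytesize) }
  switches = 0
  value = fiber.resume
  while value == :wait_readable
    switches += 1
    writer.write(payload) # stand-in for "the peer sent some data"
    IO.select([reader])   # a real reactor would wait on many sockets here
    value = fiber.resume
  end
  [value, switches]
end

reader, writer = IO.pipe

# The pipe is empty, so the fiber has to yield once before the read succeeds.
p count_switches(reader, writer, "hello")

# The data is already buffered, so the read completes without yielding.
writer.write("world")
p count_switches(reader, writer, "world")
```

On a loopback benchmark where data is often already available, many reads take the second path, which is one reason a faster yield/resume shows up less in a full server benchmark than in a pure fiber benchmark.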

Just for interest, I also collected system call stats.

```
# Without libcoro
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 45.76    4.635066           2   2095278           sendto
 32.47    3.288691           1   4191323           rt_sigprocmask
 20.90    2.117062           1   2095611       324 recvfrom
  0.67    0.068189        9741         7           poll
  0.07    0.006821           1      6256      5313 openat
  0.03    0.003404           1      4034         5 lstat
  0.01    0.001072           1      1158           read
  0.01    0.001049           1       987           close
  0.01    0.000805           1       901       421 stat
  0.01    0.000627          25        25           clone
  0.01    0.000624           1       793           fstat
  0.01    0.000521           4       124           mmap
  0.00    0.000475           1       798       246 fcntl
  0.00    0.000475           2       297         1 epoll_wait
  0.00    0.000402           3       140           mremap
  0.00    0.000386           1       346       322 epoll_ctl
  0.00    0.000331           1       557       552 ioctl
  0.00    0.000323          16        20           futex
  0.00    0.000321           3        94           mprotect
  0.00    0.000307           1       213           brk
  0.00    0.000255           4        62           getdents
  0.00    0.000183           1       291           getuid
  0.00    0.000180           1       292           geteuid
  0.00    0.000177           1       292           getegid
  0.00    0.000172           1       291           getgid
  0.00    0.000096           3        36           pipe2
  0.00    0.000074           6        12           munmap
  0.00    0.000066          11         6         2 execve
  0.00    0.000052           2        23        14 accept4
  0.00    0.000047           3        18           prctl
  0.00    0.000047           2        27           set_robust_list
  0.00    0.000045           2        19           getpid
  0.00    0.000040           0        81         2 rt_sigaction
  0.00    0.000028           2        16         8 access
  0.00    0.000017           1        15           getcwd
  0.00    0.000016           1        14           readlink
  0.00    0.000016           0       241       238 newfstatat
  0.00    0.000014           0        96           lseek
  0.00    0.000013           1        10           chdir
  0.00    0.000013           3         4           arch_prctl
  0.00    0.000012           0        25           setsockopt
  0.00    0.000009           0        25           getsockname
  0.00    0.000007           2         4           prlimit64
  0.00    0.000006           0        17           getsockopt
  0.00    0.000006           3         2           getrandom
  0.00    0.000004           2         2           sched_getaffinity
  0.00    0.000004           4         1           clock_gettime
  0.00    0.000003           2         2           write
  0.00    0.000003           3         1           sigaltstack
  0.00    0.000003           2         2           set_tid_address
  0.00    0.000002           2         1           vfork
  0.00    0.000001           1         1           wait4
  0.00    0.000001           1         1           getresgid
  0.00    0.000000           0         8           pipe
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8           bind
  0.00    0.000000           0         8           listen
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           getresuid
  0.00    0.000000           0         8           epoll_create1
------ ----------- ----------- --------- --------- ----------------
100.00   10.128563               8400935      7448 total

# With libcoro
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.83    5.263501           2   2708883           sendto
 32.87    2.628193           1   2709155       263 recvfrom
  1.06    0.084583       16917         5           poll
  0.09    0.006915           1      6232      5313 openat
  0.06    0.004405           1      4034         5 lstat
  0.02    0.001276           1      1123           read
  0.02    0.001207           1       833       379 stat
  0.01    0.000996           1       963           close
  0.01    0.000510           1       785           fstat
  0.01    0.000492           1       533       528 ioctl
  0.00    0.000330           2       162         1 epoll_wait
  0.00    0.000327           0       797       246 fcntl
  0.00    0.000285          11        25           clone
  0.00    0.000253           1       232           brk
  0.00    0.000253           1       284       260 epoll_ctl
  0.00    0.000239           2       123           mmap
  0.00    0.000207           2        95           mprotect
  0.00    0.000168           8        20           futex
  0.00    0.000163           3        62           getdents
  0.00    0.000142           0       291           getuid
  0.00    0.000139           1       238       235 newfstatat
  0.00    0.000133           0       292           geteuid
  0.00    0.000131           0       291           getgid
  0.00    0.000129           0       292           getegid
  0.00    0.000080           7        12           munmap
  0.00    0.000058           2        32           rt_sigprocmask
  0.00    0.000057           1        88           lseek
  0.00    0.000057           2        36           pipe2
  0.00    0.000044           1        81         2 rt_sigaction
  0.00    0.000043           3        14           readlink
  0.00    0.000039           2        16         8 access
  0.00    0.000036           2        22        13 accept4
  0.00    0.000035           1        27           set_robust_list
  0.00    0.000033           2        18           prctl
  0.00    0.000028           1        19           getpid
  0.00    0.000026           2        15           getcwd
  0.00    0.000020           2        10           chdir
  0.00    0.000013          13         1           wait4
  0.00    0.000009           5         2           getrandom
  0.00    0.000008           0        25           setsockopt
  0.00    0.000006           3         2           write
  0.00    0.000006           0        25           getsockname
  0.00    0.000003           3         1           vfork
  0.00    0.000003           1         6         2 execve
  0.00    0.000003           1         4           arch_prctl
  0.00    0.000003           2         2           set_tid_address
  0.00    0.000003           1         4           prlimit64
  0.00    0.000002           0        17           getsockopt
  0.00    0.000002           2         1           sigaltstack
  0.00    0.000001           1         1           getresuid
rake aborted!
  0.00    0.000001           1         1           getresgid
  0.00    0.000001           1         2           sched_getaffinity
  0.00    0.000000           0         8           pipe
  0.00    0.000000           0         1           dup2
  0.00    0.000000           0         8           socket
  0.00    0.000000           0         8           bind
  0.00    0.000000           0         8           listen
Interrupt: 
  0.00    0.000000           0         1           sysinfo
  0.00    0.000000           0         1           clock_gettime
  0.00    0.000000           0         8           epoll_create1
------ ----------- ----------- --------- --------- ----------------
```

`rt_sigprocmask` was gone because it's not invoked by libcoro unless using `swapcontext`.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71883

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86922] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (28 preceding siblings ...)
  2018-05-06 12:17 ` [ruby-core:86921] " samuel
@ 2018-05-06 12:47 ` samuel
  2018-05-08  4:33 ` [ruby-core:86940] " samuel
                   ` (18 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-06 12:47 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

I got a more interesting result with `ab`. It's quite late now, so I will write an update tomorrow, but it seemed to show about a 7% improvement. `ab` uses `Connection: close`, which makes a new connection for each request and stresses concurrent IO a bit more.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71884

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86940] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (29 preceding siblings ...)
  2018-05-06 12:47 ` [ruby-core:86922] " samuel
@ 2018-05-08  4:33 ` samuel
  2018-05-08  7:43 ` [ruby-core:86945] " ko1
                   ` (17 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-08  4:33 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

It's been a while since I played around with libcoro.

I was evaluating its performance in a C++ program.

I found that it's not thread safe due to global variables. I changed them to thread-local to fix the issue, and it works well.

I just want to reinforce that this was a proof of concept; if we decide to roll with such an implementation, it requires more work. I am happy to help with that, but it would be good to get some feedback on whether such a contribution would be acceptable before investing so much time.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71906

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86945] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (30 preceding siblings ...)
  2018-05-08  4:33 ` [ruby-core:86940] " samuel
@ 2018-05-08  7:43 ` ko1
  2018-05-08 10:41 ` [ruby-core:86946] " duerst
                   ` (16 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: ko1 @ 2018-05-08  7:43 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ko1 (Koichi Sasada).

Sorry, I can't read all of your comments because they are too long :p

As you quoted first,

> Here is a nice summary of what was done for RethinkDB: https://rethinkdb.com/blog/making-coroutines-fast/

In this article:
> A lightweight swapcontext implementation

It shows that `swapcontext` has extra overhead because of the `sigprocmask` system call.

> rt_sigprocmask was gone because it's not invoked by libcoro unless using swapcontext.

Yes.

Last year, I tried the modified `swapcontext` that the article introduced, and I got good performance.
(I tried a Fiber resume/yield ping-pong, found that `sigprocmask` is one source of overhead, googled it, and found the same page :p)
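
A resume/yield ping-pong of the kind mentioned here can be sketched roughly as follows. This is only an illustrative micro-benchmark, not the script used in this thread; the helper name `ping_pong_seconds` and the round-trip count are assumptions made for the example.

```ruby
require 'benchmark'

# Hypothetical helper: times `round_trips` resume/yield pairs on one fiber.
def ping_pong_seconds(round_trips)
  # The fiber does nothing except hand control straight back to its caller.
  fiber = Fiber.new do
    loop { Fiber.yield }
  end

  # Each resume switches into the fiber, and each Fiber.yield switches back.
  Benchmark.realtime do
    round_trips.times { fiber.resume }
  end
end

round_trips = 100_000
seconds = ping_pong_seconds(round_trips)
puts format("%d round trips in %.6f s", round_trips, seconds)
```

Running such a loop under `strace -c` is one way to see whether each switch costs a `rt_sigprocmask` call.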

However, the `swapcontext` introduced there is based on glibc, so there is a license problem: we can't merge it into the Ruby source code.

libcoro (I haven't looked at the library, but going by what you say) seems to use the same technique, so employing it is one idea.
However, I'm not sure it is the best way.

No conclusion yet, but this is my current comment.

Thanks,
Koichi

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71911

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86946] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (31 preceding siblings ...)
  2018-05-08  7:43 ` [ruby-core:86945] " ko1
@ 2018-05-08 10:41 ` duerst
  2018-05-09 14:23 ` [ruby-core:86956] " samuel
                   ` (15 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: duerst @ 2018-05-08 10:41 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by duerst (Martin Dürst).


ioquatix (Samuel Williams) wrote:
> Thanks for your feedback. When I made this issue, I could only select "Bug", "Feature" or "Misc". Should I have selected "Misc" instead?

"Feature" should be okay.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71913

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86956] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (32 preceding siblings ...)
  2018-05-08 10:41 ` [ruby-core:86946] " duerst
@ 2018-05-09 14:23 ` samuel
  2018-05-09 23:34 ` [ruby-core:86958] " samuel
                   ` (14 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-09 14:23 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Thanks, Koichi, for your valuable response; I appreciate your past work in this area.

I started hacking on my own implementation for x64. It is slightly simpler than libcoro.

I have been reviewing the x64 ABIs, and it should be pretty trivial to support both the 64-bit Windows ABI and the 64-bit System V ABI (Linux, Mac, Solaris, BSD). The amount of code is < 200 lines for both ABIs.

For all other ABIs, I suggest using the existing code path. I am happy to release this code to Ruby/MRI under whatever license is suitable.

Please be patient while I finish off the patch; when it is done I will post an update here.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71920

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86958] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (33 preceding siblings ...)
  2018-05-09 14:23 ` [ruby-core:86956] " samuel
@ 2018-05-09 23:34 ` samuel
  2018-05-10  4:28 ` [ruby-core:86961] " samuel
                   ` (13 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-09 23:34 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

What compiler is used to compile 64-bit Ruby on Windows?

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71922

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:86961] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (34 preceding siblings ...)
  2018-05-09 23:34 ` [ruby-core:86958] " samuel
@ 2018-05-10  4:28 ` samuel
  2018-05-14  7:40 ` [ruby-core:87018] " sam.saffron
                   ` (12 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-10  4:28 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Here is the initial code.

https://github.com/kurocha/coroutine

It implements an interface semantically similar to that of `libcoro`, but it supports native coroutines on win32, win64 and amd64. I still need to add a `ucontext` wrapper (`makecontext`/`swapcontext`) for other platforms; then I think all platforms will be supported. `libcoro` didn't have good Windows support.

I've put this code under the MIT license.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71926

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:87018] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (35 preceding siblings ...)
  2018-05-10  4:28 ` [ruby-core:86961] " samuel
@ 2018-05-14  7:40 ` sam.saffron
  2018-05-14  7:49 ` [ruby-core:87019] " ko1
                   ` (11 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: sam.saffron @ 2018-05-14  7:40 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by sam.saffron (Sam Saffron).

Does this change move us any closer to being able to ship fibers between threads? 

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71984

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:87019] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (36 preceding siblings ...)
  2018-05-14  7:40 ` [ruby-core:87018] " sam.saffron
@ 2018-05-14  7:49 ` ko1
  2018-05-14  8:05 ` [ruby-core:87021] " samuel
                   ` (10 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: ko1 @ 2018-05-14  7:49 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ko1 (Koichi Sasada).

Sorry, I missed the comments.

How would we ship this library? Bundle it, or have users download it separately?
(This is similar to the jemalloc discussion :))

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71985

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:87021] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (37 preceding siblings ...)
  2018-05-14  7:49 ` [ruby-core:87019] " ko1
@ 2018-05-14  8:05 ` samuel
  2018-05-14  8:14 ` [ruby-core:87022] " samuel
                   ` (9 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-14  8:05 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

@ko1 I would suggest we make a Ruby-specific version, but we can also try to make a generic static library so that it can be maintained separately. I already have some other projects using coroutines, so it's useful to me to have a well-maintained C library implementation.

@sam.saffron This is an interesting question which I did specifically try to address in this implementation. I will give you the details.

A typical implementation of Fiber uses thread-local variables for the main fiber and the currently executing fiber (`Fiber.current`). Because of this, it's awkward to ship a fiber between threads. Additionally, I'd argue that moving fibers between threads is inherently unsafe. I'd kindly suggest that a coroutine which can be resumed on different threads is not a "Fiber" but a "Green Thread". The fundamental difference is that Fiber, as implemented, depends on thread-local storage. For example, how would `Fiber#resume` work on a different thread if the fiber is already executing? Right now, `yield` and `resume` are VERY efficient because they don't have to check anything like this.
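
As a small illustration of that thread affinity, the sketch below (assuming a stock MRI build) resumes a fiber once on the thread that created it and then attempts to resume it from a second thread. The variable names are only for the example.

```ruby
# A fiber that pauses once before finishing.
fiber = Fiber.new do
  Fiber.yield :paused
  :finished
end

# Runs on the creating (main) thread until the first Fiber.yield.
first = fiber.resume

# Attempt to resume the same fiber from another thread and capture the error.
error = Thread.new do
  begin
    fiber.resume
    nil
  rescue FiberError => e
    e
  end
end.value

puts first.inspect
puts error.class
puts error.message
```

MRI rejects the second resume with a `FiberError` rather than migrating the fiber, which is the behaviour the question about shipping fibers between threads runs into.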

However, coroutines are the underlying abstraction for implementing Fiber and they CAN be moved across threads.

This particular implementation was designed very carefully to allow for this. The `coroutine_transfer` function takes two arguments: a coroutine in which to store the current stack, and a coroutine whose stack is to be restored. `coroutine_transfer` passes both of these arguments to the start function and, additionally, returns the coroutine that invoked it, so returning back doesn't require any shared state. Because of this, the implementation avoids any kind of "global" state; it's all on the coroutine stack.

Therefore, with this coroutine library, we can nicely implement green threads too, but you'd need to provide additional guarantees/locking around coroutine_transfer. If you want to transfer a coroutine to another thread, you need to move the `coroutine_context` data structure (contains stack) to the new thread, and the new thread needs to call `coroutine_transfer`. The coroutine can simply call `coroutine_transfer` to return back, using either the argument `from` or the result of a previous `coroutine_transfer`.

So, the short answer is yes.

@ko1 I also finished the implementation for arm64, and hopefully can implement arm32 soon. I test on a Raspberry Pi :) I don't know about PowerPC; I don't have any hardware to test it on. Can we test in a VM?

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71987

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:87022] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (38 preceding siblings ...)
  2018-05-14  8:05 ` [ruby-core:87021] " samuel
@ 2018-05-14  8:14 ` samuel
  2018-06-01 23:21 ` [ruby-core:87350] " samuel
                   ` (8 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-05-14  8:14 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Here is the test which shows coroutine arguments and `coroutine_transfer` result.

https://github.com/kurocha/coroutine/blob/9bbd5e514c2e0f8f3c7c1c277aa6deb5e337a9c7/test/Coroutine/transfer.cpp#L17-L21

The reason for `COROUTINE` macro is that on win32, in order to avoid lots of stack manipulation, we need to use `__fastcall`.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-71988

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 

-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:87350] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (39 preceding siblings ...)
  2018-05-14  8:14 ` [ruby-core:87022] " samuel
@ 2018-06-01 23:21 ` samuel
  2018-06-02  1:07 ` [ruby-core:87351] " samuel
                   ` (7 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-06-01 23:21 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


I've made a new branch with the new implementation above.

It shows a slight performance improvement over `libcoro`.

Here is without the PR:

```
^_^ > ./build/bin/ruby ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.161763
execution time for 1000 messages:  14.018874
setup time for 10000 fibers:   1.572869
execution time for 1000 messages:  13.778874
setup time for 10000 fibers:   0.917040
execution time for 1000 messages:  13.942525
setup time for 10000 fibers:   1.616929
execution time for 1000 messages:  13.991115
setup time for 10000 fibers:   1.623587
execution time for 1000 messages:  14.281334
```

And here it is with the PR, on macOS (the same system used in previous benchmarks):

```
^_^ > ./build/bin/ruby ./fiber_benchmark.rb 10000 1000 
setup time for 10000 fibers:   0.160637
execution time for 1000 messages:   6.009332
setup time for 10000 fibers:   0.244175
execution time for 1000 messages:   6.246711
setup time for 10000 fibers:   0.242718
execution time for 1000 messages:   6.142166
setup time for 10000 fibers:   0.233410
execution time for 1000 messages:   5.994752
setup time for 10000 fibers:   0.288830
execution time for 1000 messages:   6.216617
```

Performance is about 2~2.5x faster depending on your analysis. Both creation and execution times are improved. But remember this is a micro-benchmark.
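
One way to arrive at a figure in that range is to compare the mean execution times of the two macOS runs above. The sketch below copies the ten "execution time" values from those listings; the `mean` helper is just for the example.

```ruby
# Execution times in seconds, copied from the two macOS listings above.
without_pr = [14.018874, 13.778874, 13.942525, 13.991115, 14.281334]
with_pr    = [6.009332, 6.246711, 6.142166, 5.994752, 6.216617]

# Arithmetic mean of a list of timings.
def mean(values)
  values.sum / values.size
end

speedup = mean(without_pr) / mean(with_pr)

puts format("mean without PR: %.3f s", mean(without_pr))
puts format("mean with PR:    %.3f s", mean(with_pr))
puts format("speedup:         %.2fx", speedup)
```

On these numbers the execution-time speedup comes out at roughly 2.3x. The setup times are noisier (the first run of each listing is much faster than the rest), which is one reason the result depends on how you analyse it.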

I was also interested in mjit performance:

Without PR, enabled mjit:

```
^_^ > ./build/bin/ruby --jit ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.172145
execution time for 1000 messages:  25.176702
setup time for 10000 fibers:   1.654751
execution time for 1000 messages:  14.729177
setup time for 10000 fibers:   1.016810
execution time for 1000 messages:  15.154141
setup time for 10000 fibers:   1.726305
execution time for 1000 messages:  14.797269
setup time for 10000 fibers:   2.025997
execution time for 1000 messages:  15.124753
```

With PR, enabled mjit:

```
x_x > ./build/bin/ruby --jit ./fiber_benchmark.rb 10000 1000
setup time for 10000 fibers:   0.179744
execution time for 1000 messages:  13.793318
setup time for 10000 fibers:   0.354717
execution time for 1000 messages:  10.664870
setup time for 10000 fibers:   0.308818
execution time for 1000 messages:   6.956352
setup time for 10000 fibers:   0.378568
execution time for 1000 messages:   6.553922
setup time for 10000 fibers:   0.295583
execution time for 1000 messages:   7.274086
```

We can see it still needs a bit of work.

I will try to isolate some interesting results from higher level frameworks.

The updated branch is here: https://github.com/ioquatix/ruby/tree/native-fiber

It only works on Darwin x86-64 at the moment, because the changes to autoconf do not cover all platforms yet. I'll fix this soon.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-72339

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 



-- 
https://bugs.ruby-lang.org/

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [ruby-core:87351] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
       [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
                   ` (40 preceding siblings ...)
  2018-06-01 23:21 ` [ruby-core:87350] " samuel
@ 2018-06-02  1:07 ` samuel
  2018-06-02  1:34 ` [ruby-core:87353] " samuel
                   ` (6 subsequent siblings)
  48 siblings, 0 replies; 49+ messages in thread
From: samuel @ 2018-06-02  1:07 UTC (permalink / raw
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


I fixed autoconf issues and built on Linux. The performance improvement was even more impressive.

```
koyoko% ruby --version
ruby 2.6.0dev (2018-06-01 native-fiber 63544) [x86_64-linux]
last_commit=Better support for amd64 platforms
koyoko% ruby ./fiber_benchmark.rb 
setup time for 1000 fibers:   0.007222
execution time for 10000 messages:   3.433891
setup time for 1000 fibers:   0.015365
execution time for 10000 messages:   3.177730
setup time for 1000 fibers:   0.010035
execution time for 10000 messages:   3.205329
setup time for 1000 fibers:   0.012063
execution time for 10000 messages:   2.968101
setup time for 1000 fibers:   0.010448
execution time for 10000 messages:   2.947756
koyoko% rvm use 2.6
Using /home/samuel/.rvm/gems/ruby-2.6.0-preview2
koyoko% ruby --version           
ruby 2.6.0preview2 (2018-05-31 trunk 63539) [x86_64-linux]
koyoko% ruby ./fiber_benchmark.rb
setup time for 1000 fibers:   0.006881
execution time for 10000 messages:  13.242779
setup time for 1000 fibers:   0.009869
execution time for 10000 messages:  13.468187
setup time for 1000 fibers:   0.013938
execution time for 10000 messages:  12.691139
setup time for 1000 fibers:   0.014423
execution time for 10000 messages:  12.005481
setup time for 1000 fibers:   0.013953
execution time for 10000 messages:  12.535145
```

@nobu do you mind confirming?

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-72343

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
-- 
https://bugs.ruby-lang.org/

* [ruby-core:87353] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
From: samuel @ 2018-06-02  1:34 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


Here is a more realistic benchmark, in which fiber context switching is only a tiny percentage of the actual run-time.

A brief summary of the benchmark: `async-http` uses an event-driven design based on stackful coroutines (fibers). Each request allocates a fiber, and each blocking operation (e.g. `read`) results in `Fiber.yield`. Once the IO is ready, `Fiber#resume` is called. So, for each request being processed, we expect several calls to `Fiber.yield`. `async` is optimistic: it first attempts the operation (e.g. `read`) and only yields if it results in `EWOULDBLOCK`, so in some cases (especially in synthetic benchmarks) some fiber scheduling may be elided.
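
To make the optimistic pattern concrete, here is a minimal sketch using only core Ruby. It is not the `async` gem's actual code: `optimistic_read` and the `stats` hash are hypothetical names for illustration, and a bare `IO.select` stands in for the reactor.

```ruby
# Hypothetical helper (not the async gem's API): attempt the read first and
# only suspend the current fiber when the kernel reports it would block.
def optimistic_read(io, size, stats)
  loop do
    result = io.read_nonblock(size, exception: false)
    return result unless result == :wait_readable

    # EWOULDBLOCK/EAGAIN: hand the IO to whoever resumed us and wait.
    stats[:yields] += 1
    Fiber.yield(io)
  end
end

stats = { yields: 0 }
reader, writer = IO.pipe

# Case 1: no data yet, so the fiber has to yield once.
slow = Fiber.new { optimistic_read(reader, 1024, stats) }
waiting_on = slow.resume          # returns the IO the fiber is blocked on
writer.write("hello")
IO.select([waiting_on])           # stand-in for the reactor's readiness check
slow_result = slow.resume         # the fiber finishes and returns the data

# Case 2: data is already buffered, so no yield happens at all.
writer.write("world")
fast = Fiber.new { optimistic_read(reader, 1024, stats) }
fast_result = fast.resume

puts "slow path: #{slow_result.inspect}, fast path: #{fast_result.inspect}"
puts "yields: #{stats[:yields]}"
```

The second case is the elided scheduling mentioned above: when the data is already there, the request completes without a single fiber switch back to the reactor.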

```
koyoko% rvm use 2.6
Using /home/samuel/.rvm/gems/ruby-2.6.0-preview2
koyoko% ruby --version
ruby 2.6.0preview2 (2018-05-31 trunk 63539) [x86_64-linux]
koyoko% bundle exec rake wrk
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    63.59us   77.52us   4.53ms   98.32%
    Req/Sec    16.68k     1.07k   18.32k    74.26%
  167544 requests in 10.10s, 14.54MB read
Requests/sec:  16589.33
Transfer/sec:      1.44MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    60.85us   34.26us   1.39ms   95.82%
    Req/Sec    16.82k     0.87k   18.49k    70.00%
  167424 requests in 10.00s, 14.53MB read
Requests/sec:  16742.19
Transfer/sec:      1.45MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    62.44us   54.34us   3.81ms   97.62%
    Req/Sec    16.62k     1.00k   18.09k    67.33%
  166959 requests in 10.10s, 14.49MB read
Requests/sec:  16530.76
Transfer/sec:      1.43MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    61.89us   32.53us 687.00us   94.29%
    Req/Sec    16.54k     1.20k   18.37k    67.33%
  166105 requests in 10.10s, 14.42MB read
Requests/sec:  16445.91
Transfer/sec:      1.43MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    60.90us   37.64us   1.70ms   95.89%
    Req/Sec    16.89k     1.22k   18.57k    72.28%
  169694 requests in 10.10s, 14.73MB read
Requests/sec:  16802.33
Transfer/sec:      1.46MB
```

Here is with the PR:

```
koyoko% rvm use ruby-head-fiber
Using /home/samuel/.rvm/gems/ruby-head-fiber
koyoko% ruby --version
ruby 2.6.0dev (2018-06-01 native-fiber 63544) [x86_64-linux]
last_commit=Better support for amd64 platforms
koyoko% bundle exec rake wrk
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    62.53us   73.11us   5.02ms   97.96%
    Req/Sec    16.80k     1.35k   19.46k    63.37%
  168863 requests in 10.10s, 14.65MB read
Requests/sec:  16719.77
Transfer/sec:      1.45MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    58.91us   35.19us   1.54ms   95.25%
    Req/Sec    17.49k     1.16k   19.42k    69.31%
  175719 requests in 10.10s, 15.25MB read
Requests/sec:  17399.00
Transfer/sec:      1.51MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    58.64us   45.92us   3.09ms   96.88%
    Req/Sec    17.72k     1.10k   19.42k    71.29%
  178027 requests in 10.10s, 15.45MB read
Requests/sec:  17626.32
Transfer/sec:      1.53MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    60.83us   33.93us   1.06ms   94.93%
    Req/Sec    16.86k     1.54k   19.36k    63.37%
  169307 requests in 10.10s, 14.69MB read
Requests/sec:  16764.19
Transfer/sec:      1.45MB
Running 10s test @ http://127.0.0.1:9294/
  1 threads and 1 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    59.07us   39.77us   2.17ms   95.97%
    Req/Sec    17.52k     0.98k   19.32k    66.34%
  176112 requests in 10.10s, 15.28MB read
Requests/sec:  17436.64
Transfer/sec:      1.51MB
```

This is actually better than I expected. I would say there is a practical improvement of about 5%. The result is very workload dependent, but I'm glad that I saw something.
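
As a rough cross-check, the `Requests/sec` lines from the two logs above can be compared directly. This is a sketch over the printed figures only; the helper names are made up for illustration.

```ruby
# Requests/sec from the five wrk runs in each log above.
trunk        = [16589.33, 16742.19, 16530.76, 16445.91, 16802.33]
native_fiber = [16719.77, 17399.00, 17626.32, 16764.19, 17436.64]

# Arithmetic mean of a list of samples.
def mean(samples)
  samples.sum / samples.size
end

# Relative change of `after` over `before`, in percent.
def percent_gain(before, after)
  (after / before - 1.0) * 100.0
end

mean_gain = percent_gain(mean(trunk), mean(native_fiber))
best_gain = percent_gain(trunk.max, native_fiber.max)
puts format("mean gain: %.1f%%, best-run gain: %.1f%%", mean_gain, best_gain)
```

On the figures as printed, the mean gain comes out nearer 3.4% and the best-run gain about 4.9%, so the estimate sits at the upper end of what these five runs show.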

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-72345

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
-- 
https://bugs.ruby-lang.org/

* [ruby-core:87363] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
From: samuel @ 2018-06-03  2:47 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).


I've made a short blog post about this PR: https://www.codeotaku.com/journal/2018-06/improving-ruby-fibers/index

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-72349

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
-- 
https://bugs.ruby-lang.org/

* [ruby-core:87435] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
From: git @ 2018-06-06 21:32 UTC
  To: ruby-core

Issue #14739 has been updated by cremes (Chuck Remes).

I'd like to link this to another open issue regarding Fiber migration between threads. https://bugs.ruby-lang.org/issues/13821

@ioquatix, please note in the above-referenced bug that I put in a link to the "boost" documentation regarding coroutine movement between threads. An explicit API to lock/unlock ownership of the fiber to a thread would probably resolve some of the complaints people raise about fiber migration. If it's explicit, more guarantees can be made. Default behavior should be the current behavior where Fibers cannot migrate.
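
For reference, the current default can be seen in a few lines: a fiber resumed from a thread other than its own raises `FiberError`. This is a minimal sketch of existing behaviour only; it does not show any lock/unlock API, which is a suggestion here and does not exist.

```ruby
fiber = Fiber.new do
  Fiber.yield :first
  :second
end

first = fiber.resume            # runs on the main thread

# Attempt to continue the same fiber from another thread.
error = Thread.new do
  begin
    fiber.resume
    nil
  rescue FiberError => e
    e
  end
end.value

puts "from main thread:  #{first.inspect}"
puts "from other thread: #{error.class}: #{error.message}"

second = fiber.resume           # still resumable from its owning thread
puts "back on main:      #{second.inspect}"
```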

Thanks for your work on this.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-72429

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
-- 
https://bugs.ruby-lang.org/

* [ruby-core:87438] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
From: samuel @ 2018-06-06 23:59 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

@cremes Thanks for your positive feedback and linking me to related issues.

The coroutine implementation was specifically designed to handle cross-thread migrations, in the sense that all the required state to yield/resume is passed as arguments/returns to/from the coroutine.

What this means is that no global/thread-local state is required and thus when moving a coroutine to another thread, there is almost no additional data to sync which is nice from an API point of view.
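
Ruby's own Fiber API has the same flavour at the language level: whatever crosses a switch is passed explicitly through `resume` and `Fiber.yield`. The sketch below is only an analogy in Ruby, not the C coroutine code from the PR; the `accumulator` name is made up for illustration.

```ruby
# All state that crosses a switch travels as an argument or a return value:
# `resume(value)` delivers `value` into the fiber, and `Fiber.yield(total)`
# hands `total` back to the caller. Nothing is stashed in a global.
accumulator = Fiber.new do |value|
  total = 0
  loop do
    total += value
    value = Fiber.yield(total)
  end
end

results = [1, 2, 3].map { |n| accumulator.resume(n) }
puts results.inspect
```

Because the running total lives on the fiber's own stack and each input arrives through `resume`, there is no shared state outside the fiber to keep in sync.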

The bigger challenge is how Ruby's Fiber is implemented, which makes this tricky. I would be happy to work towards it. I see the following path as viable:

- Merge these changes.
- Simplify the Fiber implementation by removing all the other implementations from `cont.c` and if necessary move these to the coroutine code (but ideally remove them).
- With the simplified Fiber code base, explore the overheads of Fiber creation/context switching and figure out the right places to put locking/checks (e.g. for locks being held, etc).

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-72432

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
-- 
https://bugs.ruby-lang.org/

* [ruby-core:88981] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
From: matz @ 2018-09-13  7:25 UTC
  To: ruby-core

Issue #14739 has been updated by matz (Yukihiro Matsumoto).

OK, it sounds reasonable. We will give you commit privilege.

Matz.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-74009

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
-- 
https://bugs.ruby-lang.org/

* [ruby-core:88983] [Ruby trunk Feature#14739] Improve fiber yield/resume performance
From: hsbt @ 2018-09-13  7:33 UTC
  To: ruby-core

Issue #14739 has been updated by hsbt (Hiroshi SHIBATA).

Hi, ioquatix.

I sent you an invitation to the Ruby core team. Please check it.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-74012

* Author: ioquatix (Samuel Williams)
* Status: Open
* Priority: Normal
* Assignee: 
* Target version: 
----------------------------------------
-- 
https://bugs.ruby-lang.org/

* [ruby-core:90429] [Ruby trunk Feature#14739][Closed] Improve fiber yield/resume performance
From: samuel @ 2018-12-11 23:55 UTC
  To: ruby-core

Issue #14739 has been updated by ioquatix (Samuel Williams).

Status changed from Open to Closed
Assignee set to ioquatix (Samuel Williams)
Target version set to 2.6

This is now implemented across: arm32, arm64, ppc64le, win32, win64, x86, amd64. Thanks to everyone who helped with this. This is a really awesome first step to improving Ruby Fiber performance.

----------------------------------------
Feature #14739: Improve fiber yield/resume performance
https://bugs.ruby-lang.org/issues/14739#change-75583

* Author: ioquatix (Samuel Williams)
* Status: Closed
* Priority: Normal
* Assignee: ioquatix (Samuel Williams)
* Target version: 2.6
----------------------------------------
-- 
https://bugs.ruby-lang.org/

Thread overview: 49+ messages
     [not found] <redmine.issue-14739.20180505005954@ruby-lang.org>
2018-05-05  0:59 ` [ruby-core:86879] [Ruby trunk Feature#14739] Improve fiber yield/resume performance samuel
2018-05-05  1:25 ` [ruby-core:86880] " shyouhei
2018-05-05  2:48 ` [ruby-core:86884] " samuel
2018-05-05  2:50 ` [ruby-core:86885] " samuel
2018-05-05  2:54 ` [ruby-core:86887] " samuel
2018-05-05  2:57 ` [ruby-core:86888] " samuel
2018-05-05  3:17 ` [ruby-core:86889] " samuel
2018-05-05  3:38 ` [ruby-core:86890] " samuel
2018-05-05  3:54 ` [ruby-core:86891] " samuel
2018-05-05  3:56 ` [ruby-core:86892] " samuel
2018-05-05  8:41 ` [ruby-core:86895] " v.ondruch
2018-05-05 10:13 ` [ruby-core:86896] " nobu
2018-05-05 10:14 ` [ruby-core:86897] " samuel
2018-05-05 10:18 ` [ruby-core:86898] " samuel
2018-05-05 10:48 ` [ruby-core:86900] " samuel
2018-05-05 10:59 ` [ruby-core:86902] " samuel
2018-05-05 11:32 ` [ruby-core:86904] " samuel
2018-05-05 12:00 ` [ruby-core:86905] " nobu
2018-05-05 12:03 ` [ruby-core:86906] " samuel
2018-05-05 12:17 ` [ruby-core:86907] " samuel
2018-05-05 12:30 ` [ruby-core:86908] " samuel
2018-05-05 12:55 ` [ruby-core:86909] " samuel
2018-05-05 20:28 ` [ruby-core:86913] " shevegen
2018-05-05 22:21 ` [ruby-core:86914] " samuel
2018-05-06  9:48 ` [ruby-core:86917] " shyouhei
2018-05-06 10:09 ` [ruby-core:86918] " samuel
2018-05-06 11:07 ` [ruby-core:86919] " samuel
2018-05-06 11:17 ` [ruby-core:86920] " samuel
2018-05-06 12:17 ` [ruby-core:86921] " samuel
2018-05-06 12:47 ` [ruby-core:86922] " samuel
2018-05-08  4:33 ` [ruby-core:86940] " samuel
2018-05-08  7:43 ` [ruby-core:86945] " ko1
2018-05-08 10:41 ` [ruby-core:86946] " duerst
2018-05-09 14:23 ` [ruby-core:86956] " samuel
2018-05-09 23:34 ` [ruby-core:86958] " samuel
2018-05-10  4:28 ` [ruby-core:86961] " samuel
2018-05-14  7:40 ` [ruby-core:87018] " sam.saffron
2018-05-14  7:49 ` [ruby-core:87019] " ko1
2018-05-14  8:05 ` [ruby-core:87021] " samuel
2018-05-14  8:14 ` [ruby-core:87022] " samuel
2018-06-01 23:21 ` [ruby-core:87350] " samuel
2018-06-02  1:07 ` [ruby-core:87351] " samuel
2018-06-02  1:34 ` [ruby-core:87353] " samuel
2018-06-03  2:47 ` [ruby-core:87363] " samuel
2018-06-06 21:32 ` [ruby-core:87435] " git
2018-06-06 23:59 ` [ruby-core:87438] " samuel
2018-09-13  7:25 ` [ruby-core:88981] " matz
2018-09-13  7:33 ` [ruby-core:88983] " hsbt
2018-12-11 23:55 ` [ruby-core:90429] [Ruby trunk Feature#14739][Closed] " samuel