From: samuel@oriontransfer.net
To: ruby-core@ruby-lang.org
Subject: [ruby-core:100499] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
Date: Thu, 22 Oct 2020 10:43:04 +0000 (UTC)
Message-ID: <redmine.journal-88119.20201022104303.15231@ruby-lang.org>
In-Reply-To: redmine.issue-17263.20201015100437.15231@ruby-lang.org
Issue #17263 has been updated by ioquatix (Samuel Williams).
You do not need to preface your questions like that.
The fiber stack is a fixed size, but it's allocated using virtual memory and it's "released" using `madvise(MADV_DONTNEED)`, which allows the kernel to reclaim those pages. So whether you allocate fibers with a 1MiB stack or a 128MiB stack, the only difference is the address space consumed, which is almost free; the physical memory cost is the actual amount of stack used, rounded up to the nearest page. One thing to be careful of is ensuring the GC knows the correct extent of the stack; otherwise it would page in the entire stack (mostly zeros).
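To make the page-rounding point concrete, here is a small arithmetic sketch. The helper name `resident_stack_bytes` and the 4 KiB page size are assumptions for illustration only; the real page size is platform-dependent, and this is not how Ruby itself accounts for stack memory.

```ruby
# Assumption: 4 KiB pages. Check `getconf PAGESIZE` on the target system.
PAGE_SIZE = 4096

# Hypothetical helper: physical memory touched by a stack that has used
# `used_bytes`, rounded up to a whole number of pages.
def resident_stack_bytes(used_bytes, page_size = PAGE_SIZE)
  pages = (used_bytes + page_size - 1) / page_size # integer ceiling division
  pages * page_size
end

# The reserved (virtual) size does not enter the calculation at all:
# a 1 MiB and a 128 MiB reservation cost the same once 10_000 bytes are used.
[1 << 20, 128 << 20].each do |reserved|
  resident = resident_stack_bytes(10_000)
  puts "reserved: #{reserved} bytes, resident: #{resident} bytes"
end
```

Under these assumptions, both reservations report the same resident figure, which is the point being made about address space versus memory actually used.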
----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263#change-88119
* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing
highly-concurrent Ruby programs with fibers. In the course of my work I have
come up against two problems using Ruby fibers:
1. Fiber context switching performance seems to degrade as the number of fibers
is increased. This is both with `Fiber#transfer` and
`Fiber#resume/Fiber.yield`.
2. The number of concurrent fibers that can exist at any time seems to be
limited. Once a certain number is reached (on my system this seems to be
31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the
message `can't set a guard page: Cannot allocate memory`. This is not due to
RAM being saturated. With 10000 fibers, my test program hovers at around 150MB
RSS (on Ruby 2.7.1).
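One possible explanation for the second problem, assuming the reporter is on Linux, is the per-process limit on memory mappings (`vm.max_map_count`) rather than RAM: a guard page set with `mprotect` typically splits a stack mapping in two, so each fiber may consume more than one mapping. The sketch below only inspects the limit and the current usage. The method name `mapping_stats` is hypothetical, and the `/proc` paths exist only on Linux.

```ruby
# Linux-only paths; on other platforms this sketch simply reports nothing.
MAX_MAP_COUNT_PATH = '/proc/sys/vm/max_map_count'
SELF_MAPS_PATH = '/proc/self/maps'

# Hypothetical helper: returns the mapping limit and the number of mappings
# currently held by this process, or nil when /proc is not available.
def mapping_stats
  return nil unless File.readable?(MAX_MAP_COUNT_PATH) && File.readable?(SELF_MAPS_PATH)

  limit = Integer(File.read(MAX_MAP_COUNT_PATH).strip)
  in_use = File.foreach(SELF_MAPS_PATH).count # one line per mapping
  { limit: limit, in_use: in_use }
end

stats = mapping_stats
if stats
  puts "mappings in use: #{stats[:in_use]} of #{stats[:limit]}"
else
  puts 'mapping statistics are not available on this platform'
end
```

Calling `mapping_stats` before and after creating a batch of fibers would show whether the mapping count grows with the fiber count on a given system.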
Here's a program for testing the performance of `Fiber#transfer`:
```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current

  # Build a ring of fibers; each one transfers control to the next.
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  # Close the ring.
  last.next = first

  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```
With Ruby 2.6.5 I'm getting:
```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```
With Ruby 2.7.1 I'm getting:
```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```
With ruby-head I get an almost identical result to that of 2.7.1.
As you can see, the performance degradation is similar in all three versions
of Ruby, going from ~3.4M context switches per second for 100 fibers to less
than 1M context switches per second for 10000 fibers. Running with 100000 fibers
fails to complete.
Here's a program for testing the performance of `Fiber#resume/Fiber.yield`:
```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```
With Ruby 2.7.1 I'm getting the following output:
```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```
As I understand it, theoretically at least, switching between fibers should have
a constant cost in terms of CPU cycles, irrespective of the number of fibers
currently existing in memory. I am completely ignorant of the implementation
details of Ruby fibers, so at least for now I don't have any idea where this
problem is coming from.
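The constant-cost expectation can be checked with a variant of the measurement that uses a monotonic clock instead of `Time.now`, which is less sensitive to wall-clock adjustments. This is only a sketch: the method name `switch_rate` and the chosen fiber counts are assumptions, and each `resume` counted here involves a switch into the fiber and a switch back out via `Fiber.yield`.

```ruby
# Hypothetical helper: resumes `num_fibers` fibers round-robin until
# `switches` resumes have happened, and reports the resume rate.
def switch_rate(num_fibers, switches)
  fibers = Array.new(num_fibers) { Fiber.new { loop { Fiber.yield } } }
  count = 0

  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while count < switches
    fibers.each do |f|
      f.resume
      count += 1
      break if count >= switches # stop exactly at the requested total
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

  { fibers: num_fibers, count: count, rate: count / elapsed }
end

# Small fiber counts keep this sketch well below any per-process limits.
[10, 100, 1_000].each do |n|
  result = switch_rate(n, 100_000)
  puts "fibers: #{result[:fibers]} count: #{result[:count]} rate: #{result[:rate]}"
end
```

If the cost per switch were truly constant, the reported rate would stay roughly flat as the fiber count grows; a falling rate reproduces the behaviour described in this issue.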
--
https://bugs.ruby-lang.org/