ruby-core@ruby-lang.org archive (unofficial mirror)
* [ruby-core:100401] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers
@ 2020-10-15 10:04 ciconia
  2020-10-15 11:13 ` [ruby-core:100402] " samuel
                   ` (12 more replies)
  0 siblings, 13 replies; 14+ messages in thread
From: ciconia @ 2020-10-15 10:04 UTC
  To: ruby-core

Issue #17263 has been reported by ciconia (Sharon Rosner).

----------------------------------------
Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers
https://bugs.ruby-lang.org/issues/17263

* Author: ciconia (Sharon Rosner)
* Status: Open
* Priority: Normal
* ruby -v: 2.7.1
* Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN
----------------------------------------
I'm working on developing [Polyphony](), a Ruby gem for writing
highly-concurrent Ruby programs with fibers. In the course of my work I have
come up against two problems using Ruby fibers:

1. Fiber context switching performance seems to degrade as the number of fibers
   increases. This happens both with `Fiber#transfer` and with
   `Fiber#resume/Fiber.yield`.
2. The number of concurrent fibers that can exist at any time seems to be
   limited. Once a certain number is reached (on my system this seems to be
   31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the
   message `can't set a guard page: Cannot allocate memory`. This is not due to
   RAM being saturated. With 10000 fibers, my test program hovers at around 150MB
   RSS (on Ruby 2.7.1).
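One thing that may be worth checking for the second problem, though nothing in
this report establishes it as the cause: on Linux the kernel caps the number of
memory mappings per process (`vm.max_map_count`), and `mprotect` can fail with
`ENOMEM` when that cap is reached, independently of how much RAM is free. The
sketch below is Linux-only and only reads two `/proc` files; the helper names
`read_int` and `mapping_stats` are made up for illustration.

```ruby
# Hypothetical helper: read an integer from a file, or nil if unavailable.
def read_int(path)
  Integer(File.read(path).strip) if File.readable?(path)
rescue ArgumentError, SystemCallError
  nil
end

# Hypothetical helper: report the per-process mapping limit and the number of
# mappings the current process holds right now. Values are nil where /proc is
# not available (for example on macOS).
def mapping_stats
  maps = '/proc/self/maps'
  {
    max_map_count: read_int('/proc/sys/vm/max_map_count'),
    current_mappings: File.readable?(maps) ? File.foreach(maps).count : nil
  }
end

p mapping_stats
```

Printing `mapping_stats` before and after creating the fibers would show
whether the mapping count grows with the number of fibers and how close it gets
to the limit.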

Here's a program for testing the performance of `Fiber#transfer`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  first = nil
  last = nil
  supervisor = Fiber.current
  num_fibers.times do
    fiber = Fiber.new do
      loop do
        count += 1
        if count == 1_000_000
          supervisor.transfer
        else
          Fiber.current.next.transfer
        end
      end
    end
    first ||= fiber
    last.next = fiber if last
    last = fiber
  end

  last.next = first
  
  t0 = Time.now
  first.transfer
  elapsed = Time.now - t0

  rss = `ps -o rss= -p #{Process.pid}`.to_i

  puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.6.5 I'm getting:

```
fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187
fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736
fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482
Stopped at 22718 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With Ruby 2.7.1 I'm getting:

```
fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508
fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543
fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966
Stopped at 31744 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

With ruby-head I get an almost identical result to that of 2.7.1.

As you can see, the performance degradation is similar in all three versions
of Ruby, going from ~3.4M context switches per second for 100 fibers to less
than 1M context switches per second for 10000 fibers. Running with 100000 fibers
fails to complete.
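To put the rates in terms of time per switch, the Ruby 2.7.1 figures above can
be inverted. This is plain arithmetic on the reported numbers; the constant and
method names are made up for illustration.

```ruby
# Rates reported above for Ruby 2.7.1, in context switches per second,
# keyed by the number of fibers.
RATES_2_7_1 = {
  100    => 3_443_916.967616508,
  1_000  => 2_333_315.3862491543,
  10_000 => 916_772.1008060966
}.freeze

# Convert a rate in switches per second to nanoseconds per switch.
def ns_per_switch(rate)
  1_000_000_000.0 / rate
end

RATES_2_7_1.each do |fibers, rate|
  puts format('fibers: %6d  ~%.0f ns per switch', fibers, ns_per_switch(rate))
end
```

That works out to roughly 290 ns per switch with 100 fibers, 430 ns with 1000
fibers, and 1.1 µs with 10000 fibers.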

Here's a program for testing the performance of `Fiber#resume/Fiber.yield`:

```ruby
# frozen_string_literal: true

require 'fiber'

class Fiber
  attr_accessor :next
end

# This program shows how the performance of Fiber#resume/Fiber.yield degrades
# as the fiber count increases

def run(num_fibers)
  count = 0

  GC.start
  GC.disable

  fibers = []
  num_fibers.times do
    fibers << Fiber.new { loop { Fiber.yield } }
  end

  t0 = Time.now

  while count < 1000000
    fibers.each do |f|
      count += 1
      f.resume
    end
  end

  elapsed = Time.now - t0

  puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}"
rescue Exception => e
  puts "Stopped at #{count} fibers"
  p e
end

run(100)
run(1000)
run(10000)
run(100000)
```

With Ruby 2.7.1 I'm getting the following output:

```
fibers: 100 count: 1000000 rate: 3048230.049946255
fibers: 1000 count: 1000000 rate: 2362235.6455160403
fibers: 10000 count: 1000000 rate: 950251.7621725246
Stopped at 21745 fibers
#<FiberError: can't set a guard page: Cannot allocate memory>
```

As I understand it, at least in theory, switching between fibers should have
a constant cost in terms of CPU cycles, irrespective of the number of fibers
currently existing in memory. I am completely ignorant of the implementation
details of Ruby fibers, so at least for now I don't have any idea where this
problem is coming from.
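One way to narrow this down might be to separate the number of fibers that
exist from the number that are actually switched between. The sketch below is
only an assumption about a useful experiment, not a diagnosis: it starts
`total` fibers so each has been run once, then times switches among just the
first `active` of them. The method name `switch_rate` and its parameters are
made up for illustration.

```ruby
# Hypothetical experiment: create and start `total` fibers, but time
# resume/yield round trips only among the first `active` fibers.
# Returns the measured rate in resumes per second.
def switch_rate(total:, active:, switches: 200_000)
  fibers = Array.new(total) { Fiber.new { loop { Fiber.yield } } }

  # Run every fiber once so that first-resume setup is not part of the timing.
  fibers.each(&:resume)

  hot = fibers.first(active)
  count = 0
  t0 = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  while count < switches
    hot.each do |f|
      f.resume
      count += 1
    end
  end
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - t0

  count / elapsed
end
```

Comparing `switch_rate(total: 100, active: 100)` with
`switch_rate(total: 10_000, active: 100)` would show whether the rate depends on
how many fibers exist or on how many distinct fibers are touched in each pass.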



-- 
https://bugs.ruby-lang.org/


end of thread, other threads:[~2023-09-18  8:21 UTC | newest]

Thread overview: 14+ messages
2020-10-15 10:04 [ruby-core:100401] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers ciconia
2020-10-15 11:13 ` [ruby-core:100402] " samuel
2020-10-15 11:25 ` [ruby-core:100403] " samuel
2020-10-16  1:58 ` [ruby-core:100412] " samuel
2020-10-16  6:19 ` [ruby-core:100418] " samuel
2020-10-20 20:40 ` [ruby-core:100453] " ciconia
2020-10-22 10:43 ` [ruby-core:100499] " samuel
2022-01-31 14:47 ` [ruby-core:107390] " rmosolgo (Robert Mosolgo)
2023-08-25  0:13 ` [ruby-core:114519] " ioquatix (Samuel Williams) via ruby-core
2023-08-25  0:13 ` [ruby-core:114520] " ioquatix (Samuel Williams) via ruby-core
2023-08-25  3:16 ` [ruby-core:114523] " ioquatix (Samuel Williams) via ruby-core
2023-08-25  3:39 ` [ruby-core:114524] " ioquatix (Samuel Williams) via ruby-core
2023-08-25  4:28 ` [ruby-core:114525] " ioquatix (Samuel Williams) via ruby-core
2023-09-18  8:21 ` [ruby-core:114794] " kjtsanaktsidis (KJ Tsanaktsidis) via ruby-core
