From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED, SPF_HELO_NONE,SPF_PASS,UNPARSEABLE_RELAY shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from neon.ruby-lang.org (neon.ruby-lang.org [221.186.184.75]) by dcvr.yhbt.net (Postfix) with ESMTP id A84AB1F4B4 for ; Thu, 22 Oct 2020 10:43:10 +0000 (UTC) Received: from neon.ruby-lang.org (localhost [IPv6:::1]) by neon.ruby-lang.org (Postfix) with ESMTP id AC1E7120B79; Thu, 22 Oct 2020 19:42:31 +0900 (JST) Received: from xtrwkhkc.outbound-mail.sendgrid.net (xtrwkhkc.outbound-mail.sendgrid.net [167.89.16.28]) by neon.ruby-lang.org (Postfix) with ESMTPS id 01202120B78 for ; Thu, 22 Oct 2020 19:42:28 +0900 (JST) Received: by filterdrecv-p3las1-bf7bc68d5-sgdhm with SMTP id filterdrecv-p3las1-bf7bc68d5-sgdhm-18-5F916238-35 2020-10-22 10:43:04.68105504 +0000 UTC m=+60239.702524602 Received: from herokuapp.com (unknown) by ismtpd0060p1mdw1.sendgrid.net (SG) with ESMTP id 2ZbgUsWTSKG7KPvwJ4pduQ for ; Thu, 22 Oct 2020 10:43:04.541 +0000 (UTC) Date: Thu, 22 Oct 2020 10:43:04 +0000 (UTC) From: samuel@oriontransfer.net Message-ID: References: Mime-Version: 1.0 X-Redmine-MailingListIntegration-Message-Ids: 76369 X-Redmine-Project: ruby-master X-Redmine-Issue-Tracker: Bug X-Redmine-Issue-Id: 17263 X-Redmine-Issue-Author: ciconia X-Redmine-Sender: ioquatix X-Mailer: Redmine X-Redmine-Host: bugs.ruby-lang.org X-Redmine-Site: Ruby Issue Tracking System X-Auto-Response-Suppress: All Auto-Submitted: auto-generated X-SG-EID: =?us-ascii?Q?cjxb6GWHefMLoR50bkJBcGo6DRiDl=2FNYcMZdY+Wj30SqqmCNlX+U9=2F6Ze26uzH?= =?us-ascii?Q?5UtiMJufeL+2tvjkT1ttoaJyYBPqt2jQmgSw79y?= =?us-ascii?Q?H6nV6=2FnULxzvZ3RdZrUtcYKs=2FjYlaNcVh0tf4PQ?= =?us-ascii?Q?+QzLHcKGOoHT8+q4Fjw8QNSK48ufHUpEaM2dFmd?= =?us-ascii?Q?FtQCzGlM97820k123tcrERFx51b=2FosW3oFM+6CU?= =?us-ascii?Q?9lDUkXtistxh9aZ2U=3D?= To: ruby-core@ruby-lang.org X-Entity-ID: b/2+PoftWZ6GuOu3b0IycA== X-ML-Name: ruby-core X-Mail-Count: 100499 Subject: [ruby-core:100499] [Ruby master Bug#17263] Fiber context switch degrades with number of fibers, limit on number of fibers X-BeenThere: ruby-core@ruby-lang.org X-Mailman-Version: 2.1.15 Precedence: list Reply-To: Ruby developers List-Id: Ruby developers List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Errors-To: ruby-core-bounces@ruby-lang.org Sender: "ruby-core" Issue #17263 has been updated by ioquatix (Samuel Williams). You do not need to preface your questions like that. The fiber stack is a fixed size, but it's allocated using virtual memory and it's "released" using `madvise(DONT_NEED)` which allows the kernel to release those pages. So whether you allocate fibers with a 1MiB stack or a 128MiB stack, the only difference is address space consumed which is almost free and the actual amount of stack used, rounded up to the nearest page. One part to be careful of is to ensure the GC knows the correct extent of the stack, otherwise it would page in the entire stack (mostly zeros). ---------------------------------------- Bug #17263: Fiber context switch degrades with number of fibers, limit on number of fibers https://bugs.ruby-lang.org/issues/17263#change-88119 * Author: ciconia (Sharon Rosner) * Status: Open * Priority: Normal * ruby -v: 2.7.1 * Backport: 2.5: UNKNOWN, 2.6: UNKNOWN, 2.7: UNKNOWN ---------------------------------------- I'm working on developing [Polyphony](https://github.com/digital-fabric/polyphony), a Ruby gem for writing highly-concurrent Ruby programs with fibers. In the course of my work I have come up against two problems using Ruby fibers: 1. Fiber context switching performance seem to degrade as the number of fibers is increased. This is both with `Fiber#transfer` and `Fiber#resume/Fiber.yield`. 2. The number of concurrent fibers that can exist at any time seems to be limited. Once a certain number is reached (on my system this seems to be 31744 fibers), calling `Fiber#transfer` will raise a `FiberError` with the message `can't set a guard page: Cannot allocate memory`. This is not due to RAM being saturated. With 10000 fibers, my test program hovers at around 150MB RSS (on Ruby 2.7.1). Here's a program for testing the performance of `Fiber#transfer`: ```ruby # frozen_string_literal: true require 'fiber' class Fiber attr_accessor :next end def run(num_fibers) count = 0 GC.start GC.disable first = nil last = nil supervisor = Fiber.current num_fibers.times do fiber = Fiber.new do loop do count += 1 if count == 1_000_000 supervisor.transfer else Fiber.current.next.transfer end end end first ||= fiber last.next = fiber if last last = fiber end last.next = first t0 = Time.now first.transfer elapsed = Time.now - t0 rss = `ps -o rss= -p #{Process.pid}`.to_i puts "fibers: #{num_fibers} rss: #{rss} count: #{count} rate: #{count / elapsed}" rescue Exception => e puts "Stopped at #{count} fibers" p e end run(100) run(1000) run(10000) run(100000) ``` With Ruby 2.6.5 I'm getting: ``` fibers: 100 rss: 23212 count: 1000000 rate: 3357675.1688139187 fibers: 1000 rss: 31292 count: 1000000 rate: 2455537.056439736 fibers: 10000 rss: 127388 count: 1000000 rate: 954251.1674325482 Stopped at 22718 fibers # ``` With Ruby 2.7.1 I'm getting: ``` fibers: 100 rss: 23324 count: 1000000 rate: 3443916.967616508 fibers: 1000 rss: 34676 count: 1000000 rate: 2333315.3862491543 fibers: 10000 rss: 151364 count: 1000000 rate: 916772.1008060966 Stopped at 31744 fibers # ``` With ruby-head I get an almost identical result to that of 2.7.1. As you can see, the performance degradation is similar in all the three versions of Ruby, going from ~3.4M context switches per second for 100 fibers to less then 1M context switches per second for 10000 fibers. Running with 100000 fibers fails to complete. Here's a program for testing the performance of `Fiber#resume/Fiber.yield`: ```ruby # frozen_string_literal: true require 'fiber' class Fiber attr_accessor :next end # This program shows how the performance of Fiber.transfer degrades as the fiber # count increases def run(num_fibers) count = 0 GC.start GC.disable fibers = [] num_fibers.times do fibers << Fiber.new { loop { Fiber.yield } } end t0 = Time.now while count < 1000000 fibers.each do |f| count += 1 f.resume end end elapsed = Time.now - t0 puts "fibers: #{num_fibers} count: #{count} rate: #{count / elapsed}" rescue Exception => e puts "Stopped at #{count} fibers" p e end run(100) run(1000) run(10000) run(100000) ``` With Ruby 2.7.1 I'm getting the following output: ``` fibers: 100 count: 1000000 rate: 3048230.049946255 fibers: 1000 count: 1000000 rate: 2362235.6455160403 fibers: 10000 count: 1000000 rate: 950251.7621725246 Stopped at 21745 fibers # ``` As I understand it, theoretically at least switching between fibers should have a constant cost in terms of CPU cycles, irrespective of the number of fibers currently existing in memory. I am completely ignorant the implementation details of Ruby fibers, so at least for now I don't have any idea where this problem is coming from. -- https://bugs.ruby-lang.org/