Date: Wed, 2 May 2018 07:54:22 +0000
From: Eric Wong
To: ruby-core@ruby-lang.org
Message-ID: <20180502075422.GA12737@dcvr>
Subject: [ruby-core:86826] Re: [Ruby trunk Feature#13618] [PATCH] auto fiber schedule for rb_wait_for_single_fd and rb_waitpid

samuel@oriontransfer.org wrote:
> So, it seems like your design has unavoidable contention (and
> therefore latency) because you need to send events between
> threads, which is what I expected.  However, you argue this
> overhead should be small.  I'd like to see actual numbers TBH.

Any contention is completely masked by the GVL at the moment.
When we get Guilds or get rid of the GVL, of course I'll try
per-core schedulers.  As it stands, per-Thread schedulers will
be a disaster for systems with hundreds of native Threads
(because those threads are needed for servicing hundreds of
slow disks).

> And as you state, it's not possible (nor desirable IMHO) to
> move fibers between threads.  Yes, head-of-line blocking might
> be an issue.  Moving stacks between CPU cores is not without
> its own set of overheads.  If you have serious issues with
> head-of-line blocking it's more likely to be a problem with
> your code (I've directly experienced this and the result was:
> https://github.com/socketry/async-http/blob/ca655aa190ed7a89b601e267906359793271ec8a/lib/async/http/protocol/http11.rb#L93).

Fwiw, yahns makes large performance sacrifices(*) to avoid HOL
blocking.  According to Go user reports, being able to move
goroutines between native threads is a big feature for them.
But I don't think it's possible with the current Ruby C API,
anyways :<

> It would be interesting to see exactly how much overhead is
> incurred using a shared epoll.  I know from my testing that

It's high for yahns because the default max_events for
epoll_wait is only 1.

(*) Throughput should increase with something reasonable like
64, but you will lose HOL blocking resistance.  Auto-fiber
starts at max_events 8 and auto-increases if needed.
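To make the max_events trade-off concrete, here is a minimal sketch of a
shared-epoll dispatch loop in plain C.  It is not yahns or auto-fiber
code; MAX_EVENTS and handle_ready() are made-up names for illustration.
A larger maxevents lets one epoll_wait call return a bigger batch of
ready FDs per syscall, but working through that batch serially is
exactly where head-of-line blocking can creep back in:

```
#include <sys/epoll.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_EVENTS 8	/* hypothetical: yahns defaults to 1, auto-fiber starts at 8 */

/* placeholder for whatever resumes the waiter blocked on this FD */
extern void handle_ready(int fd);

void wait_loop(int epfd)
{
	struct epoll_event events[MAX_EVENTS];

	for (;;) {
		/* one syscall may return up to MAX_EVENTS ready FDs */
		int n = epoll_wait(epfd, events, MAX_EVENTS, -1);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			perror("epoll_wait");
			exit(1);
		}

		/*
		 * the batch is processed serially here, so a slow
		 * handler for events[0] delays events[1..n-1]
		 * (head-of-line blocking)
		 */
		for (int i = 0; i < n; i++)
			handle_ready(events[i].data.fd);
	}
}
```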
Fwiw, HOL blocking considerations for auto-fiber and yahns are
completely different, so I don't see a downside to increasing
max_events with auto-fiber aside from memory use.  With
Transfer-Encoding: chunked responses, yahns prioritizes latency
of each individual chunk over overall throughput, so it loses
throughput there, too.  Maybe there can be an option to change
that *shrug*

> It's prepared but not added to the reactor here (it's lazy): https://github.com/kurocha/async-http/blob/eff77f61f7a85a3ac21f7a8f51ba07f069063cbe/source/Async/HTTP/V1/Protocol.cpp#L34
>
> By calling wait, the fd is inserted into the reactor/selector: https://github.com/kurocha/async/blob/2edef4d6990259cc60cc307b6de2ab35b97560f1/source/Async/Protocol/Buffer.cpp#L254

> The cost of adding/removing FDs is effectively constant time
> given an arbitrary number of reads or writes.  We shouldn't
> preclude implementing this model in Ruby if it makes sense.  As
> you say, the overhead of the system call is pretty minimal.

Right, auto-fiber in Ruby also does a lazy add (and the
scheduler itself is lazily created).

> Now that you mention it, I'd like to compare EPOLLET vs
> EPOLLONESHOT.  It's an interesting design choice and it makes a
> lot of sense if you are doing only one read in the context of
> a blocking operation.

To me, they're drastically different in terms of programming
style.

ET isn't too different from level-triggered (LT), but I'd say
it's the trickiest of the bunch: avoiding head-of-line blocking
with ET means keeping track of undrained buffers (from clients
which pipeline aggressively) and not losing them when you
temporarily yield to other clients.

I suppose edge and level triggering both invite a "reactive"
design and inverted control flow.

The main thing which bothers me about both ET and LT is that you
have to remember to disable/reenable events (to avoid unfairness
or DoS).  Under ideal conditions (clients not trying to DoS or
be unfair to other clients), ET can probably be fastest.  It's
just totally unrealistic to expect ideal conditions.

So I strongly prefer one-shot, because you don't have to deal
with disabling events.  This is especially useful when there's
aggressive pipelining going on (e.g. a client sending you
requests quickly, yet reading responses slowly to fill up your
output buffers).  One-shot makes it a queue, so it also invites
sharing across threads.

The way auto-fiber uses one-shot doesn't invert the control flow
at all (though you could; yahns does that).  Instead, auto-fiber
feels like the Linux kernel scheduler API:

```
/* get stuck on EAGAIN or similar, can't proceed */
add_wait_queue_and_register(&foo); /* list_add + epoll_ctl */
while (!foo->ready) {
	/*
	 * Run epoll_wait and let other threads/fibers run.
	 * Since we registered foo with the scheduler, it can
	 * become ready at any time while schedule() is running.
	 */
	schedule();
}
remove_wait_queue(&foo); /* list_del via rb_ensure */

/* resume whatever this fiber was doing */
```
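For contrast with the wait-queue sketch above, here is a minimal,
hypothetical illustration of the one-shot discipline in plain C (again,
not auto-fiber's actual code).  With EPOLLONESHOT the kernel disarms
the FD after reporting one event, so there is nothing to remember to
disable for fairness; the waiter just re-arms with EPOLL_CTL_MOD the
next time it hits EAGAIN:

```
#include <sys/epoll.h>

/* register an FD one-shot: it fires at most once until re-armed */
int oneshot_add(int epfd, int fd, void *waiter)
{
	struct epoll_event ev;

	ev.events = EPOLLIN | EPOLLONESHOT;
	ev.data.ptr = waiter;	/* e.g. pointer back to the blocked fiber */

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * After the event fires and the waiter blocks on EAGAIN again,
 * re-arm with EPOLL_CTL_MOD.  Until then the FD generates no
 * further events, which is what makes readiness behave like a
 * queue that multiple threads can safely pull from.
 */
int oneshot_rearm(int epfd, int fd, void *waiter)
{
	struct epoll_event ev;

	ev.events = EPOLLIN | EPOLLONESHOT;
	ev.data.ptr = waiter;

	return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```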