From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 30 Apr 2018 10:25:26 +0000
From: Eric Wong
To: ruby-core@ruby-lang.org
Message-ID: <20180430102526.GA20199@dcvr>
Subject: [ruby-core:86774] Re: [Ruby trunk Feature#13618] [PATCH] auto fiber schedule for rb_wait_for_single_fd and rb_waitpid
List-Id: Ruby developers

samuel@oriontransfer.org wrote:
> > Using a background thread is your mistake.
>
> Don't assume I made this design.  It was made by other people.
> I merely tested it because I was interested in the performance
> overhead.  And yes, there is significant overhead.
> And let's be generous: people who invested their time and effort to
> make such a thing for Ruby deserve our appreciation.  Knowing that
> the path they chose to explore was not good is equally important.

The problem I have with existing reactor patterns is that threads are
an afterthought.  They should not be.

> > Multiple foreground threads safely use epoll_wait or kevent on the
> > SAME epoll or kqueue fd.  It's perfectly safe to do that.
>
> Sure, that's reasonable.  If you want to share those data structures
> across threads, you can dispatch your work in different threads too.
> I liked what you did with https://yhbt.net/yahns/yahns.txt and it's
> an interesting design.
>
> The biggest single benefit of this design is that blocking
> operations in an individual "task" or "worker" won't block any other
> "task" or "worker", up to the limit of the thread pool you allocate,
> at which point things WILL start causing blocking.  So you can't
> avoid blocking even with this design.

Of course everything blocks at some point when things get overloaded.
The difference is there's no head-of-line blocking in yahns, because
sockets can migrate to an idle thread.  Auto-fiber can't avoid
head-of-line blocking right now, because Ruby Fibers can't migrate
across threads (that's a separate problem).

> The major downside of such a design is that workers have to assume
> they could be running on different threads, so shared data
> structures need locking/will cause contention.  In addition, the
> current state of the Ruby GIL means that any such design will
> generally have poor performance.

No, you don't need locking for read/write ops if you use
EV_ONESHOT/EPOLLONESHOT.  libev and typical reactor-pattern designs
are not built with one-shot in mind, so they're stuck using
level-triggering and rely on locking.  Only FD
allocation/deallocation requires locking (the kernel needs locking
there, too).
> So, I think it's safe to say that, in an end-to-end test, the GIL is
> a MAJOR performance issue.  Feel free to correct me if you think I'm
> wrong.  I'm sure this story is more complicated than the above
> benchmarks, but I felt like it was a useful comparison.

The GVL is a major performance issue if your bottleneck is the CPU.
It is not a major problem when my bottleneck is network I/O or
high-latency disks (I have systems with dozens or hundreds of disks).

> Blocking operations that are causing performance issues should use a
> thread pool.  For things like launching an external process or
> syscall, and waiting for it to finish, threads are ideal.

Launching an external process and calling waitpid does not benefit
from native threads.  Again, native_thread_count >= disk_count is a
huge thing I've relied on with Ruby for years now, so using only one
native thread is totally wrong for my use case when I have
dozens/hundreds of slow disks.

> There is some elegance in the design you propose.  Your proposal
> requires some kind of "Task" or "Worker", which is a fiber that will
> yield when IO would block and resume when IO is ready.  Based on
> what you've said, do you mind explaining whether the "Task" or
> "Worker" is resumed on the same thread or a different one?  Do you
> maintain a thread pool?

The use of threads or a thread pool remains up to the Ruby user.
There are no extra fibers or native threads created behind users'
backs; that would be a waste of memory.  It uses the "idle time" of
any available threads (including the main thread) to do scheduling
work.  (Current Ruby has provisions for an internal thread cache for
Thread.new, but it's orthogonal to this issue and has been around for
a decade in a buggy, never-enabled state.)

> If it's always resumed on the same thread, how do you manage that?
> e.g. perhaps you can show me how the following would work:

Every thread has a FIFO run-queue (th->afrunq or th->runq, depending
on which version you look at)....
> If you follow this model, the thread must be calling into `epoll` or
> `kqueue` in order to resume work.  But based on what you've said, if
> you have several of the above threads running, and the thread itself
> is invoking `epoll_wait`, then it receives events for a different
> thread, how does that work?  Do you send the events to the
> different thread?  If you do that, what is the overhead?  If you
> don't do that, do you move workers between threads?

When a thread receives work for a fiber belonging to a different
thread, it inserts that fiber into the run-queue of the other thread.
Right now it's ccan/list, for branchless insert/delete (this relies
on the GVL).  If/when we get rid of the GVL, we will likely use
wfcqueue for wait-free insert and mass dequeue.  Wait-free is even
better than lock-free, but there would still be memory barriers, of
course.  Again, we can't move fibers across threads in Ruby atm.
One-shot notifications ensure we don't get unintended events.

> Then, why not consider a model similar to async, which uses
> per-thread reactors?  The workers do not move around threads, and
> the reactor does not need to send events to other threads.

I know all that sounds like unnecessary serialization and overhead,
but the same stuff is being serialized in the kernel and hardware
anyway.  For (typical) servers with a single active NIC, interrupts
tend to be handled by a single CPU, and inserting into the epoll
ready list has the same serialization overhead.  So partitioning
across multiple epoll/kqueue descriptors inside the kernel is a waste
of time unless you're getting enough traffic to max out a CPU with
interrupt handling.

There's nothing about the design which prevents the use of parallel
schedulers (they are not "reactors" to me).  So if I were getting
enough network traffic to saturate multiple NICs and peg a CPU from
network traffic alone, then yes, as a last resort I'd have extra
epoll/kqueue-based schedulers inside a process.  That's a last
resort.
I know we can eke more performance out of the epoll ready list inside
the Linux kernel first, but that's not even worth the effort atm.
Until then, I'd rather save unswappable kernel memory and FDs with a
single epoll/kqueue per process.