Date: Wed, 2 May 2018 07:54:22 +0000
From: Eric Wong
To: ruby-core@ruby-lang.org
Message-ID: <20180502075422.GA12737@dcvr>
Subject: [ruby-core:86826] Re: [Ruby trunk Feature#13618] [PATCH] auto fiber schedule for rb_wait_for_single_fd and rb_waitpid

samuel@oriontransfer.org wrote:
> So, it seems like your design has unavoidable contention (and
> therefore latency) because you need to send events between
> threads, which is what I expected.  However, you argue this
> overhead should be small.  I'd like to see actual numbers TBH.

Any contention is completely masked by the GVL at the moment.
When we get Guilds or get rid of the GVL, of course I'll try
per-core schedulers.  As it stands, per-Thread schedulers will
be a disaster for systems with hundreds of native Threads
(because those threads are needed for servicing hundreds of
slow disks).

> And as you state, it's not possible (nor desirable IMHO) to
> move fibers between threads.  Yes, head-of-line blocking might
> be an issue.  Moving stacks between CPU cores is not without
> its own set of overheads.  If you have serious issues with
> head-of-line blocking it's more likely to be a problem with
> your code (I've directly experienced this and the result was:
> https://github.com/socketry/async-http/blob/ca655aa190ed7a89b601e267906359793271ec8a/lib/async/http/protocol/http11.rb#L93).

Fwiw, yahns makes large performance sacrifices(*) to avoid HOL
blocking.  According to Go user reports, being able to move
goroutines between native threads is a big feature for them.
But I don't think it's possible with the current Ruby C API,
anyways :<

> It would be interesting to see exactly how much overhead is
> incurred using a shared epoll.  I know from my testing that

It's high for yahns because the default max_events for
epoll_wait is only 1.

(*) Throughput should increase with something reasonable like
64, but you will lose HOL blocking resistance.  Auto-fiber
starts at max_events 8 and auto-increases if needed.
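To make the max_events trade-off concrete, here is a minimal sketch of a
shared-epoll dispatch loop in plain C.  It is not yahns or auto-fiber
code; MAX_EVENTS and handle_ready() are made-up names for illustration.
A larger maxevents lets one epoll_wait call return a bigger batch of
ready FDs per syscall, but working through that batch serially is
exactly where head-of-line blocking can creep back in:

```
#include <sys/epoll.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_EVENTS 8	/* hypothetical: yahns defaults to 1, auto-fiber starts at 8 */

/* placeholder for whatever resumes the waiter blocked on this FD */
extern void handle_ready(int fd);

void wait_loop(int epfd)
{
	struct epoll_event events[MAX_EVENTS];

	for (;;) {
		/* one syscall may return up to MAX_EVENTS ready FDs */
		int n = epoll_wait(epfd, events, MAX_EVENTS, -1);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			perror("epoll_wait");
			exit(1);
		}

		/*
		 * the batch is processed serially here, so a slow
		 * handler for events[0] delays events[1..n-1]
		 * (head-of-line blocking)
		 */
		for (int i = 0; i < n; i++)
			handle_ready(events[i].data.fd);
	}
}
```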
Fwiw, HOL blocking considerations for auto-fiber and yahns are
completely different, so I don't see a downside to increasing
max_events with auto-fiber aside from memory use.  With
Transfer-Encoding: chunked responses, yahns prioritizes latency
of each individual chunk over overall throughput, so it loses
throughput there, too.  Maybe there can be an option to change
that *shrug*

> It's prepared but not added to the reactor here (it's lazy): https://github.com/kurocha/async-http/blob/eff77f61f7a85a3ac21f7a8f51ba07f069063cbe/source/Async/HTTP/V1/Protocol.cpp#L34
>
> By calling wait, the fd is inserted into the reactor/selector: https://github.com/kurocha/async/blob/2edef4d6990259cc60cc307b6de2ab35b97560f1/source/Async/Protocol/Buffer.cpp#L254

> The cost of adding/removing FDs is effectively constant time
> given an arbitrary number of reads or writes.  We shouldn't
> preclude implementing this model in Ruby if it makes sense.  As
> you say, the overhead of the system call is pretty minimal.

Right, auto-fiber in Ruby also does a lazy add (and the
scheduler itself is lazily created).

> Now that you mention it, I'd like to compare EPOLLET vs
> EPOLLONESHOT.  It's an interesting design choice and it makes a
> lot of sense if you are doing only one read in the context of
> a blocking operation.

To me, they're drastically different in terms of programming
style.

ET isn't too different from level-triggered (LT), but I'd say
it's the trickiest of the bunch: avoiding head-of-line blocking
with ET means keeping track of undrained buffers (from clients
which pipeline aggressively) and not losing them when you
temporarily yield to other clients.

I suppose edge and level triggering both invite a "reactive"
design and inverted control flow.

The main thing which bothers me about both ET and LT is that you
have to remember to disable/reenable events (to avoid unfairness
or DoS).  Under ideal conditions (clients not trying to DoS or
be unfair to other clients), ET can probably be fastest.  It's
just totally unrealistic to expect ideal conditions.

So I strongly prefer one-shot, because you don't have to deal
with disabling events.  This is especially useful when there's
aggressive pipelining going on (e.g. a client sending you
requests quickly, yet reading responses slowly to fill up your
output buffers).  One-shot makes it a queue, so it also invites
sharing across threads.

The way auto-fiber uses one-shot doesn't invert the control flow
at all (though you could; yahns does that).  Instead, auto-fiber
feels like the Linux kernel scheduler API:

```
/* get stuck on EAGAIN or similar, can't proceed */
add_wait_queue_and_register(&foo); /* list_add + epoll_ctl */
while (!foo->ready) {
	/*
	 * Run epoll_wait and let other threads/fibers run.
	 * Since we registered foo with the scheduler, it can
	 * become ready at any time while schedule() is running.
	 */
	schedule();
}
remove_wait_queue(&foo); /* list_del via rb_ensure */

/* resume whatever this fiber was doing */
```
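For contrast with the wait-queue sketch above, here is a minimal,
hypothetical illustration of the one-shot discipline in plain C (again,
not auto-fiber's actual code).  With EPOLLONESHOT the kernel disarms
the FD after reporting one event, so there is nothing to remember to
disable for fairness; the waiter just re-arms with EPOLL_CTL_MOD the
next time it hits EAGAIN:

```
#include <sys/epoll.h>

/* register an FD one-shot: it fires at most once until re-armed */
int oneshot_add(int epfd, int fd, void *waiter)
{
	struct epoll_event ev;

	ev.events = EPOLLIN | EPOLLONESHOT;
	ev.data.ptr = waiter;	/* e.g. pointer back to the blocked fiber */

	return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/*
 * After the event fires and the waiter blocks on EAGAIN again,
 * re-arm with EPOLL_CTL_MOD.  Until then the FD generates no
 * further events, which is what makes readiness behave like a
 * queue that multiple threads can safely pull from.
 */
int oneshot_rearm(int epfd, int fd, void *waiter)
{
	struct epoll_event ev;

	ev.events = EPOLLIN | EPOLLONESHOT;
	ev.data.ptr = waiter;

	return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}
```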