From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 1 Aug 2018 09:26:26 +0000
From: Eric Wong
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
Subject: Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
Message-ID: <20180801092626.jrwyrojfye4avcis@whir>
References: <20180731084936.g4yw6wnvt677miti@dcvr>
 <0cfdccea-d173-486c-85f4-27e285a30a1a@redhat.com>
 <20180731231819.57xsqvdfdyfxrzy5@whir>
 <20180801062352.rlrjqmsszntkzlfe@untitled>

Carlos O'Donell wrote:
> On 08/01/2018 02:23 AM, Eric Wong wrote:
> > Carlos O'Donell wrote:
> >> On 07/31/2018 07:18 PM, Eric Wong wrote:
> >>> Also, if I spawn a bunch of threads and get a bunch of
> >>> arenas early in the program lifetime; and then only have a few
> >>> threads later, there can be a lot of idle arenas.
> >>
> >> Yes. That is true. We don't coalesce arenas to match the thread
> >> demand.
> >
> > Eep :< If contention can be avoided (which tcache seems to
> > work well for), limiting arenas to CPU count seems desirable and
> > worth trying.
>
> Agreed.
>
> In general it is not as bad as you think.
>
> An arena is made up of a chain of heaps, each an mmap'd block, and
> if we can manage to free an entire heap then we unmap the heap,
> and if we're lucky we can manage to free down the entire arena
> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
>
> So we might just end up with a large number of arenas that don't
> have very much allocated at all, but are all on the arena free list
> waiting for a thread to attach to them to reduce overall contention.
>
> I agree that it would be *better* if we had one arena per CPU and
> each thread could easily determine the CPU it was on (via a
> restartable sequence) and then allocate CPU-local memory to work
> with (the best you can do; ignoring NUMA effects).

Thanks for the info on arenas.

One problem for Ruby is we get many threads[1], and they create
allocations of varying lifetimes.  All the while, malloc contention
is rarely a problem in Ruby because of the global VM lock (GVL).

Even without restartable sequences, I was wondering if lfstack
(also in urcu) could even be used for sharing/distributing arenas
between threads.  This would require tcache to avoid retries on
lfstack pop/push.  Much less straightforward than using wfcqueue
for frees with this patch, though :)

[1] We only had green threads back in Ruby 1.8, and I guess many
    Rubyists got used to the idea that they could have many threads
    cheaply.  Ruby 1.9+ moved to 100% native threads, so I'm also
    trying to reintroduce green threads as an option back into Ruby
    (while still keeping native threads).

> > OK, I noticed my patch fails conformance tests because
> > (despite my use of __cds_wfcq_splice_nonblocking) it references
> > poll(), even though poll() is on an impossible code path:
> >
> >   __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
> >     -> ___cds_wfcq_busy_wait -> poll
> >
> > The poll call is impossible because the `blocking' parameter is 0;
> > but I guess the linker doesn't know that?
>
> Correct. We can fix that easily at a later date. Don't worry about it.

Heh, a bit dirty, but #define-ing poll away seems to work :)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 40d61e45db..89e675c7a0 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -247,6 +247,11 @@
 /* For SINGLE_THREAD_P.  */
 #include <sysdep-cancel.h>
 
+/* prevent wfcqueue.h from including poll.h and linking to it */
+#include <poll.h>
+#undef poll
+#define poll(a,b,c) assert(0 && "should not be called")
+
 #define _LGPL_SOURCE /* allows inlines */
 #include <urcu/wfcqueue.h>
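
In case it helps review, below is roughly the shape of the wfcqueue
usage in the patch, pulled out of malloc internals into a standalone
sketch.  Every name here (remote_free, drain_remote, free_locally,
the remote_* fields) is invented for illustration; the real patch
operates on mstate and chunk pointers:

/* sketch only; invented names, not the patch's identifiers */
#include <stddef.h>
#include <urcu/wfcqueue.h>

struct arena {
	struct cds_wfcq_head remote_head;
	struct cds_wfcq_tail remote_tail;	/* cds_wfcq_init'd together */
	/* ... rest of the arena ... */
};

/* stand-in for _int_free with the arena lock held */
static void free_locally(struct arena *a, void *mem)
{
	(void)a;
	(void)mem;
}

/* any thread, no arena lock taken: wait-free enqueue; the freed
 * block itself has room for the one-pointer queue node */
static void remote_free(struct arena *a, void *mem)
{
	struct cds_wfcq_node *node = mem;

	cds_wfcq_node_init(node);
	cds_wfcq_enqueue(&a->remote_head, &a->remote_tail, node);
}

/* arena owner, lock already held: splice the whole queue off in
 * O(1), then free each block locally */
static void drain_remote(struct arena *a)
{
	struct cds_wfcq_head head;
	struct cds_wfcq_tail tail;
	struct cds_wfcq_node *node;

	cds_wfcq_init(&head, &tail);
	if (__cds_wfcq_splice_nonblocking(&head, &tail,
			&a->remote_head, &a->remote_tail) ==
			CDS_WFCQ_RET_SRC_EMPTY)
		return;

	/* we are the sole consumer of the on-stack queue, so the
	 * "blocking" dequeue never actually blocks here */
	while ((node = __cds_wfcq_dequeue_blocking(&head, &tail)))
		free_locally(a, node);
}

The enqueue side is wait-free (an xchg on the tail pointer plus a
store), so remote frees stay O(1) no matter how many are queued, and
the owner only pays for the actual freeing when it already holds the
arena lock anyway.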
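
And to make the lfstack musing above more concrete, this is the sort
of thing I was imagining; again every name (idle_arenas,
acquire_arena, release_arena, create_new_arena) is made up, and it
uses its own toy struct arena:

/* sketch only; invented names */
#include <urcu/lfstack.h>
#include <urcu/compiler.h>	/* caa_container_of */

struct arena {
	struct cds_lfs_node lfs_node;
	/* ... */
};

static struct cds_lfs_stack idle_arenas;	/* cds_lfs_init'd at startup */

extern struct arena *create_new_arena(void);	/* hypothetical slow path */

static struct arena *acquire_arena(void)
{
	struct cds_lfs_node *node = cds_lfs_pop_blocking(&idle_arenas);

	if (!node)
		return create_new_arena();
	return caa_container_of(node, struct arena, lfs_node);
}

static void release_arena(struct arena *a)
{
	cds_lfs_node_init(&a->lfs_node);
	cds_lfs_push(&idle_arenas, &a->lfs_node);
}

Unlike wfcqueue, both cds_lfs_push and cds_lfs_pop_blocking are
cmpxchg loops that retry under contention, which is the cost I'd be
counting on tcache to hide.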