From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
To: Eric Wong
Cc: libc-alpha@sourceware.org
References: <20180731084936.g4yw6wnvt677miti@dcvr>
 <0cfdccea-d173-486c-85f4-27e285a30a1a@redhat.com>
 <20180731231819.57xsqvdfdyfxrzy5@whir>
 <20180801062352.rlrjqmsszntkzlfe@untitled>
 <20180801092626.jrwyrojfye4avcis@whir>
From: Carlos O'Donell
Message-ID: <3cea8c38-adef-bbb8-7d54-8fc371211065@redhat.com>
Date: Thu, 2 Aug 2018 17:38:19 -0400
In-Reply-To: <20180801092626.jrwyrojfye4avcis@whir>

On 08/01/2018 05:26 AM, Eric Wong wrote:
> Carlos O'Donell wrote:
>> On 08/01/2018 02:23 AM, Eric Wong wrote:
>>> Carlos O'Donell wrote:
>>>> On 07/31/2018 07:18 PM, Eric Wong wrote:
>>>>> Also, if I spawn a bunch of threads and get a bunch of
>>>>> arenas early in the program lifetime; and then only have few
>>>>> threads later, there can be a lot of idle arenas.
>>>>
>>>> Yes. That is true. We don't coalesce arenas to match the thread
>>>> demand.
>>>
>>> Eep :< If contention can be avoided (which tcache seems to
>>> work well for), limiting arenas to CPU count seems desirable and
>>> worth trying.
>>
>> Agreed.
>>
>> In general it is not as bad as you think.
>>
>> An arena is made up of a chain of heaps, each an mmap'd block, and
>> if we can manage to free an entire heap then we unmap the heap,
>> and if we're lucky we can manage to free down the entire arena
>> (_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).
>>
>> So we might just end up with a large number of arenas that don't
>> have very much allocated at all, but are all on the arena free list
>> waiting for a thread to attach to them to reduce overall contention.
>>
>> I agree that it would be *better* if we had one arena per CPU and
>> each thread could easily determine the CPU it was on (via a
>> restartable sequence) and then allocate CPU-local memory to work
>> with (the best you can do; ignoring NUMA effects).
>
> Thanks for the info on arenas. One problem for Ruby is we get
> many threads[1], and they create allocations of varying
> lifetimes. All this while malloc contention is rarely a
> problem in Ruby because of the global VM lock (GVL).

The allocations of varying lifetimes will make it impossible to
free down a heap from a heap-based allocator. This is a serious
issue with heap-based allocators, and it will impact the max RSS
you'll need to reach steady state.

It's not really a tractable problem, I think; I don't know how to
deinterlace the chunks which have differing lifetimes. Your only
chance is to take the existing large/small/fast bin machinery and,
instead of mixing them in one heap, split them into one smaller
heap each, and see how that goes, i.e. adopt size classes, but
keep it heap-based.

> Even without restartable sequences, I was wondering if lfstack
> (also in urcu) could even be used for sharing/distributing
> arenas between threads. This would require tcache to avoid
> retries on lfstack pop/push.

We absolutely need better balancing across arenas; even now we
don't do any rebalancing based on load or attach count. We should.
That problem would go away if you just used restartable sequences
to find your CPU, map that to the local arena for that CPU, and
allocate there.

> Much less straightforward than using wfcqueue for frees with
> this patch, though :)

Correct.

> Heh, a bit dirty, but #define-ing poll away seems to work :)
>
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 40d61e45db..89e675c7a0 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -247,6 +247,11 @@
>  /* For SINGLE_THREAD_P.  */
>  #include <sysdep-cancel.h>
>
> +/* prevent wfcqueue.h from including poll.h and linking to it */
> +#include <poll.h>
> +#undef poll
> +#define poll(a,b,c) assert(0 && "should not be called")
> +

Call __poll instead. That should fix the issue.

>  #define _LGPL_SOURCE /* allows inlines */
>  #include <urcu/wfcqueue.h>

--
Cheers,
Carlos.
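
P.S. For anyone who wants to experiment with the queueing idea outside
of malloc.c, here is a minimal, self-contained sketch of a per-arena
remote-free queue on top of liburcu's wfcqueue. The toy_arena and
remote_chunk structures, the helper names, and the drain-on-demand
policy are illustrative stand-ins, not glibc's data structures and not
the patch itself:

/* Sketch: a per-arena remote-free queue using liburcu's wfcqueue.
   All names prefixed toy_/remote_ are hypothetical.  */
#include <stdlib.h>
#include <urcu/compiler.h>
#include <urcu/wfcqueue.h>

struct remote_chunk
{
  struct cds_wfcq_node node;
  void *mem;                    /* memory a remote thread wants freed */
};

struct toy_arena
{
  struct cds_wfcq_head head;
  struct cds_wfcq_tail tail;
};

static void
toy_arena_init (struct toy_arena *a)
{
  cds_wfcq_init (&a->head, &a->tail);
}

/* Called by a thread that does not own the arena.  Enqueue is
   wait-free, so the remote thread never takes the arena lock.  */
static void
remote_free (struct toy_arena *a, struct remote_chunk *c)
{
  cds_wfcq_node_init (&c->node);
  cds_wfcq_enqueue (&a->head, &a->tail, &c->node);
}

/* Called by the arena owner, e.g. on its next allocation from this
   arena: drain everything queued by remote threads and free it
   locally.  */
static void
drain_remote_frees (struct toy_arena *a)
{
  struct cds_wfcq_node *n;

  while ((n = cds_wfcq_dequeue_blocking (&a->head, &a->tail)) != NULL)
    {
      struct remote_chunk *c
        = caa_container_of (n, struct remote_chunk, node);
      free (c->mem);            /* stand-in for the owner's _int_free */
      free (c);
    }
}

int
main (void)
{
  struct toy_arena arena;
  struct remote_chunk *c;

  toy_arena_init (&arena);

  /* A remote thread would do this; single-threaded here for brevity.  */
  c = malloc (sizeof *c);
  c->mem = malloc (64);
  remote_free (&arena, c);

  /* The owner drains the queue when convenient.  */
  drain_remote_frees (&arena);
  return 0;
}

Without _LGPL_SOURCE the cds_wfcq_* calls resolve to the shared-library
implementations, so link against liburcu (the wfcqueue symbols normally
live in liburcu-common).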
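
And on the one-arena-per-CPU point, a rough sketch of what CPU-local
arena selection could look like. sched_getcpu() is used as a simple
stand-in for a real restartable-sequence (rseq) cpu_id read, which
would be cheaper and could protect the whole per-CPU critical section;
MAX_CPUS, toy_arena, and toy_arena_alloc are again made-up names, and
NUMA placement is ignored as above:

/* Sketch: pick an arena by current CPU.  Names are hypothetical.  */
#define _GNU_SOURCE             /* for sched_getcpu */
#include <sched.h>
#include <stdlib.h>

#define MAX_CPUS 256            /* illustrative fixed bound */

struct toy_arena { int id; };                  /* hypothetical arena */
static struct toy_arena cpu_arena[MAX_CPUS];   /* one arena per CPU */

/* Stand-in for allocating out of a specific arena.  */
static void *
toy_arena_alloc (struct toy_arena *a, size_t size)
{
  (void) a;
  return malloc (size);
}

static void *
cpu_local_alloc (size_t size)
{
  /* With rseq this would be a cheap read of the kernel-published
     cpu_id instead of a syscall-backed lookup.  */
  int cpu = sched_getcpu ();

  if (cpu < 0 || cpu >= MAX_CPUS)
    cpu = 0;                    /* fallback arena */

  /* Without rseq the thread can migrate between reading the CPU and
     allocating; that only costs locality and contention, not
     correctness, as long as each arena still tolerates concurrent
     use.  */
  return toy_arena_alloc (&cpu_arena[cpu], size);
}

int
main (void)
{
  void *p = cpu_local_alloc (128);
  free (p);
  return 0;
}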