Date: Tue, 31 Jul 2018 23:18:19 +0000
From: Eric Wong
To: Carlos O'Donell
Cc: libc-alpha@sourceware.org
Subject: Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
Message-ID: <20180731231819.57xsqvdfdyfxrzy5@whir>
References: <20180731084936.g4yw6wnvt677miti@dcvr>
 <0cfdccea-d173-486c-85f4-27e285a30a1a@redhat.com>
In-Reply-To: <0cfdccea-d173-486c-85f4-27e285a30a1a@redhat.com>

Carlos O'Donell wrote:
> On 07/31/2018 04:49 AM, Eric Wong wrote:
> > The goal is to reduce contention and improve locality of cross-thread
> > malloc/free traffic common to IPC systems (including Userspace-RCU) and
> > some garbage-collected runtimes.
>
> Eric,
>
> This looks like a really interesting contribution!
>
> For anyone reviewing this patch I just want to point out that Eric *does*
> have FSF copyright assignment for glibc, so review can proceed normally
> for this patch.

Thank you!  Yep, it's been a while.

> I would like to see urcu used within glibc to provide better data
> structures for key thread, dynamic loader, and malloc algorithms.  So if
> anything I think this is a move in the right direction.

That's great to hear :>

> It would be interesting to roll your RFC into Fedora Rawhide for 6 months
> and see if we hit any problems.

Sure, sounds good to me.

> I have a few high-level questions:
>
> - Can you explain the RSS reduction given this patch?  You
>   might think that just adding the frees to a queue wouldn't
>   result in any RSS gains.
At least two reasons I can see:

1) With lock contention, the freeing thread can lose to the allocating
   thread.  This makes the allocating thread hit sysmalloc since it
   prevented the freeing thread from doing its job.  sysmalloc is the
   slow path, so the lock gets held even longer and the problem
   compounds from there.

2) Thread caching - memory ends up in the wrong thread and could never
   get used in some cases.  Fortunately this is bounded, but it is
   still a waste.

I'm still new to the code, but it looks like threads are pinned to
their arena and the memory used for arenas never gets released.
Is that correct?

I was wondering if there was another possibility: the allocating
thread gives up the arena and creates a new one because the freeing
thread locked it, but I don't think that's the case.

Also, if I spawn a bunch of threads and get a bunch of arenas early
in the program lifetime, and then only have a few threads later,
there can be a lot of idle arenas.

> However, you are calling _int_free a lot in a row and that
> deinterleaving may help (you really want a vector free API here
> so you don't walk all the lists so many times, tcache had the
> same problem but in reverse for finding chunks).

Maybe...  I think in the ideal case the number of allocations and
frees is close to 1:1, so the drain loop is kept short.

What may be worth trying is to bypass _int_free for cases where a
queued chunk can fulfill the allocation that triggered the drain
(see the standalone sketch at the end of this mail).  Delaying or
avoiding consolidation could worsen fragmentation, though.

> - Adding urcu as a build-time dependency is not acceptable for
>   bootstrap, instead we would bundle a copy of urcu and keep it
>   in sync with upstream.  Would that make your work easier?

Yes, bundling that sounds great.  I assume it's something for you or
one of the regular contributors to work on (build systems scare me :x).

> - What problems are you having with `make -j4 check?'  Try
>   master and report back.  We are about to release 2.28 so it
>   should build and pass.

My fault.  It seems tests aren't automatically rerun when I change
the code, so some of my broken work-in-progress changes ended up
being false positives :x.  When working on this, I made the mistake
of calling remote_free_step inside malloc_consolidate, which could
recurse into _int_free or _int_malloc.

I guess I should remove the *.test-result files before rerunning
tests?

I still get:

FAIL: nptl/tst-sched1

	"Create failed"

	I guess my system was overloaded.  pthread_create failures
	seemed to happen a lot for me when working on Ruby, too, and
	POSIX forcing EAGAIN makes it hard to diagnose :<
	(ulimit -u 47999 and 12GB RAM)

	Removing the test-result file and retrying seems OK.

FAIL: resolv/tst-resolv-ai_idn
FAIL: resolv/tst-resolv-ai_idn-latin1

	Not root, so no CLONE_NEWUTS

So I think that's expected...
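
For anyone unfamiliar with wfcqueue, below is a minimal, self-contained
sketch of the enqueue/splice pattern the patch relies on.  It is NOT the
patch itself: "struct chunk", "struct arena", remote_free() and
remote_free_step() are only illustrative names here, and a real
allocator would overlay the queue node on the chunk's free space rather
than wrapping it.  Assumes liburcu is installed.

/* Build, e.g.:
 *   cc -o remote-free-sketch remote-free-sketch.c -lurcu-common -lpthread
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <urcu/compiler.h>   /* caa_container_of */
#include <urcu/wfcqueue.h>

struct chunk
{
  struct cds_wfcq_node node;  /* embedded queue node */
  size_t size;
};

struct arena
{
  struct cds_wfcq_head remote_free_head;
  struct cds_wfcq_tail remote_free_tail;
};

static struct arena main_arena;

/* Non-owner thread: wait-free enqueue, no arena lock taken.  */
static void
remote_free (struct arena *av, struct chunk *p)
{
  cds_wfcq_node_init (&p->node);
  cds_wfcq_enqueue (&av->remote_free_head, &av->remote_free_tail, &p->node);
}

/* Owner thread (e.g. at the top of its malloc/free path): splice the
   whole remote queue in one shot, then release each chunk locally.
   The optimization mentioned above would hand a queued chunk that fits
   the current request straight back to the caller instead.  */
static void
remote_free_step (struct arena *av)
{
  struct cds_wfcq_head done_head;
  struct cds_wfcq_tail done_tail;
  struct cds_wfcq_node *node, *next;

  cds_wfcq_init (&done_head, &done_tail);
  (void) cds_wfcq_splice_blocking (&done_head, &done_tail,
				   &av->remote_free_head,
				   &av->remote_free_tail);
  __cds_wfcq_for_each_blocking_safe (&done_head, &done_tail, node, next)
    {
      struct chunk *p = caa_container_of (node, struct chunk, node);
      free (p);               /* stand-in for the local free path */
    }
}

static void *
freeing_thread (void *arg)
{
  remote_free (&main_arena, arg);   /* "remote" free from another thread */
  return NULL;
}

int
main (void)
{
  pthread_t thr;
  struct chunk *p = malloc (sizeof *p);

  cds_wfcq_init (&main_arena.remote_free_head, &main_arena.remote_free_tail);
  pthread_create (&thr, NULL, freeing_thread, p);
  pthread_join (thr, NULL);
  remote_free_step (&main_arena);   /* owner drains the remote frees */
  puts ("drained");
  return 0;
}

The point being: the freeing thread never touches the arena lock, and
the owner pays for the remote frees in one batched splice the next time
it enters the allocator.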