From: Carlos O'Donell <carlos@redhat.com>
To: Eric Wong <normalperson@yhbt.net>
Cc: libc-alpha@sourceware.org
Subject: Re: [RFC/PoC] malloc: use wfcqueue to speed up remote frees
Date: Wed, 1 Aug 2018 03:01:22 -0400
Message-ID: <aa9b36d4-02a1-e3cc-30e8-eca9f0d8b6eb@redhat.com>
In-Reply-To: <20180801062352.rlrjqmsszntkzlfe@untitled>

On 08/01/2018 02:23 AM, Eric Wong wrote:
> Carlos O'Donell <carlos@redhat.com> wrote:
>> On 07/31/2018 07:18 PM, Eric Wong wrote:
>>>> - Can you explain the RSS reduction given this patch? You
>>>> might think that just adding the frees to a queue wouldn't
>>>> result in any RSS gains.
>>>
>>> At least two reasons I can see:
>>>
>>> 1) With lock contention, the freeing thread can lose to the
>>>    allocating thread.  This makes the allocating thread hit
>>>    sysmalloc since it prevented the freeing thread from doing
>>>    its job.  sysmalloc is the slow path, so the lock gets held
>>>    even longer and the problem compounds from there.
>>
>> How does this impact RSS? It would only block the remote thread
>> from freeing in a timely fashion, but it would eventually make
>> progress.
> 
> Blocking the freeing thread causes the allocating thread to
> sysmalloc more.  If the freeing thread could always beat the
> allocating thread, then the freed memory would be available in
> the arena by the time the allocating thread takes the lock.

I see what you mean now. Yes, that could reduce RSS by reducing
the time between when the remote thread frees memory and when
the producer thread (let's call it that) can reuse the returned
chunks.
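
To make that concrete, here is a minimal sketch (a hypothetical toy
workload, not your patch) of the remote-free pattern we keep coming
back to: every free() in the consumer goes back to the producer's
arena, so it contends with the producer's next malloc() on that
arena's lock.

  /* Hypothetical producer/consumer workload sketch.
     Build: cc -O2 -pthread remote_free.c */
  #include <pthread.h>
  #include <stdatomic.h>
  #include <stdlib.h>

  #define RING  1024
  #define COUNT (1L << 20)

  static void *_Atomic ring[RING];

  static void *producer (void *arg)
  {
    for (long i = 0; i < COUNT; i++)
      {
        void *p = malloc (128);     /* allocates from this thread's arena */
        while (atomic_load (&ring[i % RING]) != NULL)
          ;                         /* spin until the slot is empty */
        atomic_store (&ring[i % RING], p);
      }
    return arg;
  }

  static void *consumer (void *arg)
  {
    for (long i = 0; i < COUNT; i++)
      {
        void *p;
        while ((p = atomic_exchange (&ring[i % RING], NULL)) == NULL)
          ;                         /* spin until the producer fills it */
        free (p);                   /* remote free: same arena lock */
      }
    return arg;
  }

  int main (void)
  {
    pthread_t p, c;
    pthread_create (&p, NULL, producer, NULL);
    pthread_create (&c, NULL, consumer, NULL);
    pthread_join (p, NULL);
    pthread_join (c, NULL);
    return 0;
  }

When the consumer loses the lock race, its frees arrive late, the
producer's malloc() misses the arena free lists and falls through to
sysmalloc, and RSS grows exactly as you describe.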

>>> 2) thread caching - memory ends up in the wrong thread and
>>>    could never get used in some cases.  Fortunately this is
>>>    bounded, but still a waste.
>>
>> We can't have memory end up in the wrong thread. The remote thread
>> computes the arena from the chunk it has, and then frees back to
>> the appropriate arena, even if it's not the arena that the thread
>> is attached to.
> 
> Really?  I see:
> 
>    __libc_free -> MAYBE_INIT_TCACHE && _int_free -> tcache_put
> 
> I am not seeing anything in _int_free which makes the tcache_put
> arena-aware.  If we drop MAYBE_INIT_TCACHE from __libc_free,
> then the tcache_put could be avoided.

Thank you, that clarifies it for me, I was glossing over tcache.

Yes, the tcache layer doesn't care where the block came from and
will happily cache it.

In a producer-consumer model, though (which seems to be the example
we are drawing parallels from), the consumer rarely needs to allocate
anything. So yes, the tcache effectively slows the initial rate of
frees back to the producer thread, but only up to a bounded limit (as
you note).
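
If we ever wanted to stop caching foreign chunks, the check in
_int_free could be roughly the following. This is a sketch only:
arena_for_chunk and thread_arena are the existing internals, but the
added condition is hypothetical and unmeasured.

  /* Hypothetical sketch against malloc.c, not a tested patch: only
     cache chunks that belong to the freeing thread's own arena, so a
     remote free goes straight back to its home arena instead of
     sitting in the wrong thread's tcache.  */
  size_t tc_idx = csize2tidx (size);
  if (tcache != NULL
      && tc_idx < mp_.tcache_bins
      && tcache->counts[tc_idx] < mp_.tcache_count
      && arena_for_chunk (p) == thread_arena)  /* new: local chunks only */
    {
      tcache_put (p, tc_idx);
      return;
    }

The cost is an extra arena lookup on the free fast path, so it would
need benchmarking before anyone proposes it seriously.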

>>> I'm still new to the code, but it looks like threads are pinned
>>> to the arena and the memory used for arenas never gets released.
>>> Is that correct?
>>
>> Threads are pinned to their arenas, but they can move in the event
>> of allocation failures, particularly to the main arena to attempt
>> sbrk to get more memory.
> 
> OK.
> 
>>> I was wondering if there was another possibility: the allocating
>>> thread gives up the arena and creates a new one because the
>>> freeing thread locked it, but I don't think that's the case.
>>
>> No.
>>
>>> Also, if I spawn a bunch of threads and get a bunch of
>>> arenas early in the program lifetime; and then only have few
>>> threads later, there can be a lot of idle arenas.
>>  
>> Yes. That is true. We don't coalesce arenas to match the thread
>> demand.
> 
> Eep :<    If contention can be avoided (which tcache seems to
> work well for), limiting arenas to CPU count seems desirable and
> worth trying.

Agreed.
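
For what it's worth, a process can already cap the arena count itself
with the existing M_ARENA_MAX knob (or the MALLOC_ARENA_MAX
environment variable); a minimal sketch:

  /* Cap glibc's arena count at the number of online CPUs.  This only
     limits creation of new arenas; it does not coalesce or free
     arenas that already exist.  */
  #include <malloc.h>
  #include <unistd.h>

  int main (void)
  {
    long ncpus = sysconf (_SC_NPROCESSORS_ONLN);
    if (ncpus > 0)
      mallopt (M_ARENA_MAX, (int) ncpus);
    /* ... rest of the program ... */
    return 0;
  }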

In general it is not as bad as you think.

An arena is made up of a chain of heaps, each an mmap'd block. If we
manage to free an entire heap, we unmap it, and if we're lucky we can
free down the entire arena
(_int_free -> large chunk / consolidate -> heap_trim -> shrink_heap).

So we might just end up with a large number of arenas that don't have
very much allocated at all, but are all sitting on the arena free list
waiting for a thread to attach to them, which reduces overall
contention.

I agree that it would be *better* if we had one arena per CPU and
each thread could easily determine the CPU it was on (via a
restartable sequence), and then allocate CPU-local memory to work
with (the best you can do, ignoring NUMA effects).
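
Until rseq is usable here, sched_getcpu() is the closest
approximation. A hedged sketch of just the selection step (the
cpu_arenas array is hypothetical; the result can be stale by the time
we take the lock, which only costs some locality, not correctness):

  #include <sched.h>

  /* Hypothetical: one arena per CPU, sized at startup.  */
  extern struct malloc_state *cpu_arenas[];

  static struct malloc_state *
  arena_for_this_cpu (void)
  {
    int cpu = sched_getcpu ();      /* may be stale after migration */
    if (cpu < 0)
      cpu = 0;                      /* fallback if the call fails */
    return cpu_arenas[cpu];
  }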

> <snip>
> 
>>>> - Adding urcu as a build-time dependency is not acceptable for
>>>> bootstrap, instead we would bundle a copy of urcu and keep it
>>>> in sync with upstream. Would that make your work easier?
>>>
>>> Yes, bundling that sounds great.  I assume it's something for
>>> you or one of the regular contributors to work on (build systems
>>> scare me :x)
>>
>> Yes, that is something we'd have to do.
> 
> OK, I noticed my patch fails conformance tests because
> (despite my use of __cds_wfcq_splice_nonblocking) it references
> poll(), despite poll() being in an impossible code path:
> 
>    __cds_wfcq_splice_nonblocking -> ___cds_wfcq_splice
> 	   -> ___cds_wfcq_busy_wait -> poll
> 
> The poll call is impossible because the `blocking' parameter is 0;
> but I guess the linker doesn't know that?

Correct. We can fix that easily at a later date. Don't worry about it.
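
For the record, the reference presumably survives because without
inlining and constant propagation the compiler cannot prove the
branch dead, so the undefined symbol stays in the object. A tiny
illustration (hypothetical file, same shape as the urcu code):

  /* The branch is never taken when blocking == 0, but because
     `blocking' is a runtime parameter the compiler must keep the
     call, and the object carries an undefined reference to poll().
     Check with: cc -c dead_poll.c && nm dead_poll.o */
  #include <poll.h>

  int
  maybe_wait (int blocking)
  {
    if (blocking)
      return poll (NULL, 0, 1);     /* unreachable for blocking == 0 */
    return 0;
  }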

-- 
Cheers,
Carlos.

