Date: Wed, 3 Jun 2020 13:11:26 -0400
From: Rich Felker <dalias@libc.org>
To: Florian Weimer
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH 2/2] manual: Document __libc_single_threaded
Message-ID: <20200603171125.GH1079@brightrain.aerifal.cx>
In-Reply-To: <87a71kurda.fsf@oldenburg2.str.redhat.com>

On Wed, Jun 03, 2020 at 05:00:01PM +0200, Florian Weimer wrote:
> * Rich Felker:
>
> >> I think it makes more sense not to declare the object as volatile and
> >> make sure that only libc functions which imply the required barrier
> >> write to __libc_single_threaded.  For instance, I expect that this
> >> will allow compilers to generate tighter code around multiple
> >> (implied) reference count updates.
> >
> > It really should be volatile just because whatever you make it is ABI,
> > and there might be good reasons to want to make it updated
> > "asynchronously" with respect to the compiler's model in the future.
> > If you don't make it volatile now you can't do that in the future.
>
> volatile does not mean that it is safe to update it asynchronously.
> Some barriers are still needed.

If by "asynchronously" we just mean "some code running in the same
task context which the compiler isn't aware of, like a signal handler
or internals of malloc or of a pure function", then yes, volatile is
sufficient. This is basically the definition of volatile.
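To illustrate that first case, the classic signal-handler flag (a
minimal sketch, untested; the names here are made up for illustration
and have nothing to do with glibc internals):

#include <signal.h>

/* Hypothetical flag, written by code the compiler cannot see at the
   read site below.  */
static volatile sig_atomic_t got_signal;

static void
handler (int sig)
{
  (void) sig;
  got_signal = 1;
}

int
main (void)
{
  signal (SIGINT, handler);
  /* volatile forces a fresh load on each iteration; without it the
     compiler could hoist the load out of the loop and spin forever.  */
  while (!got_signal)
    ;
  return 0;
}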
If you mean modification from other threads, then you need additional
synchronization to ensure that the change is seen, but if you're OK
with a change not being seen (roughly the same as relaxed order), then
volatile suffices as a valid way to model it. (The potential change is
an asynchronous change by the hardware, where the hardware in question
happens to be the cache/memory coherency implementation.)

> And if we need acquire MO for the
> access, then the purpose of the variable is completely defeated.

No. Reading a stale value (false instead of true) is perfectly fine.
The other direction cannot happen, because the change from true to
false can fundamentally happen only in a single-threaded context. And
if you don't want to read a stale value, SYS_membarrier performed by
the thread changing the value ensures that you won't. But that's not
needed.

> Is it really a significant restriction on implementations if they can
> only set __libc_single_threaded to true from functions which clearly
> imply an acquire barrier?  I don't think so.  pthread_join qualifies,
> and that seems the most important case.

Again, doing it only from pthread_join will completely omit the
optimization in any program that waits to join until it has no further
work to do, which is a very reasonable thing to do.

> > The cost of volatile is rather trivial; the compiler can still cache
> > the address of it, and loading from a cache line you recently loaded
> > from, and to which no stores are happening, is virtually free.
>
> There is a code size impact because the compiler can no longer merge
> several atomic/non-atomic alternatives.  Consider this example, which
> I think is not too unrealistic (hopefully, std::shared_ptr will
> eventually compile down to something similar):
>
> #include <stddef.h>
>
> struct refcounted
> {
>   size_t count;
> };
>
> extern char __libc_single_threaded;
>
> static inline
> void ref (struct refcounted *p)
> {
>   if (__libc_single_threaded)
>     ++p->count;
>   else
>     __atomic_fetch_add (&p->count, 1, __ATOMIC_RELAXED);
> }
>
> void f1 (struct refcounted *a, struct refcounted *b);
>
> void
> f2 (struct refcounted *a, struct refcounted *b)
> {
>   /* f1 takes ownership of both objects, but the caller of f2 is
>      expected to retain (shared) ownership as well.  */
>   ref (a);
>   ref (b);
>   f1 (a, b);
> }
>
> As written, this turns into:
>
> f2:
>         cmpb    $0, __libc_single_threaded(%rip)
>         je      .L2
>         addq    $1, (%rdi)
> .L3:
>         addq    $1, (%rsi)
>         jmp     f1
> .L2:
>         lock addq       $1, (%rdi)
>         cmpb    $0, __libc_single_threaded(%rip)
>         jne     .L3
>         lock addq       $1, (%rsi)
>         jmp     f1
>
> The jump back to .L3 is due to the current reluctance to optimize
> relaxed MO accesses, due to the brokenness of the C++ memory model
> (although it would be quite safe here, I assume).
>
> With volatile, it's this instead:
>
> f2:
>         movzbl  __libc_single_threaded(%rip), %eax
>         testb   %al, %al
>         je      .L2
>         addq    $1, (%rdi)
>         movzbl  __libc_single_threaded(%rip), %eax
>         testb   %al, %al
>         je      .L4
> .L7:
>         addq    $1, (%rsi)
>         jmp     f1
> .L2:
>         lock addq       $1, (%rdi)
>         movzbl  __libc_single_threaded(%rip), %eax
>         testb   %al, %al
>         jne     .L7
> .L4:
>         lock addq       $1, (%rsi)
>         jmp     f1
>
> So two checks in the non-atomic case, plus a load for the volatile
> access.

I see; indeed, for something on this small a scale it *might* matter.
Has anyone done any measurement to determine whether it does?

Rich
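P.S. For concreteness, the writer-side membarrier sequence I mean
above would look something like this (a rough sketch, untested; the
function name is made up, and MEMBARRIER_CMD_GLOBAL is just one
plausible command choice, not necessarily what an implementation
would use):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

extern char __libc_single_threaded;

/* Thread changing the value: store the new value, then force all
   other threads to have observed it before anything that depends on
   them seeing it.  Readers need no barrier at all.  */
static void
mark_process_multithreaded (void)
{
  __libc_single_threaded = 0;
  syscall (SYS_membarrier, MEMBARRIER_CMD_GLOBAL, 0);
}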