From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-Status: No, score=-3.9 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_PASS, SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by dcvr.yhbt.net (Postfix) with ESMTPS id 4F71A1F4B4 for ; Mon, 28 Sep 2020 23:48:39 +0000 (UTC) Received: from server2.sourceware.org (localhost [IPv6:::1]) by sourceware.org (Postfix) with ESMTP id 5103A3857C5A; Mon, 28 Sep 2020 23:48:37 +0000 (GMT) Received: from brightrain.aerifal.cx (brightrain.aerifal.cx [216.12.86.13]) by sourceware.org (Postfix) with ESMTPS id 8CE123857C45 for ; Mon, 28 Sep 2020 23:48:34 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 8CE123857C45 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=libc.org Authentication-Results: sourceware.org; spf=none smtp.mailfrom=dalias@libc.org Date: Mon, 28 Sep 2020 19:48:33 -0400 From: Rich Felker To: Florian Weimer Subject: Re: [PATCH] Make abort() AS-safe (Bug 26275). Message-ID: <20200928234833.GC17637@brightrain.aerifal.cx> References: <20200927141952.121047-1-carlos@redhat.com> <871rinm1fx.fsf@mid.deneb.enyo.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <871rinm1fx.fsf@mid.deneb.enyo.de> User-Agent: Mutt/1.5.21 (2010-09-15) X-BeenThere: libc-alpha@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Libc-alpha mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Carlos O'Donell via Libc-alpha Errors-To: libc-alpha-bounces@sourceware.org Sender: "Libc-alpha" On Sun, Sep 27, 2020 at 10:04:02PM +0200, Florian Weimer wrote: > * Carlos O'Donell via Libc-alpha: > > > diff --git a/stdlib/Makefile b/stdlib/Makefile > > index 4615f6dfe7..cd470e53ac 100644 > > --- a/stdlib/Makefile > > +++ b/stdlib/Makefile > > @@ -87,7 +87,7 @@ tests := tst-strtol tst-strtod testmb testrand testsort testdiv \ > > tst-makecontext-align test-bz22786 tst-strtod-nan-sign \ > > tst-swapcontext1 tst-setcontext4 tst-setcontext5 \ > > tst-setcontext6 tst-setcontext7 tst-setcontext8 \ > > - tst-setcontext9 tst-bz20544 > > + tst-setcontext9 tst-bz20544 tst-threaded-fork-abort \ > > > > tests-internal := tst-strtod1i tst-strtod3 tst-strtod4 tst-strtod5i \ > > tst-tls-atexit tst-tls-atexit-nodelete > > @@ -243,3 +243,5 @@ $(objpfx)tst-setcontext3.out: tst-setcontext3.sh $(objpfx)tst-setcontext3 > > $(evaluate-test) > > > > $(objpfx)tst-makecontext: $(libdl) > > + > > +LDLIBS-tst-threaded-fork-abort = $(shared-thread-library) > > This should be a dependency, so that the test gets relinked if the > thread library chagnes. > > > --- /dev/null > > +++ b/stdlib/tst-threaded-fork-abort.c > > > + On a x86_64 VM (4 vCPU @ 3.5GHz) the test fails once every 100-600 > > + seconds when run continuosly in a loop. We want this test to fail > > + with a probability of say 10% so develpers see it at least once in > > + a one or two week development period, or at least once during the > > + release window. To achieve that we run for at least 20 seconds, > > + and set the timeout at 30 seconds (gives time for a last iteration > > + to complete). */ > > Please make it an xtest. I don't think the test is valuable enough to > run continuously during development, given its overhead. > > > diff --git a/support/timespec.h b/support/timespec.h > > index 1a6775a882..280844f2c0 100644 > > --- a/support/timespec.h > > +++ b/support/timespec.h > > @@ -55,6 +55,11 @@ struct timespec support_timespec_normalize (struct timespec time); > > int support_timespec_check_in_range (struct timespec expected, struct timespec observed, > > double lower_bound, double upper_bound); > > > > +/* True if left is before right, false otherwise. */ > > +#define TIMESPEC_BEFORE(left, right) \ > > + ((left.tv_sec < right.tv_sec) \ > > + || (left.tv_sec == right.tv_sec \ > > + && left.tv_nsec < right.tv_nsec)) > > Should this be an inline function? > > > /* Check that the timespec on the left represents a time before the > > time on the right. */ > > diff --git a/sysdeps/nptl/fork.c b/sysdeps/nptl/fork.c > > index 5091a000e3..8fdc6589e5 100644 > > --- a/sysdeps/nptl/fork.c > > +++ b/sysdeps/nptl/fork.c > > @@ -66,6 +66,11 @@ __libc_fork (void) > > { > > _IO_list_lock (); > > > > + /* Acquire the abort lock. We need to prevent other threads > > + from aborting while we fork. We must keep abort AS-safe > > + and to do that we need to own the lock in the child. */ > > + __abort_fork_lock (); > > + > > /* Acquire malloc locks. This needs to come last because fork > > handlers may use malloc, and the libio list lock has an > > indirect malloc dependency as well (via the getdelim > > Is this ordering really correct? I think this will result in > deadlocks if another thread detects heap corruption and calls abort: > the forking thead acquires the abort lock and then the malloc locks, > while the other thread acquires one malloc lock and then the abort > lock. Is there a reason to take the lock across fork rather than just resetting it in the child? After seeing this I'm working on fixing the same issue in musl and was about to take the lock, but realized ours isn't actually protecting any userspace data state, just excluding sigaction on SIGABRT during abort. Rich