Date: Wed, 30 Sep 2020 22:30:19 -0400
From: Rich Felker
To: Florian Weimer
Subject: Re: [PATCH] Make abort() AS-safe (Bug 26275).
Message-ID: <20201001023018.GL17637@brightrain.aerifal.cx>
References: <20200927141952.121047-1-carlos@redhat.com> <871rinm1fx.fsf@mid.deneb.enyo.de> <20200928234833.GC17637@brightrain.aerifal.cx> <87d025jcn0.fsf@mid.deneb.enyo.de> <20200929144207.GD17637@brightrain.aerifal.cx>
In-Reply-To: <20200929144207.GD17637@brightrain.aerifal.cx>
Cc: musl@lists.openwall.com, Carlos O'Donell via Libc-alpha

On Tue, Sep 29, 2020 at 10:42:07AM -0400, Rich Felker wrote:
> On Tue, Sep 29, 2020 at 08:54:59AM +0200, Florian Weimer wrote:
> > * Rich Felker:
> >
> > > Is there a reason to take the lock across fork rather than just
> > > resetting it in the child? After seeing this I'm working on fixing the
> > > same issue in musl and was about to take the lock, but realized ours
> > > isn't actually protecting any userspace data state, just excluding
> > > sigaction on SIGABRT during abort.
> >
> > It's also necessary to stop the fork because the subprocess could
> > otherwise observe the impossible SIG_DFL state. In case the signal
> > handler returns, the implementation needs to produce a termination
> > status with SIGABRT as the termination signal, and the only way I can
> > see to achieve that is to remove the signal handler and send the
> > signal again. This suggests that a lock in sigaction is needed as
> > well.
>
> Yes, in musl we already have the lock in sigaction -- that's the whole
> point of the lock. To prevent other threads from fighting to change
> the disposition back to SIG_IGN or a signal handler while abort is
> trying to change it to SIG_DFL.
>
> > But for the fork case, resetting the lock in the new subprocess should
> > be sufficient.
>
> I don't follow. Do you mean taking the lock in the parent, but just
> resetting it in the child? That should work but I don't see how it has
> any advantage over just releasing it in the child.

OK, this is a lot worse than you thought: Even without fork, execve
and posix_spawn can also see the SIGABRT disposition change made by
abort(), passing it on to a process that should have started with a
disposition of SIG_IGN if you hit exactly the wrong spot in the race.

So, to fix this, these interfaces also have to take the abort lock,
and to make it AS-safe (since execve is required to be), they need to
block all signals to take the lock. But execve can't leave signals
blocked or the new process image would inherit that state. So it has
to unblock them after taking the lock. But then a signal handler can
interrupt between taking the lock and the execve syscall, making abort
deadlock if called from the signal handler.

So how to solve this? Having the abort lock be recursive sounds like
it helps (it avoids the deadlock above), but then the signal handler
that runs between taking the abort lock and making the execve syscall
still delays abort by other threads for an unbounded length of time,
and in fact it could even longjmp out, leaving a stale lock owner that
prevents any other thread from ever calling abort.

Ultimately this boils down to a general principle: you can't make
AS-safe locks that allow arbitrary application code to run while
they're held.

I really don't see any way out without giving abort a mechanism to
"seize" other threads before changing the signal disposition. This
could for example be done with the same mechanism used for
multithreaded set*id (broadcast signal of an implementation-internal,
unblockable signal) or maybe with some seccomp hacks on a recent
enough kernel. Is there some better approach I'm missing?
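
To make the execve window concrete, here is a minimal single-threaded
sketch. It is not taken from musl or glibc. The names abort_lock,
abort_trylock, abort_unlock, handler and execve_window_sketch are
hypothetical, SIGUSR1 stands in for whatever signal happens to arrive,
and a non-blocking trylock replaces the blocking acquisition a real
abort() would perform, so the sketch can report the deadlock instead
of hanging in it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the libc-internal abort lock: 1 means held. */
static atomic_int abort_lock;

/* Set by the handler when a blocking lock acquisition would never return. */
static volatile sig_atomic_t would_deadlock;

/* Non-blocking acquire; returns nonzero on success. */
static int abort_trylock(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&abort_lock, &expected, 1);
}

static void abort_unlock(void)
{
    atomic_store(&abort_lock, 0);
}

/* Plays the role of an application handler that calls abort(). */
static void handler(int sig)
{
    (void)sig;
    if (abort_trylock())
        abort_unlock();
    else
        would_deadlock = 1; /* the interrupted thread holds the lock */
}

/* Returns 1 if the handler found the lock held inside the window. */
int execve_window_sketch(void)
{
    struct sigaction sa = {0}, old_sa;
    sigset_t all, old, open_mask;

    would_deadlock = 0;
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, &old_sa);

    /* Step 1: block everything so that taking the lock is AS-safe. */
    sigfillset(&all);
    sigprocmask(SIG_BLOCK, &all, &old);
    int got = abort_trylock();
    assert(got);
    (void)got;

    /* Step 2: unblock, since the new image would inherit the mask. */
    open_mask = old;
    sigdelset(&open_mask, SIGUSR1);
    sigprocmask(SIG_SETMASK, &open_mask, NULL);

    /* Step 3: a signal lands before the execve syscall would be made. */
    raise(SIGUSR1);

    /* Step 4: the execve syscall would go here; the sketch cleans up. */
    abort_unlock();
    sigprocmask(SIG_SETMASK, &old, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    return would_deadlock;
}
```

Calling execve_window_sketch() returns 1: the handler runs on the
thread that already holds the lock, so a blocking acquire there could
never succeed. A recursive lock would let the handler through, but the
lock would still be held for as long as the handler chooses to run.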

All of this hell because Linux thought we didn't need a SYS_abort...

Rich