Date: Mon, 11 Mar 2019 18:52:00 -0400
From: Rich Felker
To: Florian Weimer
Cc: libc-alpha@sourceware.org
Subject: Re: Removing longjmp error handling from the dynamic loader

On Mon, Mar 11, 2019 at 02:45:13PM +0100, Florian Weimer wrote:
> * Rich Felker:
>
> > I don't have a strong opinion (and maybe not enough information to
> > have any opinion) on whether you keep longjmp or go with something
> > else, but I don't think there's any fundamental reason you need to
> > change this to fix current
> > bugs. In musl, longjmp is used partly by
> > historical accident (I don't recall fully, but I think it was a matter
> > of adding dlopen to code that was originally written just for initial
> > dynamic linking at program entry), and doesn't have problems like the
> > ones you describe.
>
> Interesting.
>
> In glibc, we have many callouts into architecture-specific routines from
> generic code. Some of these routines throw exceptions, and which ones
> do is not always entirely clear.
>
> For example, if there is a temporary memory allocation which persists
> across such a callout, do we have to install a local exception handler
> to clean up that allocation in case the helper routine throws?
>
> If the error handling is expressed in the function signature (using that
> exception pointer parameter), the behavior is much more explicit and we
> can avoid these issues more easily.
>
> > For the init/fini "soft errors" problem, it sounds to me like the code
> > that runs the ctors should just be outside of the scope of the
> > _dl_catch_error. If you've started running ctors, you're past the
> > point where the operation can be backed out in any meaningful sense.
>
> That's certainly true. This one should be rather easy to fix. It also
> affects only dlopen/dlclose, at which point we can assume that we have
> our full exception handling implementation.
>
> > I wonder if it's worse with ifunc, but I think not -- without
> > bind_now, the ifunc resolvers don't even need to run before you pass
> > the point of no return, and with bind_now, you'd be executing them in
> > a context where resolver errors are still "hard" and can/should cause
> > dlopen to fail.
>
> The question is whether a failure from the run-time trampoline should
> ever be a soft error (that can be caught by dlopen). I think we use the
> soft/hard distinction differently, but you seem to suggest that a lazy
> binding error during a call to an IFUNC resolver should not cause
> process termination.
> I think it is undefined like any other trampoline
> failure, so we should abort. We really don't know what the IFUNC
> resolver was supposed to be doing and which of its side effects
> happened. The situation really is unrecoverable.

Assuming ifunc resolvers aren't "allowed" to do much beyond probing
hwcaps/cpuid/etc. to pick an implementation, I don't see any reason
that a symbol resolution failure during a call to an ifunc resolver
should terminate the process. Missing symbols at dlopen time with
RTLD_NOW or DT_BIND_NOW or whatever should never crash the
application, but should report the error.

With ifunc, I think (?) you have the possibility that the ifunc
resolver code will call another function in the library being loaded
(or one of its deps) via a PLT slot that hasn't yet been initialized,
because there's no way to know a dependency order for the relocations
to avoid this. This should probably longjmp back and make dlopen fail;
I can't see any other way to make it work since there's no way to make
forward progress past the impossible-to-satisfy call. But maybe the
relocations can just be ordered such that this isn't a concern (by
checking all symbolic references before running any ifunc resolvers?).

> If we want to give users more precise control over binding errors, I
> don't think anything based on SJLJ-style exception handling is the
> answer.

I don't see why there should be any expectation that you can use C++
exception handling for this; the contract of dlopen is that it succeed
or return an error, not that it might terminate via an exception.

Rich
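The question raised in this thread about a temporary allocation held across a callout that may throw can be illustrated with a small C sketch contrasting the two error-handling styles. It is a hypothetical, self-contained illustration, not glibc's code: `load_error`, `throw_error`, `arch_hook_*`, `generic_step_*`, `run_throwing` and `live_temporary_count` are invented names, and a counter stands in for a real allocation. `run_throwing` plays a role loosely similar to glibc's `_dl_catch_error`, but its signature and behavior are assumptions made for the example.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical error record for the explicit style (not glibc's type). */
struct load_error {
    int code;             /* 0 on success, nonzero on failure */
    const char *message;  /* NULL on success */
};

/* Counts temporaries acquired but not yet released; stands in for a
   malloc'd buffer held across a callout. */
static int live_temporaries;
static void acquire_temporary(void) { live_temporaries++; }
static void release_temporary(void) { live_temporaries--; }

/* ---- Style 1: longjmp-based "exceptions" ---- */

static jmp_buf *current_handler;   /* innermost installed catcher */
static const char *thrown_message;

static void throw_error(const char *msg) {
    assert(current_handler != NULL);  /* throwing with no catcher is a bug */
    thrown_message = msg;
    longjmp(*current_handler, 1);
}

/* Hypothetical arch-specific callout; its signature does not reveal
   that it may throw. */
static void arch_hook_throwing(int fail) {
    if (fail)
        throw_error("arch hook failed");
}

/* Generic code holding a temporary across the callout. */
static void generic_step_throwing(int fail) {
    acquire_temporary();
    arch_hook_throwing(fail);  /* on failure, control never comes back */
    release_temporary();       /* skipped when the hook throws */
}

/* Hypothetical catcher: installs a handler, runs the step, restores. */
int run_throwing(int fail, const char **msg) {
    jmp_buf env;
    jmp_buf *saved = current_handler;  /* not modified after setjmp */
    current_handler = &env;
    if (setjmp(env) != 0) {
        current_handler = saved;
        *msg = thrown_message;
        return -1;
    }
    generic_step_throwing(fail);
    current_handler = saved;
    *msg = NULL;
    return 0;
}

/* ---- Style 2: error reported through an explicit parameter ---- */

static int arch_hook_explicit(int fail, struct load_error *err) {
    if (fail) {
        err->code = 1;
        err->message = "arch hook failed";
        return -1;
    }
    return 0;
}

int generic_step_explicit(int fail, struct load_error *err) {
    acquire_temporary();
    int rc = arch_hook_explicit(fail, err);
    release_temporary();  /* runs on both the success and failure paths */
    return rc;
}

int live_temporary_count(void) { return live_temporaries; }
```

Under these assumptions, `run_throwing(1, &msg)` reports the error but leaves `live_temporary_count()` one higher, because `generic_step_throwing` never regains control to release its temporary. `generic_step_explicit` releases it on the failure path without installing a local handler, at the cost of threading `err` through every signature.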