From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-100567-e=80x24.org@sourceware.org>
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id E8D7F20248
	for <e@80x24.org>; Mon, 11 Mar 2019 22:52:09 +0000 (UTC)
Received: (qmail 49096 invoked by alias); 11 Mar 2019 22:52:07 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-e=80x24.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 49088 invoked by uid 89); 11 Mar 2019 22:52:07 -0000
Authentication-Results: sourceware.org; auth=none
X-HELO: brightrain.aerifal.cx
Date: Mon, 11 Mar 2019 18:52:00 -0400
From: Rich Felker <dalias@libc.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha@sourceware.org
Subject: Re: Removing longjmp error handling from the dynamic loader
Message-ID: <20190311225200.GA23599@brightrain.aerifal.cx>
References: <871s3lgtvu.fsf@oldenburg2.str.redhat.com>
 <20190306154013.GQ23599@brightrain.aerifal.cx>
 <877ed5zfrq.fsf@oldenburg2.str.redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <877ed5zfrq.fsf@oldenburg2.str.redhat.com>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: Rich Felker <dalias@aerifal.cx>

On Mon, Mar 11, 2019 at 02:45:13PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > I don't have a strong opinion (and maybe not enough information to
> > have any opinion) on whether you keep longjmp or go with something
> > else, but I don't think there's any fundamental reason you need to
> > change this to fix current bugs. In musl, longjmp is used partly by
> > historical accident (I don't recall fully, but I think it was a matter
> > of adding dlopen to code that was originally written just for initial
> > dynamic linking at program entry), and doesn't have problems like the
> > ones you describe.
> 
> Interesting.
> 
> In glibc, we have many callouts into architecture-specific routines from
> generic code.  Some of these routines throw exceptions, and which ones
> do is not always entirely clear.
> 
> For example, if there is a temporary memory allocation which persists
> across such a callout, do we have to install a local exception handler
> to clean up that allocation in case the helper routine throws?
> 
> If the error handling is expressed in the function signature (using that
> exception pointer parameter), the behavior is much more explicit and we
> can avoid these issues more easily.
> 
> > For the init/fini "soft errors" problem, it sounds to me like the code
> > that runs the ctors should just be outside of the scope of the
> > _dl_catch_error. If you've started running ctors, you're past the
> > point where the operation can be backed out in any meaningful sense.
> 
> That's certainly true.  This one should be rather easy to fix.  It also
> affects only dlopen/dlclose, at which point we can assume that we have
> our full exception handling implementation.
> 
> > I wonder if it's worse with ifunc, but I think not -- without
> > bind_now, the ifunc resolvers don't even need to run before you pass
> > the point of no return, and with bind_now, you'd be executing them in
> > a context where resolver errors are still "hard" and can/should cause
> > dlopen to fail.
> 
> The question is whether a failure from the run-time trampoline should
> ever be a soft error (that can be caught by dlopen).  I think we use the
> soft/hard distinction differently, but you seem to suggest that a lazy
> binding error during a call to an IFUNC resolver should not cause
> process termination.  I think it is undefined like any other trampoline
> failure, so we should abort.  We really don't know what the IFUNC
> resolver was supposed to be doing and which of its side effects
> happened.  The situation really is unrecoverable.

Assuming ifunc resolvers aren't "allowed" to do much beyond probing
hwcaps/cpuid/etc. to pick an implementation, I don't see any reason
that a symbol resolution failure while an ifunc resolver is running
should terminate the process. Missing symbols at dlopen time with
RTLD_NOW, DT_BIND_NOW, or whatever should never crash the application;
dlopen should report the error instead. With ifunc, I think (?) you
have the possibility that the resolver code will call another function
in the library being loaded (or one of its deps) via a PLT slot that
hasn't yet been initialized, because there's no way to know a
dependency order for the relocations that avoids this. Such a call
should probably longjmp back and make dlopen fail; I can't see any
other way to make it work, since there's no way to make forward
progress past the impossible-to-satisfy call. But maybe the
relocations can just be ordered such that this isn't a concern (by
checking all symbolic references before running any ifunc resolvers?).
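
To make the "longjmp back and make dlopen fail" idea concrete, here is
a minimal sketch. Every name in it (toy_load, bind_or_fail, and so on)
is hypothetical, and the symbol table is a toy; it is not how musl or
glibc actually implement this, only an illustration of a binding
failure inside a resolver unwinding to the loader's catch point
instead of aborting the process.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; this is not musl's or glibc's code. */

typedef int (*func_t)(void);

static jmp_buf *catch_env;       /* non-NULL only while a load is in progress */
static const char *load_errmsg;  /* last error, in the spirit of dlerror() */

static int impl_generic(void) { return 1; }

/* Toy symbol table standing in for the real lookup machinery. */
static func_t toy_lookup(const char *name)
{
    if (strcmp(name, "impl_generic") == 0)
        return impl_generic;
    return NULL;
}

/* Stands in for the binding path: on failure, unwind to the catch
   point set up by toy_load() instead of terminating the process. */
static func_t bind_or_fail(const char *name)
{
    func_t f = toy_lookup(name);
    if (f == NULL) {
        assert(catch_env != NULL);  /* nothing to unwind to otherwise */
        load_errmsg = "symbol not found";
        longjmp(*catch_env, 1);
    }
    return f;
}

/* Toy ifunc-style resolver that needs another symbol to do its job. */
static func_t resolver(const char *needed)
{
    return bind_or_fail(needed);
}

/* Returns 0 and stores the resolved function on success; returns -1
   (leaving *out untouched) if binding failed anywhere below. */
int toy_load(const char *needed, func_t *out)
{
    jmp_buf env;

    catch_env = &env;
    if (setjmp(env) != 0) {
        catch_env = NULL;           /* arrived here via longjmp */
        return -1;
    }
    *out = resolver(needed);
    catch_env = NULL;
    return 0;
}

const char *toy_last_error(void)
{
    return load_errmsg;
}
```

A caller would check the return value of toy_load() and, on -1, read
toy_last_error(), much as a dlopen caller checks for NULL and then
calls dlerror().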

> If we want to give users more precise control over binding errors, I
> don't think anything based on SJLJ-style exception handling is the
> answer.

I don't see why there should be any expectation that you can use C++
exception handling for this; the contract of dlopen is that it either
succeeds or returns an error, not that it might terminate via an
exception.
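
From the caller's side, that contract looks roughly like the sketch
below. dlopen, dlerror, and RTLD_NOW are the standard <dlfcn.h>
interfaces; the wrapper name try_load and its error-buffer convention
are assumptions made up for this example.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper: returns the handle on success, or NULL with a
   human-readable message copied into errbuf on failure. */
void *try_load(const char *path, char *errbuf, size_t errlen)
{
    assert(errbuf != NULL && errlen > 0);

    dlerror();  /* discard any stale error state */

    /* RTLD_NOW asks for all symbols to be bound before dlopen returns,
       so unresolved references are reported here rather than at the
       first call through the PLT. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *msg = dlerror();
        snprintf(errbuf, errlen, "%s", msg ? msg : "unknown dlopen error");
    } else {
        errbuf[0] = '\0';
    }
    return handle;
}
```

On older glibc versions this needs to be linked with -ldl.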

Rich