From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-100534-e=80x24.org@sourceware.org>
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id B46E020248
	for <e@80x24.org>; Mon, 11 Mar 2019 14:07:30 +0000 (UTC)
Received: (qmail 104976 invoked by alias); 11 Mar 2019 14:07:28 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-e=80x24.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 104967 invoked by uid 89); 11 Mar 2019 14:07:28 -0000
Authentication-Results: sourceware.org; auth=none
X-HELO: mail-qt1-f176.google.com
Subject: Re: Removing longjmp error handling from the dynamic loader
To: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
References: <871s3lgtvu.fsf@oldenburg2.str.redhat.com>
From: Carlos O'Donell <carlos@redhat.com>
Openpgp: preference=signencrypt
Message-ID: <7ad76477-c936-5db4-91be-c304ea322299@redhat.com>
Date: Mon, 11 Mar 2019 10:07:20 -0400
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101
 Thunderbird/60.5.1
MIME-Version: 1.0
In-Reply-To: <871s3lgtvu.fsf@oldenburg2.str.redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

On 3/5/19 11:37 AM, Florian Weimer wrote:
> Currently, dynamic loader operations triggered by explicit function
> calls (dlopen, dlsym, dlclose) are wrapped in exception handlers,
> installed using _dl_catch_error.  This function calls setjmp, and when
> an error is raised the dynamic linker calls longjmp, resetting the call
> stack.

Yes.

Call this "Method 1: sjlj recovery"
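
To make Method 1 concrete, here is a minimal sketch of the pattern as
I read the description above.  All names (sketch_catch_error,
sketch_signal_error, struct lookup_args) are made up for illustration;
this is not the actual _dl_catch_error code, and the single global
handler pointer is an assumption that keeps the sketch short.

```c
#include <setjmp.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names throughout; this is not glibc's _dl_catch_error.  */
struct catch_frame
{
  jmp_buf env;                     /* Where to resume after an error.  */
  const char *volatile errstring;  /* Written between setjmp and longjmp,
                                      hence volatile.  */
};

/* Innermost active handler.  A plain global keeps the sketch short,
   so this is not thread-safe.  */
static struct catch_frame *current_frame;

/* Raise an error: record the message and jump back to the handler.
   The frames in between are discarded without running any cleanup.  */
void
sketch_signal_error (const char *msg)
{
  current_frame->errstring = msg;
  longjmp (current_frame->env, 1);
}

/* Run OPERATE (ARGS) with a handler installed.  Return NULL on
   success, or the error message if OPERATE raised an error.  */
const char *
sketch_catch_error (void (*operate) (void *), void *args)
{
  struct catch_frame frame = { .errstring = NULL };
  struct catch_frame *saved = current_frame;
  const char *result = NULL;

  current_frame = &frame;
  if (setjmp (frame.env) == 0)
    operate (args);
  else
    result = frame.errstring;
  current_frame = saved;
  return result;
}

/* Inputs and outputs have to be packed into a struct to cross the
   callback boundary.  */
struct lookup_args
{
  const char *name;   /* Input.  */
  int found;          /* Output.  */
};

/* The separate function that receives the packed arguments.  */
void
do_lookup (void *ptr)
{
  struct lookup_args *args = ptr;
  if (strcmp (args->name, "missing") == 0)
    sketch_signal_error ("undefined symbol");
  args->found = 1;
}
```

A caller fills in a struct lookup_args, passes do_lookup and the struct
to sketch_catch_error, and checks for a non-NULL return.  Anything
do_lookup had allocated before the longjmp has to be undone by whoever
installed the handler.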
 
> This leads to strange bugs, such as undefined symbols in init/fini calls
> being treated as non-fatal soft errors reported through dlerror
> (bug 24304).  It also leads to a design where most dlopen failures are
> not handled locally.  Instead they bubble up to the top-level dlopen
> error handler installed using _dl_catch_error, which then attempts to
> undo the changes of the partially completed state to the global dynamic
> linker state (see bug 20839 for a couple of problems with that).

Nothing to me indicates that a new system wouldn't also have bugs.

The benefit you need to highlight is the ability for a new design to
mitigate bugs by the very nature of the structural changes being
proposed.

Both sjlj and a purely C++-style exception mechanism are, in my mind,
semantically equivalent, but the latter has more syntactic sugar to
allow you to practice good patterns that avoid mistakes (but it can
equally hide errors and create bugs).

> In the current scheme, more localized error handling is problematic
> because it has high syntactic overhead: You need to define a struct for
> argument data and a separate function that receives the data, and pass
> both to _dl_catch_error.  There also could be a performance overhead if
> individual malloc calls were protected in this way because each call to
> _dl_catch_error incurs a call to setjmp.

Yes, Method 1 requires passing down all information encapsulated in a
structure.

Why would malloc calls be protected in this way?

> Does anyone think we should retain the current error handling scheme?

That depends on the quality of the alternative proposal :-)

> I personally do not have a problem with exceptions and stack unwinding,
> but if this is what we want, we should use a DWARF-based unwinder and
> GCC's exception handling features (the limited support in the C front
> end is probably sufficient).

I agree.

Let's call this "Method 2: C exceptions"

For dlopen-et-al I think a DWARF-based unwinder would be great.

I assume you envision that exception handling in C would help clean up
the code and allow more locally visible cleanups to happen (unwinding
state).
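
As a sketch of what "locally visible cleanups" could look like, GCC's C
front end has __attribute__ ((cleanup)), which runs a handler when a
variable goes out of scope; per the GCC documentation the handler also
runs during stack unwinding when the code is built with -fexceptions.
Whether this is the feature you have in mind is my assumption, and the
function names below are hypothetical.

```c
#include <stdlib.h>

/* Counts cleanup calls so that the effect is observable.  */
int buffers_freed;

/* Cleanup handler: receives a pointer to the annotated variable.  */
void
free_buffer (char **ptr)
{
  free (*ptr);
  ++buffers_freed;
}

/* Hypothetical operation with an early error return.  The buffer is
   released on every path out of the scope, without an explicit free
   call at each exit.  */
int
map_object_sketch (int fail)
{
  char *buf __attribute__ ((cleanup (free_buffer))) = malloc (64);
  if (buf == NULL)
    return -1;
  if (fail)
    return -1;   /* free_buffer runs here.  */
  return 0;      /* ... and here.  */
}
```

The cleanup sits next to the allocation it undoes, instead of in a
top-level handler far away from it.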

> The alternative to unwinding is an explicit struct dl_exception *
> argument for functions which can fail, and use the return value to
> indicate whether there was a fatal error.  This sometimes causes issues
> where the return value is already used to signal both success and
> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
> RTLD_NOLOAD failure from _dl_map_object_from_fd).

No, if we're going to change, it should be *towards* something where
the compiler can help us get it right.

I don't want to see an explicit "this" passed into every function by hand.

Let's call this "Method 3: Explicit this"
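
For comparison, a minimal sketch of the explicit-argument style.  The
struct layout and function names are hypothetical (this is not glibc's
struct dl_exception); the point is only that every function on the call
path takes the extra pointer and every caller propagates failure by
hand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, not glibc's struct dl_exception.  */
struct sketch_exception
{
  const char *objname;
  const char *errstring;
};

/* Return true on success.  On a fatal error, fill in *EXC and return
   false.  */
bool
lookup_symbol_sketch (const char *name, int *value,
                      struct sketch_exception *exc)
{
  if (strcmp (name, "missing") == 0)
    {
      exc->objname = "libexample.so";
      exc->errstring = "undefined symbol";
      return false;
    }
  *value = 42;
  return true;
}

/* Every intermediate caller must thread EXC through and check the
   result itself.  */
bool
relocate_sketch (const char *name, int *value,
                 struct sketch_exception *exc)
{
  if (!lookup_symbol_sketch (name, value, exc))
    return false;   /* Propagate by hand.  */
  *value += 1;
  return true;
}
```

Nothing stops a caller from ignoring the return value, which is the
kind of mistake I would like the compiler to help with.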

> There is some impact on <dl-machine.h> because the relocation processing
> needs to change.  We can convert the relocation processing first to the
> new scheme and continue to signal any errors using longjmp in the
> generic code.  But supporting twice the number of relocation APIs for
> incremental conversion of targets will still be difficult.  I think we
> are still looking at one fairly large patch, given the number of
> architectures we support, although the changes should just be a few
> dozen lines per architecture.

Please expand on this a bit more.

> We cannot convert the generic code first because that would mean calling
> setjmp each time before calling into architecture-specific code, which I
> expect will be too problematic from a performance point of view.

Right, this is an all-or-nothing change IMO.

> A third option is to not use an explicit struct dl_exception * argument,
> but a per-thread variable.  This will require changes to support TLS
> (presumably the initial-exec variant) in the dynamic linker itself,
> which is currently missing.  Since the exception pointer is only needed
> in case of an error, using a TLS variable for it will avoid the overhead
> of maintaining the explicit exception pointer argument and passing it
> around.  Adding TLS support to the dynamic linker could be implemented
> incrementally across architectures, but the conversion itself faces the
> same flag day challenge as the explicit argument solution.  The explicit
> argument also ensures that places stick out where encoding fatal errors
> in the return argument is difficult.  (A fourth option would compile the
> dynamic linker twice and use the TLS-less version for the initial
> loading of the executables and its dependencies.)

I don't like either of these options.

Let's call this "Method 4: TP explicit this"

I won't give the 4th option a name :-}
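
A minimal sketch of the per-thread variant, assuming GCC's __thread and
the tls_model attribute.  The names are hypothetical, and this runs as
ordinary application code; it does not show the work needed to make TLS
usable inside the dynamic linker itself.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical type, not glibc's struct dl_exception.  */
struct sketch_exception
{
  const char *errstring;
};

/* Per-thread pending exception.  It is only touched on the error
   path, so the success path passes no extra argument.  */
static __thread struct sketch_exception *pending_exception
  __attribute__ ((tls_model ("initial-exec")));

static struct sketch_exception lookup_failure = { "undefined symbol" };

/* Return true on success.  On a fatal error, record the exception in
   the per-thread variable and return false.  */
bool
lookup_symbol_tls (const char *name, int *value)
{
  if (strcmp (name, "missing") == 0)
    {
      pending_exception = &lookup_failure;
      return false;
    }
  *value = 42;
  return true;
}

/* Top-level handler: fetch and clear the pending exception.  */
struct sketch_exception *
take_pending_exception (void)
{
  struct sketch_exception *exc = pending_exception;
  pending_exception = NULL;
  return exc;
}
```

The signatures get shorter, but the return value still has to carry
the "fatal error" bit, so callers check by hand just as they do with
the explicit argument.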

> From a source code and binary size point of view, there is not much
> difference between using longjmp and the explicit function argument
> approach on x86-64.

Right.

> Does anyone want to keep the longjmp approach?  Should I polish my patch
> and extend it to cover all architectures?

What does your patch do?

Method 1: sjlj
Method 2: C exceptions <------- I vote method 2.
Method 3: Explicit this
Method 4: TP explicit this

-- 
Cheers,
Carlos.