From: Rich Felker <dalias@libc.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha@sourceware.org
Subject: Re: Removing longjmp error handling from the dynamic loader
Date: Wed, 6 Mar 2019 10:40:13 -0500 [thread overview]
Message-ID: <20190306154013.GQ23599@brightrain.aerifal.cx> (raw)
In-Reply-To: <871s3lgtvu.fsf@oldenburg2.str.redhat.com>
On Tue, Mar 05, 2019 at 05:37:09PM +0100, Florian Weimer wrote:
> Currently, dynamic loader operations triggered by explicit function
> calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed
> using _dl_catch_error. This function calls setjmp, and when an error is
> raised the dynamic linker calls longjmp, resetting the call stack.
>
> This leads to strange bugs, such as undefined symbols in init/fini calls
> being treated as non-fatal soft errors reported through dlerror
> (bug 24304). It also leads to a design where most dlopen failures are
> not handled locally. Instead they bubble up to the top-level dlopen
> error handler installed using _dl_catch_error, which then attempts to
> undo the changes of the partially completed state to the global dynamic
> linker state (see bug 20839 for a couple of problems with that).
>
> In the current scheme, more localized error handling is problematic
> because it has high syntactic overhead: You need to define a struct for
> argument data and a separate function that receives the data, and pass
> both to _dl_catch_error. There also could be a performance overhead if
> individual malloc calls were protected in this way because each call to
> _dl_catch_error incurs a call to setjmp.
>
> Does anyone think we should retain the current error handling scheme?
> I personally do not have a problem with exceptions and stack unwinding,
> but if this is what we want, we should use a DWARF-based unwinder and
> GCC's exception handling features (the limited support in the C front
> end is probably sufficient).
>
> The alternative to unwinding is an explicit struct dl_exception *
> argument for functions which can fail, and use the return value to
> indicate whether there was a fatal error. This sometimes causes issues
> where the return value is already used to signal both success and
> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
> RTLD_NOLOAD failure from _dl_map_object_from_fd).
>
> There is some impact on <dl-machine.h> because the relocation processing
> needs to change. We can convert the relocation processing first to the
> new scheme and continue to signal any errors using longjmp in the
> generic code. But supporting twice the number of relocation APIs for
> incremental conversion of targets will still be difficult. I think we
> are still looking at one fairly large patch, given the number of
> architectures we support, although the changes should just be a few
> dozen lines per architecture.
>
> We cannot convert the generic code first because that would mean calling
> setjmp each time before calling into architecture-specific code, which I
> expect will be too problematic from a performance point of view.
>
> A third option is not use an explicit struct dl_exception * argument,
> but a per-thread variable. This will require changes to support TLS
> (presumably the initial-exec variant) in the dynamic linker itself,
> which is currently missing. Since the exception pointer is only needed
> in case of an error, using a TLS variable for it will avoid the overhead
> of maintaining the explicit exception pointer argument and passing it
> around. Adding TLS support to the dynamic linker could be implemented
> incrementally across architectures, but the conversion itself faces the
> same flag day challenge as the explicit argument solution. The explict
> argument also ensures that places stick out where encoding fatal errors
> in the return argument is difficult. (A fourth option would compile the
> dynamic linker twice and use the TLS-less version for the initial
> loading of the executables and its dependencies.)
>
> From a source code and binary size point of view, there is not much
> difference between using longjmp and the explicit function argument
> approach on x86-64.
>
> Does anyone want to keep the longjmp approach? Should I polish my patch
> and extend it to cover all architectures?
I don't have a strong opinion (and maybe not enough information to
have any opinion) on whether you keep longjmp or go with something
else, but I don't think there's any fundamental reason you need to
change this to fix current bugs. In musl, longjmp is used partly by
historical accident (I don't recall fully, but I think it was a matter
of adding dlopen to code that was originally written just for initial
dynamic linking at program entry), and doesn't have problems like the
ones you describe.
For the init/fini "soft errors" problem, it sounds to me like the code
that runs the ctors should just be outside of the scope of the
_dl_catch_error. If you've started running ctors, you're past the
point where the operation can be backed out in any meaningful sense.
I wonder if it's worse with ifunc, but I think not -- without
bind_now, the ifunc resolvers don't even need to run before you pass
the point of no return, and with bind_now, you'd be executing them in
a context where resolver errors are still "hard" and can/should cause
dlopen to fail.
Are there things I'm missing?
Rich
next prev parent reply other threads:[~2019-03-06 15:40 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer
2019-03-06 15:40 ` Rich Felker [this message]
2019-03-07 14:03 ` Adhemerval Zanella
2019-03-11 13:45 ` Florian Weimer
2019-03-11 22:52 ` Rich Felker
2019-03-12 10:30 ` Florian Weimer
2019-03-12 12:00 ` Szabolcs Nagy
2019-03-11 14:07 ` Carlos O'Donell
2019-03-11 15:29 ` Florian Weimer
2019-03-11 16:13 ` Carlos O'Donell
2019-03-11 16:53 ` Zack Weinberg
2019-03-11 17:08 ` Florian Weimer
2019-03-11 18:39 ` Zack Weinberg
2019-03-11 18:49 ` Florian Weimer
2019-03-11 18:39 ` Florian Weimer
2019-03-11 19:30 ` Carlos O'Donell
2019-03-13 15:51 ` Florian Weimer
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/libc/involved.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190306154013.GQ23599@brightrain.aerifal.cx \
--to=dalias@libc.org \
--cc=fweimer@redhat.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).