unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Rich Felker <dalias@libc.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha@sourceware.org
Subject: Re: Removing longjmp error handling from the dynamic loader
Date: Wed, 6 Mar 2019 10:40:13 -0500	[thread overview]
Message-ID: <20190306154013.GQ23599@brightrain.aerifal.cx> (raw)
In-Reply-To: <871s3lgtvu.fsf@oldenburg2.str.redhat.com>

On Tue, Mar 05, 2019 at 05:37:09PM +0100, Florian Weimer wrote:
> Currently, dynamic loader operations triggered by explicit function
> calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed
> using _dl_catch_error.  This function calls setjmp, and when an error is
> raised the dynamic linker calls longjmp, resetting the call stack.
> 
> This leads to strange bugs, such as undefined symbols in init/fini calls
> being treated as non-fatal soft errors reported through dlerror
> (bug 24304).  It also leads to a design where most dlopen failures are
> not handled locally.  Instead they bubble up to the top-level dlopen
> error handler installed using _dl_catch_error, which then attempts to
> undo the changes of the  partially completed state to the global dynamic
> linker state (see bug 20839 for a couple of problems with that).
> 
> In the current scheme, more localized error handling is problematic
> because it has high syntactic overhead: You need to define a struct for
> argument data and a separate function that receives the data, and pass
> both to _dl_catch_error.  There also could be a performance overhead if
> individual malloc calls were protected in this way because each call to
> _dl_catch_error incurs a call to setjmp.
> 
> Does anyone think we should retain the current error handling scheme?
> I personally do not have a problem with exceptions and stack unwinding,
> but if this is what we want, we should use a DWARF-based unwinder and
> GCC's exception handling features (the limited support in the C front
> end is probably sufficient).
> 
> The alternative to unwinding is an explicit struct dl_exception *
> argument for functions which can fail, and use the return value to
> indicate whether there was a fatal error.  This sometimes causes issues
> where the return value is already used to signal both success and
> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
> RTLD_NOLOAD failure from _dl_map_object_from_fd).
> 
> There is some impact on <dl-machine.h> because the relocation processing
> needs to change.  We can convert the relocation processing first to the
> new scheme and continue to signal any errors using longjmp in the
> generic code.  But supporting twice the number of relocation APIs for
> incremental conversion of targets will still be difficult.  I think we
> are still looking at one fairly large patch, given the number of
> architectures we support, although the changes should just be a few
> dozen lines per architecture.
> 
> We cannot convert the generic code first because that would mean calling
> setjmp each time before calling into architecture-specific code, which I
> expect will be too problematic from a performance point of view.
> 
> A third option is not use an explicit struct dl_exception * argument,
> but a per-thread variable.  This will require changes to support TLS
> (presumably the initial-exec variant) in the dynamic linker itself,
> which is currently missing.  Since the exception pointer is only needed
> in case of an error, using a TLS variable for it will avoid the overhead
> of maintaining the explicit exception pointer argument and passing it
> around.  Adding TLS support to the dynamic linker could be implemented
> incrementally across architectures, but the conversion itself faces the
> same flag day challenge as the explicit argument solution.  The explict
> argument also ensures that places stick out where encoding fatal errors
> in the return argument is difficult.  (A fourth option would compile the
> dynamic linker twice and use the TLS-less version for the initial
> loading of the executables and its dependencies.)
> 
> From a source code and binary size point of view, there is not much
> difference between using longjmp and the explicit function argument
> approach on x86-64.
> 
> Does anyone want to keep the longjmp approach?  Should I polish my patch
> and extend it to cover all architectures?

I don't have a strong opinion (and maybe not enough information to
have any opinion) on whether you keep longjmp or go with something
else, but I don't think there's any fundamental reason you need to
change this to fix current bugs. In musl, longjmp is used partly by
historical accident (I don't recall fully, but I think it was a matter
of adding dlopen to code that was originally written just for initial
dynamic linking at program entry), and doesn't have problems like the
ones you describe.

For the init/fini "soft errors" problem, it sounds to me like the code
that runs the ctors should just be outside of the scope of the
_dl_catch_error. If you've started running ctors, you're past the
point where the operation can be backed out in any meaningful sense.
I wonder if it's worse with ifunc, but I think not -- without
bind_now, the ifunc resolvers don't even need to run before you pass
the point of no return, and with bind_now, you'd be executing them in
a context where resolver errors are still "hard" and can/should cause
dlopen to fail.

Are there things I'm missing?

Rich

  reply	other threads:[~2019-03-06 15:40 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer
2019-03-06 15:40 ` Rich Felker [this message]
2019-03-07 14:03   ` Adhemerval Zanella
2019-03-11 13:45   ` Florian Weimer
2019-03-11 22:52     ` Rich Felker
2019-03-12 10:30       ` Florian Weimer
2019-03-12 12:00         ` Szabolcs Nagy
2019-03-11 14:07 ` Carlos O'Donell
2019-03-11 15:29   ` Florian Weimer
2019-03-11 16:13     ` Carlos O'Donell
2019-03-11 16:53       ` Zack Weinberg
2019-03-11 17:08         ` Florian Weimer
2019-03-11 18:39           ` Zack Weinberg
2019-03-11 18:49             ` Florian Weimer
2019-03-11 18:39       ` Florian Weimer
2019-03-11 19:30         ` Carlos O'Donell
2019-03-13 15:51           ` Florian Weimer

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190306154013.GQ23599@brightrain.aerifal.cx \
    --to=dalias@libc.org \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).