Removing longjmp error handling from the dynamic loader

unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed

* Removing longjmp error handling from the dynamic loader
@ 2019-03-05 16:37 Florian Weimer
  2019-03-06 15:40 ` Rich Felker
  2019-03-11 14:07 ` Carlos O'Donell
  0 siblings, 2 replies; 17+ messages in thread
From: Florian Weimer @ 2019-03-05 16:37 UTC (permalink / raw)
  To: libc-alpha

Currently, dynamic loader operations triggered by explicit function
calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed
using _dl_catch_error.  This function calls setjmp, and when an error is
raised the dynamic linker calls longjmp, resetting the call stack.

This leads to strange bugs, such as undefined symbols in init/fini calls
being treated as non-fatal soft errors reported through dlerror
(bug 24304).  It also leads to a design where most dlopen failures are
not handled locally.  Instead they bubble up to the top-level dlopen
error handler installed using _dl_catch_error, which then attempts to
undo the changes of the  partially completed state to the global dynamic
linker state (see bug 20839 for a couple of problems with that).

In the current scheme, more localized error handling is problematic
because it has high syntactic overhead: You need to define a struct for
argument data and a separate function that receives the data, and pass
both to _dl_catch_error.  There also could be a performance overhead if
individual malloc calls were protected in this way because each call to
_dl_catch_error incurs a call to setjmp.

Does anyone think we should retain the current error handling scheme?
I personally do not have a problem with exceptions and stack unwinding,
but if this is what we want, we should use a DWARF-based unwinder and
GCC's exception handling features (the limited support in the C front
end is probably sufficient).

The alternative to unwinding is an explicit struct dl_exception *
argument for functions which can fail, and use the return value to
indicate whether there was a fatal error.  This sometimes causes issues
where the return value is already used to signal both success and
non-fatal error (e.g., -1 for failure from open_verify, or NULL for
RTLD_NOLOAD failure from _dl_map_object_from_fd).

There is some impact on <dl-machine.h> because the relocation processing
needs to change.  We can convert the relocation processing first to the
new scheme and continue to signal any errors using longjmp in the
generic code.  But supporting twice the number of relocation APIs for
incremental conversion of targets will still be difficult.  I think we
are still looking at one fairly large patch, given the number of
architectures we support, although the changes should just be a few
dozen lines per architecture.

We cannot convert the generic code first because that would mean calling
setjmp each time before calling into architecture-specific code, which I
expect will be too problematic from a performance point of view.

A third option is not use an explicit struct dl_exception * argument,
but a per-thread variable.  This will require changes to support TLS
(presumably the initial-exec variant) in the dynamic linker itself,
which is currently missing.  Since the exception pointer is only needed
in case of an error, using a TLS variable for it will avoid the overhead
of maintaining the explicit exception pointer argument and passing it
around.  Adding TLS support to the dynamic linker could be implemented
incrementally across architectures, but the conversion itself faces the
same flag day challenge as the explicit argument solution.  The explict
argument also ensures that places stick out where encoding fatal errors
in the return argument is difficult.  (A fourth option would compile the
dynamic linker twice and use the TLS-less version for the initial
loading of the executables and its dependencies.)

From a source code and binary size point of view, there is not much
difference between using longjmp and the explicit function argument
approach on x86-64.

Does anyone want to keep the longjmp approach?  Should I polish my patch
and extend it to cover all architectures?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer
@ 2019-03-06 15:40 ` Rich Felker
  2019-03-07 14:03   ` Adhemerval Zanella
  2019-03-11 13:45   ` Florian Weimer
  2019-03-11 14:07 ` Carlos O'Donell
  1 sibling, 2 replies; 17+ messages in thread
From: Rich Felker @ 2019-03-06 15:40 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Tue, Mar 05, 2019 at 05:37:09PM +0100, Florian Weimer wrote:
> Currently, dynamic loader operations triggered by explicit function
> calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed
> using _dl_catch_error.  This function calls setjmp, and when an error is
> raised the dynamic linker calls longjmp, resetting the call stack.
> 
> This leads to strange bugs, such as undefined symbols in init/fini calls
> being treated as non-fatal soft errors reported through dlerror
> (bug 24304).  It also leads to a design where most dlopen failures are
> not handled locally.  Instead they bubble up to the top-level dlopen
> error handler installed using _dl_catch_error, which then attempts to
> undo the changes of the  partially completed state to the global dynamic
> linker state (see bug 20839 for a couple of problems with that).
> 
> In the current scheme, more localized error handling is problematic
> because it has high syntactic overhead: You need to define a struct for
> argument data and a separate function that receives the data, and pass
> both to _dl_catch_error.  There also could be a performance overhead if
> individual malloc calls were protected in this way because each call to
> _dl_catch_error incurs a call to setjmp.
> 
> Does anyone think we should retain the current error handling scheme?
> I personally do not have a problem with exceptions and stack unwinding,
> but if this is what we want, we should use a DWARF-based unwinder and
> GCC's exception handling features (the limited support in the C front
> end is probably sufficient).
> 
> The alternative to unwinding is an explicit struct dl_exception *
> argument for functions which can fail, and use the return value to
> indicate whether there was a fatal error.  This sometimes causes issues
> where the return value is already used to signal both success and
> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
> RTLD_NOLOAD failure from _dl_map_object_from_fd).
> 
> There is some impact on <dl-machine.h> because the relocation processing
> needs to change.  We can convert the relocation processing first to the
> new scheme and continue to signal any errors using longjmp in the
> generic code.  But supporting twice the number of relocation APIs for
> incremental conversion of targets will still be difficult.  I think we
> are still looking at one fairly large patch, given the number of
> architectures we support, although the changes should just be a few
> dozen lines per architecture.
> 
> We cannot convert the generic code first because that would mean calling
> setjmp each time before calling into architecture-specific code, which I
> expect will be too problematic from a performance point of view.
> 
> A third option is not use an explicit struct dl_exception * argument,
> but a per-thread variable.  This will require changes to support TLS
> (presumably the initial-exec variant) in the dynamic linker itself,
> which is currently missing.  Since the exception pointer is only needed
> in case of an error, using a TLS variable for it will avoid the overhead
> of maintaining the explicit exception pointer argument and passing it
> around.  Adding TLS support to the dynamic linker could be implemented
> incrementally across architectures, but the conversion itself faces the
> same flag day challenge as the explicit argument solution.  The explict
> argument also ensures that places stick out where encoding fatal errors
> in the return argument is difficult.  (A fourth option would compile the
> dynamic linker twice and use the TLS-less version for the initial
> loading of the executables and its dependencies.)
> 
> From a source code and binary size point of view, there is not much
> difference between using longjmp and the explicit function argument
> approach on x86-64.
> 
> Does anyone want to keep the longjmp approach?  Should I polish my patch
> and extend it to cover all architectures?

I don't have a strong opinion (and maybe not enough information to
have any opinion) on whether you keep longjmp or go with something
else, but I don't think there's any fundamental reason you need to
change this to fix current bugs. In musl, longjmp is used partly by
historical accident (I don't recall fully, but I think it was a matter
of adding dlopen to code that was originally written just for initial
dynamic linking at program entry), and doesn't have problems like the
ones you describe.

For the init/fini "soft errors" problem, it sounds to me like the code
that runs the ctors should just be outside of the scope of the
_dl_catch_error. If you've started running ctors, you're past the
point where the operation can be backed out in any meaningful sense.
I wonder if it's worse with ifunc, but I think not -- without
bind_now, the ifunc resolvers don't even need to run before you pass
the point of no return, and with bind_now, you'd be executing them in
a context where resolver errors are still "hard" and can/should cause
dlopen to fail.

Are there things I'm missing?

Rich

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-06 15:40 ` Rich Felker
@ 2019-03-07 14:03   ` Adhemerval Zanella
  2019-03-11 13:45   ` Florian Weimer
  1 sibling, 0 replies; 17+ messages in thread
From: Adhemerval Zanella @ 2019-03-07 14:03 UTC (permalink / raw)
  To: libc-alpha



On 06/03/2019 12:40, Rich Felker wrote:
> On Tue, Mar 05, 2019 at 05:37:09PM +0100, Florian Weimer wrote:
>> Currently, dynamic loader operations triggered by explicit function
>> calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed
>> using _dl_catch_error.  This function calls setjmp, and when an error is
>> raised the dynamic linker calls longjmp, resetting the call stack.
>>
>> This leads to strange bugs, such as undefined symbols in init/fini calls
>> being treated as non-fatal soft errors reported through dlerror
>> (bug 24304).  It also leads to a design where most dlopen failures are
>> not handled locally.  Instead they bubble up to the top-level dlopen
>> error handler installed using _dl_catch_error, which then attempts to
>> undo the changes of the  partially completed state to the global dynamic
>> linker state (see bug 20839 for a couple of problems with that).
>>
>> In the current scheme, more localized error handling is problematic
>> because it has high syntactic overhead: You need to define a struct for
>> argument data and a separate function that receives the data, and pass
>> both to _dl_catch_error.  There also could be a performance overhead if
>> individual malloc calls were protected in this way because each call to
>> _dl_catch_error incurs a call to setjmp.
>>
>> Does anyone think we should retain the current error handling scheme?
>> I personally do not have a problem with exceptions and stack unwinding,
>> but if this is what we want, we should use a DWARF-based unwinder and
>> GCC's exception handling features (the limited support in the C front
>> end is probably sufficient).
>>
>> The alternative to unwinding is an explicit struct dl_exception *
>> argument for functions which can fail, and use the return value to
>> indicate whether there was a fatal error.  This sometimes causes issues
>> where the return value is already used to signal both success and
>> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
>> RTLD_NOLOAD failure from _dl_map_object_from_fd).
>>
>> There is some impact on <dl-machine.h> because the relocation processing
>> needs to change.  We can convert the relocation processing first to the
>> new scheme and continue to signal any errors using longjmp in the
>> generic code.  But supporting twice the number of relocation APIs for
>> incremental conversion of targets will still be difficult.  I think we
>> are still looking at one fairly large patch, given the number of
>> architectures we support, although the changes should just be a few
>> dozen lines per architecture.
>>
>> We cannot convert the generic code first because that would mean calling
>> setjmp each time before calling into architecture-specific code, which I
>> expect will be too problematic from a performance point of view.
>>
>> A third option is not use an explicit struct dl_exception * argument,
>> but a per-thread variable.  This will require changes to support TLS
>> (presumably the initial-exec variant) in the dynamic linker itself,
>> which is currently missing.  Since the exception pointer is only needed
>> in case of an error, using a TLS variable for it will avoid the overhead
>> of maintaining the explicit exception pointer argument and passing it
>> around.  Adding TLS support to the dynamic linker could be implemented
>> incrementally across architectures, but the conversion itself faces the
>> same flag day challenge as the explicit argument solution.  The explict
>> argument also ensures that places stick out where encoding fatal errors
>> in the return argument is difficult.  (A fourth option would compile the
>> dynamic linker twice and use the TLS-less version for the initial
>> loading of the executables and its dependencies.)
>>
>> From a source code and binary size point of view, there is not much
>> difference between using longjmp and the explicit function argument
>> approach on x86-64.
>>
>> Does anyone want to keep the longjmp approach?  Should I polish my patch
>> and extend it to cover all architectures?
> 
> I don't have a strong opinion (and maybe not enough information to
> have any opinion) on whether you keep longjmp or go with something
> else, but I don't think there's any fundamental reason you need to
> change this to fix current bugs. In musl, longjmp is used partly by
> historical accident (I don't recall fully, but I think it was a matter
> of adding dlopen to code that was originally written just for initial
> dynamic linking at program entry), and doesn't have problems like the
> ones you describe.
> 
> For the init/fini "soft errors" problem, it sounds to me like the code
> that runs the ctors should just be outside of the scope of the
> _dl_catch_error. If you've started running ctors, you're past the
> point where the operation can be backed out in any meaningful sense.
> I wonder if it's worse with ifunc, but I think not -- without
> bind_now, the ifunc resolvers don't even need to run before you pass
> the point of no return, and with bind_now, you'd be executing them in
> a context where resolver errors are still "hard" and can/should cause
> dlopen to fail.
> 
> Are there things I'm missing?
> 
> Rich
> 

Florian, it was not clear to me which approach you are actually working
on, explicit struct dl_exception *argument or per-thread variable. In
any case, can't we use Rich suggestion and move the logic starting
at dl_open_worker starting at _dl_init out of the function, and thus
out of _dl_catch_exception?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-06 15:40 ` Rich Felker
  2019-03-07 14:03   ` Adhemerval Zanella
@ 2019-03-11 13:45   ` Florian Weimer
  2019-03-11 22:52     ` Rich Felker
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-03-11 13:45 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

> I don't have a strong opinion (and maybe not enough information to
> have any opinion) on whether you keep longjmp or go with something
> else, but I don't think there's any fundamental reason you need to
> change this to fix current bugs. In musl, longjmp is used partly by
> historical accident (I don't recall fully, but I think it was a matter
> of adding dlopen to code that was originally written just for initial
> dynamic linking at program entry), and doesn't have problems like the
> ones you describe.

Interesting.

In glibc, we have many callouts into architecture-specific routines from
generic code.  Some of these routines throw exceptions, and which ones
do is not always entirely clear.

For example, if there is a temporary memory allocation which persists
across such a callout, do we have to install a local exception handler
to clean up that allocation in case the helper routine throws?

If the error handling is expressed in the function signature (using that
exception pointer parameter), the behavior is much more explicit and we
can avoid these issues more easily.

> For the init/fini "soft errors" problem, it sounds to me like the code
> that runs the ctors should just be outside of the scope of the
> _dl_catch_error. If you've started running ctors, you're past the
> point where the operation can be backed out in any meaningful sense.

That's certainly true.  This one should be rather easy to fix.  It also
affects only dlopen/dlclose, at which point we can assume that we have
our full exception handling implementation.

> I wonder if it's worse with ifunc, but I think not -- without
> bind_now, the ifunc resolvers don't even need to run before you pass
> the point of no return, and with bind_now, you'd be executing them in
> a context where resolver errors are still "hard" and can/should cause
> dlopen to fail.

The question is whether a failure from the run-time trampoline should
ever be a soft error (that can be caught by dlopen).  I think we use the
soft/hard distinction differently, but you seem to suggest that a lazy
binding error during a call to an IFUNC resolver should not cause
process termination.  I think it is undefined like any other trampoline
failure, so we should abort.  We really don't know what the IFUNC
resolver was supposed to be doing and which of its side effects
happened.  The situation really is unrecoverable.

If we want to give users more precise control over binding errors, I
don't think anything based on SJLJ-style exception handling is the
answer.  From a technical perspective, it should be feasible to throw
something that can actually be caught from C++ code, but given the
current pervasive climate against table-driven stack unwinding for error
reporting, I'm not sure this is something that would be a good use of
anyone's time.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer
  2019-03-06 15:40 ` Rich Felker
@ 2019-03-11 14:07 ` Carlos O'Donell
  2019-03-11 15:29   ` Florian Weimer
  1 sibling, 1 reply; 17+ messages in thread
From: Carlos O'Donell @ 2019-03-11 14:07 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha

On 3/5/19 11:37 AM, Florian Weimer wrote:
> Currently, dynamic loader operations triggered by explicit function
> calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed
> using _dl_catch_error.  This function calls setjmp, and when an error is
> raised the dynamic linker calls longjmp, resetting the call stack.

Yes.

Call this "Method 1: sjlj recovery"
 
> This leads to strange bugs, such as undefined symbols in init/fini calls
> being treated as non-fatal soft errors reported through dlerror
> (bug 24304).  It also leads to a design where most dlopen failures are
> not handled locally.  Instead they bubble up to the top-level dlopen
> error handler installed using _dl_catch_error, which then attempts to
> undo the changes of the  partially completed state to the global dynamic
> linker state (see bug 20839 for a couple of problems with that).

Nothing to me indicates that a new system wouldn't also have bugs.

The benefit you need to highlight is the ability for a new design to
mitigate bugs by the very nature of the structural changes being
proposed.

Both sjlj and a purely C++-style exception mechanism are, in my mind,
semantically equivalent, but the latter has more syntactic sugar to
allow you to practice good patterns that avoid mistakes (but it can
equally hide errors and create bugs).

> In the current scheme, more localized error handling is problematic
> because it has high syntactic overhead: You need to define a struct for
> argument data and a separate function that receives the data, and pass
> both to _dl_catch_error.  There also could be a performance overhead if
> individual malloc calls were protected in this way because each call to
> _dl_catch_error incurs a call to setjmp.

Yes, Method 1 requires passing down all information encapsulated in a
structure.

Why would malloc calls be protected in this way?

> Does anyone think we should retain the current error handling scheme?

That depends on the quality of the alternative proposal :-)

> I personally do not have a problem with exceptions and stack unwinding,
> but if this is what we want, we should use a DWARF-based unwinder and
> GCC's exception handling features (the limited support in the C front
> end is probably sufficient).

I agree.

Let's call this "Method 2: C exceptions"

For dlopen-et-al I think a DWARF-based unwinder would be great.

I assume you envision that exception handling in C would help cleanup
the code and allow more locally visible cleanups to happen (unwinding
state).

> The alternative to unwinding is an explicit struct dl_exception *
> argument for functions which can fail, and use the return value to
> indicate whether there was a fatal error.  This sometimes causes issues
> where the return value is already used to signal both success and
> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
> RTLD_NOLOAD failure from _dl_map_object_from_fd).

No, if we're going to change it should be *towards* something where the
compiler can help us get it right.

I don't want to see an explicit "this" passed into every function by hand.

Let's call this "Method 3: Explicit this"

> There is some impact on <dl-machine.h> because the relocation processing
> needs to change.  We can convert the relocation processing first to the
> new scheme and continue to signal any errors using longjmp in the
> generic code.  But supporting twice the number of relocation APIs for
> incremental conversion of targets will still be difficult.  I think we
> are still looking at one fairly large patch, given the number of
> architectures we support, although the changes should just be a few
> dozen lines per architecture.

Please expand on this a bit more.

> We cannot convert the generic code first because that would mean calling
> setjmp each time before calling into architecture-specific code, which I
> expect will be too problematic from a performance point of view.

Right, this is an all-or-nothing change IMO.

> A third option is not use an explicit struct dl_exception * argument,
> but a per-thread variable.  This will require changes to support TLS
> (presumably the initial-exec variant) in the dynamic linker itself,
> which is currently missing.  Since the exception pointer is only needed
> in case of an error, using a TLS variable for it will avoid the overhead
> of maintaining the explicit exception pointer argument and passing it
> around.  Adding TLS support to the dynamic linker could be implemented
> incrementally across architectures, but the conversion itself faces the
> same flag day challenge as the explicit argument solution.  The explict
> argument also ensures that places stick out where encoding fatal errors
> in the return argument is difficult.  (A fourth option would compile the
> dynamic linker twice and use the TLS-less version for the initial
> loading of the executables and its dependencies.)

I don't like either of these options.

Let's call this "Method 3: TP explicit this"

I won't give the 4th option a name :-}

> From a source code and binary size point of view, there is not much
> difference between using longjmp and the explicit function argument
> approach on x86-64.

Right.

> Does anyone want to keep the longjmp approach?  Should I polish my patch
> and extend it to cover all architectures?

What does your patch do?

Method 1: sjlj
Method 2: C exceptions <------- I vote method 2.
Method 3: Explicit this
Method 4: TP explicit this

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 14:07 ` Carlos O'Donell
@ 2019-03-11 15:29   ` Florian Weimer
  2019-03-11 16:13     ` Carlos O'Donell
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-03-11 15:29 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

* Carlos O'Donell:

>> In the current scheme, more localized error handling is problematic
>> because it has high syntactic overhead: You need to define a struct for
>> argument data and a separate function that receives the data, and pass
>> both to _dl_catch_error.  There also could be a performance overhead if
>> individual malloc calls were protected in this way because each call to
>> _dl_catch_error incurs a call to setjmp.
>
> Yes, Method 1 requires passing down all information encapsulated in a
> structure.
>
> Why would malloc calls be protected in this way?

You need a local handler if the pointer is not rooted in something that
is thrown away by the top-most handler, and you call a
potentially-throwing function between the allocation and the
deallocation.

>> I personally do not have a problem with exceptions and stack unwinding,
>> but if this is what we want, we should use a DWARF-based unwinder and
>> GCC's exception handling features (the limited support in the C front
>> end is probably sufficient).
>
> I agree.
>
> Let's call this "Method 2: C exceptions"
>
> For dlopen-et-al I think a DWARF-based unwinder would be great.

It requires moving the unwinder implementation from libgcc_s to libc,
though.  The last time we discussed this (related to unwinder
performance issues and the dl_iterate_phdr interface), this idea was not
well-received.

If we fix the init/fini and IFUNC resolver bugs the way I intend, we
will never have to unwind through user code before hitting the final
handler, so there is no problem with the availability of unwinding data.
So at least that part is entirely solvable.

> I assume you envision that exception handling in C would help cleanup
> the code and allow more locally visible cleanups to happen (unwinding
> state).

It requires a vastly different coding style.  I have not thought much
about it.  I don't think we should rely on a relatively obscure GNU
extension so prominently.  Or put differently, if we want to use RAII as
an implementation technique, I don't think we should use C.

>> The alternative to unwinding is an explicit struct dl_exception *
>> argument for functions which can fail, and use the return value to
>> indicate whether there was a fatal error.  This sometimes causes issues
>> where the return value is already used to signal both success and
>> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
>> RTLD_NOLOAD failure from _dl_map_object_from_fd).
>
> No, if we're going to change it should be *towards* something where the
> compiler can help us get it right.
>
> I don't want to see an explicit "this" passed into every function by hand.
>
> Let's call this "Method 3: Explicit this"

The explicit argument really helps to spot places where unwinding
happens.  For example, it makes it pretty clear that you have a memory
leak if you call a function with such an argument and an allocation is
not rooted in a global data structure.

>> There is some impact on <dl-machine.h> because the relocation processing
>> needs to change.  We can convert the relocation processing first to the
>> new scheme and continue to signal any errors using longjmp in the
>> generic code.  But supporting twice the number of relocation APIs for
>> incremental conversion of targets will still be difficult.  I think we
>> are still looking at one fairly large patch, given the number of
>> architectures we support, although the changes should just be a few
>> dozen lines per architecture.
>
> Please expand on this a bit more.

Let's look at sysdeps/powerpc/powerpc64/dl-machine.h.  elf_machine_rela
calls _dl_reloc_overflow and _dl_reloc_bad_type, among other things, and
those throw.  This means that elf_machine_rela needs to be adjusted with
the explicit argument.  RESOLVE_MAP calls _dl_lookup_symbol_x behind the
scenes as well, so that can throw, too.

Is there any state we need to roll back locally in elf_machine_rela?
Probably not.  So this is one area where we do not benefit from the
explicit argument.

>> A third option is not use an explicit struct dl_exception * argument,
>> but a per-thread variable.  This will require changes to support TLS
>> (presumably the initial-exec variant) in the dynamic linker itself,
>> which is currently missing.  Since the exception pointer is only needed
>> in case of an error, using a TLS variable for it will avoid the overhead
>> of maintaining the explicit exception pointer argument and passing it
>> around.  Adding TLS support to the dynamic linker could be implemented
>> incrementally across architectures, but the conversion itself faces the
>> same flag day challenge as the explicit argument solution.  The explict
>> argument also ensures that places stick out where encoding fatal errors
>> in the return argument is difficult.  (A fourth option would compile the
>> dynamic linker twice and use the TLS-less version for the initial
>> loading of the executables and its dependencies.)
>
> I don't like either of these options.
>
> Let's call this "Method 3: TP explicit this"
>
> I won't give the 4th option a name :-}

Note that the current implementation already stores the dl_exception *
and the jump buffer for the error handler in thread-local storage.
That's why we need to compile the exception handling machinery twice.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 15:29   ` Florian Weimer
@ 2019-03-11 16:13     ` Carlos O'Donell
  2019-03-11 16:53       ` Zack Weinberg
  2019-03-11 18:39       ` Florian Weimer
  0 siblings, 2 replies; 17+ messages in thread
From: Carlos O'Donell @ 2019-03-11 16:13 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On 3/11/19 11:29 AM, Florian Weimer wrote:
> * Carlos O'Donell:
> 
>>> In the current scheme, more localized error handling is problematic
>>> because it has high syntactic overhead: You need to define a struct for
>>> argument data and a separate function that receives the data, and pass
>>> both to _dl_catch_error.  There also could be a performance overhead if
>>> individual malloc calls were protected in this way because each call to
>>> _dl_catch_error incurs a call to setjmp.
>>
>> Yes, Method 1 requires passing down all information encapsulated in a
>> structure.
>>
>> Why would malloc calls be protected in this way?
> 
> You need a local handler if the pointer is not rooted in something that
> is thrown away by the top-most handler, and you call a
> potentially-throwing function between the allocation and the
> deallocation.

Right, you might be able to get away with a free-list here, but I get your
point, in a fully generic system you must register a free handler which is
specific code that knows how to free the object you allocated. I hate to say
it but we're just designing C++ from the ground up again. Why not just get
the loader to the point it can run C++? This is a serious question.

>>> I personally do not have a problem with exceptions and stack unwinding,
>>> but if this is what we want, we should use a DWARF-based unwinder and
>>> GCC's exception handling features (the limited support in the C front
>>> end is probably sufficient).
>>
>> I agree.
>>
>> Let's call this "Method 2: C exceptions"
>>
>> For dlopen-et-al I think a DWARF-based unwinder would be great.
> 
> It requires moving the unwinder implementation from libgcc_s to libc,
> though.  The last time we discussed this (related to unwinder
> performance issues and the dl_iterate_phdr interface), this idea was not
> well-received.

It might still be the best technical choice.

I do not think we should discard this idea so easily.

Why wouldn't we pursue this option?

> If we fix the init/fini and IFUNC resolver bugs the way I intend, we
> will never have to unwind through user code before hitting the final
> handler, so there is no problem with the availability of unwinding data.
> So at least that part is entirely solvable.

I don't understand the exact bugs in question here, what are they?

What are your solutions?

>> I assume you envision that exception handling in C would help cleanup
>> the code and allow more locally visible cleanups to happen (unwinding
>> state).
> 
> It requires a vastly different coding style.  I have not thought much
> about it.  I don't think we should rely on a relatively obscure GNU
> extension so prominently.  Or put differently, if we want to use RAII as
> an implementation technique, I don't think we should use C.

This is a GNU project and we should do whatever we think is the best
solution for glibc.

If the obscure GNU extension has bugs, then that's a reason not to use
it, or if the debugger doesn't support it, then that's a reason not to
use it, but it can't just be about "obscure GNU extensions."

Alternatives are using Rust or C++.

I would prefer a limited form of C++ without libstdc++ support.

Get the loader to the point where it can run C++, with the unwinder
builtin, and then only use C++ classes the loader itself defines.

>>> The alternative to unwinding is an explicit struct dl_exception *
>>> argument for functions which can fail, and use the return value to
>>> indicate whether there was a fatal error.  This sometimes causes issues
>>> where the return value is already used to signal both success and
>>> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
>>> RTLD_NOLOAD failure from _dl_map_object_from_fd).
>>
>> No, if we're going to change it should be *towards* something where the
>> compiler can help us get it right.
>>
>> I don't want to see an explicit "this" passed into every function by hand.
>>
>> Let's call this "Method 3: Explicit this"
> 
> The explicit argument really helps to spot places where unwinding
> happens.  For example, it makes it pretty clear that you have a memory
> leak if you call a function with such an argument and an allocation is
> not rooted in a global data structure.

I agree, but is it the best option?

Or are you arguing it's the best value for cost option?

>>> There is some impact on <dl-machine.h> because the relocation processing
>>> needs to change.  We can convert the relocation processing first to the
>>> new scheme and continue to signal any errors using longjmp in the
>>> generic code.  But supporting twice the number of relocation APIs for
>>> incremental conversion of targets will still be difficult.  I think we
>>> are still looking at one fairly large patch, given the number of
>>> architectures we support, although the changes should just be a few
>>> dozen lines per architecture.
>>
>> Please expand on this a bit more.
> 
> Let's look at sysdeps/powerpc/powerpc64/dl-machine.h.  elf_machine_rela
> calls _dl_reloc_overflow and _dl_reloc_bad_type, among other things, and
> those throw.  This means that elf_machine_rela needs to be adjusted with
> the explicit argument.  RESOLVE_MAP calls _dl_lookup_symbol_x behind the
> scenes as well, so that can throw, too.

I see where you're going with this, particularly the state rollback.
 
> Is there any state we need to roll back locally in elf_machine_rela?
> Probably not.  So this is one area where we do not benefit from the
> explicit argument.

Understood.

>>> A third option is not use an explicit struct dl_exception * argument,
>>> but a per-thread variable.  This will require changes to support TLS
>>> (presumably the initial-exec variant) in the dynamic linker itself,
>>> which is currently missing.  Since the exception pointer is only needed
>>> in case of an error, using a TLS variable for it will avoid the overhead
>>> of maintaining the explicit exception pointer argument and passing it
>>> around.  Adding TLS support to the dynamic linker could be implemented
>>> incrementally across architectures, but the conversion itself faces the
>>> same flag day challenge as the explicit argument solution.  The explict
>>> argument also ensures that places stick out where encoding fatal errors
>>> in the return argument is difficult.  (A fourth option would compile the
>>> dynamic linker twice and use the TLS-less version for the initial
>>> loading of the executables and its dependencies.)
>>
>> I don't like either of these options.
>>
>> Let's call this "Method 3: TP explicit this"
>>
>> I won't give the 4th option a name :-}
> 
> Note that the current implementation already stores the dl_exception *
> and the jump buffer for the error handler in thread-local storage.

Yes, I see that in elf/dl-error-skeleton.c (catch_hook).

> That's why we need to compile the exception handling machinery twice.

So to be clear, you're proposing:

"Method 3: Explicit this"

Which you argue has highest value, with lowest cost.

While I've been arguing for "Method 2: C exceptions" which you believe
has too much cost given the value, since we'd have to:
- move the unwinder to libc
- switch to some C++ subset

I can understand your position, and I'm not opposed to
"Method 3: Explicit this", it would call out the exact places where
we might have problems for a future conversion.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 16:13     ` Carlos O'Donell
@ 2019-03-11 16:53       ` Zack Weinberg
  2019-03-11 17:08         ` Florian Weimer
  2019-03-11 18:39       ` Florian Weimer
  1 sibling, 1 reply; 17+ messages in thread
From: Zack Weinberg @ 2019-03-11 16:53 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Florian Weimer, GNU C Library

On Mon, Mar 11, 2019 at 12:13 PM Carlos O'Donell <carlos@redhat.com> wrote:
> On 3/11/19 11:29 AM, Florian Weimer wrote:
> > It requires moving the unwinder implementation from libgcc_s to libc,
> > though.  The last time we discussed this (related to unwinder
> > performance issues and the dl_iterate_phdr interface), this idea was not
> > well-received.
>
> It might still be the best technical choice.
> I do not think we should discard this idea so easily.
> Why wouldn't we pursue this option?

I haven't thought about this much, but I don't like the idea of
increasing the set of functions potentially executed during symbol
resolution to the tune of the entire unwinder, because of the unusual
constraints on code executed in that context (e.g. must not take
locks, must protect itself from cancellation, must not touch the
normal errno).  I was already thinking about saying that I supported
Florian's original proposal just because it would mean *reducing* that
set by at least setjmp and longjmp.

zw

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 16:53       ` Zack Weinberg
@ 2019-03-11 17:08         ` Florian Weimer
  2019-03-11 18:39           ` Zack Weinberg
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-03-11 17:08 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Carlos O'Donell, GNU C Library

* Zack Weinberg:

> On Mon, Mar 11, 2019 at 12:13 PM Carlos O'Donell <carlos@redhat.com> wrote:
>> On 3/11/19 11:29 AM, Florian Weimer wrote:
>> > It requires moving the unwinder implementation from libgcc_s to libc,
>> > though.  The last time we discussed this (related to unwinder
>> > performance issues and the dl_iterate_phdr interface), this idea was not
>> > well-received.
>>
>> It might still be the best technical choice.
>> I do not think we should discard this idea so easily.
>> Why wouldn't we pursue this option?
>
> I haven't thought about this much, but I don't like the idea of
> increasing the set of functions potentially executed during symbol
> resolution to the tune of the entire unwinder, because of the unusual
> constraints on code executed in that context (e.g. must not take
> locks, must protect itself from cancellation, must not touch the
> normal errno).

This is a good point which I had not considered.

My position is that errors during symbol resolution in lazy binding are
never recoverable.  If the error is not recoverable, there is no need to
do any unwinding at all.  We currently get this wrong for init/fini
(bug 24304).

Rich Felker brought up this matter in conjunction with IFUNC resolvers.
These can be called during relocation processing (from dlopen) or lazy
binding, when they themselves trigger lazy binding.  I think these
errors should not be recoverable, either, whether they happen during
relocation processing or lazy binding.  But there is a narrow edge case
(IFUNC resolver called during relocation processing which triggers lazy
binding which fails in symbol lookup) where we actually have user code
(the IFUNC resolver) on the stack, and we would have to unwind through
the trampoline/IFUNC resolver/dlopen sequence if wanted to make this
error recoverable.  (Which I think we should not do, but the discussion
is still open.)

But even that edge case only happens after dlopen, so at least it's not
in an async-signal-safe context.  But the user stack frame pretty much
kills the DWARF-based unwinding approach if it's something we need to
support.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 16:13     ` Carlos O'Donell
  2019-03-11 16:53       ` Zack Weinberg
@ 2019-03-11 18:39       ` Florian Weimer
  2019-03-11 19:30         ` Carlos O'Donell
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-03-11 18:39 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

* Carlos O'Donell:

> Right, you might be able to get away with a free-list here, but I get your
> point, in a fully generic system you must register a free handler which is
> specific code that knows how to free the object you allocated. I hate to say
> it but we're just designing C++ from the ground up again. Why not just get
> the loader to the point it can run C++? This is a serious question.

We would have to solve the bootstrapping issue.

I expect this topic (use of C++) to be rather controversial.

>>> Let's call this "Method 2: C exceptions"
>>>
>>> For dlopen-et-al I think a DWARF-based unwinder would be great.
>> 
>> It requires moving the unwinder implementation from libgcc_s to libc,
>> though.  The last time we discussed this (related to unwinder
>> performance issues and the dl_iterate_phdr interface), this idea was not
>> well-received.
>
> It might still be the best technical choice.
>
> I do not think we should discard this idea so easily.
>
> Why wouldn't we pursue this option?

Well, it turns this whole exercise from a one-day hack into a
longer-term integration project. 8-)

If this is the direction in which we want to go, it doesn't look
technically unsound to me, just very controversial.

>> If we fix the init/fini and IFUNC resolver bugs the way I intend, we
>> will never have to unwind through user code before hitting the final
>> handler, so there is no problem with the availability of unwinding data.
>> So at least that part is entirely solvable.
>
> I don't understand the exact bugs in question here, what are they?

These bugs are unrelated.  We currently have memory leaks if getaddrinfo
is canceled, for example.  Fixing that by compiling getaddrinfo with
-fexceptions and using RAII wouldn't solve them because getaddrinfo
might call into an NSS service module which is not compiled with
unwinding information.  Hence the need for the dual unwinding with
longjmp and the libgcc_s unwinder in libpthread's cancellation
implementation.  (We might get away with that if we install a setjmp
barrier before calling into an external NSS module, though.)

For the socket file descriptor opened by nss_dns/libresolv and some
other resources used there, we currently use a strategy like the free
list you mentioned (_res and struct resolv_context).  We need to perform
heap allocations for the resolver context because the thread cleanup
code runs only after fully unwinding the stack, so it won't be able to
use the information there.  (This is different in the SJLJ case in the
dynamic linker, where we can easily change what the code does before
calling longjmp.)  But these two mechanisms do not cover everything that
getaddrinfo allocates.

In short, this is an entirely different area, but it could benefit from
experience with other unwinding designs.

>>> I assume you envision that exception handling in C would help cleanup
>>> the code and allow more locally visible cleanups to happen (unwinding
>>> state).
>> 
>> It requires a vastly different coding style.  I have not thought much
>> about it.  I don't think we should rely on a relatively obscure GNU
>> extension so prominently.  Or put differently, if we want to use RAII as
>> an implementation technique, I don't think we should use C.
>
> This is a GNU project and we should do whatever we think is the best
> solution for glibc.
>
> If the obscure GNU extension has bugs, then that's a reason not to use
> it, or if the debugger doesn't support it, then that's a reason not to
> use it, but it can't just be about "obscure GNU extensions."

The extension itself (the cleanup attribute) is increasingly used.
systemd seems quite happy as a user, and there are others.

Combining it with DWARF unwinding is rare and probably restricted to
some limited scenarios related to cancellation.  Fedora and downstream
are compiled with -fexceptions, so there probably aren't any obvious
ICEs in that area.

> Alternatives are using Rust or C++.

Rust is not supposed to have exceptions.  It only supports automatic
error returns based on the return type (a specially annotated sum type).
There's also no GNU front end for it.

(In reality, Rust has exceptions; the test framework even requires them.
Last time I tried, you couldn't compile your code without stack
unwinding support if you wanted to test it.  By default, out-of-bounds
array access in Rust throws, and like in Go, there is just not very
convenient language support to handle the exception, but the support is
there in the run-time library.)

> I would prefer a limited form of C++ without libstdc++ support.

We would have to move the C++ personality routine and the core exception
handling to ld.so.  They live in libstdc++, not libgcc_s, and are not
part of the regular unwinder. 8-/

Maybe we could have a restricted, hidden implementation of C++ exception
support.  I have not put much thought to that.  I didn't expect this
turn of the discussion.

Note that RAII is helpful as an implementation strategy even without
exceptions.  (But it makes some uses of constructors rather cumbersome
because objects are not necessarily in a valid state right after
construction.  It would work fine for struct scratch_buffer and the
dynarrays, though.  Guess why.)

>>>> The alternative to unwinding is an explicit struct dl_exception *
>>>> argument for functions which can fail, and use the return value to
>>>> indicate whether there was a fatal error.  This sometimes causes issues
>>>> where the return value is already used to signal both success and
>>>> non-fatal error (e.g., -1 for failure from open_verify, or NULL for
>>>> RTLD_NOLOAD failure from _dl_map_object_from_fd).
>>>
>>> No, if we're going to change it should be *towards* something where the
>>> compiler can help us get it right.
>>>
>>> I don't want to see an explicit "this" passed into every function by hand.
>>>
>>> Let's call this "Method 3: Explicit this"
>> 
>> The explicit argument really helps to spot places where unwinding
>> happens.  For example, it makes it pretty clear that you have a memory
>> leak if you call a function with such an argument and an allocation is
>> not rooted in a global data structure.
>
> I agree, but is it the best option?
>
> Or are you arguing it's the best value for cost option?

I expect that we have to do this conversion anyway (I already did for
x86-64) to find out the spots which need careful handling in any of the
implicit approaches.

Maybe we could avoid that if we had exact call-graph-based tools, for
all relevant targets, but I don't think they exist.

> So to be clear, you're proposing:
>
> "Method 3: Explicit this"
>
> Which you argue has highest value, with lowest cost.

Yes, and it also has the best tool support, I think.

We could use the knowledge gained by the conversion to fix bugs, but I
think it will bit-rot over time, as people add more
architecture-specific code and support more features in the dynamic
linker.

> While I've been arguing for "Method 2: C exceptions" which you believe
> has too much cost given the value, since we'd have to:
> - move the unwinder to libc
> - switch to some C++ subset

This sounds tempting as well, but I'm not sure if it's the right place
to start with using C++ in glibc.

One thing we have not considered much so far is what you call “free
lists”, and generally not doing any unrecoverable actions until we reach
the point of no return in dlopen (and eliminating the possiblity of
failures in dlclose, by avoiding calling malloc there).  For example, we
could store the temporary file descriptors in the exception context (by
a callout to libc.so.6) and make sure that they are closed when
unwinding.  We can avoid memory leaks for temporary allocations in much
the same way.

In a sense, we would provide a more approachable programming environment
for unwinding.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 17:08         ` Florian Weimer
@ 2019-03-11 18:39           ` Zack Weinberg
  2019-03-11 18:49             ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Zack Weinberg @ 2019-03-11 18:39 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Carlos O'Donell, GNU C Library

On Mon, Mar 11, 2019 at 1:08 PM Florian Weimer <fweimer@redhat.com> wrote:
> * Zack Weinberg:
> > I haven't thought about this much, but I don't like the idea of
> > increasing the set of functions potentially executed during symbol
> > resolution to the tune of the entire unwinder, because of the unusual
> > constraints on code executed in that context (e.g. must not take
> > locks, must protect itself from cancellation, must not touch the
> > normal errno).
>
> This is a good point which I had not considered.
>
> My position is that errors during symbol resolution in lazy binding are
> never recoverable.  If the error is not recoverable, there is no need to
> do any unwinding at all.  We currently get this wrong for init/fini
> (bug 24304).

I tend to agree with this.  I'm also honestly dubious of the value of
lazy binding anymore, but that we should think about separately.

> Rich Felker brought up this matter in conjunction with IFUNC resolvers.
> These can be called during relocation processing (from dlopen) or lazy
> binding, when they themselves trigger lazy binding.  I think these
> errors should not be recoverable, either, whether they happen during
> relocation processing or lazy binding.  But there is a narrow edge case
> (IFUNC resolver called during relocation processing which triggers lazy
> binding which fails in symbol lookup)

I wonder how practical it would be to discover the set of symbols that
an IFUNC resolver might attempt to use, and resolve those before
calling into the IFUNC resolver.  IFUNCs are still new enough that we
could require additional annotations in the object file if that would
help.

zw

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 18:39           ` Zack Weinberg
@ 2019-03-11 18:49             ` Florian Weimer
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Weimer @ 2019-03-11 18:49 UTC (permalink / raw)
  To: Zack Weinberg; +Cc: Carlos O'Donell, GNU C Library

* Zack Weinberg:

>> Rich Felker brought up this matter in conjunction with IFUNC resolvers.
>> These can be called during relocation processing (from dlopen) or lazy
>> binding, when they themselves trigger lazy binding.  I think these
>> errors should not be recoverable, either, whether they happen during
>> relocation processing or lazy binding.  But there is a narrow edge case
>> (IFUNC resolver called during relocation processing which triggers lazy
>> binding which fails in symbol lookup)
>
> I wonder how practical it would be to discover the set of symbols that
> an IFUNC resolver might attempt to use, and resolve those before
> calling into the IFUNC resolver.  IFUNCs are still new enough that we
> could require additional annotations in the object file if that would
> help.

Considering that the whole purpose of IFUNC resolvers is to select
implementations, eagerly resolving all potentially needed relocations
(even without BIND_NOW) seems rather excessive and not very
backwards-compatible.

I've proposed two-phase relocation processing before, first all
non-IFUNC relocations, followed by the IFUNC relocations that have been
queued.  A two-phase approach is needed for most cases involving IFUNCs,
non-lazy binding, and symbol interposition.  It does not solve all IFUNC
issues (you cannot in general, if there is a circular depenedency for
example), but I think it would be a good addition, and it would even fix
some glibc bugs where we use IFUNC resolvers improperly.  It applies to
the BIND_NOW case only, which, like you, I consider the more relevant
one.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 18:39       ` Florian Weimer
@ 2019-03-11 19:30         ` Carlos O'Donell
  2019-03-13 15:51           ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Carlos O'Donell @ 2019-03-11 19:30 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On 3/11/19 2:39 PM, Florian Weimer wrote:
>> So to be clear, you're proposing:
>>
>> "Method 3: Explicit this"
>>
>> Which you argue has highest value, with lowest cost.
> 
> Yes, and it also has the best tool support, I think.
> 
> We could use the knowledge gained by the conversion to fix bugs, but I
> think it will bit-rot over time, as people add more
> architecture-specific code and support more features in the dynamic
> linker.
> 
>> While I've been arguing for "Method 2: C exceptions" which you believe
>> has too much cost given the value, since we'd have to:
>> - move the unwinder to libc
>> - switch to some C++ subset
> 
> This sounds tempting as well, but I'm not sure if it's the right place
> to start with using C++ in glibc.

I can accept that :-)
 
> One thing we have not considered much so far is what you call “free
> lists”, and generally not doing any unrecoverable actions until we reach
> the point of no return in dlopen (and eliminating the possiblity of
> failures in dlclose, by avoiding calling malloc there).  For example, we
> could store the temporary file descriptors in the exception context (by
> a callout to libc.so.6) and make sure that they are closed when
> unwinding.  We can avoid memory leaks for temporary allocations in much
> the same way.
> 
> In a sense, we would provide a more approachable programming environment
> for unwinding.

OK, so lets expand on that a bit, if you went down this route what do the
steps look like? If you enumerate them a bit more I think we'll all be able
to comment on them first, and then agree that it's the right way forward.

-- 
Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 13:45   ` Florian Weimer
@ 2019-03-11 22:52     ` Rich Felker
  2019-03-12 10:30       ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Rich Felker @ 2019-03-11 22:52 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Mon, Mar 11, 2019 at 02:45:13PM +0100, Florian Weimer wrote:
> * Rich Felker:
> 
> > I don't have a strong opinion (and maybe not enough information to
> > have any opinion) on whether you keep longjmp or go with something
> > else, but I don't think there's any fundamental reason you need to
> > change this to fix current bugs. In musl, longjmp is used partly by
> > historical accident (I don't recall fully, but I think it was a matter
> > of adding dlopen to code that was originally written just for initial
> > dynamic linking at program entry), and doesn't have problems like the
> > ones you describe.
> 
> Interesting.
> 
> In glibc, we have many callouts into architecture-specific routines from
> generic code.  Some of these routines throw exceptions, and which ones
> do is not always entirely clear.
> 
> For example, if there is a temporary memory allocation which persists
> across such a callout, do we have to install a local exception handler
> to clean up that allocation in case the helper routine throws?
> 
> If the error handling is expressed in the function signature (using that
> exception pointer parameter), the behavior is much more explicit and we
> can avoid these issues more easily.
> 
> > For the init/fini "soft errors" problem, it sounds to me like the code
> > that runs the ctors should just be outside of the scope of the
> > _dl_catch_error. If you've started running ctors, you're past the
> > point where the operation can be backed out in any meaningful sense.
> 
> That's certainly true.  This one should be rather easy to fix.  It also
> affects only dlopen/dlclose, at which point we can assume that we have
> our full exception handling implementation.
> 
> > I wonder if it's worse with ifunc, but I think not -- without
> > bind_now, the ifunc resolvers don't even need to run before you pass
> > the point of no return, and with bind_now, you'd be executing them in
> > a context where resolver errors are still "hard" and can/should cause
> > dlopen to fail.
> 
> The question is whether a failure from the run-time trampoline should
> ever be a soft error (that can be caught by dlopen).  I think we use the
> soft/hard distinction differently, but you seem to suggest that a lazy
> binding error during a call to an IFUNC resolver should not cause
> process termination.  I think it is undefined like any other trampoline
> failure, so we should abort.  We really don't know what the IFUNC
> resolver was supposed to be doing and which of its side effects
> happened.  The situation really is unrecoverable.

Assuming ifunc resolvers aren't "allowed" to do much beyond probing
hwcaps/cpuid/etc. to pick an implementation, I don't see any reason
that resolver failure during an ifunc resolver function should
terminate the process. Missing symbols at dlopen time with RTLD_NOW or
DT_BINDNOW or whatever should never crash the application, but should
report the error. With ifunc, I think (?) you have the possibility
that the ifunc resolver code will call another function in the library
being loaded (or one of its deps) via a plt slot that hasn't yet been
initialized, because there's no way to know a dependency order for the
relocations to avoid this. This should probably longjmp back and make
dlopen fail; I can't see any other way to make it work since there's
no way to make forward progress past the impossible-to-satisfy call.
But maybe the relocations can just be ordered such that this isn't a
concern (by checking all symbolic references prior to doing any ifunc
resolvers?).

> If we want to give users more precise control over binding errors, I
> don't think anything based on SJLJ-style exception handling is the
> answer.

I don't see why there should be any expectation that you can use C++
exception handling for this; the contract of dlopen is that it succeed
or return an error, not that it might terminate via an exception.

Rich

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 22:52     ` Rich Felker
@ 2019-03-12 10:30       ` Florian Weimer
  2019-03-12 12:00         ` Szabolcs Nagy
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2019-03-12 10:30 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

> Assuming ifunc resolvers aren't "allowed" to do much beyond probing
> hwcaps/cpuid/etc. to pick an implementation, I don't see any reason
> that resolver failure during an ifunc resolver function should
> terminate the process.

Depending on which documentation you read, IFUNC resolvers must not
depend on run-time relocations themselves.  In that case, lazy binding
failure during execution of an IFUNC resolver cannot possibly happen in
a valid program.

We would also have to disable signals while IFUNC resolvers are running,
so that lazy binding errors in signal handlers do not leak into the
dlopen call (after longjmp'ing out of the signal handler).

Is this really worth the trouble?

> Missing symbols at dlopen time with RTLD_NOW or DT_BINDNOW or whatever
> should never crash the application, but should report the error. With
> ifunc, I think (?) you have the possibility that the ifunc resolver
> code will call another function in the library being loaded (or one of
> its deps) via a plt slot that hasn't yet been initialized, because
> there's no way to know a dependency order for the relocations to avoid
> this.

We perform relocations in topological order, and IRELATIVE relocations
are sorted last by current binutils.  So in the absence of cyclic
dependencies (perhaps as the result of symbol interposition), IFUNC
resolvers will not encounter uninitialized PLT slots.

> This should probably longjmp back and make dlopen fail; I can't see
> any other way to make it work since there's no way to make forward
> progress past the impossible-to-satisfy call.

If you can detect at all that the relocation has not been processed, you
could longjmp out of the IFUNC resolver and try something else, until
there is definitely no way to make progress.  (I'm not saying that this
longjmp is valid, but it is a possibility.)

But I really don't see how we can make this work reliably because there
are relocation dependencies that do not involve lazy binding or PLT
calls, and we cannot detect those.  The IFUNC resolver would just use
uninitialized data or data that is later overwritten.

Two-phase relocation processing in topology order (first all non-IFUNC
relocations, then the IFUNC relocations) seems to cover all the
practical cases involving symbol interposition.  It deals correctly with
glibc's internal uses of those (which can currently lead to crashes, see
bug 21041).  It also covers, by design, all relocations for data symbols
because they cannot involve IFUNCs.  It does not deal with all cases
where an IFUNC resolver uses a function pointer variable that has been
initialized by a relocation, and that value is itself the result of an
IFUNC resolver.  It also cannot support cases where an IFUNC resolver
depends on ELF constructors having run for one of its dependencies
(which some people did, until distributions started building with
BIND_NOW).

> But maybe the relocations can just be ordered such that this isn't a
> concern (by checking all symbolic references prior to doing any ifunc
> resolvers?).

I think doing that would be excessive and it wouldn't cover the case
where the IFUNC resolver actually relies on lazy binding for choosing
the implementation (which could be constructed as valid if IFUNC
resolvers may rely on relocations).

>> If we want to give users more precise control over binding errors, I
>> don't think anything based on SJLJ-style exception handling is the
>> answer.
>
> I don't see why there should be any expectation that you can use C++
> exception handling for this; the contract of dlopen is that it succeed
> or return an error, not that it might terminate via an exception.

I meant for a call that results in a lazy binding failure.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-12 10:30       ` Florian Weimer
@ 2019-03-12 12:00         ` Szabolcs Nagy
  0 siblings, 0 replies; 17+ messages in thread
From: Szabolcs Nagy @ 2019-03-12 12:00 UTC (permalink / raw)
  To: Florian Weimer, Rich Felker; +Cc: nd, libc-alpha@sourceware.org

On 12/03/2019 10:30, Florian Weimer wrote:
> * Rich Felker:
>> Missing symbols at dlopen time with RTLD_NOW or DT_BINDNOW or whatever
>> should never crash the application, but should report the error. With
>> ifunc, I think (?) you have the possibility that the ifunc resolver
>> code will call another function in the library being loaded (or one of
>> its deps) via a plt slot that hasn't yet been initialized, because
>> there's no way to know a dependency order for the relocations to avoid
>> this.
> 
> We perform relocations in topological order, and IRELATIVE relocations
> are sorted last by current binutils.  So in the absence of cyclic
> dependencies (perhaps as the result of symbol interposition), IFUNC
> resolvers will not encounter uninitialized PLT slots.

IRELATIVE is sorted after JUMP_SLOT

but GLOB_DAT and ABS come before and those can be against an
STT_GNU_IFUNC symbol that is defined in the same module, so
ifunc resolvers can encounter uninitialized PLT slots (no
cyclic dependency needed, with or without bind now).

> IFUNC resolver.  It also cannot support cases where an IFUNC resolver
> depends on ELF constructors having run for one of its dependencies
> (which some people did, until distributions started building with
> BIND_NOW).

depending on ctor in ifunc sounds broken.
(i know at least x86 gcc libatomic did this, but that was a bug)

>> But maybe the relocations can just be ordered such that this isn't a
>> concern (by checking all symbolic references prior to doing any ifunc
>> resolvers?).
> 
> I think doing that would be excessive and it wouldn't cover the case
> where the IFUNC resolver actually relies on lazy binding for choosing
> the implementation (which could be constructed as valid if IFUNC
> resolvers may rely on relocations).

depending on lazy binding sounds fragile.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Removing longjmp error handling from the dynamic loader
  2019-03-11 19:30         ` Carlos O'Donell
@ 2019-03-13 15:51           ` Florian Weimer
  0 siblings, 0 replies; 17+ messages in thread
From: Florian Weimer @ 2019-03-13 15:51 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-alpha

* Carlos O'Donell:

>> One thing we have not considered much so far is what you call “free
>> lists”, and generally not doing any unrecoverable actions until we reach
>> the point of no return in dlopen (and eliminating the possiblity of
>> failures in dlclose, by avoiding calling malloc there).  For example, we
>> could store the temporary file descriptors in the exception context (by
>> a callout to libc.so.6) and make sure that they are closed when
>> unwinding.  We can avoid memory leaks for temporary allocations in much
>> the same way.
>> 
>> In a sense, we would provide a more approachable programming environment
>> for unwinding.
>
> OK, so lets expand on that a bit, if you went down this route what do
> the steps look like? If you enumerate them a bit more I think we'll
> all be able to comment on them first, and then agree that it's the
> right way forward.

I haven't read all of the loader code, but I think we need to manage at
least the following resources:

* The loader lock
* Temporary memory allocation (e.g. file name buffers)
* Persistant memory allocations related to the link map
* Other persistant memory allocations
* One file descriptor
* File mappings
* How new link maps are hooked into the loader state
* Updates to global loader state (various search paths)
* Maybe TLS data in the future (if we allocate it eagerly)

Ideally, there would be a precise point, clearly indicated in the source
code, at which which we know that dlopen cannot fail.  I tried to
identify this point in the existing sources, but it is entirely unclear
where it should be.

Conceptually, I think we all agree is that it has to come before we call
the ELF constructors.

After the point of no return, we cannot do anything that might
potentially fail in a non-critical fashion, such as allocating memory.
For example, in the current code, we appear to call malloc in
add_to_global, when adding the new objects to the global scope.  We
would have to perform this allocation beforehand (and make sure that we
do not need more memory afterwards because some other thread made
further changes).

So overall, I think the longjmp vs no longjmp discussion may be a
distraction, and the harder problems lie elsewhere.

Please also consider Rich Felker's comments regarding lazy binding
failures in IFUNC resolvers; for those we would have to use longjmp
anyway if we want to keep them as non-fatal because the IFUNC resolver
may not have unwinding information.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-03-13 15:51 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer
2019-03-06 15:40 ` Rich Felker
2019-03-07 14:03   ` Adhemerval Zanella
2019-03-11 13:45   ` Florian Weimer
2019-03-11 22:52     ` Rich Felker
2019-03-12 10:30       ` Florian Weimer
2019-03-12 12:00         ` Szabolcs Nagy
2019-03-11 14:07 ` Carlos O'Donell
2019-03-11 15:29   ` Florian Weimer
2019-03-11 16:13     ` Carlos O'Donell
2019-03-11 16:53       ` Zack Weinberg
2019-03-11 17:08         ` Florian Weimer
2019-03-11 18:39           ` Zack Weinberg
2019-03-11 18:49             ` Florian Weimer
2019-03-11 18:39       ` Florian Weimer
2019-03-11 19:30         ` Carlos O'Donell
2019-03-13 15:51           ` Florian Weimer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).