* Removing longjmp error handling from the dynamic loader @ 2019-03-05 16:37 Florian Weimer 2019-03-06 15:40 ` Rich Felker 2019-03-11 14:07 ` Carlos O'Donell 0 siblings, 2 replies; 17+ messages in thread From: Florian Weimer @ 2019-03-05 16:37 UTC (permalink / raw) To: libc-alpha Currently, dynamic loader operations triggered by explicit function calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed using _dl_catch_error. This function calls setjmp, and when an error is raised the dynamic linker calls longjmp, resetting the call stack. This leads to strange bugs, such as undefined symbols in init/fini calls being treated as non-fatal soft errors reported through dlerror (bug 24304). It also leads to a design where most dlopen failures are not handled locally. Instead they bubble up to the top-level dlopen error handler installed using _dl_catch_error, which then attempts to undo the changes of the partially completed state to the global dynamic linker state (see bug 20839 for a couple of problems with that). In the current scheme, more localized error handling is problematic because it has high syntactic overhead: You need to define a struct for argument data and a separate function that receives the data, and pass both to _dl_catch_error. There also could be a performance overhead if individual malloc calls were protected in this way because each call to _dl_catch_error incurs a call to setjmp. Does anyone think we should retain the current error handling scheme? I personally do not have a problem with exceptions and stack unwinding, but if this is what we want, we should use a DWARF-based unwinder and GCC's exception handling features (the limited support in the C front end is probably sufficient). The alternative to unwinding is an explicit struct dl_exception * argument for functions which can fail, and use the return value to indicate whether there was a fatal error. This sometimes causes issues where the return value is already used to signal both success and non-fatal error (e.g., -1 for failure from open_verify, or NULL for RTLD_NOLOAD failure from _dl_map_object_from_fd). There is some impact on <dl-machine.h> because the relocation processing needs to change. We can convert the relocation processing first to the new scheme and continue to signal any errors using longjmp in the generic code. But supporting twice the number of relocation APIs for incremental conversion of targets will still be difficult. I think we are still looking at one fairly large patch, given the number of architectures we support, although the changes should just be a few dozen lines per architecture. We cannot convert the generic code first because that would mean calling setjmp each time before calling into architecture-specific code, which I expect will be too problematic from a performance point of view. A third option is not use an explicit struct dl_exception * argument, but a per-thread variable. This will require changes to support TLS (presumably the initial-exec variant) in the dynamic linker itself, which is currently missing. Since the exception pointer is only needed in case of an error, using a TLS variable for it will avoid the overhead of maintaining the explicit exception pointer argument and passing it around. Adding TLS support to the dynamic linker could be implemented incrementally across architectures, but the conversion itself faces the same flag day challenge as the explicit argument solution. The explict argument also ensures that places stick out where encoding fatal errors in the return argument is difficult. (A fourth option would compile the dynamic linker twice and use the TLS-less version for the initial loading of the executables and its dependencies.) From a source code and binary size point of view, there is not much difference between using longjmp and the explicit function argument approach on x86-64. Does anyone want to keep the longjmp approach? Should I polish my patch and extend it to cover all architectures? Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer @ 2019-03-06 15:40 ` Rich Felker 2019-03-07 14:03 ` Adhemerval Zanella 2019-03-11 13:45 ` Florian Weimer 2019-03-11 14:07 ` Carlos O'Donell 1 sibling, 2 replies; 17+ messages in thread From: Rich Felker @ 2019-03-06 15:40 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha On Tue, Mar 05, 2019 at 05:37:09PM +0100, Florian Weimer wrote: > Currently, dynamic loader operations triggered by explicit function > calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed > using _dl_catch_error. This function calls setjmp, and when an error is > raised the dynamic linker calls longjmp, resetting the call stack. > > This leads to strange bugs, such as undefined symbols in init/fini calls > being treated as non-fatal soft errors reported through dlerror > (bug 24304). It also leads to a design where most dlopen failures are > not handled locally. Instead they bubble up to the top-level dlopen > error handler installed using _dl_catch_error, which then attempts to > undo the changes of the partially completed state to the global dynamic > linker state (see bug 20839 for a couple of problems with that). > > In the current scheme, more localized error handling is problematic > because it has high syntactic overhead: You need to define a struct for > argument data and a separate function that receives the data, and pass > both to _dl_catch_error. There also could be a performance overhead if > individual malloc calls were protected in this way because each call to > _dl_catch_error incurs a call to setjmp. > > Does anyone think we should retain the current error handling scheme? > I personally do not have a problem with exceptions and stack unwinding, > but if this is what we want, we should use a DWARF-based unwinder and > GCC's exception handling features (the limited support in the C front > end is probably sufficient). > > The alternative to unwinding is an explicit struct dl_exception * > argument for functions which can fail, and use the return value to > indicate whether there was a fatal error. This sometimes causes issues > where the return value is already used to signal both success and > non-fatal error (e.g., -1 for failure from open_verify, or NULL for > RTLD_NOLOAD failure from _dl_map_object_from_fd). > > There is some impact on <dl-machine.h> because the relocation processing > needs to change. We can convert the relocation processing first to the > new scheme and continue to signal any errors using longjmp in the > generic code. But supporting twice the number of relocation APIs for > incremental conversion of targets will still be difficult. I think we > are still looking at one fairly large patch, given the number of > architectures we support, although the changes should just be a few > dozen lines per architecture. > > We cannot convert the generic code first because that would mean calling > setjmp each time before calling into architecture-specific code, which I > expect will be too problematic from a performance point of view. > > A third option is not use an explicit struct dl_exception * argument, > but a per-thread variable. This will require changes to support TLS > (presumably the initial-exec variant) in the dynamic linker itself, > which is currently missing. Since the exception pointer is only needed > in case of an error, using a TLS variable for it will avoid the overhead > of maintaining the explicit exception pointer argument and passing it > around. Adding TLS support to the dynamic linker could be implemented > incrementally across architectures, but the conversion itself faces the > same flag day challenge as the explicit argument solution. The explict > argument also ensures that places stick out where encoding fatal errors > in the return argument is difficult. (A fourth option would compile the > dynamic linker twice and use the TLS-less version for the initial > loading of the executables and its dependencies.) > > From a source code and binary size point of view, there is not much > difference between using longjmp and the explicit function argument > approach on x86-64. > > Does anyone want to keep the longjmp approach? Should I polish my patch > and extend it to cover all architectures? I don't have a strong opinion (and maybe not enough information to have any opinion) on whether you keep longjmp or go with something else, but I don't think there's any fundamental reason you need to change this to fix current bugs. In musl, longjmp is used partly by historical accident (I don't recall fully, but I think it was a matter of adding dlopen to code that was originally written just for initial dynamic linking at program entry), and doesn't have problems like the ones you describe. For the init/fini "soft errors" problem, it sounds to me like the code that runs the ctors should just be outside of the scope of the _dl_catch_error. If you've started running ctors, you're past the point where the operation can be backed out in any meaningful sense. I wonder if it's worse with ifunc, but I think not -- without bind_now, the ifunc resolvers don't even need to run before you pass the point of no return, and with bind_now, you'd be executing them in a context where resolver errors are still "hard" and can/should cause dlopen to fail. Are there things I'm missing? Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-06 15:40 ` Rich Felker @ 2019-03-07 14:03 ` Adhemerval Zanella 2019-03-11 13:45 ` Florian Weimer 1 sibling, 0 replies; 17+ messages in thread From: Adhemerval Zanella @ 2019-03-07 14:03 UTC (permalink / raw) To: libc-alpha On 06/03/2019 12:40, Rich Felker wrote: > On Tue, Mar 05, 2019 at 05:37:09PM +0100, Florian Weimer wrote: >> Currently, dynamic loader operations triggered by explicit function >> calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed >> using _dl_catch_error. This function calls setjmp, and when an error is >> raised the dynamic linker calls longjmp, resetting the call stack. >> >> This leads to strange bugs, such as undefined symbols in init/fini calls >> being treated as non-fatal soft errors reported through dlerror >> (bug 24304). It also leads to a design where most dlopen failures are >> not handled locally. Instead they bubble up to the top-level dlopen >> error handler installed using _dl_catch_error, which then attempts to >> undo the changes of the partially completed state to the global dynamic >> linker state (see bug 20839 for a couple of problems with that). >> >> In the current scheme, more localized error handling is problematic >> because it has high syntactic overhead: You need to define a struct for >> argument data and a separate function that receives the data, and pass >> both to _dl_catch_error. There also could be a performance overhead if >> individual malloc calls were protected in this way because each call to >> _dl_catch_error incurs a call to setjmp. >> >> Does anyone think we should retain the current error handling scheme? >> I personally do not have a problem with exceptions and stack unwinding, >> but if this is what we want, we should use a DWARF-based unwinder and >> GCC's exception handling features (the limited support in the C front >> end is probably sufficient). >> >> The alternative to unwinding is an explicit struct dl_exception * >> argument for functions which can fail, and use the return value to >> indicate whether there was a fatal error. This sometimes causes issues >> where the return value is already used to signal both success and >> non-fatal error (e.g., -1 for failure from open_verify, or NULL for >> RTLD_NOLOAD failure from _dl_map_object_from_fd). >> >> There is some impact on <dl-machine.h> because the relocation processing >> needs to change. We can convert the relocation processing first to the >> new scheme and continue to signal any errors using longjmp in the >> generic code. But supporting twice the number of relocation APIs for >> incremental conversion of targets will still be difficult. I think we >> are still looking at one fairly large patch, given the number of >> architectures we support, although the changes should just be a few >> dozen lines per architecture. >> >> We cannot convert the generic code first because that would mean calling >> setjmp each time before calling into architecture-specific code, which I >> expect will be too problematic from a performance point of view. >> >> A third option is not use an explicit struct dl_exception * argument, >> but a per-thread variable. This will require changes to support TLS >> (presumably the initial-exec variant) in the dynamic linker itself, >> which is currently missing. Since the exception pointer is only needed >> in case of an error, using a TLS variable for it will avoid the overhead >> of maintaining the explicit exception pointer argument and passing it >> around. Adding TLS support to the dynamic linker could be implemented >> incrementally across architectures, but the conversion itself faces the >> same flag day challenge as the explicit argument solution. The explict >> argument also ensures that places stick out where encoding fatal errors >> in the return argument is difficult. (A fourth option would compile the >> dynamic linker twice and use the TLS-less version for the initial >> loading of the executables and its dependencies.) >> >> From a source code and binary size point of view, there is not much >> difference between using longjmp and the explicit function argument >> approach on x86-64. >> >> Does anyone want to keep the longjmp approach? Should I polish my patch >> and extend it to cover all architectures? > > I don't have a strong opinion (and maybe not enough information to > have any opinion) on whether you keep longjmp or go with something > else, but I don't think there's any fundamental reason you need to > change this to fix current bugs. In musl, longjmp is used partly by > historical accident (I don't recall fully, but I think it was a matter > of adding dlopen to code that was originally written just for initial > dynamic linking at program entry), and doesn't have problems like the > ones you describe. > > For the init/fini "soft errors" problem, it sounds to me like the code > that runs the ctors should just be outside of the scope of the > _dl_catch_error. If you've started running ctors, you're past the > point where the operation can be backed out in any meaningful sense. > I wonder if it's worse with ifunc, but I think not -- without > bind_now, the ifunc resolvers don't even need to run before you pass > the point of no return, and with bind_now, you'd be executing them in > a context where resolver errors are still "hard" and can/should cause > dlopen to fail. > > Are there things I'm missing? > > Rich > Florian, it was not clear to me which approach you are actually working on, explicit struct dl_exception *argument or per-thread variable. In any case, can't we use Rich suggestion and move the logic starting at dl_open_worker starting at _dl_init out of the function, and thus out of _dl_catch_exception? ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-06 15:40 ` Rich Felker 2019-03-07 14:03 ` Adhemerval Zanella @ 2019-03-11 13:45 ` Florian Weimer 2019-03-11 22:52 ` Rich Felker 1 sibling, 1 reply; 17+ messages in thread From: Florian Weimer @ 2019-03-11 13:45 UTC (permalink / raw) To: Rich Felker; +Cc: libc-alpha * Rich Felker: > I don't have a strong opinion (and maybe not enough information to > have any opinion) on whether you keep longjmp or go with something > else, but I don't think there's any fundamental reason you need to > change this to fix current bugs. In musl, longjmp is used partly by > historical accident (I don't recall fully, but I think it was a matter > of adding dlopen to code that was originally written just for initial > dynamic linking at program entry), and doesn't have problems like the > ones you describe. Interesting. In glibc, we have many callouts into architecture-specific routines from generic code. Some of these routines throw exceptions, and which ones do is not always entirely clear. For example, if there is a temporary memory allocation which persists across such a callout, do we have to install a local exception handler to clean up that allocation in case the helper routine throws? If the error handling is expressed in the function signature (using that exception pointer parameter), the behavior is much more explicit and we can avoid these issues more easily. > For the init/fini "soft errors" problem, it sounds to me like the code > that runs the ctors should just be outside of the scope of the > _dl_catch_error. If you've started running ctors, you're past the > point where the operation can be backed out in any meaningful sense. That's certainly true. This one should be rather easy to fix. It also affects only dlopen/dlclose, at which point we can assume that we have our full exception handling implementation. > I wonder if it's worse with ifunc, but I think not -- without > bind_now, the ifunc resolvers don't even need to run before you pass > the point of no return, and with bind_now, you'd be executing them in > a context where resolver errors are still "hard" and can/should cause > dlopen to fail. The question is whether a failure from the run-time trampoline should ever be a soft error (that can be caught by dlopen). I think we use the soft/hard distinction differently, but you seem to suggest that a lazy binding error during a call to an IFUNC resolver should not cause process termination. I think it is undefined like any other trampoline failure, so we should abort. We really don't know what the IFUNC resolver was supposed to be doing and which of its side effects happened. The situation really is unrecoverable. If we want to give users more precise control over binding errors, I don't think anything based on SJLJ-style exception handling is the answer. From a technical perspective, it should be feasible to throw something that can actually be caught from C++ code, but given the current pervasive climate against table-driven stack unwinding for error reporting, I'm not sure this is something that would be a good use of anyone's time. Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 13:45 ` Florian Weimer @ 2019-03-11 22:52 ` Rich Felker 2019-03-12 10:30 ` Florian Weimer 0 siblings, 1 reply; 17+ messages in thread From: Rich Felker @ 2019-03-11 22:52 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha On Mon, Mar 11, 2019 at 02:45:13PM +0100, Florian Weimer wrote: > * Rich Felker: > > > I don't have a strong opinion (and maybe not enough information to > > have any opinion) on whether you keep longjmp or go with something > > else, but I don't think there's any fundamental reason you need to > > change this to fix current bugs. In musl, longjmp is used partly by > > historical accident (I don't recall fully, but I think it was a matter > > of adding dlopen to code that was originally written just for initial > > dynamic linking at program entry), and doesn't have problems like the > > ones you describe. > > Interesting. > > In glibc, we have many callouts into architecture-specific routines from > generic code. Some of these routines throw exceptions, and which ones > do is not always entirely clear. > > For example, if there is a temporary memory allocation which persists > across such a callout, do we have to install a local exception handler > to clean up that allocation in case the helper routine throws? > > If the error handling is expressed in the function signature (using that > exception pointer parameter), the behavior is much more explicit and we > can avoid these issues more easily. > > > For the init/fini "soft errors" problem, it sounds to me like the code > > that runs the ctors should just be outside of the scope of the > > _dl_catch_error. If you've started running ctors, you're past the > > point where the operation can be backed out in any meaningful sense. > > That's certainly true. This one should be rather easy to fix. It also > affects only dlopen/dlclose, at which point we can assume that we have > our full exception handling implementation. > > > I wonder if it's worse with ifunc, but I think not -- without > > bind_now, the ifunc resolvers don't even need to run before you pass > > the point of no return, and with bind_now, you'd be executing them in > > a context where resolver errors are still "hard" and can/should cause > > dlopen to fail. > > The question is whether a failure from the run-time trampoline should > ever be a soft error (that can be caught by dlopen). I think we use the > soft/hard distinction differently, but you seem to suggest that a lazy > binding error during a call to an IFUNC resolver should not cause > process termination. I think it is undefined like any other trampoline > failure, so we should abort. We really don't know what the IFUNC > resolver was supposed to be doing and which of its side effects > happened. The situation really is unrecoverable. Assuming ifunc resolvers aren't "allowed" to do much beyond probing hwcaps/cpuid/etc. to pick an implementation, I don't see any reason that resolver failure during an ifunc resolver function should terminate the process. Missing symbols at dlopen time with RTLD_NOW or DT_BINDNOW or whatever should never crash the application, but should report the error. With ifunc, I think (?) you have the possibility that the ifunc resolver code will call another function in the library being loaded (or one of its deps) via a plt slot that hasn't yet been initialized, because there's no way to know a dependency order for the relocations to avoid this. This should probably longjmp back and make dlopen fail; I can't see any other way to make it work since there's no way to make forward progress past the impossible-to-satisfy call. But maybe the relocations can just be ordered such that this isn't a concern (by checking all symbolic references prior to doing any ifunc resolvers?). > If we want to give users more precise control over binding errors, I > don't think anything based on SJLJ-style exception handling is the > answer. I don't see why there should be any expectation that you can use C++ exception handling for this; the contract of dlopen is that it succeed or return an error, not that it might terminate via an exception. Rich ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 22:52 ` Rich Felker @ 2019-03-12 10:30 ` Florian Weimer 2019-03-12 12:00 ` Szabolcs Nagy 0 siblings, 1 reply; 17+ messages in thread From: Florian Weimer @ 2019-03-12 10:30 UTC (permalink / raw) To: Rich Felker; +Cc: libc-alpha * Rich Felker: > Assuming ifunc resolvers aren't "allowed" to do much beyond probing > hwcaps/cpuid/etc. to pick an implementation, I don't see any reason > that resolver failure during an ifunc resolver function should > terminate the process. Depending on which documentation you read, IFUNC resolvers must not depend on run-time relocations themselves. In that case, lazy binding failure during execution of an IFUNC resolver cannot possibly happen in a valid program. We would also have to disable signals while IFUNC resolvers are running, so that lazy binding errors in signal handlers do not leak into the dlopen call (after longjmp'ing out of the signal handler). Is this really worth the trouble? > Missing symbols at dlopen time with RTLD_NOW or DT_BINDNOW or whatever > should never crash the application, but should report the error. With > ifunc, I think (?) you have the possibility that the ifunc resolver > code will call another function in the library being loaded (or one of > its deps) via a plt slot that hasn't yet been initialized, because > there's no way to know a dependency order for the relocations to avoid > this. We perform relocations in topological order, and IRELATIVE relocations are sorted last by current binutils. So in the absence of cyclic dependencies (perhaps as the result of symbol interposition), IFUNC resolvers will not encounter uninitialized PLT slots. > This should probably longjmp back and make dlopen fail; I can't see > any other way to make it work since there's no way to make forward > progress past the impossible-to-satisfy call. If you can detect at all that the relocation has not been processed, you could longjmp out of the IFUNC resolver and try something else, until there is definitely no way to make progress. (I'm not saying that this longjmp is valid, but it is a possibility.) But I really don't see how we can make this work reliably because there are relocation dependencies that do not involve lazy binding or PLT calls, and we cannot detect those. The IFUNC resolver would just use uninitialized data or data that is later overwritten. Two-phase relocation processing in topology order (first all non-IFUNC relocations, then the IFUNC relocations) seems to cover all the practical cases involving symbol interposition. It deals correctly with glibc's internal uses of those (which can currently lead to crashes, see bug 21041). It also covers, by design, all relocations for data symbols because they cannot involve IFUNCs. It does not deal with all cases where an IFUNC resolver uses a function pointer variable that has been initialized by a relocation, and that value is itself the result of an IFUNC resolver. It also cannot support cases where an IFUNC resolver depends on ELF constructors having run for one of its dependencies (which some people did, until distributions started building with BIND_NOW). > But maybe the relocations can just be ordered such that this isn't a > concern (by checking all symbolic references prior to doing any ifunc > resolvers?). I think doing that would be excessive and it wouldn't cover the case where the IFUNC resolver actually relies on lazy binding for choosing the implementation (which could be constructed as valid if IFUNC resolvers may rely on relocations). >> If we want to give users more precise control over binding errors, I >> don't think anything based on SJLJ-style exception handling is the >> answer. > > I don't see why there should be any expectation that you can use C++ > exception handling for this; the contract of dlopen is that it succeed > or return an error, not that it might terminate via an exception. I meant for a call that results in a lazy binding failure. Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-12 10:30 ` Florian Weimer @ 2019-03-12 12:00 ` Szabolcs Nagy 0 siblings, 0 replies; 17+ messages in thread From: Szabolcs Nagy @ 2019-03-12 12:00 UTC (permalink / raw) To: Florian Weimer, Rich Felker; +Cc: nd, libc-alpha@sourceware.org On 12/03/2019 10:30, Florian Weimer wrote: > * Rich Felker: >> Missing symbols at dlopen time with RTLD_NOW or DT_BINDNOW or whatever >> should never crash the application, but should report the error. With >> ifunc, I think (?) you have the possibility that the ifunc resolver >> code will call another function in the library being loaded (or one of >> its deps) via a plt slot that hasn't yet been initialized, because >> there's no way to know a dependency order for the relocations to avoid >> this. > > We perform relocations in topological order, and IRELATIVE relocations > are sorted last by current binutils. So in the absence of cyclic > dependencies (perhaps as the result of symbol interposition), IFUNC > resolvers will not encounter uninitialized PLT slots. IRELATIVE is sorted after JUMP_SLOT but GLOB_DAT and ABS come before and those can be against an STT_GNU_IFUNC symbol that is defined in the same module, so ifunc resolvers can encounter uninitialized PLT slots (no cyclic dependency needed, with or without bind now). > IFUNC resolver. It also cannot support cases where an IFUNC resolver > depends on ELF constructors having run for one of its dependencies > (which some people did, until distributions started building with > BIND_NOW). depending on ctor in ifunc sounds broken. (i know at least x86 gcc libatomic did this, but that was a bug) >> But maybe the relocations can just be ordered such that this isn't a >> concern (by checking all symbolic references prior to doing any ifunc >> resolvers?). > > I think doing that would be excessive and it wouldn't cover the case > where the IFUNC resolver actually relies on lazy binding for choosing > the implementation (which could be constructed as valid if IFUNC > resolvers may rely on relocations). depending on lazy binding sounds fragile. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer 2019-03-06 15:40 ` Rich Felker @ 2019-03-11 14:07 ` Carlos O'Donell 2019-03-11 15:29 ` Florian Weimer 1 sibling, 1 reply; 17+ messages in thread From: Carlos O'Donell @ 2019-03-11 14:07 UTC (permalink / raw) To: Florian Weimer, libc-alpha On 3/5/19 11:37 AM, Florian Weimer wrote: > Currently, dynamic loader operations triggered by explicit function > calls (dlopen, dlsym, dlcose) wrapped in exception handlers, installed > using _dl_catch_error. This function calls setjmp, and when an error is > raised the dynamic linker calls longjmp, resetting the call stack. Yes. Call this "Method 1: sjlj recovery" > This leads to strange bugs, such as undefined symbols in init/fini calls > being treated as non-fatal soft errors reported through dlerror > (bug 24304). It also leads to a design where most dlopen failures are > not handled locally. Instead they bubble up to the top-level dlopen > error handler installed using _dl_catch_error, which then attempts to > undo the changes of the partially completed state to the global dynamic > linker state (see bug 20839 for a couple of problems with that). Nothing to me indicates that a new system wouldn't also have bugs. The benefit you need to highlight is the ability for a new design to mitigate bugs by the very nature of the structural changes being proposed. Both sjlj and a purely C++-style exception mechanism are, in my mind, semantically equivalent, but the latter has more syntactic sugar to allow you to practice good patterns that avoid mistakes (but it can equally hide errors and create bugs). > In the current scheme, more localized error handling is problematic > because it has high syntactic overhead: You need to define a struct for > argument data and a separate function that receives the data, and pass > both to _dl_catch_error. There also could be a performance overhead if > individual malloc calls were protected in this way because each call to > _dl_catch_error incurs a call to setjmp. Yes, Method 1 requires passing down all information encapsulated in a structure. Why would malloc calls be protected in this way? > Does anyone think we should retain the current error handling scheme? That depends on the quality of the alternative proposal :-) > I personally do not have a problem with exceptions and stack unwinding, > but if this is what we want, we should use a DWARF-based unwinder and > GCC's exception handling features (the limited support in the C front > end is probably sufficient). I agree. Let's call this "Method 2: C exceptions" For dlopen-et-al I think a DWARF-based unwinder would be great. I assume you envision that exception handling in C would help cleanup the code and allow more locally visible cleanups to happen (unwinding state). > The alternative to unwinding is an explicit struct dl_exception * > argument for functions which can fail, and use the return value to > indicate whether there was a fatal error. This sometimes causes issues > where the return value is already used to signal both success and > non-fatal error (e.g., -1 for failure from open_verify, or NULL for > RTLD_NOLOAD failure from _dl_map_object_from_fd). No, if we're going to change it should be *towards* something where the compiler can help us get it right. I don't want to see an explicit "this" passed into every function by hand. Let's call this "Method 3: Explicit this" > There is some impact on <dl-machine.h> because the relocation processing > needs to change. We can convert the relocation processing first to the > new scheme and continue to signal any errors using longjmp in the > generic code. But supporting twice the number of relocation APIs for > incremental conversion of targets will still be difficult. I think we > are still looking at one fairly large patch, given the number of > architectures we support, although the changes should just be a few > dozen lines per architecture. Please expand on this a bit more. > We cannot convert the generic code first because that would mean calling > setjmp each time before calling into architecture-specific code, which I > expect will be too problematic from a performance point of view. Right, this is an all-or-nothing change IMO. > A third option is not use an explicit struct dl_exception * argument, > but a per-thread variable. This will require changes to support TLS > (presumably the initial-exec variant) in the dynamic linker itself, > which is currently missing. Since the exception pointer is only needed > in case of an error, using a TLS variable for it will avoid the overhead > of maintaining the explicit exception pointer argument and passing it > around. Adding TLS support to the dynamic linker could be implemented > incrementally across architectures, but the conversion itself faces the > same flag day challenge as the explicit argument solution. The explict > argument also ensures that places stick out where encoding fatal errors > in the return argument is difficult. (A fourth option would compile the > dynamic linker twice and use the TLS-less version for the initial > loading of the executables and its dependencies.) I don't like either of these options. Let's call this "Method 3: TP explicit this" I won't give the 4th option a name :-} > From a source code and binary size point of view, there is not much > difference between using longjmp and the explicit function argument > approach on x86-64. Right. > Does anyone want to keep the longjmp approach? Should I polish my patch > and extend it to cover all architectures? What does your patch do? Method 1: sjlj Method 2: C exceptions <------- I vote method 2. Method 3: Explicit this Method 4: TP explicit this -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 14:07 ` Carlos O'Donell @ 2019-03-11 15:29 ` Florian Weimer 2019-03-11 16:13 ` Carlos O'Donell 0 siblings, 1 reply; 17+ messages in thread From: Florian Weimer @ 2019-03-11 15:29 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-alpha * Carlos O'Donell: >> In the current scheme, more localized error handling is problematic >> because it has high syntactic overhead: You need to define a struct for >> argument data and a separate function that receives the data, and pass >> both to _dl_catch_error. There also could be a performance overhead if >> individual malloc calls were protected in this way because each call to >> _dl_catch_error incurs a call to setjmp. > > Yes, Method 1 requires passing down all information encapsulated in a > structure. > > Why would malloc calls be protected in this way? You need a local handler if the pointer is not rooted in something that is thrown away by the top-most handler, and you call a potentially-throwing function between the allocation and the deallocation. >> I personally do not have a problem with exceptions and stack unwinding, >> but if this is what we want, we should use a DWARF-based unwinder and >> GCC's exception handling features (the limited support in the C front >> end is probably sufficient). > > I agree. > > Let's call this "Method 2: C exceptions" > > For dlopen-et-al I think a DWARF-based unwinder would be great. It requires moving the unwinder implementation from libgcc_s to libc, though. The last time we discussed this (related to unwinder performance issues and the dl_iterate_phdr interface), this idea was not well-received. If we fix the init/fini and IFUNC resolver bugs the way I intend, we will never have to unwind through user code before hitting the final handler, so there is no problem with the availability of unwinding data. So at least that part is entirely solvable. > I assume you envision that exception handling in C would help cleanup > the code and allow more locally visible cleanups to happen (unwinding > state). It requires a vastly different coding style. I have not thought much about it. I don't think we should rely on a relatively obscure GNU extension so prominently. Or put differently, if we want to use RAII as an implementation technique, I don't think we should use C. >> The alternative to unwinding is an explicit struct dl_exception * >> argument for functions which can fail, and use the return value to >> indicate whether there was a fatal error. This sometimes causes issues >> where the return value is already used to signal both success and >> non-fatal error (e.g., -1 for failure from open_verify, or NULL for >> RTLD_NOLOAD failure from _dl_map_object_from_fd). > > No, if we're going to change it should be *towards* something where the > compiler can help us get it right. > > I don't want to see an explicit "this" passed into every function by hand. > > Let's call this "Method 3: Explicit this" The explicit argument really helps to spot places where unwinding happens. For example, it makes it pretty clear that you have a memory leak if you call a function with such an argument and an allocation is not rooted in a global data structure. >> There is some impact on <dl-machine.h> because the relocation processing >> needs to change. We can convert the relocation processing first to the >> new scheme and continue to signal any errors using longjmp in the >> generic code. But supporting twice the number of relocation APIs for >> incremental conversion of targets will still be difficult. I think we >> are still looking at one fairly large patch, given the number of >> architectures we support, although the changes should just be a few >> dozen lines per architecture. > > Please expand on this a bit more. Let's look at sysdeps/powerpc/powerpc64/dl-machine.h. elf_machine_rela calls _dl_reloc_overflow and _dl_reloc_bad_type, among other things, and those throw. This means that elf_machine_rela needs to be adjusted with the explicit argument. RESOLVE_MAP calls _dl_lookup_symbol_x behind the scenes as well, so that can throw, too. Is there any state we need to roll back locally in elf_machine_rela? Probably not. So this is one area where we do not benefit from the explicit argument. >> A third option is not use an explicit struct dl_exception * argument, >> but a per-thread variable. This will require changes to support TLS >> (presumably the initial-exec variant) in the dynamic linker itself, >> which is currently missing. Since the exception pointer is only needed >> in case of an error, using a TLS variable for it will avoid the overhead >> of maintaining the explicit exception pointer argument and passing it >> around. Adding TLS support to the dynamic linker could be implemented >> incrementally across architectures, but the conversion itself faces the >> same flag day challenge as the explicit argument solution. The explict >> argument also ensures that places stick out where encoding fatal errors >> in the return argument is difficult. (A fourth option would compile the >> dynamic linker twice and use the TLS-less version for the initial >> loading of the executables and its dependencies.) > > I don't like either of these options. > > Let's call this "Method 3: TP explicit this" > > I won't give the 4th option a name :-} Note that the current implementation already stores the dl_exception * and the jump buffer for the error handler in thread-local storage. That's why we need to compile the exception handling machinery twice. Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 15:29 ` Florian Weimer @ 2019-03-11 16:13 ` Carlos O'Donell 2019-03-11 16:53 ` Zack Weinberg 2019-03-11 18:39 ` Florian Weimer 0 siblings, 2 replies; 17+ messages in thread From: Carlos O'Donell @ 2019-03-11 16:13 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha On 3/11/19 11:29 AM, Florian Weimer wrote: > * Carlos O'Donell: > >>> In the current scheme, more localized error handling is problematic >>> because it has high syntactic overhead: You need to define a struct for >>> argument data and a separate function that receives the data, and pass >>> both to _dl_catch_error. There also could be a performance overhead if >>> individual malloc calls were protected in this way because each call to >>> _dl_catch_error incurs a call to setjmp. >> >> Yes, Method 1 requires passing down all information encapsulated in a >> structure. >> >> Why would malloc calls be protected in this way? > > You need a local handler if the pointer is not rooted in something that > is thrown away by the top-most handler, and you call a > potentially-throwing function between the allocation and the > deallocation. Right, you might be able to get away with a free-list here, but I get your point, in a fully generic system you must register a free handler which is specific code that knows how to free the object you allocated. I hate to say it but we're just designing C++ from the ground up again. Why not just get the loader to the point it can run C++? This is a serious question. >>> I personally do not have a problem with exceptions and stack unwinding, >>> but if this is what we want, we should use a DWARF-based unwinder and >>> GCC's exception handling features (the limited support in the C front >>> end is probably sufficient). >> >> I agree. >> >> Let's call this "Method 2: C exceptions" >> >> For dlopen-et-al I think a DWARF-based unwinder would be great. > > It requires moving the unwinder implementation from libgcc_s to libc, > though. The last time we discussed this (related to unwinder > performance issues and the dl_iterate_phdr interface), this idea was not > well-received. It might still be the best technical choice. I do not think we should discard this idea so easily. Why wouldn't we pursue this option? > If we fix the init/fini and IFUNC resolver bugs the way I intend, we > will never have to unwind through user code before hitting the final > handler, so there is no problem with the availability of unwinding data. > So at least that part is entirely solvable. I don't understand the exact bugs in question here, what are they? What are your solutions? >> I assume you envision that exception handling in C would help cleanup >> the code and allow more locally visible cleanups to happen (unwinding >> state). > > It requires a vastly different coding style. I have not thought much > about it. I don't think we should rely on a relatively obscure GNU > extension so prominently. Or put differently, if we want to use RAII as > an implementation technique, I don't think we should use C. This is a GNU project and we should do whatever we think is the best solution for glibc. If the obscure GNU extension has bugs, then that's a reason not to use it, or if the debugger doesn't support it, then that's a reason not to use it, but it can't just be about "obscure GNU extensions." Alternatives are using Rust or C++. I would prefer a limited form of C++ without libstdc++ support. Get the loader to the point where it can run C++, with the unwinder builtin, and then only use C++ classes the loader itself defines. >>> The alternative to unwinding is an explicit struct dl_exception * >>> argument for functions which can fail, and use the return value to >>> indicate whether there was a fatal error. This sometimes causes issues >>> where the return value is already used to signal both success and >>> non-fatal error (e.g., -1 for failure from open_verify, or NULL for >>> RTLD_NOLOAD failure from _dl_map_object_from_fd). >> >> No, if we're going to change it should be *towards* something where the >> compiler can help us get it right. >> >> I don't want to see an explicit "this" passed into every function by hand. >> >> Let's call this "Method 3: Explicit this" > > The explicit argument really helps to spot places where unwinding > happens. For example, it makes it pretty clear that you have a memory > leak if you call a function with such an argument and an allocation is > not rooted in a global data structure. I agree, but is it the best option? Or are you arguing it's the best value for cost option? >>> There is some impact on <dl-machine.h> because the relocation processing >>> needs to change. We can convert the relocation processing first to the >>> new scheme and continue to signal any errors using longjmp in the >>> generic code. But supporting twice the number of relocation APIs for >>> incremental conversion of targets will still be difficult. I think we >>> are still looking at one fairly large patch, given the number of >>> architectures we support, although the changes should just be a few >>> dozen lines per architecture. >> >> Please expand on this a bit more. > > Let's look at sysdeps/powerpc/powerpc64/dl-machine.h. elf_machine_rela > calls _dl_reloc_overflow and _dl_reloc_bad_type, among other things, and > those throw. This means that elf_machine_rela needs to be adjusted with > the explicit argument. RESOLVE_MAP calls _dl_lookup_symbol_x behind the > scenes as well, so that can throw, too. I see where you're going with this, particularly the state rollback. > Is there any state we need to roll back locally in elf_machine_rela? > Probably not. So this is one area where we do not benefit from the > explicit argument. Understood. >>> A third option is not use an explicit struct dl_exception * argument, >>> but a per-thread variable. This will require changes to support TLS >>> (presumably the initial-exec variant) in the dynamic linker itself, >>> which is currently missing. Since the exception pointer is only needed >>> in case of an error, using a TLS variable for it will avoid the overhead >>> of maintaining the explicit exception pointer argument and passing it >>> around. Adding TLS support to the dynamic linker could be implemented >>> incrementally across architectures, but the conversion itself faces the >>> same flag day challenge as the explicit argument solution. The explict >>> argument also ensures that places stick out where encoding fatal errors >>> in the return argument is difficult. (A fourth option would compile the >>> dynamic linker twice and use the TLS-less version for the initial >>> loading of the executables and its dependencies.) >> >> I don't like either of these options. >> >> Let's call this "Method 3: TP explicit this" >> >> I won't give the 4th option a name :-} > > Note that the current implementation already stores the dl_exception * > and the jump buffer for the error handler in thread-local storage. Yes, I see that in elf/dl-error-skeleton.c (catch_hook). > That's why we need to compile the exception handling machinery twice. So to be clear, you're proposing: "Method 3: Explicit this" Which you argue has highest value, with lowest cost. While I've been arguing for "Method 2: C exceptions" which you believe has too much cost given the value, since we'd have to: - move the unwinder to libc - switch to some C++ subset I can understand your position, and I'm not opposed to "Method 3: Explicit this", it would call out the exact places where we might have problems for a future conversion. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 16:13 ` Carlos O'Donell @ 2019-03-11 16:53 ` Zack Weinberg 2019-03-11 17:08 ` Florian Weimer 2019-03-11 18:39 ` Florian Weimer 1 sibling, 1 reply; 17+ messages in thread From: Zack Weinberg @ 2019-03-11 16:53 UTC (permalink / raw) To: Carlos O'Donell; +Cc: Florian Weimer, GNU C Library On Mon, Mar 11, 2019 at 12:13 PM Carlos O'Donell <carlos@redhat.com> wrote: > On 3/11/19 11:29 AM, Florian Weimer wrote: > > It requires moving the unwinder implementation from libgcc_s to libc, > > though. The last time we discussed this (related to unwinder > > performance issues and the dl_iterate_phdr interface), this idea was not > > well-received. > > It might still be the best technical choice. > I do not think we should discard this idea so easily. > Why wouldn't we pursue this option? I haven't thought about this much, but I don't like the idea of increasing the set of functions potentially executed during symbol resolution to the tune of the entire unwinder, because of the unusual constraints on code executed in that context (e.g. must not take locks, must protect itself from cancellation, must not touch the normal errno). I was already thinking about saying that I supported Florian's original proposal just because it would mean *reducing* that set by at least setjmp and longjmp. zw ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 16:53 ` Zack Weinberg @ 2019-03-11 17:08 ` Florian Weimer 2019-03-11 18:39 ` Zack Weinberg 0 siblings, 1 reply; 17+ messages in thread From: Florian Weimer @ 2019-03-11 17:08 UTC (permalink / raw) To: Zack Weinberg; +Cc: Carlos O'Donell, GNU C Library * Zack Weinberg: > On Mon, Mar 11, 2019 at 12:13 PM Carlos O'Donell <carlos@redhat.com> wrote: >> On 3/11/19 11:29 AM, Florian Weimer wrote: >> > It requires moving the unwinder implementation from libgcc_s to libc, >> > though. The last time we discussed this (related to unwinder >> > performance issues and the dl_iterate_phdr interface), this idea was not >> > well-received. >> >> It might still be the best technical choice. >> I do not think we should discard this idea so easily. >> Why wouldn't we pursue this option? > > I haven't thought about this much, but I don't like the idea of > increasing the set of functions potentially executed during symbol > resolution to the tune of the entire unwinder, because of the unusual > constraints on code executed in that context (e.g. must not take > locks, must protect itself from cancellation, must not touch the > normal errno). This is a good point which I had not considered. My position is that errors during symbol resolution in lazy binding are never recoverable. If the error is not recoverable, there is no need to do any unwinding at all. We currently get this wrong for init/fini (bug 24304). Rich Felker brought up this matter in conjunction with IFUNC resolvers. These can be called during relocation processing (from dlopen) or lazy binding, when they themselves trigger lazy binding. I think these errors should not be recoverable, either, whether they happen during relocation processing or lazy binding. But there is a narrow edge case (IFUNC resolver called during relocation processing which triggers lazy binding which fails in symbol lookup) where we actually have user code (the IFUNC resolver) on the stack, and we would have to unwind through the trampoline/IFUNC resolver/dlopen sequence if wanted to make this error recoverable. (Which I think we should not do, but the discussion is still open.) But even that edge case only happens after dlopen, so at least it's not in an async-signal-safe context. But the user stack frame pretty much kills the DWARF-based unwinding approach if it's something we need to support. Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 17:08 ` Florian Weimer @ 2019-03-11 18:39 ` Zack Weinberg 2019-03-11 18:49 ` Florian Weimer 0 siblings, 1 reply; 17+ messages in thread From: Zack Weinberg @ 2019-03-11 18:39 UTC (permalink / raw) To: Florian Weimer; +Cc: Carlos O'Donell, GNU C Library On Mon, Mar 11, 2019 at 1:08 PM Florian Weimer <fweimer@redhat.com> wrote: > * Zack Weinberg: > > I haven't thought about this much, but I don't like the idea of > > increasing the set of functions potentially executed during symbol > > resolution to the tune of the entire unwinder, because of the unusual > > constraints on code executed in that context (e.g. must not take > > locks, must protect itself from cancellation, must not touch the > > normal errno). > > This is a good point which I had not considered. > > My position is that errors during symbol resolution in lazy binding are > never recoverable. If the error is not recoverable, there is no need to > do any unwinding at all. We currently get this wrong for init/fini > (bug 24304). I tend to agree with this. I'm also honestly dubious of the value of lazy binding anymore, but that we should think about separately. > Rich Felker brought up this matter in conjunction with IFUNC resolvers. > These can be called during relocation processing (from dlopen) or lazy > binding, when they themselves trigger lazy binding. I think these > errors should not be recoverable, either, whether they happen during > relocation processing or lazy binding. But there is a narrow edge case > (IFUNC resolver called during relocation processing which triggers lazy > binding which fails in symbol lookup) I wonder how practical it would be to discover the set of symbols that an IFUNC resolver might attempt to use, and resolve those before calling into the IFUNC resolver. IFUNCs are still new enough that we could require additional annotations in the object file if that would help. zw ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 18:39 ` Zack Weinberg @ 2019-03-11 18:49 ` Florian Weimer 0 siblings, 0 replies; 17+ messages in thread From: Florian Weimer @ 2019-03-11 18:49 UTC (permalink / raw) To: Zack Weinberg; +Cc: Carlos O'Donell, GNU C Library * Zack Weinberg: >> Rich Felker brought up this matter in conjunction with IFUNC resolvers. >> These can be called during relocation processing (from dlopen) or lazy >> binding, when they themselves trigger lazy binding. I think these >> errors should not be recoverable, either, whether they happen during >> relocation processing or lazy binding. But there is a narrow edge case >> (IFUNC resolver called during relocation processing which triggers lazy >> binding which fails in symbol lookup) > > I wonder how practical it would be to discover the set of symbols that > an IFUNC resolver might attempt to use, and resolve those before > calling into the IFUNC resolver. IFUNCs are still new enough that we > could require additional annotations in the object file if that would > help. Considering that the whole purpose of IFUNC resolvers is to select implementations, eagerly resolving all potentially needed relocations (even without BIND_NOW) seems rather excessive and not very backwards-compatible. I've proposed two-phase relocation processing before, first all non-IFUNC relocations, followed by the IFUNC relocations that have been queued. A two-phase approach is needed for most cases involving IFUNCs, non-lazy binding, and symbol interposition. It does not solve all IFUNC issues (you cannot in general, if there is a circular depenedency for example), but I think it would be a good addition, and it would even fix some glibc bugs where we use IFUNC resolvers improperly. It applies to the BIND_NOW case only, which, like you, I consider the more relevant one. Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 16:13 ` Carlos O'Donell 2019-03-11 16:53 ` Zack Weinberg @ 2019-03-11 18:39 ` Florian Weimer 2019-03-11 19:30 ` Carlos O'Donell 1 sibling, 1 reply; 17+ messages in thread From: Florian Weimer @ 2019-03-11 18:39 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-alpha * Carlos O'Donell: > Right, you might be able to get away with a free-list here, but I get your > point, in a fully generic system you must register a free handler which is > specific code that knows how to free the object you allocated. I hate to say > it but we're just designing C++ from the ground up again. Why not just get > the loader to the point it can run C++? This is a serious question. We would have to solve the bootstrapping issue. I expect this topic (use of C++) to be rather controversial. >>> Let's call this "Method 2: C exceptions" >>> >>> For dlopen-et-al I think a DWARF-based unwinder would be great. >> >> It requires moving the unwinder implementation from libgcc_s to libc, >> though. The last time we discussed this (related to unwinder >> performance issues and the dl_iterate_phdr interface), this idea was not >> well-received. > > It might still be the best technical choice. > > I do not think we should discard this idea so easily. > > Why wouldn't we pursue this option? Well, it turns this whole exercise from a one-day hack into a longer-term integration project. 8-) If this is the direction in which we want to go, it doesn't look technically unsound to me, just very controversial. >> If we fix the init/fini and IFUNC resolver bugs the way I intend, we >> will never have to unwind through user code before hitting the final >> handler, so there is no problem with the availability of unwinding data. >> So at least that part is entirely solvable. > > I don't understand the exact bugs in question here, what are they? These bugs are unrelated. We currently have memory leaks if getaddrinfo is canceled, for example. Fixing that by compiling getaddrinfo with -fexceptions and using RAII wouldn't solve them because getaddrinfo might call into an NSS service module which is not compiled with unwinding information. Hence the need for the dual unwinding with longjmp and the libgcc_s unwinder in libpthread's cancellation implementation. (We might get away with that if we install a setjmp barrier before calling into an external NSS module, though.) For the socket file descriptor opened by nss_dns/libresolv and some other resources used there, we currently use a strategy like the free list you mentioned (_res and struct resolv_context). We need to perform heap allocations for the resolver context because the thread cleanup code runs only after fully unwinding the stack, so it won't be able to use the information there. (This is different in the SJLJ case in the dynamic linker, where we can easily change what the code does before calling longjmp.) But these two mechanisms do not cover everything that getaddrinfo allocates. In short, this is an entirely different area, but it could benefit from experience with other unwinding designs. >>> I assume you envision that exception handling in C would help cleanup >>> the code and allow more locally visible cleanups to happen (unwinding >>> state). >> >> It requires a vastly different coding style. I have not thought much >> about it. I don't think we should rely on a relatively obscure GNU >> extension so prominently. Or put differently, if we want to use RAII as >> an implementation technique, I don't think we should use C. > > This is a GNU project and we should do whatever we think is the best > solution for glibc. > > If the obscure GNU extension has bugs, then that's a reason not to use > it, or if the debugger doesn't support it, then that's a reason not to > use it, but it can't just be about "obscure GNU extensions." The extension itself (the cleanup attribute) is increasingly used. systemd seems quite happy as a user, and there are others. Combining it with DWARF unwinding is rare and probably restricted to some limited scenarios related to cancellation. Fedora and downstream are compiled with -fexceptions, so there probably aren't any obvious ICEs in that area. > Alternatives are using Rust or C++. Rust is not supposed to have exceptions. It only supports automatic error returns based on the return type (a specially annotated sum type). There's also no GNU front end for it. (In reality, Rust has exceptions; the test framework even requires them. Last time I tried, you couldn't compile your code without stack unwinding support if you wanted to test it. By default, out-of-bounds array access in Rust throws, and like in Go, there is just not very convenient language support to handle the exception, but the support is there in the run-time library.) > I would prefer a limited form of C++ without libstdc++ support. We would have to move the C++ personality routine and the core exception handling to ld.so. They live in libstdc++, not libgcc_s, and are not part of the regular unwinder. 8-/ Maybe we could have a restricted, hidden implementation of C++ exception support. I have not put much thought to that. I didn't expect this turn of the discussion. Note that RAII is helpful as an implementation strategy even without exceptions. (But it makes some uses of constructors rather cumbersome because objects are not necessarily in a valid state right after construction. It would work fine for struct scratch_buffer and the dynarrays, though. Guess why.) >>>> The alternative to unwinding is an explicit struct dl_exception * >>>> argument for functions which can fail, and use the return value to >>>> indicate whether there was a fatal error. This sometimes causes issues >>>> where the return value is already used to signal both success and >>>> non-fatal error (e.g., -1 for failure from open_verify, or NULL for >>>> RTLD_NOLOAD failure from _dl_map_object_from_fd). >>> >>> No, if we're going to change it should be *towards* something where the >>> compiler can help us get it right. >>> >>> I don't want to see an explicit "this" passed into every function by hand. >>> >>> Let's call this "Method 3: Explicit this" >> >> The explicit argument really helps to spot places where unwinding >> happens. For example, it makes it pretty clear that you have a memory >> leak if you call a function with such an argument and an allocation is >> not rooted in a global data structure. > > I agree, but is it the best option? > > Or are you arguing it's the best value for cost option? I expect that we have to do this conversion anyway (I already did for x86-64) to find out the spots which need careful handling in any of the implicit approaches. Maybe we could avoid that if we had exact call-graph-based tools, for all relevant targets, but I don't think they exist. > So to be clear, you're proposing: > > "Method 3: Explicit this" > > Which you argue has highest value, with lowest cost. Yes, and it also has the best tool support, I think. We could use the knowledge gained by the conversion to fix bugs, but I think it will bit-rot over time, as people add more architecture-specific code and support more features in the dynamic linker. > While I've been arguing for "Method 2: C exceptions" which you believe > has too much cost given the value, since we'd have to: > - move the unwinder to libc > - switch to some C++ subset This sounds tempting as well, but I'm not sure if it's the right place to start with using C++ in glibc. One thing we have not considered much so far is what you call “free lists”, and generally not doing any unrecoverable actions until we reach the point of no return in dlopen (and eliminating the possiblity of failures in dlclose, by avoiding calling malloc there). For example, we could store the temporary file descriptors in the exception context (by a callout to libc.so.6) and make sure that they are closed when unwinding. We can avoid memory leaks for temporary allocations in much the same way. In a sense, we would provide a more approachable programming environment for unwinding. Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 18:39 ` Florian Weimer @ 2019-03-11 19:30 ` Carlos O'Donell 2019-03-13 15:51 ` Florian Weimer 0 siblings, 1 reply; 17+ messages in thread From: Carlos O'Donell @ 2019-03-11 19:30 UTC (permalink / raw) To: Florian Weimer; +Cc: libc-alpha On 3/11/19 2:39 PM, Florian Weimer wrote: >> So to be clear, you're proposing: >> >> "Method 3: Explicit this" >> >> Which you argue has highest value, with lowest cost. > > Yes, and it also has the best tool support, I think. > > We could use the knowledge gained by the conversion to fix bugs, but I > think it will bit-rot over time, as people add more > architecture-specific code and support more features in the dynamic > linker. > >> While I've been arguing for "Method 2: C exceptions" which you believe >> has too much cost given the value, since we'd have to: >> - move the unwinder to libc >> - switch to some C++ subset > > This sounds tempting as well, but I'm not sure if it's the right place > to start with using C++ in glibc. I can accept that :-) > One thing we have not considered much so far is what you call “free > lists”, and generally not doing any unrecoverable actions until we reach > the point of no return in dlopen (and eliminating the possiblity of > failures in dlclose, by avoiding calling malloc there). For example, we > could store the temporary file descriptors in the exception context (by > a callout to libc.so.6) and make sure that they are closed when > unwinding. We can avoid memory leaks for temporary allocations in much > the same way. > > In a sense, we would provide a more approachable programming environment > for unwinding. OK, so lets expand on that a bit, if you went down this route what do the steps look like? If you enumerate them a bit more I think we'll all be able to comment on them first, and then agree that it's the right way forward. -- Cheers, Carlos. ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Removing longjmp error handling from the dynamic loader 2019-03-11 19:30 ` Carlos O'Donell @ 2019-03-13 15:51 ` Florian Weimer 0 siblings, 0 replies; 17+ messages in thread From: Florian Weimer @ 2019-03-13 15:51 UTC (permalink / raw) To: Carlos O'Donell; +Cc: libc-alpha * Carlos O'Donell: >> One thing we have not considered much so far is what you call “free >> lists”, and generally not doing any unrecoverable actions until we reach >> the point of no return in dlopen (and eliminating the possiblity of >> failures in dlclose, by avoiding calling malloc there). For example, we >> could store the temporary file descriptors in the exception context (by >> a callout to libc.so.6) and make sure that they are closed when >> unwinding. We can avoid memory leaks for temporary allocations in much >> the same way. >> >> In a sense, we would provide a more approachable programming environment >> for unwinding. > > OK, so lets expand on that a bit, if you went down this route what do > the steps look like? If you enumerate them a bit more I think we'll > all be able to comment on them first, and then agree that it's the > right way forward. I haven't read all of the loader code, but I think we need to manage at least the following resources: * The loader lock * Temporary memory allocation (e.g. file name buffers) * Persistant memory allocations related to the link map * Other persistant memory allocations * One file descriptor * File mappings * How new link maps are hooked into the loader state * Updates to global loader state (various search paths) * Maybe TLS data in the future (if we allocate it eagerly) Ideally, there would be a precise point, clearly indicated in the source code, at which which we know that dlopen cannot fail. I tried to identify this point in the existing sources, but it is entirely unclear where it should be. Conceptually, I think we all agree is that it has to come before we call the ELF constructors. After the point of no return, we cannot do anything that might potentially fail in a non-critical fashion, such as allocating memory. For example, in the current code, we appear to call malloc in add_to_global, when adding the new objects to the global scope. We would have to perform this allocation beforehand (and make sure that we do not need more memory afterwards because some other thread made further changes). So overall, I think the longjmp vs no longjmp discussion may be a distraction, and the harder problems lie elsewhere. Please also consider Rich Felker's comments regarding lazy binding failures in IFUNC resolvers; for those we would have to use longjmp anyway if we want to keep them as non-fatal because the IFUNC resolver may not have unwinding information. Thanks, Florian ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2019-03-13 15:51 UTC | newest] Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-03-05 16:37 Removing longjmp error handling from the dynamic loader Florian Weimer 2019-03-06 15:40 ` Rich Felker 2019-03-07 14:03 ` Adhemerval Zanella 2019-03-11 13:45 ` Florian Weimer 2019-03-11 22:52 ` Rich Felker 2019-03-12 10:30 ` Florian Weimer 2019-03-12 12:00 ` Szabolcs Nagy 2019-03-11 14:07 ` Carlos O'Donell 2019-03-11 15:29 ` Florian Weimer 2019-03-11 16:13 ` Carlos O'Donell 2019-03-11 16:53 ` Zack Weinberg 2019-03-11 17:08 ` Florian Weimer 2019-03-11 18:39 ` Zack Weinberg 2019-03-11 18:49 ` Florian Weimer 2019-03-11 18:39 ` Florian Weimer 2019-03-11 19:30 ` Carlos O'Donell 2019-03-13 15:51 ` Florian Weimer
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).