unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* Re: Fwd: [PATCH v5 00/22] Some rtld-audit fixes
@ 2021-11-22 17:46 jma14 via Libc-alpha
  2021-11-23 13:58 ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 15+ messages in thread
From: jma14 via Libc-alpha @ 2021-11-22 17:46 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella
  Cc: John Mellor-Crummey, libc-alpha, Mark W. Krentel, Xiaozhu Meng


On 11/19/21 14:31, Florian Weimer       wrote:

> * Adhemerval Zanella:
>>> "" for the main executable is widely known.  Usually code uses it to
>>> implement a fallback on argv[0] or /proc/self/exe, though.

If this is widely known, it would be helpful if this was added to the  
man pages. Currently the man pages define l_name as the "absolute  
pathname where object was found." That sounds (to me at least) like  
dlinfo(dlopen(NULL)) is a portable alternative to the Linux-specific  
AT_EXECFN, contrary to reality.

>> There are still the issue where audit interface does not have direct
>> access to argv[0] from the audited process and '/proc' might also not
>> be accessible.  I am still not convinced that provided argv[0] for
>> l_name for main executable is worse than "", specially because the
>> fallback might not work.
>
> I think it's better to give the auditor a chance to figure out whether
> they want to use program_invocation_name (if that's not available in the
> inner libc, that's for sure a bug we must fix), AT_EXECFN, or
> /proc/self/exe.  If we pick one of these for the auditor, we make it
> more difficult to make the appropriate choice.

Apologies, but I don't understand the logic here. The main executable  
is easy to identify in la_objopen as `l_prev == NULL && lmid ==  
LM_ID_BASE`, the same (effectively) is done in gdb [1]. Applications  
have dlinfo(dlopen(NULL)) to directly obtain the main executable's  
link_map. I fail to see how setting l_name to "" is superior to either  
of those options, especially given the downsides.

I agree with Adhemerval. There is no portable way for an auditor to  
acquire the resolved path to the main executable, AT_EXECFN and /proc  
are Linux-specific and dladdr (currently) often crashes from the  
auditor. An l_name of "" means auditors *cannot* make the appropriate  
choice.

[1]  
https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=gdb/solib-svr4.c;hb=d3de0860104b8bb8d496527fbb042c3b4c5c82dc#l1230

>>>   […]   […]
>>>   # define __RELOC_POINTER(ptr, base) ((ptr) + (base))
>>>   […]
>>>   _Unwind_Ptr load_base;
>>>   […]
>>>   load_base = info->dlpi_addr;
>>>   […]
>>>   /* See if PC falls into one of the loaded segments.  Find the eh_frame
>>>      segment at the same time.  */
>>>   for (n = info->dlpi_phnum; --n >= 0; phdr++)
>>>     {
>>>       if (phdr->p_type == PT_LOAD)
>>>         {
>>>           _Unwind_Ptr vaddr = (_Unwind_Ptr)
>>>             __RELOC_POINTER (phdr->p_vaddr, load_base);
>>>           if (data->pc >= vaddr && data->pc < vaddr + phdr->p_memsz)
>>>             {
>>>               match = 1;
>>>               pc_low = vaddr;
>>>               pc_high =  vaddr + phdr->p_memsz;
>>>             }
>>>         }
>>>   […]

>>> Changing l_addr will break the libgcc unwinder.  It uses l_addr to
>>> relocate the program header (see the code I quoted previously).  Not
>>> everyone uses the platform unwinder, and the libgcc unwinder is
>>> sometimes linked statically.  This is different from the l_name
>>> change: The l_addr would definitely cause widespread breakage.

Based on the quoted code snippet, the libgcc unwinder uses dlpi_addr  
(from dl_iterate_phdr) instead of l_addr. To clarify, my proposal is  
that *only* the publicly visible l_addr changes. In short replacing  
the relation:
    dlpi_addr (from dl_iterate_phdr) == l_addr != dli_fbase (from  
dladdr) == (private) l_map_start
with the relation:
    dlpi_addr (from dl_iterate_phdr) != l_addr == dli_fbase (from  
dladdr) == (private) l_map_start

Any Glibc code using l_addr (including dl_iterate_phdr) would instead  
use a newly added private field retaining the old semantic, and l_addr  
would carry the value of l_map_start.

I don't expect users to use dl_iterate_phdr and link_map in tandem  
(and then fail when l_addr != dlpi_addr), the former (+  
dladdr(dlpi_phdr)) gives strictly superior information. The quoted  
code snippet supports that expectation.

-Jonathon



^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: Fwd: [PATCH v5 00/22] Some rtld-audit fixes
@ 2021-11-22 17:46 jma14 via Libc-alpha
  0 siblings, 0 replies; 15+ messages in thread
From: jma14 via Libc-alpha @ 2021-11-22 17:46 UTC (permalink / raw)
  To: Florian Weimer
  Cc: John Mellor-Crummey, libc-alpha, Mark W. Krentel, Xiaozhu Meng


On 11/19/21 13:18, Florian Weimer wrote:

> * Jonathon Anderson:
>>>> Right now, we only require the program headers which we can obtain
>>>> from getauxval(AT_PHDR), however this technique has questionable
>>>> portability and robustness (getauxval returns an unsigned long, not a
>>>> pointer).
>
>>> A glibc port to an architecture where a long value cannot hold all
>>> pointer values will have to provide an alternative interface similar
>>> to getauxval, but that returns pointer values.
>
>> I would go one step farther and say getauxval is already broken for any
>> 64-bit architecture, unsigned long is only required to support 32 bits
>> as per the C standards. One of my greater fears is that some exotic
>> compiler will cleverly allocate only 4 bytes of stack space for the
>> return value, and we wouldn't know except for a subtle bug (dependent
>> on optimization flags!) that crashes our entire tool with SEGVs in the
>> auditor (where GDB doesn't give properly unwound call stacks).
>
> If ported to such an architecture, glibc would need several changes to
> accomodate this.  Newer architectures take this into account and do not
> do funny things.

Ah, right, I forgot the sizes of the standard types are part of the  
GNU ABI. That resolves my concern then.

>> This is a much clearer description of the semantic, it would be very
>> helpful the man pages used that sentence (or one like it) wherever the
>> l_addr value is exposed in the API (link_map->l_addr and
>> dl_phdr_info->dlpi_addr). It would also be very helpful if
>> Dl_info.dli_fbase was clearly documented as *not* l_addr but instead
>> byte 0/ELF header in the process image.
>
> I've made a note to update the manual pages.

Thanks!

>>>> dladdr gets its value from link_map->l_map_start instead of l_addr,
>>>> so the semantic we want is already present in a private field. It
>>>> seems to me these two fields could be swapped with little issue, if
>>>> altering the public semantic is not acceptable we could also be sated
>>>> if l_map_start was made public.
>>>
>>> Applications which know about the current semantics of l_addr will
>>> break, though.  l_addr is also exposed to debuggers via the _r_debug
>>> interface.  I really do not think we can make changes to l_addr. We
>>> have a similar issue around l_name being "" for the main program, and
>>> unfortuantely I will have to argue quite strongly against changing
>>> that.
>
>> Is adding new public fields completely off the table?
>
> To struct link_map?  We could probably pull it off, but it would be
> years until such a change will be in the hands of the users.  There is
> an internal structure that overlaps with the public struct link_map, and
> some applications poke at the private bits at fixed offsets.  We've
> started not to strip ld.so downstream, so that these applications can
> switch to DWARF data to avoid dependencies on fixed offsets, but that
> has been a very recent change.

I would consider poking at private fields a last resort, portability  
across (versions of) GNU/Linux distributions is something we fight  
with on occasion. It also doesn't help the l_name issue.

>> I am keeping my hopes low for changing the public semantics, but I hope
>> that adding new fields for the missing data at least could fall into
>> the "bug fixes" category.
>
> We can in theory add new flags to dladdr1 or new requests to dlinfo.
> These things do not impact ABI and are backportable.

These options would not help the issue. dlinfo (currently) crashes in  
the auditor, and using it relies on the (undocumented) fact that  
dlopen handles are in fact just struct link_map*. dladdr already  
provides the corrected values, adding a new option to dladdr1 isn't  
needed.

> Changing struct link_map is challenging, as I explained.  Adding new
> functions definitely triggers deployment delays because those aren't
> considered backportable.  Adding fields to existing structs is usually
> impossible because these structs may have been used to define
> application types, and new fields cause subsequent offsets to change.

Ah, right. Can the new fields be added in a future release? If so,  
then with backported fixes to the dladdr crashes dladdr(l_ld) can be  
used as a portable workaround, provided it works as intended on all  
Glibc targets (it works on x86-64/Linux but I'll need confirmation for  
others). There is a performance cost to dladdr so I still think the  
new fields are required, it also gives the workaround a clear lifetime.

If new fields are completely impossible (now and forever), then as an  
LD_AUDIT user the only option (brought up so far) I feel comfortable  
with is a change to the l_addr/l_name semantics. It's ABI-compatible,  
I have a hard time believing there are applications that will regress  
because of it, and it fixes an interface that was *never* quite right  
(and differed from its documentation).

-Jonathon



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-11-25 17:56 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <EA69A62D-7C01-4536-B551-2609226053F2@rice.edu>
2021-11-17 18:08 ` Fwd: [PATCH v5 00/22] Some rtld-audit fixes John Mellor-Crummey via Libc-alpha
2021-11-17 20:42   ` Florian Weimer via Libc-alpha
2021-11-18 21:55     ` Jonathon Anderson via Libc-alpha
2021-11-19 19:18       ` Florian Weimer via Libc-alpha
2021-11-19 19:56         ` Adhemerval Zanella via Libc-alpha
2021-11-19 20:31           ` Florian Weimer via Libc-alpha
2021-11-23 16:36             ` Adhemerval Zanella via Libc-alpha
2021-11-22 17:46 jma14 via Libc-alpha
2021-11-23 13:58 ` Adhemerval Zanella via Libc-alpha
2021-11-23 14:02   ` Florian Weimer via Libc-alpha
2021-11-23 16:25     ` Adhemerval Zanella via Libc-alpha
2021-11-23 16:50       ` Florian Weimer via Libc-alpha
2021-11-23 21:13         ` Jonathon Anderson via Libc-alpha
2021-11-25 17:56           ` Adhemerval Zanella via Libc-alpha
  -- strict thread matches above, loose matches on Subject: below --
2021-11-22 17:46 jma14 via Libc-alpha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).