unofficial mirror of libc-alpha@sourceware.org
From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
To: Florian Weimer <fweimer@redhat.com>,
	Jonathon Anderson <janderson@rice.edu>
Cc: John Mellor-Crummey <johnmc@rice.edu>,
	libc-alpha@sourceware.org, "Mark W. Krentel" <krentel@rice.edu>,
	Xiaozhu Meng <xm13@rice.edu>
Subject: Re: Fwd: [PATCH v5 00/22] Some rtld-audit fixes
Date: Fri, 19 Nov 2021 16:56:25 -0300
Message-ID: <d9b3a9a5-4808-ddc3-df37-7a214736794b@linaro.org>
In-Reply-To: <87ee7c9coo.fsf@oldenburg.str.redhat.com>



On 19/11/2021 16:18, Florian Weimer via Libc-alpha wrote:
> * Jonathon Anderson:
> 
>>>> Right now, we
>>>> only require the program headers which we can obtain from
>>>> getauxval(AT_PHDR), however this technique has questionable
>>>> portability and robustness (getauxval returns an unsigned long, not a
>>>> pointer).
> 
>>> A glibc port to an architecture where a long value cannot hold all
>>> pointer values will have to provide an alternative interface similar to
>>> getauxval, but that returns pointer values.
> 
>> I would go one step farther and say getauxval is already broken for
>> any 64-bit architecture, unsigned long is only required to support 32
>> bits as per the C standards. One of my greater fears is that some
>> exotic compiler will cleverly allocate only 4 bytes of stack space for
>> the return value, and we wouldn't know except for a subtle bug
>> (dependent on optimization flags!) that crashes our entire tool with
>> SEGVs in the auditor (where GDB doesn't give properly unwound call
>> stacks).
> 
> If ported to such an architecture, glibc would need several changes to
> accommodate this.  Newer architectures take this into account and do not
> do funny things.  But Morello, as a capabilities-based architecture,
> does not have this luxury, so they have to do something about this
> interface.  But that is (still) an outlier.
> 
> I think the important point is that glibc interfaces do not need to be
> fully API-compatible with future architecture requirements.  We can
> change APIs for future ports.
> 
>>> Of course that's not the
>>> only interface with this problem (ElfW(Addr) is an integer as well).
> 
>> AFAICT ElfW(Addr) is fine, it should always be an integer large enough
>> to store a pointer on the host architecture (i.e. a uintptr_t). Unless
>> I missed some specific arch where this doesn't work out to be the
>> case?
> 
> Morello and other capabilities-based architectures.  Pointers need to
> be pointers there.  Weird architectures do not have uintptr_t.
> 
>>> makes the Morello glibc port quite interesting.  So I think *something*
>>> like getauxval (AT_PHDR) will always be available, with pretty much
>>> identical semantics.
>>>
>>>> From an outside perspective the current l_addr semantic is fairly
>>>> undocumented, the dladdr and dlinfo man pages define it vaguely as
>>>> the "difference between the address in the ELF file and the address
>>>> in memory." That sounds (to me at least) like l_addr should point to
>>>> byte 0 in the file (the ELF header), and that seems to be correct in
>>>> all but the non-PIE case.
>>> I have struggled with this in the past.  I agree that it is confusing.
>>> l_addr is the offset between virtual addresses in the program header of
>>> the ELF object and the actual addresses in the process image.  This
>>> offset happens to be 0 for ET_EXEC objects, and only there.
> 
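To make the l_addr description above concrete, here is a minimal sketch in C. It assumes glibc's dl_iterate_phdr and relies on dlpi_addr carrying the same offset as l_addr; the names lookup, find_object and load_bias_for_address are made up for illustration. The runtime address of a segment is the load bias plus the p_vaddr recorded in its program header.

```c
#define _GNU_SOURCE
#include <link.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper state for the lookup below.  */
struct lookup
{
  uintptr_t target;      /* Runtime address to find.  */
  ElfW(Addr) load_bias;  /* dlpi_addr of the containing object.  */
  int found;
};

static int
find_object (struct dl_phdr_info *info, size_t size, void *data)
{
  struct lookup *l = data;
  (void) size;
  for (ElfW(Half) i = 0; i < info->dlpi_phnum; i++)
    {
      const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
      if (ph->p_type != PT_LOAD)
        continue;
      /* Runtime address = load bias + link-time virtual address.  */
      uintptr_t start = (uintptr_t) (info->dlpi_addr + ph->p_vaddr);
      if (l->target >= start && l->target - start < ph->p_memsz)
        {
          l->load_bias = info->dlpi_addr;
          l->found = 1;
          return 1;  /* A nonzero return stops the iteration.  */
        }
    }
  return 0;
}

/* Returns 1 and stores the load bias in *BIAS if ADDR lies inside a
   PT_LOAD segment of an object in the caller's namespace, else 0.  */
static int
load_bias_for_address (const void *addr, ElfW(Addr) *bias)
{
  struct lookup l = { (uintptr_t) addr, 0, 0 };
  dl_iterate_phdr (find_object, &l);
  if (l.found)
    *bias = l.load_bias;
  return l.found;
}
```

Calling load_bias_for_address with the address of a global in a non-PIE main program should report a bias of 0, matching the ET_EXEC case described above. Like dl_iterate_phdr itself, the sketch only sees objects in the caller's namespace.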
>> This is a much clearer description of the semantic, it would be very
>> helpful if the man pages used that sentence (or one like it) wherever the
>> l_addr value is exposed in the API (link_map->l_addr and
>> dl_phdr_info->dlpi_addr). It would also be very helpful if 
>> Dl_info.dli_fbase was clearly documented as *not* l_addr but instead
>> byte 0/ELF header in the process image.
> 
> I've made a note to update the manual pages.
> 
>> That does sound like the "correct" way out, but dl_iterate_phdr
>> operates on the caller's namespace so one would need to inject a shim
>> library to do the actual call.
> 
> Ugh, you are right.  It means we currently can't unwind across dlmopen
> boundaries.
> 
> So please use getauxval (AT_PHDR) for now.  It is fully portable across
> all present glibc targets.
> 
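For reference, a minimal sketch of the getauxval (AT_PHDR) approach in C. It assumes that unsigned long is wide enough to hold a pointer, which the message above says is the case on all present glibc targets; the function name main_program_phdrs is made up for illustration.

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* Returns the main program's program header table as published in the
   auxiliary vector and stores the entry count in *PHNUM.  Returns NULL
   (leaving *PHNUM untouched) if AT_PHDR is not present.  */
static const ElfW(Phdr) *
main_program_phdrs (size_t *phnum)
{
  unsigned long phdr = getauxval (AT_PHDR);
  if (phdr == 0)
    return NULL;
  *phnum = (size_t) getauxval (AT_PHNUM);
  /* Assumption: the integer-to-pointer conversion is lossless.  This
     would not hold on a capabilities-based port such as Morello.  */
  return (const ElfW(Phdr) *) (uintptr_t) phdr;
}
```

The returned table is already a runtime address, so no l_addr adjustment is needed to read the headers themselves; the p_vaddr fields inside them still need the load bias applied.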
>>>> dladdr gets its value from link_map->l_map_start instead of l_addr,
>>>> so the semantic we want is already present in a private field. It
>>>> seems to me these two fields could be swapped with little issue, if
>>>> altering the public semantic is not acceptable we could also be sated
>>>> if l_map_start was made public.
>>> Applications which know about the current semantics of l_addr will
>>> break, though.  l_addr is also exposed to debuggers via the _r_debug
>>> interface.  I really do not think we can make changes to l_addr.
>>> We have a similar issue around l_name being "" for the main program, and
>>> unfortunately I will have to argue quite strongly against changing that.
> 
>> Is adding new public fields completely off the table?
> 
> To struct link_map?  We could probably pull it off, but it would be
> years until such a change will be in the hands of the users.  There is
> an internal structure that overlaps with the public struct link_map,
> and some applications poke at the private bits at fixed offsets.
> 
> We've started not to strip ld.so downstream, so that these applications
> can switch to DWARF data to avoid dependencies on fixed offsets, but
> that has been a very recent change.
> 
>> If I can humor the impossible for a few moments longer, I personally
>> have a difficult time believing that anyone actually uses 
>> link_map->l_addr or link_map->l_name in a way that would break by
>> changing their semantics for the main executable:
> 
>>  - The documentation hasn't improved for years so there can't be many
>> users that care about (or even noticed) this case in particular.
> 
> "" for the main executable is widely known.  Usually code uses it to
> implement a fallback on argv[0] or /proc/self/exe, though.

There is still the issue that the audit interface does not have direct
access to the audited process's argv[0], and '/proc' might not be
accessible either.  I am still not convinced that providing argv[0] as
l_name for the main executable is worse than "", especially because the
fallback might not work.
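
As an illustration of the fallback pattern under discussion, here is a rough sketch in C. The function name object_path and its parameters are hypothetical, and the sketch assumes the caller has some fallback string (such as argv[0]) available, which is exactly what an auditor may lack.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Resolves an l_name-style string to a path.  A non-empty NAME is
   returned unchanged.  An empty NAME denotes the main program, so try
   /proc/self/exe first (stored in BUF, silently truncated to LEN - 1
   bytes) and then the caller-supplied FALLBACK.  */
static const char *
object_path (const char *name, const char *fallback, char *buf, size_t len)
{
  if (name != NULL && name[0] != '\0')
    return name;
  if (len > 1)
    {
      ssize_t n = readlink ("/proc/self/exe", buf, len - 1);
      if (n > 0)
        {
          buf[n] = '\0';  /* readlink does not NUL-terminate.  */
          return buf;
        }
    }
  /* May be NULL: /proc can be unavailable and the caller may not have
     argv[0], so the fallback itself can fail.  */
  return fallback;
}
```

If both /proc and the fallback string are unavailable, the function returns NULL and the caller is left with no name at all for the main executable.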

> 
> Changing l_addr will break the libgcc unwinder.  It uses l_addr to
> relocate the program header (see the code I quoted previously).  Not
> everyone uses the platform unwinder, and the libgcc unwinder is
> sometimes linked statically.  This is different from the l_name change:
> The l_addr would definitely cause widespread breakage.
> 
>>  - Every use case I can think of for obtaining a link_map from the dl*
>> functions (dlinfo and dladdr1) will either already have the special 
>> handling, or won't operate on the main executable, or likely won't opt
>> to use l_addr (vs. dlsym or dli_fbase) or l_name (vs. dli_fname).
> 
> Some special-case the main executable based on l_name, I expect, which
> is why I'm so reluctant to change l_name.  The GDB comment is actually
> hinting strongly towards a "" convention (that Solaris broke).

So I take it that Solaris does provide the application name in l_name?
And what kind of breakage did that cause in GDB?
