From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
To: Florian Weimer <fweimer@redhat.com>,
Jonathon Anderson <janderson@rice.edu>
Cc: John Mellor-Crummey <johnmc@rice.edu>,
libc-alpha@sourceware.org, "Mark W. Krentel" <krentel@rice.edu>,
Xiaozhu Meng <xm13@rice.edu>
Subject: Re: Fwd: [PATCH v5 00/22] Some rtld-audit fixes
Date: Fri, 19 Nov 2021 16:56:25 -0300 [thread overview]
Message-ID: <d9b3a9a5-4808-ddc3-df37-7a214736794b@linaro.org> (raw)
In-Reply-To: <87ee7c9coo.fsf@oldenburg.str.redhat.com>
On 19/11/2021 16:18, Florian Weimer via Libc-alpha wrote:
> * Jonathon Anderson:
>
>>>> Right now, we
>>>> only require the program headers which we can obtain from
>>>> getauxval(AT_PHDR), however this technique has questionable
>>>> portability and robustness (getauxval returns an unsigned long, not a
>>>> pointer).
>
>>> A glibc port to an architecture where a long value cannot hold all
>>> pointer values will have to provide an alternative interface similar to
>>> getauxval, but that returns pointer values.
>
>> I would go one step farther and say getauxval is already broken for
>> any 64-bit architecture, unsigned long is only required to support 32
>> bits as per the C standards. One of my greater fears is that some
>> exotic compiler will cleverly allocate only 4 bytes of stack space for
>> the return value, and we wouldn't know except for a subtle bug
>> (dependent on optimization flags!) that crashes our entire tool with
>> SEGVs in the auditor (where GDB doesn't give properly unwound call
>> stacks).
>
> If ported to such an architecture, glibc would need several changes to
> accomodate this. Newer architectures take this into account and do not
> do funny things. But Morello, as a capabilities-based architecture,
> does not have this luxury, so they have to do something about this
> interface. But that is (still) an outlier.
>
> I think the important point is that glibc interfaces do not need to be
> fully API-compatible with future architecture requirements. We can
> change APIs for future ports.
>
>>> Of course that's not the
>>> only interface with this problem (ElfW(Addr) is an integer as well).
>
>> AFAICT ElfW(Addr) is fine, it should always be an integer large enough
>> to store a pointer on the host architecture (i.e. a uintptr_t). Unless
>> I missed some specific arch where this doesn't work out to be the
>> case?
>
> Morello and other capabilities-based architectures. Pointers need to
> pointers there. Weird architectures do not have uintptr_t.
>
>>> makes the Morello glibc port quite interesting. So I think *something*
>>> like getauxval (AT_PHDR) will always be available, with pretty much
>>> identical semantics.
>>>
>>>> From an outside perspective the current l_addr semantic is fairly
>>>> undocumented, the dladdr and dlinfo man pages define it vaguely as
>>>> the "difference between the address in the ELF file and the address
>>>> in memory." That sounds (to me at least) like l_addr should point to
>>>> byte 0 in the file (the ELF header), and that seems to be correct in
>>>> all but the non-PIE case.
>>> I have struggled with this in the past. I agree that it is confusing.
>>> l_addr is the offset between virtual addresses in the program header of
>>> the ELF object and the actual addresses in the process image. This
>>> offset happens to be 0 for ET_EXEC objects, and only there.
>
>> This is a much clearer description of the semantic, it would be very
>> helpful the man pages used that sentence (or one like it) wherever the
>> l_addr value is exposed in the API (link_map->l_addr and
>> dl_phdr_info->dlpi_addr). It would also be very helpful if
>> Dl_info.dli_fbase was clearly documented as *not* l_addr but instead
>> byte 0/ELF header in the process image.
>
> I've made a note to update the manual pages.
>
>> That does sound like the "correct" way out, but dl_iterate_phdr
>> operates on the caller's namespace so one would need to inject a shim
>> library to do the actual call.
>
> Ugh, you are right. It means we currently can't unwind across dlmopen
> boundaries.
>
> So please use getauxval (AT_PHDR) for now. It is fully portable across
> all present glibc targets.
>
>>>> dladdr gets its value from link_map->l_map_start instead of l_addr,
>>>> so the semantic we want is already present in a private field. It
>>>> seems to me these two fields could be swapped with little issue, if
>>>> altering the public semantic is not acceptable we could also be sated
>>>> if l_map_start was made public.
>>> Applications which know about the current semantics of l_addr will
>>> break, though. l_addr is also exposed to debuggers via the _r_debug
>>> interface. I really do not think we can make changes to l_addr.
>>> We have a similar issue around l_name being "" for the main program, and
>>> unfortuantely I will have to argue quite strongly against changing that.
>
>> Is adding new public fields completely off the table?
>
> To struct link_map? We could probably pull it off, but it would be
> years until such a change will be in the hands of the users. There is
> an internal structure that overlaps with the public struct link_map,
> and some applications poke at the private bits at fixed offsets.
>
> We've started not to strip ld.so downstream, so that these applications
> can switch to DWARF data to avoid dependencies on fixed offsets, but
> that has been a very recent change.
>
>> If I can humor the impossible for a few moments longer, I personally
>> have a difficult time believing that anyone actually uses
>> link_map->l_addr or link_map->l_name in a way that would break by
>> changing their semantics for the main executable:
>
>> - The documentation hasn't improved for years so there can't be many
>> users that care about (or even noticed) this case in particular.
>
> "" for the main executable is widely known. Usually code uses it to
> implement a fallback on argv[0] or /proc/self/exe, though.
There are still the issue where audit interface does not have direct
access to argv[0] from the audited process and '/proc' might also not
be accessible. I am still not convinced that provided argv[0] for
l_name for main executable is worse than "", specially because the
fallback might not work.
>
> Changing l_addr will break the libgcc unwinder. It uses l_addr to
> relocate the program header (see the code I quoted previously). Not
> everyone uses the platform unwinder, and the libgcc unwinder is
> sometimes linked statically. This is different from the l_name change:
> The l_addr would definitely cause widespread breakage.
>
>> - Every use case I can think of for obtaining a link_map from the dl*
>> functions (dlinfo and dladdr1) will either already have the special
>> handling, or won't operate on the main executable, or likely won't opt
>> to use l_addr (vs. dlsym or dli_fbase) or l_name (vs. dli_fname).
>
> Some special-case the main executable based on l_name, I expect, which
> is why I'm so reluctant to change l_name. The GDB comment is actually
> hinting strongly towards a "" convention (that Solaris broke).
So I take that Solaris does provide the application name to l_name? And
what kind of breakage it has done on gdb?
next prev parent reply other threads:[~2021-11-19 19:56 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
[not found] <EA69A62D-7C01-4536-B551-2609226053F2@rice.edu>
2021-11-17 18:08 ` Fwd: [PATCH v5 00/22] Some rtld-audit fixes John Mellor-Crummey via Libc-alpha
2021-11-17 20:42 ` Florian Weimer via Libc-alpha
2021-11-18 21:55 ` Jonathon Anderson via Libc-alpha
2021-11-19 19:18 ` Florian Weimer via Libc-alpha
2021-11-19 19:56 ` Adhemerval Zanella via Libc-alpha [this message]
2021-11-19 20:31 ` Florian Weimer via Libc-alpha
2021-11-23 16:36 ` Adhemerval Zanella via Libc-alpha
2021-11-22 17:46 jma14 via Libc-alpha
2021-11-23 13:58 ` Adhemerval Zanella via Libc-alpha
2021-11-23 14:02 ` Florian Weimer via Libc-alpha
2021-11-23 16:25 ` Adhemerval Zanella via Libc-alpha
2021-11-23 16:50 ` Florian Weimer via Libc-alpha
2021-11-23 21:13 ` Jonathon Anderson via Libc-alpha
2021-11-25 17:56 ` Adhemerval Zanella via Libc-alpha
-- strict thread matches above, loose matches on Subject: below --
2021-11-22 17:46 jma14 via Libc-alpha
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/libc/involved.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d9b3a9a5-4808-ddc3-df37-7a214736794b@linaro.org \
--to=libc-alpha@sourceware.org \
--cc=adhemerval.zanella@linaro.org \
--cc=fweimer@redhat.com \
--cc=janderson@rice.edu \
--cc=johnmc@rice.edu \
--cc=krentel@rice.edu \
--cc=xm13@rice.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).