unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] elf: Fix TLS modid reuse generation assignment
@ 2023-11-28  6:23 Hector Martin
  2023-11-28 10:53 ` Szabolcs Nagy
  0 siblings, 1 reply; 5+ messages in thread
From: Hector Martin @ 2023-11-28  6:23 UTC (permalink / raw
  To: libc-alpha

_dl_assign_tls_modid() assigns a slotinfo entry for a new module, but does
*not* do anything to the generation counter. The first time this
happens, the generation is zero and map_generation() returns the current
generation to be used during relocation processing. However, if a
slotinfo entry is later reused, it will already have a generation
assigned. If this generation has fallen behind the current global max
generation, then this causes an obsolete generation to be assigned
during relocation processing, as map_generation() returns this
generation if nonzero. _dl_add_to_slotinfo() eventually resets the
generation, but by then it is too late. This causes DTV updates to be
skipped, leading to NULL or broken TLS slot pointers and segfaults.

Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
so it behaves the same as the first time a slot is assigned.
_dl_add_to_slotinfo() will still assign the correct static generation
later during module load, but relocation processing will no longer use
an obsolete generation.

Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29039
---
 elf/dl-tls.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index c192b5a13a94..70446e71a8c4 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -154,6 +154,7 @@ _dl_assign_tls_modid (struct link_map *l)
 	      {
 		/* Mark the entry as used, so any dependency see it.  */
 		atomic_store_relaxed (&runp->slotinfo[result - disp].map, l);
+		atomic_store_relaxed (&runp->slotinfo[result - disp].gen, 0);
 		break;
 	      }
 

---
base-commit: 78ca44da0160a0b442f0ca1f253e3360f044b2ec
change-id: 20231128-tls-modid-reuse-0a7a903a1f7e

Best regards,
-- 
Hector Martin <marcan@marcan.st>


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] elf: Fix TLS modid reuse generation assignment
  2023-11-28  6:23 [PATCH] elf: Fix TLS modid reuse generation assignment Hector Martin
@ 2023-11-28 10:53 ` Szabolcs Nagy
  2023-11-28 12:34   ` Adhemerval Zanella Netto
  2023-11-28 14:04   ` Hector Martin
  0 siblings, 2 replies; 5+ messages in thread
From: Szabolcs Nagy @ 2023-11-28 10:53 UTC (permalink / raw
  To: Hector Martin, libc-alpha

The 11/28/2023 15:23, Hector Martin wrote:
> _dl_assign_tls_modid() assigns a slotinfo entry for a new module, but does
> *not* do anything to the generation counter. The first time this
> happens, the generation is zero and map_generation() returns the current
> generation to be used during relocation processing. However, if a
> slotinfo entry is later reused, it will already have a generation
> assigned. If this generation has fallen behind the current global max
> generation, then this causes an obsolete generation to be assigned
> during relocation processing, as map_generation() returns this
> generation if nonzero. _dl_add_to_slotinfo() eventually resets the
> generation, but by then it is too late. This causes DTV updates to be
> skipped, leading to NULL or broken TLS slot pointers and segfaults.
> 
> Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
> so it behaves the same as the first time a slot is assigned.
> _dl_add_to_slotinfo() will still assign the correct static generation
> later during module load, but relocation processing will no longer use
> an obsolete generation.
> 
> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29039

Thanks, this looks good to me.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

i'd note that only TLSDESC relocation processing is affected
and in practice modid reuse happens after a dlclose.

we usually only mention the bug number in the commit message
not the bugzilla url.

i can commit your patch with these changes if that's ok with you.


i think the bug is present since

  commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
  Author:     Szabolcs Nagy <szabolcs.nagy@arm.com>
  CommitDate: 2021-05-11 17:16:37 +0100

  elf: Fix DTV gap reuse logic [BZ #27135]

which got reverted and an updated version committed

  commit ba33937be210da5d07f7f01709323743f66011ce
  Author:     Adhemerval Zanella <adhemerval.zanella@linaro.org>
  CommitDate: 2021-07-14 15:10:27 -0300

  elf: Fix DTV gap reuse logic (BZ #27135)

which is part of the 2.34 release, so we will have to
backport up to that version.

before that, modid reuse was hard to trigger (only dlopen
failure would do it and the failure would have to happen
after gen counter assignment for it to be problematic).

> ---
>  elf/dl-tls.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index c192b5a13a94..70446e71a8c4 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -154,6 +154,7 @@ _dl_assign_tls_modid (struct link_map *l)
>  	      {
>  		/* Mark the entry as used, so any dependency see it.  */
>  		atomic_store_relaxed (&runp->slotinfo[result - disp].map, l);
> +		atomic_store_relaxed (&runp->slotinfo[result - disp].gen, 0);
>  		break;
>  	      }
>  
> 
> ---
> base-commit: 78ca44da0160a0b442f0ca1f253e3360f044b2ec
> change-id: 20231128-tls-modid-reuse-0a7a903a1f7e
> 
> Best regards,
> -- 
> Hector Martin <marcan@marcan.st>
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] elf: Fix TLS modid reuse generation assignment
  2023-11-28 10:53 ` Szabolcs Nagy
@ 2023-11-28 12:34   ` Adhemerval Zanella Netto
  2023-11-28 16:43     ` Szabolcs Nagy
  2023-11-28 14:04   ` Hector Martin
  1 sibling, 1 reply; 5+ messages in thread
From: Adhemerval Zanella Netto @ 2023-11-28 12:34 UTC (permalink / raw
  To: Szabolcs Nagy, Hector Martin, libc-alpha



On 28/11/23 07:53, Szabolcs Nagy wrote:
> The 11/28/2023 15:23, Hector Martin wrote:
>> _dl_assign_tls_modid() assigns a slotinfo entry for a new module, but does
>> *not* do anything to the generation counter. The first time this
>> happens, the generation is zero and map_generation() returns the current
>> generation to be used during relocation processing. However, if a
>> slotinfo entry is later reused, it will already have a generation
>> assigned. If this generation has fallen behind the current global max
>> generation, then this causes an obsolete generation to be assigned
>> during relocation processing, as map_generation() returns this
>> generation if nonzero. _dl_add_to_slotinfo() eventually resets the
>> generation, but by then it is too late. This causes DTV updates to be
>> skipped, leading to NULL or broken TLS slot pointers and segfaults.
>>
>> Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
>> so it behaves the same as the first time a slot is assigned.
>> _dl_add_to_slotinfo() will still assign the correct static generation
>> later during module load, but relocation processing will no longer use
>> an obsolete generation.
>>
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29039
> 
> Thanks, this looks good to me.
> 
> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> 
> i'd note that only TLSDESC relocation processing is affected
> and in practice modid reuse happens after a dlclose.
> 
> we usually only mention the bug number in the commit message
> not the bugzilla url.
> 
> i can commit your patch with these changes if that's ok with you.
> 
> 
> i think the bug is present since
> 
>   commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
>   Author:     Szabolcs Nagy <szabolcs.nagy@arm.com>
>   CommitDate: 2021-05-11 17:16:37 +0100
> 
>   elf: Fix DTV gap reuse logic [BZ #27135]
> 
> which got reverted and an updated version committed
> 
>   commit ba33937be210da5d07f7f01709323743f66011ce
>   Author:     Adhemerval Zanella <adhemerval.zanella@linaro.org>
>   CommitDate: 2021-07-14 15:10:27 -0300
> 
>   elf: Fix DTV gap reuse logic (BZ #27135)
> 
> which is part of the 2.34 release, so we will have to
> backport up to that version.
> 
> before that, modid reuse was hard to trigger (only dlopen
> failure would do it and the failure would have to happen
> after gen counter assignment for it to be problematic).

Would be possible to create a regression testcase for this issue? From RH
bug report [1], it seems to describe a somewhat feasible scenario.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=2251557

> 
>> ---
>>  elf/dl-tls.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
>> index c192b5a13a94..70446e71a8c4 100644
>> --- a/elf/dl-tls.c
>> +++ b/elf/dl-tls.c
>> @@ -154,6 +154,7 @@ _dl_assign_tls_modid (struct link_map *l)
>>  	      {
>>  		/* Mark the entry as used, so any dependency see it.  */
>>  		atomic_store_relaxed (&runp->slotinfo[result - disp].map, l);
>> +		atomic_store_relaxed (&runp->slotinfo[result - disp].gen, 0);
>>  		break;
>>  	      }
>>  
>>
>> ---
>> base-commit: 78ca44da0160a0b442f0ca1f253e3360f044b2ec
>> change-id: 20231128-tls-modid-reuse-0a7a903a1f7e
>>
>> Best regards,
>> -- 
>> Hector Martin <marcan@marcan.st>
>>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] elf: Fix TLS modid reuse generation assignment
  2023-11-28 10:53 ` Szabolcs Nagy
  2023-11-28 12:34   ` Adhemerval Zanella Netto
@ 2023-11-28 14:04   ` Hector Martin
  1 sibling, 0 replies; 5+ messages in thread
From: Hector Martin @ 2023-11-28 14:04 UTC (permalink / raw
  To: Szabolcs Nagy, libc-alpha



On 2023/11/28 19:53, Szabolcs Nagy wrote:
> The 11/28/2023 15:23, Hector Martin wrote:
>> _dl_assign_tls_modid() assigns a slotinfo entry for a new module, but does
>> *not* do anything to the generation counter. The first time this
>> happens, the generation is zero and map_generation() returns the current
>> generation to be used during relocation processing. However, if a
>> slotinfo entry is later reused, it will already have a generation
>> assigned. If this generation has fallen behind the current global max
>> generation, then this causes an obsolete generation to be assigned
>> during relocation processing, as map_generation() returns this
>> generation if nonzero. _dl_add_to_slotinfo() eventually resets the
>> generation, but by then it is too late. This causes DTV updates to be
>> skipped, leading to NULL or broken TLS slot pointers and segfaults.
>>
>> Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
>> so it behaves the same as the first time a slot is assigned.
>> _dl_add_to_slotinfo() will still assign the correct static generation
>> later during module load, but relocation processing will no longer use
>> an obsolete generation.
>>
>> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29039
> 
> Thanks, this looks good to me.
> 
> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> 
> i'd note that only TLSDESC relocation processing is affected
> and in practice modid reuse happens after a dlclose.
> 
> we usually only mention the bug number in the commit message
> not the bugzilla url.
> 
> i can commit your patch with these changes if that's ok with you.

Works for me, thanks :)

> 
> 
> i think the bug is present since
> 
>   commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
>   Author:     Szabolcs Nagy <szabolcs.nagy@arm.com>
>   CommitDate: 2021-05-11 17:16:37 +0100
> 
>   elf: Fix DTV gap reuse logic [BZ #27135]
> 
> which got reverted and an updated version committed
> 
>   commit ba33937be210da5d07f7f01709323743f66011ce
>   Author:     Adhemerval Zanella <adhemerval.zanella@linaro.org>
>   CommitDate: 2021-07-14 15:10:27 -0300
> 
>   elf: Fix DTV gap reuse logic (BZ #27135)
> 
> which is part of the 2.34 release, so we will have to
> backport up to that version.
> 
> before that, modid reuse was hard to trigger (only dlopen
> failure would do it and the failure would have to happen
> after gen counter assignment for it to be problematic).
> 
>> ---
>>  elf/dl-tls.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
>> index c192b5a13a94..70446e71a8c4 100644
>> --- a/elf/dl-tls.c
>> +++ b/elf/dl-tls.c
>> @@ -154,6 +154,7 @@ _dl_assign_tls_modid (struct link_map *l)
>>  	      {
>>  		/* Mark the entry as used, so any dependency see it.  */
>>  		atomic_store_relaxed (&runp->slotinfo[result - disp].map, l);
>> +		atomic_store_relaxed (&runp->slotinfo[result - disp].gen, 0);
>>  		break;
>>  	      }
>>  
>>
>> ---
>> base-commit: 78ca44da0160a0b442f0ca1f253e3360f044b2ec
>> change-id: 20231128-tls-modid-reuse-0a7a903a1f7e
>>
>> Best regards,
>> -- 
>> Hector Martin <marcan@marcan.st>
>>
> 

- Hector

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] elf: Fix TLS modid reuse generation assignment
  2023-11-28 12:34   ` Adhemerval Zanella Netto
@ 2023-11-28 16:43     ` Szabolcs Nagy
  0 siblings, 0 replies; 5+ messages in thread
From: Szabolcs Nagy @ 2023-11-28 16:43 UTC (permalink / raw
  To: Adhemerval Zanella Netto, Hector Martin, libc-alpha

The 11/28/2023 09:34, Adhemerval Zanella Netto wrote:
> 
> 
> On 28/11/23 07:53, Szabolcs Nagy wrote:
> > The 11/28/2023 15:23, Hector Martin wrote:
> >> _dl_assign_tls_modid() assigns a slotinfo entry for a new module, but does
> >> *not* do anything to the generation counter. The first time this
> >> happens, the generation is zero and map_generation() returns the current
> >> generation to be used during relocation processing. However, if a
> >> slotinfo entry is later reused, it will already have a generation
> >> assigned. If this generation has fallen behind the current global max
> >> generation, then this causes an obsolete generation to be assigned
> >> during relocation processing, as map_generation() returns this
> >> generation if nonzero. _dl_add_to_slotinfo() eventually resets the
> >> generation, but by then it is too late. This causes DTV updates to be
> >> skipped, leading to NULL or broken TLS slot pointers and segfaults.
> >>
> >> Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
> >> so it behaves the same as the first time a slot is assigned.
> >> _dl_add_to_slotinfo() will still assign the correct static generation
> >> later during module load, but relocation processing will no longer use
> >> an obsolete generation.
> >>
> >> Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=29039
> > 
> > Thanks, this looks good to me.
> > 
> > Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
> > 
> > i'd note that only TLSDESC relocation processing is affected
> > and in practice modid reuse happens after a dlclose.
> > 
> > we usually only mention the bug number in the commit message
> > not the bugzilla url.
> > 
> > i can commit your patch with these changes if that's ok with you.
> > 
> > 
> > i think the bug is present since
> > 
> >   commit 572bd547d57a39b6cf0ea072545dc4048921f4c3
> >   Author:     Szabolcs Nagy <szabolcs.nagy@arm.com>
> >   CommitDate: 2021-05-11 17:16:37 +0100
> > 
> >   elf: Fix DTV gap reuse logic [BZ #27135]
> > 
> > which got reverted and an updated version committed
> > 
> >   commit ba33937be210da5d07f7f01709323743f66011ce
> >   Author:     Adhemerval Zanella <adhemerval.zanella@linaro.org>
> >   CommitDate: 2021-07-14 15:10:27 -0300
> > 
> >   elf: Fix DTV gap reuse logic (BZ #27135)
> > 
> > which is part of the 2.34 release, so we will have to
> > backport up to that version.
> > 
> > before that, modid reuse was hard to trigger (only dlopen
> > failure would do it and the failure would have to happen
> > after gen counter assignment for it to be problematic).
> 
> Would be possible to create a regression testcase for this issue? From RH
> bug report [1], it seems to describe a somewhat feasible scenario.
> 
> [1] https://bugzilla.redhat.com/show_bug.cgi?id=2251557


note that the bug only happens if the code goes through
_dl_make_tlsdesc_dynamic (and later accesses the tls object
via _dl_tlsdesc_dynamic, not dlsym or __tls_get_addr)

because tlsdesc is optimized to use static tls, it should
not happen up to 512 byte tls (so naive test code won't
reproduce the bug).

(it also requires specific conditions to get an out of date
gen count in _dl_make_tlsdesc_dynamic and get a thread dtv
where that causes trouble.)

however this means we can recommend large surplus tls as
a workaround, e.g.:

GLIBC_TUNABLES=glibc.rtld.optional_static_tls=4000

which is useful anyway: dynamic tlsdesc fast path is slower
than the fixed offset tlsdesc access, so it makes tls access
faster too.

i will add this to the commit message and can look at adding
a regression test later.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-11-28 16:43 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-11-28  6:23 [PATCH] elf: Fix TLS modid reuse generation assignment Hector Martin
2023-11-28 10:53 ` Szabolcs Nagy
2023-11-28 12:34   ` Adhemerval Zanella Netto
2023-11-28 16:43     ` Szabolcs Nagy
2023-11-28 14:04   ` Hector Martin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).