unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
@ 2020-05-01 20:32 Paul E. Murphy via Libc-alpha
  2020-05-04 12:26 ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. Murphy via Libc-alpha @ 2020-05-01 20:32 UTC (permalink / raw
  To: libc-alpha

strcmp is used while resolving PLT references.  Vector registers
should not be used during this.  The P9 strcmp makes heavy use of
vector registers, so it should be avoided in rtld.

This prevents quiet vector register corruption when glibc is configured
with --disable-multi-arch and --with-cpu=power9.  This can be seen with
test-float64x-compat_totalordermag during the first call into
totalordermagf64x@GLIBC_2.27.

Add a guard to fallback to the power8 implementation when building
power9 strcmp for libraries other than libc.
---
 sysdeps/powerpc/powerpc64/le/power9/strcmp.S | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/sysdeps/powerpc/powerpc64/le/power9/strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/strcmp.S
index 412a13599d..b29ea7c020 100644
--- a/sysdeps/powerpc/powerpc64/le/power9/strcmp.S
+++ b/sysdeps/powerpc/powerpc64/le/power9/strcmp.S
@@ -17,6 +17,11 @@
    <https://www.gnu.org/licenses/>.  */
 #include <sysdep.h>
 
+#if !IS_IN(libc)
+/* Fallback to P8 which does not use vector regs for rtld.  */
+#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>
+#else
+
 #ifndef STRCMP
 # define STRCMP strcmp
 #endif
@@ -262,3 +267,5 @@ L(pagecross_nullfound):
 	b	L(pagecross_retdiff)
 END (STRCMP)
 libc_hidden_builtin_def (strcmp)
+
+#endif
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
  2020-05-01 20:32 [PATCH] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905] Paul E. Murphy via Libc-alpha
@ 2020-05-04 12:26 ` Adhemerval Zanella via Libc-alpha
  2020-05-04 15:25   ` [PATCHv2] " Paul E. Murphy via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-04 12:26 UTC (permalink / raw
  To: libc-alpha



On 01/05/2020 17:32, Paul E. Murphy via Libc-alpha wrote:
> strcmp is used while resolving PLT references.  Vector registers
> should not be used during this.  The P9 strcmp makes heavy use of
> vector registers, so it should be avoided in rtld.
> 
> This prevents quiet vector register corruption when glibc is configured
> with --disable-multi-arch and --with-cpu=power9.  This can be seen with
> test-float64x-compat_totalordermag during the first call into
> totalordermagf64x@GLIBC_2.27.
> 
> Add a guard to fallback to the power8 implementation when building
> power9 strcmp for libraries other than libc.

The more usual way and it has the advantages of decouple the generic
implementation and make it more explicit is to add a new rtld-strcmp.S
file:

$ cat sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>


> ---
>  sysdeps/powerpc/powerpc64/le/power9/strcmp.S | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/strcmp.S
> index 412a13599d..b29ea7c020 100644
> --- a/sysdeps/powerpc/powerpc64/le/power9/strcmp.S
> +++ b/sysdeps/powerpc/powerpc64/le/power9/strcmp.S
> @@ -17,6 +17,11 @@
>     <https://www.gnu.org/licenses/>.  */
>  #include <sysdep.h>
>  
> +#if !IS_IN(libc)
> +/* Fallback to P8 which does not use vector regs for rtld.  */
> +#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>
> +#else
> +
>  #ifndef STRCMP
>  # define STRCMP strcmp
>  #endif
> @@ -262,3 +267,5 @@ L(pagecross_nullfound):
>  	b	L(pagecross_retdiff)
>  END (STRCMP)
>  libc_hidden_builtin_def (strcmp)
> +
> +#endif
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
  2020-05-04 12:26 ` Adhemerval Zanella via Libc-alpha
@ 2020-05-04 15:25   ` Paul E. Murphy via Libc-alpha
  2020-05-04 16:36     ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E. Murphy via Libc-alpha @ 2020-05-04 15:25 UTC (permalink / raw
  To: libc-alpha, adhemerval.zanella

Use rtld-strcpy.S file in power9 to redirect to power8 implementation
instead.  This builds the power8 strcpy rtld with --with-cpu=power9 and
--disable-multi-arch.  The existing behavior is unchanged when
multi-arch is enabled.

---8<---
strcmp is used while resolving PLT references.  Vector registers
should not be used during this.  The P9 strcmp makes heavy use of
vector registers, so it should be avoided in rtld.

This prevents quiet vector register corruption when glibc is configured
with --disable-multi-arch and --with-cpu=power9.  This can be seen with
test-float64x-compat_totalordermag during the first call into
totalordermagf64x@GLIBC_2.27.

Add a guard to fallback to the power8 implementation when building
power9 strcmp for libraries other than libc.
---
 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S

diff --git a/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
new file mode 100644
index 0000000000..afdb492b3d
--- /dev/null
+++ b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
@@ -0,0 +1,2 @@
+/* Fallback to P8 which does not use vector regs for rtld.  */
+#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>
-- 
2.21.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
  2020-05-04 15:25   ` [PATCHv2] " Paul E. Murphy via Libc-alpha
@ 2020-05-04 16:36     ` Adhemerval Zanella via Libc-alpha
  2020-05-04 18:02       ` Paul E Murphy via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-04 16:36 UTC (permalink / raw
  To: Paul E. Murphy, libc-alpha



On 04/05/2020 12:25, Paul E. Murphy wrote:
> Use rtld-strcpy.S file in power9 to redirect to power8 implementation
> instead.  This builds the power8 strcpy rtld with --with-cpu=power9 and

s/strcpy/strcmp.

> --disable-multi-arch.  The existing behavior is unchanged when
> multi-arch is enabled.

LGTM thanks.

As a side note, is strcmp the only string routine that used in lazy
resolution?

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

> 
> ---8<---
> strcmp is used while resolving PLT references.  Vector registers
> should not be used during this.  The P9 strcmp makes heavy use of
> vector registers, so it should be avoided in rtld.
> 
> This prevents quiet vector register corruption when glibc is configured
> with --disable-multi-arch and --with-cpu=power9.  This can be seen with
> test-float64x-compat_totalordermag during the first call into
> totalordermagf64x@GLIBC_2.27.
> 
> Add a guard to fallback to the power8 implementation when building
> power9 strcmp for libraries other than libc.
> ---
>  sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S | 2 ++
>  1 file changed, 2 insertions(+)
>  create mode 100644 sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
> 
> diff --git a/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
> new file mode 100644
> index 0000000000..afdb492b3d
> --- /dev/null
> +++ b/sysdeps/powerpc/powerpc64/le/power9/rtld-strcmp.S
> @@ -0,0 +1,2 @@
> +/* Fallback to P8 which does not use vector regs for rtld.  */
> +#include <sysdeps/powerpc/powerpc64/power8/strcmp.S>
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
  2020-05-04 16:36     ` Adhemerval Zanella via Libc-alpha
@ 2020-05-04 18:02       ` Paul E Murphy via Libc-alpha
  2020-05-04 18:26         ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E Murphy via Libc-alpha @ 2020-05-04 18:02 UTC (permalink / raw
  To: Adhemerval Zanella, Paul E. Murphy, libc-alpha



On 5/4/20 11:36 AM, Adhemerval Zanella wrote:

> As a side note, is strcmp the only string routine that used in lazy
> resolution?

It was the only one I encountered while debugging this issue.  I won't 
claim it is the only one, I have limited experience with that corner of 
glibc.  Though, I am surprised only one test failed due to the vector 
register clobbering.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
  2020-05-04 18:02       ` Paul E Murphy via Libc-alpha
@ 2020-05-04 18:26         ` Adhemerval Zanella via Libc-alpha
  2020-05-04 21:16           ` Paul E Murphy via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-04 18:26 UTC (permalink / raw
  To: Paul E Murphy, Paul E. Murphy, libc-alpha



On 04/05/2020 15:02, Paul E Murphy wrote:
> 
> 
> On 5/4/20 11:36 AM, Adhemerval Zanella wrote:
> 
>> As a side note, is strcmp the only string routine that used in lazy
>> resolution?
> 
> It was the only one I encountered while debugging this issue.  I won't claim it is the only one, I have limited experience with that corner of glibc.  Though, I am surprised only one test failed due to the vector register clobbering.

I am wondering which would be best way to avoid such issue to happen
with different string routines since some specific loader code need 
to adhere to more strict ABI requirements.

The powerpc Makefile already adds the expected compiler tools to
avoid this very issue:

---
sysdeps/powerpc/powerpc64/Makefile

 13 # These flags prevent FPU or Altivec registers from being used,
 14 # for code called in contexts that is not allowed to touch those registers.
 15 # Stupid GCC requires us to pass all these ridiculous switches.  We need to
 16 # pass the -mno-* switches as well to prevent the compiler from attempting
 17 # to emit altivec or vsx instructions, especially when the registers aren't
 18 # available.
 19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \
 20                                       $(foreach m,2 3 4 5 6 7 8 9, \
 21                                                   3$m 4$m 5$m),\
 22                                     -ffixed-$n)) \
 23                    $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\
 24                                                 $m 1$m 2$m) 30 31,\
 25                                     -ffixed-v$n)) \
 26                    -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx
[...]
 36 CFLAGS-dl-runtime.os = $(no-special-regs)
 37 CFLAGS-dl-lookup.os = $(no-special-regs)
 38 CFLAGS-dl-misc.os = $(no-special-regs)
 39 CFLAGS-rtld-mempcpy.os = $(no-special-regs)
 40 CFLAGS-rtld-memmove.os = $(no-special-regs)
 41 CFLAGS-rtld-memchr.os = $(no-special-regs)
 42 CFLAGS-rtld-strnlen.os = $(no-special-regs)
---

However it does not help when an arch-specific implementation might ended being
pulled in rtld build and thus making this ineffective (And it seems that 
the strcmp is missing as well for the powerpc case).

And using __builtin_strcmp not -ffreestanding does really help on dl-runtime,
since strcmp call might still being generated by compiler.  

Maybe one option could to enforce all rtld code to use generic C implementation
instead, not allowing sysdeps folder to override it.  It should scale better
than actually testing all possible permutation and require less hacky solutions
like overriding the loader object for an specific build option as this patch.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
  2020-05-04 18:26         ` Adhemerval Zanella via Libc-alpha
@ 2020-05-04 21:16           ` Paul E Murphy via Libc-alpha
  2020-05-05 21:01             ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 8+ messages in thread
From: Paul E Murphy via Libc-alpha @ 2020-05-04 21:16 UTC (permalink / raw
  To: Adhemerval Zanella, Paul E. Murphy, libc-alpha



On 5/4/20 1:26 PM, Adhemerval Zanella wrote:
> 
> 
> On 04/05/2020 15:02, Paul E Murphy wrote:
>>
>>
>> On 5/4/20 11:36 AM, Adhemerval Zanella wrote:
>>
>>> As a side note, is strcmp the only string routine that used in lazy
>>> resolution?
>>
>> It was the only one I encountered while debugging this issue.  I won't claim it is the only one, I have limited experience with that corner of glibc.  Though, I am surprised only one test failed due to the vector register clobbering.
> 
> I am wondering which would be best way to avoid such issue to happen
> with different string routines since some specific loader code need
> to adhere to more strict ABI requirements.
> 
> The powerpc Makefile already adds the expected compiler tools to
> avoid this very issue:
> 
> ---
> sysdeps/powerpc/powerpc64/Makefile
> 
>   13 # These flags prevent FPU or Altivec registers from being used,
>   14 # for code called in contexts that is not allowed to touch those registers.
>   15 # Stupid GCC requires us to pass all these ridiculous switches.  We need to
>   16 # pass the -mno-* switches as well to prevent the compiler from attempting
>   17 # to emit altivec or vsx instructions, especially when the registers aren't
>   18 # available.
>   19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \
>   20                                       $(foreach m,2 3 4 5 6 7 8 9, \
>   21                                                   3$m 4$m 5$m),\
>   22                                     -ffixed-$n)) \
>   23                    $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\
>   24                                                 $m 1$m 2$m) 30 31,\
>   25                                     -ffixed-v$n)) \
>   26                    -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx
> [...]
>   36 CFLAGS-dl-runtime.os = $(no-special-regs)
>   37 CFLAGS-dl-lookup.os = $(no-special-regs)
>   38 CFLAGS-dl-misc.os = $(no-special-regs)


>   39 CFLAGS-rtld-mempcpy.os = $(no-special-regs)
>   40 CFLAGS-rtld-memmove.os = $(no-special-regs)
>   41 CFLAGS-rtld-memchr.os = $(no-special-regs)
>   42 CFLAGS-rtld-strnlen.os = $(no-special-regs)

Hrm, all the string functions listed above generate VMX instructions 
when building with --with-cpu=power9 --disable-multi-arch.  I question 
the correctness and completeness of this list as strcmp is missing.

> ---
> 
> However it does not help when an arch-specific implementation might ended being
> pulled in rtld build and thus making this ineffective (And it seems that
> the strcmp is missing as well for the powerpc case).
> 
> And using __builtin_strcmp not -ffreestanding does really help on dl-runtime,
> since strcmp call might still being generated by compiler.
> 
> Maybe one option could to enforce all rtld code to use generic C implementation
> instead, not allowing sysdeps folder to override it.  It should scale better
> than actually testing all possible permutation and require less hacky solutions
> like overriding the loader object for an specific build option as this patch.
> 
I agree with that line of thought.  However, what would the performance 
impact be?  I would hazard to guess it would be limited.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCHv2] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905]
  2020-05-04 21:16           ` Paul E Murphy via Libc-alpha
@ 2020-05-05 21:01             ` Adhemerval Zanella via Libc-alpha
  0 siblings, 0 replies; 8+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-05 21:01 UTC (permalink / raw
  To: Paul E Murphy, Paul E. Murphy, libc-alpha



On 04/05/2020 18:16, Paul E Murphy wrote:
> 
> 
> On 5/4/20 1:26 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/05/2020 15:02, Paul E Murphy wrote:
>>>
>>>
>>> On 5/4/20 11:36 AM, Adhemerval Zanella wrote:
>>>
>>>> As a side note, is strcmp the only string routine that used in lazy
>>>> resolution?
>>>
>>> It was the only one I encountered while debugging this issue.  I won't claim it is the only one, I have limited experience with that corner of glibc.  Though, I am surprised only one test failed due to the vector register clobbering.
>>
>> I am wondering which would be best way to avoid such issue to happen
>> with different string routines since some specific loader code need
>> to adhere to more strict ABI requirements.
>>
>> The powerpc Makefile already adds the expected compiler tools to
>> avoid this very issue:
>>
>> ---
>> sysdeps/powerpc/powerpc64/Makefile
>>
>>   13 # These flags prevent FPU or Altivec registers from being used,
>>   14 # for code called in contexts that is not allowed to touch those registers.
>>   15 # Stupid GCC requires us to pass all these ridiculous switches.  We need to
>>   16 # pass the -mno-* switches as well to prevent the compiler from attempting
>>   17 # to emit altivec or vsx instructions, especially when the registers aren't
>>   18 # available.
>>   19 no-special-regs := $(sort $(foreach n,40 41 50 51 60 61 62 63 \
>>   20                                       $(foreach m,2 3 4 5 6 7 8 9, \
>>   21                                                   3$m 4$m 5$m),\
>>   22                                     -ffixed-$n)) \
>>   23                    $(sort $(foreach n,$(foreach m,0 1 2 3 4 5 6 7 8 9,\
>>   24                                                 $m 1$m 2$m) 30 31,\
>>   25                                     -ffixed-v$n)) \
>>   26                    -ffixed-vrsave -ffixed-vscr -mno-altivec -mno-vsx
>> [...]
>>   36 CFLAGS-dl-runtime.os = $(no-special-regs)
>>   37 CFLAGS-dl-lookup.os = $(no-special-regs)
>>   38 CFLAGS-dl-misc.os = $(no-special-regs)
> 
> 
>>   39 CFLAGS-rtld-mempcpy.os = $(no-special-regs)
>>   40 CFLAGS-rtld-memmove.os = $(no-special-regs)
>>   41 CFLAGS-rtld-memchr.os = $(no-special-regs)
>>   42 CFLAGS-rtld-strnlen.os = $(no-special-regs)
> 
> Hrm, all the string functions listed above generate VMX instructions when building with --with-cpu=power9 --disable-multi-arch.  I question the correctness and completeness of this list as strcmp is missing.

I think that's the problem of having a dissociated rule from the code it actually
uses, since its definition might be unnecessary if the code is refactored or
changed.  I didn't dig into the history of the lazy resolution code, but it
might the case where old implementation did actually used the referenced routines.

What I think would be a better approach is to make no-special-regs a parametric
rule define in the architecture Makefile (which should know which registers
it supports in lazy resolution) and use this rule (no-special-regs) on generic
definition in elf/Makefile.

> 
>> ---
>>
>> However it does not help when an arch-specific implementation might ended being
>> pulled in rtld build and thus making this ineffective (And it seems that
>> the strcmp is missing as well for the powerpc case).
>>
>> And using __builtin_strcmp not -ffreestanding does really help on dl-runtime,
>> since strcmp call might still being generated by compiler.
>>
>> Maybe one option could to enforce all rtld code to use generic C implementation
>> instead, not allowing sysdeps folder to override it.  It should scale better
>> than actually testing all possible permutation and require less hacky solutions
>> like overriding the loader object for an specific build option as this patch.
>>
> I agree with that line of thought.  However, what would the performance impact be?  I would hazard to guess it would be limited.

It would impart mostly the loader code and I would expects mostly in program 
loading (since distros now are moving to make bind now the default build 
option) with large a large number of symbols with large sizes (most likely C++
programs with load of shared libraries).

Using clang default installation from ubuntu-10 (which links to quite large C++
libraries with more than 30k symbols) on x86_64, strcmp consumes about ~3% of 
total runtime. And adding a very crude strcmp profiling using the same strategy
for relocation time from rtld.c at elf/dl-lookup.c it seems strcmp consumes
about 2.2% of total relocation time.

Most of time is in fact spend at _dl_lookup_symbol_x and _dl_relocate_object,
so I am not sure if using the best mem* routines in dynamic loader is indeed
paramount.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2020-05-05 21:01 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2020-05-01 20:32 [PATCH] powerpc64le/power9: guard power9 strcmp against rtld usage [BZ# 25905] Paul E. Murphy via Libc-alpha
2020-05-04 12:26 ` Adhemerval Zanella via Libc-alpha
2020-05-04 15:25   ` [PATCHv2] " Paul E. Murphy via Libc-alpha
2020-05-04 16:36     ` Adhemerval Zanella via Libc-alpha
2020-05-04 18:02       ` Paul E Murphy via Libc-alpha
2020-05-04 18:26         ` Adhemerval Zanella via Libc-alpha
2020-05-04 21:16           ` Paul E Murphy via Libc-alpha
2020-05-05 21:01             ` Adhemerval Zanella via Libc-alpha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).