* V2 [PATCH 0/4] ld.so: Add --list-tunables to print tunable values
@ 2020-09-18 16:07 H.J. Lu via Libc-alpha
  2020-09-18 16:07 ` [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu via Libc-alpha
                   ` (3 more replies)
  0 siblings, 4 replies; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-18 16:07 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

Tunable values and their minimum/maximum values are invisible to users.
This patch set adds --list-tunables to ld.so to print tunable values
together with their minimum and maximum values.  For tunables whose
values and minimum/maximum values are determined at run time,
TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL are added to update a tunable
value together with its minimum and maximum values.
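
For illustration, a minimal sketch of the new short-form macro (the
tunable name and values here are just examples), with TUNABLE_NAMESPACE
defined:

  /* Set glibc.cpu.x86_non_temporal_threshold to VAL and record the
     valid range [MIN, MAX] computed at run time.  */
  TUNABLE_SET_ALL (x86_non_temporal_threshold, long int, val, min, max);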

--list-tunables works on i686 and x86-64.  Please test --list-tunables
on your native processors.  The users/hjl/tunable/master branch at:

https://gitlab.com/x86-glibc/glibc/-/commits/users/hjl/tunable/master

contains the same set of patches.

On x86, to make cache info accessible to --list-tunables, it is moved
into cpu_features in ld.so.  CPU features are initialized by
DL_PLATFORM_INIT in dynamic executables and by ARCH_INIT_CPU_FEATURES
in static executables.  To initialize CPU features and cache info when
ld.so and libc.so are loaded inside a static executable, they are also
initialized via dummy function pointers through IFUNC relocation, like
other fields in _rtld_global_ro in ld.so.

H.J. Lu (4):
  x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  Set tunable value as well as min/max values
  x86: Move x86 processor cache info to cpu_features
  ld.so: Add --list-tunables to print tunable values

 NEWS                               |   2 +
 elf/Makefile                       |   8 +
 elf/dl-tunables.c                  |  53 +-
 elf/dl-tunables.h                  |  20 +-
 elf/rtld.c                         |  31 +-
 manual/README.tunables             |  24 +-
 manual/tunables.texi               |  37 ++
 sysdeps/i386/dl-machine.h          |   3 +-
 sysdeps/x86/cacheinfo.c            | 873 ++-------------------------
 sysdeps/x86/cpu-cacheinfo.c        | 922 +++++++++++++++++++++++++++++
 sysdeps/x86/cpu-features.c         |  25 +-
 sysdeps/x86/dl-get-cpu-features.c  |  25 +-
 sysdeps/x86/include/cpu-features.h |  23 +
 sysdeps/x86_64/dl-machine.h        |   3 +-
 14 files changed, 1198 insertions(+), 851 deletions(-)
 create mode 100644 sysdeps/x86/cpu-cacheinfo.c

-- 
2.26.2



* [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18 16:07 V2 [PATCH 0/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
@ 2020-09-18 16:07 ` H.J. Lu via Libc-alpha
  2020-09-28 13:08   ` Florian Weimer via Libc-alpha
  2020-09-18 16:07 ` [PATCH 2/4] Set tunable value as well as min/max values H.J. Lu via Libc-alpha
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-18 16:07 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by a static executable, DL_PLATFORM_INIT is never called.  Also,
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Since _rtld_global_ro in ld.so is
initialized by dynamic relocation, we can also initialize x86 CPU
features in _rtld_global_ro in ld.so and cache info in libc.so by
initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.

Note: _dl_x86_init_cpu_features can be called more than once, from
DL_PLATFORM_INIT and during relocation in ld.so.
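
As a standalone sketch of the trick (using GCC's ifunc attribute
directly rather than the internal __ifunc macro used below; all names
are illustrative):

  /* The resolver runs while the dynamic linker processes relocations,
     so its side effect happens before any ordinary code can run.  */
  static void
  do_init (void)
  {
    /* Stands in for init_cpu_features/init_cacheinfo.  */
  }

  static void *
  init_ifunc (void)
  {
    do_init ();
    return (void *) do_init;  /* Dummy resolution target.  */
  }

  /* Dummy function whose only purpose is to carry the IFUNC
     relocation.  */
  extern void init_dummy (void) __attribute__ ((ifunc ("init_ifunc")));

  /* Taking the address forces the relocation to be emitted.  */
  void (*const init_dummy_p) (void) = init_dummy;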
---
 sysdeps/i386/dl-machine.h          |  3 +--
 sysdeps/x86/cacheinfo.c            | 10 +++++++++-
 sysdeps/x86/dl-get-cpu-features.c  | 25 ++++++++++++++++++++++++-
 sysdeps/x86/include/cpu-features.h |  1 +
 sysdeps/x86_64/dl-machine.h        |  3 +--
 5 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 0f08079e48..5e22e795cc 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -25,7 +25,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -250,7 +249,7 @@ dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 217c21c34f..7a325ab70e 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -756,7 +756,6 @@ intel_bug_no_cache_info:
 
 
 static void
-__attribute__((constructor))
 init_cacheinfo (void)
 {
   /* Find out what brand of processor.  */
@@ -770,6 +769,8 @@ init_cacheinfo (void)
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
+  assert (cpu_features->basic.kind != arch_kind_unknown);
+
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
@@ -894,4 +895,11 @@ init_cacheinfo (void)
 # endif
 }
 
+/* NB: Call init_cacheinfo by initializing a dummy function pointer via
+   IFUNC relocation.  */
+extern void __x86_cacheinfo (void) attribute_hidden;
+const void (*__x86_cacheinfo_p) (void) attribute_hidden
+  = __x86_cacheinfo;
+
+__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
 #endif
diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
index 5f9e46b0c6..da4697b895 100644
--- a/sysdeps/x86/dl-get-cpu-features.c
+++ b/sysdeps/x86/dl-get-cpu-features.c
@@ -1,4 +1,4 @@
-/* This file is part of the GNU C Library.
+/* Initialize CPU feature data via IFUNC relocation.
    Copyright (C) 2015-2020 Free Software Foundation, Inc.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,6 +18,29 @@
 
 #include <ldsodefs.h>
 
+#ifdef SHARED
+# include <cpu-features.c>
+
+/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
+   CPU features.  But when loading ld.so inside of static executable,
+   DL_PLATFORM_INIT isn't called.  Call init_cpu_features by initializing
+   a dummy function pointer via IFUNC relocation for ld.so.  */
+extern void __x86_cpu_features (void) attribute_hidden;
+const void (*__x86_cpu_features_p) (void) attribute_hidden
+  = __x86_cpu_features;
+
+void
+_dl_x86_init_cpu_features (void)
+{
+  struct cpu_features *cpu_features = __get_cpu_features ();
+  if (cpu_features->basic.kind == arch_kind_unknown)
+    init_cpu_features (cpu_features);
+}
+
+__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
+	 _dl_x86_init_cpu_features);
+#endif
+
 #undef __x86_get_cpu_features
 
 const struct cpu_features *
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index dcf29b6fe8..f62be0b9b3 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -159,6 +159,7 @@ struct cpu_features
 /* Unused for x86.  */
 #  define INIT_ARCH()
 #  define __x86_get_cpu_features(max) (&GLRO(dl_x86_cpu_features))
+extern void _dl_x86_init_cpu_features (void) attribute_hidden;
 # endif
 
 # ifdef __x86_64__
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ca73d8fef9..773e94c8bb 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -26,7 +26,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -225,7 +224,7 @@ dl_platform_init (void)
 #if IS_IN (rtld)
   /* init_cpu_features has been called early from __libc_start_main in
      static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
-- 
2.26.2



* [PATCH 2/4] Set tunable value as well as min/max values
  2020-09-18 16:07 V2 [PATCH 0/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
  2020-09-18 16:07 ` [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu via Libc-alpha
@ 2020-09-18 16:07 ` H.J. Lu via Libc-alpha
  2020-09-28 13:35   ` Florian Weimer via Libc-alpha
  2020-09-18 16:07 ` [PATCH 3/4] x86: Move x86 processor cache info to cpu_features H.J. Lu via Libc-alpha
  2020-09-18 16:07 ` [PATCH 4/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
  3 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-18 16:07 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

Some tunable values and their minimum/maximum values must be determined
at run time.  Add TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL to update a
tunable value together with its minimum and maximum values.
__tunable_set_val is updated to set the tunable value as well as its
min/max values.
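
For reference, a rough sketch of what the new short form expands to,
per the TUNABLE_SET_ALL_FULL definition below (the tunable name and
type here are only illustrative):

  TUNABLE_SET_ALL (x86_rep_movsb_threshold, long int, val, min, max);

  /* ... expands to approximately ... */
  __tunable_set_val (TUNABLE_ENUM_NAME (glibc, cpu, x86_rep_movsb_threshold),
		     & (long int) {val}, & (long int) {min},
		     & (long int) {max});

Passing NULL for the min/max pointers, as TUNABLE_SET still does, leaves
the current minimum and maximum values unchanged.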
---
 elf/dl-tunables.c      | 17 ++++++++++++-----
 elf/dl-tunables.h      | 18 ++++++++++++++++--
 manual/README.tunables | 24 ++++++++++++++++++++++--
 3 files changed, 50 insertions(+), 9 deletions(-)

diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 26e6e26612..b44174fe71 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -101,12 +101,19 @@ get_next_env (char **envp, char **name, size_t *namelen, char **val,
 })
 
 static void
-do_tunable_update_val (tunable_t *cur, const void *valp)
+do_tunable_update_val (tunable_t *cur, const void *valp,
+		       const void *minp, const void *maxp)
 {
   uint64_t val;
 
   if (cur->type.type_code != TUNABLE_TYPE_STRING)
-    val = *((int64_t *) valp);
+    {
+      val = *((int64_t *) valp);
+      if (minp)
+	cur->type.min = *((int64_t *) minp);
+      if (maxp)
+	cur->type.max = *((int64_t *) maxp);
+    }
 
   switch (cur->type.type_code)
     {
@@ -153,15 +160,15 @@ tunable_initialize (tunable_t *cur, const char *strval)
       cur->initialized = true;
       valp = strval;
     }
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, NULL, NULL);
 }
 
 void
-__tunable_set_val (tunable_id_t id, void *valp)
+__tunable_set_val (tunable_id_t id, void *valp, void *minp, void *maxp)
 {
   tunable_t *cur = &tunable_list[id];
 
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, minp, maxp);
 }
 
 #if TUNABLES_FRONTEND == TUNABLES_FRONTEND_valstring
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index f05eb50c2f..b1add3184f 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -70,9 +70,10 @@ typedef struct _tunable tunable_t;
 
 extern void __tunables_init (char **);
 extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t);
-extern void __tunable_set_val (tunable_id_t, void *);
+extern void __tunable_set_val (tunable_id_t, void *, void *, void *);
 rtld_hidden_proto (__tunables_init)
 rtld_hidden_proto (__tunable_get_val)
+rtld_hidden_proto (__tunable_set_val)
 
 /* Define TUNABLE_GET and TUNABLE_SET in short form if TOP_NAMESPACE and
    TUNABLE_NAMESPACE are defined.  This is useful shorthand to get and set
@@ -82,11 +83,16 @@ rtld_hidden_proto (__tunable_get_val)
   TUNABLE_GET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __cb)
 # define TUNABLE_SET(__id, __type, __val) \
   TUNABLE_SET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __val)
+# define TUNABLE_SET_ALL(__id, __type, __val, __min, __max) \
+  TUNABLE_SET_ALL_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, \
+			__val, __min, __max)
 #else
 # define TUNABLE_GET(__top, __ns, __id, __type, __cb) \
   TUNABLE_GET_FULL (__top, __ns, __id, __type, __cb)
 # define TUNABLE_SET(__top, __ns, __id, __type, __val) \
   TUNABLE_SET_FULL (__top, __ns, __id, __type, __val)
+# define TUNABLE_SET_ALL(__top, __ns, __id, __type, __val, __min, __max) \
+  TUNABLE_SET_ALL_FULL (__top, __ns, __id, __type, __val, __min, __max)
 #endif
 
 /* Get and return a tunable value.  If the tunable was set externally and __CB
@@ -103,7 +109,15 @@ rtld_hidden_proto (__tunable_get_val)
 # define TUNABLE_SET_FULL(__top, __ns, __id, __type, __val) \
 ({									      \
   __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
-			& (__type) {__val});				      \
+		     & (__type) {__val}, NULL, NULL);			      \
+})
+
+/* Set a tunable value together with min/max values.  */
+# define TUNABLE_SET_ALL_FULL(__top, __ns, __id, __type, __val, __min, __max) \
+({									      \
+  __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
+		     & (__type) {__val},  & (__type) {__min},		      \
+		     & (__type) {__max});				      \
 })
 
 /* Namespace sanity for callback functions.  Use this macro to keep the
diff --git a/manual/README.tunables b/manual/README.tunables
index f87a31a65e..db6f6bae8d 100644
--- a/manual/README.tunables
+++ b/manual/README.tunables
@@ -67,7 +67,7 @@ The list of allowed attributes are:
 				     non-AT_SECURE subprocesses.
 			NONE: Read all the time.
 
-2. Use TUNABLE_GET/TUNABLE_SET to get and set tunables.
+2. Use TUNABLE_GET/TUNABLE_SET/TUNABLE_SET_ALL to get and set tunables.
 
 3. OPTIONAL: If tunables in a namespace are being used multiple times within a
    specific module, set the TUNABLE_NAMESPACE macro to reduce the amount of
@@ -112,9 +112,29 @@ form of the macros as follows:
 where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
 remaining arguments are the same as the short form macros.
 
+The minimum and maximum values can be updated together with the tunable value
+using:
+
+  TUNABLE_SET_ALL (check, int32_t, val, min, max)
+
+where 'check' is the tunable name, 'int32_t' is the C type of the tunable,
+'val' is a value of the same type, and 'min' and 'max' are the minimum and
+maximum values of the tunable.
+
+To set the minimum and maximum values of tunables in a different namespace
+from that module, use the full form of the macros as follows:
+
+  val = TUNABLE_GET_FULL (glibc, cpu, hwcap_mask, uint64_t, NULL)
+
+  TUNABLE_SET_ALL_FULL (glibc, cpu, hwcap_mask, uint64_t, val, min, max)
+
+where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
+remaining arguments are the same as the short form macros.
+
 When TUNABLE_NAMESPACE is not defined in a module, TUNABLE_GET is equivalent to
 TUNABLE_GET_FULL, so you will need to provide full namespace information for
-both macros.  Likewise for TUNABLE_SET and TUNABLE_SET_FULL.
+both macros.  Likewise for TUNABLE_SET, TUNABLE_SET_FULL, TUNABLE_SET_ALL
+and TUNABLE_SET_ALL_FULL.
 
 ** IMPORTANT NOTE **
 
-- 
2.26.2



* [PATCH 3/4] x86: Move x86 processor cache info to cpu_features
  2020-09-18 16:07 V2 [PATCH 0/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
  2020-09-18 16:07 ` [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu via Libc-alpha
  2020-09-18 16:07 ` [PATCH 2/4] Set tunable value as well as min/max values H.J. Lu via Libc-alpha
@ 2020-09-18 16:07 ` H.J. Lu via Libc-alpha
  2020-09-18 16:07 ` [PATCH 4/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
  3 siblings, 0 replies; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-18 16:07 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

Move the x86 processor cache info into cpu_features so that it can be
set with TUNABLE_SET_ALL and so that CPUID outputs for the same inputs
are computed once and cached.
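
With the cache info precomputed in cpu_features, __cache_sysconf becomes
a simple field lookup, and the values are still queryable through
sysconf as before, e.g.:

  #include <stdio.h>
  #include <unistd.h>

  int
  main (void)
  {
    /* These now return the values cached in cpu_features.  */
    printf ("L1d cache: %ld\n", sysconf (_SC_LEVEL1_DCACHE_SIZE));
    printf ("L3 cache:  %ld\n", sysconf (_SC_LEVEL3_CACHE_SIZE));
    return 0;
  }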
---
 sysdeps/x86/cacheinfo.c            | 867 ++-------------------------
 sysdeps/x86/cpu-cacheinfo.c        | 922 +++++++++++++++++++++++++++++
 sysdeps/x86/cpu-features.c         |  25 +-
 sysdeps/x86/include/cpu-features.h |  22 +
 4 files changed, 1000 insertions(+), 836 deletions(-)
 create mode 100644 sysdeps/x86/cpu-cacheinfo.c

diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index 7a325ab70e..da17ff76f4 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -18,498 +18,9 @@
 
 #if IS_IN (libc)
 
-#include <assert.h>
-#include <stdbool.h>
-#include <stdlib.h>
 #include <unistd.h>
-#include <cpuid.h>
 #include <init-arch.h>
 
-static const struct intel_02_cache_info
-{
-  unsigned char idx;
-  unsigned char assoc;
-  unsigned char linesize;
-  unsigned char rel_name;
-  unsigned int size;
-} intel_02_known [] =
-  {
-#define M(sc) ((sc) - _SC_LEVEL1_ICACHE_SIZE)
-    { 0x06,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),    8192 },
-    { 0x08,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   16384 },
-    { 0x09,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
-    { 0x0a,  2, 32, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
-    { 0x0c,  4, 32, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x0d,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x0e,  6, 64, M(_SC_LEVEL1_DCACHE_SIZE),   24576 },
-    { 0x21,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x22,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
-    { 0x23,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0x25,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0x29,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0x2c,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
-    { 0x30,  8, 64, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
-    { 0x39,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x3a,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   196608 },
-    { 0x3b,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x3c,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x3d,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   393216 },
-    { 0x3e,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x3f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x41,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x42,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x43,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x44,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x45,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x46,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0x47,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0x48, 12, 64, M(_SC_LEVEL2_CACHE_SIZE),  3145728 },
-    { 0x49, 16, 64, M(_SC_LEVEL2_CACHE_SIZE),  4194304 },
-    { 0x4a, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  6291456 },
-    { 0x4b, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0x4c, 12, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
-    { 0x4d, 16, 64, M(_SC_LEVEL3_CACHE_SIZE), 16777216 },
-    { 0x4e, 24, 64, M(_SC_LEVEL2_CACHE_SIZE),  6291456 },
-    { 0x60,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x66,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
-    { 0x67,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x68,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
-    { 0x78,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x79,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x7a,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x7b,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x7c,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x7d,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x7f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x80,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x82,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x83,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x84,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x85,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x86,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x87,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0xd0,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
-    { 0xd1,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0xd2,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xd6,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0xd7,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xd8,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xdc, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xdd, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xde, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0xe2, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xe3, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xe4, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0xea, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
-    { 0xeb, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 18874368 },
-    { 0xec, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 25165824 },
-  };
-
-#define nintel_02_known (sizeof (intel_02_known) / sizeof (intel_02_known [0]))
-
-static int
-intel_02_known_compare (const void *p1, const void *p2)
-{
-  const struct intel_02_cache_info *i1;
-  const struct intel_02_cache_info *i2;
-
-  i1 = (const struct intel_02_cache_info *) p1;
-  i2 = (const struct intel_02_cache_info *) p2;
-
-  if (i1->idx == i2->idx)
-    return 0;
-
-  return i1->idx < i2->idx ? -1 : 1;
-}
-
-
-static long int
-__attribute__ ((noinline))
-intel_check_word (int name, unsigned int value, bool *has_level_2,
-		  bool *no_level_2_or_3,
-		  const struct cpu_features *cpu_features)
-{
-  if ((value & 0x80000000) != 0)
-    /* The register value is reserved.  */
-    return 0;
-
-  /* Fold the name.  The _SC_ constants are always in the order SIZE,
-     ASSOC, LINESIZE.  */
-  int folded_rel_name = (M(name) / 3) * 3;
-
-  while (value != 0)
-    {
-      unsigned int byte = value & 0xff;
-
-      if (byte == 0x40)
-	{
-	  *no_level_2_or_3 = true;
-
-	  if (folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-	    /* No need to look further.  */
-	    break;
-	}
-      else if (byte == 0xff)
-	{
-	  /* CPUID leaf 0x4 contains all the information.  We need to
-	     iterate over it.  */
-	  unsigned int eax;
-	  unsigned int ebx;
-	  unsigned int ecx;
-	  unsigned int edx;
-
-	  unsigned int round = 0;
-	  while (1)
-	    {
-	      __cpuid_count (4, round, eax, ebx, ecx, edx);
-
-	      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
-	      if (type == null)
-		/* That was the end.  */
-		break;
-
-	      unsigned int level = (eax >> 5) & 0x7;
-
-	      if ((level == 1 && type == data
-		   && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
-		  || (level == 1 && type == inst
-		      && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
-		  || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-		  || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-		  || (level == 4 && folded_rel_name == M(_SC_LEVEL4_CACHE_SIZE)))
-		{
-		  unsigned int offset = M(name) - folded_rel_name;
-
-		  if (offset == 0)
-		    /* Cache size.  */
-		    return (((ebx >> 22) + 1)
-			    * (((ebx >> 12) & 0x3ff) + 1)
-			    * ((ebx & 0xfff) + 1)
-			    * (ecx + 1));
-		  if (offset == 1)
-		    return (ebx >> 22) + 1;
-
-		  assert (offset == 2);
-		  return (ebx & 0xfff) + 1;
-		}
-
-	      ++round;
-	    }
-	  /* There is no other cache information anywhere else.  */
-	  break;
-	}
-      else
-	{
-	  if (byte == 0x49 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-	    {
-	      /* Intel reused this value.  For family 15, model 6 it
-		 specifies the 3rd level cache.  Otherwise the 2nd
-		 level cache.  */
-	      unsigned int family = cpu_features->basic.family;
-	      unsigned int model = cpu_features->basic.model;
-
-	      if (family == 15 && model == 6)
-		{
-		  /* The level 3 cache is encoded for this model like
-		     the level 2 cache is for other models.  Pretend
-		     the caller asked for the level 2 cache.  */
-		  name = (_SC_LEVEL2_CACHE_SIZE
-			  + (name - _SC_LEVEL3_CACHE_SIZE));
-		  folded_rel_name = M(_SC_LEVEL2_CACHE_SIZE);
-		}
-	    }
-
-	  struct intel_02_cache_info *found;
-	  struct intel_02_cache_info search;
-
-	  search.idx = byte;
-	  found = bsearch (&search, intel_02_known, nintel_02_known,
-			   sizeof (intel_02_known[0]), intel_02_known_compare);
-	  if (found != NULL)
-	    {
-	      if (found->rel_name == folded_rel_name)
-		{
-		  unsigned int offset = M(name) - folded_rel_name;
-
-		  if (offset == 0)
-		    /* Cache size.  */
-		    return found->size;
-		  if (offset == 1)
-		    return found->assoc;
-
-		  assert (offset == 2);
-		  return found->linesize;
-		}
-
-	      if (found->rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-		*has_level_2 = true;
-	    }
-	}
-
-      /* Next byte for the next round.  */
-      value >>= 8;
-    }
-
-  /* Nothing found.  */
-  return 0;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_intel (int name, const struct cpu_features *cpu_features)
-{
-  unsigned int maxidx = cpu_features->basic.max_cpuid;
-
-  /* Return -1 for older CPUs.  */
-  if (maxidx < 2)
-    return -1;
-
-  /* OK, we can use the CPUID instruction to get all info about the
-     caches.  */
-  unsigned int cnt = 0;
-  unsigned int max = 1;
-  long int result = 0;
-  bool no_level_2_or_3 = false;
-  bool has_level_2 = false;
-
-  while (cnt++ < max)
-    {
-      unsigned int eax;
-      unsigned int ebx;
-      unsigned int ecx;
-      unsigned int edx;
-      __cpuid (2, eax, ebx, ecx, edx);
-
-      /* The low byte of EAX in the first round contain the number of
-	 rounds we have to make.  At least one, the one we are already
-	 doing.  */
-      if (cnt == 1)
-	{
-	  max = eax & 0xff;
-	  eax &= 0xffffff00;
-	}
-
-      /* Process the individual registers' value.  */
-      result = intel_check_word (name, eax, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, ebx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, ecx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, edx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-    }
-
-  if (name >= _SC_LEVEL2_CACHE_SIZE && name <= _SC_LEVEL3_CACHE_LINESIZE
-      && no_level_2_or_3)
-    return -1;
-
-  return 0;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_amd (int name)
-{
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  __cpuid (0x80000000, eax, ebx, ecx, edx);
-
-  /* No level 4 cache (yet).  */
-  if (name > _SC_LEVEL3_CACHE_LINESIZE)
-    return 0;
-
-  unsigned int fn = 0x80000005 + (name >= _SC_LEVEL2_CACHE_SIZE);
-  if (eax < fn)
-    return 0;
-
-  __cpuid (fn, eax, ebx, ecx, edx);
-
-  if (name < _SC_LEVEL1_DCACHE_SIZE)
-    {
-      name += _SC_LEVEL1_DCACHE_SIZE - _SC_LEVEL1_ICACHE_SIZE;
-      ecx = edx;
-    }
-
-  switch (name)
-    {
-    case _SC_LEVEL1_DCACHE_SIZE:
-      return (ecx >> 14) & 0x3fc00;
-
-    case _SC_LEVEL1_DCACHE_ASSOC:
-      ecx >>= 16;
-      if ((ecx & 0xff) == 0xff)
-	/* Fully associative.  */
-	return (ecx << 2) & 0x3fc00;
-      return ecx & 0xff;
-
-    case _SC_LEVEL1_DCACHE_LINESIZE:
-      return ecx & 0xff;
-
-    case _SC_LEVEL2_CACHE_SIZE:
-      return (ecx & 0xf000) == 0 ? 0 : (ecx >> 6) & 0x3fffc00;
-
-    case _SC_LEVEL2_CACHE_ASSOC:
-      switch ((ecx >> 12) & 0xf)
-	{
-	case 0:
-	case 1:
-	case 2:
-	case 4:
-	  return (ecx >> 12) & 0xf;
-	case 6:
-	  return 8;
-	case 8:
-	  return 16;
-	case 10:
-	  return 32;
-	case 11:
-	  return 48;
-	case 12:
-	  return 64;
-	case 13:
-	  return 96;
-	case 14:
-	  return 128;
-	case 15:
-	  return ((ecx >> 6) & 0x3fffc00) / (ecx & 0xff);
-	default:
-	  return 0;
-	}
-      /* NOTREACHED */
-
-    case _SC_LEVEL2_CACHE_LINESIZE:
-      return (ecx & 0xf000) == 0 ? 0 : ecx & 0xff;
-
-    case _SC_LEVEL3_CACHE_SIZE:
-      return (edx & 0xf000) == 0 ? 0 : (edx & 0x3ffc0000) << 1;
-
-    case _SC_LEVEL3_CACHE_ASSOC:
-      switch ((edx >> 12) & 0xf)
-	{
-	case 0:
-	case 1:
-	case 2:
-	case 4:
-	  return (edx >> 12) & 0xf;
-	case 6:
-	  return 8;
-	case 8:
-	  return 16;
-	case 10:
-	  return 32;
-	case 11:
-	  return 48;
-	case 12:
-	  return 64;
-	case 13:
-	  return 96;
-	case 14:
-	  return 128;
-	case 15:
-	  return ((edx & 0x3ffc0000) << 1) / (edx & 0xff);
-	default:
-	  return 0;
-	}
-      /* NOTREACHED */
-
-    case _SC_LEVEL3_CACHE_LINESIZE:
-      return (edx & 0xf000) == 0 ? 0 : edx & 0xff;
-
-    default:
-      assert (! "cannot happen");
-    }
-  return -1;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_zhaoxin (int name)
-{
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-
-  int folded_rel_name = (M(name) / 3) * 3;
-
-  unsigned int round = 0;
-  while (1)
-    {
-      __cpuid_count (4, round, eax, ebx, ecx, edx);
-
-      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
-      if (type == null)
-        break;
-
-      unsigned int level = (eax >> 5) & 0x7;
-
-      if ((level == 1 && type == data
-        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
-        || (level == 1 && type == inst
-            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
-        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
-        {
-          unsigned int offset = M(name) - folded_rel_name;
-
-          if (offset == 0)
-            /* Cache size.  */
-            return (((ebx >> 22) + 1)
-                * (((ebx >> 12) & 0x3ff) + 1)
-                * ((ebx & 0xfff) + 1)
-                * (ecx + 1));
-          if (offset == 1)
-            return (ebx >> 22) + 1;
-
-          assert (offset == 2);
-          return (ebx & 0xfff) + 1;
-        }
-
-      ++round;
-    }
-
-  /* Nothing found.  */
-  return 0;
-}
-
-
-/* Get the value of the system variable NAME.  */
-long int
-attribute_hidden
-__cache_sysconf (int name)
-{
-  const struct cpu_features *cpu_features = __get_cpu_features ();
-
-  if (cpu_features->basic.kind == arch_kind_intel)
-    return handle_intel (name, cpu_features);
-
-  if (cpu_features->basic.kind == arch_kind_amd)
-    return handle_amd (name);
-
-  if (cpu_features->basic.kind == arch_kind_zhaoxin)
-    return handle_zhaoxin (name);
-
-  // XXX Fill in more vendors.
-
-  /* CPU not known, we have no information.  */
-  return 0;
-}
-
-
 /* Data cache size for use in memory and string routines, typically
    L1 size, rounded to multiple of 256 bytes.  */
 long int __x86_data_cache_size_half attribute_hidden = 32 * 1024 / 2;
@@ -537,362 +48,78 @@ long int __x86_rep_movsb_threshold attribute_hidden = 2048;
 long int __x86_rep_stosb_threshold attribute_hidden = 2048;
 
 
-static void
-get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
-                long int core)
+/* Get the value of the system variable NAME.  */
+long int
+attribute_hidden
+__cache_sysconf (int name)
 {
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-
-  /* Number of logical processors sharing L2 cache.  */
-  int threads_l2;
-
-  /* Number of logical processors sharing L3 cache.  */
-  int threads_l3;
-
   const struct cpu_features *cpu_features = __get_cpu_features ();
-  int max_cpuid = cpu_features->basic.max_cpuid;
-  unsigned int family = cpu_features->basic.family;
-  unsigned int model = cpu_features->basic.model;
-  long int shared = *shared_ptr;
-  unsigned int threads = *threads_ptr;
-  bool inclusive_cache = true;
-  bool support_count_mask = true;
-
-  /* Try L3 first.  */
-  unsigned int level = 3;
-
-  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
-    support_count_mask = false;
-
-  if (shared <= 0)
-    {
-      /* Try L2 otherwise.  */
-      level  = 2;
-      shared = core;
-      threads_l2 = 0;
-      threads_l3 = -1;
-    }
-  else
-    {
-      threads_l2 = 0;
-      threads_l3 = 0;
-    }
-
-  /* A value of 0 for the HTT bit indicates there is only a single
-     logical processor.  */
-  if (CPU_FEATURE_USABLE (HTT))
+  switch (name)
     {
-      /* Figure out the number of logical threads that share the
-         highest cache level.  */
-      if (max_cpuid >= 4)
-        {
-          int i = 0;
-
-          /* Query until cache level 2 and 3 are enumerated.  */
-          int check = 0x1 | (threads_l3 == 0) << 1;
-          do
-            {
-              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+    case _SC_LEVEL1_ICACHE_SIZE:
+      return cpu_features->level1_icache_size;
 
-              /* There seems to be a bug in at least some Pentium Ds
-                 which sometimes fail to iterate all cache parameters.
-                 Do not loop indefinitely here, stop in this case and
-                 assume there is no such information.  */
-              if (cpu_features->basic.kind == arch_kind_intel
-                  && (eax & 0x1f) == 0 )
-                goto intel_bug_no_cache_info;
+    case _SC_LEVEL1_DCACHE_SIZE:
+      return cpu_features->level1_dcache_size;
 
-              switch ((eax >> 5) & 0x7)
-                {
-                  default:
-                    break;
-                  case 2:
-                    if ((check & 0x1))
-                      {
-                        /* Get maximum number of logical processors
-                           sharing L2 cache.  */
-                        threads_l2 = (eax >> 14) & 0x3ff;
-                        check &= ~0x1;
-                      }
-                    break;
-                  case 3:
-                    if ((check & (0x1 << 1)))
-                      {
-                        /* Get maximum number of logical processors
-                           sharing L3 cache.  */
-                        threads_l3 = (eax >> 14) & 0x3ff;
+    case _SC_LEVEL1_DCACHE_ASSOC:
+      return cpu_features->level1_dcache_assoc;
 
-                        /* Check if L2 and L3 caches are inclusive.  */
-                        inclusive_cache = (edx & 0x2) != 0;
-                        check &= ~(0x1 << 1);
-                      }
-                    break;
-                }
-            }
-          while (check);
+    case _SC_LEVEL1_DCACHE_LINESIZE:
+      return cpu_features->level1_dcache_linesize;
 
-          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
-             numbers of addressable IDs for logical processors sharing
-             the cache, instead of the maximum number of threads
-             sharing the cache.  */
-          if (max_cpuid >= 11 && support_count_mask)
-            {
-              /* Find the number of logical processors shipped in
-                 one core and apply count mask.  */
-              i = 0;
+    case _SC_LEVEL2_CACHE_SIZE:
+      return cpu_features->level2_cache_size;
 
-              /* Count SMT only if there is L3 cache.  Always count
-                 core if there is no L3 cache.  */
-              int count = ((threads_l2 > 0 && level == 3)
-                           | ((threads_l3 > 0
-                               || (threads_l2 > 0 && level == 2)) << 1));
+    case _SC_LEVEL2_CACHE_ASSOC:
+      return cpu_features->level2_cache_assoc;
 
-              while (count)
-                {
-                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+    case _SC_LEVEL2_CACHE_LINESIZE:
+      return cpu_features->level2_cache_linesize;
 
-                  int shipped = ebx & 0xff;
-                  int type = ecx & 0xff00;
-                  if (shipped == 0 || type == 0)
-                    break;
-                  else if (type == 0x100)
-                    {
-                      /* Count SMT.  */
-                      if ((count & 0x1))
-                        {
-                          int count_mask;
+    case _SC_LEVEL3_CACHE_SIZE:
+      return cpu_features->level3_cache_size;
 
-                          /* Compute count mask.  */
-                          asm ("bsr %1, %0"
-                               : "=r" (count_mask) : "g" (threads_l2));
-                          count_mask = ~(-1 << (count_mask + 1));
-                          threads_l2 = (shipped - 1) & count_mask;
-                          count &= ~0x1;
-                        }
-                    }
-                  else if (type == 0x200)
-                    {
-                      /* Count core.  */
-                      if ((count & (0x1 << 1)))
-                        {
-                          int count_mask;
-                          int threads_core
-                            = (level == 2 ? threads_l2 : threads_l3);
+    case _SC_LEVEL3_CACHE_ASSOC:
+      return cpu_features->level3_cache_assoc;
 
-                          /* Compute count mask.  */
-                          asm ("bsr %1, %0"
-                               : "=r" (count_mask) : "g" (threads_core));
-                          count_mask = ~(-1 << (count_mask + 1));
-                          threads_core = (shipped - 1) & count_mask;
-                          if (level == 2)
-                            threads_l2 = threads_core;
-                          else
-                            threads_l3 = threads_core;
-                          count &= ~(0x1 << 1);
-                        }
-                    }
-                }
-            }
-          if (threads_l2 > 0)
-            threads_l2 += 1;
-          if (threads_l3 > 0)
-            threads_l3 += 1;
-          if (level == 2)
-            {
-              if (threads_l2)
-                {
-                  threads = threads_l2;
-                  if (cpu_features->basic.kind == arch_kind_intel
-                      && threads > 2
-                      && family == 6)
-                    switch (model)
-                      {
-                        case 0x37:
-                        case 0x4a:
-                        case 0x4d:
-                        case 0x5a:
-                        case 0x5d:
-                          /* Silvermont has L2 cache shared by 2 cores.  */
-                          threads = 2;
-                          break;
-                        default:
-                          break;
-                      }
-                }
-            }
-          else if (threads_l3)
-            threads = threads_l3;
-        }
-      else
-        {
-intel_bug_no_cache_info:
-          /* Assume that all logical threads share the highest cache
-             level.  */
-          threads
-            = ((cpu_features->features[COMMON_CPUID_INDEX_1].cpuid.ebx
-                >> 16) & 0xff);
-        }
+    case _SC_LEVEL3_CACHE_LINESIZE:
+      return cpu_features->level3_cache_linesize;
 
-        /* Cap usage of highest cache level to the number of supported
-           threads.  */
-        if (shared > 0 && threads > 0)
-          shared /= threads;
-    }
+    case _SC_LEVEL4_CACHE_SIZE:
+      return cpu_features->level4_cache_size;
 
-  /* Account for non-inclusive L2 and L3 caches.  */
-  if (!inclusive_cache)
-    {
-      if (threads_l2 > 0)
-        core /= threads_l2;
-      shared += core;
+    default:
+      break;
     }
-
-  *shared_ptr = shared;
-  *threads_ptr = threads;
+  return -1;
 }
 
-
 static void
 init_cacheinfo (void)
 {
-  /* Find out what brand of processor.  */
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  int max_cpuid_ex;
-  long int data = -1;
-  long int shared = -1;
-  long int core;
-  unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
+  long int data = cpu_features->data_cache_size;
+  __x86_raw_data_cache_size_half = data / 2;
+  __x86_raw_data_cache_size = data;
+  /* Round data cache size to multiple of 256 bytes.  */
+  data = data & ~255L;
+  __x86_data_cache_size_half = data / 2;
+  __x86_data_cache_size = data;
+
+  long int shared = cpu_features->shared_cache_size;
+  __x86_raw_shared_cache_size_half = shared / 2;
+  __x86_raw_shared_cache_size = shared;
+  /* Round shared cache size to multiple of 256 bytes.  */
+  shared = shared & ~255L;
+  __x86_shared_cache_size_half = shared / 2;
+  __x86_shared_cache_size = shared;
 
-  assert (cpu_features->basic.kind != arch_kind_unknown);
-
-  if (cpu_features->basic.kind == arch_kind_intel)
-    {
-      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
-      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
-
-      get_common_cache_info (&shared, &threads, core);
-    }
-  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
-    {
-      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
-      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
-
-      get_common_cache_info (&shared, &threads, core);
-    }
-  else if (cpu_features->basic.kind == arch_kind_amd)
-    {
-      data   = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
-      long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
-      shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
-
-      /* Get maximum extended function. */
-      __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
-
-      if (shared <= 0)
-	/* No shared L3 cache.  All we have is the L2 cache.  */
-	shared = core;
-      else
-	{
-	  /* Figure out the number of logical threads that share L3.  */
-	  if (max_cpuid_ex >= 0x80000008)
-	    {
-	      /* Get width of APIC ID.  */
-	      __cpuid (0x80000008, max_cpuid_ex, ebx, ecx, edx);
-	      threads = 1 << ((ecx >> 12) & 0x0f);
-	    }
-
-	  if (threads == 0)
-	    {
-	      /* If APIC ID width is not available, use logical
-		 processor count.  */
-	      __cpuid (0x00000001, max_cpuid_ex, ebx, ecx, edx);
-
-	      if ((edx & (1 << 28)) != 0)
-		threads = (ebx >> 16) & 0xff;
-	    }
-
-	  /* Cap usage of highest cache level to the number of
-	     supported threads.  */
-	  if (threads > 0)
-	    shared /= threads;
-
-	  /* Account for exclusive L2 and L3 caches.  */
-	  shared += core;
-	}
-    }
-
-  if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
-
-  if (data > 0)
-    {
-      __x86_raw_data_cache_size_half = data / 2;
-      __x86_raw_data_cache_size = data;
-      /* Round data cache size to multiple of 256 bytes.  */
-      data = data & ~255L;
-      __x86_data_cache_size_half = data / 2;
-      __x86_data_cache_size = data;
-    }
-
-  if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
-
-  if (shared > 0)
-    {
-      __x86_raw_shared_cache_size_half = shared / 2;
-      __x86_raw_shared_cache_size = shared;
-      /* Round shared cache size to multiple of 256 bytes.  */
-      shared = shared & ~255L;
-      __x86_shared_cache_size_half = shared / 2;
-      __x86_shared_cache_size = shared;
-    }
-
-  /* The large memcpy micro benchmark in glibc shows that 6 times of
-     shared cache size is the approximate value above which non-temporal
-     store becomes faster on a 8-core processor.  This is the 3/4 of the
-     total shared cache size.  */
   __x86_shared_non_temporal_threshold
-    = (cpu_features->non_temporal_threshold != 0
-       ? cpu_features->non_temporal_threshold
-       : __x86_shared_cache_size * threads * 3 / 4);
-
-  /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
-  unsigned int minimum_rep_movsb_threshold;
-  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
-  unsigned int rep_movsb_threshold;
-  if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
-      && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
-    {
-      rep_movsb_threshold = 2048 * (64 / 16);
-      minimum_rep_movsb_threshold = 64 * 8;
-    }
-  else if (CPU_FEATURE_PREFERRED_P (cpu_features,
-				    AVX_Fast_Unaligned_Load))
-    {
-      rep_movsb_threshold = 2048 * (32 / 16);
-      minimum_rep_movsb_threshold = 32 * 8;
-    }
-  else
-    {
-      rep_movsb_threshold = 2048 * (16 / 16);
-      minimum_rep_movsb_threshold = 16 * 8;
-    }
-  if (cpu_features->rep_movsb_threshold > minimum_rep_movsb_threshold)
-    __x86_rep_movsb_threshold = cpu_features->rep_movsb_threshold;
-  else
-    __x86_rep_movsb_threshold = rep_movsb_threshold;
+    = cpu_features->non_temporal_threshold;
 
-# if HAVE_TUNABLES
+  __x86_rep_movsb_threshold = cpu_features->rep_movsb_threshold;
   __x86_rep_stosb_threshold = cpu_features->rep_stosb_threshold;
-# endif
 }
 
 /* NB: Call init_cacheinfo by initializing a dummy function pointer via
diff --git a/sysdeps/x86/cpu-cacheinfo.c b/sysdeps/x86/cpu-cacheinfo.c
new file mode 100644
index 0000000000..16b81af333
--- /dev/null
+++ b/sysdeps/x86/cpu-cacheinfo.c
@@ -0,0 +1,922 @@
+/* x86 cache info.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <unistd.h>
+
+static const struct intel_02_cache_info
+{
+  unsigned char idx;
+  unsigned char assoc;
+  unsigned char linesize;
+  unsigned char rel_name;
+  unsigned int size;
+} intel_02_known [] =
+  {
+#define M(sc) ((sc) - _SC_LEVEL1_ICACHE_SIZE)
+    { 0x06,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),    8192 },
+    { 0x08,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   16384 },
+    { 0x09,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
+    { 0x0a,  2, 32, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
+    { 0x0c,  4, 32, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x0d,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x0e,  6, 64, M(_SC_LEVEL1_DCACHE_SIZE),   24576 },
+    { 0x21,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x22,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
+    { 0x23,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0x25,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0x29,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0x2c,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
+    { 0x30,  8, 64, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
+    { 0x39,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x3a,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   196608 },
+    { 0x3b,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x3c,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x3d,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   393216 },
+    { 0x3e,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x3f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x41,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x42,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x43,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x44,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x45,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x46,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0x47,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0x48, 12, 64, M(_SC_LEVEL2_CACHE_SIZE),  3145728 },
+    { 0x49, 16, 64, M(_SC_LEVEL2_CACHE_SIZE),  4194304 },
+    { 0x4a, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  6291456 },
+    { 0x4b, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0x4c, 12, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
+    { 0x4d, 16, 64, M(_SC_LEVEL3_CACHE_SIZE), 16777216 },
+    { 0x4e, 24, 64, M(_SC_LEVEL2_CACHE_SIZE),  6291456 },
+    { 0x60,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x66,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
+    { 0x67,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x68,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
+    { 0x78,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x79,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x7a,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x7b,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x7c,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x7d,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x7f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x80,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x82,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x83,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x84,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x85,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x86,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x87,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0xd0,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
+    { 0xd1,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0xd2,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xd6,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0xd7,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xd8,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xdc, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xdd, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xde, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0xe2, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xe3, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xe4, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0xea, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
+    { 0xeb, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 18874368 },
+    { 0xec, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 25165824 },
+  };
+
+#define nintel_02_known (sizeof (intel_02_known) / sizeof (intel_02_known [0]))
+
+static int
+intel_02_known_compare (const void *p1, const void *p2)
+{
+  const struct intel_02_cache_info *i1;
+  const struct intel_02_cache_info *i2;
+
+  i1 = (const struct intel_02_cache_info *) p1;
+  i2 = (const struct intel_02_cache_info *) p2;
+
+  if (i1->idx == i2->idx)
+    return 0;
+
+  return i1->idx < i2->idx ? -1 : 1;
+}
+
+
+static long int
+__attribute__ ((noinline))
+intel_check_word (int name, unsigned int value, bool *has_level_2,
+		  bool *no_level_2_or_3,
+		  const struct cpu_features *cpu_features)
+{
+  if ((value & 0x80000000) != 0)
+    /* The register value is reserved.  */
+    return 0;
+
+  /* Fold the name.  The _SC_ constants are always in the order SIZE,
+     ASSOC, LINESIZE.  */
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  while (value != 0)
+    {
+      unsigned int byte = value & 0xff;
+
+      if (byte == 0x40)
+	{
+	  *no_level_2_or_3 = true;
+
+	  if (folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+	    /* No need to look further.  */
+	    break;
+	}
+      else if (byte == 0xff)
+	{
+	  /* CPUID leaf 0x4 contains all the information.  We need to
+	     iterate over it.  */
+	  unsigned int eax;
+	  unsigned int ebx;
+	  unsigned int ecx;
+	  unsigned int edx;
+
+	  unsigned int round = 0;
+	  while (1)
+	    {
+	      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+	      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+	      if (type == null)
+		/* That was the end.  */
+		break;
+
+	      unsigned int level = (eax >> 5) & 0x7;
+
+	      if ((level == 1 && type == data
+		   && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+		  || (level == 1 && type == inst
+		      && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+		  || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+		  || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+		  || (level == 4 && folded_rel_name == M(_SC_LEVEL4_CACHE_SIZE)))
+		{
+		  unsigned int offset = M(name) - folded_rel_name;
+
+		  if (offset == 0)
+		    /* Cache size.  */
+		    return (((ebx >> 22) + 1)
+			    * (((ebx >> 12) & 0x3ff) + 1)
+			    * ((ebx & 0xfff) + 1)
+			    * (ecx + 1));
+		  if (offset == 1)
+		    return (ebx >> 22) + 1;
+
+		  assert (offset == 2);
+		  return (ebx & 0xfff) + 1;
+		}
+
+	      ++round;
+	    }
+	  /* There is no other cache information anywhere else.  */
+	  break;
+	}
+      else
+	{
+	  if (byte == 0x49 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+	    {
+	      /* Intel reused this value.  For family 15, model 6 it
+		 specifies the 3rd level cache.  Otherwise the 2nd
+		 level cache.  */
+	      unsigned int family = cpu_features->basic.family;
+	      unsigned int model = cpu_features->basic.model;
+
+	      if (family == 15 && model == 6)
+		{
+		  /* The level 3 cache is encoded for this model like
+		     the level 2 cache is for other models.  Pretend
+		     the caller asked for the level 2 cache.  */
+		  name = (_SC_LEVEL2_CACHE_SIZE
+			  + (name - _SC_LEVEL3_CACHE_SIZE));
+		  folded_rel_name = M(_SC_LEVEL2_CACHE_SIZE);
+		}
+	    }
+
+	  struct intel_02_cache_info *found;
+	  struct intel_02_cache_info search;
+
+	  search.idx = byte;
+	  found = bsearch (&search, intel_02_known, nintel_02_known,
+			   sizeof (intel_02_known[0]), intel_02_known_compare);
+	  if (found != NULL)
+	    {
+	      if (found->rel_name == folded_rel_name)
+		{
+		  unsigned int offset = M(name) - folded_rel_name;
+
+		  if (offset == 0)
+		    /* Cache size.  */
+		    return found->size;
+		  if (offset == 1)
+		    return found->assoc;
+
+		  assert (offset == 2);
+		  return found->linesize;
+		}
+
+	      if (found->rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+		*has_level_2 = true;
+	    }
+	}
+
+      /* Next byte for the next round.  */
+      value >>= 8;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_intel (int name, const struct cpu_features *cpu_features)
+{
+  unsigned int maxidx = cpu_features->basic.max_cpuid;
+
+  /* Return -1 for older CPUs.  */
+  if (maxidx < 2)
+    return -1;
+
+  /* OK, we can use the CPUID instruction to get all info about the
+     caches.  */
+  unsigned int cnt = 0;
+  unsigned int max = 1;
+  long int result = 0;
+  bool no_level_2_or_3 = false;
+  bool has_level_2 = false;
+
+  while (cnt++ < max)
+    {
+      unsigned int eax;
+      unsigned int ebx;
+      unsigned int ecx;
+      unsigned int edx;
+      __cpuid (2, eax, ebx, ecx, edx);
+
+      /* The low byte of EAX in the first round contain the number of
+	 rounds we have to make.  At least one, the one we are already
+	 doing.  */
+      if (cnt == 1)
+	{
+	  max = eax & 0xff;
+	  eax &= 0xffffff00;
+	}
+
+      /* Process the individual registers' value.  */
+      result = intel_check_word (name, eax, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, ebx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, ecx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, edx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+    }
+
+  if (name >= _SC_LEVEL2_CACHE_SIZE && name <= _SC_LEVEL3_CACHE_LINESIZE
+      && no_level_2_or_3)
+    return -1;
+
+  return 0;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_amd (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+  __cpuid (0x80000000, eax, ebx, ecx, edx);
+
+  /* No level 4 cache (yet).  */
+  if (name > _SC_LEVEL3_CACHE_LINESIZE)
+    return 0;
+
+  unsigned int fn = 0x80000005 + (name >= _SC_LEVEL2_CACHE_SIZE);
+  if (eax < fn)
+    return 0;
+
+  __cpuid (fn, eax, ebx, ecx, edx);
+
+  if (name < _SC_LEVEL1_DCACHE_SIZE)
+    {
+      name += _SC_LEVEL1_DCACHE_SIZE - _SC_LEVEL1_ICACHE_SIZE;
+      ecx = edx;
+    }
+
+  switch (name)
+    {
+    case _SC_LEVEL1_DCACHE_SIZE:
+      return (ecx >> 14) & 0x3fc00;
+
+    case _SC_LEVEL1_DCACHE_ASSOC:
+      ecx >>= 16;
+      if ((ecx & 0xff) == 0xff)
+	/* Fully associative.  */
+	return (ecx << 2) & 0x3fc00;
+      return ecx & 0xff;
+
+    case _SC_LEVEL1_DCACHE_LINESIZE:
+      return ecx & 0xff;
+
+    case _SC_LEVEL2_CACHE_SIZE:
+      return (ecx & 0xf000) == 0 ? 0 : (ecx >> 6) & 0x3fffc00;
+
+    case _SC_LEVEL2_CACHE_ASSOC:
+      switch ((ecx >> 12) & 0xf)
+	{
+	case 0:
+	case 1:
+	case 2:
+	case 4:
+	  return (ecx >> 12) & 0xf;
+	case 6:
+	  return 8;
+	case 8:
+	  return 16;
+	case 10:
+	  return 32;
+	case 11:
+	  return 48;
+	case 12:
+	  return 64;
+	case 13:
+	  return 96;
+	case 14:
+	  return 128;
+	case 15:
+	  return ((ecx >> 6) & 0x3fffc00) / (ecx & 0xff);
+	default:
+	  return 0;
+	}
+      /* NOTREACHED */
+
+    case _SC_LEVEL2_CACHE_LINESIZE:
+      return (ecx & 0xf000) == 0 ? 0 : ecx & 0xff;
+
+    case _SC_LEVEL3_CACHE_SIZE:
+      return (edx & 0xf000) == 0 ? 0 : (edx & 0x3ffc0000) << 1;
+
+    case _SC_LEVEL3_CACHE_ASSOC:
+      switch ((edx >> 12) & 0xf)
+	{
+	case 0:
+	case 1:
+	case 2:
+	case 4:
+	  return (edx >> 12) & 0xf;
+	case 6:
+	  return 8;
+	case 8:
+	  return 16;
+	case 10:
+	  return 32;
+	case 11:
+	  return 48;
+	case 12:
+	  return 64;
+	case 13:
+	  return 96;
+	case 14:
+	  return 128;
+	case 15:
+	  return ((edx & 0x3ffc0000) << 1) / (edx & 0xff);
+	default:
+	  return 0;
+	}
+      /* NOTREACHED */
+
+    case _SC_LEVEL3_CACHE_LINESIZE:
+      return (edx & 0xf000) == 0 ? 0 : edx & 0xff;
+
+    default:
+      assert (! "cannot happen");
+    }
+  return -1;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_zhaoxin (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  unsigned int round = 0;
+  while (1)
+    {
+      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+      if (type == null)
+        break;
+
+      unsigned int level = (eax >> 5) & 0x7;
+
+      if ((level == 1 && type == data
+        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+        || (level == 1 && type == inst
+            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
+        {
+          unsigned int offset = M(name) - folded_rel_name;
+
+          if (offset == 0)
+            /* Cache size.  */
+            return (((ebx >> 22) + 1)
+                * (((ebx >> 12) & 0x3ff) + 1)
+                * ((ebx & 0xfff) + 1)
+                * (ecx + 1));
+          if (offset == 1)
+            return (ebx >> 22) + 1;
+
+          assert (offset == 2);
+          return (ebx & 0xfff) + 1;
+        }
+
+      ++round;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
+
+
+static void
+get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
+                long int core)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  /* Number of logical processors sharing L2 cache.  */
+  int threads_l2;
+
+  /* Number of logical processors sharing L3 cache.  */
+  int threads_l3;
+
+  const struct cpu_features *cpu_features = __get_cpu_features ();
+  int max_cpuid = cpu_features->basic.max_cpuid;
+  unsigned int family = cpu_features->basic.family;
+  unsigned int model = cpu_features->basic.model;
+  long int shared = *shared_ptr;
+  unsigned int threads = *threads_ptr;
+  bool inclusive_cache = true;
+  bool support_count_mask = true;
+
+  /* Try L3 first.  */
+  unsigned int level = 3;
+
+  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
+    support_count_mask = false;
+
+  if (shared <= 0)
+    {
+      /* Try L2 otherwise.  */
+      level  = 2;
+      shared = core;
+      threads_l2 = 0;
+      threads_l3 = -1;
+    }
+  else
+    {
+      threads_l2 = 0;
+      threads_l3 = 0;
+    }
+
+  /* A value of 0 for the HTT bit indicates there is only a single
+     logical processor.  */
+  if (HAS_CPU_FEATURE (HTT))
+    {
+      /* Figure out the number of logical threads that share the
+         highest cache level.  */
+      if (max_cpuid >= 4)
+        {
+          int i = 0;
+
+          /* Query until cache level 2 and 3 are enumerated.  */
+          int check = 0x1 | (threads_l3 == 0) << 1;
+          do
+            {
+              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+
+              /* There seems to be a bug in at least some Pentium Ds
+                 which sometimes fail to iterate all cache parameters.
+                 Do not loop indefinitely here, stop in this case and
+                 assume there is no such information.  */
+              if (cpu_features->basic.kind == arch_kind_intel
+                  && (eax & 0x1f) == 0 )
+                goto intel_bug_no_cache_info;
+
+              switch ((eax >> 5) & 0x7)
+                {
+                  default:
+                    break;
+                  case 2:
+                    if ((check & 0x1))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L2 cache.  */
+                        threads_l2 = (eax >> 14) & 0x3ff;
+                        check &= ~0x1;
+                      }
+                    break;
+                  case 3:
+                    if ((check & (0x1 << 1)))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L3 cache.  */
+                        threads_l3 = (eax >> 14) & 0x3ff;
+
+                        /* Check if L2 and L3 caches are inclusive.  */
+                        inclusive_cache = (edx & 0x2) != 0;
+                        check &= ~(0x1 << 1);
+                      }
+                    break;
+                }
+            }
+          while (check);
+
+          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
+             numbers of addressable IDs for logical processors sharing
+             the cache, instead of the maximum number of threads
+             sharing the cache.  */
+          if (max_cpuid >= 11 && support_count_mask)
+            {
+              /* Find the number of logical processors shipped in
+                 one core and apply count mask.  */
+              i = 0;
+
+              /* Count SMT only if there is L3 cache.  Always count
+                 core if there is no L3 cache.  */
+              int count = ((threads_l2 > 0 && level == 3)
+                           | ((threads_l3 > 0
+                               || (threads_l2 > 0 && level == 2)) << 1));
+
+              while (count)
+                {
+                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+
+                  int shipped = ebx & 0xff;
+                  int type = ecx & 0xff00;
+                  if (shipped == 0 || type == 0)
+                    break;
+                  else if (type == 0x100)
+                    {
+                      /* Count SMT.  */
+                      if ((count & 0x1))
+                        {
+                          int count_mask;
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_l2));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_l2 = (shipped - 1) & count_mask;
+                          count &= ~0x1;
+                        }
+                    }
+                  else if (type == 0x200)
+                    {
+                      /* Count core.  */
+                      if ((count & (0x1 << 1)))
+                        {
+                          int count_mask;
+                          int threads_core
+                            = (level == 2 ? threads_l2 : threads_l3);
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_core));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_core = (shipped - 1) & count_mask;
+                          if (level == 2)
+                            threads_l2 = threads_core;
+                          else
+                            threads_l3 = threads_core;
+                          count &= ~(0x1 << 1);
+                        }
+                    }
+                }
+            }
+          if (threads_l2 > 0)
+            threads_l2 += 1;
+          if (threads_l3 > 0)
+            threads_l3 += 1;
+          if (level == 2)
+            {
+              if (threads_l2)
+                {
+                  threads = threads_l2;
+                  if (cpu_features->basic.kind == arch_kind_intel
+                      && threads > 2
+                      && family == 6)
+                    switch (model)
+                      {
+                        case 0x37:
+                        case 0x4a:
+                        case 0x4d:
+                        case 0x5a:
+                        case 0x5d:
+                          /* Silvermont has L2 cache shared by 2 cores.  */
+                          threads = 2;
+                          break;
+                        default:
+                          break;
+                      }
+                }
+            }
+          else if (threads_l3)
+            threads = threads_l3;
+        }
+      else
+        {
+intel_bug_no_cache_info:
+          /* Assume that all logical threads share the highest cache
+             level.  */
+          threads
+            = ((cpu_features->features[COMMON_CPUID_INDEX_1].cpuid.ebx
+                >> 16) & 0xff);
+        }
+
+        /* Cap usage of highest cache level to the number of supported
+           threads.  */
+        if (shared > 0 && threads > 0)
+          shared /= threads;
+    }
+
+  /* Account for non-inclusive L2 and L3 caches.  */
+  if (!inclusive_cache)
+    {
+      if (threads_l2 > 0)
+        core /= threads_l2;
+      shared += core;
+    }
+
+  *shared_ptr = shared;
+  *threads_ptr = threads;
+}
+
+static void
+dl_init_cacheinfo (struct cpu_features *cpu_features)
+{
+  /* Find out what brand of processor.  */
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+  int max_cpuid_ex;
+  long int data = -1;
+  long int shared = -1;
+  long int core;
+  unsigned int threads = 0;
+  unsigned long int level1_icache_size = -1;
+  unsigned long int level1_dcache_size = -1;
+  unsigned long int level1_dcache_assoc = -1;
+  unsigned long int level1_dcache_linesize = -1;
+  unsigned long int level2_cache_size = -1;
+  unsigned long int level2_cache_assoc = -1;
+  unsigned long int level2_cache_linesize = -1;
+  unsigned long int level3_cache_size = -1;
+  unsigned long int level3_cache_assoc = -1;
+  unsigned long int level3_cache_linesize = -1;
+  unsigned long int level4_cache_size = -1;
+
+  if (cpu_features->basic.kind == arch_kind_intel)
+    {
+      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
+      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
+      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
+
+      level1_icache_size
+	= handle_intel (_SC_LEVEL1_ICACHE_SIZE, cpu_features);
+      level1_dcache_size = data;
+      level1_dcache_assoc
+	= handle_intel (_SC_LEVEL1_DCACHE_ASSOC, cpu_features);
+      level1_dcache_linesize
+	= handle_intel (_SC_LEVEL1_DCACHE_LINESIZE, cpu_features);
+      level2_cache_size = core;
+      level2_cache_assoc
+	= handle_intel (_SC_LEVEL2_CACHE_ASSOC, cpu_features);
+      level2_cache_linesize
+	= handle_intel (_SC_LEVEL2_CACHE_LINESIZE, cpu_features);
+      level3_cache_size = shared;
+      level3_cache_assoc
+	= handle_intel (_SC_LEVEL3_CACHE_ASSOC, cpu_features);
+      level3_cache_linesize
+	= handle_intel (_SC_LEVEL3_CACHE_LINESIZE, cpu_features);
+      level4_cache_size
+	= handle_intel (_SC_LEVEL4_CACHE_SIZE, cpu_features);
+
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    {
+      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
+
+      level1_icache_size = handle_zhaoxin (_SC_LEVEL1_ICACHE_SIZE);
+      level1_dcache_size = data;
+      level1_dcache_assoc = handle_zhaoxin (_SC_LEVEL1_DCACHE_ASSOC);
+      level1_dcache_linesize = handle_zhaoxin (_SC_LEVEL1_DCACHE_LINESIZE);
+      level2_cache_size = core;
+      level2_cache_assoc = handle_zhaoxin (_SC_LEVEL2_CACHE_ASSOC);
+      level2_cache_linesize = handle_zhaoxin (_SC_LEVEL2_CACHE_LINESIZE);
+      level3_cache_size = shared;
+      level3_cache_assoc = handle_zhaoxin (_SC_LEVEL3_CACHE_ASSOC);
+      level3_cache_linesize = handle_zhaoxin (_SC_LEVEL3_CACHE_LINESIZE);
+
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_amd)
+    {
+      data  = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
+
+      level1_icache_size = handle_amd (_SC_LEVEL1_ICACHE_SIZE);
+      level1_dcache_size = data;
+      level1_dcache_assoc = handle_amd (_SC_LEVEL1_DCACHE_ASSOC);
+      level1_dcache_linesize = handle_amd (_SC_LEVEL1_DCACHE_LINESIZE);
+      level2_cache_size = core;
+      level2_cache_assoc = handle_amd (_SC_LEVEL2_CACHE_ASSOC);
+      level2_cache_linesize = handle_amd (_SC_LEVEL2_CACHE_LINESIZE);
+      level3_cache_size = shared;
+      level3_cache_assoc = handle_amd (_SC_LEVEL3_CACHE_ASSOC);
+      level3_cache_linesize = handle_amd (_SC_LEVEL3_CACHE_LINESIZE);
+
+      /* Get maximum extended function. */
+      __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
+
+      if (shared <= 0)
+	/* No shared L3 cache.  All we have is the L2 cache.  */
+	shared = core;
+      else
+	{
+	  /* Figure out the number of logical threads that share L3.  */
+	  if (max_cpuid_ex >= 0x80000008)
+	    {
+	      /* Get width of APIC ID.  */
+	      __cpuid (0x80000008, max_cpuid_ex, ebx, ecx, edx);
+	      threads = 1 << ((ecx >> 12) & 0x0f);
+	    }
+
+	  if (threads == 0)
+	    {
+	      /* If APIC ID width is not available, use logical
+		 processor count.  */
+	      __cpuid (0x00000001, max_cpuid_ex, ebx, ecx, edx);
+
+	      if ((edx & (1 << 28)) != 0)
+		threads = (ebx >> 16) & 0xff;
+	    }
+
+	  /* Cap usage of highest cache level to the number of
+	     supported threads.  */
+	  if (threads > 0)
+	    shared /= threads;
+
+	  /* Account for exclusive L2 and L3 caches.  */
+	  shared += core;
+	}
+    }
+
+  cpu_features->level1_icache_size = level1_icache_size;
+  cpu_features->level1_dcache_size = level1_dcache_size;
+  cpu_features->level1_dcache_assoc = level1_dcache_assoc;
+  cpu_features->level1_dcache_linesize = level1_dcache_linesize;
+  cpu_features->level2_cache_size = level2_cache_size;
+  cpu_features->level2_cache_assoc = level2_cache_assoc;
+  cpu_features->level2_cache_linesize = level2_cache_linesize;
+  cpu_features->level3_cache_size = level3_cache_size;
+  cpu_features->level3_cache_assoc = level3_cache_assoc;
+  cpu_features->level3_cache_linesize = level3_cache_linesize;
+  cpu_features->level4_cache_size = level4_cache_size;
+
+  /* The large memcpy micro benchmark in glibc shows that 6 times the
+     per-thread shared cache size is the approximate value above which
+     non-temporal stores become faster on an 8-core processor.  This is
+     3/4 of the total shared cache size.  */
+  unsigned long int non_temporal_threshold = (shared * threads * 3 / 4);
+
+#if HAVE_TUNABLES
+  /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
+  unsigned int minimum_rep_movsb_threshold;
+#endif
+  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
+  unsigned int rep_movsb_threshold;
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
+      && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
+    {
+      rep_movsb_threshold = 2048 * (64 / 16);
+#if HAVE_TUNABLES
+      minimum_rep_movsb_threshold = 64 * 8;
+#endif
+    }
+  else if (CPU_FEATURE_PREFERRED_P (cpu_features,
+				    AVX_Fast_Unaligned_Load))
+    {
+      rep_movsb_threshold = 2048 * (32 / 16);
+#if HAVE_TUNABLES
+      minimum_rep_movsb_threshold = 32 * 8;
+#endif
+    }
+  else
+    {
+      rep_movsb_threshold = 2048 * (16 / 16);
+#if HAVE_TUNABLES
+      minimum_rep_movsb_threshold = 16 * 8;
+#endif
+    }
+
+  /* The default threshold to use Enhanced REP STOSB.  */
+  unsigned long int rep_stosb_threshold = 2048;
+
+#if HAVE_TUNABLES
+  long int tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_data_cache_size, long int, NULL);
+  /* NB: Ignore the default value 0.  */
+  if (tunable_size)
+    data = tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_shared_cache_size, long int, NULL);
+  /* NB: Ignore the default value 0.  */
+  if (tunable_size)
+    shared = tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_non_temporal_threshold, long int, NULL);
+  /* NB: Ignore the default value 0.  */
+  if (tunable_size)
+    non_temporal_threshold = tunable_size;
+
+  tunable_size = TUNABLE_GET (x86_rep_movsb_threshold, long int, NULL);
+  if (tunable_size > minimum_rep_movsb_threshold)
+    rep_movsb_threshold = tunable_size;
+
+  /* NB: The default value of the x86_rep_stosb_threshold tunable is the
+     same as the default value of __x86_rep_stosb_threshold and the
+     minimum value is fixed.  */
+  rep_stosb_threshold = TUNABLE_GET (x86_rep_stosb_threshold,
+				     long int, NULL);
+
+  TUNABLE_SET_ALL (x86_data_cache_size, long int, data,
+		   0, (long int) -1);
+  TUNABLE_SET_ALL (x86_shared_cache_size, long int, shared,
+		   0, (long int) -1);
+  TUNABLE_SET_ALL (x86_non_temporal_threshold, long int,
+		   non_temporal_threshold, 0, (long int) -1);
+  TUNABLE_SET_ALL (x86_rep_movsb_threshold, long int,
+		   rep_movsb_threshold, minimum_rep_movsb_threshold,
+		   (long int) -1);
+  TUNABLE_SET_ALL (x86_rep_stosb_threshold, long int,
+		   rep_stosb_threshold, 1, (long int) -1);
+#endif
+
+  cpu_features->data_cache_size = data;
+  cpu_features->shared_cache_size = shared;
+  cpu_features->non_temporal_threshold = non_temporal_threshold;
+  cpu_features->rep_movsb_threshold = rep_movsb_threshold;
+  cpu_features->rep_stosb_threshold = rep_stosb_threshold;
+}
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 6551df19c0..b45b45c1c5 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -19,6 +19,7 @@
 #include <cpuid.h>
 #include <cpu-features.h>
 #include <dl-hwcap.h>
+#include <init-arch.h>
 #include <libc-pointer-arith.h>
 
 #if HAVE_TUNABLES
@@ -37,6 +38,8 @@ extern void TUNABLE_CALLBACK (set_x86_shstk) (tunable_val_t *)
 # endif
 #endif
 
+#include <cpu-cacheinfo.c>
+
 #if CET_ENABLED
 # include <dl-cet.h>
 #endif
@@ -630,24 +633,14 @@ no_cpuid:
   cpu_features->basic.model = model;
   cpu_features->basic.stepping = stepping;
 
+  dl_init_cacheinfo (cpu_features);
+
 #if HAVE_TUNABLES
   TUNABLE_GET (hwcaps, tunable_val_t *, TUNABLE_CALLBACK (set_hwcaps));
-  cpu_features->non_temporal_threshold
-    = TUNABLE_GET (x86_non_temporal_threshold, long int, NULL);
-  cpu_features->rep_movsb_threshold
-    = TUNABLE_GET (x86_rep_movsb_threshold, long int, NULL);
-  cpu_features->rep_stosb_threshold
-    = TUNABLE_GET (x86_rep_stosb_threshold, long int, NULL);
-  cpu_features->data_cache_size
-    = TUNABLE_GET (x86_data_cache_size, long int, NULL);
-  cpu_features->shared_cache_size
-    = TUNABLE_GET (x86_shared_cache_size, long int, NULL);
-#endif
-
-  /* Reuse dl_platform, dl_hwcap and dl_hwcap_mask for x86.  */
-#if !HAVE_TUNABLES && defined SHARED
-  /* The glibc.cpu.hwcap_mask tunable is initialized already, so no need to do
-     this.  */
+#elif defined SHARED
+  /* Reuse dl_platform, dl_hwcap and dl_hwcap_mask for x86.  The
+     glibc.cpu.hwcap_mask tunable is initialized already, so no
+     need to do this.  */
   GLRO(dl_hwcap_mask) = HWCAP_IMPORTANT;
 #endif
 
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index f62be0b9b3..3f3bd93320 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -153,6 +153,28 @@ struct cpu_features
   unsigned long int rep_movsb_threshold;
   /* Threshold to use "rep stosb".  */
   unsigned long int rep_stosb_threshold;
+  /* _SC_LEVEL1_ICACHE_SIZE.  */
+  unsigned long int level1_icache_size;
+  /* _SC_LEVEL1_DCACHE_SIZE.  */
+  unsigned long int level1_dcache_size;
+  /* _SC_LEVEL1_DCACHE_ASSOC.  */
+  unsigned long int level1_dcache_assoc;
+  /* _SC_LEVEL1_DCACHE_LINESIZE.  */
+  unsigned long int level1_dcache_linesize;
+  /* _SC_LEVEL2_CACHE_SIZE.  */
+  unsigned long int level2_cache_size;
+  /* _SC_LEVEL2_CACHE_ASSOC.  */
+  unsigned long int level2_cache_assoc;
+  /* _SC_LEVEL2_CACHE_LINESIZE.  */
+  unsigned long int level2_cache_linesize;
+  /* _SC_LEVEL3_CACHE_SIZE.  */
+  unsigned long int level3_cache_size;
+  /* _SC_LEVEL3_CACHE_ASSOC.  */
+  unsigned long int level3_cache_assoc;
+  /* _SC_LEVEL3_CACHE_LINESIZE.  */
+  unsigned long int level3_cache_linesize;
+  /* _SC_LEVEL4_CACHE_SIZE.  */
+  unsigned long int level4_cache_size;
 };
 
 # if defined (_LIBC) && !IS_IN (nonlib)
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/4] ld.so: Add --list-tunables to print tunable values
  2020-09-18 16:07 V2 [PATCH 0/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
                   ` (2 preceding siblings ...)
  2020-09-18 16:07 ` [PATCH 3/4] x86: Move x86 processor cache info to cpu_features H.J. Lu via Libc-alpha
@ 2020-09-18 16:07 ` H.J. Lu via Libc-alpha
  2020-09-21  8:25   ` Florian Weimer via Libc-alpha
  3 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-18 16:07 UTC (permalink / raw)
  To: libc-alpha; +Cc: Florian Weimer

Pass --list-tunables to ld.so to print tunables with min and max values.
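
For example (output abbreviated; the full sample output appears in the
manual/tunables.texi change below):

  $ /lib64/ld-linux-x86-64.so.2 --list-tunables
  glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
  glibc.malloc.perturb: 0 (min: 0, max: 255)
  ...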
---
 NEWS                 |  2 ++
 elf/Makefile         |  8 ++++++++
 elf/dl-tunables.c    | 36 ++++++++++++++++++++++++++++++++++++
 elf/dl-tunables.h    |  2 ++
 elf/rtld.c           | 31 +++++++++++++++++++++++++++++--
 manual/tunables.texi | 37 +++++++++++++++++++++++++++++++++++++
 6 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index fc8dd15439..f47516a02f 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,8 @@ Version 2.33
 
 Major new features:
 
+* Pass --list-tunables to ld.so to print tunable values.
+
 * The mallinfo2 function is added to report statistics as per mallinfo,
   but with larger field widths to accurately report values that are
   larger than fit in an integer.
diff --git a/elf/Makefile b/elf/Makefile
index 0b78721848..11e90c9d17 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -44,6 +44,10 @@ dl-routines += dl-tunables
 tunables-type = $(addprefix TUNABLES_FRONTEND_,$(have-tunables))
 CPPFLAGS-dl-tunables.c += -DTUNABLES_FRONTEND=$(tunables-type)
 
+ifeq (yesyes,$(build-shared)$(run-built-tests))
+tests-special += $(objpfx)list-tunables.out
+endif
+
 # Make sure that the compiler does not insert any library calls in tunables
 # code paths.
 ifeq (yes,$(have-loop-to-function))
@@ -1796,3 +1800,7 @@ $(objpfx)tst-tls-ie-dlmopen.out: \
   $(objpfx)tst-tls-ie-mod6.so
 
 $(objpfx)tst-tls-surplus: $(libdl)
+
+$(objpfx)list-tunables.out: $(objpfx)ld.so
+	$(objpfx)ld.so --list-tunables > $@; \
+	$(evaluate-test)
diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index b44174fe71..53226ef258 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -368,6 +368,42 @@ __tunables_init (char **envp)
     }
 }
 
+void
+__tunables_print (void)
+{
+  for (int i = 0; i < sizeof (tunable_list) / sizeof (tunable_t); i++)
+    {
+      tunable_t *cur = &tunable_list[i];
+      _dl_printf ("%s: ", cur->name);
+      switch (cur->type.type_code)
+	{
+	case TUNABLE_TYPE_INT_32:
+	  _dl_printf ("%d (min: %d, max: %d)\n",
+		      (int) cur->val.numval,
+		      (int) cur->type.min,
+		      (int) cur->type.max);
+	  break;
+	case TUNABLE_TYPE_UINT_64:
+	  _dl_printf ("0x%lx (min: 0x%lx, max: 0x%lx)\n",
+		      (long int) cur->val.numval,
+		      (long int) cur->type.min,
+		      (long int) cur->type.max);
+	  break;
+	case TUNABLE_TYPE_SIZE_T:
+	  _dl_printf ("0x%Zx (min: 0x%Zx, max: 0x%Zx)\n",
+		      (size_t) cur->val.numval,
+		      (size_t) cur->type.min,
+		      (size_t) cur->type.max);
+	  break;
+	case TUNABLE_TYPE_STRING:
+	  _dl_printf ("%s\n", cur->val.strval ? cur->val.strval : "");
+	  break;
+	default:
+	  __builtin_unreachable ();
+	}
+    }
+}
+
 /* Set the tunable value.  This is called by the module that the tunable exists
    in. */
 void
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index b1add3184f..a2c728f0dd 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -69,9 +69,11 @@ typedef struct _tunable tunable_t;
 # include "dl-tunable-list.h"
 
 extern void __tunables_init (char **);
+extern void __tunables_print (void);
 extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t);
 extern void __tunable_set_val (tunable_id_t, void *, void *, void *);
 rtld_hidden_proto (__tunables_init)
+rtld_hidden_proto (__tunables_print)
 rtld_hidden_proto (__tunable_get_val)
 rtld_hidden_proto (__tunable_set_val)
 
diff --git a/elf/rtld.c b/elf/rtld.c
index 5b882163fa..f76b9b0265 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -48,6 +48,10 @@
 #include <array_length.h>
 #include <libc-early-init.h>
 
+#if HAVE_TUNABLES
+# include <dl-tunables.h>
+#endif
+
 #include <assert.h>
 
 /* Only enables rtld profiling for architectures which provides non generic
@@ -168,7 +172,7 @@ static void audit_list_add_dynamic_tag (struct audit_list *,
 static const char *audit_list_next (struct audit_list *);
 
 /* This is a list of all the modes the dynamic loader can be in.  */
-enum mode { normal, list, verify, trace };
+enum mode { normal, list, verify, trace, list_tunables };
 
 /* Process all environments variables the dynamic linker must recognize.
    Since all of them start with `LD_' we are a bit smarter while finding
@@ -1263,9 +1267,27 @@ dl_main (const ElfW(Phdr) *phdr,
 	    _dl_argc -= 2;
 	    _dl_argv += 2;
 	  }
+#if HAVE_TUNABLES
+	else if (! strcmp (_dl_argv[1], "--list-tunables"))
+	  {
+	    mode = list_tunables;
+
+	    ++_dl_skip_args;
+	    --_dl_argc;
+	    ++_dl_argv;
+	  }
+#endif
 	else
 	  break;
 
+#if HAVE_TUNABLES
+      if (__builtin_expect (mode, normal) == list_tunables)
+	{
+	  __tunables_print ();
+	  _exit (0);
+	}
+#endif
+
       /* If we have no further argument the program was called incorrectly.
 	 Grant the user some education.  */
       if (_dl_argc < 2)
@@ -1292,7 +1314,12 @@ of this helper program; chances are you did not intend to run this program.\n\
   --inhibit-rpath LIST  ignore RUNPATH and RPATH information in object names\n\
 			in LIST\n\
   --audit LIST          use objects named in LIST as auditors\n\
-  --preload LIST        preload objects named in LIST\n");
+  --preload LIST        preload objects named in LIST\n"
+#if HAVE_TUNABLES
+"\
+  --list-tunables       list all tunables with minimum and maximum values\n"
+#endif
+	  );
 
       ++_dl_skip_args;
       --_dl_argc;
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 23ef0d40e7..38c8578229 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -28,6 +28,43 @@ Finally, the set of tunables available may vary between distributions as
 the tunables feature allows distributions to add their own tunables under
 their own namespace.
 
+Pass @option{--list-tunables} to the dynamic loader to print all
+tunables with their minimum and maximum values:
+
+@example
+$ /lib64/ld-linux-x86-64.so.2 --list-tunables
+glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
+glibc.elision.skip_lock_after_retries: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.perturb: 0 (min: 0, max: 255)
+glibc.cpu.x86_shared_cache_size: 0x100000 (min: 0x0, max: 0xffffffffffffffff)
+glibc.elision.tries: 3 (min: -2147483648, max: 2147483647)
+glibc.elision.enable: 0 (min: 0, max: 1)
+glibc.cpu.x86_rep_movsb_threshold: 0x800 (min: 0x100, max: 0xffffffffffffffff)
+glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.elision.skip_lock_busy: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_rep_stosb_threshold: 0x800 (min: 0x1, max: 0xffffffffffffffff)
+glibc.cpu.x86_non_temporal_threshold: 0x600000 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_shstk: 
+glibc.cpu.hwcap_mask: 0x6 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.mmap_max: 0 (min: -2147483648, max: 2147483647)
+glibc.elision.skip_trylock_internal_abort: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_ibt: 
+glibc.cpu.hwcaps: 
+glibc.elision.skip_lock_internal_abort: 3 (min: -2147483648, max: 2147483647)
+glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0xffffffffffffffff)
+glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.cpu.x86_data_cache_size: 0x8000 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0xffffffffffffffff)
+glibc.pthread.mutex_spin_count: 100 (min: 0, max: 32767)
+glibc.rtld.optional_static_tls: 0x200 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0xffffffffffffffff)
+glibc.malloc.check: 0 (min: 0, max: 3)
+@end example
+
 @menu
 * Tunable names::  The structure of a tunable name
 * Memory Allocation Tunables::  Tunables in the memory allocation subsystem
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 4/4] ld.so: Add --list-tunables to print tunable values
  2020-09-18 16:07 ` [PATCH 4/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
@ 2020-09-21  8:25   ` Florian Weimer via Libc-alpha
  0 siblings, 0 replies; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-21  8:25 UTC (permalink / raw)
  To: H.J. Lu; +Cc: libc-alpha

* H. J. Lu:

> Pass --list-tunables to ld.so to print tunables with min and max values.

The code changes look good to me.  Not sure about the NEWS entry and
what conventions we want to use there.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-18 16:07 ` [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu via Libc-alpha
@ 2020-09-28 13:08   ` Florian Weimer via Libc-alpha
  2020-09-28 13:48     ` H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-28 13:08 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha

* H. J. Lu via Libc-alpha:

> X86 CPU features in ld.so are initialized by init_cpu_features, which is
> invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> loaded by static executable, DL_PLATFORM_INIT is never called.  Also
> x86 cache info in libc.o and libc.a is initialized by a constructor
> which may be called too late.  Since _rtld_global_ro in ld.so is
> initialized by dynamic relocation,  we can also initialize x86 CPU
> features in _rtld_global_ro in ld.so and cache info in libc.so by
> initializing dummy function pointers in ld.so and libc.so via IFUNC
> relocation.

_rtld_global_ro is *partially* initialized by relocation.  Most of it is
not initialized (see the need for rtld_active).

Please make this a little bit clearer in the commit message.

> diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> index 0f08079e48..5e22e795cc 100644
> --- a/sysdeps/i386/dl-machine.h
> +++ b/sysdeps/i386/dl-machine.h
> @@ -25,7 +25,6 @@
>  #include <sysdep.h>
>  #include <tls.h>
>  #include <dl-tlsdesc.h>
> -#include <cpu-features.c>
>  
>  /* Return nonzero iff ELF header is compatible with the running host.  */
>  static inline int __attribute__ ((unused))
> @@ -250,7 +249,7 @@ dl_platform_init (void)
>  #if IS_IN (rtld)
>    /* init_cpu_features has been called early from __libc_start_main in
>       static executable.  */
> -  init_cpu_features (&GLRO(dl_x86_cpu_features));
> +  _dl_x86_init_cpu_features ();

Is the comment outdated?

> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index 217c21c34f..7a325ab70e 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c

> +  assert (cpu_features->basic.kind != arch_kind_unknown);

Why doesn't this assert fire occasionally?  How do you ensure that
relocation processing is correctly ordered?

> +/* NB: Call init_cacheinfo by initializing a dummy function pointer via
> +   IFUNC relocation.  */
> +extern void __x86_cacheinfo (void) attribute_hidden;
> +const void (*__x86_cacheinfo_p) (void) attribute_hidden
> +  = __x86_cacheinfo;
> +
> +__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
>  #endif

Please expand the comment and mention that is used to initialize the
dormant copy of ld.so after static dlopen.  The comment in
sysdeps/x86/dl-get-cpu-features.c is good.

> diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
> index 5f9e46b0c6..da4697b895 100644
> --- a/sysdeps/x86/dl-get-cpu-features.c
> +++ b/sysdeps/x86/dl-get-cpu-features.c
> @@ -1,4 +1,4 @@
> -/* This file is part of the GNU C Library.
> +/* Initialize CPU feature data via IFUNC relocation.
>     Copyright (C) 2015-2020 Free Software Foundation, Inc.
>  
>     The GNU C Library is free software; you can redistribute it and/or
> @@ -18,6 +18,29 @@
>  
>  #include <ldsodefs.h>
>  
> +#ifdef SHARED
> +# include <cpu-features.c>
> +
> +/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
> +   CPU features.  But when loading ld.so inside of static executable,
> +   DL_PLATFORM_INIT isn't called.  Call init_cpu_features by initializing
> +   a dummy function pointer via IFUNC relocation for ld.so.  */
> +extern void __x86_cpu_features (void) attribute_hidden;
> +const void (*__x86_cpu_features_p) (void) attribute_hidden
> +  = __x86_cpu_features;
> +
> +void
> +_dl_x86_init_cpu_features (void)
> +{
> +  struct cpu_features *cpu_features = __get_cpu_features ();
> +  if (cpu_features->basic.kind == arch_kind_unknown)
> +    init_cpu_features (cpu_features);
> +}
> +
> +__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
> +	 _dl_x86_init_cpu_features);
> +#endif
> +
>  #undef __x86_get_cpu_features

Why do we need both the conditional check and the function pointer hack?

I expect that one of the function pointers can go, probably the one
here.  The cache hierarchy data might be used by a string function that
has not been selected by IFUNC.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] Set tunable value as well as min/max values
  2020-09-18 16:07 ` [PATCH 2/4] Set tunable value as well as min/max values H.J. Lu via Libc-alpha
@ 2020-09-28 13:35   ` Florian Weimer via Libc-alpha
  2020-09-28 13:53     ` H.J. Lu via Libc-alpha
  2020-09-28 17:30     ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 2 replies; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-28 13:35 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha; +Cc: Siddhesh Poyarekar

* H. J. Lu via Libc-alpha:

> Some tunable values and their minimum/maximum values must be determined
> at run-time.  Add TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL to update
> tunable value together with minimum and maximum values.  __tunable_set_val
> is updated to set tunable value as well as min/max values.

I'm not sure if this change is philosophically correct as far as the
tunables framework is concerned.  I had thought the limits should be
something static, so that they are consistent across systems.

Maybe Siddhesh can comment on that aspect?

What is supposed to happen if you specify an out-of-range value?

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 13:08   ` Florian Weimer via Libc-alpha
@ 2020-09-28 13:48     ` H.J. Lu via Libc-alpha
  2020-09-28 14:05       ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-28 13:48 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Mon, Sep 28, 2020 at 6:08 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > X86 CPU features in ld.so are initialized by init_cpu_features, which is
> > invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
> > loaded by static executable, DL_PLATFORM_INIT is never called.  Also
> > x86 cache info in libc.o and libc.a is initialized by a constructor
> > which may be called too late.  Since _rtld_global_ro in ld.so is
> > initialized by dynamic relocation,  we can also initialize x86 CPU
> > features in _rtld_global_ro in ld.so and cache info in libc.so by
> > initializing dummy function pointers in ld.so and libc.so via IFUNC
> > relocation.
>
> _rtld_global_ro is *partially* initialized by relocation.  Most of it is
> not initialized (see the need for rtld_active).
>
> Please make this a little bit clearer in the commit message.

Will do

> > diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
> > index 0f08079e48..5e22e795cc 100644
> > --- a/sysdeps/i386/dl-machine.h
> > +++ b/sysdeps/i386/dl-machine.h
> > @@ -25,7 +25,6 @@
> >  #include <sysdep.h>
> >  #include <tls.h>
> >  #include <dl-tlsdesc.h>
> > -#include <cpu-features.c>
> >
> >  /* Return nonzero iff ELF header is compatible with the running host.  */
> >  static inline int __attribute__ ((unused))
> > @@ -250,7 +249,7 @@ dl_platform_init (void)
> >  #if IS_IN (rtld)
> >    /* init_cpu_features has been called early from __libc_start_main in
> >       static executable.  */
> > -  init_cpu_features (&GLRO(dl_x86_cpu_features));
> > +  _dl_x86_init_cpu_features ();
>
> Is the comment outdated?

I will fix it.

> > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > index 217c21c34f..7a325ab70e 100644
> > --- a/sysdeps/x86/cacheinfo.c
> > +++ b/sysdeps/x86/cacheinfo.c
>
> > +  assert (cpu_features->basic.kind != arch_kind_unknown);
>
> Why doesn't this assert fire occasionally?  How do you ensure that

See

https://sourceware.org/bugzilla/show_bug.cgi?id=26203

It only happens in dlopen from a static executable.

> relocation processing is correctly ordered?

cpu_features is also initialized by IFUNC relocation in ld.so which
is relocated before libc.so.

> > +/* NB: Call init_cacheinfo by initializing a dummy function pointer via
> > +   IFUNC relocation.  */
> > +extern void __x86_cacheinfo (void) attribute_hidden;
> > +const void (*__x86_cacheinfo_p) (void) attribute_hidden
> > +  = __x86_cacheinfo;
> > +
> > +__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
> >  #endif
>
> Please expand the comment and mention that is used to initialize the
> dormant copy of ld.so after static dlopen.  The comment in
> sysdeps/x86/dl-get-cpu-features.c is good.

Will do.

> > diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
> > index 5f9e46b0c6..da4697b895 100644
> > --- a/sysdeps/x86/dl-get-cpu-features.c
> > +++ b/sysdeps/x86/dl-get-cpu-features.c
> > @@ -1,4 +1,4 @@
> > -/* This file is part of the GNU C Library.
> > +/* Initialize CPU feature data via IFUNC relocation.
> >     Copyright (C) 2015-2020 Free Software Foundation, Inc.
> >
> >     The GNU C Library is free software; you can redistribute it and/or
> > @@ -18,6 +18,29 @@
> >
> >  #include <ldsodefs.h>
> >
> > +#ifdef SHARED
> > +# include <cpu-features.c>
> > +
> > +/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
> > +   CPU features.  But when loading ld.so inside of static executable,
> > +   DL_PLATFORM_INIT isn't called.  Call init_cpu_features by initializing
> > +   a dummy function pointer via IFUNC relocation for ld.so.  */
> > +extern void __x86_cpu_features (void) attribute_hidden;
> > +const void (*__x86_cpu_features_p) (void) attribute_hidden
> > +  = __x86_cpu_features;
> > +
> > +void
> > +_dl_x86_init_cpu_features (void)
> > +{
> > +  struct cpu_features *cpu_features = __get_cpu_features ();
> > +  if (cpu_features->basic.kind == arch_kind_unknown)
> > +    init_cpu_features (cpu_features);
> > +}
> > +
> > +__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
> > +      _dl_x86_init_cpu_features);
> > +#endif
> > +
> >  #undef __x86_get_cpu_features
>
> Why do we need both the conditional check and the function pointer hack?

Because _dl_x86_init_cpu_features is called both indirectly and by IFUNC
reloc in a dynamic executable, but only by IFUNC reloc when ld.so is
loaded via dlopen from a static executable.

> I expect that one of the function pointers can go, probably the one
> here.  The cache hierarchy data might be used by a string function that
> has not been selected by IFUNC.
>

There is one IFUNC reloc in ld.so and another in libc.so.  We need
both.

-- 
H.J.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] Set tunable value as well as min/max values
  2020-09-28 13:35   ` Florian Weimer via Libc-alpha
@ 2020-09-28 13:53     ` H.J. Lu via Libc-alpha
  2020-09-28 14:03       ` Florian Weimer via Libc-alpha
  2020-09-28 17:30     ` Siddhesh Poyarekar via Libc-alpha
  1 sibling, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-28 13:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Siddhesh Poyarekar, H.J. Lu via Libc-alpha

On Mon, Sep 28, 2020 at 6:35 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Some tunable values and their minimum/maximum values must be determined
> > at run-time.  Add TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL to update
> > tunable value together with minimum and maximum values.  __tunable_set_val
> > is updated to set tunable value as well as min/max values.
>
> I'm not sure if this change is philosophically correct as far as the
> tunables framework is concerned.  I had thought the limits should be
> something static, so that they are consistent across systems.

Some x86 tunable ranges are dynamic.

> Maybe Siddhesh can comment on that aspect?
>
> What is supposed to happen if you specify an out-of-range value?

It should be rejected.  Otherwise programs will crash.

-- 
H.J.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] Set tunable value as well as min/max values
  2020-09-28 13:53     ` H.J. Lu via Libc-alpha
@ 2020-09-28 14:03       ` Florian Weimer via Libc-alpha
  0 siblings, 0 replies; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-28 14:03 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Siddhesh Poyarekar, H.J. Lu via Libc-alpha

* H. J. Lu:

> On Mon, Sep 28, 2020 at 6:35 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu via Libc-alpha:
>>
>> > Some tunable values and their minimum/maximum values must be determined
>> > at run-time.  Add TUNABLE_SET_ALL and TUNABLE_SET_ALL_FULL to update
>> > tunable value together with minimum and maximum values.  __tunable_set_val
>> > is updated to set tunable value as well as min/max values.
>>
>> I'm not sure if this change is philosophically correct as far as the
>> tunables framework is concerned.  I had thought the limits should be
>> something static, so that they are consistent across systems.
>
> Some x86 tunable ranges are dynamic.
>
>> Maybe Siddhesh can comment on that aspect?
>>
>> What is supposed to happen if you specify an out-of-range value?
>
> It should be rejected.  Otherwise programs will crash.

You could still do this outside the tunables framework, I think.

Let's wait for Siddhesh's comments.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 13:48     ` H.J. Lu via Libc-alpha
@ 2020-09-28 14:05       ` Florian Weimer via Libc-alpha
  2020-09-28 14:20         ` H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-28 14:05 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

>> > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
>> > index 217c21c34f..7a325ab70e 100644
>> > --- a/sysdeps/x86/cacheinfo.c
>> > +++ b/sysdeps/x86/cacheinfo.c
>>
>> > +  assert (cpu_features->basic.kind != arch_kind_unknown);
>>
>> Why doesn't this assert fire occasionally?  How do you ensure that
>
> See
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=26203
>
> It only happens in dlopen from a static executable.

Sorry, I don't understand how this answers my question.

Do you mean that for the non-static case, initialization has already
happened?

>> relocation processing is correctly ordered?
>
> cpu_features is also initialized by IFUNC relocation in ld.so which
> is relocated before libc.so.

Is that really true in all cases?  Even if libc.so is preloaded?
(Static dlopen probably ignores LD_PRELOAD.)

Maybe put this information as a comment next to the assert?

But since cacheinfo.os is linked into libc.so, I don't really think the
assert is correct.

>> > diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
>> > index 5f9e46b0c6..da4697b895 100644
>> > --- a/sysdeps/x86/dl-get-cpu-features.c
>> > +++ b/sysdeps/x86/dl-get-cpu-features.c
>> > @@ -1,4 +1,4 @@
>> > -/* This file is part of the GNU C Library.
>> > +/* Initialize CPU feature data via IFUNC relocation.
>> >     Copyright (C) 2015-2020 Free Software Foundation, Inc.
>> >
>> >     The GNU C Library is free software; you can redistribute it and/or
>> > @@ -18,6 +18,29 @@
>> >
>> >  #include <ldsodefs.h>
>> >
>> > +#ifdef SHARED
>> > +# include <cpu-features.c>
>> > +
>> > +/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
>> > +   CPU features.  But when loading ld.so inside of static executable,
>> > +   DL_PLATFORM_INIT isn't called.  Call init_cpu_features by initializing
>> > +   a dummy function pointer via IFUNC relocation for ld.so.  */
>> > +extern void __x86_cpu_features (void) attribute_hidden;
>> > +const void (*__x86_cpu_features_p) (void) attribute_hidden
>> > +  = __x86_cpu_features;
>> > +
>> > +void
>> > +_dl_x86_init_cpu_features (void)
>> > +{
>> > +  struct cpu_features *cpu_features = __get_cpu_features ();
>> > +  if (cpu_features->basic.kind == arch_kind_unknown)
>> > +    init_cpu_features (cpu_features);
>> > +}
>> > +
>> > +__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
>> > +      _dl_x86_init_cpu_features);
>> > +#endif
>> > +
>> >  #undef __x86_get_cpu_features
>>
>> Why do we need both the conditional check and the function pointer hack?
>
> Because _dl_x86_init_cpu_features is called both indirectly and by IFUNC
> reloc in dynamic executable, but it is only called by IFUNC reloc when
> dlopen in static executable.

I think we always need to call it eventually, as a dependency of filling
in the cacheinfo data?

>> I expect that one of the function pointers can go, probably the one
>> here.  The cache hierarchy data might be used by a string function that
>> has not been selected by IFUNC.
>>
>
> There are one IFUNC reloc in ld.so and the other in libc.so.  We need
> both.

libc.so should not need the relocation hack because we have
__libc_early_init, which is also called after static dlopen and before
constructors.
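
As a rough sketch (placement and linkage details to be worked out;
init_cacheinfo would have to become visible outside cacheinfo.c):

  void
  __libc_early_init (_Bool initial)
  {
    /* ... existing early initialization ... */

    /* Fill in the cache data here instead of relying on an IFUNC
       relocation in libc.so.  */
    init_cacheinfo ();
  }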

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 14:05       ` Florian Weimer via Libc-alpha
@ 2020-09-28 14:20         ` H.J. Lu via Libc-alpha
  2020-09-28 14:22           ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-28 14:20 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Mon, Sep 28, 2020 at 7:05 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> >> > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> >> > index 217c21c34f..7a325ab70e 100644
> >> > --- a/sysdeps/x86/cacheinfo.c
> >> > +++ b/sysdeps/x86/cacheinfo.c
> >>
> >> > +  assert (cpu_features->basic.kind != arch_kind_unknown);
> >>
> >> Why doesn't this assert fire occasionally?  How do you ensure that
> >
> > See
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=26203
> >
> > It only happens in dlopen from a static executable.
>
> Sorry, I don't understand how this answers my question.
>
> Do you mean that for the non-static case, initialization has already
> happened?

Yes, it is initialized by DL_PLATFORM_INIT in a dynamic executable.

> >> relocation processing is correctly ordered?
> >
> > cpu_features is also initialized by IFUNC relocation in ld.so which
> > is relocated before libc.so.
>
> Is that really true in all cases?  Even if libc.so is preloaded?

Not if init_cacheinfo is a constructor function.

> (Static dlopen probably ignores LD_PRELOAD.)

Correct.

> Maybe put this information as a comment next to the assert?

Will do.

> But since cacheinfo.os is linked into libc.so, I don't really think the
> assert is correct.

When init_cacheinfo is called, cpu_features must be initialized.

> >> > diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
> >> > index 5f9e46b0c6..da4697b895 100644
> >> > --- a/sysdeps/x86/dl-get-cpu-features.c
> >> > +++ b/sysdeps/x86/dl-get-cpu-features.c
> >> > @@ -1,4 +1,4 @@
> >> > -/* This file is part of the GNU C Library.
> >> > +/* Initialize CPU feature data via IFUNC relocation.
> >> >     Copyright (C) 2015-2020 Free Software Foundation, Inc.
> >> >
> >> >     The GNU C Library is free software; you can redistribute it and/or
> >> > @@ -18,6 +18,29 @@
> >> >
> >> >  #include <ldsodefs.h>
> >> >
> >> > +#ifdef SHARED
> >> > +# include <cpu-features.c>
> >> > +
> >> > +/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
> >> > +   CPU features.  But when loading ld.so inside of static executable,
> >> > +   DL_PLATFORM_INIT isn't called.  Call init_cpu_features by initializing
> >> > +   a dummy function pointer via IFUNC relocation for ld.so.  */
> >> > +extern void __x86_cpu_features (void) attribute_hidden;
> >> > +const void (*__x86_cpu_features_p) (void) attribute_hidden
> >> > +  = __x86_cpu_features;
> >> > +
> >> > +void
> >> > +_dl_x86_init_cpu_features (void)
> >> > +{
> >> > +  struct cpu_features *cpu_features = __get_cpu_features ();
> >> > +  if (cpu_features->basic.kind == arch_kind_unknown)
> >> > +    init_cpu_features (cpu_features);
> >> > +}
> >> > +
> >> > +__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
> >> > +      _dl_x86_init_cpu_features);
> >> > +#endif
> >> > +
> >> >  #undef __x86_get_cpu_features
> >>
> >> Why do we need both the conditional check and the function pointer hack?
> >
> > Because _dl_x86_init_cpu_features is called both indirectly and by IFUNC
> > reloc in dynamic executable, but it is only called by IFUNC reloc when
> > dlopen in static executable.
>
> I think we always need to call it eventually, as a dependency of filling
> in the cacheinfo data?

Yes.  _dl_x86_init_cpu_features is called twice in a dynamic
executable, by DL_PLATFORM_INIT and by IFUNC reloc.  It is called
once, by IFUNC reloc, via dlopen in a static executable.

> >> I expect that one of the function pointers can go, probably the one
> >> here.  The cache hierarchy data might be used by a string function that
> >> has not been selected by IFUNC.
> >>
> >
> > There are one IFUNC reloc in ld.so and the other in libc.so.  We need
> > both.
>
> libc.so should not need the relocation hack because we have
> __libc_early_init, which is also called after static dlopen and before
> constructors.
>

We want to call init_cacheinfo as early as possible.  __libc_early_init is
still too late.

-- 
H.J.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 14:20         ` H.J. Lu via Libc-alpha
@ 2020-09-28 14:22           ` Florian Weimer via Libc-alpha
  2020-09-28 14:39             ` H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-28 14:22 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

> We want to call init_cacheinfo as early as possible.  __libc_early_init is
> still too late.

My point is that we should call it from IFUNC resolvers that need it,
and from __libc_early_init.  That should cover all cases, no?

It would also settle the matter of the assert.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 14:22           ` Florian Weimer via Libc-alpha
@ 2020-09-28 14:39             ` H.J. Lu via Libc-alpha
  2020-09-28 14:47               ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-28 14:39 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Mon, Sep 28, 2020 at 7:22 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > We want to call init_cacheinfo as early as possible.  __libc_early_init is
> > still too late.
>
> My point is that we should call it from IFUNC resolvers that need it,
> and from __libc_early_init.  That should cover all cases, no?

Why call it from __libc_early_init after it has been called by
IFUNC reloc?  IFUNC relocations are processed before
__libc_early_init is called.

> It would also settle the matter of the assert.
>
>

-- 
H.J.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 14:39             ` H.J. Lu via Libc-alpha
@ 2020-09-28 14:47               ` Florian Weimer via Libc-alpha
  2020-09-28 17:54                 ` V3 [PATCH] " H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-28 14:47 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

> On Mon, Sep 28, 2020 at 7:22 AM Florian Weimer <fweimer@redhat.com> wrote:
>>
>> * H. J. Lu:
>>
>> > We want to call init_cacheinfo as early as possible.  __libc_early_init is
>> > still too late.
>>
>> My point is that we should call it from IFUNC resolvers that need it,
>> and from __libc_early_init.  That should cover all cases, no?
>
> Why call it from __libc_early_init after it has been called by
> IFUNC reloc?  IFUNC relocations are processed before
> __libc_early_init is called.

IFUNC relocations might not exist in a --disable-multi-arch build,
but it may still need the cacheinfo data if the hard-coded
implementations need them.

We would still run the artificial IFUNC resolver via its function
pointer, but:

My concern is that you seem to have a specific ordering dependency on
IFUNC resolvers, and I would like to get rid of that: Initialize the
necessary data on demand (for string function selection), and during
__libc_early_init for potential use from string functions.
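
Something along these lines (sketch only; the resolver name is made
up):

  static void *
  memcpy_ifunc_resolver (void)
  {
    /* On-demand: _dl_x86_init_cpu_features is idempotent, so every
       resolver that needs the CPU data can call it safely.  */
    _dl_x86_init_cpu_features ();
    /* ... select and return an implementation ... */
    return NULL;
  }

plus one call from __libc_early_init for the non-IFUNC consumers.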

I hope this clarifies what I'm looking for.  Please let me know if this
is not reasonable.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/4] Set tunable value as well as min/max values
  2020-09-28 13:35   ` Florian Weimer via Libc-alpha
  2020-09-28 13:53     ` H.J. Lu via Libc-alpha
@ 2020-09-28 17:30     ` Siddhesh Poyarekar via Libc-alpha
  2020-09-29  4:00       ` V3 [PATCH] " H.J. Lu via Libc-alpha
  1 sibling, 1 reply; 33+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2020-09-28 17:30 UTC (permalink / raw)
  To: Florian Weimer, H.J. Lu via Libc-alpha

On 28/09/20 19:05, Florian Weimer via Libc-alpha wrote:
> I'm not sure if this change is philosophically correct as far as the
> tunables framework is concerned.  I had thought the limits should be
> something static, so that they are consistent across systems.

It seems like a good idea to support dynamic limits if they will always
be more restrictive than the most restrictive static limit one could
come up with for the tunable.  I didn't exclude dynamic limits from a
design perspective; it's just that the tunables implemented at that time
didn't need them.

There is a case to always have static bounds (at the minimum to ensure
that values don't overflow the underlying types) but that shouldn't
preclude more restrictive dynamic limits IMO.
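
Concretely, something along these lines (untested; tunable_t and the
type.min/type.max fields are as in elf/dl-tunables.h, narrow_bounds is
a made-up name):

    /* Narrow the static range with run-time bounds; never widen it.  */
    static void
    narrow_bounds (tunable_t *cur, int64_t min, int64_t max)
    {
      if (min > cur->type.min && min <= cur->type.max)
        cur->type.min = min;
      if (max < cur->type.max && max >= cur->type.min)
        cur->type.max = max;
    }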

Bikeshed: maybe the macro should be called TUNABLE_SET_WITH_BOUNDS()
instead of TUNABLE_SET_ALL.

Siddhesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* V3 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 14:47               ` Florian Weimer via Libc-alpha
@ 2020-09-28 17:54                 ` H.J. Lu via Libc-alpha
  2020-09-29  7:53                   ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-28 17:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1425 bytes --]

On Mon, Sep 28, 2020 at 7:47 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > On Mon, Sep 28, 2020 at 7:22 AM Florian Weimer <fweimer@redhat.com> wrote:
> >>
> >> * H. J. Lu:
> >>
> >> > We want to call init_cacheinfo as early as possible.  __libc_early_init is
> >> > still too late.
> >>
> >> My point is that we should call it from IFUNC resolvers that need it,
> >> and from __libc_early_init.  That should cover all cases, no?
> >
> > Why call it from __libc_early_init after it has been called by
> > IFUNC reloc?  IFUNC relocations are processed before
> > __libc_early_init is called.
>
> IFUNC relocations might not exist in a --disable-multi-arch build,
> but libc may still need the cacheinfo data if the hard-coded
> implementations use it.

IFUNC is always supported on x86.  multi-arch uses IFUNC, not
the other way around.
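
For example, this builds and runs its resolver during relocation
whether or not glibc is configured with multi-arch (the names are
illustrative):

  static void
  do_init (void)
  {
  }

  /* ld.so invokes the resolver while processing the IFUNC relocation,
     before any user code runs; the side effect is the early init.  */
  static __typeof__ (do_init) *
  the_resolver (void)
  {
    do_init ();
    return do_init;
  }

  void dummy_marker (void) __attribute__ ((ifunc ("the_resolver")));

Nothing ever has to call dummy_marker; processing its relocation is
what triggers the initialization.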

> We would still run the IFUNC resolver for the artificial IFUNC symbol
> with its function pointer, but:
>
> My concern is that you seem to have a specific ordering dependency on
> IFUNC resolvers, and I would like to get rid of that: Initialize the
> necessary data on demand (for string function selection), and during
> __libc_early_init for potential use from string functions.
>
> I hope this clarifies what I'm looking for.  Please let me know if this
> is not reasonable.
>

Here is the updated patch.  Does it address your concerns?

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-x86-Initialize-CPU-info-via-IFUNC-relocation-BZ-2620.patch --]
[-- Type: application/x-patch, Size: 13398 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* V3 [PATCH] Set tunable value as well as min/max values
  2020-09-28 17:30     ` Siddhesh Poyarekar via Libc-alpha
@ 2020-09-29  4:00       ` H.J. Lu via Libc-alpha
  2020-09-29  4:45         ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-29  4:00 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]

On Mon, Sep 28, 2020 at 10:31 AM Siddhesh Poyarekar via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On 28/09/20 19:05, Florian Weimer via Libc-alpha wrote:
> > I'm not sure if this change is philosophically correct as far as the
> > tunables framework is concerned.  I had thought the limits should be
> > something static, so that they are consistent across systems.
>
> It seems like a good idea to support dynamic limits if they will always
> be more restrictive than the most restrictive static limit one could
> come up with for the tunable.  I didn't exclude dynamic limits from a
> design perspective; it's just that the tunables implemented at that time
> didn't need them.
>
> There is a case to always have static bounds (at the minimum to ensure
> that values don't overflow the underlying types) but that shouldn't
> preclude more restrictive dynamic limits IMO.
>
> Bikeshed: maybe the macro should be called TUNABLE_SET_WITH_BOUNDS()
> instead of TUNABLE_SET_ALL.
>

Here is the updated patch with TUNABLE_SET_WITH_BOUNDS.

OK for master?

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-Set-tunable-value-as-well-as-min-max-values.patch --]
[-- Type: text/x-patch, Size: 6250 bytes --]

From 34c463f37f46abcb2279196def9925e9797e9ede Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 1 Jun 2020 14:11:32 -0700
Subject: [PATCH] Set tunable value as well as min/max values

Some tunable values and their minimum/maximum values must be determined
at run-time.  Add TUNABLE_SET_WITH_BOUNDS and TUNABLE_SET_WITH_BOUNDS_FULL
to update tunable value together with minimum and maximum values.
__tunable_set_val is updated to set tunable value as well as min/max
values.
---
 elf/dl-tunables.c      | 17 ++++++++++++-----
 elf/dl-tunables.h      | 21 +++++++++++++++++++--
 manual/README.tunables | 24 ++++++++++++++++++++++--
 3 files changed, 53 insertions(+), 9 deletions(-)

diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 26e6e26612..b44174fe71 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -101,12 +101,19 @@ get_next_env (char **envp, char **name, size_t *namelen, char **val,
 })
 
 static void
-do_tunable_update_val (tunable_t *cur, const void *valp)
+do_tunable_update_val (tunable_t *cur, const void *valp,
+		       const void *minp, const void *maxp)
 {
   uint64_t val;
 
   if (cur->type.type_code != TUNABLE_TYPE_STRING)
-    val = *((int64_t *) valp);
+    {
+      val = *((int64_t *) valp);
+      if (minp)
+	cur->type.min = *((int64_t *) minp);
+      if (maxp)
+	cur->type.max = *((int64_t *) maxp);
+    }
 
   switch (cur->type.type_code)
     {
@@ -153,15 +160,15 @@ tunable_initialize (tunable_t *cur, const char *strval)
       cur->initialized = true;
       valp = strval;
     }
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, NULL, NULL);
 }
 
 void
-__tunable_set_val (tunable_id_t id, void *valp)
+__tunable_set_val (tunable_id_t id, void *valp, void *minp, void *maxp)
 {
   tunable_t *cur = &tunable_list[id];
 
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, minp, maxp);
 }
 
 #if TUNABLES_FRONTEND == TUNABLES_FRONTEND_valstring
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index f05eb50c2f..550b0cc7f4 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -70,9 +70,10 @@ typedef struct _tunable tunable_t;
 
 extern void __tunables_init (char **);
 extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t);
-extern void __tunable_set_val (tunable_id_t, void *);
+extern void __tunable_set_val (tunable_id_t, void *, void *, void *);
 rtld_hidden_proto (__tunables_init)
 rtld_hidden_proto (__tunable_get_val)
+rtld_hidden_proto (__tunable_set_val)
 
 /* Define TUNABLE_GET and TUNABLE_SET in short form if TOP_NAMESPACE and
    TUNABLE_NAMESPACE are defined.  This is useful shorthand to get and set
@@ -82,11 +83,18 @@ rtld_hidden_proto (__tunable_get_val)
   TUNABLE_GET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __cb)
 # define TUNABLE_SET(__id, __type, __val) \
   TUNABLE_SET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __val)
+# define TUNABLE_SET_WITH_BOUNDS(__id, __type, __val, __min, __max) \
+  TUNABLE_SET_WITH_BOUNDS_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, \
+				__type, __val, __min, __max)
 #else
 # define TUNABLE_GET(__top, __ns, __id, __type, __cb) \
   TUNABLE_GET_FULL (__top, __ns, __id, __type, __cb)
 # define TUNABLE_SET(__top, __ns, __id, __type, __val) \
   TUNABLE_SET_FULL (__top, __ns, __id, __type, __val)
+# define TUNABLE_SET_WITH_BOUNDS(__top, __ns, __id, __type, __val, \
+				 __min, __max) \
+  TUNABLE_SET_WITH_BOUNDS_FULL (__top, __ns, __id, __type, __val, \
+				__min, __max)
 #endif
 
 /* Get and return a tunable value.  If the tunable was set externally and __CB
@@ -103,7 +111,16 @@ rtld_hidden_proto (__tunable_get_val)
 # define TUNABLE_SET_FULL(__top, __ns, __id, __type, __val) \
 ({									      \
   __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
-			& (__type) {__val});				      \
+		     & (__type) {__val}, NULL, NULL);			      \
+})
+
+/* Set a tunable value together with min/max values.  */
+# define TUNABLE_SET_WITH_BOUNDS_FULL(__top, __ns, __id, __type, __val,	      \
+				      __min, __max)			      \
+({									      \
+  __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
+		     & (__type) {__val},  & (__type) {__min},		      \
+		     & (__type) {__max});				      \
 })
 
 /* Namespace sanity for callback functions.  Use this macro to keep the
diff --git a/manual/README.tunables b/manual/README.tunables
index f87a31a65e..fff6c2a87e 100644
--- a/manual/README.tunables
+++ b/manual/README.tunables
@@ -67,7 +67,7 @@ The list of allowed attributes are:
 				     non-AT_SECURE subprocesses.
 			NONE: Read all the time.
 
-2. Use TUNABLE_GET/TUNABLE_SET to get and set tunables.
+2. Use TUNABLE_GET/TUNABLE_SET/TUNABLE_SET_WITH_BOUNDS to get and set tunables.
 
 3. OPTIONAL: If tunables in a namespace are being used multiple times within a
    specific module, set the TUNABLE_NAMESPACE macro to reduce the amount of
@@ -112,9 +112,29 @@ form of the macros as follows:
 where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
 remaining arguments are the same as the short form macros.
 
+The minimum and maximum values can be updated together with the tunable value
+using:
+
+  TUNABLE_SET_WITH_BOUNDS (check, int32_t, val, min, max)
+
+where 'check' is the tunable name, 'int32_t' is the C type of the tunable,
+'val' is a value of the same type, and 'min' and 'max' are the minimum and maximum
+values of the tunable.
+
+To set the minimum and maximum values of tunables in a different namespace
+from that module, use the full form of the macros as follows:
+
+  val = TUNABLE_GET_FULL (glibc, cpu, hwcap_mask, uint64_t, NULL)
+
+  TUNABLE_SET_WITH_BOUNDS_FULL (glibc, cpu, hwcap_mask, uint64_t, val, min, max)
+
+where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
+remaining arguments are the same as the short form macros.
+
 When TUNABLE_NAMESPACE is not defined in a module, TUNABLE_GET is equivalent to
 TUNABLE_GET_FULL, so you will need to provide full namespace information for
-both macros.  Likewise for TUNABLE_SET and TUNABLE_SET_FULL.
+both macros.  Likewise for TUNABLE_SET, TUNABLE_SET_FULL,
+TUNABLE_SET_WITH_BOUNDS and TUNABLE_SET_WITH_BOUNDS_FULL.
 
 ** IMPORTANT NOTE **
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: V3 [PATCH] Set tunable value as well as min/max values
  2020-09-29  4:00       ` V3 [PATCH] " H.J. Lu via Libc-alpha
@ 2020-09-29  4:45         ` Siddhesh Poyarekar via Libc-alpha
  2020-09-29  4:47           ` Siddhesh Poyarekar
  0 siblings, 1 reply; 33+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2020-09-29  4:45 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

On 29/09/20 09:30, H.J. Lu via Libc-alpha wrote:
> Here is the updated patch with TUNABLE_SET_WITH_BOUNDS.
> 

> @@ -101,12 +101,19 @@ get_next_env (char **envp, char **name, size_t *namelen, char **val,
>  })
>  
>  static void
> -do_tunable_update_val (tunable_t *cur, const void *valp)
> +do_tunable_update_val (tunable_t *cur, const void *valp,
> +		       const void *minp, const void *maxp)
>  {
>    uint64_t val;
>  
>    if (cur->type.type_code != TUNABLE_TYPE_STRING)
> -    val = *((int64_t *) valp);
> +    {
> +      val = *((int64_t *) valp);
> +      if (minp)
> +	cur->type.min = *((int64_t *) minp);
> +      if (maxp)
> +	cur->type.max = *((int64_t *) maxp);
> +    }
>  
>    switch (cur->type.type_code)
>      {
> @@ -153,15 +160,15 @@ tunable_initialize (tunable_t *cur, con

There should be a check here to ensure that the bounds do not exceed
statically set bounds.  That is:

    if (minp != NULL && cur->type.min < *((int64_t *) minp))
      cur->type.min = *((int64_t *) minp);
    if (maxp != NULL && cur->type.max > *((int64_t *) maxp))
      cur->type.max = *((int64_t *) maxp);

Siddhesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: V3 [PATCH] Set tunable value as well as min/max values
  2020-09-29  4:45         ` Siddhesh Poyarekar via Libc-alpha
@ 2020-09-29  4:47           ` Siddhesh Poyarekar
  2020-09-29 12:30             ` V4 " H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Siddhesh Poyarekar @ 2020-09-29  4:47 UTC (permalink / raw)
  To: Siddhesh Poyarekar, H.J. Lu; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

(Sorry, replying to myself)

On 29/09/20 10:15, Siddhesh Poyarekar via Libc-alpha wrote:
> There should be a check here to ensure that the bounds do not exceed
> statically set bounds.  That is:
> 
>     if (minp != NULL && cur->type.min < *((int64_t *) minp))
>       cur->type.min = *((int64_t *) minp);
>     if (maxp != NULL && cur->type.max > *((int64_t *) maxp))
>       cur->type.max = *((int64_t *) maxp);

Also check for min > max type invalid ranges.
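
i.e. on top of the above, something like:

    if (minp != NULL && maxp != NULL
        && *((int64_t *) minp) > *((int64_t *) maxp))
      return;  /* Reject inverted ranges outright.  */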

Siddhesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: V3 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-28 17:54                 ` V3 [PATCH] " H.J. Lu via Libc-alpha
@ 2020-09-29  7:53                   ` Florian Weimer via Libc-alpha
  2020-09-29 11:44                     ` H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-09-29  7:53 UTC (permalink / raw)
  To: H.J. Lu via Libc-alpha

* H. J. Lu via Libc-alpha:

> Here is the updated patch.  Does it address your concerns?

“git am” fails to apply it for me:

Applying: x86: Initialize CPU info via IFUNC relocation [BZ 26203]
error: corrupt patch at line 41
Patch failed at 0001 x86: Initialize CPU info via IFUNC relocation [BZ 26203]

The patch program does not like it, either:

patching file sysdeps/i386/dl-machine.h
patching file sysdeps/x86/cacheinfo.c
patch: **** malformed patch at line 73: @@ -770,6 +769,8 @@ init_cacheinfo (void)

I don't think it's the Red Hat mail system that's at fault this time. 8-/

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: V3 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-29  7:53                   ` Florian Weimer via Libc-alpha
@ 2020-09-29 11:44                     ` H.J. Lu via Libc-alpha
  2020-10-01  8:46                       ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-29 11:44 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

[-- Attachment #1: Type: text/plain, Size: 798 bytes --]

On Tue, Sep 29, 2020 at 12:53 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu via Libc-alpha:
>
> > Here is the updated patch.  Does it address your concerns?
>
> “git am” fails to apply it for me:
>
> Applying: x86: Initialize CPU info via IFUNC relocation [BZ 26203]
> error: corrupt patch at line 41
> Patch failed at 0001 x86: Initialize CPU info via IFUNC relocation [BZ 26203]
>
> The patch program does not like it, either:
>
> patching file sysdeps/i386/dl-machine.h
> patching file sysdeps/x86/cacheinfo.c
> patch: **** malformed patch at line 73: @@ -770,6 +769,8 @@ init_cacheinfo (void)
>
> I don't think it's the Red Hat mail system that's at fault this time. 8-/
>

Here is the updated patch with the correct commit log.

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-x86-Initialize-CPU-info-via-IFUNC-relocation-BZ-2620.patch --]
[-- Type: text/x-patch, Size: 8732 bytes --]

From 8ee884ce6c1a52c3b88159ecdef3f8401743ad3c Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 4 Jul 2020 06:35:49 -0700
Subject: [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]

X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.

Key points:

1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch.  Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocations
have been performed.
3. The x86 CPU features in ld.so are initialized by DL_PLATFORM_INIT
in dynamic executables and by IFUNC relocation in dlopen in static
executables.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
---
 sysdeps/i386/dl-machine.h          |  7 +++----
 sysdeps/x86/cacheinfo.c            | 21 +++++++++++++++++++--
 sysdeps/x86/cpu-features.c         | 12 +++++++++++-
 sysdeps/x86/dl-get-cpu-features.c  | 27 ++++++++++++++++++++++++++-
 sysdeps/x86/include/cpu-features.h |  1 +
 sysdeps/x86/libc-start.c           |  2 +-
 sysdeps/x86_64/dl-machine.h        |  7 +++----
 7 files changed, 64 insertions(+), 13 deletions(-)

diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 0f08079e48..bdc21d1a3c 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -25,7 +25,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -248,9 +247,9 @@ static inline void __attribute__ ((unused))
 dl_platform_init (void)
 {
 #if IS_IN (rtld)
-  /* init_cpu_features has been called early from __libc_start_main in
-     static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  /* _dl_x86_init_cpu_features is a wrapper for init_cpu_features which
+     has been called early from __libc_start_main in static executable.  */
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index dadec5d58f..65ab29123d 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -16,7 +16,9 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#if IS_IN (libc)
+/* NB: In libc.a, this file is included in libc-static.c.  In libc.so,
+   this file is standalone.  */
+#if IS_IN (libc) && (defined SHARED || defined _PRIVATE_CPU_FEATURES_H)
 
 #include <assert.h>
 #include <stdbool.h>
@@ -756,7 +758,6 @@ intel_bug_no_cache_info:
 
 
 static void
-__attribute__((constructor))
 init_cacheinfo (void)
 {
   /* Find out what brand of processor.  */
@@ -770,6 +771,12 @@ init_cacheinfo (void)
   unsigned int threads = 0;
   const struct cpu_features *cpu_features = __get_cpu_features ();
 
+  /* NB: In libc.so, cpu_features is defined in ld.so and is initialized
+     by DL_PLATFORM_INIT or IFUNC relocation before init_cacheinfo is
+     called by IFUNC relocation.  In libc.a, init_cacheinfo is called
+     from init_cpu_features by ARCH_INIT_CPU_FEATURES.  */
+  assert (cpu_features->basic.kind != arch_kind_unknown);
+
   if (cpu_features->basic.kind == arch_kind_intel)
     {
       data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
@@ -900,4 +907,14 @@ init_cacheinfo (void)
 # endif
 }
 
+# ifdef SHARED
+/* NB: In libc.so, call init_cacheinfo by initializing a dummy function
+   pointer via IFUNC relocation after CPU features in ld.so have been
+   initialized by DL_PLATFORM_INIT or IFUNC relocation.  */
+extern void __x86_cacheinfo (void) attribute_hidden;
+const void (*__x86_cacheinfo_p) (void) attribute_hidden
+  = __x86_cacheinfo;
+
+__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
+# endif
 #endif
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 6551df19c0..874ce9c886 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -16,7 +16,12 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#include <cpuid.h>
+#ifdef SHARED
+/* NB: Workaround the GCC <cpuid.h> bug:
+	https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96238
+ */
+# include <cpuid.h>
+#endif
 #include <cpu-features.h>
 #include <dl-hwcap.h>
 #include <libc-pointer-arith.h>
@@ -746,4 +751,9 @@ no_cpuid:
 # endif
     }
 #endif
+
+#ifndef SHARED
+  /* NB: In libc.a, call init_cacheinfo.  */
+  init_cacheinfo ();
+#endif
 }
diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
index 5f9e46b0c6..349472d99f 100644
--- a/sysdeps/x86/dl-get-cpu-features.c
+++ b/sysdeps/x86/dl-get-cpu-features.c
@@ -1,4 +1,4 @@
-/* This file is part of the GNU C Library.
+/* Initialize CPU feature data via IFUNC relocation.
    Copyright (C) 2015-2020 Free Software Foundation, Inc.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,6 +18,31 @@
 
 #include <ldsodefs.h>
 
+#ifdef SHARED
+# include <cpu-features.c>
+
+/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
+   CPU features in dynamic executable.  But when loading ld.so inside of
+   static executable, DL_PLATFORM_INIT isn't called and IFUNC relocation
+   is used to call init_cpu_features.  In static executable, it is called
+   once by IFUNC relocation.  In dynamic executable, it is called twice
+   by DL_PLATFORM_INIT and by IFUNC relocation.  */
+extern void __x86_cpu_features (void) attribute_hidden;
+const void (*__x86_cpu_features_p) (void) attribute_hidden
+  = __x86_cpu_features;
+
+void
+_dl_x86_init_cpu_features (void)
+{
+  struct cpu_features *cpu_features = __get_cpu_features ();
+  if (cpu_features->basic.kind == arch_kind_unknown)
+    init_cpu_features (cpu_features);
+}
+
+__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
+	 _dl_x86_init_cpu_features);
+#endif
+
 #undef __x86_get_cpu_features
 
 const struct cpu_features *
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index dcf29b6fe8..f62be0b9b3 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -159,6 +159,7 @@ struct cpu_features
 /* Unused for x86.  */
 #  define INIT_ARCH()
 #  define __x86_get_cpu_features(max) (&GLRO(dl_x86_cpu_features))
+extern void _dl_x86_init_cpu_features (void) attribute_hidden;
 # endif
 
 # ifdef __x86_64__
diff --git a/sysdeps/x86/libc-start.c b/sysdeps/x86/libc-start.c
index 875bb93e55..4f72fcf397 100644
--- a/sysdeps/x86/libc-start.c
+++ b/sysdeps/x86/libc-start.c
@@ -20,7 +20,7 @@
    PIE.  */
 # include <startup.h>
 # include <ldsodefs.h>
-# include <cpu-features.h>
+# include <cacheinfo.c>
 # include <cpu-features.c>
 
 extern struct cpu_features _dl_x86_cpu_features;
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ca73d8fef9..bb93c7c6ab 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -26,7 +26,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -223,9 +222,9 @@ static inline void __attribute__ ((unused))
 dl_platform_init (void)
 {
 #if IS_IN (rtld)
-  /* init_cpu_features has been called early from __libc_start_main in
-     static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  /* _dl_x86_init_cpu_features is a wrapper for init_cpu_features which
+     has been called early from __libc_start_main in static executable.  */
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* V4 [PATCH] Set tunable value as well as min/max values
  2020-09-29  4:47           ` Siddhesh Poyarekar
@ 2020-09-29 12:30             ` H.J. Lu via Libc-alpha
  2020-09-29 13:50               ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-29 12:30 UTC (permalink / raw)
  To: Siddhesh Poyarekar
  Cc: Florian Weimer, Siddhesh Poyarekar, H.J. Lu via Libc-alpha

[-- Attachment #1: Type: text/plain, Size: 662 bytes --]

On Mon, Sep 28, 2020 at 9:47 PM Siddhesh Poyarekar <siddhesh@gotplt.org> wrote:
>
> (Sorry, replying to myself)
>
> On 29/09/20 10:15, Siddhesh Poyarekar via Libc-alpha wrote:
> > There should be a check here to ensure that the bounds do not exceed
> > statically set bounds.  That is:
> >
> >     if (minp != NULL && cur->type.min < *((int64_t *) minp))
> >       cur->type.min = *((int64_t *) minp);
> >     if (maxp != NULL && cur->type.max > *((int64_t *) maxp))
> >       cur->type.max = *((int64_t *) maxp);
>
> Also check for min > max type invalid ranges.
>

Here is the updated patch with TUNABLE_SET_BOUNDS_IF_VALID.

OK for master?

Thanks.

-- 
H.J.

[-- Attachment #2: 0001-Set-tunable-value-as-well-as-min-max-values.patch --]
[-- Type: text/x-patch, Size: 7319 bytes --]

From aa49996f8e233b081d511441d1f1b557a7fd498f Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 1 Jun 2020 14:11:32 -0700
Subject: [PATCH] Set tunable value as well as min/max values

Some tunable values and their minimum/maximum values must be determined
at run-time.  Add TUNABLE_SET_WITH_BOUNDS and TUNABLE_SET_WITH_BOUNDS_FULL
to update tunable value together with minimum and maximum values.
__tunable_set_val is updated to set tunable value as well as min/max
values.
---
 elf/dl-tunables.c      | 33 +++++++++++++++++++++++++++++----
 elf/dl-tunables.h      | 21 +++++++++++++++++++--
 manual/README.tunables | 24 ++++++++++++++++++++++--
 3 files changed, 70 insertions(+), 8 deletions(-)

diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 26e6e26612..6e8ca36fef 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -100,8 +100,30 @@ get_next_env (char **envp, char **name, size_t *namelen, char **val,
     }									      \
 })
 
+#define TUNABLE_SET_BOUNDS_IF_VALID(__cur, __maxp, __minp, __type)	      \
+({									      \
+  __type min = __minp ? *((__type *) __minp) : (__cur)->type.min;	      \
+  __type max = __maxp ? *((__type *) __maxp) : (__cur)->type.max;	      \
+  if (__minp)								      \
+    {									      \
+      if (__maxp)							      \
+	{								      \
+	  if (max > min)						      \
+	    {								      \
+	      (__cur)->type.min = min;					      \
+	      (__cur)->type.max = max;					      \
+	    }								      \
+	}								      \
+      else if (max > min)						      \
+	(__cur)->type.min = min;					      \
+    }									      \
+  else if (max > min)							      \
+    (__cur)->type.max = min;						      \
+})
+
 static void
-do_tunable_update_val (tunable_t *cur, const void *valp)
+do_tunable_update_val (tunable_t *cur, const void *valp,
+		       const void *minp, const void *maxp)
 {
   uint64_t val;
 
@@ -112,16 +134,19 @@ do_tunable_update_val (tunable_t *cur, const void *valp)
     {
     case TUNABLE_TYPE_INT_32:
 	{
+	  TUNABLE_SET_BOUNDS_IF_VALID (cur, minp, maxp, int64_t);
 	  TUNABLE_SET_VAL_IF_VALID_RANGE (cur, val, int64_t);
 	  break;
 	}
     case TUNABLE_TYPE_UINT_64:
 	{
+	  TUNABLE_SET_BOUNDS_IF_VALID (cur, minp, maxp, uint64_t);
 	  TUNABLE_SET_VAL_IF_VALID_RANGE (cur, val, uint64_t);
 	  break;
 	}
     case TUNABLE_TYPE_SIZE_T:
 	{
+	  TUNABLE_SET_BOUNDS_IF_VALID (cur, minp, maxp, uint64_t);
 	  TUNABLE_SET_VAL_IF_VALID_RANGE (cur, val, uint64_t);
 	  break;
 	}
@@ -153,15 +178,15 @@ tunable_initialize (tunable_t *cur, const char *strval)
       cur->initialized = true;
       valp = strval;
     }
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, NULL, NULL);
 }
 
 void
-__tunable_set_val (tunable_id_t id, void *valp)
+__tunable_set_val (tunable_id_t id, void *valp, void *minp, void *maxp)
 {
   tunable_t *cur = &tunable_list[id];
 
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, minp, maxp);
 }
 
 #if TUNABLES_FRONTEND == TUNABLES_FRONTEND_valstring
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index f05eb50c2f..550b0cc7f4 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -70,9 +70,10 @@ typedef struct _tunable tunable_t;
 
 extern void __tunables_init (char **);
 extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t);
-extern void __tunable_set_val (tunable_id_t, void *);
+extern void __tunable_set_val (tunable_id_t, void *, void *, void *);
 rtld_hidden_proto (__tunables_init)
 rtld_hidden_proto (__tunable_get_val)
+rtld_hidden_proto (__tunable_set_val)
 
 /* Define TUNABLE_GET and TUNABLE_SET in short form if TOP_NAMESPACE and
    TUNABLE_NAMESPACE are defined.  This is useful shorthand to get and set
@@ -82,11 +83,18 @@ rtld_hidden_proto (__tunable_get_val)
   TUNABLE_GET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __cb)
 # define TUNABLE_SET(__id, __type, __val) \
   TUNABLE_SET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __val)
+# define TUNABLE_SET_WITH_BOUNDS(__id, __type, __val, __min, __max) \
+  TUNABLE_SET_WITH_BOUNDS_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, \
+				__type, __val, __min, __max)
 #else
 # define TUNABLE_GET(__top, __ns, __id, __type, __cb) \
   TUNABLE_GET_FULL (__top, __ns, __id, __type, __cb)
 # define TUNABLE_SET(__top, __ns, __id, __type, __val) \
   TUNABLE_SET_FULL (__top, __ns, __id, __type, __val)
+# define TUNABLE_SET_WITH_BOUNDS(__top, __ns, __id, __type, __val, \
+				 __min, __max) \
+  TUNABLE_SET_WITH_BOUNDS_FULL (__top, __ns, __id, __type, __val, \
+				__min, __max)
 #endif
 
 /* Get and return a tunable value.  If the tunable was set externally and __CB
@@ -103,7 +111,16 @@ rtld_hidden_proto (__tunable_get_val)
 # define TUNABLE_SET_FULL(__top, __ns, __id, __type, __val) \
 ({									      \
   __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
-			& (__type) {__val});				      \
+		     & (__type) {__val}, NULL, NULL);			      \
+})
+
+/* Set a tunable value together with min/max values.  */
+# define TUNABLE_SET_WITH_BOUNDS_FULL(__top, __ns, __id, __type, __val,	      \
+				      __min, __max)			      \
+({									      \
+  __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
+		     & (__type) {__val},  & (__type) {__min},		      \
+		     & (__type) {__max});				      \
 })
 
 /* Namespace sanity for callback functions.  Use this macro to keep the
diff --git a/manual/README.tunables b/manual/README.tunables
index f87a31a65e..fff6c2a87e 100644
--- a/manual/README.tunables
+++ b/manual/README.tunables
@@ -67,7 +67,7 @@ The list of allowed attributes are:
 				     non-AT_SECURE subprocesses.
 			NONE: Read all the time.
 
-2. Use TUNABLE_GET/TUNABLE_SET to get and set tunables.
+2. Use TUNABLE_GET/TUNABLE_SET/TUNABLE_SET_WITH_BOUNDS to get and set tunables.
 
 3. OPTIONAL: If tunables in a namespace are being used multiple times within a
    specific module, set the TUNABLE_NAMESPACE macro to reduce the amount of
@@ -112,9 +112,29 @@ form of the macros as follows:
 where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
 remaining arguments are the same as the short form macros.
 
+The minimum and maximum values can be updated together with the tunable value
+using:
+
+  TUNABLE_SET_WITH_BOUNDS (check, int32_t, val, min, max)
+
+where 'check' is the tunable name, 'int32_t' is the C type of the tunable,
+'val' is a value of the same type, and 'min' and 'max' are the minimum and maximum
+values of the tunable.
+
+To set the minimum and maximum values of tunables in a different namespace
+from that module, use the full form of the macros as follows:
+
+  val = TUNABLE_GET_FULL (glibc, cpu, hwcap_mask, uint64_t, NULL)
+
+  TUNABLE_SET_WITH_BOUNDS_FULL (glibc, cpu, hwcap_mask, uint64_t, val, min, max)
+
+where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
+remaining arguments are the same as the short form macros.
+
 When TUNABLE_NAMESPACE is not defined in a module, TUNABLE_GET is equivalent to
 TUNABLE_GET_FULL, so you will need to provide full namespace information for
-both macros.  Likewise for TUNABLE_SET and TUNABLE_SET_FULL.
+both macros.  Likewise for TUNABLE_SET, TUNABLE_SET_FULL,
+TUNABLE_SET_WITH_BOUNDS and TUNABLE_SET_WITH_BOUNDS_FULL.
 
 ** IMPORTANT NOTE **
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: V4 [PATCH] Set tunable value as well as min/max values
  2020-09-29 12:30             ` V4 " H.J. Lu via Libc-alpha
@ 2020-09-29 13:50               ` Siddhesh Poyarekar via Libc-alpha
  2020-09-29 14:54                 ` V5 " H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2020-09-29 13:50 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

On 29/09/20 18:00, H.J. Lu wrote:
> Here is the updated patch with TUNABLE_SET_BOUNDS_IF_VALID.
> 
> OK for master?

I don't think TUNABLE_SET_BOUNDS_IF_VALID is doing what it intends to do.

> +#define TUNABLE_SET_BOUNDS_IF_VALID(__cur, __maxp, __minp, __type)	      \

minp and maxp are switched around.

> +({									      \
> +  __type min = __minp ? *((__type *) __minp) : (__cur)->type.min;	      \
> +  __type max = __maxp ? *((__type *) __maxp) : (__cur)->type.max;	      \
> +  if (__minp)								      \
> +    {									      \
> +      if (__maxp)							      \
> +	{								      \
> +	  if (max > min)						      \
> +	    {								      \
> +	      (__cur)->type.min = min;					      \
> +	      (__cur)->type.max = max;					      \
> +	    }								      \

When both minp and maxp are specified, it checks that they're sane with
respect to each other, but it should also check whether the range they
describe is a *subset* of [(__cur)->type.min, (__cur)->type.max].

> +	}								      \
> +      else if (max > min)						      \
> +	(__cur)->type.min = min;					      \

You also need to make sure that (__cur)->type.min < min to ensure a more
restrictive range.

> +    }									      \
> +  else if (max > min)							      \
> +    (__cur)->type.max = min;						      \

I did not understand this one.

Basically, this is what it should look like:

    if (__minp != NULL
        && *__minp <= *__maxp
        && *__minp >= (__cur)->type.min
        && *__minp <= (__cur)->type.max)
      (__cur)->type.min = *__minp;

    if (__maxp != NULL
        && *__minp <= *__maxp
        && *__maxp >= (__cur)->type.min
        && *__maxp <= (__cur)->type.max)
      (__cur)->type.max = *__maxp;

You could collapse some of these conditions if we assume that maxp and
minp are always either NULL or not NULL together.
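
With that assumption, the two checks collapse into one (sketch):

    if (__minp != NULL      /* then __maxp != NULL as well */
        && *__minp <= *__maxp
        && *__minp >= (__cur)->type.min
        && *__maxp <= (__cur)->type.max)
      {
        (__cur)->type.min = *__minp;
        (__cur)->type.max = *__maxp;
      }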

Siddhesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* V5 [PATCH] Set tunable value as well as min/max values
  2020-09-29 13:50               ` Siddhesh Poyarekar via Libc-alpha
@ 2020-09-29 14:54                 ` H.J. Lu via Libc-alpha
  2020-09-29 15:58                   ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-09-29 14:54 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

[-- Attachment #1: Type: text/plain, Size: 3020 bytes --]

On Tue, Sep 29, 2020 at 6:50 AM Siddhesh Poyarekar
<siddhesh@sourceware.org> wrote:
>
> On 29/09/20 18:00, H.J. Lu wrote:
> > Here is the updated patch with TUNABLE_SET_BOUNDS_IF_VALID.
> >
> > OK for master?
>
> I don't think TUNABLE_SET_BOUNDS_IF_VALID is doing what it intends to do.
>
> > +#define TUNABLE_SET_BOUNDS_IF_VALID(__cur, __maxp, __minp, __type)         \
>
> minp and maxp are switched around.

Fixed.

> > +({                                                                         \
> > +  __type min = __minp ? *((__type *) __minp) : (__cur)->type.min;          \
> > +  __type max = __maxp ? *((__type *) __maxp) : (__cur)->type.max;          \
> > +  if (__minp)                                                                      \
> > +    {                                                                              \
> > +      if (__maxp)                                                          \
> > +     {                                                                     \
> > +       if (max > min)                                                      \
> > +         {                                                                 \
> > +           (__cur)->type.min = min;                                        \
> > +           (__cur)->type.max = max;                                        \
> > +         }                                                                 \
>
> When both minp and maxp are specified, it checks if they're sane with
> respect to each other but it should also check whether the range they
> describe is a *subset* of (__cur)->type.min and (__cur)->type.max.
>
> > +     }                                                                     \
> > +      else if (max > min)                                                  \
> > +     (__cur)->type.min = min;                                              \
>
> You also need to make sure that (__cur)->type.min < min to ensure a more
> restrictive range.
>
> > +    }                                                                              \
> > +  else if (max > min)                                                              \
> > +    (__cur)->type.max = min;                                               \
>
> I did not understand this one.
>
> Basically, this is what it should look like:
>
>     if (__minp != NULL
>         && *__minp <= *__maxp

__maxp may be NULL.

>         && *__minp >= (__cur)->type.min
>         && *__minp <= (__cur)->type.max)
>       (__cur)->type.min = *__minp;
>
>     if (__maxp != NULL
>         && *__minp <= *__maxp

__minp may be NULL.

>         && *__maxp >= (__cur)->type.min
>         && *__maxp <= (__cur)->type.max)
>       (__cur)->type.max = *__maxp;
>
> You could collapse some of these conditions if we assume that maxp and
> minp are always either NULL or not NULL together.
>

I don't think we should make such assumptions.  Here is the
updated patch with the check for the subset of (__cur)->type.min
and (__cur)->type.max.


-- 
H.J.

[-- Attachment #2: 0001-Set-tunable-value-as-well-as-min-max-values.patch --]
[-- Type: text/x-patch, Size: 7832 bytes --]

From 9dd6abbb94c1589f58b12c8dbb5159cc51e6b08e Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 1 Jun 2020 14:11:32 -0700
Subject: [PATCH] Set tunable value as well as min/max values

Some tunable values and their minimum/maximum values must be determined
at run-time.  Add TUNABLE_SET_WITH_BOUNDS and TUNABLE_SET_WITH_BOUNDS_FULL
to update tunable value together with minimum and maximum values.
__tunable_set_val is updated to set tunable value as well as min/max
values.
---
 elf/dl-tunables.c      | 45 ++++++++++++++++++++++++++++++++++++++----
 elf/dl-tunables.h      | 21 ++++++++++++++++++--
 manual/README.tunables | 24 ++++++++++++++++++++--
 3 files changed, 82 insertions(+), 8 deletions(-)

diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 26e6e26612..2ba2844075 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -100,8 +100,42 @@ get_next_env (char **envp, char **name, size_t *namelen, char **val,
     }									      \
 })
 
+#define TUNABLE_SET_BOUNDS_IF_VALID(__cur, __minp, __maxp, __type)	      \
+({									      \
+  if (__minp != NULL)							      \
+    {									      \
+      /* MIN is specified.  */						      \
+      __type min = *((__type *) __minp);				      \
+      if (__maxp != NULL)						      \
+	{								      \
+	   /* Both MIN and MAX are specified.  */			      \
+	    __type max = *((__type *) __maxp);				      \
+	  if (max >= min						      \
+	      && max <= (__cur)->type.max				      \
+	      && min >= (__cur)->type.min)				      \
+	    {								      \
+	      (__cur)->type.min = min;					      \
+	      (__cur)->type.max = max;					      \
+	    }								      \
+	}								      \
+      else if (min > (__cur)->type.min && min <= (__cur)->type.max)	      \
+	{								      \
+	  /* Only MIN is specified.  */					      \
+	  (__cur)->type.min = min;					      \
+	}								      \
+    }									      \
+  else if (__maxp != NULL)						      \
+    {									      \
+      /* Only MAX is specified.  */					      \
+      __type max = *((__type *) __maxp);				      \
+      if (max < (__cur)->type.max && max >= (__cur)->type.min)		      \
+	(__cur)->type.max = max;					      \
+    }									      \
+})
+
 static void
-do_tunable_update_val (tunable_t *cur, const void *valp)
+do_tunable_update_val (tunable_t *cur, const void *valp,
+		       const void *minp, const void *maxp)
 {
   uint64_t val;
 
@@ -112,16 +146,19 @@ do_tunable_update_val (tunable_t *cur, const void *valp)
     {
     case TUNABLE_TYPE_INT_32:
 	{
+	  TUNABLE_SET_BOUNDS_IF_VALID (cur, minp, maxp, int64_t);
 	  TUNABLE_SET_VAL_IF_VALID_RANGE (cur, val, int64_t);
 	  break;
 	}
     case TUNABLE_TYPE_UINT_64:
 	{
+	  TUNABLE_SET_BOUNDS_IF_VALID (cur, minp, maxp, uint64_t);
 	  TUNABLE_SET_VAL_IF_VALID_RANGE (cur, val, uint64_t);
 	  break;
 	}
     case TUNABLE_TYPE_SIZE_T:
 	{
+	  TUNABLE_SET_BOUNDS_IF_VALID (cur, minp, maxp, uint64_t);
 	  TUNABLE_SET_VAL_IF_VALID_RANGE (cur, val, uint64_t);
 	  break;
 	}
@@ -153,15 +190,15 @@ tunable_initialize (tunable_t *cur, const char *strval)
       cur->initialized = true;
       valp = strval;
     }
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, NULL, NULL);
 }
 
 void
-__tunable_set_val (tunable_id_t id, void *valp)
+__tunable_set_val (tunable_id_t id, void *valp, void *minp, void *maxp)
 {
   tunable_t *cur = &tunable_list[id];
 
-  do_tunable_update_val (cur, valp);
+  do_tunable_update_val (cur, valp, minp, maxp);
 }
 
 #if TUNABLES_FRONTEND == TUNABLES_FRONTEND_valstring
diff --git a/elf/dl-tunables.h b/elf/dl-tunables.h
index f05eb50c2f..550b0cc7f4 100644
--- a/elf/dl-tunables.h
+++ b/elf/dl-tunables.h
@@ -70,9 +70,10 @@ typedef struct _tunable tunable_t;
 
 extern void __tunables_init (char **);
 extern void __tunable_get_val (tunable_id_t, void *, tunable_callback_t);
-extern void __tunable_set_val (tunable_id_t, void *);
+extern void __tunable_set_val (tunable_id_t, void *, void *, void *);
 rtld_hidden_proto (__tunables_init)
 rtld_hidden_proto (__tunable_get_val)
+rtld_hidden_proto (__tunable_set_val)
 
 /* Define TUNABLE_GET and TUNABLE_SET in short form if TOP_NAMESPACE and
    TUNABLE_NAMESPACE are defined.  This is useful shorthand to get and set
@@ -82,11 +83,18 @@ rtld_hidden_proto (__tunable_get_val)
   TUNABLE_GET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __cb)
 # define TUNABLE_SET(__id, __type, __val) \
   TUNABLE_SET_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, __type, __val)
+# define TUNABLE_SET_WITH_BOUNDS(__id, __type, __val, __min, __max) \
+  TUNABLE_SET_WITH_BOUNDS_FULL (TOP_NAMESPACE, TUNABLE_NAMESPACE, __id, \
+				__type, __val, __min, __max)
 #else
 # define TUNABLE_GET(__top, __ns, __id, __type, __cb) \
   TUNABLE_GET_FULL (__top, __ns, __id, __type, __cb)
 # define TUNABLE_SET(__top, __ns, __id, __type, __val) \
   TUNABLE_SET_FULL (__top, __ns, __id, __type, __val)
+# define TUNABLE_SET_WITH_BOUNDS(__top, __ns, __id, __type, __val, \
+				 __min, __max) \
+  TUNABLE_SET_WITH_BOUNDS_FULL (__top, __ns, __id, __type, __val, \
+				__min, __max)
 #endif
 
 /* Get and return a tunable value.  If the tunable was set externally and __CB
@@ -103,7 +111,16 @@ rtld_hidden_proto (__tunable_get_val)
 # define TUNABLE_SET_FULL(__top, __ns, __id, __type, __val) \
 ({									      \
   __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
-			& (__type) {__val});				      \
+		     & (__type) {__val}, NULL, NULL);			      \
+})
+
+/* Set a tunable value together with min/max values.  */
+# define TUNABLE_SET_WITH_BOUNDS_FULL(__top, __ns, __id, __type, __val,	      \
+				      __min, __max)			      \
+({									      \
+  __tunable_set_val (TUNABLE_ENUM_NAME (__top, __ns, __id),		      \
+		     & (__type) {__val},  & (__type) {__min},		      \
+		     & (__type) {__max});				      \
 })
 
 /* Namespace sanity for callback functions.  Use this macro to keep the
diff --git a/manual/README.tunables b/manual/README.tunables
index f87a31a65e..fff6c2a87e 100644
--- a/manual/README.tunables
+++ b/manual/README.tunables
@@ -67,7 +67,7 @@ The list of allowed attributes are:
 				     non-AT_SECURE subprocesses.
 			NONE: Read all the time.
 
-2. Use TUNABLE_GET/TUNABLE_SET to get and set tunables.
+2. Use TUNABLE_GET/TUNABLE_SET/TUNABLE_SET_WITH_BOUNDS to get and set tunables.
 
 3. OPTIONAL: If tunables in a namespace are being used multiple times within a
    specific module, set the TUNABLE_NAMESPACE macro to reduce the amount of
@@ -112,9 +112,29 @@ form of the macros as follows:
 where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
 remaining arguments are the same as the short form macros.
 
+The minimum and maximum values can be updated together with the tunable value
+using:
+
+  TUNABLE_SET_WITH_BOUNDS (check, int32_t, val, min, max)
+
+where 'check' is the tunable name, 'int32_t' is the C type of the tunable,
+'val' is a value of the same type, and 'min' and 'max' are the minimum and maximum
+values of the tunable.
+
+To set the minimum and maximum values of tunables in a different namespace
+from that module, use the full form of the macros as follows:
+
+  val = TUNABLE_GET_FULL (glibc, cpu, hwcap_mask, uint64_t, NULL)
+
+  TUNABLE_SET_WITH_BOUNDS_FULL (glibc, cpu, hwcap_mask, uint64_t, val, min, max)
+
+where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
+remaining arguments are the same as the short form macros.
+
 When TUNABLE_NAMESPACE is not defined in a module, TUNABLE_GET is equivalent to
 TUNABLE_GET_FULL, so you will need to provide full namespace information for
-both macros.  Likewise for TUNABLE_SET and TUNABLE_SET_FULL.
+both macros.  Likewise for TUNABLE_SET, TUNABLE_SET_FULL,
+TUNABLE_SET_WITH_BOUNDS and TUNABLE_SET_WITH_BOUNDS_FULL.
 
 ** IMPORTANT NOTE **
 
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: V5 [PATCH] Set tunable value as well as min/max values
  2020-09-29 14:54                 ` V5 " H.J. Lu via Libc-alpha
@ 2020-09-29 15:58                   ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 0 replies; 33+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2020-09-29 15:58 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

On 29/09/20 20:24, H.J. Lu wrote:
>> Basically, this is what it should look like:
>>
>>     if (__minp != NULL
>>         && *__minp <= *__maxp
> 
> __maxp may be NULL.
> 
>>         && *__minp >= (__cur)->type.min
>>         && *__minp <= (__cur)->type.max)
>>       (__cur)->type.min = *__minp;
>>
>>     if (__maxp != NULL
>>         && *__minp <= *__maxp
> 
> __minp may be NULL.

Oops :)

> I don't think we should make such assumptions.  Here is the
> updated patch with the check for the subset of (__cur)->type.min
> and (__cur)->type.max.

Fair enough, this version looks good.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: V3 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-09-29 11:44                     ` H.J. Lu via Libc-alpha
@ 2020-10-01  8:46                       ` Florian Weimer via Libc-alpha
  2020-10-01 19:50                         ` V4 " H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-10-01  8:46 UTC (permalink / raw)
  To: H.J. Lu; +Cc: H.J. Lu via Libc-alpha

* H. J. Lu:

> diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> index dadec5d58f..65ab29123d 100644
> --- a/sysdeps/x86/cacheinfo.c
> +++ b/sysdeps/x86/cacheinfo.c
> @@ -16,7 +16,9 @@
>     License along with the GNU C Library; if not, see
>     <https://www.gnu.org/licenses/>.  */
>  
> -#if IS_IN (libc)
> +/* NB: In libc.a, this file is included in libc-static.c.  In libc.so,
> +   this file is standalone.  */
> +#if IS_IN (libc) && (defined SHARED || defined _PRIVATE_CPU_FEATURES_H)

libc-static.c should be libc-start.c, I believe.  The “defined
_PRIVATE_CPU_FEATURES_H” part seems rather indirect.  What exactly are
you trying to accomplish here?

It looks to me as if this file should be included in libc.so, but not
pulled into ld.so via the rebuild, so maybe you can add an empty
sysdeps/x86/rtld-cacheinfo.c file instead?
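
That is, the new file would contain nothing but a comment, e.g.:

  /* sysdeps/x86/rtld-cacheinfo.c: intentionally empty so that the
     ld.so rebuild of this module stays empty.  (A sketch, assuming
     the usual rtld-*.c override convention applies here.)  */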

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill


^ permalink raw reply	[flat|nested] 33+ messages in thread

* V4 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-10-01  8:46                       ` Florian Weimer via Libc-alpha
@ 2020-10-01 19:50                         ` H.J. Lu via Libc-alpha
  2020-10-08 13:22                           ` PING: " H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-10-01 19:50 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

[-- Attachment #1: Type: text/plain, Size: 1170 bytes --]

On Thu, Oct 1, 2020 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
>
> * H. J. Lu:
>
> > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > index dadec5d58f..65ab29123d 100644
> > --- a/sysdeps/x86/cacheinfo.c
> > +++ b/sysdeps/x86/cacheinfo.c
> > @@ -16,7 +16,9 @@
> >     License along with the GNU C Library; if not, see
> >     <https://www.gnu.org/licenses/>.  */
> >
> > -#if IS_IN (libc)
> > +/* NB: In libc.a, this file is included in libc-static.c.  In libc.so,
> > +   this file is standalone.  */
> > +#if IS_IN (libc) && (defined SHARED || defined _PRIVATE_CPU_FEATURES_H)
>
> libc-static.c should be libc-start.c, I believe.  The “defined
> _PRIVATE_CPU_FEATURES_H” part seems rather indirect.  What exactly are
> you trying to accomplish here?
>
> It looks to me as if this file should included in libc.so, but not
> pulled into ld.so via the rebuild, so maybe you can add an empty
> sysdeps/x86/rtld-cacheinfo.c file instead?
>

Here is the updated patch.   I also moved files around to prepare
for moving x86 processor cache info to cpu_features in ld.so to
support --list-tunables.


-- 
H.J.

[-- Attachment #2: 0001-x86-Initialize-CPU-info-via-IFUNC-relocation-BZ-2620.patch --]
[-- Type: text/x-patch, Size: 65027 bytes --]

From c5cea18653257a6ed366a8c89a5d8660bd168e20 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Sat, 4 Jul 2020 06:35:49 -0700
Subject: [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]

X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.

Key points:

1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch.  Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocations
have been performed.
3. The x86 CPU features in ld.so are initialized by DL_PLATFORM_INIT
in dynamic executables and by IFUNC relocation in dlopen in static
executables.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
---
 sysdeps/i386/dl-machine.h          |   7 +-
 sysdeps/x86/cacheinfo.c            | 871 +----------------------------
 sysdeps/x86/cacheinfo.h            | 413 ++++++++++++++
 sysdeps/x86/cpu-features.c         |  12 +-
 sysdeps/x86/dl-cacheinfo.h         | 478 ++++++++++++++++
 sysdeps/x86/dl-get-cpu-features.c  |  27 +-
 sysdeps/x86/include/cpu-features.h |   1 +
 sysdeps/x86/libc-start.c           |   1 -
 sysdeps/x86_64/dl-machine.h        |   7 +-
 9 files changed, 949 insertions(+), 868 deletions(-)
 create mode 100644 sysdeps/x86/cacheinfo.h
 create mode 100644 sysdeps/x86/dl-cacheinfo.h

diff --git a/sysdeps/i386/dl-machine.h b/sysdeps/i386/dl-machine.h
index 0f08079e48..bdc21d1a3c 100644
--- a/sysdeps/i386/dl-machine.h
+++ b/sysdeps/i386/dl-machine.h
@@ -25,7 +25,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -248,9 +247,9 @@ static inline void __attribute__ ((unused))
 dl_platform_init (void)
 {
 #if IS_IN (rtld)
-  /* init_cpu_features has been called early from __libc_start_main in
-     static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  /* _dl_x86_init_cpu_features is a wrapper for init_cpu_features which
+     has been called early from __libc_start_main in static executable.  */
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
index dadec5d58f..8ed95ce9dd 100644
--- a/sysdeps/x86/cacheinfo.c
+++ b/sysdeps/x86/cacheinfo.c
@@ -1,4 +1,4 @@
-/* x86_64 cache info.
+/* x86 cache info.
    Copyright (C) 2003-2020 Free Software Foundation, Inc.
    This file is part of the GNU C Library.
 
@@ -16,476 +16,11 @@
    License along with the GNU C Library; if not, see
    <https://www.gnu.org/licenses/>.  */
 
-#if IS_IN (libc)
-
 #include <assert.h>
-#include <stdbool.h>
-#include <stdlib.h>
 #include <unistd.h>
 #include <cpuid.h>
-#include <init-arch.h>
-
-static const struct intel_02_cache_info
-{
-  unsigned char idx;
-  unsigned char assoc;
-  unsigned char linesize;
-  unsigned char rel_name;
-  unsigned int size;
-} intel_02_known [] =
-  {
-#define M(sc) ((sc) - _SC_LEVEL1_ICACHE_SIZE)
-    { 0x06,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),    8192 },
-    { 0x08,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   16384 },
-    { 0x09,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
-    { 0x0a,  2, 32, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
-    { 0x0c,  4, 32, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x0d,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x0e,  6, 64, M(_SC_LEVEL1_DCACHE_SIZE),   24576 },
-    { 0x21,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x22,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
-    { 0x23,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0x25,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0x29,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0x2c,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
-    { 0x30,  8, 64, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
-    { 0x39,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x3a,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   196608 },
-    { 0x3b,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x3c,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x3d,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   393216 },
-    { 0x3e,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x3f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x41,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x42,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x43,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x44,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x45,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x46,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0x47,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0x48, 12, 64, M(_SC_LEVEL2_CACHE_SIZE),  3145728 },
-    { 0x49, 16, 64, M(_SC_LEVEL2_CACHE_SIZE),  4194304 },
-    { 0x4a, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  6291456 },
-    { 0x4b, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0x4c, 12, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
-    { 0x4d, 16, 64, M(_SC_LEVEL3_CACHE_SIZE), 16777216 },
-    { 0x4e, 24, 64, M(_SC_LEVEL2_CACHE_SIZE),  6291456 },
-    { 0x60,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x66,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
-    { 0x67,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
-    { 0x68,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
-    { 0x78,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x79,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
-    { 0x7a,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x7b,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x7c,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x7d,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x7f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x80,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x82,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
-    { 0x83,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x84,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0x85,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
-    { 0x86,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
-    { 0x87,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
-    { 0xd0,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
-    { 0xd1,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0xd2,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xd6,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
-    { 0xd7,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xd8,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xdc, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xdd, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xde, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0xe2, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
-    { 0xe3, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
-    { 0xe4, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
-    { 0xea, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
-    { 0xeb, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 18874368 },
-    { 0xec, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 25165824 },
-  };
-
-#define nintel_02_known (sizeof (intel_02_known) / sizeof (intel_02_known [0]))
-
-static int
-intel_02_known_compare (const void *p1, const void *p2)
-{
-  const struct intel_02_cache_info *i1;
-  const struct intel_02_cache_info *i2;
-
-  i1 = (const struct intel_02_cache_info *) p1;
-  i2 = (const struct intel_02_cache_info *) p2;
-
-  if (i1->idx == i2->idx)
-    return 0;
-
-  return i1->idx < i2->idx ? -1 : 1;
-}
-
-
-static long int
-__attribute__ ((noinline))
-intel_check_word (int name, unsigned int value, bool *has_level_2,
-		  bool *no_level_2_or_3,
-		  const struct cpu_features *cpu_features)
-{
-  if ((value & 0x80000000) != 0)
-    /* The register value is reserved.  */
-    return 0;
-
-  /* Fold the name.  The _SC_ constants are always in the order SIZE,
-     ASSOC, LINESIZE.  */
-  int folded_rel_name = (M(name) / 3) * 3;
-
-  while (value != 0)
-    {
-      unsigned int byte = value & 0xff;
-
-      if (byte == 0x40)
-	{
-	  *no_level_2_or_3 = true;
-
-	  if (folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-	    /* No need to look further.  */
-	    break;
-	}
-      else if (byte == 0xff)
-	{
-	  /* CPUID leaf 0x4 contains all the information.  We need to
-	     iterate over it.  */
-	  unsigned int eax;
-	  unsigned int ebx;
-	  unsigned int ecx;
-	  unsigned int edx;
-
-	  unsigned int round = 0;
-	  while (1)
-	    {
-	      __cpuid_count (4, round, eax, ebx, ecx, edx);
-
-	      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
-	      if (type == null)
-		/* That was the end.  */
-		break;
-
-	      unsigned int level = (eax >> 5) & 0x7;
-
-	      if ((level == 1 && type == data
-		   && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
-		  || (level == 1 && type == inst
-		      && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
-		  || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-		  || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-		  || (level == 4 && folded_rel_name == M(_SC_LEVEL4_CACHE_SIZE)))
-		{
-		  unsigned int offset = M(name) - folded_rel_name;
-
-		  if (offset == 0)
-		    /* Cache size.  */
-		    return (((ebx >> 22) + 1)
-			    * (((ebx >> 12) & 0x3ff) + 1)
-			    * ((ebx & 0xfff) + 1)
-			    * (ecx + 1));
-		  if (offset == 1)
-		    return (ebx >> 22) + 1;
-
-		  assert (offset == 2);
-		  return (ebx & 0xfff) + 1;
-		}
-
-	      ++round;
-	    }
-	  /* There is no other cache information anywhere else.  */
-	  break;
-	}
-      else
-	{
-	  if (byte == 0x49 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
-	    {
-	      /* Intel reused this value.  For family 15, model 6 it
-		 specifies the 3rd level cache.  Otherwise the 2nd
-		 level cache.  */
-	      unsigned int family = cpu_features->basic.family;
-	      unsigned int model = cpu_features->basic.model;
-
-	      if (family == 15 && model == 6)
-		{
-		  /* The level 3 cache is encoded for this model like
-		     the level 2 cache is for other models.  Pretend
-		     the caller asked for the level 2 cache.  */
-		  name = (_SC_LEVEL2_CACHE_SIZE
-			  + (name - _SC_LEVEL3_CACHE_SIZE));
-		  folded_rel_name = M(_SC_LEVEL2_CACHE_SIZE);
-		}
-	    }
-
-	  struct intel_02_cache_info *found;
-	  struct intel_02_cache_info search;
-
-	  search.idx = byte;
-	  found = bsearch (&search, intel_02_known, nintel_02_known,
-			   sizeof (intel_02_known[0]), intel_02_known_compare);
-	  if (found != NULL)
-	    {
-	      if (found->rel_name == folded_rel_name)
-		{
-		  unsigned int offset = M(name) - folded_rel_name;
-
-		  if (offset == 0)
-		    /* Cache size.  */
-		    return found->size;
-		  if (offset == 1)
-		    return found->assoc;
-
-		  assert (offset == 2);
-		  return found->linesize;
-		}
-
-	      if (found->rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-		*has_level_2 = true;
-	    }
-	}
-
-      /* Next byte for the next round.  */
-      value >>= 8;
-    }
-
-  /* Nothing found.  */
-  return 0;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_intel (int name, const struct cpu_features *cpu_features)
-{
-  unsigned int maxidx = cpu_features->basic.max_cpuid;
-
-  /* Return -1 for older CPUs.  */
-  if (maxidx < 2)
-    return -1;
-
-  /* OK, we can use the CPUID instruction to get all info about the
-     caches.  */
-  unsigned int cnt = 0;
-  unsigned int max = 1;
-  long int result = 0;
-  bool no_level_2_or_3 = false;
-  bool has_level_2 = false;
-
-  while (cnt++ < max)
-    {
-      unsigned int eax;
-      unsigned int ebx;
-      unsigned int ecx;
-      unsigned int edx;
-      __cpuid (2, eax, ebx, ecx, edx);
-
-      /* The low byte of EAX in the first round contain the number of
-	 rounds we have to make.  At least one, the one we are already
-	 doing.  */
-      if (cnt == 1)
-	{
-	  max = eax & 0xff;
-	  eax &= 0xffffff00;
-	}
-
-      /* Process the individual registers' value.  */
-      result = intel_check_word (name, eax, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, ebx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, ecx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-
-      result = intel_check_word (name, edx, &has_level_2,
-				 &no_level_2_or_3, cpu_features);
-      if (result != 0)
-	return result;
-    }
-
-  if (name >= _SC_LEVEL2_CACHE_SIZE && name <= _SC_LEVEL3_CACHE_LINESIZE
-      && no_level_2_or_3)
-    return -1;
-
-  return 0;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_amd (int name)
-{
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  __cpuid (0x80000000, eax, ebx, ecx, edx);
-
-  /* No level 4 cache (yet).  */
-  if (name > _SC_LEVEL3_CACHE_LINESIZE)
-    return 0;
-
-  unsigned int fn = 0x80000005 + (name >= _SC_LEVEL2_CACHE_SIZE);
-  if (eax < fn)
-    return 0;
-
-  __cpuid (fn, eax, ebx, ecx, edx);
-
-  if (name < _SC_LEVEL1_DCACHE_SIZE)
-    {
-      name += _SC_LEVEL1_DCACHE_SIZE - _SC_LEVEL1_ICACHE_SIZE;
-      ecx = edx;
-    }
-
-  switch (name)
-    {
-    case _SC_LEVEL1_DCACHE_SIZE:
-      return (ecx >> 14) & 0x3fc00;
-
-    case _SC_LEVEL1_DCACHE_ASSOC:
-      ecx >>= 16;
-      if ((ecx & 0xff) == 0xff)
-	/* Fully associative.  */
-	return (ecx << 2) & 0x3fc00;
-      return ecx & 0xff;
-
-    case _SC_LEVEL1_DCACHE_LINESIZE:
-      return ecx & 0xff;
-
-    case _SC_LEVEL2_CACHE_SIZE:
-      return (ecx & 0xf000) == 0 ? 0 : (ecx >> 6) & 0x3fffc00;
-
-    case _SC_LEVEL2_CACHE_ASSOC:
-      switch ((ecx >> 12) & 0xf)
-	{
-	case 0:
-	case 1:
-	case 2:
-	case 4:
-	  return (ecx >> 12) & 0xf;
-	case 6:
-	  return 8;
-	case 8:
-	  return 16;
-	case 10:
-	  return 32;
-	case 11:
-	  return 48;
-	case 12:
-	  return 64;
-	case 13:
-	  return 96;
-	case 14:
-	  return 128;
-	case 15:
-	  return ((ecx >> 6) & 0x3fffc00) / (ecx & 0xff);
-	default:
-	  return 0;
-	}
-      /* NOTREACHED */
-
-    case _SC_LEVEL2_CACHE_LINESIZE:
-      return (ecx & 0xf000) == 0 ? 0 : ecx & 0xff;
-
-    case _SC_LEVEL3_CACHE_SIZE:
-      return (edx & 0xf000) == 0 ? 0 : (edx & 0x3ffc0000) << 1;
-
-    case _SC_LEVEL3_CACHE_ASSOC:
-      switch ((edx >> 12) & 0xf)
-	{
-	case 0:
-	case 1:
-	case 2:
-	case 4:
-	  return (edx >> 12) & 0xf;
-	case 6:
-	  return 8;
-	case 8:
-	  return 16;
-	case 10:
-	  return 32;
-	case 11:
-	  return 48;
-	case 12:
-	  return 64;
-	case 13:
-	  return 96;
-	case 14:
-	  return 128;
-	case 15:
-	  return ((edx & 0x3ffc0000) << 1) / (edx & 0xff);
-	default:
-	  return 0;
-	}
-      /* NOTREACHED */
-
-    case _SC_LEVEL3_CACHE_LINESIZE:
-      return (edx & 0xf000) == 0 ? 0 : edx & 0xff;
-
-    default:
-      assert (! "cannot happen");
-    }
-  return -1;
-}
-
-
-static long int __attribute__ ((noinline))
-handle_zhaoxin (int name)
-{
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-
-  int folded_rel_name = (M(name) / 3) * 3;
-
-  unsigned int round = 0;
-  while (1)
-    {
-      __cpuid_count (4, round, eax, ebx, ecx, edx);
-
-      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
-      if (type == null)
-        break;
-
-      unsigned int level = (eax >> 5) & 0x7;
-
-      if ((level == 1 && type == data
-        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
-        || (level == 1 && type == inst
-            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
-        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
-        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
-        {
-          unsigned int offset = M(name) - folded_rel_name;
-
-          if (offset == 0)
-            /* Cache size.  */
-            return (((ebx >> 22) + 1)
-                * (((ebx >> 12) & 0x3ff) + 1)
-                * ((ebx & 0xfff) + 1)
-                * (ecx + 1));
-          if (offset == 1)
-            return (ebx >> 22) + 1;
-
-          assert (offset == 2);
-          return (ebx & 0xfff) + 1;
-        }
-
-      ++round;
-    }
-
-  /* Nothing found.  */
-  return 0;
-}
-
+#include <ldsodefs.h>
+#include <dl-cacheinfo.h>
 
 /* Get the value of the system variable NAME.  */
 long int
@@ -509,395 +44,17 @@ __cache_sysconf (int name)
   return 0;
 }
 
+#ifdef SHARED
+/* NB: In libc.a, cacheinfo.h is included in libc-start.c.  In libc.so,
+   it is included here, and init_cacheinfo is called by initializing a
+   dummy function pointer via IFUNC relocation after the CPU features in
+   ld.so have been initialized by DL_PLATFORM_INIT or IFUNC relocation.  */
+# include <cacheinfo.h>
+# include <ifunc-init.h>
 
-/* Data cache size for use in memory and string routines, typically
-   L1 size, rounded to multiple of 256 bytes.  */
-long int __x86_data_cache_size_half attribute_hidden = 32 * 1024 / 2;
-long int __x86_data_cache_size attribute_hidden = 32 * 1024;
-/* Similar to __x86_data_cache_size_half, but not rounded.  */
-long int __x86_raw_data_cache_size_half attribute_hidden = 32 * 1024 / 2;
-/* Similar to __x86_data_cache_size, but not rounded.  */
-long int __x86_raw_data_cache_size attribute_hidden = 32 * 1024;
-/* Shared cache size for use in memory and string routines, typically
-   L2 or L3 size, rounded to multiple of 256 bytes.  */
-long int __x86_shared_cache_size_half attribute_hidden = 1024 * 1024 / 2;
-long int __x86_shared_cache_size attribute_hidden = 1024 * 1024;
-/* Similar to __x86_shared_cache_size_half, but not rounded.  */
-long int __x86_raw_shared_cache_size_half attribute_hidden = 1024 * 1024 / 2;
-/* Similar to __x86_shared_cache_size, but not rounded.  */
-long int __x86_raw_shared_cache_size attribute_hidden = 1024 * 1024;
-
-/* Threshold to use non temporal store.  */
-long int __x86_shared_non_temporal_threshold attribute_hidden;
-
-/* Threshold to use Enhanced REP MOVSB.  */
-long int __x86_rep_movsb_threshold attribute_hidden = 2048;
-
-/* Threshold to use Enhanced REP STOSB.  */
-long int __x86_rep_stosb_threshold attribute_hidden = 2048;
-
-
-static void
-get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
-                long int core)
-{
-  unsigned int eax;
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-
-  /* Number of logical processors sharing L2 cache.  */
-  int threads_l2;
-
-  /* Number of logical processors sharing L3 cache.  */
-  int threads_l3;
-
-  const struct cpu_features *cpu_features = __get_cpu_features ();
-  int max_cpuid = cpu_features->basic.max_cpuid;
-  unsigned int family = cpu_features->basic.family;
-  unsigned int model = cpu_features->basic.model;
-  long int shared = *shared_ptr;
-  unsigned int threads = *threads_ptr;
-  bool inclusive_cache = true;
-  bool support_count_mask = true;
-
-  /* Try L3 first.  */
-  unsigned int level = 3;
-
-  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
-    support_count_mask = false;
-
-  if (shared <= 0)
-    {
-      /* Try L2 otherwise.  */
-      level  = 2;
-      shared = core;
-      threads_l2 = 0;
-      threads_l3 = -1;
-    }
-  else
-    {
-      threads_l2 = 0;
-      threads_l3 = 0;
-    }
-
-  /* A value of 0 for the HTT bit indicates there is only a single
-     logical processor.  */
-  if (CPU_FEATURE_USABLE (HTT))
-    {
-      /* Figure out the number of logical threads that share the
-         highest cache level.  */
-      if (max_cpuid >= 4)
-        {
-          int i = 0;
-
-          /* Query until cache level 2 and 3 are enumerated.  */
-          int check = 0x1 | (threads_l3 == 0) << 1;
-          do
-            {
-              __cpuid_count (4, i++, eax, ebx, ecx, edx);
-
-              /* There seems to be a bug in at least some Pentium Ds
-                 which sometimes fail to iterate all cache parameters.
-                 Do not loop indefinitely here, stop in this case and
-                 assume there is no such information.  */
-              if (cpu_features->basic.kind == arch_kind_intel
-                  && (eax & 0x1f) == 0 )
-                goto intel_bug_no_cache_info;
-
-              switch ((eax >> 5) & 0x7)
-                {
-                  default:
-                    break;
-                  case 2:
-                    if ((check & 0x1))
-                      {
-                        /* Get maximum number of logical processors
-                           sharing L2 cache.  */
-                        threads_l2 = (eax >> 14) & 0x3ff;
-                        check &= ~0x1;
-                      }
-                    break;
-                  case 3:
-                    if ((check & (0x1 << 1)))
-                      {
-                        /* Get maximum number of logical processors
-                           sharing L3 cache.  */
-                        threads_l3 = (eax >> 14) & 0x3ff;
-
-                        /* Check if L2 and L3 caches are inclusive.  */
-                        inclusive_cache = (edx & 0x2) != 0;
-                        check &= ~(0x1 << 1);
-                      }
-                    break;
-                }
-            }
-          while (check);
-
-          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
-             numbers of addressable IDs for logical processors sharing
-             the cache, instead of the maximum number of threads
-             sharing the cache.  */
-          if (max_cpuid >= 11 && support_count_mask)
-            {
-              /* Find the number of logical processors shipped in
-                 one core and apply count mask.  */
-              i = 0;
-
-              /* Count SMT only if there is L3 cache.  Always count
-                 core if there is no L3 cache.  */
-              int count = ((threads_l2 > 0 && level == 3)
-                           | ((threads_l3 > 0
-                               || (threads_l2 > 0 && level == 2)) << 1));
-
-              while (count)
-                {
-                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
-
-                  int shipped = ebx & 0xff;
-                  int type = ecx & 0xff00;
-                  if (shipped == 0 || type == 0)
-                    break;
-                  else if (type == 0x100)
-                    {
-                      /* Count SMT.  */
-                      if ((count & 0x1))
-                        {
-                          int count_mask;
-
-                          /* Compute count mask.  */
-                          asm ("bsr %1, %0"
-                               : "=r" (count_mask) : "g" (threads_l2));
-                          count_mask = ~(-1 << (count_mask + 1));
-                          threads_l2 = (shipped - 1) & count_mask;
-                          count &= ~0x1;
-                        }
-                    }
-                  else if (type == 0x200)
-                    {
-                      /* Count core.  */
-                      if ((count & (0x1 << 1)))
-                        {
-                          int count_mask;
-                          int threads_core
-                            = (level == 2 ? threads_l2 : threads_l3);
-
-                          /* Compute count mask.  */
-                          asm ("bsr %1, %0"
-                               : "=r" (count_mask) : "g" (threads_core));
-                          count_mask = ~(-1 << (count_mask + 1));
-                          threads_core = (shipped - 1) & count_mask;
-                          if (level == 2)
-                            threads_l2 = threads_core;
-                          else
-                            threads_l3 = threads_core;
-                          count &= ~(0x1 << 1);
-                        }
-                    }
-                }
-            }
-          if (threads_l2 > 0)
-            threads_l2 += 1;
-          if (threads_l3 > 0)
-            threads_l3 += 1;
-          if (level == 2)
-            {
-              if (threads_l2)
-                {
-                  threads = threads_l2;
-                  if (cpu_features->basic.kind == arch_kind_intel
-                      && threads > 2
-                      && family == 6)
-                    switch (model)
-                      {
-                        case 0x37:
-                        case 0x4a:
-                        case 0x4d:
-                        case 0x5a:
-                        case 0x5d:
-                          /* Silvermont has L2 cache shared by 2 cores.  */
-                          threads = 2;
-                          break;
-                        default:
-                          break;
-                      }
-                }
-            }
-          else if (threads_l3)
-            threads = threads_l3;
-        }
-      else
-        {
-intel_bug_no_cache_info:
-          /* Assume that all logical threads share the highest cache
-             level.  */
-          threads
-            = ((cpu_features->features[COMMON_CPUID_INDEX_1].cpuid.ebx
-                >> 16) & 0xff);
-        }
-
-        /* Cap usage of highest cache level to the number of supported
-           threads.  */
-        if (shared > 0 && threads > 0)
-          shared /= threads;
-    }
-
-  /* Account for non-inclusive L2 and L3 caches.  */
-  if (!inclusive_cache)
-    {
-      if (threads_l2 > 0)
-        core /= threads_l2;
-      shared += core;
-    }
-
-  *shared_ptr = shared;
-  *threads_ptr = threads;
-}
-
-
-static void
-__attribute__((constructor))
-init_cacheinfo (void)
-{
-  /* Find out what brand of processor.  */
-  unsigned int ebx;
-  unsigned int ecx;
-  unsigned int edx;
-  int max_cpuid_ex;
-  long int data = -1;
-  long int shared = -1;
-  long int core;
-  unsigned int threads = 0;
-  const struct cpu_features *cpu_features = __get_cpu_features ();
-
-  if (cpu_features->basic.kind == arch_kind_intel)
-    {
-      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
-      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
-      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
-
-      get_common_cache_info (&shared, &threads, core);
-    }
-  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
-    {
-      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
-      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
-      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
-
-      get_common_cache_info (&shared, &threads, core);
-    }
-  else if (cpu_features->basic.kind == arch_kind_amd)
-    {
-      data   = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
-      long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
-      shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
-
-      /* Get maximum extended function. */
-      __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
-
-      if (shared <= 0)
-	/* No shared L3 cache.  All we have is the L2 cache.  */
-	shared = core;
-      else
-	{
-	  /* Figure out the number of logical threads that share L3.  */
-	  if (max_cpuid_ex >= 0x80000008)
-	    {
-	      /* Get width of APIC ID.  */
-	      __cpuid (0x80000008, max_cpuid_ex, ebx, ecx, edx);
-	      threads = 1 << ((ecx >> 12) & 0x0f);
-	    }
-
-	  if (threads == 0)
-	    {
-	      /* If APIC ID width is not available, use logical
-		 processor count.  */
-	      __cpuid (0x00000001, max_cpuid_ex, ebx, ecx, edx);
-
-	      if ((edx & (1 << 28)) != 0)
-		threads = (ebx >> 16) & 0xff;
-	    }
-
-	  /* Cap usage of highest cache level to the number of
-	     supported threads.  */
-	  if (threads > 0)
-	    shared /= threads;
-
-	  /* Account for exclusive L2 and L3 caches.  */
-	  shared += core;
-	}
-    }
-
-  if (cpu_features->data_cache_size != 0)
-    data = cpu_features->data_cache_size;
-
-  if (data > 0)
-    {
-      __x86_raw_data_cache_size_half = data / 2;
-      __x86_raw_data_cache_size = data;
-      /* Round data cache size to multiple of 256 bytes.  */
-      data = data & ~255L;
-      __x86_data_cache_size_half = data / 2;
-      __x86_data_cache_size = data;
-    }
-
-  if (cpu_features->shared_cache_size != 0)
-    shared = cpu_features->shared_cache_size;
-
-  if (shared > 0)
-    {
-      __x86_raw_shared_cache_size_half = shared / 2;
-      __x86_raw_shared_cache_size = shared;
-      /* Round shared cache size to multiple of 256 bytes.  */
-      shared = shared & ~255L;
-      __x86_shared_cache_size_half = shared / 2;
-      __x86_shared_cache_size = shared;
-    }
-
-  /* The default setting for the non_temporal threshold is 3/4 of one
-     thread's share of the chip's cache. For most Intel and AMD processors
-     with an initial release date between 2017 and 2020, a thread's typical
-     share of the cache is from 500 KBytes to 2 MBytes. Using the 3/4
-     threshold leaves 125 KBytes to 500 KBytes of the thread's data
-     in cache after a maximum temporal copy, which will maintain
-     in cache a reasonable portion of the thread's stack and other
-     active data. If the threshold is set higher than one thread's
-     share of the cache, it has a substantial risk of negatively
-     impacting the performance of other threads running on the chip. */
-  __x86_shared_non_temporal_threshold
-    = (cpu_features->non_temporal_threshold != 0
-       ? cpu_features->non_temporal_threshold
-       : __x86_shared_cache_size * 3 / 4);
-
-  /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
-  unsigned int minimum_rep_movsb_threshold;
-  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
-  unsigned int rep_movsb_threshold;
-  if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
-      && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
-    {
-      rep_movsb_threshold = 2048 * (64 / 16);
-      minimum_rep_movsb_threshold = 64 * 8;
-    }
-  else if (CPU_FEATURE_PREFERRED_P (cpu_features,
-				    AVX_Fast_Unaligned_Load))
-    {
-      rep_movsb_threshold = 2048 * (32 / 16);
-      minimum_rep_movsb_threshold = 32 * 8;
-    }
-  else
-    {
-      rep_movsb_threshold = 2048 * (16 / 16);
-      minimum_rep_movsb_threshold = 16 * 8;
-    }
-  if (cpu_features->rep_movsb_threshold > minimum_rep_movsb_threshold)
-    __x86_rep_movsb_threshold = cpu_features->rep_movsb_threshold;
-  else
-    __x86_rep_movsb_threshold = rep_movsb_threshold;
-
-# if HAVE_TUNABLES
-  __x86_rep_stosb_threshold = cpu_features->rep_stosb_threshold;
-# endif
-}
+extern void __x86_cacheinfo (void) attribute_hidden;
+const void (*__x86_cacheinfo_p) (void) attribute_hidden
+  = __x86_cacheinfo;
 
+__ifunc (__x86_cacheinfo, __x86_cacheinfo, NULL, void, init_cacheinfo);
 #endif
diff --git a/sysdeps/x86/cacheinfo.h b/sysdeps/x86/cacheinfo.h
new file mode 100644
index 0000000000..7f342fdc23
--- /dev/null
+++ b/sysdeps/x86/cacheinfo.h
@@ -0,0 +1,413 @@
+/* x86 cache info.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <unistd.h>
+
+/* Data cache size for use in memory and string routines, typically
+   L1 size, rounded to multiple of 256 bytes.  */
+long int __x86_data_cache_size_half attribute_hidden = 32 * 1024 / 2;
+long int __x86_data_cache_size attribute_hidden = 32 * 1024;
+/* Similar to __x86_data_cache_size_half, but not rounded.  */
+long int __x86_raw_data_cache_size_half attribute_hidden = 32 * 1024 / 2;
+/* Similar to __x86_data_cache_size, but not rounded.  */
+long int __x86_raw_data_cache_size attribute_hidden = 32 * 1024;
+/* Shared cache size for use in memory and string routines, typically
+   L2 or L3 size, rounded to multiple of 256 bytes.  */
+long int __x86_shared_cache_size_half attribute_hidden = 1024 * 1024 / 2;
+long int __x86_shared_cache_size attribute_hidden = 1024 * 1024;
+/* Similar to __x86_shared_cache_size_half, but not rounded.  */
+long int __x86_raw_shared_cache_size_half attribute_hidden = 1024 * 1024 / 2;
+/* Similar to __x86_shared_cache_size, but not rounded.  */
+long int __x86_raw_shared_cache_size attribute_hidden = 1024 * 1024;
+
+/* Threshold to use non temporal store.  */
+long int __x86_shared_non_temporal_threshold attribute_hidden;
+
+/* Threshold to use Enhanced REP MOVSB.  */
+long int __x86_rep_movsb_threshold attribute_hidden = 2048;
+
+/* Threshold to use Enhanced REP STOSB.  */
+long int __x86_rep_stosb_threshold attribute_hidden = 2048;
+
+static void
+get_common_cache_info (long int *shared_ptr, unsigned int *threads_ptr,
+		       long int core)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  /* Number of logical processors sharing L2 cache.  */
+  int threads_l2;
+
+  /* Number of logical processors sharing L3 cache.  */
+  int threads_l3;
+
+  const struct cpu_features *cpu_features = __get_cpu_features ();
+  int max_cpuid = cpu_features->basic.max_cpuid;
+  unsigned int family = cpu_features->basic.family;
+  unsigned int model = cpu_features->basic.model;
+  long int shared = *shared_ptr;
+  unsigned int threads = *threads_ptr;
+  bool inclusive_cache = true;
+  bool support_count_mask = true;
+
+  /* Try L3 first.  */
+  unsigned int level = 3;
+
+  if (cpu_features->basic.kind == arch_kind_zhaoxin && family == 6)
+    support_count_mask = false;
+
+  if (shared <= 0)
+    {
+      /* Try L2 otherwise.  */
+      level  = 2;
+      shared = core;
+      threads_l2 = 0;
+      threads_l3 = -1;
+    }
+  else
+    {
+      threads_l2 = 0;
+      threads_l3 = 0;
+    }
+
+  /* A value of 0 for the HTT bit indicates there is only a single
+     logical processor.  */
+  if (HAS_CPU_FEATURE (HTT))
+    {
+      /* Figure out the number of logical threads that share the
+         highest cache level.  */
+      if (max_cpuid >= 4)
+        {
+          int i = 0;
+
+          /* Query until cache level 2 and 3 are enumerated.  */
+          int check = 0x1 | (threads_l3 == 0) << 1;
+          do
+            {
+              __cpuid_count (4, i++, eax, ebx, ecx, edx);
+
+              /* There seems to be a bug in at least some Pentium Ds
+                 which sometimes fail to iterate all cache parameters.
+                 Do not loop indefinitely here, stop in this case and
+                 assume there is no such information.  */
+              if (cpu_features->basic.kind == arch_kind_intel
+                  && (eax & 0x1f) == 0 )
+                goto intel_bug_no_cache_info;
+
+              switch ((eax >> 5) & 0x7)
+                {
+                  default:
+                    break;
+                  case 2:
+                    if ((check & 0x1))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L2 cache.  */
+                        threads_l2 = (eax >> 14) & 0x3ff;
+                        check &= ~0x1;
+                      }
+                    break;
+                  case 3:
+                    if ((check & (0x1 << 1)))
+                      {
+                        /* Get maximum number of logical processors
+                           sharing L3 cache.  */
+                        threads_l3 = (eax >> 14) & 0x3ff;
+
+                        /* Check if L2 and L3 caches are inclusive.  */
+                        inclusive_cache = (edx & 0x2) != 0;
+                        check &= ~(0x1 << 1);
+                      }
+                    break;
+                }
+            }
+          while (check);
+
+          /* If max_cpuid >= 11, THREADS_L2/THREADS_L3 are the maximum
+             numbers of addressable IDs for logical processors sharing
+             the cache, instead of the maximum number of threads
+             sharing the cache.  */
+          if (max_cpuid >= 11 && support_count_mask)
+            {
+              /* Find the number of logical processors shipped in
+                 one core and apply count mask.  */
+              i = 0;
+
+              /* Count SMT only if there is L3 cache.  Always count
+                 core if there is no L3 cache.  */
+              int count = ((threads_l2 > 0 && level == 3)
+                           | ((threads_l3 > 0
+                               || (threads_l2 > 0 && level == 2)) << 1));
+
+              while (count)
+                {
+                  __cpuid_count (11, i++, eax, ebx, ecx, edx);
+
+                  int shipped = ebx & 0xff;
+                  int type = ecx & 0xff00;
+                  if (shipped == 0 || type == 0)
+                    break;
+                  else if (type == 0x100)
+                    {
+                      /* Count SMT.  */
+                      if ((count & 0x1))
+                        {
+                          int count_mask;
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_l2));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_l2 = (shipped - 1) & count_mask;
+                          count &= ~0x1;
+                        }
+                    }
+                  else if (type == 0x200)
+                    {
+                      /* Count core.  */
+                      if ((count & (0x1 << 1)))
+                        {
+                          int count_mask;
+                          int threads_core
+                            = (level == 2 ? threads_l2 : threads_l3);
+
+                          /* Compute count mask.  */
+                          asm ("bsr %1, %0"
+                               : "=r" (count_mask) : "g" (threads_core));
+                          count_mask = ~(-1 << (count_mask + 1));
+                          threads_core = (shipped - 1) & count_mask;
+                          if (level == 2)
+                            threads_l2 = threads_core;
+                          else
+                            threads_l3 = threads_core;
+                          count &= ~(0x1 << 1);
+                        }
+                    }
+                }
+            }
+          if (threads_l2 > 0)
+            threads_l2 += 1;
+          if (threads_l3 > 0)
+            threads_l3 += 1;
+          if (level == 2)
+            {
+              if (threads_l2)
+                {
+                  threads = threads_l2;
+                  if (cpu_features->basic.kind == arch_kind_intel
+                      && threads > 2
+                      && family == 6)
+                    switch (model)
+                      {
+                        case 0x37:
+                        case 0x4a:
+                        case 0x4d:
+                        case 0x5a:
+                        case 0x5d:
+                          /* Silvermont has L2 cache shared by 2 cores.  */
+                          threads = 2;
+                          break;
+                        default:
+                          break;
+                      }
+                }
+            }
+          else if (threads_l3)
+            threads = threads_l3;
+        }
+      else
+        {
+intel_bug_no_cache_info:
+          /* Assume that all logical threads share the highest cache
+             level.  */
+          threads
+            = ((cpu_features->features[COMMON_CPUID_INDEX_1].cpuid.ebx
+                >> 16) & 0xff);
+        }
+
+        /* Cap usage of highest cache level to the number of supported
+           threads.  */
+        if (shared > 0 && threads > 0)
+          shared /= threads;
+    }
+
+  /* Account for non-inclusive L2 and L3 caches.  */
+  if (!inclusive_cache)
+    {
+      if (threads_l2 > 0)
+        core /= threads_l2;
+      shared += core;
+    }
+
+  *shared_ptr = shared;
+  *threads_ptr = threads;
+}
+
+static void
+init_cacheinfo (void)
+{
+  /* Find out what brand of processor.  */
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+  int max_cpuid_ex;
+  long int data = -1;
+  long int shared = -1;
+  long int core;
+  unsigned int threads = 0;
+  const struct cpu_features *cpu_features = __get_cpu_features ();
+
+  /* NB: In libc.so, cpu_features is defined in ld.so and is initialized
+     by DL_PLATFORM_INIT or IFUNC relocation before init_cacheinfo is
+     called by IFUNC relocation.  In libc.a, init_cacheinfo is called
+     from init_cpu_features by ARCH_INIT_CPU_FEATURES.  */
+  assert (cpu_features->basic.kind != arch_kind_unknown);
+
+  if (cpu_features->basic.kind == arch_kind_intel)
+    {
+      data = handle_intel (_SC_LEVEL1_DCACHE_SIZE, cpu_features);
+      core = handle_intel (_SC_LEVEL2_CACHE_SIZE, cpu_features);
+      shared = handle_intel (_SC_LEVEL3_CACHE_SIZE, cpu_features);
+
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_zhaoxin)
+    {
+      data = handle_zhaoxin (_SC_LEVEL1_DCACHE_SIZE);
+      core = handle_zhaoxin (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_zhaoxin (_SC_LEVEL3_CACHE_SIZE);
+
+      get_common_cache_info (&shared, &threads, core);
+    }
+  else if (cpu_features->basic.kind == arch_kind_amd)
+    {
+      data   = handle_amd (_SC_LEVEL1_DCACHE_SIZE);
+      long int core = handle_amd (_SC_LEVEL2_CACHE_SIZE);
+      shared = handle_amd (_SC_LEVEL3_CACHE_SIZE);
+
+      /* Get maximum extended function. */
+      __cpuid (0x80000000, max_cpuid_ex, ebx, ecx, edx);
+
+      if (shared <= 0)
+	/* No shared L3 cache.  All we have is the L2 cache.  */
+	shared = core;
+      else
+	{
+	  /* Figure out the number of logical threads that share L3.  */
+	  if (max_cpuid_ex >= 0x80000008)
+	    {
+	      /* Get width of APIC ID.  */
+	      __cpuid (0x80000008, max_cpuid_ex, ebx, ecx, edx);
+	      threads = 1 << ((ecx >> 12) & 0x0f);
+	    }
+
+	  if (threads == 0)
+	    {
+	      /* If APIC ID width is not available, use logical
+		 processor count.  */
+	      __cpuid (0x00000001, max_cpuid_ex, ebx, ecx, edx);
+
+	      if ((edx & (1 << 28)) != 0)
+		threads = (ebx >> 16) & 0xff;
+	    }
+
+	  /* Cap usage of highest cache level to the number of
+	     supported threads.  */
+	  if (threads > 0)
+	    shared /= threads;
+
+	  /* Account for exclusive L2 and L3 caches.  */
+	  shared += core;
+	}
+    }
+
+  if (cpu_features->data_cache_size != 0)
+    data = cpu_features->data_cache_size;
+
+  if (data > 0)
+    {
+      __x86_raw_data_cache_size_half = data / 2;
+      __x86_raw_data_cache_size = data;
+      /* Round data cache size to multiple of 256 bytes.  */
+      data = data & ~255L;
+      __x86_data_cache_size_half = data / 2;
+      __x86_data_cache_size = data;
+    }
+
+  if (cpu_features->shared_cache_size != 0)
+    shared = cpu_features->shared_cache_size;
+
+  if (shared > 0)
+    {
+      __x86_raw_shared_cache_size_half = shared / 2;
+      __x86_raw_shared_cache_size = shared;
+      /* Round shared cache size to multiple of 256 bytes.  */
+      shared = shared & ~255L;
+      __x86_shared_cache_size_half = shared / 2;
+      __x86_shared_cache_size = shared;
+    }
+
+  /* The default setting for the non_temporal threshold is 3/4 of one
+     thread's share of the chip's cache. For most Intel and AMD processors
+     with an initial release date between 2017 and 2020, a thread's typical
+     share of the cache is from 500 KBytes to 2 MBytes. Using the 3/4
+     threshold leaves 125 KBytes to 500 KBytes of the thread's data
+     in cache after a maximum temporal copy, which will maintain
+     in cache a reasonable portion of the thread's stack and other
+     active data. If the threshold is set higher than one thread's
+     share of the cache, it has a substantial risk of negatively
+     impacting the performance of other threads running on the chip. */
+  __x86_shared_non_temporal_threshold
+    = (cpu_features->non_temporal_threshold != 0
+       ? cpu_features->non_temporal_threshold
+       : __x86_shared_cache_size * 3 / 4);
+
+  /* NB: The REP MOVSB threshold must be greater than VEC_SIZE * 8.  */
+  unsigned int minimum_rep_movsb_threshold;
+  /* NB: The default REP MOVSB threshold is 2048 * (VEC_SIZE / 16).  */
+  unsigned int rep_movsb_threshold;
+  if (CPU_FEATURE_USABLE_P (cpu_features, AVX512F)
+      && !CPU_FEATURE_PREFERRED_P (cpu_features, Prefer_No_AVX512))
+    {
+      rep_movsb_threshold = 2048 * (64 / 16);
+      minimum_rep_movsb_threshold = 64 * 8;
+    }
+  else if (CPU_FEATURE_PREFERRED_P (cpu_features,
+				    AVX_Fast_Unaligned_Load))
+    {
+      rep_movsb_threshold = 2048 * (32 / 16);
+      minimum_rep_movsb_threshold = 32 * 8;
+    }
+  else
+    {
+      rep_movsb_threshold = 2048 * (16 / 16);
+      minimum_rep_movsb_threshold = 16 * 8;
+    }
+  if (cpu_features->rep_movsb_threshold > minimum_rep_movsb_threshold)
+    __x86_rep_movsb_threshold = cpu_features->rep_movsb_threshold;
+  else
+    __x86_rep_movsb_threshold = rep_movsb_threshold;
+
+# if HAVE_TUNABLES
+  __x86_rep_stosb_threshold = cpu_features->rep_stosb_threshold;
+# endif
+}
diff --git a/sysdeps/x86/cpu-features.c b/sysdeps/x86/cpu-features.c
index 6551df19c0..e29e6c9f16 100644
--- a/sysdeps/x86/cpu-features.c
+++ b/sysdeps/x86/cpu-features.c
@@ -17,9 +17,14 @@
    <https://www.gnu.org/licenses/>.  */
 
 #include <cpuid.h>
-#include <cpu-features.h>
 #include <dl-hwcap.h>
 #include <libc-pointer-arith.h>
+#if IS_IN (libc) && !defined SHARED
+# include <assert.h>
+# include <unistd.h>
+# include <dl-cacheinfo.h>
+# include <cacheinfo.h>
+#endif
 
 #if HAVE_TUNABLES
 # define TUNABLE_NAMESPACE cpu
@@ -746,4 +751,9 @@ no_cpuid:
 # endif
     }
 #endif
+
+#ifndef SHARED
+  /* NB: In libc.a, call init_cacheinfo.  */
+  init_cacheinfo ();
+#endif
 }
diff --git a/sysdeps/x86/dl-cacheinfo.h b/sysdeps/x86/dl-cacheinfo.h
new file mode 100644
index 0000000000..b2b90074b0
--- /dev/null
+++ b/sysdeps/x86/dl-cacheinfo.h
@@ -0,0 +1,478 @@
+/* Initialize x86 cache info.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <https://www.gnu.org/licenses/>.  */
+
+static const struct intel_02_cache_info
+{
+  unsigned char idx;
+  unsigned char assoc;
+  unsigned char linesize;
+  unsigned char rel_name;
+  unsigned int size;
+} intel_02_known [] =
+  {
+#define M(sc) ((sc) - _SC_LEVEL1_ICACHE_SIZE)
+    { 0x06,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),    8192 },
+    { 0x08,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   16384 },
+    { 0x09,  4, 32, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
+    { 0x0a,  2, 32, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
+    { 0x0c,  4, 32, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x0d,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x0e,  6, 64, M(_SC_LEVEL1_DCACHE_SIZE),   24576 },
+    { 0x21,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x22,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
+    { 0x23,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0x25,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0x29,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0x2c,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
+    { 0x30,  8, 64, M(_SC_LEVEL1_ICACHE_SIZE),   32768 },
+    { 0x39,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x3a,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   196608 },
+    { 0x3b,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x3c,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x3d,  6, 64, M(_SC_LEVEL2_CACHE_SIZE),   393216 },
+    { 0x3e,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x3f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x41,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x42,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x43,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x44,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x45,  4, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x46,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0x47,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0x48, 12, 64, M(_SC_LEVEL2_CACHE_SIZE),  3145728 },
+    { 0x49, 16, 64, M(_SC_LEVEL2_CACHE_SIZE),  4194304 },
+    { 0x4a, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  6291456 },
+    { 0x4b, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0x4c, 12, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
+    { 0x4d, 16, 64, M(_SC_LEVEL3_CACHE_SIZE), 16777216 },
+    { 0x4e, 24, 64, M(_SC_LEVEL2_CACHE_SIZE),  6291456 },
+    { 0x60,  8, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x66,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),    8192 },
+    { 0x67,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   16384 },
+    { 0x68,  4, 64, M(_SC_LEVEL1_DCACHE_SIZE),   32768 },
+    { 0x78,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x79,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   131072 },
+    { 0x7a,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x7b,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x7c,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x7d,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x7f,  2, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x80,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x82,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   262144 },
+    { 0x83,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x84,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0x85,  8, 32, M(_SC_LEVEL2_CACHE_SIZE),  2097152 },
+    { 0x86,  4, 64, M(_SC_LEVEL2_CACHE_SIZE),   524288 },
+    { 0x87,  8, 64, M(_SC_LEVEL2_CACHE_SIZE),  1048576 },
+    { 0xd0,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),   524288 },
+    { 0xd1,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0xd2,  4, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xd6,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  1048576 },
+    { 0xd7,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xd8,  8, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xdc, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xdd, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xde, 12, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0xe2, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  2097152 },
+    { 0xe3, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  4194304 },
+    { 0xe4, 16, 64, M(_SC_LEVEL3_CACHE_SIZE),  8388608 },
+    { 0xea, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 12582912 },
+    { 0xeb, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 18874368 },
+    { 0xec, 24, 64, M(_SC_LEVEL3_CACHE_SIZE), 25165824 },
+  };
+
+#define nintel_02_known (sizeof (intel_02_known) / sizeof (intel_02_known [0]))
+
+static int
+intel_02_known_compare (const void *p1, const void *p2)
+{
+  const struct intel_02_cache_info *i1;
+  const struct intel_02_cache_info *i2;
+
+  i1 = (const struct intel_02_cache_info *) p1;
+  i2 = (const struct intel_02_cache_info *) p2;
+
+  if (i1->idx == i2->idx)
+    return 0;
+
+  return i1->idx < i2->idx ? -1 : 1;
+}
+
+
+static long int
+__attribute__ ((noinline))
+intel_check_word (int name, unsigned int value, bool *has_level_2,
+		  bool *no_level_2_or_3,
+		  const struct cpu_features *cpu_features)
+{
+  if ((value & 0x80000000) != 0)
+    /* The register value is reserved.  */
+    return 0;
+
+  /* Fold the name.  The _SC_ constants are always in the order SIZE,
+     ASSOC, LINESIZE.  */
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  while (value != 0)
+    {
+      unsigned int byte = value & 0xff;
+
+      if (byte == 0x40)
+	{
+	  *no_level_2_or_3 = true;
+
+	  if (folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+	    /* No need to look further.  */
+	    break;
+	}
+      else if (byte == 0xff)
+	{
+	  /* CPUID leaf 0x4 contains all the information.  We need to
+	     iterate over it.  */
+	  unsigned int eax;
+	  unsigned int ebx;
+	  unsigned int ecx;
+	  unsigned int edx;
+
+	  unsigned int round = 0;
+	  while (1)
+	    {
+	      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+	      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+	      if (type == null)
+		/* That was the end.  */
+		break;
+
+	      unsigned int level = (eax >> 5) & 0x7;
+
+	      if ((level == 1 && type == data
+		   && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+		  || (level == 1 && type == inst
+		      && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+		  || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+		  || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+		  || (level == 4 && folded_rel_name == M(_SC_LEVEL4_CACHE_SIZE)))
+		{
+		  unsigned int offset = M(name) - folded_rel_name;
+
+		  if (offset == 0)
+		    /* Cache size.  */
+		    return (((ebx >> 22) + 1)
+			    * (((ebx >> 12) & 0x3ff) + 1)
+			    * ((ebx & 0xfff) + 1)
+			    * (ecx + 1));
+		  if (offset == 1)
+		    return (ebx >> 22) + 1;
+
+		  assert (offset == 2);
+		  return (ebx & 0xfff) + 1;
+		}
+
+	      ++round;
+	    }
+	  /* There is no other cache information anywhere else.  */
+	  break;
+	}
+      else
+	{
+	  if (byte == 0x49 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE))
+	    {
+	      /* Intel reused this value.  For family 15, model 6 it
+		 specifies the 3rd level cache.  Otherwise the 2nd
+		 level cache.  */
+	      unsigned int family = cpu_features->basic.family;
+	      unsigned int model = cpu_features->basic.model;
+
+	      if (family == 15 && model == 6)
+		{
+		  /* The level 3 cache is encoded for this model like
+		     the level 2 cache is for other models.  Pretend
+		     the caller asked for the level 2 cache.  */
+		  name = (_SC_LEVEL2_CACHE_SIZE
+			  + (name - _SC_LEVEL3_CACHE_SIZE));
+		  folded_rel_name = M(_SC_LEVEL2_CACHE_SIZE);
+		}
+	    }
+
+	  struct intel_02_cache_info *found;
+	  struct intel_02_cache_info search;
+
+	  search.idx = byte;
+	  found = bsearch (&search, intel_02_known, nintel_02_known,
+			   sizeof (intel_02_known[0]), intel_02_known_compare);
+	  if (found != NULL)
+	    {
+	      if (found->rel_name == folded_rel_name)
+		{
+		  unsigned int offset = M(name) - folded_rel_name;
+
+		  if (offset == 0)
+		    /* Cache size.  */
+		    return found->size;
+		  if (offset == 1)
+		    return found->assoc;
+
+		  assert (offset == 2);
+		  return found->linesize;
+		}
+
+	      if (found->rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+		*has_level_2 = true;
+	    }
+	}
+
+      /* Next byte for the next round.  */
+      value >>= 8;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_intel (int name, const struct cpu_features *cpu_features)
+{
+  unsigned int maxidx = cpu_features->basic.max_cpuid;
+
+  /* Return -1 for older CPUs.  */
+  if (maxidx < 2)
+    return -1;
+
+  /* OK, we can use the CPUID instruction to get all info about the
+     caches.  */
+  unsigned int cnt = 0;
+  unsigned int max = 1;
+  long int result = 0;
+  bool no_level_2_or_3 = false;
+  bool has_level_2 = false;
+
+  while (cnt++ < max)
+    {
+      unsigned int eax;
+      unsigned int ebx;
+      unsigned int ecx;
+      unsigned int edx;
+      __cpuid (2, eax, ebx, ecx, edx);
+
+      /* The low byte of EAX in the first round contain the number of
+	 rounds we have to make.  At least one, the one we are already
+	 doing.  */
+      if (cnt == 1)
+	{
+	  max = eax & 0xff;
+	  eax &= 0xffffff00;
+	}
+
+      /* Process the individual registers' value.  */
+      result = intel_check_word (name, eax, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, ebx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, ecx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+
+      result = intel_check_word (name, edx, &has_level_2,
+				 &no_level_2_or_3, cpu_features);
+      if (result != 0)
+	return result;
+    }
+
+  if (name >= _SC_LEVEL2_CACHE_SIZE && name <= _SC_LEVEL3_CACHE_LINESIZE
+      && no_level_2_or_3)
+    return -1;
+
+  return 0;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_amd (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+  __cpuid (0x80000000, eax, ebx, ecx, edx);
+
+  /* No level 4 cache (yet).  */
+  if (name > _SC_LEVEL3_CACHE_LINESIZE)
+    return 0;
+
+  unsigned int fn = 0x80000005 + (name >= _SC_LEVEL2_CACHE_SIZE);
+  if (eax < fn)
+    return 0;
+
+  __cpuid (fn, eax, ebx, ecx, edx);
+
+  if (name < _SC_LEVEL1_DCACHE_SIZE)
+    {
+      name += _SC_LEVEL1_DCACHE_SIZE - _SC_LEVEL1_ICACHE_SIZE;
+      ecx = edx;
+    }
+
+  switch (name)
+    {
+    case _SC_LEVEL1_DCACHE_SIZE:
+      return (ecx >> 14) & 0x3fc00;
+
+    case _SC_LEVEL1_DCACHE_ASSOC:
+      ecx >>= 16;
+      if ((ecx & 0xff) == 0xff)
+	/* Fully associative.  */
+	return (ecx << 2) & 0x3fc00;
+      return ecx & 0xff;
+
+    case _SC_LEVEL1_DCACHE_LINESIZE:
+      return ecx & 0xff;
+
+    case _SC_LEVEL2_CACHE_SIZE:
+      return (ecx & 0xf000) == 0 ? 0 : (ecx >> 6) & 0x3fffc00;
+
+    case _SC_LEVEL2_CACHE_ASSOC:
+      switch ((ecx >> 12) & 0xf)
+	{
+	case 0:
+	case 1:
+	case 2:
+	case 4:
+	  return (ecx >> 12) & 0xf;
+	case 6:
+	  return 8;
+	case 8:
+	  return 16;
+	case 10:
+	  return 32;
+	case 11:
+	  return 48;
+	case 12:
+	  return 64;
+	case 13:
+	  return 96;
+	case 14:
+	  return 128;
+	case 15:
+	  return ((ecx >> 6) & 0x3fffc00) / (ecx & 0xff);
+	default:
+	  return 0;
+	}
+      /* NOTREACHED */
+
+    case _SC_LEVEL2_CACHE_LINESIZE:
+      return (ecx & 0xf000) == 0 ? 0 : ecx & 0xff;
+
+    case _SC_LEVEL3_CACHE_SIZE:
+      return (edx & 0xf000) == 0 ? 0 : (edx & 0x3ffc0000) << 1;
+
+    case _SC_LEVEL3_CACHE_ASSOC:
+      switch ((edx >> 12) & 0xf)
+	{
+	case 0:
+	case 1:
+	case 2:
+	case 4:
+	  return (edx >> 12) & 0xf;
+	case 6:
+	  return 8;
+	case 8:
+	  return 16;
+	case 10:
+	  return 32;
+	case 11:
+	  return 48;
+	case 12:
+	  return 64;
+	case 13:
+	  return 96;
+	case 14:
+	  return 128;
+	case 15:
+	  return ((edx & 0x3ffc0000) << 1) / (edx & 0xff);
+	default:
+	  return 0;
+	}
+      /* NOTREACHED */
+
+    case _SC_LEVEL3_CACHE_LINESIZE:
+      return (edx & 0xf000) == 0 ? 0 : edx & 0xff;
+
+    default:
+      assert (! "cannot happen");
+    }
+  return -1;
+}
+
+
+static long int __attribute__ ((noinline))
+handle_zhaoxin (int name)
+{
+  unsigned int eax;
+  unsigned int ebx;
+  unsigned int ecx;
+  unsigned int edx;
+
+  int folded_rel_name = (M(name) / 3) * 3;
+
+  unsigned int round = 0;
+  while (1)
+    {
+      __cpuid_count (4, round, eax, ebx, ecx, edx);
+
+      enum { null = 0, data = 1, inst = 2, uni = 3 } type = eax & 0x1f;
+      if (type == null)
+        break;
+
+      unsigned int level = (eax >> 5) & 0x7;
+
+      if ((level == 1 && type == data
+        && folded_rel_name == M(_SC_LEVEL1_DCACHE_SIZE))
+        || (level == 1 && type == inst
+            && folded_rel_name == M(_SC_LEVEL1_ICACHE_SIZE))
+        || (level == 2 && folded_rel_name == M(_SC_LEVEL2_CACHE_SIZE))
+        || (level == 3 && folded_rel_name == M(_SC_LEVEL3_CACHE_SIZE)))
+        {
+          unsigned int offset = M(name) - folded_rel_name;
+
+          if (offset == 0)
+            /* Cache size = ways * partitions * line size * sets.  */
+            return (((ebx >> 22) + 1)
+                * (((ebx >> 12) & 0x3ff) + 1)
+                * ((ebx & 0xfff) + 1)
+                * (ecx + 1));
+          if (offset == 1)
+            return (ebx >> 22) + 1;
+
+          assert (offset == 2);
+          return (ebx & 0xfff) + 1;
+        }
+
+      ++round;
+    }
+
+  /* Nothing found.  */
+  return 0;
+}
diff --git a/sysdeps/x86/dl-get-cpu-features.c b/sysdeps/x86/dl-get-cpu-features.c
index 5f9e46b0c6..349472d99f 100644
--- a/sysdeps/x86/dl-get-cpu-features.c
+++ b/sysdeps/x86/dl-get-cpu-features.c
@@ -1,4 +1,4 @@
-/* This file is part of the GNU C Library.
+/* Initialize CPU feature data via IFUNC relocation.
    Copyright (C) 2015-2020 Free Software Foundation, Inc.
 
    The GNU C Library is free software; you can redistribute it and/or
@@ -18,6 +18,31 @@
 
 #include <ldsodefs.h>
 
+#ifdef SHARED
+# include <cpu-features.c>
+
+/* NB: Normally, DL_PLATFORM_INIT calls init_cpu_features to initialize
+   CPU features in a dynamic executable.  But when ld.so is loaded
+   inside a static executable, DL_PLATFORM_INIT isn't called, and an
+   IFUNC relocation is used to call init_cpu_features instead.  In a
+   static executable, it is called once, by the IFUNC relocation.  In a
+   dynamic executable, it is called twice: by DL_PLATFORM_INIT and by
+   the IFUNC relocation.  */
+extern void __x86_cpu_features (void) attribute_hidden;
+void (*const __x86_cpu_features_p) (void) attribute_hidden
+  = __x86_cpu_features;
+
+void
+_dl_x86_init_cpu_features (void)
+{
+  struct cpu_features *cpu_features = __get_cpu_features ();
+  if (cpu_features->basic.kind == arch_kind_unknown)
+    init_cpu_features (cpu_features);
+}
+
+__ifunc (__x86_cpu_features, __x86_cpu_features, NULL, void,
+	 _dl_x86_init_cpu_features);
+#endif
+
 #undef __x86_get_cpu_features
 
 const struct cpu_features *
diff --git a/sysdeps/x86/include/cpu-features.h b/sysdeps/x86/include/cpu-features.h
index dcf29b6fe8..f62be0b9b3 100644
--- a/sysdeps/x86/include/cpu-features.h
+++ b/sysdeps/x86/include/cpu-features.h
@@ -159,6 +159,7 @@ struct cpu_features
 /* Unused for x86.  */
 #  define INIT_ARCH()
 #  define __x86_get_cpu_features(max) (&GLRO(dl_x86_cpu_features))
+extern void _dl_x86_init_cpu_features (void) attribute_hidden;
 # endif
 
 # ifdef __x86_64__
diff --git a/sysdeps/x86/libc-start.c b/sysdeps/x86/libc-start.c
index 875bb93e55..ca35d4da97 100644
--- a/sysdeps/x86/libc-start.c
+++ b/sysdeps/x86/libc-start.c
@@ -20,7 +20,6 @@
    PIE.  */
 # include <startup.h>
 # include <ldsodefs.h>
-# include <cpu-features.h>
 # include <cpu-features.c>
 
 extern struct cpu_features _dl_x86_cpu_features;
diff --git a/sysdeps/x86_64/dl-machine.h b/sysdeps/x86_64/dl-machine.h
index ca73d8fef9..bb93c7c6ab 100644
--- a/sysdeps/x86_64/dl-machine.h
+++ b/sysdeps/x86_64/dl-machine.h
@@ -26,7 +26,6 @@
 #include <sysdep.h>
 #include <tls.h>
 #include <dl-tlsdesc.h>
-#include <cpu-features.c>
 
 /* Return nonzero iff ELF header is compatible with the running host.  */
 static inline int __attribute__ ((unused))
@@ -223,9 +222,9 @@ static inline void __attribute__ ((unused))
 dl_platform_init (void)
 {
 #if IS_IN (rtld)
-  /* init_cpu_features has been called early from __libc_start_main in
-     static executable.  */
-  init_cpu_features (&GLRO(dl_x86_cpu_features));
+  /* _dl_x86_init_cpu_features is a wrapper for init_cpu_features which
+     skips the initialization when init_cpu_features has already been
+     called early from __libc_start_main in a static executable.  */
+  _dl_x86_init_cpu_features ();
 #else
   if (GLRO(dl_platform) != NULL && *GLRO(dl_platform) == '\0')
     /* Avoid an empty string which would disturb us.  */
-- 
2.26.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread
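
The IFUNC hook above generalizes beyond glibc: any one-time
initialization can be attached to an IRELATIVE relocation by pointing a
const function pointer at an IFUNC whose resolver does the work as a
side effect.  A minimal standalone sketch (illustrative names, GCC on
an ELF target assumed; this is not glibc code):

/* ifunc-init.c: run my_init during relocation processing, before main.  */
#include <stdio.h>

static int initialized;

static void
my_init (void)
{
  initialized = 1;
}

/* Dummy IFUNC target; never meant to be called.  */
static void
my_dummy (void)
{
}

typedef void (*void_fn) (void);

/* The resolver runs while the loader processes IRELATIVE relocations;
   do the initialization as a side effect and return the dummy.  */
static void_fn
resolve_dummy (void)
{
  my_init ();
  return my_dummy;
}

void my_ifunc (void) __attribute__ ((ifunc ("resolve_dummy")));

/* Taking the address forces the IRELATIVE relocation to exist, so the
   resolver is guaranteed to run at startup.  */
void (*const my_ifunc_p) (void) = my_ifunc;

int
main (void)
{
  /* Prints 1: the resolver already ran.  */
  printf ("initialized before main: %d\n", initialized);
  return 0;
}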

* PING: V4 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-10-01 19:50                         ` V4 " H.J. Lu via Libc-alpha
@ 2020-10-08 13:22                           ` H.J. Lu via Libc-alpha
  2020-10-15 12:53                             ` PING^2: " H.J. Lu via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-10-08 13:22 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Thu, Oct 1, 2020 at 12:50 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Oct 1, 2020 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
> >
> > * H. J. Lu:
> >
> > > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > > index dadec5d58f..65ab29123d 100644
> > > --- a/sysdeps/x86/cacheinfo.c
> > > +++ b/sysdeps/x86/cacheinfo.c
> > > @@ -16,7 +16,9 @@
> > >     License along with the GNU C Library; if not, see
> > >     <https://www.gnu.org/licenses/>.  */
> > >
> > > -#if IS_IN (libc)
> > > +/* NB: In libc.a, this file is included in libc-static.c.  In libc.so,
> > > +   this file is standalone.  */
> > > +#if IS_IN (libc) && (defined SHARED || defined _PRIVATE_CPU_FEATURES_H)
> >
> > libc-static.c should be libc-start.c, I believe.  The “defined
> > _PRIVATE_CPU_FEATURES_H” part seems rather indirect.  What exactly are
> > you trying to accomplish here?
> >
> > It looks to me as if this file should be included in libc.so, but not
> > pulled into ld.so via the rebuild, so maybe you can add an empty
> > sysdeps/x86/rtld-cacheinfo.c file instead?
> >
>
> Here is the updated patch.   I also moved files around to prepare
> for moving x86 processor cache info to cpu_features in ld.so to
> support --list-tunables.
>

PING:

https://sourceware.org/pipermail/libc-alpha/2020-October/118228.html

-- 
H.J.

^ permalink raw reply	[flat|nested] 33+ messages in thread
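
For reference, the rtld-cacheinfo.c suggestion relies on the glibc
build convention that the ld.so rebuild compiles rtld-<module>.c in
place of <module>.c when a sysdeps directory provides one, so an empty
override keeps cacheinfo.c out of ld.so without preprocessor guards.
A hypothetical sketch of such a file (a reading of the suggestion, not
the committed patch):

/* sysdeps/x86/rtld-cacheinfo.c: hypothetical empty override.
   The ld.so rebuild picks this file up instead of cacheinfo.c, so no
   cache-info code is linked into the dynamic loader, while libc.so
   continues to use cacheinfo.c unchanged.  */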

* PING^2: V4 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-10-08 13:22                           ` PING: " H.J. Lu via Libc-alpha
@ 2020-10-15 12:53                             ` H.J. Lu via Libc-alpha
  2022-05-02 13:59                               ` Sunil Pandey via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: H.J. Lu via Libc-alpha @ 2020-10-15 12:53 UTC (permalink / raw)
  To: Florian Weimer; +Cc: H.J. Lu via Libc-alpha

On Thu, Oct 8, 2020 at 6:22 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Thu, Oct 1, 2020 at 12:50 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Oct 1, 2020 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
> > >
> > > * H. J. Lu:
> > >
> > > > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > > > index dadec5d58f..65ab29123d 100644
> > > > --- a/sysdeps/x86/cacheinfo.c
> > > > +++ b/sysdeps/x86/cacheinfo.c
> > > > @@ -16,7 +16,9 @@
> > > >     License along with the GNU C Library; if not, see
> > > >     <https://www.gnu.org/licenses/>.  */
> > > >
> > > > -#if IS_IN (libc)
> > > > +/* NB: In libc.a, this file is included in libc-static.c.  In libc.so,
> > > > +   this file is standalone.  */
> > > > +#if IS_IN (libc) && (defined SHARED || defined _PRIVATE_CPU_FEATURES_H)
> > >
> > > libc-static.c should be libc-start.c, I believe.  The “defined
> > > _PRIVATE_CPU_FEATURES_H” part seems rather indirect.  What exactly are
> > > you trying to accomplish here?
> > >
> > > It looks to me as if this file should be included in libc.so, but not
> > > pulled into ld.so via the rebuild, so maybe you can add an empty
> > > sysdeps/x86/rtld-cacheinfo.c file instead?
> > >
> >
> > Here is the updated patch.   I also moved files around to prepare
> > for moving x86 processor cache info to cpu_features in ld.so to
> > support --list-tunables.
> >
>
> PING:
>
> https://sourceware.org/pipermail/libc-alpha/2020-October/118228.html
>

I will check it in tomorrow if there are no objections.

Thanks.

-- 
H.J.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PING^2: V4 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2020-10-15 12:53                             ` PING^2: " H.J. Lu via Libc-alpha
@ 2022-05-02 13:59                               ` Sunil Pandey via Libc-alpha
  2022-05-03 18:51                                 ` Sunil Pandey via Libc-alpha
  0 siblings, 1 reply; 33+ messages in thread
From: Sunil Pandey via Libc-alpha @ 2022-05-02 13:59 UTC (permalink / raw)
  To: H.J. Lu, Libc-stable Mailing List; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

On Thu, Oct 15, 2020 at 5:54 AM H.J. Lu via Libc-alpha
<libc-alpha@sourceware.org> wrote:
>
> On Thu, Oct 8, 2020 at 6:22 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Thu, Oct 1, 2020 at 12:50 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Thu, Oct 1, 2020 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > >
> > > > * H. J. Lu:
> > > >
> > > > > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > > > > index dadec5d58f..65ab29123d 100644
> > > > > --- a/sysdeps/x86/cacheinfo.c
> > > > > +++ b/sysdeps/x86/cacheinfo.c
> > > > > @@ -16,7 +16,9 @@
> > > > >     License along with the GNU C Library; if not, see
> > > > >     <https://www.gnu.org/licenses/>.  */
> > > > >
> > > > > -#if IS_IN (libc)
> > > > > +/* NB: In libc.a, this file is included in libc-static.c.  In libc.so,
> > > > > +   this file is standalone.  */
> > > > > +#if IS_IN (libc) && (defined SHARED || defined _PRIVATE_CPU_FEATURES_H)
> > > >
> > > > libc-static.c should be libc-start.c, I believe.  The “defined
> > > > _PRIVATE_CPU_FEATURES_H” part seems rather indirect.  What exactly are
> > > > you trying to accomplish here?
> > > >
> > > > It looks to me as if this file should be included in libc.so, but not
> > > > pulled into ld.so via the rebuild, so maybe you can add an empty
> > > > sysdeps/x86/rtld-cacheinfo.c file instead?
> > > >
> > >
> > > Here is the updated patch.   I also moved files around to prepare
> > > for moving x86 processor cache info to cpu_features in ld.so to
> > > support --list-tunables.
> > >
> >
> > PING:
> >
> > https://sourceware.org/pipermail/libc-alpha/2020-October/118228.html
> >
>
> I will check it in tomorrow if there are no objections.
>
> Thanks.
>
> --
> H.J.

I would like to backport this patch to release branches.
Any comments or objections?

--Sunil

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: PING^2: V4 [PATCH] x86: Initialize CPU info via IFUNC relocation [BZ 26203]
  2022-05-02 13:59                               ` Sunil Pandey via Libc-alpha
@ 2022-05-03 18:51                                 ` Sunil Pandey via Libc-alpha
  0 siblings, 0 replies; 33+ messages in thread
From: Sunil Pandey via Libc-alpha @ 2022-05-03 18:51 UTC (permalink / raw)
  To: H.J. Lu, Libc-stable Mailing List; +Cc: Florian Weimer, H.J. Lu via Libc-alpha

On Mon, May 2, 2022 at 6:59 AM Sunil Pandey <skpgkp2@gmail.com> wrote:
>
> On Thu, Oct 15, 2020 at 5:54 AM H.J. Lu via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
> >
> > On Thu, Oct 8, 2020 at 6:22 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Thu, Oct 1, 2020 at 12:50 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On Thu, Oct 1, 2020 at 1:46 AM Florian Weimer <fweimer@redhat.com> wrote:
> > > > >
> > > > > * H. J. Lu:
> > > > >
> > > > > > diff --git a/sysdeps/x86/cacheinfo.c b/sysdeps/x86/cacheinfo.c
> > > > > > index dadec5d58f..65ab29123d 100644
> > > > > > --- a/sysdeps/x86/cacheinfo.c
> > > > > > +++ b/sysdeps/x86/cacheinfo.c
> > > > > > @@ -16,7 +16,9 @@
> > > > > >     License along with the GNU C Library; if not, see
> > > > > >     <https://www.gnu.org/licenses/>.  */
> > > > > >
> > > > > > -#if IS_IN (libc)
> > > > > > +/* NB: In libc.a, this file is included in libc-static.c.  In libc.so,
> > > > > > +   this file is standalone.  */
> > > > > > +#if IS_IN (libc) && (defined SHARED || defined _PRIVATE_CPU_FEATURES_H)
> > > > >
> > > > > libc-static.c should be libc-start.c, I believe.  The “defined
> > > > > _PRIVATE_CPU_FEATURES_H” part seems rather indirect.  What exactly are
> > > > > you trying to accomplish here?
> > > > >
> > > > > It looks to me as if this file should be included in libc.so, but not
> > > > > pulled into ld.so via the rebuild, so maybe you can add an empty
> > > > > sysdeps/x86/rtld-cacheinfo.c file instead?
> > > > >
> > > >
> > > > Here is the updated patch.   I also moved files around to prepare
> > > > for moving x86 processor cache info to cpu_features in ld.so to
> > > > support --list-tunables.
> > > >
> > >
> > > PING:
> > >
> > > https://sourceware.org/pipermail/libc-alpha/2020-October/118228.html
> > >
> >
> > I will check it in tomorrow if there are no objections.
> >
> > Thanks.
> >
> > --
> > H.J.
>
> I would like to backport this patch to release branches.
> Any comments or objections?
>
> --Sunil

I have to stop backporting at 2.33.

There is major x86 restructuring, with inter-patch dependencies, in the
2.32 branch.  Resolving backport conflicts there has a cascading effect
on existing patches.

--Sunil

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-05-03 18:52 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-18 16:07 V2 [PATCH 0/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
2020-09-18 16:07 ` [PATCH 1/4] x86: Initialize CPU info via IFUNC relocation [BZ 26203] H.J. Lu via Libc-alpha
2020-09-28 13:08   ` Florian Weimer via Libc-alpha
2020-09-28 13:48     ` H.J. Lu via Libc-alpha
2020-09-28 14:05       ` Florian Weimer via Libc-alpha
2020-09-28 14:20         ` H.J. Lu via Libc-alpha
2020-09-28 14:22           ` Florian Weimer via Libc-alpha
2020-09-28 14:39             ` H.J. Lu via Libc-alpha
2020-09-28 14:47               ` Florian Weimer via Libc-alpha
2020-09-28 17:54                 ` V3 [PATCH] " H.J. Lu via Libc-alpha
2020-09-29  7:53                   ` Florian Weimer via Libc-alpha
2020-09-29 11:44                     ` H.J. Lu via Libc-alpha
2020-10-01  8:46                       ` Florian Weimer via Libc-alpha
2020-10-01 19:50                         ` V4 " H.J. Lu via Libc-alpha
2020-10-08 13:22                           ` PING: " H.J. Lu via Libc-alpha
2020-10-15 12:53                             ` PING^2: " H.J. Lu via Libc-alpha
2022-05-02 13:59                               ` Sunil Pandey via Libc-alpha
2022-05-03 18:51                                 ` Sunil Pandey via Libc-alpha
2020-09-18 16:07 ` [PATCH 2/4] Set tunable value as well as min/max values H.J. Lu via Libc-alpha
2020-09-28 13:35   ` Florian Weimer via Libc-alpha
2020-09-28 13:53     ` H.J. Lu via Libc-alpha
2020-09-28 14:03       ` Florian Weimer via Libc-alpha
2020-09-28 17:30     ` Siddhesh Poyarekar via Libc-alpha
2020-09-29  4:00       ` V3 [PATCH] " H.J. Lu via Libc-alpha
2020-09-29  4:45         ` Siddhesh Poyarekar via Libc-alpha
2020-09-29  4:47           ` Siddhesh Poyarekar
2020-09-29 12:30             ` V4 " H.J. Lu via Libc-alpha
2020-09-29 13:50               ` Siddhesh Poyarekar via Libc-alpha
2020-09-29 14:54                 ` V5 " H.J. Lu via Libc-alpha
2020-09-29 15:58                   ` Siddhesh Poyarekar via Libc-alpha
2020-09-18 16:07 ` [PATCH 3/4] x86: Move x86 processor cache info to cpu_features H.J. Lu via Libc-alpha
2020-09-18 16:07 ` [PATCH 4/4] ld.so: Add --list-tunables to print tunable values H.J. Lu via Libc-alpha
2020-09-21  8:25   ` Florian Weimer via Libc-alpha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).