unofficial mirror of libc-alpha@sourceware.org
* [PATCH 0/3] malloc: improve THP effectiveness
@ 2021-08-13 21:04 Adhemerval Zanella via Libc-alpha
  2021-08-13 21:04 ` [PATCH 1/3] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-13 21:04 UTC
  To: libc-alpha; +Cc: Norbert Manthey, Siddhesh Poyarekar, Guillaume Morin

Linux Transparent Huge Pages (THP) currently support three different
states [1]: 'never', 'madvise', and 'always'.  The 'never' state is
self-explanatory, and 'always' enables THP for all anonymous
memory.  However, 'madvise' is still the default on some systems, and
in that case THP is only used if the program explicitly advises the
memory range through the madvise(MADV_HUGEPAGE) call.

This patchset adds a new tunable, 'glibc.malloc.thp_pagesize', which
allows the user to explicitly use THP for anonymous pages even when the
'madvise' state is set.  The usage should be transparent for the mmap()
call, and madvise(MADV_HUGEPAGE) is only issued for sizes at least as
large as the THP huge page size.

The sbrk() change alters the program memory allocation since the
increment is now aligned to the huge page size instead of the default page
size.  It is also enabled by the new tunable.

This patchset adds THP support for aarch64, mips, powerpc, riscv, s390,
sparc, and x86.  These are the architectures that define
HAVE_ARCH_TRANSPARENT_HUGEPAGE by default and define the internal
flags for THP support (I might have missed some architectures).

Although it does improve THP effectiveness, it does not provide the same
features as the libhugetlbfs morecore implementation [2], since that
implementation uses MAP_HUGETLB explicitly on mmap.  I do not think that
is what we want for glibc: it requires additional setup from the admin to
mount hugetlbfs and reserve the pages, which is outside the scope of glibc.

The performance improvements depend heavily on the workload and the
platform; however, a simple test case shows the possible
improvements:

$ cat hugepages.cc
#include <unordered_map>

int
main (int argc, char *argv[])
{
  std::size_t iters = 10000000;
  std::unordered_map <std::size_t, std::size_t> ht;
  ht.reserve (iters);
  for (std::size_t i = 0; i < iters; ++i)
    ht.try_emplace (i, i);

  return 0;
}
$ g++ -std=c++17 -O2 hugepages.cc -o hugepages

On an x86_64 (Ryzen 9 5900X):

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=0 ./testrun.sh ./hugepages':

            98,874      faults
           717,059      dTLB-loads
           411,701      dTLB-load-misses          #   57.42% of all dTLB cache accesses
         3,754,927      cache-misses              #    8.479 % of all cache refs
        44,287,580      cache-references

       0.315278378 seconds time elapsed

       0.238635000 seconds user
       0.076714000 seconds sys

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=1 ./testrun.sh ./hugepages':

             1,871      faults
           120,035      dTLB-loads
            19,882      dTLB-load-misses          #   16.56% of all dTLB cache accesses
         4,182,942      cache-misses              #    7.452 % of all cache refs
        56,128,995      cache-references

       0.262620733 seconds time elapsed

       0.222233000 seconds user
       0.040333000 seconds sys


On an AArch64 (Cortex-A72):

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=0 ./testrun.sh ./hugepages':

             98835      faults
        2007234756      dTLB-loads
           4613669      dTLB-load-misses          #    0.23% of all dTLB cache accesses
           8831801      cache-misses              #    0.504 % of all cache refs
        1751391405      cache-references

       0.616782575 seconds time elapsed

       0.460946000 seconds user
       0.154309000 seconds sys

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=1 ./testrun.sh ./hugepages':

               955      faults
        1787401880      dTLB-loads
            224034      dTLB-load-misses          #    0.01% of all dTLB cache accesses
           5480917      cache-misses              #    0.337 % of all cache refs
        1625937858      cache-references

       0.487773443 seconds time elapsed

       0.440894000 seconds user
       0.046465000 seconds sys


And on a powerpc64 (POWER8):

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=0 ./testrun.sh ./hugepages':

              5453      faults
              9940      dTLB-load-misses
           1338152      cache-misses              #    0.101 % of all cache refs
        1326037487      cache-references

       1.056355887 seconds time elapsed

       1.014633000 seconds user
       0.041805000 seconds sys

 Performance counter stats for 'env GLIBC_TUNABLES=glibc.malloc.thp_pagesize=1 ./testrun.sh ./hugepages':

              1016      faults
              1746      dTLB-load-misses
            399052      cache-misses              #    0.030 % of all cache refs
        1316059877      cache-references

       1.057810501 seconds time elapsed

       1.012175000 seconds user
       0.045624000 seconds sys

It is worth noting that the powerpc64 machine has 'always' set
in '/sys/kernel/mm/transparent_hugepage/enabled'.

Norbert Manthey's paper [3] has more information, with a more thorough
performance analysis.

This is based on the previous RFC to enable Transparent Huge Pages
with madvise [4], with fixes and improvements:

  * Removed the usage of MAP_HUGE_SHIFT, since the kernel only uses it
    when MAP_HUGETLB is also used.
  * Moved the option to a tunable and added documentation.
  * Removed the address alignment before the madvise call, since the
    kernel already does it.
  * Added an arch-specific hook to return the THP huge page size.
  * Avoided calling sbrk() twice to align to THP and removed a lot of
    unnecessary internal state.
  * Added a NEWS entry.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/mm/transhuge.rst
[2] https://sourceware.org/pipermail/libc-alpha/2021-July/129041.html
[3] https://arxiv.org/pdf/2004.14378.pdf
[4] https://sourceware.org/pipermail/libc-alpha/2020-May/113539.html

Adhemerval Zanella (3):
  malloc: Add madvise support for Transparent Huge Pages
  malloc: Add THP/madvise support for sbrk
  malloc: Add arch-specific malloc_verify_thp_pagesize for Linux

 NEWS                                         |  5 +-
 elf/dl-tunables.list                         |  5 ++
 elf/tst-rtld-list-tunables.exp               |  1 +
 malloc/arena.c                               |  5 ++
 malloc/malloc-internal.h                     |  1 +
 malloc/malloc.c                              | 85 ++++++++++++++++++--
 manual/tunables.texi                         | 11 +++
 sysdeps/generic/malloc-thp.h                 | 32 ++++++++
 sysdeps/unix/sysv/linux/aarch64/malloc-thp.h | 40 +++++++++
 sysdeps/unix/sysv/linux/mips/malloc-thp.h    | 39 +++++++++
 sysdeps/unix/sysv/linux/powerpc/malloc-thp.h | 56 +++++++++++++
 sysdeps/unix/sysv/linux/riscv/malloc-thp.h   | 32 ++++++++
 sysdeps/unix/sysv/linux/s390/malloc-thp.h    | 33 ++++++++
 sysdeps/unix/sysv/linux/sparc/malloc-thp.h   | 36 +++++++++
 sysdeps/unix/sysv/linux/x86/malloc-thp.h     | 32 ++++++++
 15 files changed, 407 insertions(+), 6 deletions(-)
 create mode 100644 sysdeps/generic/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/s390/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/sparc/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/malloc-thp.h

-- 
2.30.2


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/3] malloc: Add madvise support for Transparent Huge Pages
  2021-08-13 21:04 [PATCH 0/3] malloc: improve THP effectiveness Adhemerval Zanella via Libc-alpha
@ 2021-08-13 21:04 ` Adhemerval Zanella via Libc-alpha
  2021-08-13 21:04 ` [PATCH 2/3] malloc: Add THP/madvise support for sbrk Adhemerval Zanella via Libc-alpha
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-13 21:04 UTC
  To: libc-alpha; +Cc: Norbert Manthey, Siddhesh Poyarekar, Guillaume Morin

Linux Transparent Huge Pages (THP) currently support three different
states: 'never', 'madvise', and 'always'.  The 'never' state is
self-explanatory, and 'always' enables THP for all anonymous
memory.  However, 'madvise' is still the default on some systems, and
in that case THP is only used if the program explicitly advises the
memory range through the madvise(MADV_HUGEPAGE) call.

To enable it, a new tunable is provided, 'glibc.malloc.thp_pagesize',
where the user can either enable THP through madvise with the
default huge page size by using a value of '1', or specify
a different large page size if the system supports it (Linux currently
only supports one page size for THP, even if the architecture supports
multiple sizes).

This patch issues the madvise(MADV_HUGEPAGE) call after successful
mmap(), sbrk(), and mremap() calls in sysmalloc(), new_heap(), and
mremap_chunk().  The default malloc_verify_thp_pagesize() does not
enable it even if the tunable is set.

Checked on x86_64-linux-gnu.
---
 NEWS                           |  5 +++-
 elf/dl-tunables.list           |  5 ++++
 elf/tst-rtld-list-tunables.exp |  1 +
 malloc/arena.c                 |  5 ++++
 malloc/malloc-internal.h       |  1 +
 malloc/malloc.c                | 45 ++++++++++++++++++++++++++++++++++
 manual/tunables.texi           | 11 +++++++++
 sysdeps/generic/malloc-thp.h   | 32 ++++++++++++++++++++++++
 8 files changed, 104 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/generic/malloc-thp.h

diff --git a/NEWS b/NEWS
index 79c895e382..85b7933e4d 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,10 @@ Version 2.35
 
 Major new features:
 
-  [Add new features here]
+* On Linux, a new tunable, glibc.malloc.thp_pagesize, can be used to
+  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
+  It forces the use of Transparent Huge Pages when the madvise global
+  mode is set and might improve performance depending on the workload.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 8ddd4a2314..77d3662ffd 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -92,6 +92,11 @@ glibc {
       minval: 0
       security_level: SXID_IGNORE
     }
+    thp_pagesize {
+      type: SIZE_T
+      minval: 0
+      default: 0
+    }
   }
   cpu {
     hwcap_mask {
diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
index 9f66c52885..532af4eabc 100644
--- a/elf/tst-rtld-list-tunables.exp
+++ b/elf/tst-rtld-list-tunables.exp
@@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255)
 glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+)
+glibc.malloc.thp_pagesize: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
diff --git a/malloc/arena.c b/malloc/arena.c
index 667484630e..7ec316a906 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
 TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
 #endif
 TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
+TUNABLE_CALLBACK_FNDECL (set_thp_pagesize, size_t)
 #else
 /* Initialization routine. */
 #include <string.h>
@@ -331,6 +332,7 @@ ptmalloc_init (void)
 	       TUNABLE_CALLBACK (set_tcache_unsorted_limit));
 # endif
   TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
+  TUNABLE_GET (thp_pagesize, size_t, TUNABLE_CALLBACK (set_thp_pagesize));
 #else
   if (__glibc_likely (_environ != NULL))
     {
@@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
       __munmap (p2, HEAP_MAX_SIZE);
       return 0;
     }
+
+  sysmadvise_thp (p2, size);
+
   h = (heap_info *) p2;
   h->size = size;
   h->mprotect_size = size;
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index 0c7b5a183c..2efef06f35 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -22,6 +22,7 @@
 #include <malloc-machine.h>
 #include <malloc-sysdep.h>
 #include <malloc-size.h>
+#include <malloc-thp.h>
 
 /* Called in the parent process before a fork.  */
 void __malloc_fork_lock_parent (void) attribute_hidden;
diff --git a/malloc/malloc.c b/malloc/malloc.c
index e065785af7..52ea84a63d 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1881,6 +1881,11 @@ struct malloc_par
   INTERNAL_SIZE_T arena_test;
   INTERNAL_SIZE_T arena_max;
 
+#if HAVE_TUNABLES
+  /* Transparent Huge Page support.  */
+  INTERNAL_SIZE_T thp_pagesize;
+#endif
+
   /* Memory map support */
   int n_mmaps;
   int n_mmaps_max;
@@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
 
 #include <stap-probe.h>
 
+/* ----------- Routines dealing with transparent huge pages ----------- */
+
+static inline void
+sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
+{
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+  /* Do not consider areas smaller than a huge page or if the tunable is
+     not active.  */
+  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
+    return;
+  __madvise (p, size, MADV_HUGEPAGE);
+#endif
+}
+
 /* ------------------- Support for multiple arenas -------------------- */
 #include "arena.c"
 
@@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
           if (mm != MAP_FAILED)
             {
+	      sysmadvise_thp (mm, size);
+
               /*
                  The offset to the start of the mmapped region is stored
                  in the prev_size field of the chunk. This allows us to adjust
@@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       if (size > 0)
         {
           brk = (char *) (MORECORE (size));
+	  if (brk != (char *) (MORECORE_FAILURE))
+	    sysmadvise_thp (brk, size);
           LIBC_PROBE (memory_sbrk_more, 2, brk, size);
         }
 
@@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
               if (mbrk != MAP_FAILED)
                 {
+		  sysmadvise_thp (mbrk, size);
+
                   /* We do not need, and cannot use, another sbrk call to find end */
                   brk = mbrk;
                   snd_brk = brk + size;
@@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
                       correction = 0;
                       snd_brk = (char *) (MORECORE (0));
                     }
+		  else
+		    sysmadvise_thp (snd_brk, correction);
                 }
 
               /* handle non-contiguous cases */
@@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
   if (cp == MAP_FAILED)
     return 0;
 
+  sysmadvise_thp (cp, new_size);
+
   p = (mchunkptr) (cp + offset);
 
   assert (aligned_OK (chunk2mem (p)));
@@ -5325,6 +5354,22 @@ do_set_mxfast (size_t value)
   return 0;
 }
 
+#if HAVE_TUNABLES
+static __always_inline int
+do_set_thp_pagesize (size_t value)
+{
+  /* Only enable THP through madvise if the arch-specific return size is
+     larger than the default page size.  */
+  if (value > 0)
+    {
+      size_t thps = malloc_verify_thp_pagesize (value);
+      if (thps != GLRO(dl_pagesize))
+	mp_.thp_pagesize = thps;
+    }
+  return 0;
+}
+#endif
+
 int
 __libc_mallopt (int param_number, int value)
 {
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 658547c613..3364e85ef5 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -270,6 +270,17 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
 passed to @code{malloc} for the largest bin size to enable.
 @end deftp
 
+@deftp Tunable glibc.malloc.thp_pagesize
+This tunable enables support for Transparent Huge Pages through @code{madvise}
+with @code{MADV_HUGEPAGE} on the allocated memory range after @code{malloc}
+calls the system allocator.  Each architecture defines a set of possible values,
+and the input value is rounded to the supported one.
+
+The default value of this tunable is 0, which disables its usage.  A value
+of 1 means to use the default Huge Page size for the architecture, and
+any larger value is rounded to the supported size.
+@end deftp
+
 @node Dynamic Linking Tunables
 @section Dynamic Linking Tunables
 @cindex dynamic linking tunables
diff --git a/sysdeps/generic/malloc-thp.h b/sysdeps/generic/malloc-thp.h
new file mode 100644
index 0000000000..d70ceb8e1e
--- /dev/null
+++ b/sysdeps/generic/malloc-thp.h
@@ -0,0 +1,32 @@
+/* Transparent Huge Page support.  Generic implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+  return GLRO(dl_pagesize);
+}
+
+#endif /* _MALLOC_THP_H */
-- 
2.30.2



* [PATCH 2/3] malloc: Add THP/madvise support for sbrk
  2021-08-13 21:04 [PATCH 0/3] malloc: improve THP effectiveness Adhemerval Zanella via Libc-alpha
  2021-08-13 21:04 ` [PATCH 1/3] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
@ 2021-08-13 21:04 ` Adhemerval Zanella via Libc-alpha
  2021-08-13 21:04 ` [PATCH 3/3] malloc: Add arch-specific malloc_verify_thp_pagesize for Linux Adhemerval Zanella via Libc-alpha
  2021-08-13 21:37 ` [PATCH 0/3] malloc: improve THP effectiveness Guillaume Morin
  3 siblings, 0 replies; 7+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-13 21:04 UTC
  To: libc-alpha; +Cc: Norbert Manthey, Siddhesh Poyarekar, Guillaume Morin

For the main arena, sbrk() might be the preferred syscall instead of
mmap(), and the granularity used when increasing the program segment
is the default page size.

To increase the effectiveness of Transparent Huge Pages with madvise,
the huge page size is used instead.  This is enabled with the new tunable
'glibc.malloc.thp_pagesize'.

Checked on x86_64-linux-gnu.
---
 malloc/malloc.c | 40 +++++++++++++++++++++++++++++++++++-----
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 52ea84a63d..7cd586c866 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2028,6 +2028,38 @@ sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
 #endif
 }
 
+static inline long int
+thp_brk_align_up (long int size)
+{
+  INTERNAL_SIZE_T r = size;
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+  /* Defined in brk.c.  */
+  extern void *__curbrk;
+  if (mp_.thp_pagesize != 0)
+    {
+      uintptr_t top = ALIGN_UP ((uintptr_t)__curbrk + size, mp_.thp_pagesize);
+      r = top - (uintptr_t)__curbrk;
+    }
+  else
+#endif
+    r = ALIGN_UP (size, GLRO(dl_pagesize));
+  return r;
+}
+
+static inline long
+thp_brk_align_down (long int top)
+{
+  long r;
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+  if (mp_.thp_pagesize != 0)
+    r = ALIGN_DOWN (top, mp_.thp_pagesize);
+  else
+#endif
+    r = ALIGN_DOWN (top, GLRO(dl_pagesize));
+  return r;
+}
+
+
 /* ------------------- Support for multiple arenas -------------------- */
 #include "arena.c"
 
@@ -2610,14 +2642,14 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
         size -= old_size;
 
       /*
-         Round to a multiple of page size.
+         Round to a multiple of page size or huge page size.
          If MORECORE is not contiguous, this ensures that we only call it
          with whole-page arguments.  And if MORECORE is contiguous and
          this is not first time through, this preserves page-alignment of
          previous calls. Otherwise, we correct to page-align below.
        */
 
-      size = ALIGN_UP (size, pagesize);
+      size = thp_brk_align_up (size);
 
       /*
          Don't try to call MORECORE if argument is so big as to appear
@@ -2900,10 +2932,8 @@ systrim (size_t pad, mstate av)
   long released;         /* Amount actually released */
   char *current_brk;     /* address returned by pre-check sbrk call */
   char *new_brk;         /* address returned by post-check sbrk call */
-  size_t pagesize;
   long top_area;
 
-  pagesize = GLRO (dl_pagesize);
   top_size = chunksize (av->top);
 
   top_area = top_size - MINSIZE - 1;
@@ -2911,7 +2941,7 @@ systrim (size_t pad, mstate av)
     return 0;
 
   /* Release in pagesize units and round down to the nearest page.  */
-  extra = ALIGN_DOWN(top_area - pad, pagesize);
+  extra = thp_brk_align_down (top_area - pad);
 
   if (extra == 0)
     return 0;
-- 
2.30.2



* [PATCH 3/3] malloc: Add arch-specific malloc_verify_thp_pagesize for Linux
  2021-08-13 21:04 [PATCH 0/3] malloc: improve THP effectiveness Adhemerval Zanella via Libc-alpha
  2021-08-13 21:04 ` [PATCH 1/3] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
  2021-08-13 21:04 ` [PATCH 2/3] malloc: Add THP/madvise support for sbrk Adhemerval Zanella via Libc-alpha
@ 2021-08-13 21:04 ` Adhemerval Zanella via Libc-alpha
  2021-08-13 21:37 ` [PATCH 0/3] malloc: improve THP effectiveness Guillaume Morin
  3 siblings, 0 replies; 7+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-13 21:04 UTC
  To: libc-alpha; +Cc: Norbert Manthey, Siddhesh Poyarekar, Guillaume Morin

Not all architectures have Transparent Huge Page (THP) support enabled
by default, so this patch only adds support for the ones that have
HAVE_ARCH_TRANSPARENT_HUGEPAGE defined.  Also, Linux THP only supports
one huge page size, so malloc_verify_thp_pagesize() returns only the
default value for the architecture.

The x86, s390, sparc, and riscv cases are straightforward since they only
support one possible value.  AArch64 and mips, which support multiple
page sizes, can map the base page size directly to the THP size.  PowerPC
is the only architecture whose THP size depends not only on the
configured page size, but also on which MMU is used.  For this case
the sysfs file is used instead.

Checked on x86_64-linux-gnu, aarch64-linux-gnu, and
powerpc64le-linux-gnu.
---
 sysdeps/unix/sysv/linux/aarch64/malloc-thp.h | 40 ++++++++++++++
 sysdeps/unix/sysv/linux/mips/malloc-thp.h    | 39 ++++++++++++++
 sysdeps/unix/sysv/linux/powerpc/malloc-thp.h | 56 ++++++++++++++++++++
 sysdeps/unix/sysv/linux/riscv/malloc-thp.h   | 32 +++++++++++
 sysdeps/unix/sysv/linux/s390/malloc-thp.h    | 33 ++++++++++++
 sysdeps/unix/sysv/linux/sparc/malloc-thp.h   | 36 +++++++++++++
 sysdeps/unix/sysv/linux/x86/malloc-thp.h     | 32 +++++++++++
 7 files changed, 268 insertions(+)
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/mips/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/powerpc/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/riscv/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/s390/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/sparc/malloc-thp.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/malloc-thp.h

diff --git a/sysdeps/unix/sysv/linux/aarch64/malloc-thp.h b/sysdeps/unix/sysv/linux/aarch64/malloc-thp.h
new file mode 100644
index 0000000000..e2e65446f2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/malloc-thp.h
@@ -0,0 +1,40 @@
+/* Transparent Huge Page support.  AArch64 implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+#include <stdio.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.
+   Returning 0 disables Large Page usage.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+  /* AArch64 THP size depends on the default page size:
+      4k ->   2m
+     16k ->  32m
+     64k -> 512m  */
+  int page_shift = __builtin_ctzl (GLRO(dl_pagesize));
+  /* One PMD entry maps 2^(page_shift - 3) base pages.  */
+  return 1UL << ((page_shift - 3) * 2 + 3);
+}
+
+#endif /* _MALLOC_THP_H */
diff --git a/sysdeps/unix/sysv/linux/mips/malloc-thp.h b/sysdeps/unix/sysv/linux/mips/malloc-thp.h
new file mode 100644
index 0000000000..d8cdbd026d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/malloc-thp.h
@@ -0,0 +1,39 @@
+/* Transparent Huge Page support.  MIPS implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+  /* MIPS THP size depends on the default page size:
+      4k ->   2m
+      8k ->   8m
+     16k ->  32m
+     32k -> 128m
+     64k -> 512m   */
+  int page_shift = __builtin_ctzl (GLRO(dl_pagesize));
+  return 1UL << (page_shift + (page_shift - 3));
+}
+
+#endif /* _MALLOC_THP_H */
diff --git a/sysdeps/unix/sysv/linux/powerpc/malloc-thp.h b/sysdeps/unix/sysv/linux/powerpc/malloc-thp.h
new file mode 100644
index 0000000000..c3fcfa3386
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/powerpc/malloc-thp.h
@@ -0,0 +1,56 @@
+/* Transparent Huge Page support.  PowerPC implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+#include <intprops.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+  /* PowerPC THP size depends on the default page size and on which MMU
+     hardware is used.  There is no easy way to map it statically, so
+     query the kernel instead.  */
+  int fd = __open64_nocancel (
+    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
+  if (fd == -1)
+    return 0;
+
+  size_t hps = 0;
+
+  char str[INT_BUFSIZE_BOUND (size_t)];
+  ssize_t r = __read_nocancel (fd, str, sizeof (str));
+  if (r > 0)
+    for (ssize_t i = 0; i < r; i++)
+      {
+	if (str[i] == '\n')
+	  break;
+	hps *= 10;
+	hps += str[i] - '0';
+      }
+
+  __close_nocancel (fd);
+
+  return hps;
+}
+
+#endif /* _MALLOC_THP_H */
diff --git a/sysdeps/unix/sysv/linux/riscv/malloc-thp.h b/sysdeps/unix/sysv/linux/riscv/malloc-thp.h
new file mode 100644
index 0000000000..aa38ca6dd6
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/riscv/malloc-thp.h
@@ -0,0 +1,32 @@
+/* Transparent Huge Page support.  RISCV implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+  return 1UL << 21;
+}
+
+#endif /* _MALLOC_THP_H */
diff --git a/sysdeps/unix/sysv/linux/s390/malloc-thp.h b/sysdeps/unix/sysv/linux/s390/malloc-thp.h
new file mode 100644
index 0000000000..be6c26aa3b
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/malloc-thp.h
@@ -0,0 +1,33 @@
+/* Transparent Huge Page support.  s390 implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+  /* s390 uses 1M for THP.  */
+  return 1UL << 20;
+}
+
+#endif /* _MALLOC_THP_H */
diff --git a/sysdeps/unix/sysv/linux/sparc/malloc-thp.h b/sysdeps/unix/sysv/linux/sparc/malloc-thp.h
new file mode 100644
index 0000000000..83f6fdc114
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sparc/malloc-thp.h
@@ -0,0 +1,36 @@
+/* Transparent Huge Page support.  SPARC implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+#ifdef __arch64__
+  return 1UL << 23;
+#else
+  return 1UL << 18;
+#endif
+}
+
+#endif /* _MALLOC_THP_H */
diff --git a/sysdeps/unix/sysv/linux/x86/malloc-thp.h b/sysdeps/unix/sysv/linux/x86/malloc-thp.h
new file mode 100644
index 0000000000..d94cb578c2
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/malloc-thp.h
@@ -0,0 +1,32 @@
+/* Transparent Huge Page support.  x86 implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_THP_H
+#define _MALLOC_THP_H
+
+#include <ldsodefs.h>
+
+/* Return the preferred large page size for the requested PAGESIZE.  The
+   requested value of 1 means the default size for the architecture.  */
+static inline size_t
+malloc_verify_thp_pagesize (size_t pagesize)
+{
+  return 1UL << 21;
+}
+
+#endif /* _MALLOC_THP_H */
-- 
2.30.2
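
For comparison with the PowerPC malloc_verify_thp_pagesize above, which
parses hpage_pmd_size by hand from a fixed buffer, here is a minimal
user-space sketch of the same query using stdio and strtoull.  It is an
illustration only, not glibc code (glibc internals use the *_nocancel
wrappers instead); the helper name read_thp_size is hypothetical, and the
path is a parameter so the function can be exercised against a test file.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical user-space helper, not part of the patch: read the THP size
   in bytes from a sysfs-style file such as
   /sys/kernel/mm/transparent_hugepage/hpage_pmd_size.  Returns 0 when the
   file is missing or does not hold a plain decimal number, so callers can
   treat 0 as "THP size unknown".  */
static size_t
read_thp_size (const char *path)
{
  FILE *f = fopen (path, "r");
  if (f == NULL)
    return 0;

  char buf[32];
  size_t result = 0;
  /* Require a leading digit so strtoull cannot accept signs or spaces.  */
  if (fgets (buf, sizeof buf, f) != NULL && buf[0] >= '0' && buf[0] <= '9')
    {
      char *end;
      errno = 0;
      unsigned long long v = strtoull (buf, &end, 10);
      /* Reject overflow and trailing junk; allow the usual newline.  */
      if (errno == 0 && (*end == '\n' || *end == '\0') && v <= SIZE_MAX)
        result = (size_t) v;
    }
  fclose (f);
  return result;
}
```

On a typical x86-64 kernel, read_thp_size on the real sysfs path would be
expected to return 2097152 (2 MiB), matching the 1UL << 21 hard-coded in the
x86 header.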



* Re: [PATCH 0/3] malloc: improve THP effectiveness
  2021-08-13 21:04 [PATCH 0/3] malloc: improve THP effectiveness Adhemerval Zanella via Libc-alpha
                   ` (2 preceding siblings ...)
  2021-08-13 21:04 ` [PATCH 3/3] malloc: Add arch-specific malloc_verify_thp_pagesize for Linux Adhemerval Zanella via Libc-alpha
@ 2021-08-13 21:37 ` Guillaume Morin
  2021-08-16 20:55   ` Adhemerval Zanella via Libc-alpha
  3 siblings, 1 reply; 7+ messages in thread
From: Guillaume Morin @ 2021-08-13 21:37 UTC
  To: Adhemerval Zanella
  Cc: Norbert Manthey, Siddhesh Poyarekar, libc-alpha, Guillaume Morin

Hello Adhemerval,

On 13 Aug 18:04, Adhemerval Zanella wrote:
> Although it does improve THP effectiveness, it does not provide the same
> features from libhugetlsfs morecore implementation [2], since it does
> use MAP_HUGETLB explicit on mmap.  And I think this is not what we want
> for glibc, it requires additional setup from the admin to mount the
> hugetlsfs and reserve the pages with it outside from glibc scope.

I certainly do appreciate the effort. But unfortunately this is not a
usable replacement for most libhugetlbfs users (who actually want to use
hugetlbfs).

First, I'd argue that having THP supported directly in the allocator is
probably a nice-to-have feature for THP users, but not that critical,
considering you can just madvise() the memory *after* it's been
allocated. Alternatively, any malloc interposition scheme can do this
trivially: AFAIK there was never an actual *need* for a morecore
implementation in this case.
There is no such possibility with hugetlbfs. It's either mmap() with
MAP_HUGETLB or not.
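
As a rough illustration of the "madvise() after allocation" approach, the
sketch below aligns a heap allocation to an assumed 2 MiB THP size (the
x86-64 PMD size; other architectures differ, see the per-architecture
headers in the patch) and then applies MADV_HUGEPAGE to it.  The helper
name and the hard-coded size are assumptions for illustration, and
madvise failure is deliberately ignored because the advice is only a hint.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>

/* Assumed THP size: 2 MiB, the PMD size on x86-64.  Real code should
   query hpage_pmd_size instead of hard-coding it.  */
#define ASSUMED_THP_SIZE ((size_t) 2 << 20)

/* Hypothetical helper: allocate SIZE bytes through posix_memalign, then
   advise the kernel to back the range with transparent huge pages.  The
   advice is best effort: if madvise fails (THP disabled or unsupported),
   the memory is still usable with normal pages.  Free with free().  */
static void *
alloc_with_thp_hint (size_t size)
{
  if (size == 0 || size > SIZE_MAX - (ASSUMED_THP_SIZE - 1))
    return NULL;

  /* Round up to whole huge pages so the entire block is eligible for
     PMD-sized mappings.  */
  size_t rounded = (size + ASSUMED_THP_SIZE - 1) & ~(ASSUMED_THP_SIZE - 1);
  void *p = NULL;
  if (posix_memalign (&p, ASSUMED_THP_SIZE, rounded) != 0)
    return NULL;
#ifdef MADV_HUGEPAGE
  (void) madvise (p, rounded, MADV_HUGEPAGE);
#endif
  return p;
}
```

Whether the range is actually backed by huge pages still depends on the
system-wide THP mode and on khugepaged; this is exactly the kind of detail
the proposed tunable aims to hide from applications.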

Second, THP is not a drop-in replacement for hugetlbfs. hugetlbfs has
desirable properties that simply do not exist for THP. Just a few
examples:
1) A hugetlbfs allocation either gives you a huge page or not at
   allocation time, but the result is permanent. There is no splitting or
   re-merging by the VM: no TLB shootdowns for running processes, etc.
2) Fast allocation: there is a dedicated pool of these pages. There is no
   competition with the rest of the processes, unlike THP.
3) No swapping of hugetlbfs pages at all.

I would really like to discuss and/or maybe implement some scheme that
allows optionally using MAP_HUGETLB for all allocations (which would be a
definite improvement over what libhugetlbfs was doing), if that's
workable for you.

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>


* Re: [PATCH 0/3] malloc: improve THP effectiveness
  2021-08-13 21:37 ` [PATCH 0/3] malloc: improve THP effectiveness Guillaume Morin
@ 2021-08-16 20:55   ` Adhemerval Zanella via Libc-alpha
  2021-08-17  4:00     ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 1 reply; 7+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-16 20:55 UTC
  To: libc-alpha, Norbert Manthey, Siddhesh Poyarekar, Guillaume Morin



On 13/08/2021 18:37, Guillaume Morin wrote:
> Hello Adhemerval,
> 
> On 13 Aug 18:04, Adhemerval Zanella wrote:
>> Although it does improve THP effectiveness, it does not provide the same
>> features from libhugetlsfs morecore implementation [2], since it does
>> use MAP_HUGETLB explicit on mmap.  And I think this is not what we want
>> for glibc, it requires additional setup from the admin to mount the
>> hugetlsfs and reserve the pages with it outside from glibc scope.
> 
> I certainly do appreciate the effort. But unfortunately this is not a
> usable replacement for most libhugetlbfs users (who actually want to use
> hugetlbfs).

Yes, that's why I explicitly stated this is not a replacement. But I had
the misconception that MAP_HUGETLB could only be used when mmapping files
opened on a hugetlbfs filesystem, and that's why I wrote that I think it
is not meant for glibc.

However, after reading the kernel documentation properly and some
experimentation, I think we can add another tunable to use MAP_HUGETLB
as the first allocation option.
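
A minimal sketch of "MAP_HUGETLB as the first allocation option" might try
an anonymous MAP_HUGETLB mapping and fall back to an ordinary anonymous
mapping when that fails (typically with ENOMEM when no huge pages are
reserved through vm.nr_hugepages).  The helper name and the fixed 2 MiB
huge page size are assumptions; this is not the design of the reworked
patchset.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumed default huge page size (2 MiB, as on x86-64); real code would
   read it from /proc/meminfo or sysfs instead.  */
#define ASSUMED_HUGETLB_SIZE ((size_t) 2 << 20)

/* Hypothetical helper: map LEN bytes, preferring hugetlb pages.  On
   success *USED_HUGETLB says which kind of mapping was obtained and
   *MAPPED_LEN holds the length (rounded up to the huge page size) that
   must later be passed to munmap.  Returns NULL on failure.  */
static void *
map_hugetlb_or_fallback (size_t len, bool *used_hugetlb, size_t *mapped_len)
{
  size_t rounded = (len + ASSUMED_HUGETLB_SIZE - 1)
                   & ~(ASSUMED_HUGETLB_SIZE - 1);
  *used_hugetlb = false;
  *mapped_len = 0;
  if (len == 0 || rounded < len)
    return NULL;  /* Zero length, or overflow while rounding up.  */

#ifdef MAP_HUGETLB
  void *p = mmap (NULL, rounded, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
  if (p != MAP_FAILED)
    {
      *used_hugetlb = true;
      *mapped_len = rounded;
      return p;
    }
  /* Usually ENOMEM: the hugetlb pool is empty or too small.  Fall
     through to a mapping backed by normal pages.  */
#endif

  void *q = mmap (NULL, rounded, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (q == MAP_FAILED)
    return NULL;
  *mapped_len = rounded;
  return q;
}
```

Because private hugetlb mappings reserve their pages at mmap time, a
failure shows up immediately at the mmap call rather than as a later
fault, which is what makes this simple fallback workable.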

> 
> First, I'd argue that having THP supported directly in the allocator is
> probably a nice-to-have feature for THP users, but not that critical,
> considering you can just madvise() the memory *after* it's been
> allocated. Alternatively, any malloc interposition scheme can do this
> trivially: AFAIK there was never an actual *need* for a morecore
> implementation in this case.
> There is no such possibility with hugetlbfs. It's either mmap() with
> MAP_HUGETLB or not.

Yeah, I am aware. The idea is mainly to abstract away the need to query
the kernel or to handle the different huge page sizes across
architectures, and possibly also to handle the sbrk() calls for the main
arena.  We can also add more tuning in the future if we find scenarios
where THP needs it.
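
Once the THP size is known, the per-architecture abstraction boils down to
a small policy decision.  The sketch below is one way to express it,
loosely following the cover letter's description (align the request to
the huge page size and only issue madvise(MADV_HUGEPAGE) when the request
spans at least one huge page).  The function name and the "0 means
unknown" convention are assumptions, not the patch's code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy helper (not glibc's actual code): given a request
   size and the THP size for this system (0 meaning unknown or disabled),
   compute the size to request from the kernel and report whether the
   range is worth an madvise(MADV_HUGEPAGE) call.  THP_SIZE is assumed to
   be a power of two, as the per-architecture values in the patch are.  */
static bool
thp_adjust_request (size_t request, size_t thp_size, size_t *adjusted)
{
  if (thp_size == 0 || request < thp_size
      || request > SIZE_MAX - (thp_size - 1))
    {
      /* THP unavailable, request too small to hold a huge page, or
         rounding would overflow: leave it alone and skip the madvise.  */
      *adjusted = request;
      return false;
    }
  /* Align up to a multiple of the huge page size so the tail of the
     region can also be backed by a huge page.  */
  *adjusted = (request + thp_size - 1) & ~(thp_size - 1);
  return true;
}
```

For example, with a 2 MiB THP size a 3 MiB request becomes 4 MiB and is
advised, while a 4 KiB request is left untouched.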

> 
> Second, THP is not a drop-in replacement for hugetlbfs. hugetlbfs has
> desirable properties that simply do not exist for THP. Just a few
> examples:
> 1) A hugetlbfs allocation either gives you a huge page or not at
>    allocation time, but the result is permanent. There is no splitting or
>    re-merging by the VM: no TLB shootdowns for running processes, etc.
> 2) Fast allocation: there is a dedicated pool of these pages. There is no
>    competition with the rest of the processes, unlike THP.
> 3) No swapping of hugetlbfs pages at all.
> 
> I would really like to discuss and/or maybe implement some scheme that
> allows optionally using MAP_HUGETLB for all allocations (which would be a
> definite improvement over what libhugetlbfs was doing), if that's
> workable for you.

I am reworking this patchset and I intend to add an option to use
MAP_HUGETLB as well.


* Re: [PATCH 0/3] malloc: improve THP effectiveness
  2021-08-16 20:55   ` Adhemerval Zanella via Libc-alpha
@ 2021-08-17  4:00     ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 0 replies; 7+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-17  4:00 UTC
  To: Adhemerval Zanella, libc-alpha, Norbert Manthey, Guillaume Morin

On 8/17/21 2:25 AM, Adhemerval Zanella via Libc-alpha wrote:
> I am reworking this patchset and I intend to add an option to use
> MAP_HUGETLB as well.

A low-hanging fruit for adding MAP_HUGETLB may be the mmap threshold:
whenever it crosses the huge page size, always use MAP_HUGETLB for
mmapped blocks if the user requests it via a tunable.
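
The suggestion can be sketched as a tiny decision function: once the mmap
threshold is at least one huge page, every block malloc serves via mmap is
huge-page sized, so MAP_HUGETLB can be added when the user opted in.  The
function and parameter names are hypothetical and do not correspond to an
existing glibc tunable.

```c
#define _GNU_SOURCE
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical decision helper: return the flags to pass to mmap for a
   chunk that malloc has already decided to serve with mmap (that is,
   its size is at least MMAP_THRESHOLD).  HUGETLB_TUNABLE models the
   user opting in; HUGEPAGE_SIZE of 0 means the size is unknown.  */
static int
mmap_flags_for_chunk (size_t mmap_threshold, size_t hugepage_size,
                      bool hugetlb_tunable)
{
  int flags = MAP_PRIVATE | MAP_ANONYMOUS;
#ifdef MAP_HUGETLB
  /* Only when every mmapped chunk is guaranteed to span at least one
     huge page does requesting hugetlb pages make sense.  */
  if (hugetlb_tunable && hugepage_size != 0
      && mmap_threshold >= hugepage_size)
    flags |= MAP_HUGETLB;
#endif
  return flags;
}
```

A real implementation would still need a fallback to normal pages when
the MAP_HUGETLB mmap fails, since the hugetlb pool may be exhausted.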

Siddhesh


Thread overview: 7+ messages
2021-08-13 21:04 [PATCH 0/3] malloc: improve THP effectiveness Adhemerval Zanella via Libc-alpha
2021-08-13 21:04 ` [PATCH 1/3] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
2021-08-13 21:04 ` [PATCH 2/3] malloc: Add THP/madvise support for sbrk Adhemerval Zanella via Libc-alpha
2021-08-13 21:04 ` [PATCH 3/3] malloc: Add arch-specific malloc_verify_thp_pagesize for Linux Adhemerval Zanella via Libc-alpha
2021-08-13 21:37 ` [PATCH 0/3] malloc: improve THP effectiveness Guillaume Morin
2021-08-16 20:55   ` Adhemerval Zanella via Libc-alpha
2021-08-17  4:00     ` Siddhesh Poyarekar via Libc-alpha
