unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
* [PATCH v2 0/4] malloc: Improve Huge Page support
@ 2021-08-18 14:19 Adhemerval Zanella via Libc-alpha
  2021-08-18 14:19 ` [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-18 14:19 UTC (permalink / raw
  To: libc-alpha; +Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

Linux currently supports two ways to use Huge Pages: either by using
specific flags directly with the syscall (MAP_HUGETLB for mmap(), or
SHM_HUGETLB for shmget()), or by using Transparent Huge Pages (THP)
where the kernel will try to move allocated anonymous pages to Huge
Pages blocks transparent to application.

Also, THP current support three different modes [1]: 'never', 'madvise',
and 'always'.  The 'never' is self-explanatory and 'always' will enable
THP for all anonymous memory.  However, 'madvise' is still the default
for some systems and for such cases THP will be only used if the memory
range is explicity advertise by the program through a
madvise(MADV_HUGEPAGE) call.

This patchset adds a two new tunables to improve malloc() support with
Huge Page:

  - glibc.malloc.thp_madvise: instruct the system allocator to issue
    a madvise(MADV_HUGEPAGE) call after a mmap() one for sizes larger
    than the default huge page size.  The default behavior is to
    disable it and if the system does not support THP the tunable also
    does not enable the madvise() call.

  - glibc.malloc.mmap_hugetlb: instruct the system allocator to round
    allocation to huge page sizes along with the required flags
    (MAP_HUGETLB for Linux).  If the memory allocation fails, the
    default system page size is used instead.  The default behavior is
    to disable and a value of 1 uses the default system huge page size.
    A positive value larger than 1 means to use a specific huge page
    size, which is matched against the supported ones by the system.

The 'thp_madvise' tunable also changes the sbrk() usage by malloc
on main arenas, where the increment is now aligned to the huge page
size, instead of default page size.

The 'mmap_hugetlb' aims to replace the 'morecore' removed callback
from 2.34 for libhugetlbfs (where the library tries to leverage the
huge pages usage instead to provide a system allocator).  By
implementing the support directly on the mmap() code patch there is
no need to try emulate the morecore()/sbrk() semantic which simplifies
the code and make memory shrink logic more straighforward.

The performance improvements are really dependent of the workload
and the platform, however a simple testcase might show the possible
improvements:

$ cat hugepages.cc
#include <unordered_map>

int
main (int argc, char *argv[])
{
  std::size_t iters = 10000000;
  std::unordered_map <std::size_t, std::size_t> ht;
  ht.reserve (iters);
  for (std::size_t i = 0; i < iters; ++i)
    ht.try_emplace (i, i);

  return 0;
}
$ g++ -std=c++17 -O2 hugepages.cc -o hugepages

On a x86_64 (Ryzen 9 5900X):

 Performance counter stats for 'env
GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages':

            98,874      faults                                                      
           717,059      dTLB-loads                                                  
           411,701      dTLB-load-misses          #   57.42% of all dTLB
cache accesses
         3,754,927      cache-misses              #    8.479 % of all
cache refs    
        44,287,580      cache-references                                            

       0.315278378 seconds time elapsed

       0.238635000 seconds user
       0.076714000 seconds sys

 Performance counter stats for 'env
GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages':

             1,871      faults                                                      
           120,035      dTLB-loads                                                  
            19,882      dTLB-load-misses          #   16.56% of all dTLB
cache accesses
         4,182,942      cache-misses              #    7.452 % of all
cache refs    
        56,128,995      cache-references                                            

       0.262620733 seconds time elapsed

       0.222233000 seconds user
       0.040333000 seconds sys


On an AArch64 (cortex A72):

 Performance counter stats for 'env
GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages':

             98835      faults                                                      
        2007234756      dTLB-loads                                                  
           4613669      dTLB-load-misses          #    0.23% of all dTLB
cache accesses
           8831801      cache-misses              #    0.504 % of all
cache refs    
        1751391405      cache-references                                            

       0.616782575 seconds time elapsed

       0.460946000 seconds user
       0.154309000 seconds sys

 Performance counter stats for 'env
GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages':

               955      faults                                                      
        1787401880      dTLB-loads                                                  
            224034      dTLB-load-misses          #    0.01% of all dTLB
cache accesses
           5480917      cache-misses              #    0.337 % of all
cache refs    
        1625937858      cache-references                                            

       0.487773443 seconds time elapsed

       0.440894000 seconds user
       0.046465000 seconds sys


And on a powerpc64 (POWER8):

 Performance counter stats for 'env
GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages
':

              5453      faults                                                      
              9940      dTLB-load-misses                                            
           1338152      cache-misses              #    0.101 % of all
cache refs    
        1326037487      cache-references                                            

       1.056355887 seconds time elapsed

       1.014633000 seconds user
       0.041805000 seconds sys

 Performance counter stats for 'env
GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages
':

              1016      faults                                                      
              1746      dTLB-load-misses                                            
            399052      cache-misses              #    0.030 % of all
cache refs    
        1316059877      cache-references                                            

       1.057810501 seconds time elapsed

       1.012175000 seconds user
       0.045624000 seconds sys

It is worth to note that the powerpc64 machine has 'always' set
on '/sys/kernel/mm/transparent_hugepage/enabled'.

Norbert Manthey's paper has more information with a more thoroughly
performance analysis.

For testing run make check on x86_64-linux-gnu with thp_pagesize=1
(directly on ptmalloc_init() after tunable initialiazation) and
with mmap_hugetlb=1 (also directly on ptmalloc_init()) with about
10 large pages (so the fallback mmap() call is used) and with
1024 large pages (so all mmap(MAP_HUGETLB) are successful).

--

Changes from previous version:

  - Renamed thp_pagesize to thp_madvise and make it a boolean state.
  - Added MAP_HUGETLB support for mmap().
  - Remove system specific hooks for THP huge page size in favor of
    Linux generic implementation.
  - Initial program segments need to be page aligned for the
    first madvise call.

Adhemerval Zanella (4):
  malloc: Add madvise support for Transparent Huge Pages
  malloc: Add THP/madvise support for sbrk
  malloc: Move mmap logic to its own function
  malloc: Add Huge Page support for sysmalloc

 NEWS                                       |   9 +-
 elf/dl-tunables.list                       |   9 +
 elf/tst-rtld-list-tunables.exp             |   2 +
 include/libc-pointer-arith.h               |  10 +
 malloc/arena.c                             |   7 +
 malloc/malloc-internal.h                   |   1 +
 malloc/malloc.c                            | 263 +++++++++++++++------
 manual/tunables.texi                       |  23 ++
 sysdeps/generic/Makefile                   |   8 +
 sysdeps/generic/malloc-hugepages.c         |  37 +++
 sysdeps/generic/malloc-hugepages.h         |  49 ++++
 sysdeps/unix/sysv/linux/malloc-hugepages.c | 201 ++++++++++++++++
 12 files changed, 542 insertions(+), 77 deletions(-)
 create mode 100644 sysdeps/generic/malloc-hugepages.c
 create mode 100644 sysdeps/generic/malloc-hugepages.h
 create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c

-- 
2.30.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages
  2021-08-18 14:19 [PATCH v2 0/4] malloc: Improve Huge Page support Adhemerval Zanella via Libc-alpha
@ 2021-08-18 14:19 ` Adhemerval Zanella via Libc-alpha
  2021-08-18 18:42   ` Siddhesh Poyarekar via Libc-alpha
  2021-08-18 14:19 ` [PATCH v2 2/4] malloc: Add THP/madvise support for sbrk Adhemerval Zanella via Libc-alpha
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-18 14:19 UTC (permalink / raw
  To: libc-alpha; +Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

Linux Transparent Huge Pages (THP) current support three different
states: 'never', 'madvise', and 'always'.  The 'never' is
self-explanatory and 'always' will enable THP for all anonymous
memory.  However, 'madvise' is still the default for some system and
for such case THP will be only used if the memory range is explicity
advertise by the program through a madvise(MADV_HUGEPAGE) call.

To enable it a new tunable is provided, 'glibc.malloc.thp_madvise',
where setting to a value diffent than 0 enables the madvise call.
Linux current only support one page size for THP, even if the
architecture supports multiple sizes.

This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call at sysmalloc() with sizes larger than the default huge
page size.  The madvise() call is disable is system does not support
THP or if it has the mode set to "never".

Checked on x86_64-linux-gnu.
---
 NEWS                                       |  5 +-
 elf/dl-tunables.list                       |  5 ++
 elf/tst-rtld-list-tunables.exp             |  1 +
 malloc/arena.c                             |  5 ++
 malloc/malloc-internal.h                   |  1 +
 malloc/malloc.c                            | 48 ++++++++++++++
 manual/tunables.texi                       |  9 +++
 sysdeps/generic/Makefile                   |  8 +++
 sysdeps/generic/malloc-hugepages.c         | 31 +++++++++
 sysdeps/generic/malloc-hugepages.h         | 37 +++++++++++
 sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++
 11 files changed, 225 insertions(+), 1 deletion(-)
 create mode 100644 sysdeps/generic/malloc-hugepages.c
 create mode 100644 sysdeps/generic/malloc-hugepages.h
 create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c

diff --git a/NEWS b/NEWS
index 79c895e382..9b2345d08c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,7 +9,10 @@ Version 2.35
 
 Major new features:
 
-  [Add new features here]
+* On Linux, a new tunable, glibc.malloc.thp_madvise, can be used to
+  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
+  It might improve performance with Transparent Huge Pages madvise mode
+  depending of the workload.
 
 Deprecated and removed features, and other changes affecting compatibility:
 
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 8ddd4a2314..67df6dbc2c 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -92,6 +92,11 @@ glibc {
       minval: 0
       security_level: SXID_IGNORE
     }
+    thp_madvise {
+      type: INT_32
+      minval: 0
+      maxval: 1
+    }
   }
   cpu {
     hwcap_mask {
diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
index 9f66c52885..d8109fa31c 100644
--- a/elf/tst-rtld-list-tunables.exp
+++ b/elf/tst-rtld-list-tunables.exp
@@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255)
 glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+)
+glibc.malloc.thp_madvise: 0 (min: 0, max: 1)
 glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
diff --git a/malloc/arena.c b/malloc/arena.c
index 667484630e..81bff54303 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
 TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
 #endif
 TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
+TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
 #else
 /* Initialization routine. */
 #include <string.h>
@@ -331,6 +332,7 @@ ptmalloc_init (void)
 	       TUNABLE_CALLBACK (set_tcache_unsorted_limit));
 # endif
   TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
+  TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
 #else
   if (__glibc_likely (_environ != NULL))
     {
@@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
       __munmap (p2, HEAP_MAX_SIZE);
       return 0;
     }
+
+  sysmadvise_thp (p2, size);
+
   h = (heap_info *) p2;
   h->size = size;
   h->mprotect_size = size;
diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
index 0c7b5a183c..7493e34d86 100644
--- a/malloc/malloc-internal.h
+++ b/malloc/malloc-internal.h
@@ -22,6 +22,7 @@
 #include <malloc-machine.h>
 #include <malloc-sysdep.h>
 #include <malloc-size.h>
+#include <malloc-hugepages.h>
 
 /* Called in the parent process before a fork.  */
 void __malloc_fork_lock_parent (void) attribute_hidden;
diff --git a/malloc/malloc.c b/malloc/malloc.c
index e065785af7..ad3eec41ac 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1881,6 +1881,11 @@ struct malloc_par
   INTERNAL_SIZE_T arena_test;
   INTERNAL_SIZE_T arena_max;
 
+#if HAVE_TUNABLES
+  /* Transparent Large Page support.  */
+  INTERNAL_SIZE_T thp_pagesize;
+#endif
+
   /* Memory map support */
   int n_mmaps;
   int n_mmaps_max;
@@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
 
 #include <stap-probe.h>
 
+/* ----------- Routines dealing with transparent huge pages ----------- */
+
+static inline void
+sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
+{
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+  /* Do not consider areas smaller than a huge page or if the tunable is
+     not active.  */
+  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
+    return;
+  __madvise (p, size, MADV_HUGEPAGE);
+#endif
+}
+
 /* ------------------- Support for multiple arenas -------------------- */
 #include "arena.c"
 
@@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
           if (mm != MAP_FAILED)
             {
+	      sysmadvise_thp (mm, size);
+
               /*
                  The offset to the start of the mmapped region is stored
                  in the prev_size field of the chunk. This allows us to adjust
@@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       if (size > 0)
         {
           brk = (char *) (MORECORE (size));
+	  if (brk != (char *) (MORECORE_FAILURE))
+	    sysmadvise_thp (brk, size);
           LIBC_PROBE (memory_sbrk_more, 2, brk, size);
         }
 
@@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 
               if (mbrk != MAP_FAILED)
                 {
+		  sysmadvise_thp (mbrk, size);
+
                   /* We do not need, and cannot use, another sbrk call to find end */
                   brk = mbrk;
                   snd_brk = brk + size;
@@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
                       correction = 0;
                       snd_brk = (char *) (MORECORE (0));
                     }
+		  else
+		    sysmadvise_thp (snd_brk, correction);
                 }
 
               /* handle non-contiguous cases */
@@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
   if (cp == MAP_FAILED)
     return 0;
 
+  sysmadvise_thp (cp, new_size);
+
   p = (mchunkptr) (cp + offset);
 
   assert (aligned_OK (chunk2mem (p)));
@@ -5325,6 +5354,25 @@ do_set_mxfast (size_t value)
   return 0;
 }
 
+#if HAVE_TUNABLES
+static __always_inline int
+do_set_thp_madvise (int32_t value)
+{
+  if (value > 0)
+    {
+      enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
+      /*
+	 Only enables THP usage is system does support it and has at least
+	 always or madvise mode.  Otherwise the madvise() call is wasteful.
+       */
+      if (thp_mode != malloc_thp_mode_not_supported
+	  && thp_mode != malloc_thp_mode_never)
+	mp_.thp_pagesize = __malloc_default_thp_pagesize ();
+    }
+  return 0;
+}
+#endif
+
 int
 __libc_mallopt (int param_number, int value)
 {
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 658547c613..93c46807f9 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
 passed to @code{malloc} for the largest bin size to enable.
 @end deftp
 
+@deftp Tunable glibc.malloc.thp_madivse
+This tunable enable the use of @code{madvise} with @code{MADV_HUGEPAGE} after
+the system allocator allocated memory through @code{mmap} if the system supports
+Transparent Huge Page (currently only Linux).
+
+The default value of this tunable is @code{0}, which disable its usage.
+Setting to a positive value enable the @code{madvise} call.
+@end deftp
+
 @node Dynamic Linking Tunables
 @section Dynamic Linking Tunables
 @cindex dynamic linking tunables
diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile
index a209e85cc4..8eef83c94d 100644
--- a/sysdeps/generic/Makefile
+++ b/sysdeps/generic/Makefile
@@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe
 shared-only-routines += framestate unwind-pe
 endif
 endif
+
+ifeq ($(subdir),malloc)
+sysdep_malloc_debug_routines += malloc-hugepages
+endif
+
+ifeq ($(subdir),misc)
+sysdep_routines += malloc-hugepages
+endif
diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
new file mode 100644
index 0000000000..262bcdbeb8
--- /dev/null
+++ b/sysdeps/generic/malloc-hugepages.c
@@ -0,0 +1,31 @@
+/* Huge Page support.  Generic implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#include <malloc-hugepages.h>
+
+size_t
+__malloc_default_thp_pagesize (void)
+{
+  return 0;
+}
+
+enum malloc_thp_mode_t
+__malloc_thp_mode (void)
+{
+  return malloc_thp_mode_not_supported;
+}
diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
new file mode 100644
index 0000000000..664cda9b67
--- /dev/null
+++ b/sysdeps/generic/malloc-hugepages.h
@@ -0,0 +1,37 @@
+/* Malloc huge page support.  Generic implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#ifndef _MALLOC_HUGEPAGES_H
+#define _MALLOC_HUGEPAGES_H
+
+#include <stddef.h>
+
+/* Return the default transparent huge page size.  */
+size_t __malloc_default_thp_pagesize (void) attribute_hidden;
+
+enum malloc_thp_mode_t
+{
+  malloc_thp_mode_always,
+  malloc_thp_mode_madvise,
+  malloc_thp_mode_never,
+  malloc_thp_mode_not_supported
+};
+
+enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
+
+#endif /* _MALLOC_HUGEPAGES_H */
diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
new file mode 100644
index 0000000000..66589127cd
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
@@ -0,0 +1,76 @@
+/* Huge Page support.  Linux implementation.
+   Copyright (C) 2021 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public License as
+   published by the Free Software Foundation; either version 2.1 of the
+   License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; see the file COPYING.LIB.  If
+   not, see <https://www.gnu.org/licenses/>.  */
+
+#include <intprops.h>
+#include <malloc-hugepages.h>
+#include <not-cancel.h>
+
+size_t
+__malloc_default_thp_pagesize (void)
+{
+  int fd = __open64_nocancel (
+    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
+  if (fd == -1)
+    return 0;
+
+
+  char str[INT_BUFSIZE_BOUND (size_t)];
+  ssize_t s = __read_nocancel (fd, str, sizeof (str));
+  __close_nocancel (fd);
+
+  if (s < 0)
+    return 0;
+
+  int r = 0;
+  for (ssize_t i = 0; i < s; i++)
+    {
+      if (str[i] == '\n')
+	break;
+      r *= 10;
+      r += str[i] - '0';
+    }
+  return r;
+}
+
+enum malloc_thp_mode_t
+__malloc_thp_mode (void)
+{
+  int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
+			      O_RDONLY);
+  if (fd == -1)
+    return malloc_thp_mode_not_supported;
+
+  static const char mode_always[]  = "[always] madvise never\n";
+  static const char mode_madvise[] = "always [madvise] never\n";
+  static const char mode_never[]   = "always madvise [never]\n";
+
+  char str[sizeof(mode_always)];
+  ssize_t s = __read_nocancel (fd, str, sizeof (str));
+  __close_nocancel (fd);
+
+  if (s == sizeof (mode_always) - 1)
+    {
+      if (strcmp (str, mode_always) == 0)
+	return malloc_thp_mode_always;
+      else if (strcmp (str, mode_madvise) == 0)
+	return malloc_thp_mode_madvise;
+      else if (strcmp (str, mode_never) == 0)
+	return malloc_thp_mode_never;
+    }
+  return malloc_thp_mode_not_supported;
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 2/4] malloc: Add THP/madvise support for sbrk
  2021-08-18 14:19 [PATCH v2 0/4] malloc: Improve Huge Page support Adhemerval Zanella via Libc-alpha
  2021-08-18 14:19 ` [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
@ 2021-08-18 14:19 ` Adhemerval Zanella via Libc-alpha
  2021-08-18 14:19 ` [PATCH v2 3/4] malloc: Move mmap logic to its own function Adhemerval Zanella via Libc-alpha
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-18 14:19 UTC (permalink / raw
  To: libc-alpha; +Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

For the main arena, the sbrk() might the preferable syscall instead of
mmap().  And the granularity used when increasing the program segment
is the default page size.

To increase effectiveness with Transparent Huge Page with madvise, the
large page size is use instead.  This is enabled with the new tunable
'glibc.malloc.thp_pagesize'.

Checked on x86_64-linux-gnu.
---
 include/libc-pointer-arith.h | 10 ++++++++++
 malloc/malloc.c              | 35 ++++++++++++++++++++++++++++++-----
 2 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/include/libc-pointer-arith.h b/include/libc-pointer-arith.h
index 04ba537617..f592cbafec 100644
--- a/include/libc-pointer-arith.h
+++ b/include/libc-pointer-arith.h
@@ -37,6 +37,16 @@
 /* Cast an integer or a pointer VAL to integer with proper type.  */
 # define cast_to_integer(val) ((__integer_if_pointer_type (val)) (val))
 
+/* Check if SIZE is aligned on SIZE  */
+#define IS_ALIGNED(base, size) \
+  (((base) & (size - 1)) == 0)
+
+#define PTR_IS_ALIGNED(base, size) \
+  ((((uintptr_t) (base)) & (size - 1)) == 0)
+
+#define PTR_DIFF(p1, p2) \
+  ((ptrdiff_t)((uintptr_t)(p1) - (uintptr_t)(p2)))
+
 /* Cast an integer VAL to void * pointer.  */
 # define cast_to_pointer(val) ((void *) (uintptr_t) (val))
 
diff --git a/malloc/malloc.c b/malloc/malloc.c
index ad3eec41ac..1a2c798a35 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2024,6 +2024,17 @@ sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
      not active.  */
   if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
     return;
+
+  /* madvise() requires at least the input to be aligned to system page and
+     MADV_HUGEPAGE should handle unaligned address.  Also unaligned inputs
+     should happen only for the initial data segment.  */
+  if (__glibc_unlikely (!PTR_IS_ALIGNED (p, GLRO (dl_pagesize))))
+    {
+      void *q = PTR_ALIGN_DOWN (p, GLRO (dl_pagesize));
+      size += PTR_DIFF (p, q);
+      p = q;
+    }
+
   __madvise (p, size, MADV_HUGEPAGE);
 #endif
 }
@@ -2610,14 +2621,25 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
         size -= old_size;
 
       /*
-         Round to a multiple of page size.
+         Round to a multiple of page size or huge page size.
          If MORECORE is not contiguous, this ensures that we only call it
          with whole-page arguments.  And if MORECORE is contiguous and
          this is not first time through, this preserves page-alignment of
          previous calls. Otherwise, we correct to page-align below.
        */
 
-      size = ALIGN_UP (size, pagesize);
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+      /* Defined in brk.c.  */
+      extern void *__curbrk;
+      if (mp_.thp_pagesize != 0)
+	{
+	  uintptr_t top = ALIGN_UP ((uintptr_t) __curbrk + size,
+				    mp_.thp_pagesize);
+	  size = top - (uintptr_t) __curbrk;
+	}
+      else
+#endif
+	size = ALIGN_UP (size, GLRO(dl_pagesize));
 
       /*
          Don't try to call MORECORE if argument is so big as to appear
@@ -2900,10 +2922,8 @@ systrim (size_t pad, mstate av)
   long released;         /* Amount actually released */
   char *current_brk;     /* address returned by pre-check sbrk call */
   char *new_brk;         /* address returned by post-check sbrk call */
-  size_t pagesize;
   long top_area;
 
-  pagesize = GLRO (dl_pagesize);
   top_size = chunksize (av->top);
 
   top_area = top_size - MINSIZE - 1;
@@ -2911,7 +2931,12 @@ systrim (size_t pad, mstate av)
     return 0;
 
   /* Release in pagesize units and round down to the nearest page.  */
-  extra = ALIGN_DOWN(top_area - pad, pagesize);
+#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
+  if (mp_.thp_pagesize != 0)
+    extra = ALIGN_DOWN (top_area - pad, mp_.thp_pagesize);
+  else
+#endif
+    extra = ALIGN_DOWN (top_area - pad, GLRO(dl_pagesize));
 
   if (extra == 0)
     return 0;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 3/4] malloc: Move mmap logic to its own function
  2021-08-18 14:19 [PATCH v2 0/4] malloc: Improve Huge Page support Adhemerval Zanella via Libc-alpha
  2021-08-18 14:19 ` [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
  2021-08-18 14:19 ` [PATCH v2 2/4] malloc: Add THP/madvise support for sbrk Adhemerval Zanella via Libc-alpha
@ 2021-08-18 14:19 ` Adhemerval Zanella via Libc-alpha
  2021-08-19  0:47   ` Siddhesh Poyarekar via Libc-alpha
  2021-08-18 14:20 ` [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc Adhemerval Zanella via Libc-alpha
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-18 14:19 UTC (permalink / raw
  To: libc-alpha; +Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

So it can be used with different pagesize and flags.
---
 malloc/malloc.c | 155 +++++++++++++++++++++++++-----------------------
 1 file changed, 82 insertions(+), 73 deletions(-)

diff --git a/malloc/malloc.c b/malloc/malloc.c
index 1a2c798a35..4bfcea286f 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -2414,6 +2414,85 @@ do_check_malloc_state (mstate av)
    be extended or replaced.
  */
 
+static void *
+sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
+{
+  long int size;
+
+  /*
+    Round up size to nearest page.  For mmapped chunks, the overhead is one
+    SIZE_SZ unit larger than for normal chunks, because there is no
+    following chunk whose prev_size field could be used.
+
+    See the front_misalign handling below, for glibc there is no need for
+    further alignments unless we have have high alignment.
+   */
+  if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
+    size = ALIGN_UP (nb + SIZE_SZ, pagesize);
+  else
+    size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
+
+  /* Don't try if size wraps around 0.  */
+  if ((unsigned long) (size) <= (unsigned long) (nb))
+    return MAP_FAILED;
+
+  char *mm = (char *) MMAP (0, size,
+			    mtag_mmap_flags | PROT_READ | PROT_WRITE,
+			    extra_flags);
+  if (mm == MAP_FAILED)
+    return mm;
+
+  sysmadvise_thp (mm, size);
+
+  /*
+    The offset to the start of the mmapped region is stored in the prev_size
+    field of the chunk.  This allows us to adjust returned start address to
+    meet alignment requirements here and in memalign(), and still be able to
+    compute proper address argument for later munmap in free() and realloc().
+   */
+
+  INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */
+
+  if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
+    {
+      /* For glibc, chunk2mem increases the address by CHUNK_HDR_SZ and
+	 MALLOC_ALIGN_MASK is CHUNK_HDR_SZ-1.  Each mmap'ed area is page
+	 aligned and therefore definitely MALLOC_ALIGN_MASK-aligned.  */
+      assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
+      front_misalign = 0;
+    }
+  else
+    front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
+
+  mchunkptr p;                    /* the allocated/returned chunk */
+
+  if (front_misalign > 0)
+    {
+      ptrdiff_t correction = MALLOC_ALIGNMENT - front_misalign;
+      p = (mchunkptr) (mm + correction);
+      set_prev_size (p, correction);
+      set_head (p, (size - correction) | IS_MMAPPED);
+    }
+  else
+    {
+      p = (mchunkptr) mm;
+      set_prev_size (p, 0);
+      set_head (p, size | IS_MMAPPED);
+    }
+
+  /* update statistics */
+  int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
+  atomic_max (&mp_.max_n_mmaps, new);
+
+  unsigned long sum;
+  sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
+  atomic_max (&mp_.max_mmapped_mem, sum);
+
+  check_chunk (av, p);
+
+  return chunk2mem (p);
+}
+
 static void *
 sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 {
@@ -2451,81 +2530,11 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
       || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
 	  && (mp_.n_mmaps < mp_.n_mmaps_max)))
     {
-      char *mm;           /* return value from mmap call*/
-
     try_mmap:
-      /*
-         Round up size to nearest page.  For mmapped chunks, the overhead
-         is one SIZE_SZ unit larger than for normal chunks, because there
-         is no following chunk whose prev_size field could be used.
-
-         See the front_misalign handling below, for glibc there is no
-         need for further alignments unless we have have high alignment.
-       */
-      if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
-        size = ALIGN_UP (nb + SIZE_SZ, pagesize);
-      else
-        size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
+      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
+      if (mm != MAP_FAILED)
+	return mm;
       tried_mmap = true;
-
-      /* Don't try if size wraps around 0 */
-      if ((unsigned long) (size) > (unsigned long) (nb))
-        {
-          mm = (char *) (MMAP (0, size,
-			       mtag_mmap_flags | PROT_READ | PROT_WRITE, 0));
-
-          if (mm != MAP_FAILED)
-            {
-	      sysmadvise_thp (mm, size);
-
-              /*
-                 The offset to the start of the mmapped region is stored
-                 in the prev_size field of the chunk. This allows us to adjust
-                 returned start address to meet alignment requirements here
-                 and in memalign(), and still be able to compute proper
-                 address argument for later munmap in free() and realloc().
-               */
-
-              if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
-                {
-                  /* For glibc, chunk2mem increases the address by
-                     CHUNK_HDR_SZ and MALLOC_ALIGN_MASK is
-                     CHUNK_HDR_SZ-1.  Each mmap'ed area is page
-                     aligned and therefore definitely
-                     MALLOC_ALIGN_MASK-aligned.  */
-                  assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
-                  front_misalign = 0;
-                }
-              else
-                front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
-              if (front_misalign > 0)
-                {
-                  correction = MALLOC_ALIGNMENT - front_misalign;
-                  p = (mchunkptr) (mm + correction);
-		  set_prev_size (p, correction);
-                  set_head (p, (size - correction) | IS_MMAPPED);
-                }
-              else
-                {
-                  p = (mchunkptr) mm;
-		  set_prev_size (p, 0);
-                  set_head (p, size | IS_MMAPPED);
-                }
-
-              /* update statistics */
-
-              int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
-              atomic_max (&mp_.max_n_mmaps, new);
-
-              unsigned long sum;
-              sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
-              atomic_max (&mp_.max_mmapped_mem, sum);
-
-              check_chunk (av, p);
-
-              return chunk2mem (p);
-            }
-        }
     }
 
   /* There are no usable arenas and mmap also failed.  */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc
  2021-08-18 14:19 [PATCH v2 0/4] malloc: Improve Huge Page support Adhemerval Zanella via Libc-alpha
                   ` (2 preceding siblings ...)
  2021-08-18 14:19 ` [PATCH v2 3/4] malloc: Move mmap logic to its own function Adhemerval Zanella via Libc-alpha
@ 2021-08-18 14:20 ` Adhemerval Zanella via Libc-alpha
  2021-08-19  1:03   ` Siddhesh Poyarekar via Libc-alpha
  2021-08-19 17:58   ` Matheus Castanho via Libc-alpha
  2021-08-18 18:11 ` [PATCH v2 0/4] malloc: Improve Huge Page support Siddhesh Poyarekar via Libc-alpha
  2021-08-19 16:42 ` Guillaume Morin
  5 siblings, 2 replies; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-18 14:20 UTC (permalink / raw
  To: libc-alpha; +Cc: Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar

A new tunable, 'glibc.malloc.mmap_hugetlb', adds support to use Huge Page
support directly with mmap() calls.  The required supported sizes and
flags for mmap() are provided by an arch-specific internal hook
malloc_hp_config().

Currently it first try mmap() using the huge page size and fallback to
default page size and sbrk() call if kernel returns MMAP_FAILED.

The default malloc_hp_config() implementation does not enable it even
if the tunable is set.

Checked on x86_64-linux-gnu.
---
 NEWS                                       |   4 +
 elf/dl-tunables.list                       |   4 +
 elf/tst-rtld-list-tunables.exp             |   1 +
 malloc/arena.c                             |   2 +
 malloc/malloc.c                            |  35 +++++-
 manual/tunables.texi                       |  14 +++
 sysdeps/generic/malloc-hugepages.c         |   6 +
 sysdeps/generic/malloc-hugepages.h         |  12 ++
 sysdeps/unix/sysv/linux/malloc-hugepages.c | 125 +++++++++++++++++++++
 9 files changed, 200 insertions(+), 3 deletions(-)

diff --git a/NEWS b/NEWS
index 9b2345d08c..412bf3e6f8 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,10 @@ Major new features:
   It might improve performance with Transparent Huge Pages madvise mode
   depending of the workload.
 
+* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to
+  instruct malloc to try use Huge Pages when allocate memory with mmap()
+  calls (through the use of MAP_HUGETLB).
+
 Deprecated and removed features, and other changes affecting compatibility:
 
   [Add deprecations, removals and changes affecting compatibility here]
diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
index 67df6dbc2c..209c2d8592 100644
--- a/elf/dl-tunables.list
+++ b/elf/dl-tunables.list
@@ -97,6 +97,10 @@ glibc {
       minval: 0
       maxval: 1
     }
+    mmap_hugetlb {
+      type: SIZE_T
+      minval: 0
+    }
   }
   cpu {
     hwcap_mask {
diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
index d8109fa31c..49f033ce91 100644
--- a/elf/tst-rtld-list-tunables.exp
+++ b/elf/tst-rtld-list-tunables.exp
@@ -1,6 +1,7 @@
 glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
 glibc.malloc.check: 0 (min: 0, max: 3)
+glibc.malloc.mmap_hugetlb: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
 glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
 glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
diff --git a/malloc/arena.c b/malloc/arena.c
index 81bff54303..4efb5581c1 100644
--- a/malloc/arena.c
+++ b/malloc/arena.c
@@ -232,6 +232,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
 #endif
 TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
 TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
+TUNABLE_CALLBACK_FNDECL (set_mmap_hugetlb, size_t)
 #else
 /* Initialization routine. */
 #include <string.h>
@@ -333,6 +334,7 @@ ptmalloc_init (void)
 # endif
   TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
   TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
+  TUNABLE_GET (mmap_hugetlb, size_t, TUNABLE_CALLBACK (set_mmap_hugetlb));
 #else
   if (__glibc_likely (_environ != NULL))
     {
diff --git a/malloc/malloc.c b/malloc/malloc.c
index 4bfcea286f..8cf2d6855e 100644
--- a/malloc/malloc.c
+++ b/malloc/malloc.c
@@ -1884,6 +1884,10 @@ struct malloc_par
 #if HAVE_TUNABLES
   /* Transparent Large Page support.  */
   INTERNAL_SIZE_T thp_pagesize;
+  /* A value different than 0 means to align mmap allocation to hp_pagesize
+     add hp_flags on flags.  */
+  INTERNAL_SIZE_T hp_pagesize;
+  int hp_flags;
 #endif
 
   /* Memory map support */
@@ -2415,7 +2419,8 @@ do_check_malloc_state (mstate av)
  */
 
 static void *
-sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
+sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av,
+		bool set_thp)
 {
   long int size;
 
@@ -2442,7 +2447,8 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
   if (mm == MAP_FAILED)
     return mm;
 
-  sysmadvise_thp (mm, size);
+  if (set_thp)
+    sysmadvise_thp (mm, size);
 
   /*
     The offset to the start of the mmapped region is stored in the prev_size
@@ -2531,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
 	  && (mp_.n_mmaps < mp_.n_mmaps_max)))
     {
     try_mmap:
-      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
+      char *mm;
+#if HAVE_TUNABLES
+      if (mp_.hp_pagesize > 0)
+	{
+	  /* There is no need to isse the THP madvise call if Huge Pages are
+	     used directly.  */
+	  mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av, false);
+	  if (mm != MAP_FAILED)
+	    return mm;
+	}
+#endif
+      mm = sysmalloc_mmap (nb, pagesize, 0, av, true);
       if (mm != MAP_FAILED)
 	return mm;
       tried_mmap = true;
@@ -5405,6 +5422,18 @@ do_set_thp_madvise (int32_t value)
     }
   return 0;
 }
+
+static __always_inline int
+do_set_mmap_hugetlb (size_t value)
+{
+  if (value > 0)
+    {
+      struct malloc_hugepage_config_t cfg = __malloc_hugepage_config (value);
+      mp_.hp_pagesize = cfg.pagesize;
+      mp_.hp_flags = cfg.flags;
+    }
+  return 0;
+}
 #endif
 
 int
diff --git a/manual/tunables.texi b/manual/tunables.texi
index 93c46807f9..4da6a02778 100644
--- a/manual/tunables.texi
+++ b/manual/tunables.texi
@@ -279,6 +279,20 @@ The default value of this tunable is @code{0}, which disable its usage.
 Setting to a positive value enable the @code{madvise} call.
 @end deftp
 
+@deftp Tunable glibc.malloc.mmap_hugetlb
+This tunable enable the use of Huge Pages when the system supports it (currently
+only Linux).  It is done by aligning the memory size and passing the required
+flags (@code{MAP_HUGETLB} on Linux) when issuing the @code{mmap} to allocate
+memory from the system.
+
+The default value of this tunable is @code{0}, which disable its usage.
+The special value @code{1} will try to gather the system default huge page size,
+while a value larger than @code{1} will try to match it with the supported system
+huge page size.  If either no default huge page size could be obtained or if the
+requested size does not match the supported ones, the huge pages supports will be
+disabled.
+@end deftp
+
 @node Dynamic Linking Tunables
 @section Dynamic Linking Tunables
 @cindex dynamic linking tunables
diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
index 262bcdbeb8..e5f5c1ec98 100644
--- a/sysdeps/generic/malloc-hugepages.c
+++ b/sysdeps/generic/malloc-hugepages.c
@@ -29,3 +29,9 @@ __malloc_thp_mode (void)
 {
   return malloc_thp_mode_not_supported;
 }
+
+/* Return the default transparent huge page size.  */
+struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
+{
+  return (struct malloc_hugepage_config_t) { 0, 0 };
+}
diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
index 664cda9b67..27f7adfea5 100644
--- a/sysdeps/generic/malloc-hugepages.h
+++ b/sysdeps/generic/malloc-hugepages.h
@@ -34,4 +34,16 @@ enum malloc_thp_mode_t
 
 enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
 
+struct malloc_hugepage_config_t
+{
+  size_t pagesize;
+  int flags;
+};
+
+/* Returned the support huge page size from the requested PAGESIZE along
+   with the requires extra mmap flags.  Returning a 0 value for pagesize
+   disables its usage.  */
+struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
+     attribute_hidden;
+
 #endif /* _MALLOC_HUGEPAGES_H */
diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
index 66589127cd..0eb0c764ad 100644
--- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
+++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
@@ -17,8 +17,10 @@
    not, see <https://www.gnu.org/licenses/>.  */
 
 #include <intprops.h>
+#include <dirent.h>
 #include <malloc-hugepages.h>
 #include <not-cancel.h>
+#include <sys/mman.h>
 
 size_t
 __malloc_default_thp_pagesize (void)
@@ -74,3 +76,126 @@ __malloc_thp_mode (void)
     }
   return malloc_thp_mode_not_supported;
 }
+
+static size_t
+malloc_default_hugepage_size (void)
+{
+  int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY);
+  if (fd == -1)
+    return 0;
+
+  char buf[512];
+  off64_t off = 0;
+  while (1)
+    {
+      ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off);
+      if (r < 0)
+	break;
+      buf[r - 1] = '\0';
+
+      const char *s = strstr (buf, "Hugepagesize:");
+      if (s == NULL)
+	{
+	  char *nl = strrchr (buf, '\n');
+	  if (nl == NULL)
+	    break;
+	  off += (nl + 1) - buf;
+	  continue;
+	}
+
+      /* The default huge page size is in the form:
+	 Hugepagesize:       NUMBER kB  */
+      size_t hpsize = 0;
+      s += sizeof ("Hugepagesize: ") - 1;
+      for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++)
+	{
+	  if (s[i] == ' ')
+	    continue;
+	  hpsize *= 10;
+	  hpsize += s[i] - '0';
+	}
+      return hpsize * 1024;
+    }
+
+  __close_nocancel (fd);
+
+  return 0;
+}
+
+static inline struct malloc_hugepage_config_t
+make_malloc_hugepage_config (size_t pagesize)
+{
+  int flags = MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
+  return (struct malloc_hugepage_config_t) { pagesize, flags };
+}
+
+struct malloc_hugepage_config_t
+__malloc_hugepage_config (size_t requested)
+{
+  if (requested == 1)
+    {
+      size_t pagesize = malloc_default_hugepage_size ();
+      if (pagesize != 0)
+	return make_malloc_hugepage_config (pagesize);
+    }
+
+  int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages",
+				 O_RDONLY | O_DIRECTORY, 0);
+  if (dirfd == -1)
+    return (struct malloc_hugepage_config_t) { 0, 0 };
+
+  bool found = false;
+
+  char buffer[1024];
+  while (true)
+    {
+#if !IS_IN(libc)
+# define __getdents64 getdents64
+#endif
+      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
+      if (ret == -1)
+	break;
+      else if (ret == 0)
+        break;
+
+      char *begin = buffer, *end = buffer + ret;
+      while (begin != end)
+        {
+          unsigned short int d_reclen;
+          memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
+                  sizeof (d_reclen));
+          const char *dname = begin + offsetof (struct dirent64, d_name);
+          begin += d_reclen;
+
+          if (dname[0] == '.'
+	      || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0)
+            continue;
+
+	  /* Each entry represents a supported huge page in the form of:
+	     hugepages-<size>kB.  */
+	  size_t hpsize = 0;
+	  const char *sizestr = dname + sizeof ("hugepages-") - 1;
+	  for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++)
+	    {
+	      hpsize *= 10;
+	      hpsize += sizestr[i] - '0';
+	    }
+	  hpsize *= 1024;
+
+	  if (hpsize == requested)
+	    {
+	      found = true;
+	      break;
+	    }
+        }
+      if (found)
+	break;
+    }
+
+  __close_nocancel (dirfd);
+
+  if (found)
+    return make_malloc_hugepage_config (requested);
+
+  return (struct malloc_hugepage_config_t) { 0, 0 };
+}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-18 14:19 [PATCH v2 0/4] malloc: Improve Huge Page support Adhemerval Zanella via Libc-alpha
                   ` (3 preceding siblings ...)
  2021-08-18 14:20 ` [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc Adhemerval Zanella via Libc-alpha
@ 2021-08-18 18:11 ` Siddhesh Poyarekar via Libc-alpha
  2021-08-19 11:26   ` Adhemerval Zanella via Libc-alpha
  2021-08-19 16:42 ` Guillaume Morin
  5 siblings, 1 reply; 24+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-18 18:11 UTC (permalink / raw
  To: Adhemerval Zanella, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin

On 8/18/21 7:49 PM, Adhemerval Zanella wrote:
> Linux currently supports two ways to use Huge Pages: either by using
> specific flags directly with the syscall (MAP_HUGETLB for mmap(), or
> SHM_HUGETLB for shmget()), or by using Transparent Huge Pages (THP)
> where the kernel will try to move allocated anonymous pages to Huge
> Pages blocks transparent to application.
> 
> Also, THP current support three different modes [1]: 'never', 'madvise',
> and 'always'.  The 'never' is self-explanatory and 'always' will enable
> THP for all anonymous memory.  However, 'madvise' is still the default
> for some systems and for such cases THP will be only used if the memory
> range is explicity advertise by the program through a
> madvise(MADV_HUGEPAGE) call.
> 
> This patchset adds a two new tunables to improve malloc() support with
> Huge Page:

I wonder if this could be done with just the one tunable, 
glibc.malloc.hugepages where:

0: Disabled (default)
1: Transparent, where we emulate "always" behaviour of THP
2: HugeTLB enabled with default hugepage size
<size>: HugeTLB enabled with the specified page size

When using HugeTLB, we don't really need to bother with THP so they seem 
mutually exclusive.

> 
>    - glibc.malloc.thp_madvise: instruct the system allocator to issue
>      a madvise(MADV_HUGEPAGE) call after a mmap() one for sizes larger
>      than the default huge page size.  The default behavior is to
>      disable it and if the system does not support THP the tunable also
>      does not enable the madvise() call.
> 
>    - glibc.malloc.mmap_hugetlb: instruct the system allocator to round
>      allocation to huge page sizes along with the required flags
>      (MAP_HUGETLB for Linux).  If the memory allocation fails, the
>      default system page size is used instead.  The default behavior is
>      to disable and a value of 1 uses the default system huge page size.
>      A positive value larger than 1 means to use a specific huge page
>      size, which is matched against the supported ones by the system.
> 
> The 'thp_madvise' tunable also changes the sbrk() usage by malloc
> on main arenas, where the increment is now aligned to the huge page
> size, instead of default page size.
> 
> The 'mmap_hugetlb' aims to replace the 'morecore' removed callback
> from 2.34 for libhugetlbfs (where the library tries to leverage the
> huge pages usage instead to provide a system allocator).  By
> implementing the support directly on the mmap() code patch there is
> no need to try emulate the morecore()/sbrk() semantic which simplifies
> the code and make memory shrink logic more straighforward.
> 
> The performance improvements are really dependent of the workload
> and the platform, however a simple testcase might show the possible
> improvements:

A simple test like below in benchtests would be very useful to at least 
get an initial understanding of the behaviour differences with different 
tunable values.  Later those who care can add more relevant workloads.

> 
> $ cat hugepages.cc
> #include <unordered_map>
> 
> int
> main (int argc, char *argv[])
> {
>    std::size_t iters = 10000000;
>    std::unordered_map <std::size_t, std::size_t> ht;
>    ht.reserve (iters);
>    for (std::size_t i = 0; i < iters; ++i)
>      ht.try_emplace (i, i);
> 
>    return 0;
> }
> $ g++ -std=c++17 -O2 hugepages.cc -o hugepages
> 
> On a x86_64 (Ryzen 9 5900X):
> 
>   Performance counter stats for 'env
> GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages':
> 
>              98,874      faults
>             717,059      dTLB-loads
>             411,701      dTLB-load-misses          #   57.42% of all dTLB
> cache accesses
>           3,754,927      cache-misses              #    8.479 % of all
> cache refs
>          44,287,580      cache-references
> 
>         0.315278378 seconds time elapsed
> 
>         0.238635000 seconds user
>         0.076714000 seconds sys
> 
>   Performance counter stats for 'env
> GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages':
> 
>               1,871      faults
>             120,035      dTLB-loads
>              19,882      dTLB-load-misses          #   16.56% of all dTLB
> cache accesses
>           4,182,942      cache-misses              #    7.452 % of all
> cache refs
>          56,128,995      cache-references
> 
>         0.262620733 seconds time elapsed
> 
>         0.222233000 seconds user
>         0.040333000 seconds sys
> 
> 
> On an AArch64 (cortex A72):
> 
>   Performance counter stats for 'env
> GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages':
> 
>               98835      faults
>          2007234756      dTLB-loads
>             4613669      dTLB-load-misses          #    0.23% of all dTLB
> cache accesses
>             8831801      cache-misses              #    0.504 % of all
> cache refs
>          1751391405      cache-references
> 
>         0.616782575 seconds time elapsed
> 
>         0.460946000 seconds user
>         0.154309000 seconds sys
> 
>   Performance counter stats for 'env
> GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages':
> 
>                 955      faults
>          1787401880      dTLB-loads
>              224034      dTLB-load-misses          #    0.01% of all dTLB
> cache accesses
>             5480917      cache-misses              #    0.337 % of all
> cache refs
>          1625937858      cache-references
> 
>         0.487773443 seconds time elapsed
> 
>         0.440894000 seconds user
>         0.046465000 seconds sys
> 
> 
> And on a powerpc64 (POWER8):
> 
>   Performance counter stats for 'env
> GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages
> ':
> 
>                5453      faults
>                9940      dTLB-load-misses
>             1338152      cache-misses              #    0.101 % of all
> cache refs
>          1326037487      cache-references
> 
>         1.056355887 seconds time elapsed
> 
>         1.014633000 seconds user
>         0.041805000 seconds sys
> 
>   Performance counter stats for 'env
> GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages
> ':
> 
>                1016      faults
>                1746      dTLB-load-misses
>              399052      cache-misses              #    0.030 % of all
> cache refs
>          1316059877      cache-references
> 
>         1.057810501 seconds time elapsed
> 
>         1.012175000 seconds user
>         0.045624000 seconds sys
> 
> It is worth to note that the powerpc64 machine has 'always' set
> on '/sys/kernel/mm/transparent_hugepage/enabled'.
> 
> Norbert Manthey's paper has more information with a more thoroughly
> performance analysis.
> 
> For testing run make check on x86_64-linux-gnu with thp_pagesize=1
> (directly on ptmalloc_init() after tunable initialiazation) and
> with mmap_hugetlb=1 (also directly on ptmalloc_init()) with about
> 10 large pages (so the fallback mmap() call is used) and with
> 1024 large pages (so all mmap(MAP_HUGETLB) are successful).

You could add tests similar to mcheck and malloc-check, i.e. add 
$(tests-hugepages) to run all malloc tests again with the various 
tunable values.  See tests-mcheck for example.

> --
> 
> Changes from previous version:
> 
>    - Renamed thp_pagesize to thp_madvise and make it a boolean state.
>    - Added MAP_HUGETLB support for mmap().
>    - Remove system specific hooks for THP huge page size in favor of
>      Linux generic implementation.
>    - Initial program segments need to be page aligned for the
>      first madvise call.
> 
> Adhemerval Zanella (4):
>    malloc: Add madvise support for Transparent Huge Pages
>    malloc: Add THP/madvise support for sbrk
>    malloc: Move mmap logic to its own function
>    malloc: Add Huge Page support for sysmalloc
> 
>   NEWS                                       |   9 +-
>   elf/dl-tunables.list                       |   9 +
>   elf/tst-rtld-list-tunables.exp             |   2 +
>   include/libc-pointer-arith.h               |  10 +
>   malloc/arena.c                             |   7 +
>   malloc/malloc-internal.h                   |   1 +
>   malloc/malloc.c                            | 263 +++++++++++++++------
>   manual/tunables.texi                       |  23 ++
>   sysdeps/generic/Makefile                   |   8 +
>   sysdeps/generic/malloc-hugepages.c         |  37 +++
>   sysdeps/generic/malloc-hugepages.h         |  49 ++++
>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 201 ++++++++++++++++
>   12 files changed, 542 insertions(+), 77 deletions(-)
>   create mode 100644 sysdeps/generic/malloc-hugepages.c
>   create mode 100644 sysdeps/generic/malloc-hugepages.h
>   create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages
  2021-08-18 14:19 ` [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
@ 2021-08-18 18:42   ` Siddhesh Poyarekar via Libc-alpha
  2021-08-19 12:00     ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-18 18:42 UTC (permalink / raw
  To: Adhemerval Zanella, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin

On 8/18/21 7:49 PM, Adhemerval Zanella wrote:
> Linux Transparent Huge Pages (THP) current support three different
> states: 'never', 'madvise', and 'always'.  The 'never' is
> self-explanatory and 'always' will enable THP for all anonymous
> memory.  However, 'madvise' is still the default for some system and
> for such case THP will be only used if the memory range is explicity
> advertise by the program through a madvise(MADV_HUGEPAGE) call.
> 
> To enable it a new tunable is provided, 'glibc.malloc.thp_madvise',
> where setting to a value diffent than 0 enables the madvise call.
> Linux current only support one page size for THP, even if the
> architecture supports multiple sizes.
> 
> This patch issues the madvise(MADV_HUGEPAGE) call after a successful
> mmap() call at sysmalloc() with sizes larger than the default huge
> page size.  The madvise() call is disable is system does not support
> THP or if it has the mode set to "never".
> 
> Checked on x86_64-linux-gnu.
> ---
>   NEWS                                       |  5 +-
>   elf/dl-tunables.list                       |  5 ++
>   elf/tst-rtld-list-tunables.exp             |  1 +
>   malloc/arena.c                             |  5 ++
>   malloc/malloc-internal.h                   |  1 +
>   malloc/malloc.c                            | 48 ++++++++++++++
>   manual/tunables.texi                       |  9 +++
>   sysdeps/generic/Makefile                   |  8 +++
>   sysdeps/generic/malloc-hugepages.c         | 31 +++++++++
>   sysdeps/generic/malloc-hugepages.h         | 37 +++++++++++
>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++
>   11 files changed, 225 insertions(+), 1 deletion(-)
>   create mode 100644 sysdeps/generic/malloc-hugepages.c
>   create mode 100644 sysdeps/generic/malloc-hugepages.h
>   create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c
> 
> diff --git a/NEWS b/NEWS
> index 79c895e382..9b2345d08c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,7 +9,10 @@ Version 2.35
>   
>   Major new features:
>   
> -  [Add new features here]
> +* On Linux, a new tunable, glibc.malloc.thp_madvise, can be used to
> +  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
> +  It might improve performance with Transparent Huge Pages madvise mode
> +  depending of the workload.
>   
>   Deprecated and removed features, and other changes affecting compatibility:
>   
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 8ddd4a2314..67df6dbc2c 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -92,6 +92,11 @@ glibc {
>         minval: 0
>         security_level: SXID_IGNORE
>       }
> +    thp_madvise {
> +      type: INT_32
> +      minval: 0
> +      maxval: 1
> +    }
>     }
>     cpu {
>       hwcap_mask {
> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
> index 9f66c52885..d8109fa31c 100644
> --- a/elf/tst-rtld-list-tunables.exp
> +++ b/elf/tst-rtld-list-tunables.exp
> @@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255)
>   glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+)
> +glibc.malloc.thp_madvise: 0 (min: 0, max: 1)
>   glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 667484630e..81bff54303 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
>   TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>   #endif
>   TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
> +TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
>   #else
>   /* Initialization routine. */
>   #include <string.h>
> @@ -331,6 +332,7 @@ ptmalloc_init (void)
>   	       TUNABLE_CALLBACK (set_tcache_unsorted_limit));
>   # endif
>     TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
> +  TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
>   #else
>     if (__glibc_likely (_environ != NULL))
>       {
> @@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
>         __munmap (p2, HEAP_MAX_SIZE);
>         return 0;
>       }
> +
> +  sysmadvise_thp (p2, size);
> +
>     h = (heap_info *) p2;
>     h->size = size;
>     h->mprotect_size = size;
> diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
> index 0c7b5a183c..7493e34d86 100644
> --- a/malloc/malloc-internal.h
> +++ b/malloc/malloc-internal.h
> @@ -22,6 +22,7 @@
>   #include <malloc-machine.h>
>   #include <malloc-sysdep.h>
>   #include <malloc-size.h>
> +#include <malloc-hugepages.h>
>   
>   /* Called in the parent process before a fork.  */
>   void __malloc_fork_lock_parent (void) attribute_hidden;
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index e065785af7..ad3eec41ac 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1881,6 +1881,11 @@ struct malloc_par
>     INTERNAL_SIZE_T arena_test;
>     INTERNAL_SIZE_T arena_max;
>   
> +#if HAVE_TUNABLES
> +  /* Transparent Large Page support.  */
> +  INTERNAL_SIZE_T thp_pagesize;
> +#endif
> +
>     /* Memory map support */
>     int n_mmaps;
>     int n_mmaps_max;
> @@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
>   
>   #include <stap-probe.h>
>   
> +/* ----------- Routines dealing with transparent huge pages ----------- */
> +
> +static inline void
> +sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
> +{
> +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
> +  /* Do not consider areas smaller than a huge page or if the tunable is
> +     not active.  */

You also shouldn't bother setting it if 
/sys/kernel/mm/transparent_hugepage/enabled is set to "enabled" since 
it's redundant.

> +  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
> +    return;
> +  __madvise (p, size, MADV_HUGEPAGE);
> +#endif
> +}
> +
>   /* ------------------- Support for multiple arenas -------------------- */
>   #include "arena.c"
>   
> @@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>   
>             if (mm != MAP_FAILED)
>               {
> +	      sysmadvise_thp (mm, size);
> +
>                 /*
>                    The offset to the start of the mmapped region is stored
>                    in the prev_size field of the chunk. This allows us to adjust
> @@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>         if (size > 0)
>           {
>             brk = (char *) (MORECORE (size));
> +	  if (brk != (char *) (MORECORE_FAILURE))
> +	    sysmadvise_thp (brk, size);
>             LIBC_PROBE (memory_sbrk_more, 2, brk, size);
>           }
>   
> @@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>   
>                 if (mbrk != MAP_FAILED)
>                   {
> +		  sysmadvise_thp (mbrk, size);
> +
>                     /* We do not need, and cannot use, another sbrk call to find end */
>                     brk = mbrk;
>                     snd_brk = brk + size;
> @@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>                         correction = 0;
>                         snd_brk = (char *) (MORECORE (0));
>                       }
> +		  else
> +		    sysmadvise_thp (snd_brk, correction);
>                   }
>   
>                 /* handle non-contiguous cases */
> @@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
>     if (cp == MAP_FAILED)
>       return 0;
>   
> +  sysmadvise_thp (cp, new_size);
> +
>     p = (mchunkptr) (cp + offset);
>   
>     assert (aligned_OK (chunk2mem (p)));
> @@ -5325,6 +5354,25 @@ do_set_mxfast (size_t value)
>     return 0;
>   }
>   
> +#if HAVE_TUNABLES
> +static __always_inline int
> +do_set_thp_madvise (int32_t value)
> +{
> +  if (value > 0)
> +    {
> +      enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
> +      /*
> +	 Only enables THP usage is system does support it and has at least
> +	 always or madvise mode.  Otherwise the madvise() call is wasteful.
> +       */
> +      if (thp_mode != malloc_thp_mode_not_supported
> +	  && thp_mode != malloc_thp_mode_never)
> +	mp_.thp_pagesize = __malloc_default_thp_pagesize ();
> +    }
> +  return 0;
> +}
> +#endif
> +
>   int
>   __libc_mallopt (int param_number, int value)
>   {
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 658547c613..93c46807f9 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
>   passed to @code{malloc} for the largest bin size to enable.
>   @end deftp
>   
> +@deftp Tunable glibc.malloc.thp_madivse
> +This tunable enable the use of @code{madvise} with @code{MADV_HUGEPAGE} after
> +the system allocator allocated memory through @code{mmap} if the system supports
> +Transparent Huge Page (currently only Linux).
> +
> +The default value of this tunable is @code{0}, which disable its usage.
> +Setting to a positive value enable the @code{madvise} call.
> +@end deftp
> +
>   @node Dynamic Linking Tunables
>   @section Dynamic Linking Tunables
>   @cindex dynamic linking tunables
> diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile
> index a209e85cc4..8eef83c94d 100644
> --- a/sysdeps/generic/Makefile
> +++ b/sysdeps/generic/Makefile
> @@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe
>   shared-only-routines += framestate unwind-pe
>   endif
>   endif
> +
> +ifeq ($(subdir),malloc)
> +sysdep_malloc_debug_routines += malloc-hugepages
> +endif
> +
> +ifeq ($(subdir),misc)
> +sysdep_routines += malloc-hugepages
> +endif
> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
> new file mode 100644
> index 0000000000..262bcdbeb8
> --- /dev/null
> +++ b/sysdeps/generic/malloc-hugepages.c
> @@ -0,0 +1,31 @@
> +/* Huge Page support.  Generic implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB.  If
> +   not, see <https://www.gnu.org/licenses/>.  */
> +
> +#include <malloc-hugepages.h>
> +
> +size_t
> +__malloc_default_thp_pagesize (void)
> +{
> +  return 0;
> +}
> +
> +enum malloc_thp_mode_t
> +__malloc_thp_mode (void)
> +{
> +  return malloc_thp_mode_not_supported;
> +}
> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
> new file mode 100644
> index 0000000000..664cda9b67
> --- /dev/null
> +++ b/sysdeps/generic/malloc-hugepages.h
> @@ -0,0 +1,37 @@
> +/* Malloc huge page support.  Generic implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB.  If
> +   not, see <https://www.gnu.org/licenses/>.  */
> +
> +#ifndef _MALLOC_HUGEPAGES_H
> +#define _MALLOC_HUGEPAGES_H
> +
> +#include <stddef.h>
> +
> +/* Return the default transparent huge page size.  */
> +size_t __malloc_default_thp_pagesize (void) attribute_hidden;
> +
> +enum malloc_thp_mode_t
> +{
> +  malloc_thp_mode_always,
> +  malloc_thp_mode_madvise,
> +  malloc_thp_mode_never,
> +  malloc_thp_mode_not_supported
> +};
> +
> +enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
> +
> +#endif /* _MALLOC_HUGEPAGES_H */
> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> new file mode 100644
> index 0000000000..66589127cd
> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> @@ -0,0 +1,76 @@
> +/* Huge Page support.  Linux implementation.
> +   Copyright (C) 2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public License as
> +   published by the Free Software Foundation; either version 2.1 of the
> +   License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; see the file COPYING.LIB.  If
> +   not, see <https://www.gnu.org/licenses/>.  */
> +
> +#include <intprops.h>
> +#include <malloc-hugepages.h>
> +#include <not-cancel.h>
> +
> +size_t
> +__malloc_default_thp_pagesize (void)
> +{

Likewise for page size; this could be cached too.

> +  int fd = __open64_nocancel (
> +    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
> +  if (fd == -1)
> +    return 0;
> +
> +
> +  char str[INT_BUFSIZE_BOUND (size_t)];
> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
> +  __close_nocancel (fd);
> +
> +  if (s < 0)
> +    return 0;
> +
> +  int r = 0;
> +  for (ssize_t i = 0; i < s; i++)
> +    {
> +      if (str[i] == '\n')
> +	break;
> +      r *= 10;
> +      r += str[i] - '0';
> +    }
> +  return r;
> +}
> +
> +enum malloc_thp_mode_t
> +__malloc_thp_mode (void)
> +{
> +  int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
> +			      O_RDONLY);
> +  if (fd == -1)
> +    return malloc_thp_mode_not_supported;
> +
> +  static const char mode_always[]  = "[always] madvise never\n";
> +  static const char mode_madvise[] = "always [madvise] never\n";
> +  static const char mode_never[]   = "always madvise [never]\n";
> +
> +  char str[sizeof(mode_always)];
> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
> +  __close_nocancel (fd);
> +
> +  if (s == sizeof (mode_always) - 1)
> +    {
> +      if (strcmp (str, mode_always) == 0)
> +	return malloc_thp_mode_always;
> +      else if (strcmp (str, mode_madvise) == 0)
> +	return malloc_thp_mode_madvise;
> +      else if (strcmp (str, mode_never) == 0)
> +	return malloc_thp_mode_never;
> +    }
> +  return malloc_thp_mode_not_supported;
> +}
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 3/4] malloc: Move mmap logic to its own function
  2021-08-18 14:19 ` [PATCH v2 3/4] malloc: Move mmap logic to its own function Adhemerval Zanella via Libc-alpha
@ 2021-08-19  0:47   ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 0 replies; 24+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-19  0:47 UTC (permalink / raw
  To: Adhemerval Zanella, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin

On 8/18/21 7:49 PM, Adhemerval Zanella via Libc-alpha wrote:
> So it can be used with different pagesize and flags.
> ---
>   malloc/malloc.c | 155 +++++++++++++++++++++++++-----------------------
>   1 file changed, 82 insertions(+), 73 deletions(-)
> 
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 1a2c798a35..4bfcea286f 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -2414,6 +2414,85 @@ do_check_malloc_state (mstate av)
>      be extended or replaced.
>    */
>   
> +static void *
> +sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
> +{
> +  long int size;
> +
> +  /*
> +    Round up size to nearest page.  For mmapped chunks, the overhead is one
> +    SIZE_SZ unit larger than for normal chunks, because there is no
> +    following chunk whose prev_size field could be used.
> +
> +    See the front_misalign handling below, for glibc there is no need for
> +    further alignments unless we have have high alignment.
> +   */
> +  if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
> +    size = ALIGN_UP (nb + SIZE_SZ, pagesize);
> +  else
> +    size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
> +
> +  /* Don't try if size wraps around 0.  */
> +  if ((unsigned long) (size) <= (unsigned long) (nb))
> +    return MAP_FAILED;
> +
> +  char *mm = (char *) MMAP (0, size,
> +			    mtag_mmap_flags | PROT_READ | PROT_WRITE,
> +			    extra_flags);
> +  if (mm == MAP_FAILED)
> +    return mm;
> +
> +  sysmadvise_thp (mm, size);
> +
> +  /*
> +    The offset to the start of the mmapped region is stored in the prev_size
> +    field of the chunk.  This allows us to adjust returned start address to
> +    meet alignment requirements here and in memalign(), and still be able to
> +    compute proper address argument for later munmap in free() and realloc().
> +   */
> +
> +  INTERNAL_SIZE_T front_misalign; /* unusable bytes at front of new space */
> +
> +  if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
> +    {
> +      /* For glibc, chunk2mem increases the address by CHUNK_HDR_SZ and
> +	 MALLOC_ALIGN_MASK is CHUNK_HDR_SZ-1.  Each mmap'ed area is page
> +	 aligned and therefore definitely MALLOC_ALIGN_MASK-aligned.  */
> +      assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
> +      front_misalign = 0;
> +    }
> +  else
> +    front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
> +
> +  mchunkptr p;                    /* the allocated/returned chunk */
> +
> +  if (front_misalign > 0)
> +    {
> +      ptrdiff_t correction = MALLOC_ALIGNMENT - front_misalign;
> +      p = (mchunkptr) (mm + correction);
> +      set_prev_size (p, correction);
> +      set_head (p, (size - correction) | IS_MMAPPED);
> +    }
> +  else
> +    {
> +      p = (mchunkptr) mm;
> +      set_prev_size (p, 0);
> +      set_head (p, size | IS_MMAPPED);
> +    }
> +
> +  /* update statistics */
> +  int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
> +  atomic_max (&mp_.max_n_mmaps, new);
> +
> +  unsigned long sum;
> +  sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
> +  atomic_max (&mp_.max_mmapped_mem, sum);
> +
> +  check_chunk (av, p);
> +
> +  return chunk2mem (p);
> +}
> +
>   static void *
>   sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>   {
> @@ -2451,81 +2530,11 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>         || ((unsigned long) (nb) >= (unsigned long) (mp_.mmap_threshold)
>   	  && (mp_.n_mmaps < mp_.n_mmaps_max)))
>       {
> -      char *mm;           /* return value from mmap call*/
> -
>       try_mmap:

This is a great opportunity to get rid of this goto.

> -      /*
> -         Round up size to nearest page.  For mmapped chunks, the overhead
> -         is one SIZE_SZ unit larger than for normal chunks, because there
> -         is no following chunk whose prev_size field could be used.
> -
> -         See the front_misalign handling below, for glibc there is no
> -         need for further alignments unless we have have high alignment.
> -       */
> -      if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
> -        size = ALIGN_UP (nb + SIZE_SZ, pagesize);
> -      else
> -        size = ALIGN_UP (nb + SIZE_SZ + MALLOC_ALIGN_MASK, pagesize);
> +      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
> +      if (mm != MAP_FAILED)
> +	return mm;
>         tried_mmap = true;
> -
> -      /* Don't try if size wraps around 0 */
> -      if ((unsigned long) (size) > (unsigned long) (nb))
> -        {
> -          mm = (char *) (MMAP (0, size,
> -			       mtag_mmap_flags | PROT_READ | PROT_WRITE, 0));
> -
> -          if (mm != MAP_FAILED)
> -            {
> -	      sysmadvise_thp (mm, size);
> -
> -              /*
> -                 The offset to the start of the mmapped region is stored
> -                 in the prev_size field of the chunk. This allows us to adjust
> -                 returned start address to meet alignment requirements here
> -                 and in memalign(), and still be able to compute proper
> -                 address argument for later munmap in free() and realloc().
> -               */
> -
> -              if (MALLOC_ALIGNMENT == CHUNK_HDR_SZ)
> -                {
> -                  /* For glibc, chunk2mem increases the address by
> -                     CHUNK_HDR_SZ and MALLOC_ALIGN_MASK is
> -                     CHUNK_HDR_SZ-1.  Each mmap'ed area is page
> -                     aligned and therefore definitely
> -                     MALLOC_ALIGN_MASK-aligned.  */
> -                  assert (((INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK) == 0);
> -                  front_misalign = 0;
> -                }
> -              else
> -                front_misalign = (INTERNAL_SIZE_T) chunk2mem (mm) & MALLOC_ALIGN_MASK;
> -              if (front_misalign > 0)
> -                {
> -                  correction = MALLOC_ALIGNMENT - front_misalign;
> -                  p = (mchunkptr) (mm + correction);
> -		  set_prev_size (p, correction);
> -                  set_head (p, (size - correction) | IS_MMAPPED);
> -                }
> -              else
> -                {
> -                  p = (mchunkptr) mm;
> -		  set_prev_size (p, 0);
> -                  set_head (p, size | IS_MMAPPED);
> -                }
> -
> -              /* update statistics */
> -
> -              int new = atomic_exchange_and_add (&mp_.n_mmaps, 1) + 1;
> -              atomic_max (&mp_.max_n_mmaps, new);
> -
> -              unsigned long sum;
> -              sum = atomic_exchange_and_add (&mp_.mmapped_mem, size) + size;
> -              atomic_max (&mp_.max_mmapped_mem, sum);
> -
> -              check_chunk (av, p);
> -
> -              return chunk2mem (p);
> -            }
> -        }
>       }
>   
>     /* There are no usable arenas and mmap also failed.  */
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc
  2021-08-18 14:20 ` [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc Adhemerval Zanella via Libc-alpha
@ 2021-08-19  1:03   ` Siddhesh Poyarekar via Libc-alpha
  2021-08-19 12:08     ` Adhemerval Zanella via Libc-alpha
  2021-08-19 17:58   ` Matheus Castanho via Libc-alpha
  1 sibling, 1 reply; 24+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-19  1:03 UTC (permalink / raw
  To: Adhemerval Zanella, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin

On 8/18/21 7:50 PM, Adhemerval Zanella via Libc-alpha wrote:
> A new tunable, 'glibc.malloc.mmap_hugetlb', adds support to use Huge Page
> support directly with mmap() calls.  The required supported sizes and
> flags for mmap() are provided by an arch-specific internal hook
> malloc_hp_config().
> 
> Currently it first try mmap() using the huge page size and fallback to
> default page size and sbrk() call if kernel returns MMAP_FAILED.
> 
> The default malloc_hp_config() implementation does not enable it even
> if the tunable is set.
> 
> Checked on x86_64-linux-gnu.
> ---
>   NEWS                                       |   4 +
>   elf/dl-tunables.list                       |   4 +
>   elf/tst-rtld-list-tunables.exp             |   1 +
>   malloc/arena.c                             |   2 +
>   malloc/malloc.c                            |  35 +++++-
>   manual/tunables.texi                       |  14 +++
>   sysdeps/generic/malloc-hugepages.c         |   6 +
>   sysdeps/generic/malloc-hugepages.h         |  12 ++
>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 125 +++++++++++++++++++++
>   9 files changed, 200 insertions(+), 3 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 9b2345d08c..412bf3e6f8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,10 @@ Major new features:
>     It might improve performance with Transparent Huge Pages madvise mode
>     depending of the workload.
>   
> +* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to
> +  instruct malloc to try use Huge Pages when allocate memory with mmap()
> +  calls (through the use of MAP_HUGETLB).
> +
>   Deprecated and removed features, and other changes affecting compatibility:
>   
>     [Add deprecations, removals and changes affecting compatibility here]
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 67df6dbc2c..209c2d8592 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -97,6 +97,10 @@ glibc {
>         minval: 0
>         maxval: 1
>       }
> +    mmap_hugetlb {
> +      type: SIZE_T
> +      minval: 0
> +    }
>     }
>     cpu {
>       hwcap_mask {
> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
> index d8109fa31c..49f033ce91 100644
> --- a/elf/tst-rtld-list-tunables.exp
> +++ b/elf/tst-rtld-list-tunables.exp
> @@ -1,6 +1,7 @@
>   glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
>   glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
>   glibc.malloc.check: 0 (min: 0, max: 3)
> +glibc.malloc.mmap_hugetlb: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
>   glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>   glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 81bff54303..4efb5581c1 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -232,6 +232,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>   #endif
>   TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
>   TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
> +TUNABLE_CALLBACK_FNDECL (set_mmap_hugetlb, size_t)
>   #else
>   /* Initialization routine. */
>   #include <string.h>
> @@ -333,6 +334,7 @@ ptmalloc_init (void)
>   # endif
>     TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
>     TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
> +  TUNABLE_GET (mmap_hugetlb, size_t, TUNABLE_CALLBACK (set_mmap_hugetlb));
>   #else
>     if (__glibc_likely (_environ != NULL))
>       {
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 4bfcea286f..8cf2d6855e 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1884,6 +1884,10 @@ struct malloc_par
>   #if HAVE_TUNABLES
>     /* Transparent Large Page support.  */
>     INTERNAL_SIZE_T thp_pagesize;
> +  /* A value different than 0 means to align mmap allocation to hp_pagesize
> +     add hp_flags on flags.  */
> +  INTERNAL_SIZE_T hp_pagesize;
> +  int hp_flags;
>   #endif
>   
>     /* Memory map support */
> @@ -2415,7 +2419,8 @@ do_check_malloc_state (mstate av)
>    */
>   
>   static void *
> -sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
> +sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av,
> +		bool set_thp)
>   {
>     long int size;
>   
> @@ -2442,7 +2447,8 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
>     if (mm == MAP_FAILED)
>       return mm;
>   
> -  sysmadvise_thp (mm, size);
> +  if (set_thp)
> +    sysmadvise_thp (mm, size);

If MAP_HUGEPAGE is set in extra_flags then we don't need madvise; 
there's no need for set_thp.

>   
>     /*
>       The offset to the start of the mmapped region is stored in the prev_size
> @@ -2531,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>   	  && (mp_.n_mmaps < mp_.n_mmaps_max)))
>       {
>       try_mmap:
> -      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
> +      char *mm;
> +#if HAVE_TUNABLES
> +      if (mp_.hp_pagesize > 0)
> +	{
> +	  /* There is no need to isse the THP madvise call if Huge Pages are
> +	     used directly.  */
> +	  mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av, false);
> +	  if (mm != MAP_FAILED)
> +	    return mm;
> +	}
> +#endif
> +      mm = sysmalloc_mmap (nb, pagesize, 0, av, true);

A single tunable ought to allow you to do all this in just sysmalloc_mmap.

>         if (mm != MAP_FAILED)
>   	return mm;
>         tried_mmap = true;
> @@ -5405,6 +5422,18 @@ do_set_thp_madvise (int32_t value)
>       }
>     return 0;
>   }
> +
> +static __always_inline int
> +do_set_mmap_hugetlb (size_t value)
> +{
> +  if (value > 0)
> +    {
> +      struct malloc_hugepage_config_t cfg = __malloc_hugepage_config (value);
> +      mp_.hp_pagesize = cfg.pagesize;
> +      mp_.hp_flags = cfg.flags;

Instead of making a struct to pass it, you could just pass 
&mp.hp_pagesize and &mp.hp_flags.  Also, with a single tunable, you do 
this only when value > 1.  For value == 0, you set the default THP 
pagesize and set flags to 0.

> +    }
> +  return 0;
> +}
>   #endif
>   
>   int
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 93c46807f9..4da6a02778 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -279,6 +279,20 @@ The default value of this tunable is @code{0}, which disable its usage.
>   Setting to a positive value enable the @code{madvise} call.
>   @end deftp
>   
> +@deftp Tunable glibc.malloc.mmap_hugetlb
> +This tunable enable the use of Huge Pages when the system supports it (currently
> +only Linux).  It is done by aligning the memory size and passing the required
> +flags (@code{MAP_HUGETLB} on Linux) when issuing the @code{mmap} to allocate
> +memory from the system.
> +
> +The default value of this tunable is @code{0}, which disable its usage.
> +The special value @code{1} will try to gather the system default huge page size,
> +while a value larger than @code{1} will try to match it with the supported system
> +huge page size.  If either no default huge page size could be obtained or if the
> +requested size does not match the supported ones, the huge pages supports will be
> +disabled.
> +@end deftp
> +
>   @node Dynamic Linking Tunables
>   @section Dynamic Linking Tunables
>   @cindex dynamic linking tunables
> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
> index 262bcdbeb8..e5f5c1ec98 100644
> --- a/sysdeps/generic/malloc-hugepages.c
> +++ b/sysdeps/generic/malloc-hugepages.c
> @@ -29,3 +29,9 @@ __malloc_thp_mode (void)
>   {
>     return malloc_thp_mode_not_supported;
>   }
> +
> +/* Return the default transparent huge page size.  */
> +struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
> +{
> +  return (struct malloc_hugepage_config_t) { 0, 0 };
> +}
> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
> index 664cda9b67..27f7adfea5 100644
> --- a/sysdeps/generic/malloc-hugepages.h
> +++ b/sysdeps/generic/malloc-hugepages.h
> @@ -34,4 +34,16 @@ enum malloc_thp_mode_t
>   
>   enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
>   
> +struct malloc_hugepage_config_t
> +{
> +  size_t pagesize;
> +  int flags;
> +};
> +
> +/* Returned the support huge page size from the requested PAGESIZE along
> +   with the requires extra mmap flags.  Returning a 0 value for pagesize
> +   disables its usage.  */
> +struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
> +     attribute_hidden;
> +
>   #endif /* _MALLOC_HUGEPAGES_H */
> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> index 66589127cd..0eb0c764ad 100644
> --- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> @@ -17,8 +17,10 @@
>      not, see <https://www.gnu.org/licenses/>.  */
>   
>   #include <intprops.h>
> +#include <dirent.h>
>   #include <malloc-hugepages.h>
>   #include <not-cancel.h>
> +#include <sys/mman.h>
>   
>   size_t
>   __malloc_default_thp_pagesize (void)
> @@ -74,3 +76,126 @@ __malloc_thp_mode (void)
>       }
>     return malloc_thp_mode_not_supported;
>   }
> +
> +static size_t
> +malloc_default_hugepage_size (void)
> +{
> +  int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY);
> +  if (fd == -1)
> +    return 0;
> +
> +  char buf[512];
> +  off64_t off = 0;
> +  while (1)
> +    {
> +      ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off);
> +      if (r < 0)
> +	break;
> +      buf[r - 1] = '\0';
> +
> +      const char *s = strstr (buf, "Hugepagesize:");
> +      if (s == NULL)
> +	{
> +	  char *nl = strrchr (buf, '\n');
> +	  if (nl == NULL)
> +	    break;
> +	  off += (nl + 1) - buf;
> +	  continue;
> +	}
> +
> +      /* The default huge page size is in the form:
> +	 Hugepagesize:       NUMBER kB  */
> +      size_t hpsize = 0;
> +      s += sizeof ("Hugepagesize: ") - 1;
> +      for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++)
> +	{
> +	  if (s[i] == ' ')
> +	    continue;
> +	  hpsize *= 10;
> +	  hpsize += s[i] - '0';
> +	}
> +      return hpsize * 1024;
> +    }
> +
> +  __close_nocancel (fd);
> +
> +  return 0;
> +}
> +
> +static inline struct malloc_hugepage_config_t
> +make_malloc_hugepage_config (size_t pagesize)
> +{
> +  int flags = MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
> +  return (struct malloc_hugepage_config_t) { pagesize, flags };
> +}
> +
> +struct malloc_hugepage_config_t
> +__malloc_hugepage_config (size_t requested)
> +{
> +  if (requested == 1)
> +    {
> +      size_t pagesize = malloc_default_hugepage_size ();
> +      if (pagesize != 0)
> +	return make_malloc_hugepage_config (pagesize);
> +    }
> +
> +  int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages",
> +				 O_RDONLY | O_DIRECTORY, 0);
> +  if (dirfd == -1)
> +    return (struct malloc_hugepage_config_t) { 0, 0 };
> +
> +  bool found = false;
> +
> +  char buffer[1024];
> +  while (true)
> +    {
> +#if !IS_IN(libc)
> +# define __getdents64 getdents64
> +#endif
> +      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
> +      if (ret == -1)
> +	break;
> +      else if (ret == 0)
> +        break;
> +
> +      char *begin = buffer, *end = buffer + ret;
> +      while (begin != end)
> +        {
> +          unsigned short int d_reclen;
> +          memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
> +                  sizeof (d_reclen));
> +          const char *dname = begin + offsetof (struct dirent64, d_name);
> +          begin += d_reclen;
> +
> +          if (dname[0] == '.'
> +	      || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0)
> +            continue;
> +
> +	  /* Each entry represents a supported huge page in the form of:
> +	     hugepages-<size>kB.  */
> +	  size_t hpsize = 0;
> +	  const char *sizestr = dname + sizeof ("hugepages-") - 1;
> +	  for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++)
> +	    {
> +	      hpsize *= 10;
> +	      hpsize += sizestr[i] - '0';
> +	    }
> +	  hpsize *= 1024;
> +
> +	  if (hpsize == requested)
> +	    {
> +	      found = true;
> +	      break;
> +	    }
> +        }
> +      if (found)
> +	break;
> +    }
> +
> +  __close_nocancel (dirfd);
> +
> +  if (found)
> +    return make_malloc_hugepage_config (requested);
> +
> +  return (struct malloc_hugepage_config_t) { 0, 0 };
> +}
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-18 18:11 ` [PATCH v2 0/4] malloc: Improve Huge Page support Siddhesh Poyarekar via Libc-alpha
@ 2021-08-19 11:26   ` Adhemerval Zanella via Libc-alpha
  2021-08-19 11:48     ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 11:26 UTC (permalink / raw
  To: Siddhesh Poyarekar, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin



On 18/08/2021 15:11, Siddhesh Poyarekar wrote:
> On 8/18/21 7:49 PM, Adhemerval Zanella wrote:
>> Linux currently supports two ways to use Huge Pages: either by using
>> specific flags directly with the syscall (MAP_HUGETLB for mmap(), or
>> SHM_HUGETLB for shmget()), or by using Transparent Huge Pages (THP)
>> where the kernel will try to move allocated anonymous pages to Huge
>> Pages blocks transparent to application.
>>
>> Also, THP current support three different modes [1]: 'never', 'madvise',
>> and 'always'.  The 'never' is self-explanatory and 'always' will enable
>> THP for all anonymous memory.  However, 'madvise' is still the default
>> for some systems and for such cases THP will be only used if the memory
>> range is explicity advertise by the program through a
>> madvise(MADV_HUGEPAGE) call.
>>
>> This patchset adds a two new tunables to improve malloc() support with
>> Huge Page:
> 
> I wonder if this could be done with just the one tunable, glibc.malloc.hugepages where:
> 
> 0: Disabled (default)
> 1: Transparent, where we emulate "always" behaviour of THP
> 2: HugeTLB enabled with default hugepage size
> <size>: HugeTLB enabled with the specified page size

I though about it, and decided to use two tunables because although
for mmap() system allocation both tunable are mutually exclusive
(since it does not make sense to madvise() a mmap(MAP_HUGETLB)
we still use sbrk() on main arena. The way I did for sbrk() is to align 
to the THP page size advertisen by the kernel, so using the tunable
does change the behavior slightly (it is not 'transparent' as the
madvise call).

So to use only one tunable would require to either drop the sbrk()
madvise when MAP_HUGETLB is used, move it to another tunable (say
'3: HugeTLB enabled with default hugepage size and madvise() on sbrk()),
or assume it when huge pages should be used. 

(and how do we handle sbrk() with explicit size?)

If one tunable is preferable I think it would be something like:

0: Disabled (default)
1: Transparent, where we emulate "always" behaviour of THP
   sbrk() is also aligned to huge page size and issued madvise()
2: HugeTLB enabled with default hugepage size and sbrk() as
   handled are 1
> <size>: HugeTLB enabled with the specified page size and sbrk()
   are handled as 1

By forcing the sbrk() and madvise() on all tunables value make
the expectation to use huge pages in all possible occasions.


> 
> When using HugeTLB, we don't really need to bother with THP so they seem mutually exclusive.
> 
>>
>>    - glibc.malloc.thp_madvise: instruct the system allocator to issue
>>      a madvise(MADV_HUGEPAGE) call after a mmap() one for sizes larger
>>      than the default huge page size.  The default behavior is to
>>      disable it and if the system does not support THP the tunable also
>>      does not enable the madvise() call.
>>
>>    - glibc.malloc.mmap_hugetlb: instruct the system allocator to round
>>      allocation to huge page sizes along with the required flags
>>      (MAP_HUGETLB for Linux).  If the memory allocation fails, the
>>      default system page size is used instead.  The default behavior is
>>      to disable and a value of 1 uses the default system huge page size.
>>      A positive value larger than 1 means to use a specific huge page
>>      size, which is matched against the supported ones by the system.
>>
>> The 'thp_madvise' tunable also changes the sbrk() usage by malloc
>> on main arenas, where the increment is now aligned to the huge page
>> size, instead of default page size.
>>
>> The 'mmap_hugetlb' aims to replace the 'morecore' removed callback
>> from 2.34 for libhugetlbfs (where the library tries to leverage the
>> huge pages usage instead to provide a system allocator).  By
>> implementing the support directly on the mmap() code patch there is
>> no need to try emulate the morecore()/sbrk() semantic which simplifies
>> the code and make memory shrink logic more straighforward.
>>
>> The performance improvements are really dependent of the workload
>> and the platform, however a simple testcase might show the possible
>> improvements:
> 
> A simple test like below in benchtests would be very useful to at least get an initial understanding of the behaviour differences with different tunable values.  Later those who care can add more relevant workloads.

Yeah, I am open to suggestions on how to properly test it.  The issue
is we need to have specific system configuration either by proper
kernel support (THP) or with reserved large pages to actually test
it.

For THP the issue is really 'transparent' for user, which means that
we will need to poke on specific Linux sysfs information to check if
huge pages are being used. And we might not get the expected answer
depending of the system load and memory utilization (the advised
pages might not be moved to large pages if there is no sufficient
memory).

> 
>>
>> $ cat hugepages.cc
>> #include <unordered_map>
>>
>> int
>> main (int argc, char *argv[])
>> {
>>    std::size_t iters = 10000000;
>>    std::unordered_map <std::size_t, std::size_t> ht;
>>    ht.reserve (iters);
>>    for (std::size_t i = 0; i < iters; ++i)
>>      ht.try_emplace (i, i);
>>
>>    return 0;
>> }
>> $ g++ -std=c++17 -O2 hugepages.cc -o hugepages
>>
>> On a x86_64 (Ryzen 9 5900X):
>>
>>   Performance counter stats for 'env
>> GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages':
>>
>>              98,874      faults
>>             717,059      dTLB-loads
>>             411,701      dTLB-load-misses          #   57.42% of all dTLB
>> cache accesses
>>           3,754,927      cache-misses              #    8.479 % of all
>> cache refs
>>          44,287,580      cache-references
>>
>>         0.315278378 seconds time elapsed
>>
>>         0.238635000 seconds user
>>         0.076714000 seconds sys
>>
>>   Performance counter stats for 'env
>> GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages':
>>
>>               1,871      faults
>>             120,035      dTLB-loads
>>              19,882      dTLB-load-misses          #   16.56% of all dTLB
>> cache accesses
>>           4,182,942      cache-misses              #    7.452 % of all
>> cache refs
>>          56,128,995      cache-references
>>
>>         0.262620733 seconds time elapsed
>>
>>         0.222233000 seconds user
>>         0.040333000 seconds sys
>>
>>
>> On an AArch64 (cortex A72):
>>
>>   Performance counter stats for 'env
>> GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages':
>>
>>               98835      faults
>>          2007234756      dTLB-loads
>>             4613669      dTLB-load-misses          #    0.23% of all dTLB
>> cache accesses
>>             8831801      cache-misses              #    0.504 % of all
>> cache refs
>>          1751391405      cache-references
>>
>>         0.616782575 seconds time elapsed
>>
>>         0.460946000 seconds user
>>         0.154309000 seconds sys
>>
>>   Performance counter stats for 'env
>> GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages':
>>
>>                 955      faults
>>          1787401880      dTLB-loads
>>              224034      dTLB-load-misses          #    0.01% of all dTLB
>> cache accesses
>>             5480917      cache-misses              #    0.337 % of all
>> cache refs
>>          1625937858      cache-references
>>
>>         0.487773443 seconds time elapsed
>>
>>         0.440894000 seconds user
>>         0.046465000 seconds sys
>>
>>
>> And on a powerpc64 (POWER8):
>>
>>   Performance counter stats for 'env
>> GLIBC_TUNABLES=glibc.malloc.thp_madvise=0 ./testrun.sh ./hugepages
>> ':
>>
>>                5453      faults
>>                9940      dTLB-load-misses
>>             1338152      cache-misses              #    0.101 % of all
>> cache refs
>>          1326037487      cache-references
>>
>>         1.056355887 seconds time elapsed
>>
>>         1.014633000 seconds user
>>         0.041805000 seconds sys
>>
>>   Performance counter stats for 'env
>> GLIBC_TUNABLES=glibc.malloc.thp_madvise=1 ./testrun.sh ./hugepages
>> ':
>>
>>                1016      faults
>>                1746      dTLB-load-misses
>>              399052      cache-misses              #    0.030 % of all
>> cache refs
>>          1316059877      cache-references
>>
>>         1.057810501 seconds time elapsed
>>
>>         1.012175000 seconds user
>>         0.045624000 seconds sys
>>
>> It is worth to note that the powerpc64 machine has 'always' set
>> on '/sys/kernel/mm/transparent_hugepage/enabled'.
>>
>> Norbert Manthey's paper has more information with a more thoroughly
>> performance analysis.
>>
>> For testing run make check on x86_64-linux-gnu with thp_pagesize=1
>> (directly on ptmalloc_init() after tunable initialiazation) and
>> with mmap_hugetlb=1 (also directly on ptmalloc_init()) with about
>> 10 large pages (so the fallback mmap() call is used) and with
>> 1024 large pages (so all mmap(MAP_HUGETLB) are successful).
> 
> You could add tests similar to mcheck and malloc-check, i.e. add $(tests-hugepages) to run all malloc tests again with the various tunable values.  See tests-mcheck for example.

Ok, I can work with this.  This might not add much if the system is
not configured with either THP or with some huge page pool but at
least adds some coverage.

> 
>> -- 
>>
>> Changes from previous version:
>>
>>    - Renamed thp_pagesize to thp_madvise and make it a boolean state.
>>    - Added MAP_HUGETLB support for mmap().
>>    - Remove system specific hooks for THP huge page size in favor of
>>      Linux generic implementation.
>>    - Initial program segments need to be page aligned for the
>>      first madvise call.
>>
>> Adhemerval Zanella (4):
>>    malloc: Add madvise support for Transparent Huge Pages
>>    malloc: Add THP/madvise support for sbrk
>>    malloc: Move mmap logic to its own function
>>    malloc: Add Huge Page support for sysmalloc
>>
>>   NEWS                                       |   9 +-
>>   elf/dl-tunables.list                       |   9 +
>>   elf/tst-rtld-list-tunables.exp             |   2 +
>>   include/libc-pointer-arith.h               |  10 +
>>   malloc/arena.c                             |   7 +
>>   malloc/malloc-internal.h                   |   1 +
>>   malloc/malloc.c                            | 263 +++++++++++++++------
>>   manual/tunables.texi                       |  23 ++
>>   sysdeps/generic/Makefile                   |   8 +
>>   sysdeps/generic/malloc-hugepages.c         |  37 +++
>>   sysdeps/generic/malloc-hugepages.h         |  49 ++++
>>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 201 ++++++++++++++++
>>   12 files changed, 542 insertions(+), 77 deletions(-)
>>   create mode 100644 sysdeps/generic/malloc-hugepages.c
>>   create mode 100644 sysdeps/generic/malloc-hugepages.h
>>   create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c
>>
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-19 11:26   ` Adhemerval Zanella via Libc-alpha
@ 2021-08-19 11:48     ` Siddhesh Poyarekar via Libc-alpha
  2021-08-19 12:04       ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-19 11:48 UTC (permalink / raw
  To: Adhemerval Zanella, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin

On 8/19/21 4:56 PM, Adhemerval Zanella wrote:
> I though about it, and decided to use two tunables because although
> for mmap() system allocation both tunable are mutually exclusive
> (since it does not make sense to madvise() a mmap(MAP_HUGETLB)
> we still use sbrk() on main arena. The way I did for sbrk() is to align
> to the THP page size advertisen by the kernel, so using the tunable
> does change the behavior slightly (it is not 'transparent' as the
> madvise call).
> 
> So to use only one tunable would require to either drop the sbrk()
> madvise when MAP_HUGETLB is used, move it to another tunable (say
> '3: HugeTLB enabled with default hugepage size and madvise() on sbrk()),
> or assume it when huge pages should be used.
> 
> (and how do we handle sbrk() with explicit size?)
 >
> If one tunable is preferable I think it would be something like:
> 
> 0: Disabled (default)
> 1: Transparent, where we emulate "always" behaviour of THP
>     sbrk() is also aligned to huge page size and issued madvise()
> 2: HugeTLB enabled with default hugepage size and sbrk() as
>     handled are 1
>> <size>: HugeTLB enabled with the specified page size and sbrk()
>     are handled as 1
> 
> By forcing the sbrk() and madvise() on all tunables value make
> the expectation to use huge pages in all possible occasions.

What do you think about using mmap instead of sbrk for (2) and <size> if 
hugetlb is requested?  It kinda emulates what libhugetlbfs does and 
makes the behaviour more consistent with what is advertised by the tunables.

>> A simple test like below in benchtests would be very useful to at least get an initial understanding of the behaviour differences with different tunable values.  Later those who care can add more relevant workloads.
> 
> Yeah, I am open to suggestions on how to properly test it.  The issue
> is we need to have specific system configuration either by proper
> kernel support (THP) or with reserved large pages to actually test
> it.
> 
> For THP the issue is really 'transparent' for user, which means that
> we will need to poke on specific Linux sysfs information to check if
> huge pages are being used. And we might not get the expected answer
> depending of the system load and memory utilization (the advised
> pages might not be moved to large pages if there is no sufficient
> memory).

For benchmarking we can make a minimal assumption that the user will set 
the system up to appropriately isolate the benchmarks.  As for the sysfs 
setup, we can always test and bail if unsupported.

>> You could add tests similar to mcheck and malloc-check, i.e. add $(tests-hugepages) to run all malloc tests again with the various tunable values.  See tests-mcheck for example.
> 
> Ok, I can work with this.  This might not add much if the system is
> not configured with either THP or with some huge page pool but at
> least adds some coverage.

Yeah the main intent is to simply ensure that there are no differences 
in behaviour with hugepages.

Siddhesh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages
  2021-08-18 18:42   ` Siddhesh Poyarekar via Libc-alpha
@ 2021-08-19 12:00     ` Adhemerval Zanella via Libc-alpha
  2021-08-19 12:22       ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 12:00 UTC (permalink / raw
  To: Siddhesh Poyarekar, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin



On 18/08/2021 15:42, Siddhesh Poyarekar wrote:
> On 8/18/21 7:49 PM, Adhemerval Zanella wrote:
>> Linux Transparent Huge Pages (THP) current support three different
>> states: 'never', 'madvise', and 'always'.  The 'never' is
>> self-explanatory and 'always' will enable THP for all anonymous
>> memory.  However, 'madvise' is still the default for some system and
>> for such case THP will be only used if the memory range is explicity
>> advertise by the program through a madvise(MADV_HUGEPAGE) call.
>>
>> To enable it a new tunable is provided, 'glibc.malloc.thp_madvise',
>> where setting to a value diffent than 0 enables the madvise call.
>> Linux current only support one page size for THP, even if the
>> architecture supports multiple sizes.
>>
>> This patch issues the madvise(MADV_HUGEPAGE) call after a successful
>> mmap() call at sysmalloc() with sizes larger than the default huge
>> page size.  The madvise() call is disable is system does not support
>> THP or if it has the mode set to "never".
>>
>> Checked on x86_64-linux-gnu.
>> ---
>>   NEWS                                       |  5 +-
>>   elf/dl-tunables.list                       |  5 ++
>>   elf/tst-rtld-list-tunables.exp             |  1 +
>>   malloc/arena.c                             |  5 ++
>>   malloc/malloc-internal.h                   |  1 +
>>   malloc/malloc.c                            | 48 ++++++++++++++
>>   manual/tunables.texi                       |  9 +++
>>   sysdeps/generic/Makefile                   |  8 +++
>>   sysdeps/generic/malloc-hugepages.c         | 31 +++++++++
>>   sysdeps/generic/malloc-hugepages.h         | 37 +++++++++++
>>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 76 ++++++++++++++++++++++
>>   11 files changed, 225 insertions(+), 1 deletion(-)
>>   create mode 100644 sysdeps/generic/malloc-hugepages.c
>>   create mode 100644 sysdeps/generic/malloc-hugepages.h
>>   create mode 100644 sysdeps/unix/sysv/linux/malloc-hugepages.c
>>
>> diff --git a/NEWS b/NEWS
>> index 79c895e382..9b2345d08c 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -9,7 +9,10 @@ Version 2.35
>>     Major new features:
>>   -  [Add new features here]
>> +* On Linux, a new tunable, glibc.malloc.thp_madvise, can be used to
>> +  make malloc issue madvise plus MADV_HUGEPAGE on mmap and sbrk calls.
>> +  It might improve performance with Transparent Huge Pages madvise mode
>> +  depending of the workload.
>>     Deprecated and removed features, and other changes affecting compatibility:
>>   diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
>> index 8ddd4a2314..67df6dbc2c 100644
>> --- a/elf/dl-tunables.list
>> +++ b/elf/dl-tunables.list
>> @@ -92,6 +92,11 @@ glibc {
>>         minval: 0
>>         security_level: SXID_IGNORE
>>       }
>> +    thp_madvise {
>> +      type: INT_32
>> +      minval: 0
>> +      maxval: 1
>> +    }
>>     }
>>     cpu {
>>       hwcap_mask {
>> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
>> index 9f66c52885..d8109fa31c 100644
>> --- a/elf/tst-rtld-list-tunables.exp
>> +++ b/elf/tst-rtld-list-tunables.exp
>> @@ -8,6 +8,7 @@ glibc.malloc.perturb: 0 (min: 0, max: 255)
>>   glibc.malloc.tcache_count: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.tcache_max: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.tcache_unsorted_limit: 0x0 (min: 0x0, max: 0x[f]+)
>> +glibc.malloc.thp_madvise: 0 (min: 0, max: 1)
>>   glibc.malloc.top_pad: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.rtld.nns: 0x4 (min: 0x1, max: 0x10)
>> diff --git a/malloc/arena.c b/malloc/arena.c
>> index 667484630e..81bff54303 100644
>> --- a/malloc/arena.c
>> +++ b/malloc/arena.c
>> @@ -231,6 +231,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
>>   TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>>   #endif
>>   TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
>> +TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
>>   #else
>>   /* Initialization routine. */
>>   #include <string.h>
>> @@ -331,6 +332,7 @@ ptmalloc_init (void)
>>              TUNABLE_CALLBACK (set_tcache_unsorted_limit));
>>   # endif
>>     TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
>> +  TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
>>   #else
>>     if (__glibc_likely (_environ != NULL))
>>       {
>> @@ -509,6 +511,9 @@ new_heap (size_t size, size_t top_pad)
>>         __munmap (p2, HEAP_MAX_SIZE);
>>         return 0;
>>       }
>> +
>> +  sysmadvise_thp (p2, size);
>> +
>>     h = (heap_info *) p2;
>>     h->size = size;
>>     h->mprotect_size = size;
>> diff --git a/malloc/malloc-internal.h b/malloc/malloc-internal.h
>> index 0c7b5a183c..7493e34d86 100644
>> --- a/malloc/malloc-internal.h
>> +++ b/malloc/malloc-internal.h
>> @@ -22,6 +22,7 @@
>>   #include <malloc-machine.h>
>>   #include <malloc-sysdep.h>
>>   #include <malloc-size.h>
>> +#include <malloc-hugepages.h>
>>     /* Called in the parent process before a fork.  */
>>   void __malloc_fork_lock_parent (void) attribute_hidden;
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index e065785af7..ad3eec41ac 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -1881,6 +1881,11 @@ struct malloc_par
>>     INTERNAL_SIZE_T arena_test;
>>     INTERNAL_SIZE_T arena_max;
>>   +#if HAVE_TUNABLES
>> +  /* Transparent Large Page support.  */
>> +  INTERNAL_SIZE_T thp_pagesize;
>> +#endif
>> +
>>     /* Memory map support */
>>     int n_mmaps;
>>     int n_mmaps_max;
>> @@ -2009,6 +2014,20 @@ free_perturb (char *p, size_t n)
>>     #include <stap-probe.h>
>>   +/* ----------- Routines dealing with transparent huge pages ----------- */
>> +
>> +static inline void
>> +sysmadvise_thp (void *p, INTERNAL_SIZE_T size)
>> +{
>> +#if HAVE_TUNABLES && defined (MADV_HUGEPAGE)
>> +  /* Do not consider areas smaller than a huge page or if the tunable is
>> +     not active.  */
> 
> You also shouldn't bother setting it if /sys/kernel/mm/transparent_hugepage/enabled is set to "enabled" since it's redundant.

I think you means 'always' and it should be handled by
__malloc_thp_mode() (which would about 'tph_pagesize'
to have a value different than 0). 

I also did not considered 'always' because I saw some results 
on powerpc where even with 'always' mode issuing the madvise did 
improve. I am not sure why exactly, it might be the case for 
the program header that is no sufficient aligned (and with 
tunable it would be).

But maybe for 'always' it would be better to disable the
madvise() as well.

> 
>> +  if (mp_.thp_pagesize == 0 || size < mp_.thp_pagesize)
>> +    return;
>> +  __madvise (p, size, MADV_HUGEPAGE);
>> +#endif
>> +}
>> +
>>   /* ------------------- Support for multiple arenas -------------------- */
>>   #include "arena.c"
>>   @@ -2446,6 +2465,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>               if (mm != MAP_FAILED)
>>               {
>> +          sysmadvise_thp (mm, size);
>> +
>>                 /*
>>                    The offset to the start of the mmapped region is stored
>>                    in the prev_size field of the chunk. This allows us to adjust
>> @@ -2607,6 +2628,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>         if (size > 0)
>>           {
>>             brk = (char *) (MORECORE (size));
>> +      if (brk != (char *) (MORECORE_FAILURE))
>> +        sysmadvise_thp (brk, size);
>>             LIBC_PROBE (memory_sbrk_more, 2, brk, size);
>>           }
>>   @@ -2638,6 +2661,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>                   if (mbrk != MAP_FAILED)
>>                   {
>> +          sysmadvise_thp (mbrk, size);
>> +
>>                     /* We do not need, and cannot use, another sbrk call to find end */
>>                     brk = mbrk;
>>                     snd_brk = brk + size;
>> @@ -2749,6 +2774,8 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>                         correction = 0;
>>                         snd_brk = (char *) (MORECORE (0));
>>                       }
>> +          else
>> +            sysmadvise_thp (snd_brk, correction);
>>                   }
>>                   /* handle non-contiguous cases */
>> @@ -2989,6 +3016,8 @@ mremap_chunk (mchunkptr p, size_t new_size)
>>     if (cp == MAP_FAILED)
>>       return 0;
>>   +  sysmadvise_thp (cp, new_size);
>> +
>>     p = (mchunkptr) (cp + offset);
>>       assert (aligned_OK (chunk2mem (p)));
>> @@ -5325,6 +5354,25 @@ do_set_mxfast (size_t value)
>>     return 0;
>>   }
>>   +#if HAVE_TUNABLES
>> +static __always_inline int
>> +do_set_thp_madvise (int32_t value)
>> +{
>> +  if (value > 0)
>> +    {
>> +      enum malloc_thp_mode_t thp_mode = __malloc_thp_mode ();
>> +      /*
>> +     Only enables THP usage is system does support it and has at least
>> +     always or madvise mode.  Otherwise the madvise() call is wasteful.
>> +       */
>> +      if (thp_mode != malloc_thp_mode_not_supported
>> +      && thp_mode != malloc_thp_mode_never)
>> +    mp_.thp_pagesize = __malloc_default_thp_pagesize ();
>> +    }
>> +  return 0;
>> +}
>> +#endif
>> +
>>   int
>>   __libc_mallopt (int param_number, int value)
>>   {
>> diff --git a/manual/tunables.texi b/manual/tunables.texi
>> index 658547c613..93c46807f9 100644
>> --- a/manual/tunables.texi
>> +++ b/manual/tunables.texi
>> @@ -270,6 +270,15 @@ pointer, so add 4 on 32-bit systems or 8 on 64-bit systems to the size
>>   passed to @code{malloc} for the largest bin size to enable.
>>   @end deftp
>>   +@deftp Tunable glibc.malloc.thp_madivse
>> +This tunable enable the use of @code{madvise} with @code{MADV_HUGEPAGE} after
>> +the system allocator allocated memory through @code{mmap} if the system supports
>> +Transparent Huge Page (currently only Linux).
>> +
>> +The default value of this tunable is @code{0}, which disable its usage.
>> +Setting to a positive value enable the @code{madvise} call.
>> +@end deftp
>> +
>>   @node Dynamic Linking Tunables
>>   @section Dynamic Linking Tunables
>>   @cindex dynamic linking tunables
>> diff --git a/sysdeps/generic/Makefile b/sysdeps/generic/Makefile
>> index a209e85cc4..8eef83c94d 100644
>> --- a/sysdeps/generic/Makefile
>> +++ b/sysdeps/generic/Makefile
>> @@ -27,3 +27,11 @@ sysdep_routines += framestate unwind-pe
>>   shared-only-routines += framestate unwind-pe
>>   endif
>>   endif
>> +
>> +ifeq ($(subdir),malloc)
>> +sysdep_malloc_debug_routines += malloc-hugepages
>> +endif
>> +
>> +ifeq ($(subdir),misc)
>> +sysdep_routines += malloc-hugepages
>> +endif
>> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
>> new file mode 100644
>> index 0000000000..262bcdbeb8
>> --- /dev/null
>> +++ b/sysdeps/generic/malloc-hugepages.c
>> @@ -0,0 +1,31 @@
>> +/* Huge Page support.  Generic implementation.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public License as
>> +   published by the Free Software Foundation; either version 2.1 of the
>> +   License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; see the file COPYING.LIB.  If
>> +   not, see <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <malloc-hugepages.h>
>> +
>> +size_t
>> +__malloc_default_thp_pagesize (void)
>> +{
>> +  return 0;
>> +}
>> +
>> +enum malloc_thp_mode_t
>> +__malloc_thp_mode (void)
>> +{
>> +  return malloc_thp_mode_not_supported;
>> +}
>> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
>> new file mode 100644
>> index 0000000000..664cda9b67
>> --- /dev/null
>> +++ b/sysdeps/generic/malloc-hugepages.h
>> @@ -0,0 +1,37 @@
>> +/* Malloc huge page support.  Generic implementation.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public License as
>> +   published by the Free Software Foundation; either version 2.1 of the
>> +   License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; see the file COPYING.LIB.  If
>> +   not, see <https://www.gnu.org/licenses/>.  */
>> +
>> +#ifndef _MALLOC_HUGEPAGES_H
>> +#define _MALLOC_HUGEPAGES_H
>> +
>> +#include <stddef.h>
>> +
>> +/* Return the default transparent huge page size.  */
>> +size_t __malloc_default_thp_pagesize (void) attribute_hidden;
>> +
>> +enum malloc_thp_mode_t
>> +{
>> +  malloc_thp_mode_always,
>> +  malloc_thp_mode_madvise,
>> +  malloc_thp_mode_never,
>> +  malloc_thp_mode_not_supported
>> +};
>> +
>> +enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
>> +
>> +#endif /* _MALLOC_HUGEPAGES_H */
>> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
>> new file mode 100644
>> index 0000000000..66589127cd
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
>> @@ -0,0 +1,76 @@
>> +/* Huge Page support.  Linux implementation.
>> +   Copyright (C) 2021 Free Software Foundation, Inc.
>> +   This file is part of the GNU C Library.
>> +
>> +   The GNU C Library is free software; you can redistribute it and/or
>> +   modify it under the terms of the GNU Lesser General Public License as
>> +   published by the Free Software Foundation; either version 2.1 of the
>> +   License, or (at your option) any later version.
>> +
>> +   The GNU C Library is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> +   Lesser General Public License for more details.
>> +
>> +   You should have received a copy of the GNU Lesser General Public
>> +   License along with the GNU C Library; see the file COPYING.LIB.  If
>> +   not, see <https://www.gnu.org/licenses/>.  */
>> +
>> +#include <intprops.h>
>> +#include <malloc-hugepages.h>
>> +#include <not-cancel.h>
>> +
>> +size_t
>> +__malloc_default_thp_pagesize (void)
>> +{
> 
> Likewise for page size; this could be cached too.

I think there is no much sense to cache since it is used once at
malloc initialization.

> 
>> +  int fd = __open64_nocancel (
>> +    "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
>> +  if (fd == -1)
>> +    return 0;
>> +
>> +
>> +  char str[INT_BUFSIZE_BOUND (size_t)];
>> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
>> +  __close_nocancel (fd);
>> +
>> +  if (s < 0)
>> +    return 0;
>> +
>> +  int r = 0;
>> +  for (ssize_t i = 0; i < s; i++)
>> +    {
>> +      if (str[i] == '\n')
>> +    break;
>> +      r *= 10;
>> +      r += str[i] - '0';
>> +    }
>> +  return r;
>> +}
>> +
>> +enum malloc_thp_mode_t
>> +__malloc_thp_mode (void)
>> +{
>> +  int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
>> +                  O_RDONLY);
>> +  if (fd == -1)
>> +    return malloc_thp_mode_not_supported;
>> +
>> +  static const char mode_always[]  = "[always] madvise never\n";
>> +  static const char mode_madvise[] = "always [madvise] never\n";
>> +  static const char mode_never[]   = "always madvise [never]\n";
>> +
>> +  char str[sizeof(mode_always)];
>> +  ssize_t s = __read_nocancel (fd, str, sizeof (str));
>> +  __close_nocancel (fd);
>> +
>> +  if (s == sizeof (mode_always) - 1)
>> +    {
>> +      if (strcmp (str, mode_always) == 0)
>> +    return malloc_thp_mode_always;
>> +      else if (strcmp (str, mode_madvise) == 0)
>> +    return malloc_thp_mode_madvise;
>> +      else if (strcmp (str, mode_never) == 0)
>> +    return malloc_thp_mode_never;
>> +    }
>> +  return malloc_thp_mode_not_supported;
>> +}
>>
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-19 11:48     ` Siddhesh Poyarekar via Libc-alpha
@ 2021-08-19 12:04       ` Adhemerval Zanella via Libc-alpha
  2021-08-19 12:26         ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 12:04 UTC (permalink / raw
  To: Siddhesh Poyarekar, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin



On 19/08/2021 08:48, Siddhesh Poyarekar wrote:
> On 8/19/21 4:56 PM, Adhemerval Zanella wrote:
>> I though about it, and decided to use two tunables because although
>> for mmap() system allocation both tunable are mutually exclusive
>> (since it does not make sense to madvise() a mmap(MAP_HUGETLB)
>> we still use sbrk() on main arena. The way I did for sbrk() is to align
>> to the THP page size advertisen by the kernel, so using the tunable
>> does change the behavior slightly (it is not 'transparent' as the
>> madvise call).
>>
>> So to use only one tunable would require to either drop the sbrk()
>> madvise when MAP_HUGETLB is used, move it to another tunable (say
>> '3: HugeTLB enabled with default hugepage size and madvise() on sbrk()),
>> or assume it when huge pages should be used.
>>
>> (and how do we handle sbrk() with explicit size?)
>>
>> If one tunable is preferable I think it would be something like:
>>
>> 0: Disabled (default)
>> 1: Transparent, where we emulate "always" behaviour of THP
>>     sbrk() is also aligned to huge page size and issued madvise()
>> 2: HugeTLB enabled with default hugepage size and sbrk() as
>>     handled are 1
>>> <size>: HugeTLB enabled with the specified page size and sbrk()
>>     are handled as 1
>>
>> By forcing the sbrk() and madvise() on all tunables value make
>> the expectation to use huge pages in all possible occasions.
> 
> What do you think about using mmap instead of sbrk for (2) and <size> if hugetlb is requested?  It kinda emulates what libhugetlbfs does and makes the behaviour more consistent with what is advertised by the tunables.

I think this would be an additional tunable, we still need to handle
the case where mmap() fails either in default path (due maximum number
of mmap() per process by kernel or when the poll is exhausted for 
MAP_HUGETLB).

So for sbrk() call, should we align the increment to huge page and
issue the madvise() if the tunable is set to use huge pages?

> 
>>> A simple test like below in benchtests would be very useful to at least get an initial understanding of the behaviour differences with different tunable values.  Later those who care can add more relevant workloads.
>>
>> Yeah, I am open to suggestions on how to properly test it.  The issue
>> is we need to have specific system configuration either by proper
>> kernel support (THP) or with reserved large pages to actually test
>> it.
>>
>> For THP the issue is really 'transparent' for user, which means that
>> we will need to poke on specific Linux sysfs information to check if
>> huge pages are being used. And we might not get the expected answer
>> depending of the system load and memory utilization (the advised
>> pages might not be moved to large pages if there is no sufficient
>> memory).
> 
> For benchmarking we can make a minimal assumption that the user will set the system up to appropriately isolate the benchmarks.  As for the sysfs setup, we can always test and bail if unsupported.
> 
>>> You could add tests similar to mcheck and malloc-check, i.e. add $(tests-hugepages) to run all malloc tests again with the various tunable values.  See tests-mcheck for example.
>>
>> Ok, I can work with this.  This might not add much if the system is
>> not configured with either THP or with some huge page pool but at
>> least adds some coverage.
> 
> Yeah the main intent is to simply ensure that there are no differences in behaviour with hugepages.

Alright, I will add some tunable usage then.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc
  2021-08-19  1:03   ` Siddhesh Poyarekar via Libc-alpha
@ 2021-08-19 12:08     ` Adhemerval Zanella via Libc-alpha
  0 siblings, 0 replies; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 12:08 UTC (permalink / raw
  To: Siddhesh Poyarekar, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin



On 18/08/2021 22:03, Siddhesh Poyarekar wrote:
> On 8/18/21 7:50 PM, Adhemerval Zanella via Libc-alpha wrote:
>> A new tunable, 'glibc.malloc.mmap_hugetlb', adds support to use Huge Page
>> support directly with mmap() calls.  The required supported sizes and
>> flags for mmap() are provided by an arch-specific internal hook
>> malloc_hp_config().
>>
>> Currently it first try mmap() using the huge page size and fallback to
>> default page size and sbrk() call if kernel returns MMAP_FAILED.
>>
>> The default malloc_hp_config() implementation does not enable it even
>> if the tunable is set.
>>
>> Checked on x86_64-linux-gnu.
>> ---
>>   NEWS                                       |   4 +
>>   elf/dl-tunables.list                       |   4 +
>>   elf/tst-rtld-list-tunables.exp             |   1 +
>>   malloc/arena.c                             |   2 +
>>   malloc/malloc.c                            |  35 +++++-
>>   manual/tunables.texi                       |  14 +++
>>   sysdeps/generic/malloc-hugepages.c         |   6 +
>>   sysdeps/generic/malloc-hugepages.h         |  12 ++
>>   sysdeps/unix/sysv/linux/malloc-hugepages.c | 125 +++++++++++++++++++++
>>   9 files changed, 200 insertions(+), 3 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 9b2345d08c..412bf3e6f8 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -14,6 +14,10 @@ Major new features:
>>     It might improve performance with Transparent Huge Pages madvise mode
>>     depending of the workload.
>>   +* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to
>> +  instruct malloc to try use Huge Pages when allocate memory with mmap()
>> +  calls (through the use of MAP_HUGETLB).
>> +
>>   Deprecated and removed features, and other changes affecting compatibility:
>>       [Add deprecations, removals and changes affecting compatibility here]
>> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
>> index 67df6dbc2c..209c2d8592 100644
>> --- a/elf/dl-tunables.list
>> +++ b/elf/dl-tunables.list
>> @@ -97,6 +97,10 @@ glibc {
>>         minval: 0
>>         maxval: 1
>>       }
>> +    mmap_hugetlb {
>> +      type: SIZE_T
>> +      minval: 0
>> +    }
>>     }
>>     cpu {
>>       hwcap_mask {
>> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
>> index d8109fa31c..49f033ce91 100644
>> --- a/elf/tst-rtld-list-tunables.exp
>> +++ b/elf/tst-rtld-list-tunables.exp
>> @@ -1,6 +1,7 @@
>>   glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
>>   glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
>>   glibc.malloc.check: 0 (min: 0, max: 3)
>> +glibc.malloc.mmap_hugetlb: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
>>   glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>>   glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
>> diff --git a/malloc/arena.c b/malloc/arena.c
>> index 81bff54303..4efb5581c1 100644
>> --- a/malloc/arena.c
>> +++ b/malloc/arena.c
>> @@ -232,6 +232,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>>   #endif
>>   TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
>>   TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
>> +TUNABLE_CALLBACK_FNDECL (set_mmap_hugetlb, size_t)
>>   #else
>>   /* Initialization routine. */
>>   #include <string.h>
>> @@ -333,6 +334,7 @@ ptmalloc_init (void)
>>   # endif
>>     TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
>>     TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
>> +  TUNABLE_GET (mmap_hugetlb, size_t, TUNABLE_CALLBACK (set_mmap_hugetlb));
>>   #else
>>     if (__glibc_likely (_environ != NULL))
>>       {
>> diff --git a/malloc/malloc.c b/malloc/malloc.c
>> index 4bfcea286f..8cf2d6855e 100644
>> --- a/malloc/malloc.c
>> +++ b/malloc/malloc.c
>> @@ -1884,6 +1884,10 @@ struct malloc_par
>>   #if HAVE_TUNABLES
>>     /* Transparent Large Page support.  */
>>     INTERNAL_SIZE_T thp_pagesize;
>> +  /* A value different than 0 means to align mmap allocation to hp_pagesize
>> +     add hp_flags on flags.  */
>> +  INTERNAL_SIZE_T hp_pagesize;
>> +  int hp_flags;
>>   #endif
>>       /* Memory map support */
>> @@ -2415,7 +2419,8 @@ do_check_malloc_state (mstate av)
>>    */
>>     static void *
>> -sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
>> +sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av,
>> +        bool set_thp)
>>   {
>>     long int size;
>>   @@ -2442,7 +2447,8 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
>>     if (mm == MAP_FAILED)
>>       return mm;
>>   -  sysmadvise_thp (mm, size);
>> +  if (set_thp)
>> +    sysmadvise_thp (mm, size);
> 
> If MAP_HUGEPAGE is set in extra_flags then we don't need madvise; there's no need for set_thp.

Alright we can use it instead.  I just add the flag to avoid the extra
ifdef MAP_HUGEPAGE.

> 
>>       /*
>>       The offset to the start of the mmapped region is stored in the prev_size
>> @@ -2531,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>>         && (mp_.n_mmaps < mp_.n_mmaps_max)))
>>       {
>>       try_mmap:
>> -      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
>> +      char *mm;
>> +#if HAVE_TUNABLES
>> +      if (mp_.hp_pagesize > 0)
>> +    {
>> +      /* There is no need to isse the THP madvise call if Huge Pages are
>> +         used directly.  */
>> +      mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av, false);
>> +      if (mm != MAP_FAILED)
>> +        return mm;
>> +    }
>> +#endif
>> +      mm = sysmalloc_mmap (nb, pagesize, 0, av, true);
> 
> A single tunable ought to allow you to do all this in just sysmalloc_mmap.
> 
>>         if (mm != MAP_FAILED)
>>       return mm;
>>         tried_mmap = true;
>> @@ -5405,6 +5422,18 @@ do_set_thp_madvise (int32_t value)
>>       }
>>     return 0;
>>   }
>> +
>> +static __always_inline int
>> +do_set_mmap_hugetlb (size_t value)
>> +{
>> +  if (value > 0)
>> +    {
>> +      struct malloc_hugepage_config_t cfg = __malloc_hugepage_config (value);
>> +      mp_.hp_pagesize = cfg.pagesize;
>> +      mp_.hp_flags = cfg.flags;
> 
> Instead of making a struct to pass it, you could just pass &mp.hp_pagesize and &mp.hp_flags.  Also, with a single tunable, you do this only when value > 1.  For value == 0, you set the default THP pagesize and set flags to 0.
> 
>> +    }
>> +  return 0;
>> +}
>>   #endif
>>     int

I don't have a strong opinion here, using pointers should work as well.

>> diff --git a/manual/tunables.texi b/manual/tunables.texi
>> index 93c46807f9..4da6a02778 100644
>> --- a/manual/tunables.texi
>> +++ b/manual/tunables.texi
>> @@ -279,6 +279,20 @@ The default value of this tunable is @code{0}, which disable its usage.
>>   Setting to a positive value enable the @code{madvise} call.
>>   @end deftp
>>   +@deftp Tunable glibc.malloc.mmap_hugetlb
>> +This tunable enable the use of Huge Pages when the system supports it (currently
>> +only Linux).  It is done by aligning the memory size and passing the required
>> +flags (@code{MAP_HUGETLB} on Linux) when issuing the @code{mmap} to allocate
>> +memory from the system.
>> +
>> +The default value of this tunable is @code{0}, which disable its usage.
>> +The special value @code{1} will try to gather the system default huge page size,
>> +while a value larger than @code{1} will try to match it with the supported system
>> +huge page size.  If either no default huge page size could be obtained or if the
>> +requested size does not match the supported ones, the huge pages supports will be
>> +disabled.
>> +@end deftp
>> +
>>   @node Dynamic Linking Tunables
>>   @section Dynamic Linking Tunables
>>   @cindex dynamic linking tunables
>> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
>> index 262bcdbeb8..e5f5c1ec98 100644
>> --- a/sysdeps/generic/malloc-hugepages.c
>> +++ b/sysdeps/generic/malloc-hugepages.c
>> @@ -29,3 +29,9 @@ __malloc_thp_mode (void)
>>   {
>>     return malloc_thp_mode_not_supported;
>>   }
>> +
>> +/* Return the default transparent huge page size.  */
>> +struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
>> +{
>> +  return (struct malloc_hugepage_config_t) { 0, 0 };
>> +}
>> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
>> index 664cda9b67..27f7adfea5 100644
>> --- a/sysdeps/generic/malloc-hugepages.h
>> +++ b/sysdeps/generic/malloc-hugepages.h
>> @@ -34,4 +34,16 @@ enum malloc_thp_mode_t
>>     enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
>>   +struct malloc_hugepage_config_t
>> +{
>> +  size_t pagesize;
>> +  int flags;
>> +};
>> +
>> +/* Returned the support huge page size from the requested PAGESIZE along
>> +   with the requires extra mmap flags.  Returning a 0 value for pagesize
>> +   disables its usage.  */
>> +struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
>> +     attribute_hidden;
>> +
>>   #endif /* _MALLOC_HUGEPAGES_H */
>> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
>> index 66589127cd..0eb0c764ad 100644
>> --- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
>> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
>> @@ -17,8 +17,10 @@
>>      not, see <https://www.gnu.org/licenses/>.  */
>>     #include <intprops.h>
>> +#include <dirent.h>
>>   #include <malloc-hugepages.h>
>>   #include <not-cancel.h>
>> +#include <sys/mman.h>
>>     size_t
>>   __malloc_default_thp_pagesize (void)
>> @@ -74,3 +76,126 @@ __malloc_thp_mode (void)
>>       }
>>     return malloc_thp_mode_not_supported;
>>   }
>> +
>> +static size_t
>> +malloc_default_hugepage_size (void)
>> +{
>> +  int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY);
>> +  if (fd == -1)
>> +    return 0;
>> +
>> +  char buf[512];
>> +  off64_t off = 0;
>> +  while (1)
>> +    {
>> +      ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off);
>> +      if (r < 0)
>> +    break;
>> +      buf[r - 1] = '\0';
>> +
>> +      const char *s = strstr (buf, "Hugepagesize:");
>> +      if (s == NULL)
>> +    {
>> +      char *nl = strrchr (buf, '\n');
>> +      if (nl == NULL)
>> +        break;
>> +      off += (nl + 1) - buf;
>> +      continue;
>> +    }
>> +
>> +      /* The default huge page size is in the form:
>> +     Hugepagesize:       NUMBER kB  */
>> +      size_t hpsize = 0;
>> +      s += sizeof ("Hugepagesize: ") - 1;
>> +      for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++)
>> +    {
>> +      if (s[i] == ' ')
>> +        continue;
>> +      hpsize *= 10;
>> +      hpsize += s[i] - '0';
>> +    }
>> +      return hpsize * 1024;
>> +    }
>> +
>> +  __close_nocancel (fd);
>> +
>> +  return 0;
>> +}
>> +
>> +static inline struct malloc_hugepage_config_t
>> +make_malloc_hugepage_config (size_t pagesize)
>> +{
>> +  int flags = MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
>> +  return (struct malloc_hugepage_config_t) { pagesize, flags };
>> +}
>> +
>> +struct malloc_hugepage_config_t
>> +__malloc_hugepage_config (size_t requested)
>> +{
>> +  if (requested == 1)
>> +    {
>> +      size_t pagesize = malloc_default_hugepage_size ();
>> +      if (pagesize != 0)
>> +    return make_malloc_hugepage_config (pagesize);
>> +    }
>> +
>> +  int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages",
>> +                 O_RDONLY | O_DIRECTORY, 0);
>> +  if (dirfd == -1)
>> +    return (struct malloc_hugepage_config_t) { 0, 0 };
>> +
>> +  bool found = false;
>> +
>> +  char buffer[1024];
>> +  while (true)
>> +    {
>> +#if !IS_IN(libc)
>> +# define __getdents64 getdents64
>> +#endif
>> +      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
>> +      if (ret == -1)
>> +    break;
>> +      else if (ret == 0)
>> +        break;
>> +
>> +      char *begin = buffer, *end = buffer + ret;
>> +      while (begin != end)
>> +        {
>> +          unsigned short int d_reclen;
>> +          memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
>> +                  sizeof (d_reclen));
>> +          const char *dname = begin + offsetof (struct dirent64, d_name);
>> +          begin += d_reclen;
>> +
>> +          if (dname[0] == '.'
>> +          || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0)
>> +            continue;
>> +
>> +      /* Each entry represents a supported huge page in the form of:
>> +         hugepages-<size>kB.  */
>> +      size_t hpsize = 0;
>> +      const char *sizestr = dname + sizeof ("hugepages-") - 1;
>> +      for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++)
>> +        {
>> +          hpsize *= 10;
>> +          hpsize += sizestr[i] - '0';
>> +        }
>> +      hpsize *= 1024;
>> +
>> +      if (hpsize == requested)
>> +        {
>> +          found = true;
>> +          break;
>> +        }
>> +        }
>> +      if (found)
>> +    break;
>> +    }
>> +
>> +  __close_nocancel (dirfd);
>> +
>> +  if (found)
>> +    return make_malloc_hugepage_config (requested);
>> +
>> +  return (struct malloc_hugepage_config_t) { 0, 0 };
>> +}
>>
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages
  2021-08-19 12:00     ` Adhemerval Zanella via Libc-alpha
@ 2021-08-19 12:22       ` Siddhesh Poyarekar via Libc-alpha
  0 siblings, 0 replies; 24+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-19 12:22 UTC (permalink / raw
  To: Adhemerval Zanella, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin

On 8/19/21 5:30 PM, Adhemerval Zanella wrote:
> I think you means 'always' and it should be handled by
> __malloc_thp_mode() (which would about 'tph_pagesize'
> to have a value different than 0).
> 
> I also did not considered 'always' because I saw some results
> on powerpc where even with 'always' mode issuing the madvise did
> improve. I am not sure why exactly, it might be the case for
> the program header that is no sufficient aligned (and with
> tunable it would be).
> 
> But maybe for 'always' it would be better to disable the
> madvise() as well.

Yeah we could always do more fine tuning later for architecture-specific 
behaviour, etc.

>> Likewise for page size; this could be cached too.
> 
> I think there is no much sense to cache since it is used once at
> malloc initialization.

Ahh yes, never mind this one then.

Thanks,
Siddhesh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-19 12:04       ` Adhemerval Zanella via Libc-alpha
@ 2021-08-19 12:26         ` Siddhesh Poyarekar via Libc-alpha
  2021-08-19 12:42           ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Siddhesh Poyarekar via Libc-alpha @ 2021-08-19 12:26 UTC (permalink / raw
  To: Adhemerval Zanella, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin

On 8/19/21 5:34 PM, Adhemerval Zanella wrote:
> I think this would be an additional tunable, we still need to handle
> the case where mmap() fails either in default path (due maximum number
> of mmap() per process by kernel or when the poll is exhausted for
> MAP_HUGETLB).
> 
> So for sbrk() call, should we align the increment to huge page and
> issue the madvise() if the tunable is set to use huge pages?

Yeah it's a reasonable compromise.  I've been thinking about getting rid 
of max_mmaps too; I don't see much use for it anymore.

Siddhesh

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-19 12:26         ` Siddhesh Poyarekar via Libc-alpha
@ 2021-08-19 12:42           ` Adhemerval Zanella via Libc-alpha
  0 siblings, 0 replies; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 12:42 UTC (permalink / raw
  To: Siddhesh Poyarekar, libc-alpha; +Cc: Norbert Manthey, Guillaume Morin



On 19/08/2021 09:26, Siddhesh Poyarekar wrote:
> On 8/19/21 5:34 PM, Adhemerval Zanella wrote:
>> I think this would be an additional tunable, we still need to handle
>> the case where mmap() fails either in default path (due maximum number
>> of mmap() per process by kernel or when the poll is exhausted for
>> MAP_HUGETLB).
>>
>> So for sbrk() call, should we align the increment to huge page and
>> issue the madvise() if the tunable is set to use huge pages?
> 
> Yeah it's a reasonable compromise.  I've been thinking about getting rid of max_mmaps too; I don't see much use for it anymore.

I think it made sense when mmap() is way costly, specially for 32-bit
architectures.  On Linux it is still controlled by a tunable,
/proc/sys/vm/max_map_count, so it might still be case where might
want to avoid the overhead of the mmap failure and fallback to sbrk()
directly.

But I agree that for usual case where mmap() is used it does not make
much sense to try use the tunable, since for cases like threaded
programs sbrk() does not help much.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-18 14:19 [PATCH v2 0/4] malloc: Improve Huge Page support Adhemerval Zanella via Libc-alpha
                   ` (4 preceding siblings ...)
  2021-08-18 18:11 ` [PATCH v2 0/4] malloc: Improve Huge Page support Siddhesh Poyarekar via Libc-alpha
@ 2021-08-19 16:42 ` Guillaume Morin
  2021-08-19 16:55   ` Adhemerval Zanella via Libc-alpha
  5 siblings, 1 reply; 24+ messages in thread
From: Guillaume Morin @ 2021-08-19 16:42 UTC (permalink / raw
  To: Adhemerval Zanella
  Cc: Norbert Manthey, Guillaume Morin, libc-alpha, Siddhesh Poyarekar

Hi Adhemerval,

On 18 Aug 11:19, Adhemerval Zanella wrote:
> Linux currently supports two ways to use Huge Pages: either by using
> specific flags directly with the syscall (MAP_HUGETLB for mmap(), or
> SHM_HUGETLB for shmget()), or by using Transparent Huge Pages (THP)
> where the kernel will try to move allocated anonymous pages to Huge
> Pages blocks transparent to application.

This approach looks good to me! This is much appreciated.

Are you planning on tackling using the same tunables to allocate
additional heaps (in arena.c)?

It's a little more subtle because of the calls to mprotect() which needs
to be properly aligned for hugetlbfs, and probably for THP as well (to
avoid un-necessary page splitting).

One additional thing to address is the case where mmap() fails with
MAP_HUGETLB because HP allocation fails.  Reverting to the default pages
will match what libhugetlbfs does (i.e just call mmap() again without
MAP_HUGETLB). But I see that Siddhesh and you have already been
discussing this case.

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-19 16:42 ` Guillaume Morin
@ 2021-08-19 16:55   ` Adhemerval Zanella via Libc-alpha
  2021-08-19 17:17     ` Guillaume Morin
  0 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 16:55 UTC (permalink / raw
  To: libc-alpha, Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar



On 19/08/2021 13:42, Guillaume Morin wrote:
> Hi Adhemerval,
> 
> On 18 Aug 11:19, Adhemerval Zanella wrote:
>> Linux currently supports two ways to use Huge Pages: either by using
>> specific flags directly with the syscall (MAP_HUGETLB for mmap(), or
>> SHM_HUGETLB for shmget()), or by using Transparent Huge Pages (THP)
>> where the kernel will try to move allocated anonymous pages to Huge
>> Pages blocks transparent to application.
> 
> This approach looks good to me! This is much appreciated.
> 
> Are you planning on tackling using the same tunables to allocate
> additional heaps (in arena.c)?
> 
> It's a little more subtle because of the calls to mprotect() which needs
> to be properly aligned for hugetlbfs, and probably for THP as well (to
> avoid un-necessary page splitting).

What do you mean by additional heaps in this case?

> 
> One additional thing to address is the case where mmap() fails with
> MAP_HUGETLB because HP allocation fails.  Reverting to the default pages
> will match what libhugetlbfs does (i.e just call mmap() again without
> MAP_HUGETLB). But I see that Siddhesh and you have already been
> discussing this case.

This is what I did in my patch, it follow the current default allocation
path.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-19 16:55   ` Adhemerval Zanella via Libc-alpha
@ 2021-08-19 17:17     ` Guillaume Morin
  2021-08-19 17:27       ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Guillaume Morin @ 2021-08-19 17:17 UTC (permalink / raw
  To: Adhemerval Zanella
  Cc: Norbert Manthey, Guillaume Morin, libc-alpha, Siddhesh Poyarekar

On 19 Aug 13:55, Adhemerval Zanella wrote:
> On 19/08/2021 13:42, Guillaume Morin wrote:
> > Are you planning on tackling using the same tunables to allocate
> > additional heaps (in arena.c)?
> > 
> > It's a little more subtle because of the calls to mprotect() which needs
> > to be properly aligned for hugetlbfs, and probably for THP as well (to
> > avoid un-necessary page splitting).
> 
> What do you mean by additional heaps in this case?

I mean what is done in new_heap() in arena.c.

> > One additional thing to address is the case where mmap() fails with
> > MAP_HUGETLB because HP allocation fails.  Reverting to the default pages
> > will match what libhugetlbfs does (i.e just call mmap() again without
> > MAP_HUGETLB). But I see that Siddhesh and you have already been
> > discussing this case.
> 
> This is what I did in my patch, it follow the current default allocation
> path.

Yes, you are right. I misread. You've been discussing adding a tunable
to decide if that should fail or not. My 2 cents as a user: it's hard
for me to imagine that users would like malloc() to fail in this case.
Even if the admin allows surplus pages (i.e create new HPs on the fly),
this is far from guaranteed to succeed.

Guillaume.

-- 
Guillaume Morin <guillaume@morinfr.org>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 0/4] malloc: Improve Huge Page support
  2021-08-19 17:17     ` Guillaume Morin
@ 2021-08-19 17:27       ` Adhemerval Zanella via Libc-alpha
  0 siblings, 0 replies; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 17:27 UTC (permalink / raw
  To: libc-alpha, Norbert Manthey, Guillaume Morin, Siddhesh Poyarekar



On 19/08/2021 14:17, Guillaume Morin wrote:
> On 19 Aug 13:55, Adhemerval Zanella wrote:
>> On 19/08/2021 13:42, Guillaume Morin wrote:
>>> Are you planning on tackling using the same tunables to allocate
>>> additional heaps (in arena.c)?
>>>
>>> It's a little more subtle because of the calls to mprotect() which needs
>>> to be properly aligned for hugetlbfs, and probably for THP as well (to
>>> avoid un-necessary page splitting).
>>
>> What do you mean by additional heaps in this case?
> 
> I mean what is done in new_heap() in arena.c.

Good catch, I haven't take in consideration the new_heap() code.  I think
we should the same tunable to drive the hguepage usage in this case as
well.

> 
>>> One additional thing to address is the case where mmap() fails with
>>> MAP_HUGETLB because HP allocation fails.  Reverting to the default pages
>>> will match what libhugetlbfs does (i.e just call mmap() again without
>>> MAP_HUGETLB). But I see that Siddhesh and you have already been
>>> discussing this case.
>>
>> This is what I did in my patch, it follow the current default allocation
>> path.
> 
> Yes, you are right. I misread. You've been discussing adding a tunable
> to decide if that should fail or not. My 2 cents as a user: it's hard
> for me to imagine that users would like malloc() to fail in this case.
> Even if the admin allows surplus pages (i.e create new HPs on the fly),
> this is far from guaranteed to succeed.
> 
> Guillaume.
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc
  2021-08-18 14:20 ` [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc Adhemerval Zanella via Libc-alpha
  2021-08-19  1:03   ` Siddhesh Poyarekar via Libc-alpha
@ 2021-08-19 17:58   ` Matheus Castanho via Libc-alpha
  2021-08-19 18:50     ` Adhemerval Zanella via Libc-alpha
  1 sibling, 1 reply; 24+ messages in thread
From: Matheus Castanho via Libc-alpha @ 2021-08-19 17:58 UTC (permalink / raw
  To: Adhemerval Zanella
  Cc: Norbert Manthey, Tulio Magno Quites Machado Filho,
	Guillaume Morin, libc-alpha, Siddhesh Poyarekar

[-- Attachment #1: Type: text/plain, Size: 12329 bytes --]


Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org> writes:

> A new tunable, 'glibc.malloc.mmap_hugetlb', adds support to use Huge Page
> support directly with mmap() calls.  The required supported sizes and
> flags for mmap() are provided by an arch-specific internal hook
> malloc_hp_config().
>
> Currently it first try mmap() using the huge page size and fallback to
> default page size and sbrk() call if kernel returns MMAP_FAILED.
>
> The default malloc_hp_config() implementation does not enable it even
> if the tunable is set.
>
> Checked on x86_64-linux-gnu.
> ---
>  NEWS                                       |   4 +
>  elf/dl-tunables.list                       |   4 +
>  elf/tst-rtld-list-tunables.exp             |   1 +
>  malloc/arena.c                             |   2 +
>  malloc/malloc.c                            |  35 +++++-
>  manual/tunables.texi                       |  14 +++
>  sysdeps/generic/malloc-hugepages.c         |   6 +
>  sysdeps/generic/malloc-hugepages.h         |  12 ++
>  sysdeps/unix/sysv/linux/malloc-hugepages.c | 125 +++++++++++++++++++++
>  9 files changed, 200 insertions(+), 3 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 9b2345d08c..412bf3e6f8 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -14,6 +14,10 @@ Major new features:
>    It might improve performance with Transparent Huge Pages madvise mode
>    depending of the workload.
>
> +* On Linux, a new tunable, glibc.malloc.mmap_hugetlb, can be used to
> +  instruct malloc to try use Huge Pages when allocate memory with mmap()
> +  calls (through the use of MAP_HUGETLB).
> +
>  Deprecated and removed features, and other changes affecting compatibility:
>
>    [Add deprecations, removals and changes affecting compatibility here]
> diff --git a/elf/dl-tunables.list b/elf/dl-tunables.list
> index 67df6dbc2c..209c2d8592 100644
> --- a/elf/dl-tunables.list
> +++ b/elf/dl-tunables.list
> @@ -97,6 +97,10 @@ glibc {
>        minval: 0
>        maxval: 1
>      }
> +    mmap_hugetlb {
> +      type: SIZE_T
> +      minval: 0
> +    }
>    }
>    cpu {
>      hwcap_mask {
> diff --git a/elf/tst-rtld-list-tunables.exp b/elf/tst-rtld-list-tunables.exp
> index d8109fa31c..49f033ce91 100644
> --- a/elf/tst-rtld-list-tunables.exp
> +++ b/elf/tst-rtld-list-tunables.exp
> @@ -1,6 +1,7 @@
>  glibc.malloc.arena_max: 0x0 (min: 0x1, max: 0x[f]+)
>  glibc.malloc.arena_test: 0x0 (min: 0x1, max: 0x[f]+)
>  glibc.malloc.check: 0 (min: 0, max: 3)
> +glibc.malloc.mmap_hugetlb: 0x0 (min: 0x0, max: 0x[f]+)
>  glibc.malloc.mmap_max: 0 (min: 0, max: 2147483647)
>  glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0x[f]+)
>  glibc.malloc.mxfast: 0x0 (min: 0x0, max: 0x[f]+)
> diff --git a/malloc/arena.c b/malloc/arena.c
> index 81bff54303..4efb5581c1 100644
> --- a/malloc/arena.c
> +++ b/malloc/arena.c
> @@ -232,6 +232,7 @@ TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
>  #endif
>  TUNABLE_CALLBACK_FNDECL (set_mxfast, size_t)
>  TUNABLE_CALLBACK_FNDECL (set_thp_madvise, int32_t)
> +TUNABLE_CALLBACK_FNDECL (set_mmap_hugetlb, size_t)
>  #else
>  /* Initialization routine. */
>  #include <string.h>
> @@ -333,6 +334,7 @@ ptmalloc_init (void)
>  # endif
>    TUNABLE_GET (mxfast, size_t, TUNABLE_CALLBACK (set_mxfast));
>    TUNABLE_GET (thp_madvise, int32_t, TUNABLE_CALLBACK (set_thp_madvise));
> +  TUNABLE_GET (mmap_hugetlb, size_t, TUNABLE_CALLBACK (set_mmap_hugetlb));
>  #else
>    if (__glibc_likely (_environ != NULL))
>      {
> diff --git a/malloc/malloc.c b/malloc/malloc.c
> index 4bfcea286f..8cf2d6855e 100644
> --- a/malloc/malloc.c
> +++ b/malloc/malloc.c
> @@ -1884,6 +1884,10 @@ struct malloc_par
>  #if HAVE_TUNABLES
>    /* Transparent Large Page support.  */
>    INTERNAL_SIZE_T thp_pagesize;
> +  /* A value different than 0 means to align mmap allocation to hp_pagesize
> +     add hp_flags on flags.  */
> +  INTERNAL_SIZE_T hp_pagesize;
> +  int hp_flags;
>  #endif
>
>    /* Memory map support */
> @@ -2415,7 +2419,8 @@ do_check_malloc_state (mstate av)
>   */
>
>  static void *
> -sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
> +sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av,
> +		bool set_thp)
>  {
>    long int size;
>
> @@ -2442,7 +2447,8 @@ sysmalloc_mmap (INTERNAL_SIZE_T nb, size_t pagesize, int extra_flags, mstate av)
>    if (mm == MAP_FAILED)
>      return mm;
>
> -  sysmadvise_thp (mm, size);
> +  if (set_thp)
> +    sysmadvise_thp (mm, size);
>
>    /*
>      The offset to the start of the mmapped region is stored in the prev_size
> @@ -2531,7 +2537,18 @@ sysmalloc (INTERNAL_SIZE_T nb, mstate av)
>  	  && (mp_.n_mmaps < mp_.n_mmaps_max)))
>      {
>      try_mmap:
> -      char *mm = sysmalloc_mmap (nb, pagesize, 0, av);
> +      char *mm;
> +#if HAVE_TUNABLES
> +      if (mp_.hp_pagesize > 0)
> +	{
> +	  /* There is no need to isse the THP madvise call if Huge Pages are
> +	     used directly.  */
> +	  mm = sysmalloc_mmap (nb, mp_.hp_pagesize, mp_.hp_flags, av, false);
> +	  if (mm != MAP_FAILED)
> +	    return mm;
> +	}
> +#endif
> +      mm = sysmalloc_mmap (nb, pagesize, 0, av, true);
>        if (mm != MAP_FAILED)
>  	return mm;
>        tried_mmap = true;
> @@ -5405,6 +5422,18 @@ do_set_thp_madvise (int32_t value)
>      }
>    return 0;
>  }
> +
> +static __always_inline int
> +do_set_mmap_hugetlb (size_t value)
> +{
> +  if (value > 0)
> +    {
> +      struct malloc_hugepage_config_t cfg = __malloc_hugepage_config (value);
> +      mp_.hp_pagesize = cfg.pagesize;
> +      mp_.hp_flags = cfg.flags;
> +    }
> +  return 0;
> +}
>  #endif
>
>  int
> diff --git a/manual/tunables.texi b/manual/tunables.texi
> index 93c46807f9..4da6a02778 100644
> --- a/manual/tunables.texi
> +++ b/manual/tunables.texi
> @@ -279,6 +279,20 @@ The default value of this tunable is @code{0}, which disable its usage.
>  Setting to a positive value enable the @code{madvise} call.
>  @end deftp
>
> +@deftp Tunable glibc.malloc.mmap_hugetlb
> +This tunable enable the use of Huge Pages when the system supports it (currently
> +only Linux).  It is done by aligning the memory size and passing the required
> +flags (@code{MAP_HUGETLB} on Linux) when issuing the @code{mmap} to allocate
> +memory from the system.
> +
> +The default value of this tunable is @code{0}, which disable its usage.
> +The special value @code{1} will try to gather the system default huge page size,
> +while a value larger than @code{1} will try to match it with the supported system
> +huge page size.  If either no default huge page size could be obtained or if the
> +requested size does not match the supported ones, the huge pages supports will be
> +disabled.
> +@end deftp
> +
>  @node Dynamic Linking Tunables
>  @section Dynamic Linking Tunables
>  @cindex dynamic linking tunables
> diff --git a/sysdeps/generic/malloc-hugepages.c b/sysdeps/generic/malloc-hugepages.c
> index 262bcdbeb8..e5f5c1ec98 100644
> --- a/sysdeps/generic/malloc-hugepages.c
> +++ b/sysdeps/generic/malloc-hugepages.c
> @@ -29,3 +29,9 @@ __malloc_thp_mode (void)
>  {
>    return malloc_thp_mode_not_supported;
>  }
> +
> +/* Return the default transparent huge page size.  */
> +struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
> +{
> +  return (struct malloc_hugepage_config_t) { 0, 0 };
> +}
> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
> index 664cda9b67..27f7adfea5 100644
> --- a/sysdeps/generic/malloc-hugepages.h
> +++ b/sysdeps/generic/malloc-hugepages.h
> @@ -34,4 +34,16 @@ enum malloc_thp_mode_t
>
>  enum malloc_thp_mode_t __malloc_thp_mode (void) attribute_hidden;
>
> +struct malloc_hugepage_config_t
> +{
> +  size_t pagesize;
> +  int flags;
> +};
> +
> +/* Returned the support huge page size from the requested PAGESIZE along
> +   with the requires extra mmap flags.  Returning a 0 value for pagesize
> +   disables its usage.  */
> +struct malloc_hugepage_config_t __malloc_hugepage_config (size_t requested)
> +     attribute_hidden;
> +
>  #endif /* _MALLOC_HUGEPAGES_H */
> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> index 66589127cd..0eb0c764ad 100644
> --- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> @@ -17,8 +17,10 @@
>     not, see <https://www.gnu.org/licenses/>.  */
>
>  #include <intprops.h>
> +#include <dirent.h>
>  #include <malloc-hugepages.h>
>  #include <not-cancel.h>
> +#include <sys/mman.h>
>
>  size_t
>  __malloc_default_thp_pagesize (void)
> @@ -74,3 +76,126 @@ __malloc_thp_mode (void)
>      }
>    return malloc_thp_mode_not_supported;
>  }
> +
> +static size_t
> +malloc_default_hugepage_size (void)
> +{
> +  int fd = __open64_nocancel ("/proc/meminfo", O_RDONLY);
> +  if (fd == -1)
> +    return 0;
> +
> +  char buf[512];
> +  off64_t off = 0;
> +  while (1)
> +    {
> +      ssize_t r = __pread64_nocancel (fd, buf, sizeof (buf) - 1, off);
> +      if (r < 0)
> +	break;
> +      buf[r - 1] = '\0';
> +
> +      const char *s = strstr (buf, "Hugepagesize:");
> +      if (s == NULL)
> +	{
> +	  char *nl = strrchr (buf, '\n');
> +	  if (nl == NULL)
> +	    break;
> +	  off += (nl + 1) - buf;
> +	  continue;
> +	}
> +
> +      /* The default huge page size is in the form:
> +	 Hugepagesize:       NUMBER kB  */
> +      size_t hpsize = 0;
> +      s += sizeof ("Hugepagesize: ") - 1;
> +      for (int i = 0; (s[i] >= '0' && s[i] <= '9') || s[i] == ' '; i++)
> +	{
> +	  if (s[i] == ' ')
> +	    continue;
> +	  hpsize *= 10;
> +	  hpsize += s[i] - '0';
> +	}
> +      return hpsize * 1024;
> +    }
> +
> +  __close_nocancel (fd);
> +
> +  return 0;
> +}
> +
> +static inline struct malloc_hugepage_config_t
> +make_malloc_hugepage_config (size_t pagesize)
> +{
> +  int flags = MAP_HUGETLB | (__builtin_ctzll (pagesize) << MAP_HUGE_SHIFT);
> +  return (struct malloc_hugepage_config_t) { pagesize, flags };
> +}
> +
> +struct malloc_hugepage_config_t
> +__malloc_hugepage_config (size_t requested)
> +{
> +  if (requested == 1)
> +    {
> +      size_t pagesize = malloc_default_hugepage_size ();
> +      if (pagesize != 0)
> +	return make_malloc_hugepage_config (pagesize);
> +    }
> +
> +  int dirfd = __open64_nocancel ("/sys/kernel/mm/hugepages",
> +				 O_RDONLY | O_DIRECTORY, 0);
> +  if (dirfd == -1)
> +    return (struct malloc_hugepage_config_t) { 0, 0 };
> +
> +  bool found = false;
> +
> +  char buffer[1024];
> +  while (true)
> +    {
> +#if !IS_IN(libc)
> +# define __getdents64 getdents64
> +#endif
> +      ssize_t ret = __getdents64 (dirfd, buffer, sizeof (buffer));
> +      if (ret == -1)
> +	break;
> +      else if (ret == 0)
> +        break;
> +
> +      char *begin = buffer, *end = buffer + ret;
> +      while (begin != end)
> +        {
> +          unsigned short int d_reclen;
> +          memcpy (&d_reclen, begin + offsetof (struct dirent64, d_reclen),
> +                  sizeof (d_reclen));
> +          const char *dname = begin + offsetof (struct dirent64, d_name);
> +          begin += d_reclen;
> +
> +          if (dname[0] == '.'
> +	      || strncmp (dname, "hugepages-", sizeof ("hugepages-") - 1) != 0)
> +            continue;
> +
> +	  /* Each entry represents a supported huge page in the form of:
> +	     hugepages-<size>kB.  */
> +	  size_t hpsize = 0;
> +	  const char *sizestr = dname + sizeof ("hugepages-") - 1;
> +	  for (int i = 0; sizestr[i] >= '0' && sizestr[i] <= '9'; i++)
> +	    {
> +	      hpsize *= 10;
> +	      hpsize += sizestr[i] - '0';
> +	    }
> +	  hpsize *= 1024;
> +
> +	  if (hpsize == requested)
> +	    {
> +	      found = true;
> +	      break;
> +	    }
> +        }
> +      if (found)
> +	break;
> +    }
> +
> +  __close_nocancel (dirfd);
> +
> +  if (found)
> +    return make_malloc_hugepage_config (requested);
> +
> +  return (struct malloc_hugepage_config_t) { 0, 0 };
> +}

Hi Adhemerval,

I tested this patchset on a POWER9, and I'm seeing the following test
failures when running make check with glibc.malloc.mmap_hugetlb=1:

malloc/tst-free-errno
malloc/tst-free-errno-malloc-check
malloc/tst-free-errno-mcheck
posix/tst-exec
posix/tst-exec-static
posix/tst-spawn
posix/tst-spawn-static
posix/tst-spawn5

I'm attaching a summary of the contents of the .out files for each test.


[-- Attachment #2: Summary of failing tests --]
[-- Type: text/plain, Size: 3604 bytes --]

$ failing="malloc/tst-free-errno malloc/tst-free-errno-malloc-check malloc/tst-free-errno-mcheck posix/tst-exec posix/tst-exec-static posix/tst-spawn posix/tst-spawn-static posix/tst-spawn5"
$
$ for t in $failing; do echo "~> $t"; { make test t=$t; GLIBC_TUNABLES="glibc.malloc.mmap_hugetlb=1" make test t=$t; } | grep -Ei "^fail|pass"; cat $t.out; echo; done

~> malloc/tst-free-errno
double free or corruption (out)
PASS: malloc/tst-free-errno
FAIL: malloc/tst-free-errno
Didn't expect signal from child: got `Aborted'

~> malloc/tst-free-errno-malloc-check
PASS: malloc/tst-free-errno-malloc-check
FAIL: malloc/tst-free-errno-malloc-check
error: xmmap.c:28: mmap of 16908288 bytes, prot=0x3, flags=0x32: Device or resource busy
error: 1 test failures

~> malloc/tst-free-errno-mcheck
memory clobbered past end of allocated block
PASS: malloc/tst-free-errno-mcheck
FAIL: malloc/tst-free-errno-mcheck
Didn't expect signal from child: got `Aborted'

~> posix/tst-exec
/home/mscastanho/build/glibc/posix/tst-exec: file 1 (4) is not closed
PASS: posix/tst-exec
FAIL: posix/tst-exec

~> posix/tst-exec-static
/home/mscastanho/build/glibc/posix/tst-exec-static: file 1 (4) is not closed
PASS: posix/tst-exec-static
FAIL: posix/tst-exec-static

~> posix/tst-spawn
PASS: posix/tst-spawn
FAIL: posix/tst-spawn
tst-spawn.c:127: numeric comparison failure
   left: 0 (0x0); from: lseek (fd1, 0, SEEK_CUR)
  right: -1 (0xffffffffffffffff); from: (off_t) -1
error: 1 test failures
tst-spawn.c:244: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
tst-spawn.c:127: numeric comparison failure
   left: 0 (0x0); from: lseek (fd1, 0, SEEK_CUR)
  right: -1 (0xffffffffffffffff); from: (off_t) -1
error: 1 test failures
tst-spawn.c:258: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
error: 2 test failures

~> posix/tst-spawn-static
PASS: posix/tst-spawn-static
FAIL: posix/tst-spawn-static
tst-spawn.c:127: numeric comparison failure
   left: 0 (0x0); from: lseek (fd1, 0, SEEK_CUR)
  right: -1 (0xffffffffffffffff); from: (off_t) -1
error: 1 test failures
tst-spawn.c:244: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
tst-spawn.c:127: numeric comparison failure
   left: 0 (0x0); from: lseek (fd1, 0, SEEK_CUR)
  right: -1 (0xffffffffffffffff); from: (off_t) -1
error: 1 test failures
tst-spawn.c:258: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
error: 2 test failures

~> posix/tst-spawn5
PASS: posix/tst-spawn5
FAIL: posix/tst-spawn5
error: tst-spawn5.c:128: unexpected open file descriptor 54: /proc/meminfo
tst-spawn5.c:182: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
error: tst-spawn5.c:128: unexpected open file descriptor 54: /proc/meminfo
tst-spawn5.c:182: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
error: tst-spawn5.c:128: unexpected open file descriptor 5: /proc/meminfo
tst-spawn5.c:182: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
error: tst-spawn5.c:128: unexpected open file descriptor 4: /proc/meminfo
tst-spawn5.c:182: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
error: tst-spawn5.c:128: unexpected open file descriptor 6: /proc/meminfo
tst-spawn5.c:182: numeric comparison failure
   left: 1 (0x1); from: WEXITSTATUS (status)
  right: 0 (0x0); from: 0
error: 5 test failures

[-- Attachment #3: Type: text/plain, Size: 21 bytes --]


--
Matheus Castanho

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc
  2021-08-19 17:58   ` Matheus Castanho via Libc-alpha
@ 2021-08-19 18:50     ` Adhemerval Zanella via Libc-alpha
  2021-08-20 12:34       ` Matheus Castanho via Libc-alpha
  0 siblings, 1 reply; 24+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2021-08-19 18:50 UTC (permalink / raw
  To: Matheus Castanho
  Cc: Norbert Manthey, Tulio Magno Quites Machado Filho,
	Guillaume Morin, libc-alpha, Siddhesh Poyarekar



On 19/08/2021 14:58, Matheus Castanho wrote:
> Hi Adhemerval,
> 
> I tested this patchset on a POWER9, and I'm seeing the following test
> failures when running make check with glibc.malloc.mmap_hugetlb=1:

Thanks for checking on this.

> 
> malloc/tst-free-errno
> malloc/tst-free-errno-malloc-check
> malloc/tst-free-errno-mcheck

These one I couldn't really reproduce it on gcc farm power machines,
a power9 with 2M huge page default and power8 with 16M default. Both
didn't have any page allocated in the poll. I don't have admin access
so I can change the pool size to check what is happening.

I also tested on my x86_64 environment without any pages in the poll,
with 4 pages in the pool and with 10 pages.  

If you could the stacktrace from where we get the
"Didn't expect signal from child: got `Aborted'" it would be useful.

It could be also something related to /proc/sys/vm/max_map_count
value, since it mmap seems to be failing for some reason.

> posix/tst-exec
> posix/tst-exec-static
> posix/tst-spawn
> posix/tst-spawn-static
> posix/tst-spawn5

These are an overlook at 'malloc_default_hugepage_size()' where it
does not close the file descriptor on success.  I have fixed it.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc
  2021-08-19 18:50     ` Adhemerval Zanella via Libc-alpha
@ 2021-08-20 12:34       ` Matheus Castanho via Libc-alpha
  0 siblings, 0 replies; 24+ messages in thread
From: Matheus Castanho via Libc-alpha @ 2021-08-20 12:34 UTC (permalink / raw
  To: Adhemerval Zanella
  Cc: Norbert Manthey, Tulio Magno Quites Machado Filho,
	Guillaume Morin, libc-alpha, Siddhesh Poyarekar


Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:

> On 19/08/2021 14:58, Matheus Castanho wrote:
>> Hi Adhemerval,
>>
>> I tested this patchset on a POWER9, and I'm seeing the following test
>> failures when running make check with glibc.malloc.mmap_hugetlb=1:
>
> Thanks for checking on this.
>
>>
>> malloc/tst-free-errno
>> malloc/tst-free-errno-malloc-check
>> malloc/tst-free-errno-mcheck
>
> These one I couldn't really reproduce it on gcc farm power machines,
> a power9 with 2M huge page default and power8 with 16M default. Both
> didn't have any page allocated in the poll. I don't have admin access
> so I can change the pool size to check what is happening.
>
> I also tested on my x86_64 environment without any pages in the poll,
> with 4 pages in the pool and with 10 pages.
>

I confirm that without pages in the pool the tests pass correctly. Only
when I add them to the pool things start failing. In this case I'm
reserving 500 16 MB pages:

$ grep -i hugepages /proc/meminfo
AnonHugePages:         0 kB
ShmemHugePages:        0 kB
FileHugePages:         0 kB
HugePages_Total:     500
HugePages_Free:      500
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:      16384 kB

> If you could the stacktrace from where we get the
> "Didn't expect signal from child: got `Aborted'" it would be useful.
>

This is what GDB is showing me when the abort happens:

#0  0x00007ffff7dccf00 in __pthread_kill_internal (threadid=<optimized out>, signo=<optimized out>) at pthread_kill.c:44
#1  0x00007ffff7d6e26c in __GI_raise (sig=<optimized out>) at ../sysdeps/posix/raise.c:26
#2  0x00007ffff7d50490 in __GI_abort () at abort.c:79
#3  0x00007ffff7dba770 in __libc_message (action=<optimized out>, fmt=<optimized out>) at ../sysdeps/posix/libc_fatal.c:155
#4  0x00007ffff7ddc4e8 in malloc_printerr (str=<optimized out>, str@entry=0x7ffff7efdc90 "double free or corruption (out)") at malloc.c:5654
#5  0x00007ffff7ddefe8 in _int_free (av=0x7ffff7f60e30 <main_arena>, p=0x7ffff80203d0, have_lock=<optimized out>, have_lock@entry=0) at malloc.c:4555
#6  0x00007ffff7de2160 in __GI___libc_free (mem=<optimized out>) at malloc.c:3358
#7  0x0000000010001ee4 in do_test () at tst-free-errno.c:123
#8  0x0000000010002730 in run_test_function (argc=argc@entry=1, argv=argv@entry=0x7fffffffede0, config=config@entry=0x7fffffffe950) at support_test_main.c:232
#9  0x00000000100032fc in support_test_main (argc=1, argv=0x7fffffffede0, config=0x7fffffffe950) at support_test_main.c:431
#10 0x00000000100019d0 in main (argc=<optimized out>, argv=<optimized out>) at ../support/test-driver.c:168
#11 0x00007ffff7d50818 in __libc_start_call_main (main=main@entry=0x10001980 <main>, argc=argc@entry=1, argv=argv@entry=0x7fffffffede0, auxvec=auxvec@entry=0x7fffffffef68) at ../sysdeps/nptl/libc_start_call_main.h:58
#12 0x00007ffff7d50a00 in generic_start_main (fini=<optimized out>, stack_end=<optimized out>, rtld_fini=<optimized out>, init=<optimized out>, auxvec=<optimized out>, argv=<optimized out>, argc=<optimized out>, main=<optimized out>) at ../csu/libc-start.c:409
#13 __libc_start_main_impl (argc=1, argv=0x7fffffffede0, ev=<optimized out>, auxvec=0x7fffffffef68, rtld_fini=<optimized out>, stinfo=<optimized out>, stack_on_entry=<optimized out>) at ../sysdeps/unix/sysv/linux/powerpc/libc-start.c:98
#14 0x0000000000000000 in ?? ()

> It could be also something related to /proc/sys/vm/max_map_count
> value, since it mmap seems to be failing for some reason.
>

This is what the machine I'm using now has:

$ cat /proc/sys/vm/max_map_count
65530

>> posix/tst-exec
>> posix/tst-exec-static
>> posix/tst-spawn
>> posix/tst-spawn-static
>> posix/tst-spawn5
>
> These are an overlook at 'malloc_default_hugepage_size()' where it
> does not close the file descriptor on success.  I have fixed it.

Ok, thanks!

--
Matheus Castanho

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-08-20 12:35 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-08-18 14:19 [PATCH v2 0/4] malloc: Improve Huge Page support Adhemerval Zanella via Libc-alpha
2021-08-18 14:19 ` [PATCH v2 1/4] malloc: Add madvise support for Transparent Huge Pages Adhemerval Zanella via Libc-alpha
2021-08-18 18:42   ` Siddhesh Poyarekar via Libc-alpha
2021-08-19 12:00     ` Adhemerval Zanella via Libc-alpha
2021-08-19 12:22       ` Siddhesh Poyarekar via Libc-alpha
2021-08-18 14:19 ` [PATCH v2 2/4] malloc: Add THP/madvise support for sbrk Adhemerval Zanella via Libc-alpha
2021-08-18 14:19 ` [PATCH v2 3/4] malloc: Move mmap logic to its own function Adhemerval Zanella via Libc-alpha
2021-08-19  0:47   ` Siddhesh Poyarekar via Libc-alpha
2021-08-18 14:20 ` [PATCH v2 4/4] malloc: Add Huge Page support for sysmalloc Adhemerval Zanella via Libc-alpha
2021-08-19  1:03   ` Siddhesh Poyarekar via Libc-alpha
2021-08-19 12:08     ` Adhemerval Zanella via Libc-alpha
2021-08-19 17:58   ` Matheus Castanho via Libc-alpha
2021-08-19 18:50     ` Adhemerval Zanella via Libc-alpha
2021-08-20 12:34       ` Matheus Castanho via Libc-alpha
2021-08-18 18:11 ` [PATCH v2 0/4] malloc: Improve Huge Page support Siddhesh Poyarekar via Libc-alpha
2021-08-19 11:26   ` Adhemerval Zanella via Libc-alpha
2021-08-19 11:48     ` Siddhesh Poyarekar via Libc-alpha
2021-08-19 12:04       ` Adhemerval Zanella via Libc-alpha
2021-08-19 12:26         ` Siddhesh Poyarekar via Libc-alpha
2021-08-19 12:42           ` Adhemerval Zanella via Libc-alpha
2021-08-19 16:42 ` Guillaume Morin
2021-08-19 16:55   ` Adhemerval Zanella via Libc-alpha
2021-08-19 17:17     ` Guillaume Morin
2021-08-19 17:27       ` Adhemerval Zanella via Libc-alpha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).