From: Cupertino Miranda via Libc-alpha <libc-alpha@sourceware.org>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org, Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Subject: Re: [PATCH] nptl: Disable THP on thread stack if it incurs in large RSS usage
Date: Tue, 16 May 2023 15:30:06 +0100 [thread overview]
Message-ID: <875y8stklt.fsf@oracle.com> (raw)
In-Reply-To: <20230420172436.2013698-1-adhemerval.zanella@linaro.org>
Hi Adhemerval,
Appologies for such late reply.
For what it is worth, the patch LGTM.
Regards,
Cupertino
Adhemerval Zanella writes:
> If Transparent Huge Page (THP) is set to 'always', the thread stack might
> be backed by Huge Pages depending of the asigned address by the kernel and
> the resulting guard position. If the guard page is within the same large
> page that might be used by the stack itself, changing stack permission
> will make the allocated range no longer be server with THP. The kernel
> will then revert back using default page size.
>
> In this case, besides the aditional work, the kernel will need to potential
> keep all the pages since it can not distinguish which one was really touched
> by the process. This result in a large RSS usage than just madvise the range
> to not use huge pages.
>
> The glibc.pthread.stack_hugetlb is still useful for the case where the
> caller either setup no guard page or a guard page multiple of default THP
> size. In this case, the kernel might potentially backup the stack with
> THP, but it would be up to the thread stack profile if THP is benefitial
> or not.
>
> Checked on x86_64-linux-gnu.
> ---
> nptl/allocatestack.c | 34 ++++++++++++++++
> sysdeps/generic/malloc-hugepages.h | 1 +
> sysdeps/unix/sysv/linux/malloc-hugepages.c | 46 ++++++++++++++++++----
> 3 files changed, 74 insertions(+), 7 deletions(-)
>
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index f9d8cdfd08..1eb34f816c 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -33,6 +33,7 @@
> #include <nptl-stack.h>
> #include <libc-lock.h>
> #include <tls-internal.h>
> +#include <malloc-hugepages.h>
>
> /* Default alignment of stack. */
> #ifndef STACK_ALIGN
> @@ -206,6 +207,31 @@ advise_stack_range (void *mem, size_t size, uintptr_t pd, size_t guardsize)
> #endif
> }
>
> +/* If Transparent Huge Page (THP) is set to 'always', the thread stack might
> + be backed by Huge Pages depending of the asigned address by the kernel and
> + if resulting guard position. If the guard page is within the same large
> + page that might be used by the stack itself, changing stack permission
> + will make the allocated range no longer be server with THP. The kernel will
> + then revert back using default page size.
> +
> + In this case, besides the aditional work, the kernel will need to potential
> + keep all the pages since it can distinguish which one was really touched by
> + the process. This result in a large RSS usage than just madvise the range
> + to not use huge pages. */
> +static __always_inline int
> +advise_thp (void *mem, size_t size, char *guard)
> +{
> + enum malloc_thp_mode_t thpmode = __malloc_thp_mode ();
> + if (thpmode != malloc_thp_mode_always)
> + return 0;
> +
> + unsigned long int thpsize = __malloc_default_thp_pagesize ();
> + if (PTR_ALIGN_DOWN (mem, thpsize) != PTR_ALIGN_DOWN (guard, thpsize))
> + return 0;
> +
> + return __madvise (mem, size, MADV_NOHUGEPAGE);
> +}
> +
> /* Returns a usable stack for a new thread either by allocating a
> new stack or reusing a cached stack of sufficient size.
> ATTR must be non-NULL and point to a valid pthread_attr.
> @@ -396,6 +422,14 @@ allocate_stack (const struct pthread_attr *attr, struct pthread **pdp,
> {
> char *guard = guard_position (mem, size, guardsize, pd,
> pagesize_m1);
> +
> + if (__glibc_unlikely (__nptl_stack_hugetlb == 1))
> + {
> + int r = advise_thp (mem, size, guard);
> + if (r != 0)
> + return r;
> + }
> +
> if (setup_stack_prot (mem, size, guard, guardsize, prot) != 0)
> {
> __munmap (mem, size);
> diff --git a/sysdeps/generic/malloc-hugepages.h b/sysdeps/generic/malloc-hugepages.h
> index d68b85630c..21d4844bc4 100644
> --- a/sysdeps/generic/malloc-hugepages.h
> +++ b/sysdeps/generic/malloc-hugepages.h
> @@ -26,6 +26,7 @@ unsigned long int __malloc_default_thp_pagesize (void) attribute_hidden;
>
> enum malloc_thp_mode_t
> {
> + malloc_thp_mode_unknown,
> malloc_thp_mode_always,
> malloc_thp_mode_madvise,
> malloc_thp_mode_never,
> diff --git a/sysdeps/unix/sysv/linux/malloc-hugepages.c b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> index 2f316474c1..e7877f098e 100644
> --- a/sysdeps/unix/sysv/linux/malloc-hugepages.c
> +++ b/sysdeps/unix/sysv/linux/malloc-hugepages.c
> @@ -22,19 +22,33 @@
> #include <not-cancel.h>
> #include <sys/mman.h>
>
> +/* The __malloc_thp_mode is called only in single-thread mode, either in
> + malloc initialization or pthread creation. */
> +static unsigned long int thp_pagesize = -1;
> +
> unsigned long int
> __malloc_default_thp_pagesize (void)
> {
> + unsigned long int size = atomic_load_relaxed (&thp_pagesize);
> + if (size != -1)
> + return size;
> +
> int fd = __open64_nocancel (
> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size", O_RDONLY);
> if (fd == -1)
> - return 0;
> + {
> + atomic_store_relaxed (&thp_pagesize, 0);
> + return 0;
> + }
>
> char str[INT_BUFSIZE_BOUND (unsigned long int)];
> ssize_t s = __read_nocancel (fd, str, sizeof (str));
> __close_nocancel (fd);
> if (s < 0)
> - return 0;
> + {
> + atomic_store_relaxed (&thp_pagesize, 0);
> + return 0;
> + }
>
> unsigned long int r = 0;
> for (ssize_t i = 0; i < s; i++)
> @@ -44,16 +58,28 @@ __malloc_default_thp_pagesize (void)
> r *= 10;
> r += str[i] - '0';
> }
> + atomic_store_relaxed (&thp_pagesize, r);
> return r;
> }
>
> +/* The __malloc_thp_mode is called only in single-thread mode, either in
> + malloc initialization or pthread creation. */
> +static enum malloc_thp_mode_t thp_mode = malloc_thp_mode_unknown;
> +
> enum malloc_thp_mode_t
> __malloc_thp_mode (void)
> {
> + enum malloc_thp_mode_t mode = atomic_load_relaxed (&thp_mode);
> + if (mode != malloc_thp_mode_unknown)
> + return mode;
> +
> int fd = __open64_nocancel ("/sys/kernel/mm/transparent_hugepage/enabled",
> O_RDONLY);
> if (fd == -1)
> - return malloc_thp_mode_not_supported;
> + {
> + atomic_store_relaxed (&thp_mode, malloc_thp_mode_not_supported);
> + return malloc_thp_mode_not_supported;
> + }
>
> static const char mode_always[] = "[always] madvise never\n";
> static const char mode_madvise[] = "always [madvise] never\n";
> @@ -69,13 +95,19 @@ __malloc_thp_mode (void)
> if (s == sizeof (mode_always) - 1)
> {
> if (strcmp (str, mode_always) == 0)
> - return malloc_thp_mode_always;
> + mode = malloc_thp_mode_always;
> else if (strcmp (str, mode_madvise) == 0)
> - return malloc_thp_mode_madvise;
> + mode = malloc_thp_mode_madvise;
> else if (strcmp (str, mode_never) == 0)
> - return malloc_thp_mode_never;
> + mode = malloc_thp_mode_never;
> + else
> + mode = malloc_thp_mode_not_supported;
> }
> - return malloc_thp_mode_not_supported;
> + else
> + mode = malloc_thp_mode_not_supported;
> +
> + atomic_store_relaxed (&thp_mode, mode);
> + return mode;
> }
>
> static size_t
prev parent reply other threads:[~2023-05-16 14:30 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-04-20 17:24 [PATCH] nptl: Disable THP on thread stack if it incurs in large RSS usage Adhemerval Zanella via Libc-alpha
2023-05-03 12:42 ` Wilco Dijkstra via Libc-alpha
2023-05-15 17:57 ` Adhemerval Zanella Netto via Libc-alpha
2023-05-16 15:38 ` Wilco Dijkstra via Libc-alpha
2023-05-16 16:35 ` Adhemerval Zanella Netto via Libc-alpha
2023-05-17 12:49 ` Wilco Dijkstra via Libc-alpha
2023-05-17 13:12 ` Cupertino Miranda via Libc-alpha
2023-05-17 13:20 ` Adhemerval Zanella Netto via Libc-alpha
2023-05-17 14:22 ` Wilco Dijkstra via Libc-alpha
2023-05-17 16:50 ` Adhemerval Zanella Netto via Libc-alpha
2023-05-17 18:16 ` Wilco Dijkstra via Libc-alpha
2023-05-18 13:04 ` Adhemerval Zanella Netto via Libc-alpha
2023-05-23 9:48 ` Wilco Dijkstra via Libc-alpha
2024-01-31 2:03 ` Cristian Rodríguez
2024-01-31 7:54 ` Florian Weimer
2024-01-31 11:30 ` Adhemerval Zanella Netto
2024-01-31 11:43 ` Florian Weimer
2024-03-12 0:55 ` Cristian Rodríguez
2024-01-31 15:18 ` Cristian Rodríguez
2024-02-01 1:26 ` Cristian Rodríguez
2023-05-16 14:30 ` Cupertino Miranda via Libc-alpha [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/libc/involved.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=875y8stklt.fsf@oracle.com \
--to=libc-alpha@sourceware.org \
--cc=Wilco.Dijkstra@arm.com \
--cc=adhemerval.zanella@linaro.org \
--cc=cupertino.miranda@oracle.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).