unofficial mirror of libc-alpha@sourceware.org
* [PATCH 0/2] Add __libc_single_threaded
@ 2020-05-20 18:12 Florian Weimer via Libc-alpha
  2020-05-20 18:12 ` [PATCH 1/2] Add the __libc_single_threaded variable Florian Weimer via Libc-alpha
  2020-05-20 18:12 ` [PATCH 2/2] manual: Document __libc_single_threaded Florian Weimer via Libc-alpha
  0 siblings, 2 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-20 18:12 UTC (permalink / raw)
  To: libc-alpha

This is another attempt at providing the __libc_single_threaded
variable.  With __libc_early_init, this becomes quite easy to implement.
I removed the optimization for fork because it did not seem very
important.
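
For readers new to the variable, here is a rough sketch of the kind of
single-thread optimization it is meant to enable.  It is not part of the
series.  The names struct refcounted, ref_init, ref_acquire, ref_release,
ref_count and the SINGLE_THREADED macro are made up for illustration, and
the fallback that treats the process as multi-threaded when
<sys/single_threaded.h> is missing is my own assumption:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumption: if the header is unavailable, behave as if the process
   were always multi-threaded.  */
#if defined __has_include
# if __has_include (<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define SINGLE_THREADED() (__libc_single_threaded != 0)
# endif
#endif
#ifndef SINGLE_THREADED
# define SINGLE_THREADED() false
#endif

/* Hypothetical reference-counted object.  */
struct refcounted
{
  atomic_uint refs;
};

static void
ref_init (struct refcounted *obj)
{
  atomic_init (&obj->refs, 1);
}

static unsigned int
ref_count (struct refcounted *obj)
{
  return atomic_load_explicit (&obj->refs, memory_order_relaxed);
}

static void
ref_acquire (struct refcounted *obj)
{
  if (SINGLE_THREADED ())
    {
      /* No other thread exists, so a plain load and store suffice.  */
      unsigned int n = atomic_load_explicit (&obj->refs,
                                             memory_order_relaxed);
      atomic_store_explicit (&obj->refs, n + 1, memory_order_relaxed);
    }
  else
    atomic_fetch_add_explicit (&obj->refs, 1, memory_order_relaxed);
}

/* Returns true if the last reference was dropped.  */
static bool
ref_release (struct refcounted *obj)
{
  unsigned int old;
  if (SINGLE_THREADED ())
    {
      old = atomic_load_explicit (&obj->refs, memory_order_relaxed);
      assert (old > 0);
      atomic_store_explicit (&obj->refs, old - 1, memory_order_relaxed);
    }
  else
    {
      old = atomic_fetch_sub_explicit (&obj->refs, 1, memory_order_acq_rel);
      assert (old > 0);
    }
  return old == 1;
}
```

The flag is read again on every call instead of being cached, because
the value can change from true to false once a thread is created.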

Tested on x86_64-linux-gnu and i686-linux-gnu.  Built with
build-many-glibcs.py.

Florian Weimer (2):
  Add the __libc_single_threaded variable
  manual: Document __libc_single_threaded

 NEWS                                          |   6 +
 elf/Makefile                                  |  33 +++-
 elf/libc_early_init.c                         |   4 +
 elf/tst-single_threaded-mod1.c                |  25 +++
 elf/tst-single_threaded-mod2.c                |  25 +++
 elf/tst-single_threaded-mod3.c                |  25 +++
 elf/tst-single_threaded-mod4.c                |  25 +++
 elf/tst-single_threaded-pthread-static.c      |  86 +++++++++
 elf/tst-single_threaded-pthread.c             | 174 ++++++++++++++++++
 elf/tst-single_threaded-static-dlopen.c       |  56 ++++++
 elf/tst-single_threaded-static.c              |  29 +++
 elf/tst-single_threaded.c                     |  70 +++++++
 htl/pt-create.c                               |   5 +
 include/sys/single_threaded.h                 |   1 +
 manual/threads.texi                           |  89 +++++++++
 misc/Makefile                                 |   5 +-
 misc/Versions                                 |   3 +
 misc/single_threaded.c                        |  27 +++
 misc/sys/single_threaded.h                    |  33 ++++
 nptl/pthread_create.c                         |   5 +
 sysdeps/generic/ldsodefs.h                    |   5 +
 sysdeps/generic/libc.abilist                  |   1 +
 sysdeps/mach/hurd/i386/libc.abilist           |   1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   1 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 53 files changed, 755 insertions(+), 8 deletions(-)
 create mode 100644 elf/tst-single_threaded-mod1.c
 create mode 100644 elf/tst-single_threaded-mod2.c
 create mode 100644 elf/tst-single_threaded-mod3.c
 create mode 100644 elf/tst-single_threaded-mod4.c
 create mode 100644 elf/tst-single_threaded-pthread-static.c
 create mode 100644 elf/tst-single_threaded-pthread.c
 create mode 100644 elf/tst-single_threaded-static-dlopen.c
 create mode 100644 elf/tst-single_threaded-static.c
 create mode 100644 elf/tst-single_threaded.c
 create mode 100644 include/sys/single_threaded.h
 create mode 100644 misc/single_threaded.c
 create mode 100644 misc/sys/single_threaded.h

-- 
2.25.4



* [PATCH 1/2] Add the __libc_single_threaded variable
  2020-05-20 18:12 [PATCH 0/2] Add __libc_single_threaded Florian Weimer via Libc-alpha
@ 2020-05-20 18:12 ` Florian Weimer via Libc-alpha
  2020-05-21 13:07   ` Szabolcs Nagy
  2020-05-20 18:12 ` [PATCH 2/2] manual: Document __libc_single_threaded Florian Weimer via Libc-alpha
  1 sibling, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-20 18:12 UTC (permalink / raw)
  To: libc-alpha

The variable is placed in libc.so, and it can be true only in
an outer libc, not libcs loaded via dlmopen or static dlopen.
Since thread creation from inner namespaces does not work,
pthread_create can update __libc_single_threaded directly.

Using __libc_early_init and its initial flag, implementation of this
variable is very straightforward.  A future version may reset the flag
during fork (but not in an inner namespace), or after joining all
threads except one.
---
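Note (not part of the commit): the behavior described above can be
observed with a small program along the following lines.  This is only
a sketch.  struct observation, record_in_thread, observe_flag and the
SINGLE_THREADED macro are hypothetical names, the fallback for a missing
header is an assumption, and older toolchains may need -pthread to link:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: if the header is unavailable, behave as if the process
   were always multi-threaded.  */
#if defined __has_include
# if __has_include (<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define SINGLE_THREADED() (__libc_single_threaded != 0)
# endif
#endif
#ifndef SINGLE_THREADED
# define SINGLE_THREADED() false
#endif

/* Values of the flag sampled at different points in time.  */
struct observation
{
  bool before_create;
  bool in_thread;
  bool after_create;
  bool after_join;
};

/* Runs in the second thread and records what it sees there.  */
static void *
record_in_thread (void *closure)
{
  struct observation *obs = closure;
  obs->in_thread = SINGLE_THREADED ();
  return NULL;
}

static struct observation
observe_flag (void)
{
  struct observation obs = { 0 };
  pthread_t thr;

  obs.before_create = SINGLE_THREADED ();

  int ret = pthread_create (&thr, NULL, record_in_thread, &obs);
  assert (ret == 0);
  /* pthread_create has cleared the flag by now.  */
  obs.after_create = SINGLE_THREADED ();

  ret = pthread_join (thr, NULL);
  assert (ret == 0);
  /* Unspecified: the flag may or may not be reset after the join.  */
  obs.after_join = SINGLE_THREADED ();

  return obs;
}
```

With this patch, before_create is expected to be true in a program that
has not yet created any threads, in_thread and after_create are false,
and after_join stays false.  A future version may report true after the
join, so callers should not depend on either value.
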
 NEWS                                          |   6 +
 elf/Makefile                                  |  33 +++-
 elf/libc_early_init.c                         |   4 +
 elf/tst-single_threaded-mod1.c                |  25 +++
 elf/tst-single_threaded-mod2.c                |  25 +++
 elf/tst-single_threaded-mod3.c                |  25 +++
 elf/tst-single_threaded-mod4.c                |  25 +++
 elf/tst-single_threaded-pthread-static.c      |  86 +++++++++
 elf/tst-single_threaded-pthread.c             | 174 ++++++++++++++++++
 elf/tst-single_threaded-static-dlopen.c       |  56 ++++++
 elf/tst-single_threaded-static.c              |  29 +++
 elf/tst-single_threaded.c                     |  70 +++++++
 htl/pt-create.c                               |   5 +
 include/sys/single_threaded.h                 |   1 +
 misc/Makefile                                 |   5 +-
 misc/Versions                                 |   3 +
 misc/single_threaded.c                        |  27 +++
 misc/sys/single_threaded.h                    |  33 ++++
 nptl/pthread_create.c                         |   5 +
 sysdeps/generic/ldsodefs.h                    |   5 +
 sysdeps/generic/libc.abilist                  |   1 +
 sysdeps/mach/hurd/i386/libc.abilist           |   1 +
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |   1 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/arm/be/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/arm/le/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/csky/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |   1 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |   1 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |   1 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |   1 +
 .../sysv/linux/microblaze/be/libc.abilist     |   1 +
 .../sysv/linux/microblaze/le/libc.abilist     |   1 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |   1 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |   1 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |   1 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |   1 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |   1 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |   1 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |   1 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |   1 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |   1 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |   1 +
 .../unix/sysv/linux/s390/s390-32/libc.abilist |   1 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |   1 +
 sysdeps/unix/sysv/linux/sh/be/libc.abilist    |   1 +
 sysdeps/unix/sysv/linux/sh/le/libc.abilist    |   1 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |   1 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |   1 +
 .../unix/sysv/linux/x86_64/64/libc.abilist    |   1 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |   1 +
 52 files changed, 666 insertions(+), 8 deletions(-)
 create mode 100644 elf/tst-single_threaded-mod1.c
 create mode 100644 elf/tst-single_threaded-mod2.c
 create mode 100644 elf/tst-single_threaded-mod3.c
 create mode 100644 elf/tst-single_threaded-mod4.c
 create mode 100644 elf/tst-single_threaded-pthread-static.c
 create mode 100644 elf/tst-single_threaded-pthread.c
 create mode 100644 elf/tst-single_threaded-static-dlopen.c
 create mode 100644 elf/tst-single_threaded-static.c
 create mode 100644 elf/tst-single_threaded.c
 create mode 100644 include/sys/single_threaded.h
 create mode 100644 misc/single_threaded.c
 create mode 100644 misc/sys/single_threaded.h

diff --git a/NEWS b/NEWS
index b7c229b32a..0230f0a58e 100644
--- a/NEWS
+++ b/NEWS
@@ -27,6 +27,12 @@ Major new features:
   several APIs have been annotated with GCC 'access' attribute.  This
   should help GCC 10 issue better warnings.
 
+* The GNU C Library now provides the header file <sys/single_threaded.h>
+  which declares the variable __libc_single_threaded.  Applications are
+  encouraged to use this variable for single-thread optimizations,
+  instead of weak references to symbols historically defined in
+  libpthread.
+
 Deprecated and removed features, and other changes affecting compatibility:
 
 * The deprecated <sys/sysctl.h> header and the sysctl function have been
diff --git a/elf/Makefile b/elf/Makefile
index 6fe1df90bb..81a696c3ef 100644
--- a/elf/Makefile
+++ b/elf/Makefile
@@ -155,7 +155,9 @@ endif
 tests-static-normal := tst-leaks1-static tst-array1-static tst-array5-static \
 	       tst-dl-iter-static \
 	       tst-tlsalign-static tst-tlsalign-extern-static \
-	       tst-linkall-static tst-env-setuid tst-env-setuid-tunables
+	       tst-linkall-static tst-env-setuid tst-env-setuid-tunables \
+	       tst-single_threaded-static tst-single_threaded-pthread-static
+
 tests-static-internal := tst-tls1-static tst-tls2-static \
 	       tst-ptrguard1-static tst-stackguard1-static \
 	       tst-tls1-static-non-pie tst-libc_dlvsym-static
@@ -174,9 +176,11 @@ tests-internal := tst-tls1 tst-tls2 $(tests-static-internal)
 tests-static := $(tests-static-normal) $(tests-static-internal)
 
 ifeq (yes,$(build-shared))
-tests-static += tst-tls9-static
-tst-tls9-static-ENV = \
-       LD_LIBRARY_PATH=$(objpfx):$(common-objpfx):$(common-objpfx)dlfcn
+tests-static += tst-tls9-static tst-single_threaded-static-dlopen
+static-dlopen-environment = \
+  LD_LIBRARY_PATH=$(objpfx):$(common-objpfx):$(common-objpfx)dlfcn
+tst-tls9-static-ENV = $(static-dlopen-environment)
+tst-single_threaded-static-dlopen-ENV = $(static-dlopen-environment)
 
 tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 constload1 order noload filter \
@@ -204,7 +208,8 @@ tests += restest1 preloadtest loadfail multiload origtest resolvfail \
 	 tst-dlopen-self tst-auditmany tst-initfinilazyfail tst-dlopenfail \
 	 tst-dlopenfail-2 \
 	 tst-filterobj tst-filterobj-dlopen tst-auxobj tst-auxobj-dlopen \
-	 tst-audit14 tst-audit15 tst-audit16
+	 tst-audit14 tst-audit15 tst-audit16 \
+	 tst-single_threaded tst-single_threaded-pthread
 #	 reldep9
 tests-internal += loadtest unload unload2 circleload1 \
 	 neededtest neededtest2 neededtest3 neededtest4 \
@@ -317,7 +322,9 @@ modules-names = testobj1 testobj2 testobj3 testobj4 testobj5 testobj6 \
 		tst-dlopenfailmod1 tst-dlopenfaillinkmod tst-dlopenfailmod2 \
 		tst-dlopenfailmod3 tst-ldconfig-ld-mod \
 		tst-filterobj-flt tst-filterobj-aux tst-filterobj-filtee \
-		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3
+		tst-auditlogmod-1 tst-auditlogmod-2 tst-auditlogmod-3 \
+		tst-single_threaded-mod1 tst-single_threaded-mod2 \
+		tst-single_threaded-mod3 tst-single_threaded-mod4
 # Most modules build with _ISOMAC defined, but those filtered out
 # depend on internal headers.
 modules-names-tests = $(filter-out ifuncmod% tst-libc_dlvsym-dso tst-tlsmod%,\
@@ -1748,3 +1755,17 @@ $(objpfx)tst-auxobj: $(objpfx)tst-filterobj-aux.so
 $(objpfx)tst-auxobj-dlopen: $(libdl)
 $(objpfx)tst-auxobj.out: $(objpfx)tst-filterobj-filtee.so
 $(objpfx)tst-auxobj-dlopen.out: $(objpfx)tst-filterobj-filtee.so
+
+$(objpfx)tst-single_threaded: $(objpfx)tst-single_threaded-mod1.so $(libdl)
+$(objpfx)tst-single_threaded.out: \
+  $(objpfx)tst-single_threaded-mod2.so $(objpfx)tst-single_threaded-mod3.so
+$(objpfx)tst-single_threaded-static-dlopen: \
+  $(objpfx)tst-single_threaded-mod1.o $(common-objpfx)dlfcn/libdl.a
+$(objpfx)tst-single_threaded-static-dlopen.out: \
+  $(objpfx)tst-single_threaded-mod2.so
+$(objpfx)tst-single_threaded-pthread: \
+  $(objpfx)tst-single_threaded-mod1.so $(libdl) $(shared-thread-library)
+$(objpfx)tst-single_threaded-pthread.out: \
+  $(objpfx)tst-single_threaded-mod2.so $(objpfx)tst-single_threaded-mod3.so \
+  $(objpfx)tst-single_threaded-mod4.so
+$(objpfx)tst-single_threaded-pthread-static: $(static-thread-library)
diff --git a/elf/libc_early_init.c b/elf/libc_early_init.c
index e6c64fb526..725ab2f811 100644
--- a/elf/libc_early_init.c
+++ b/elf/libc_early_init.c
@@ -18,10 +18,14 @@
 
 #include <ctype.h>
 #include <libc-early-init.h>
+#include <sys/single_threaded.h>
 
 void
 __libc_early_init (_Bool initial)
 {
   /* Initialize ctype data.  */
   __ctype_init ();
+
+  /* Only the outer namespace is marked as single-threaded.  */
+  __libc_single_threaded = initial;
 }
diff --git a/elf/tst-single_threaded-mod1.c b/elf/tst-single_threaded-mod1.c
new file mode 100644
index 0000000000..9fe94b2526
--- /dev/null
+++ b/elf/tst-single_threaded-mod1.c
@@ -0,0 +1,25 @@
+/* Test support for single-thread optimizations.  Shared object 1.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_1 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-mod2.c b/elf/tst-single_threaded-mod2.c
new file mode 100644
index 0000000000..a5166c9ebc
--- /dev/null
+++ b/elf/tst-single_threaded-mod2.c
@@ -0,0 +1,25 @@
+/* Test support for single-thread optimizations.  Shared object 2.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_2 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-mod3.c b/elf/tst-single_threaded-mod3.c
new file mode 100644
index 0000000000..53df13e3a7
--- /dev/null
+++ b/elf/tst-single_threaded-mod3.c
@@ -0,0 +1,25 @@
+/* Test support for single-thread optimizations.  Shared object 3.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_3 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-mod4.c b/elf/tst-single_threaded-mod4.c
new file mode 100644
index 0000000000..3bf5e555a4
--- /dev/null
+++ b/elf/tst-single_threaded-mod4.c
@@ -0,0 +1,25 @@
+/* Test support for single-thread optimizations.  Shared object 4.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+_Bool
+single_threaded_4 (void)
+{
+  return __libc_single_threaded;
+}
diff --git a/elf/tst-single_threaded-pthread-static.c b/elf/tst-single_threaded-pthread-static.c
new file mode 100644
index 0000000000..780564c40c
--- /dev/null
+++ b/elf/tst-single_threaded-pthread-static.c
@@ -0,0 +1,86 @@
+/* Test support for single-thread optimizations.  With threads, static version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This test is a stripped-down version of
+   tst-single_threaded-pthread.c, without any loading of dynamic
+   objects.  */
+
+#include <stdio.h>
+#include <support/check.h>
+#include <support/xthread.h>
+#include <sys/single_threaded.h>
+
+/* First barrier synchronizes main thread, thread 1, thread 2.  */
+static pthread_barrier_t barrier1;
+
+/* Second barrier synchronizes main thread, thread 2.  */
+static pthread_barrier_t barrier2;
+
+static void *
+threadfunc (void *closure)
+{
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* Wait for the main thread and the other thread.  */
+  xpthread_barrier_wait (&barrier1);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* Second thread waits on second barrier, too.  */
+  if (closure != NULL)
+    xpthread_barrier_wait (&barrier2);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  return NULL;
+}
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+
+  /* Two threads plus main thread.  */
+  xpthread_barrier_init (&barrier1, NULL, 3);
+
+  /* Main thread and second thread.  */
+  xpthread_barrier_init (&barrier2, NULL, 2);
+
+  pthread_t thr1 = xpthread_create (NULL, threadfunc, NULL);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  pthread_t thr2 = xpthread_create (NULL, threadfunc, &thr2);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  xpthread_barrier_wait (&barrier1);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* Join first thread.  This should not bring us back into
+     single-threaded mode.  */
+  xpthread_join (thr1);
+  TEST_VERIFY (!__libc_single_threaded);
+
+  /* We may be back in single-threaded mode after joining both
+     threads, but this is not guaranteed.  */
+  xpthread_barrier_wait (&barrier2);
+  xpthread_join (thr2);
+  printf ("info: __libc_single_threaded after joining all threads: %d\n",
+          __libc_single_threaded);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded-pthread.c b/elf/tst-single_threaded-pthread.c
new file mode 100644
index 0000000000..c02f4047d1
--- /dev/null
+++ b/elf/tst-single_threaded-pthread.c
@@ -0,0 +1,174 @@
+/* Test support for single-thread optimizations.  With threads.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <support/check.h>
+#include <support/namespace.h>
+#include <support/xdlfcn.h>
+#include <support/xthread.h>
+#include <sys/single_threaded.h>
+
+/* First barrier synchronizes main thread, thread 1, thread 2.  */
+static pthread_barrier_t barrier1;
+
+/* Second barrier synchronizes main thread, thread 2.  */
+static pthread_barrier_t barrier2;
+
+/* Defined in tst-single_threaded-mod1.so.  */
+_Bool single_threaded_1 (void);
+
+/* Initialized via dlsym.  */
+static _Bool (*single_threaded_2) (void);
+static _Bool (*single_threaded_3) (void);
+static _Bool (*single_threaded_4) (void);
+
+static void *
+threadfunc (void *closure)
+{
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+
+  /* Wait until the main thread loads more functions.  */
+  xpthread_barrier_wait (&barrier1);
+
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+
+  /* Second thread waits on second barrier, too.  */
+  if (closure != NULL)
+    xpthread_barrier_wait (&barrier2);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+
+  return NULL;
+}
+
+/* Used for closure arguments to the subprocess function.  */
+static char expected_false = 0;
+static char expected_true = 1;
+
+/* A subprocess currently inherits the single-threaded state
+   of the parent process.  */
+static void
+subprocess (void *closure)
+{
+  const char *expected = closure;
+  TEST_COMPARE (__libc_single_threaded, *expected);
+  TEST_COMPARE (single_threaded_1 (), *expected);
+  if (single_threaded_2 != NULL)
+    TEST_COMPARE (single_threaded_2 (), *expected);
+  if (single_threaded_3 != NULL)
+    TEST_COMPARE (single_threaded_3 (), *expected);
+  if (single_threaded_4 != NULL)
+    TEST_VERIFY (!single_threaded_4 ());
+}
+
+static int
+do_test (void)
+{
+  printf ("info: main __libc_single_threaded address: %p\n",
+          &__libc_single_threaded);
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  support_isolate_in_subprocess (subprocess, &expected_true);
+
+  void *handle_mod2 = xdlopen ("tst-single_threaded-mod2.so", RTLD_LAZY);
+  single_threaded_2 = xdlsym (handle_mod2, "single_threaded_2");
+  TEST_VERIFY (single_threaded_2 ());
+
+  /* Two threads plus main thread.  */
+  xpthread_barrier_init (&barrier1, NULL, 3);
+
+  /* Main thread and second thread.  */
+  xpthread_barrier_init (&barrier2, NULL, 2);
+
+  pthread_t thr1 = xpthread_create (NULL, threadfunc, NULL);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  support_isolate_in_subprocess (subprocess, &expected_false);
+
+  pthread_t thr2 = xpthread_create (NULL, threadfunc, &thr2);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  support_isolate_in_subprocess (subprocess, &expected_false);
+
+  /* Delayed library load, while already multi-threaded.  */
+  void *handle_mod3 = xdlopen ("tst-single_threaded-mod3.so", RTLD_LAZY);
+  single_threaded_3 = xdlsym (handle_mod3, "single_threaded_3");
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  support_isolate_in_subprocess (subprocess, &expected_false);
+
+  /* Same with dlmopen.  */
+  void *handle_mod4 = dlmopen (LM_ID_NEWLM, "tst-single_threaded-mod4.so",
+                               RTLD_LAZY);
+  single_threaded_4 = xdlsym (handle_mod4, "single_threaded_4");
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+  support_isolate_in_subprocess (subprocess, &expected_false);
+
+  /* Run the newly loaded functions from the other threads as
+     well.  */
+  xpthread_barrier_wait (&barrier1);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+  support_isolate_in_subprocess (subprocess, &expected_false);
+
+  /* Join first thread.  This should not bring us back into
+     single-threaded mode.  */
+  xpthread_join (thr1);
+  TEST_VERIFY (!__libc_single_threaded);
+  TEST_VERIFY (!single_threaded_1 ());
+  TEST_VERIFY (!single_threaded_2 ());
+  TEST_VERIFY (!single_threaded_3 ());
+  TEST_VERIFY (!single_threaded_4 ());
+  support_isolate_in_subprocess (subprocess, &expected_false);
+
+  /* We may be back in single-threaded mode after joining both
+     threads, but this is not guaranteed.  */
+  xpthread_barrier_wait (&barrier2);
+  xpthread_join (thr2);
+  printf ("info: __libc_single_threaded after joining all threads: %d\n",
+          __libc_single_threaded);
+
+  xdlclose (handle_mod4);
+  xdlclose (handle_mod3);
+  xdlclose (handle_mod2);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded-static-dlopen.c b/elf/tst-single_threaded-static-dlopen.c
new file mode 100644
index 0000000000..f270cf452e
--- /dev/null
+++ b/elf/tst-single_threaded-static-dlopen.c
@@ -0,0 +1,56 @@
+/* Test support for single-thread optimizations.  No threads, static dlopen.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* In a static dlopen scenario, the single-threaded optimization is
+   not possible because there is no globally shared dynamic linker
+   across all namespaces.  */
+
+#include <stddef.h>
+#include <support/check.h>
+#include <support/xdlfcn.h>
+#include <sys/single_threaded.h>
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+
+  /* Defined in tst-single_threaded-mod1.o.  */
+  extern _Bool single_threaded_1 (void);
+  TEST_VERIFY (single_threaded_1 ());
+
+  /* A failed dlopen does not leave single-threaded mode.  */
+  TEST_VERIFY (dlopen ("tst-single_threaded-does-not-exist.so", RTLD_LAZY)
+               == NULL);
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+
+  void *handle_mod2 = xdlopen ("tst-single_threaded-mod2.so", RTLD_LAZY);
+  _Bool (*single_threaded_2) (void)
+    = xdlsym (handle_mod2, "single_threaded_2");
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  /* The inner libc always assumes multi-threaded use.  */
+  TEST_VERIFY (!single_threaded_2 ());
+
+  xdlclose (handle_mod2);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded-static.c b/elf/tst-single_threaded-static.c
new file mode 100644
index 0000000000..29d7ab2731
--- /dev/null
+++ b/elf/tst-single_threaded-static.c
@@ -0,0 +1,29 @@
+/* Test support for single-thread optimizations.  Static, no threads.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <support/check.h>
+#include <sys/single_threaded.h>
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/elf/tst-single_threaded.c b/elf/tst-single_threaded.c
new file mode 100644
index 0000000000..478c2dc259
--- /dev/null
+++ b/elf/tst-single_threaded.c
@@ -0,0 +1,70 @@
+/* Test support for single-thread optimizations.  No threads.
+   Copyright (C) 2020 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stddef.h>
+#include <stdio.h>
+#include <support/check.h>
+#include <support/namespace.h>
+#include <support/xdlfcn.h>
+#include <sys/single_threaded.h>
+
+/* Defined in tst-single_threaded-mod1.so.  */
+extern _Bool single_threaded_1 (void);
+
+/* Initialized via dlsym.  */
+_Bool (*single_threaded_2) (void);
+_Bool (*single_threaded_3) (void);
+
+static void
+subprocess (void *closure)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  if (single_threaded_2 != NULL)
+    TEST_VERIFY (single_threaded_2 ());
+  if (single_threaded_3 != NULL)
+    TEST_VERIFY (!single_threaded_3 ());
+}
+
+static int
+do_test (void)
+{
+  TEST_VERIFY (__libc_single_threaded);
+  TEST_VERIFY (single_threaded_1 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  void *handle_mod2 = xdlopen ("tst-single_threaded-mod2.so", RTLD_LAZY);
+  single_threaded_2 = xdlsym (handle_mod2, "single_threaded_2");
+  TEST_VERIFY (single_threaded_2 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  /* The current implementation treats the inner namespace as
+     multi-threaded.  */
+  void *handle_mod3 = dlmopen (LM_ID_NEWLM, "tst-single_threaded-mod3.so",
+                               RTLD_LAZY);
+  single_threaded_3 = xdlsym (handle_mod3, "single_threaded_3");
+  TEST_VERIFY (!single_threaded_3 ());
+  support_isolate_in_subprocess (subprocess, NULL);
+
+  xdlclose (handle_mod3);
+  xdlclose (handle_mod2);
+
+  return 0;
+}
+
+#include <support/test-driver.c>
diff --git a/htl/pt-create.c b/htl/pt-create.c
index f501a12017..7ac875cbf7 100644
--- a/htl/pt-create.c
+++ b/htl/pt-create.c
@@ -24,6 +24,7 @@
 
 #include <atomic.h>
 #include <hurd/resource.h>
+#include <sys/single_threaded.h>
 
 #include <pt-internal.h>
 #include <pthreadP.h>
@@ -104,6 +105,10 @@ __pthread_create_internal (struct __pthread **thread,
   sigset_t sigset;
   size_t stacksize;
 
+  /* Avoid a data race in the multi-threaded case.  */
+  if (__libc_single_threaded)
+    __libc_single_threaded = 0;
+
   /* Allocate a new thread structure.  */
   err = __pthread_alloc (&pthread);
   if (err)
diff --git a/include/sys/single_threaded.h b/include/sys/single_threaded.h
new file mode 100644
index 0000000000..18f6972482
--- /dev/null
+++ b/include/sys/single_threaded.h
@@ -0,0 +1 @@
+#include <misc/sys/single_threaded.h>
diff --git a/misc/Makefile b/misc/Makefile
index 67c5237f97..58959f6913 100644
--- a/misc/Makefile
+++ b/misc/Makefile
@@ -37,7 +37,8 @@ headers	:= sys/uio.h bits/uio-ext.h bits/uio_lim.h \
 	   bits/syslog.h bits/syslog-ldbl.h bits/syslog-path.h bits/error.h \
 	   bits/select2.h bits/hwcap.h sys/auxv.h \
 	   sys/sysmacros.h bits/sysmacros.h bits/types/struct_iovec.h \
-	   bits/err-ldbl.h bits/error-ldbl.h
+	   bits/err-ldbl.h bits/error-ldbl.h \
+	   sys/single_threaded.h
 
 routines := brk sbrk sstk ioctl \
 	    readv writev preadv preadv64 pwritev pwritev64 \
@@ -72,7 +73,7 @@ routines := brk sbrk sstk ioctl \
 	    fgetxattr flistxattr fremovexattr fsetxattr getxattr \
 	    listxattr lgetxattr llistxattr lremovexattr lsetxattr \
 	    removexattr setxattr getauxval ifunc-impl-list makedev \
-	    allocate_once fd_to_filename
+	    allocate_once fd_to_filename single_threaded
 
 generated += tst-error1.mtrace tst-error1-mem.out \
   tst-allocate_once.mtrace tst-allocate_once-mem.out
diff --git a/misc/Versions b/misc/Versions
index e749582369..95666f6548 100644
--- a/misc/Versions
+++ b/misc/Versions
@@ -161,6 +161,9 @@ libc {
   GLIBC_2.30 {
     twalk_r;
   }
+  GLIBC_2.32 {
+    __libc_single_threaded;
+  }
   GLIBC_PRIVATE {
     __madvise;
     __mktemp;
diff --git a/misc/single_threaded.c b/misc/single_threaded.c
new file mode 100644
index 0000000000..d7c55b784b
--- /dev/null
+++ b/misc/single_threaded.c
@@ -0,0 +1,27 @@
+/* Support for single-thread optimizations.  Statically linked version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/single_threaded.h>
+
+/* In dynamically linked programs, this variable is initialized in
+   __libc_early_init (as false for inner libcs).  */
+#ifdef SHARED
+char __libc_single_threaded;
+#else
+char __libc_single_threaded = 1;
+#endif
diff --git a/misc/sys/single_threaded.h b/misc/sys/single_threaded.h
new file mode 100644
index 0000000000..c721141d35
--- /dev/null
+++ b/misc/sys/single_threaded.h
@@ -0,0 +1,33 @@
+/* Support for single-thread optimizations.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_SINGLE_THREADED_H
+#define _SYS_SINGLE_THREADED_H
+
+#include <features.h>
+
+__BEGIN_DECLS
+
+/* If this variable is non-zero, then the current thread is the only
+   thread in the process image.  If it is zero, the process can be
+   multi-threaded.  */
+extern char __libc_single_threaded;
+
+__END_DECLS
+
+#endif /* _SYS_SINGLE_THREADED_H */
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index a43089065c..6fffe07ffa 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -34,6 +34,7 @@
 #include <futex-internal.h>
 #include <tls-setup.h>
 #include "libioP.h"
+#include <sys/single_threaded.h>
 
 #include <shlib-compat.h>
 
@@ -611,6 +612,10 @@ __pthread_create_2_1 (pthread_t *newthread, const pthread_attr_t *attr,
 {
   STACK_VARIABLES;
 
+  /* Avoid a data race in the multi-threaded case.  */
+  if (__libc_single_threaded)
+    __libc_single_threaded = 0;
+
   const struct pthread_attr *iattr = (struct pthread_attr *) attr;
   struct pthread_attr default_attr;
   bool free_cpuset = false;
diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
index 5ff4a2831b..fa5c200f73 100644
--- a/sysdeps/generic/ldsodefs.h
+++ b/sysdeps/generic/ldsodefs.h
@@ -482,9 +482,11 @@ struct rtld_global
 #   define __rtld_local_attribute__ __attribute__ ((visibility ("hidden")))
 #  endif
 extern struct rtld_global _rtld_local __rtld_local_attribute__;
+extern char __libc_single_threaded_local __rtld_local_attribute__;
 #  undef __rtld_local_attribute__
 # endif
 extern struct rtld_global _rtld_global __rtld_global_attribute__;
+extern char __libc_single_threaded __rtld_global_attribute__;
 # undef __rtld_global_attribute__
 #endif
 
@@ -1124,6 +1126,9 @@ extern struct link_map * _dl_get_dl_main_map (void)
    If libpthread is not linked in, this is an empty function.  */
 void __pthread_initialize_minimal (void) weak_function;
 
+/* Update both copies of __libc_single_threaded.  */
+void _dl_single_threaded_update (char value);
+
 /* Allocate memory for static TLS block (unless MEM is nonzero) and dtv.  */
 extern void *_dl_allocate_tls (void *mem);
 rtld_hidden_proto (_dl_allocate_tls)
diff --git a/sysdeps/generic/libc.abilist b/sysdeps/generic/libc.abilist
index e69de29bb2..8ca9b93c2f 100644
--- a/sysdeps/generic/libc.abilist
+++ b/sysdeps/generic/libc.abilist
@@ -0,0 +1 @@
+GLIBC_2.32 __libc_single_threaded D 0x1
diff --git a/sysdeps/mach/hurd/i386/libc.abilist b/sysdeps/mach/hurd/i386/libc.abilist
index 60696d827f..67a98ae86f 100644
--- a/sysdeps/mach/hurd/i386/libc.abilist
+++ b/sysdeps/mach/hurd/i386/libc.abilist
@@ -2181,6 +2181,7 @@ GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 mach_print F
 GLIBC_2.32 thrd_current F
 GLIBC_2.32 thrd_equal F
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 41bb214bb9..c526f8d661 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2146,4 +2146,5 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index 6430af207f..cdc5c27fa9 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2226,6 +2226,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
diff --git a/sysdeps/unix/sysv/linux/arm/be/libc.abilist b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
index f4ea1756d5..18c6901a47 100644
--- a/sysdeps/unix/sysv/linux/arm/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/be/libc.abilist
@@ -133,6 +133,7 @@ GLIBC_2.30 twalk_r F
 GLIBC_2.31 msgctl F
 GLIBC_2.31 semctl F
 GLIBC_2.31 shmctl F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/arm/le/libc.abilist b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
index f1456b26b2..f3bfff868b 100644
--- a/sysdeps/unix/sysv/linux/arm/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/le/libc.abilist
@@ -130,6 +130,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index c54aed2f8e..d61457aef0 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2090,4 +2090,5 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 87373f755b..b9e28741eb 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2047,6 +2047,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index 1bd2e02f79..ae74842837 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2213,6 +2213,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index 07e51d46bf..b7190f60f6 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2079,6 +2079,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 42ea4c24bf..48ab675319 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -134,6 +134,7 @@ GLIBC_2.30 twalk_r F
 GLIBC_2.31 msgctl F
 GLIBC_2.31 semctl F
 GLIBC_2.31 shmctl F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index e9358fb092..125d98b17c 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2159,6 +2159,7 @@ GLIBC_2.30 twalk_r F
 GLIBC_2.31 msgctl F
 GLIBC_2.31 semctl F
 GLIBC_2.31 shmctl F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
index 2cefe739c0..d40ead85d2 100644
--- a/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/be/libc.abilist
@@ -2141,4 +2141,5 @@ GLIBC_2.30 twalk_r F
 GLIBC_2.31 msgctl F
 GLIBC_2.31 semctl F
 GLIBC_2.31 shmctl F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
diff --git a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
index 3474ef1490..fcca528d68 100644
--- a/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/le/libc.abilist
@@ -2138,4 +2138,5 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index a6f99a7369..42a5cd5d08 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2130,6 +2130,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 48222af11c..eef6cdc23f 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2128,6 +2128,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 99965cfb0f..3d970ddd0a 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2136,6 +2136,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 2c8bafc669..5050f84b5d 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2130,6 +2130,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 52cf72052c..7523f06805 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2179,4 +2179,5 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index 2ca5bbccf3..a9b169bf79 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2186,6 +2186,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index e6c4d002d5..59218e2188 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2219,6 +2219,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 82d77b7e48..f06a8add0e 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2049,6 +2049,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 0c2513a4b3..ac57ac82e3 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2276,6 +2276,7 @@ GLIBC_2.32 __isoc99_vsscanfieee128 F
 GLIBC_2.32 __isoc99_vswscanfieee128 F
 GLIBC_2.32 __isoc99_vwscanfieee128 F
 GLIBC_2.32 __isoc99_wscanfieee128 F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 __obstack_printf_chkieee128 F
 GLIBC_2.32 __obstack_printfieee128 F
 GLIBC_2.32 __obstack_vprintf_chkieee128 F
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 234d34929a..128c040b7f 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2108,4 +2108,5 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 1f06cce028..3881fe4e77 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2184,6 +2184,7 @@ GLIBC_2.30 twalk_r F
 GLIBC_2.31 msgctl F
 GLIBC_2.31 semctl F
 GLIBC_2.31 shmctl F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 26c2ce32e5..d62589c337 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2085,6 +2085,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
diff --git a/sysdeps/unix/sysv/linux/sh/be/libc.abilist b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
index 7ad2e920c3..df4abe3400 100644
--- a/sysdeps/unix/sysv/linux/sh/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/be/libc.abilist
@@ -2054,6 +2054,7 @@ GLIBC_2.30 twalk_r F
 GLIBC_2.31 msgctl F
 GLIBC_2.31 semctl F
 GLIBC_2.31 shmctl F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/sh/le/libc.abilist b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
index d2611bf0a5..a4d69cae90 100644
--- a/sysdeps/unix/sysv/linux/sh/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/le/libc.abilist
@@ -2051,6 +2051,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 18a528f0e9..b5dfc8582f 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2175,6 +2175,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index a1d48b0f3c..7f8c73735b 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2102,6 +2102,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 6418ace78a..4484052d63 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2060,6 +2060,7 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index edb9f2f004..b36a994d39 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2159,4 +2159,5 @@ GLIBC_2.30 getdents64 F
 GLIBC_2.30 gettid F
 GLIBC_2.30 tgkill F
 GLIBC_2.30 twalk_r F
+GLIBC_2.32 __libc_single_threaded D 0x1
 GLIBC_2.32 pthread_sigmask F
-- 
2.25.4
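
Illustration only, not part of the patch: the pthread_create hunks above
test the flag before clearing it.  Once a process is multi-threaded,
other threads may be reading the flag, so an unconditional store would
race with those reads; with the test, the store happens only while the
process is still single-threaded.  The sketch below uses a hypothetical
stand-in variable and a store counter (neither exists in glibc) to show
that repeated calls perform at most one store.

```c
#include <assert.h>

/* Hypothetical stand-in for __libc_single_threaded; starts out true.  */
static char single_threaded_flag = 1;

/* Counts stores to the flag, for illustration only.  */
static unsigned int flag_store_count;

/* Mirrors the shape of the check added to pthread_create: write the
   flag only if it is still set, so that later calls merely read it.  */
static void
note_thread_creation (void)
{
  if (single_threaded_flag)
    {
      single_threaded_flag = 0;
      ++flag_store_count;
    }
  assert (single_threaded_flag == 0);
}
```

Calling note_thread_creation several times leaves the flag at zero and
the counter at one, since only the first call takes the store path.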



^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-20 18:12 [PATCH 0/2] Add __libc_single_threaded Florian Weimer via Libc-alpha
  2020-05-20 18:12 ` [PATCH 1/2] Add the __libc_single_threaded variable Florian Weimer via Libc-alpha
@ 2020-05-20 18:12 ` Florian Weimer via Libc-alpha
  2020-05-21  7:52   ` Michael Kerrisk (man-pages) via Libc-alpha
                     ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-20 18:12 UTC (permalink / raw)
  To: libc-alpha; +Cc: Michael Kerrisk

---
 manual/threads.texi | 89 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 89 insertions(+)

diff --git a/manual/threads.texi b/manual/threads.texi
index a425635179..d4c261a0e9 100644
--- a/manual/threads.texi
+++ b/manual/threads.texi
@@ -627,6 +627,7 @@ the standard.
 					  threads in a process.
 * Waiting with Explicit Clocks::          Functions for waiting with an
                                           explicit clock specification.
+* Single-Threaded::     Detecting single-threaded execution.
 @end menu
 
 @node Default Thread Attributes
@@ -771,6 +772,94 @@ Behaves like @code{pthread_timedjoin_np} except that the absolute time in
 @var{abstime} is measured against the clock specified by @var{clockid}.
 @end deftypefun
 
+@node Single-Threaded
+@subsubsection Detecting Single-Threaded Execution
+
+Multi-threaded programs require synchronization among threads.  This
+synchronization can be costly even if there is just a single thread
+and no data is shared between multiple processors.  @Theglibc{} offers
+an interface to detect whether the process is in single-threaded mode.
+Applications can use this information to avoid synchronization, for
+example by using regular instructions to load and store memory instead
+of atomic instructions, or using relaxed memory ordering instead of
+stronger memory ordering.
+
+@deftypevar char __libc_single_threaded
+@standards{GNU, sys/single_threaded.h}
+This variable is non-zero if the current process is definitely
+single-threaded.  If it is zero, the process can be multi-threaded,
+or @theglibc{} cannot determine at this point of the program execution
+whether the process is single-threaded or not.
+
+Applications must never write to this variable.
+@end deftypevar
+
+Most applications should perform the same actions whether or not
+@code{__libc_single_threaded} is true, except with less
+synchronization.  If this rule is followed, a process that
+subsequently becomes multi-threaded is already in a consistent state.
+For example, in order to increment a reference count, the following
+code can be used:
+
+@smallexample
+if (__libc_single_threaded)
+  atomic_fetch_add_explicit (&reference_count, 1, memory_order_relaxed);
+else
+  atomic_fetch_add_explicit (&reference_count, 1, memory_order_acq_rel);
+@end smallexample
+
+This still requires some form of synchronization on the
+single-threaded branch, so it can be beneficial not to declare the
+reference count as @code{_Atomic}, and use the GCC @code{__atomic}
+built-ins.  @xref{__atomic Builtins,, Built-in Functions for Memory
+Model Aware Atomic Operations, gcc, Using the GNU Compiler Collection
+(GCC)}.  Then the code to increment a reference count looks like this:
+
+@smallexample
+if (__libc_single_threaded)
+  ++reference_count;
+else
+  __atomic_fetch_add (&reference_count, 1, __ATOMIC_ACQ_REL);
+@end smallexample
+
+(Depending on the data associated with the reference count, it may be
+possible to use the weaker @code{__ATOMIC_RELAXED} memory ordering on
+the multi-threaded branch.)
+
+Several functions in @theglibc{} can change the value of the
+@code{__libc_single_threaded} variable.  For example, creating new
+threads using the @code{pthread_create} or @code{thrd_create} function
+sets the variable to false.  This can also happen directly, say via a
+call to @code{dlopen}.  Therefore, applications need to make a copy of
+the value of @code{__libc_single_threaded} if after such a function
+call, behavior must match the value as it was before the call, like
+this:
+
+@smallexample
+bool single_threaded = __libc_single_threaded;
+if (single_threaded)
+  prepare_single_threaded ();
+else
+  prepare_multi_thread ();
+
+void *handle = dlopen (shared_library_name, RTLD_NOW);
+lookup_symbols (handle);
+
+if (single_threaded)
+  cleanup_single_threaded ();
+else
+  cleanup_multi_thread ();
+@end smallexample
+
+Since the value of @code{__libc_single_threaded} can change from true
+to false during the execution of the program, it is not useful for
+selecting optimized function implementations in IFUNC resolvers.
+
+Atomic operations can also be used on mappings shared among
+single-threaded processes.  This means that a compiler cannot use
+@code{__libc_single_threaded} to optimize atomic operations, unless it
+is able to prove that the memory is not shared.
+
 @c FIXME these are undocumented:
 @c pthread_atfork
 @c pthread_attr_destroy
-- 
2.25.4
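
Illustration only, not part of the patch: a minimal sketch of the
reference-counting pattern described in the manual text above, assuming
GCC or Clang for the __atomic built-ins.  The struct object type, the
object_ref and object_unref functions, and the SINGLE_THREADED_HINT
macro are hypothetical names.  If <sys/single_threaded.h> is not
available, the sketch falls back to always taking the atomic path.

```c
#include <assert.h>
#include <stdbool.h>

#if defined __has_include
# if __has_include (<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define SINGLE_THREADED_HINT __libc_single_threaded
# endif
#endif
#ifndef SINGLE_THREADED_HINT
/* Header not available: always take the atomic path.  */
# define SINGLE_THREADED_HINT 0
#endif

/* Hypothetical reference-counted object.  The counter is a plain
   integer so that the single-threaded branch can use ordinary
   loads and stores.  */
struct object
{
  unsigned int refcount;
};

/* Acquire one more reference.  */
static void
object_ref (struct object *obj)
{
  if (SINGLE_THREADED_HINT)
    ++obj->refcount;
  else
    __atomic_fetch_add (&obj->refcount, 1, __ATOMIC_ACQ_REL);
}

/* Drop one reference.  Returns true if it was the last one.  */
static bool
object_unref (struct object *obj)
{
  assert (__atomic_load_n (&obj->refcount, __ATOMIC_RELAXED) > 0);
  if (SINGLE_THREADED_HINT)
    return --obj->refcount == 0;
  return __atomic_sub_fetch (&obj->refcount, 1, __ATOMIC_ACQ_REL) == 0;
}
```

Both branches perform the same logical update, so a process that later
becomes multi-threaded finds the counter in a consistent state.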


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-20 18:12 ` [PATCH 2/2] manual: Document __libc_single_threaded Florian Weimer via Libc-alpha
@ 2020-05-21  7:52   ` Michael Kerrisk (man-pages) via Libc-alpha
  2020-05-21 12:17     ` Florian Weimer via Libc-alpha
  2020-05-21 11:18   ` Szabolcs Nagy
  2020-05-21 12:50   ` Adhemerval Zanella via Libc-alpha
  2 siblings, 1 reply; 45+ messages in thread
From: Michael Kerrisk (man-pages) via Libc-alpha @ 2020-05-21  7:52 UTC (permalink / raw)
  To: Florian Weimer, libc-alpha; +Cc: mtk.manpages

On 5/20/20 8:12 PM, Florian Weimer wrote:
> ---
>  manual/threads.texi | 89 +++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 89 insertions(+)
> 
> diff --git a/manual/threads.texi b/manual/threads.texi
> index a425635179..d4c261a0e9 100644
> --- a/manual/threads.texi
> +++ b/manual/threads.texi
> @@ -627,6 +627,7 @@ the standard.
>  					  threads in a process.
>  * Waiting with Explicit Clocks::          Functions for waiting with an
>                                            explicit clock specification.
> +* Single-Threaded::     Detecting single-threaded execution.
>  @end menu
>  
>  @node Default Thread Attributes
> @@ -771,6 +772,94 @@ Behaves like @code{pthread_timedjoin_np} except that the absolute time in
>  @var{abstime} is measured against the clock specified by @var{clockid}.
>  @end deftypefun
>  
> +@node Single-Threaded
> +@subsubsection Detecting Single-Threaded Execution
> +
> +Multi-threaded programs require synchronization among threads.  This
> +synchronization can be costly even if there is just a single thread
> +and no data is shared between multiple processors.  @Theglibc{} offers
> +an interface to detect whether the process is in single-threaded mode.
> +Applications can use this information to avoid synchronization, for
> +example by using regular instructions to load and store memory instead
> +of atomic instructions, or using relaxed memory ordering instead of
> +stronger memory ordering.
> +
> +@deftypevar char __libc_single_threaded
> +@standards{GNU, sys/single_threaded.h}
> +This variable is non-zero if the current process is definitely
> +single-threaded.  If it is zero, the process can be multi-threaded,

s/can be/may be/

> +or @theglibc{} cannot determine at this point of the program execution
> +whether the process is single-threaded or not.
> +
> +Applications must never write to this variable.
> +@end deftypevar
> +
> +Most applications should perform the same actions whether or not
> +@code{__libc_single_threaded} is true, except with less
> +synchronization.  If this rule is followed, a process that
> +subsequently becomes multi-threaded is already in a consistent state.
> +For example, in order to increment a reference count, the following
> +code can be used:
> +
> +@smallexample
> +if (__libc_single_threaded)
> +  atomic_fetch_add_explicit (&reference_count, 1, memory_order_relaxed);
> +else
> +  atomic_fetch_add_explicit (&reference_count, 1, memory_order_acq_rel);
> +@end smallexample
> +
> +This still requires some form of synchronization on the
> +single-threaded branch, so it can be beneficial not to declare the
> +reference count as @code{_Atomic}, and use the GCC @code{__atomic}
> +built-ins.  @xref{__atomic Builtins,, Built-in Functions for Memory
> +Model Aware Atomic Operations, gcc, Using the GNU Compiler Collection
> +(GCC)}.  Then the code to increment a reference count looks like this:
> +
> +@smallexample
> +if (__libc_single_threaded)
> +  ++refeference_count;inf
> +else
> +  __atomic_fetch_add (&reference_count, 1, __ATOMIC_ACQ_REL);
> +@end smallexample
> +
> +(Depending on the data associated with the reference count, it may be
> +possible to use the weaker @code{__ATOMIC_RELAXED} memory ordering on
> +the multi-threaded branch.)
> +
> +Several functions in @theglibc{} can change the value of the
> +@code{__libc_single_threaded} variable.  For example, creating new
> +threads using the @code{pthread_create} or @code{thrd_create} function
> +sets the variable to false.  This can also happen directly, say via a

s/directly/indirectly/ ?

> +call to @code{dlopen}.  Therefore, applications need to make a copy of
> +the value of @code{__libc_single_threaded} if after such a function
> +call, behavior must match the value as it was before the call, like
> +this:
> +
> +@smallexample
> +bool single_threaded = __libc_single_threaded;
> +if (single_threaded)
> +  prepare_single_threaded ();
> +else
> +  prepare_multi_thread ();
> +
> +void *handle = dlopen (shared_library_name, RTLD_NOW);
> +lookup_symbols (handle);
> +
> +if (single_threaded)
> +  cleanup_single_threaded ();
> +else
> +  cleanup_multi_thread ();
> +@end smallexample
> +
> +Since the value of @code{__libc_single_threaded} can change from true
> +to false during the execution of the program, it is not useful for
> +selecting optimized function implementations in IFUNC resolvers.
> +
> +Atomic operations can also be used on mappings shared among
> +single-threaded processes.  This means that a compiler cannot use
> +@code{__libc_single_threaded} to optimize atomic operations, unless it
> +is able to prove that the memory is not shared.
> +
>  @c FIXME these are undocumented:
>  @c pthread_atfork
>  @c pthread_attr_destroy

Good info!

Thanks,

Michael


-- 
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-20 18:12 ` [PATCH 2/2] manual: Document __libc_single_threaded Florian Weimer via Libc-alpha
  2020-05-21  7:52   ` Michael Kerrisk (man-pages) via Libc-alpha
@ 2020-05-21 11:18   ` Szabolcs Nagy
  2020-05-21 12:16     ` Florian Weimer via Libc-alpha
  2020-05-21 12:50   ` Adhemerval Zanella via Libc-alpha
  2 siblings, 1 reply; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-21 11:18 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha, Michael Kerrisk

The 05/20/2020 20:12, Florian Weimer via Libc-alpha wrote:
> +@smallexample
> +if (__libc_single_threaded)
> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
> +else
> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
> +@end smallexample
> +
> +This still requires some form of synchronization on the
> +single-threaded branch, so it can be beneficial not to declare the
> +reference count as @code{_Atomic}, and use the GCC @code{__atomic}
> +built-ins.  @xref{__atomic Builtins,, Built-in Functions for Memory
> +Model Aware Atomic Operations, gcc, Using the GNU Compiler Collection
> +(GCC)}.  Then the code to increment a reference count looks like this:
> +
> +@smallexample
> +if (__libc_single_threaded)
> +  ++refeference_count;inf

inf typo


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 11:18   ` Szabolcs Nagy
@ 2020-05-21 12:16     ` Florian Weimer via Libc-alpha
  0 siblings, 0 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-21 12:16 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha, Michael Kerrisk

* Szabolcs Nagy:

> The 05/20/2020 20:12, Florian Weimer via Libc-alpha wrote:
>> +@smallexample
>> +if (__libc_single_threaded)
>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>> +else
>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>> +@end smallexample
>> +
>> +This still requires some form of synchronization on the
>> +single-threaded branch, so it can be beneficial not to declare the
>> +reference count as @code{_Atomic}, and use the GCC @code{__atomic}
>> +built-ins.  @xref{__atomic Builtins,, Built-in Functions for Memory
>> +Model Aware Atomic Operations, gcc, Using the GNU Compiler Collection
>> +(GCC)}.  Then the code to increment a reference count looks like this:
>> +
>> +@smallexample
>> +if (__libc_single_threaded)
>> +  ++refeference_count;inf
>
> inf typo

Thanks, queued for V2.

Florian



* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21  7:52   ` Michael Kerrisk (man-pages) via Libc-alpha
@ 2020-05-21 12:17     ` Florian Weimer via Libc-alpha
  0 siblings, 0 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-21 12:17 UTC (permalink / raw)
  To: Michael Kerrisk (man-pages); +Cc: libc-alpha

* Michael Kerrisk:

>> +@deftypevar char __libc_single_threaded
>> +@standards{GNU, sys/single_threaded.h}
>> +This variable is non-zero if the current process is definitely
>> +single-threaded.  If it is zero, the process can be multi-threaded,
>
> s/can be/may be/

Fixed.

>> +Several functions in @theglibc{} can change the value of the
>> +@code{__libc_single_threaded} variable.  For example, creating new
>> +threads using the @code{pthread_create} or @code{thrd_create} function
>> +sets the variable to false.  This can also happen directly, say via a
>
> s/directly/indirectly/ ?

Indeed, fixed.

I'll repost a V2 once there are comments on the actual code.

Thanks,
Florian



* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-20 18:12 ` [PATCH 2/2] manual: Document __libc_single_threaded Florian Weimer via Libc-alpha
  2020-05-21  7:52   ` Michael Kerrisk (man-pages) via Libc-alpha
  2020-05-21 11:18   ` Szabolcs Nagy
@ 2020-05-21 12:50   ` Adhemerval Zanella via Libc-alpha
  2020-05-21 13:09     ` Szabolcs Nagy
  2020-05-21 13:14     ` Florian Weimer via Libc-alpha
  2 siblings, 2 replies; 45+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-21 12:50 UTC (permalink / raw)
  To: libc-alpha



On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:

> +@smallexample
> +if (__libc_single_threaded)
> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
> +else
> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
> +@end smallexample

Shouldn't the access to __libc_single_threaded be atomic itself
(at least with relaxed semantic)?


* Re: [PATCH 1/2] Add the __libc_single_threaded variable
  2020-05-20 18:12 ` [PATCH 1/2] Add the __libc_single_threaded variable Florian Weimer via Libc-alpha
@ 2020-05-21 13:07   ` Szabolcs Nagy
  2020-05-21 13:16     ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-21 13:07 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 05/20/2020 20:12, Florian Weimer via Libc-alpha wrote:
> The variable is placed in libc.so, and it can be true only in
> an outer libc, not libcs loaded via dlmopen or static dlopen.
> Since thread creation from inner namespaces does not work,
> pthread_create can update __libc_single_threaded directly.
> 
> Using __libc_early_init and its initial flag, implementation of this
> variable is very straightforward.  A future version may reset the flag
> during fork (but not in an inner namespace), or after joining all
> threads except one.
> ---
...
> +* The GNU C Library now provides the header file <sys/single_threaded.h>
> +  which declares the variable __libc_single_threaded.  Applications are
> +  encouraged to use this variable for single-thread optimizations,
> +  instead of weak references to symbols historically defined in
> +  libpthread.

i wonder if the new header can be included into
threads.h and pthread.h, and a feature macro added
for it, so users can avoid doing a header check.

if the name is already in implementation reserved
namespace this should be possible, i'm not sure
if such indirect include is best practice though.


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 12:50   ` Adhemerval Zanella via Libc-alpha
@ 2020-05-21 13:09     ` Szabolcs Nagy
  2020-05-21 13:15       ` Adhemerval Zanella via Libc-alpha
  2020-05-21 13:14     ` Florian Weimer via Libc-alpha
  1 sibling, 1 reply; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-21 13:09 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

The 05/21/2020 09:50, Adhemerval Zanella via Libc-alpha wrote:
> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
> 
> > +@smallexample
> > +if (__libc_single_threaded)
> > +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
> > +else
> > +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
> > +@end smallexample
> 
> Shouldn't the access to __libc_single_threaded be atomic itself
> (at least with relaxed semantic)?

not if we guarantee that this object can only be
written while the process is single threaded.

(e.g. an exiting detached thread cannot update it
even if only one thread left.. because that may
concurrently read it)


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 12:50   ` Adhemerval Zanella via Libc-alpha
  2020-05-21 13:09     ` Szabolcs Nagy
@ 2020-05-21 13:14     ` Florian Weimer via Libc-alpha
  2020-05-21 14:32       ` Adhemerval Zanella via Libc-alpha
  1 sibling, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-21 13:14 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-alpha

* Adhemerval Zanella via Libc-alpha:

> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
>
>> +@smallexample
>> +if (__libc_single_threaded)
>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>> +else
>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>> +@end smallexample
>
> Shouldn't the access to __libc_single_threaded be atomic itself
> (at least with relaxed semantic)?

Good question.  In the current implementation, it is not needed because
the variable is never written again once the process is multi-threaded.

We must retain relaxed MO access as a valid use of this variable.  A
future implementation may set __libc_single_threaded to true after
detecting that the process has become single-threaded again.  But I
think this requires that the last remaining thread synchronizes in some
way with the exit of the other, second-to-last remaining thread.  And
that in turn means that no explicit MO is needed for the variable read.

I'm going to add this to the manual as an implementation note, after the
first example:

@c Note: No memory order on __libc_single_threaded.  The
@c implementation must ensure that exit of the critical
@c (second-to-last) thread happens-before setting
@c __libc_single_threaded to true.  Otherwise, acquire MO might be
@c needed for reading the variable in some scenarios, and that would
@c completely defeat its purpose.

For detached thread exits, this kind of synchronization may not be
easily obtainable in all cases.  I don't think we can do it on the
on-thread exit path because the kernel will perform certain actions
afterwards (like robust mutex updates), no matter how late we do it.  I
guess we could perhaps piggy-back on the stack reclamation mechanism.
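
For illustration only (this is not part of the posted patch), here is a
minimal sketch of the usage pattern discussed above: the flag is read
with a plain, non-atomic load, and only the reference count itself uses
atomic operations on the multi-threaded path.  The names
single_threaded_p, struct object, object_ref and object_unref are
hypothetical, and the fallback branch assumes that a C library without
<sys/single_threaded.h> should always be treated as multi-threaded.

```c
#include <stdbool.h>

#if defined __has_include
# if __has_include (<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define HAVE_SINGLE_THREADED 1
# endif
#endif

/* Hypothetical helper: plain (non-atomic) read of the flag.  */
static inline bool
single_threaded_p (void)
{
#ifdef HAVE_SINGLE_THREADED
  return __libc_single_threaded != 0;
#else
  /* Assumption: without the header, always use the atomic path.  */
  return false;
#endif
}

/* Hypothetical reference-counted object; the counter is deliberately
   not declared _Atomic so that the fast path can use plain accesses.  */
struct object
{
  unsigned int refcount;
};

/* Acquire an additional reference.  */
static inline void
object_ref (struct object *obj)
{
  if (single_threaded_p ())
    ++obj->refcount;
  else
    __atomic_fetch_add (&obj->refcount, 1, __ATOMIC_RELAXED);
}

/* Drop a reference.  Returns true if it was the last one.  */
static inline bool
object_unref (struct object *obj)
{
  if (single_threaded_p ())
    return --obj->refcount == 0;
  return __atomic_sub_fetch (&obj->refcount, 1, __ATOMIC_ACQ_REL) == 0;
}
```

Both branches perform the same logical update, so a process that
becomes multi-threaded between two calls still sees a consistent
counter.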

Thanks,
Florian



* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:09     ` Szabolcs Nagy
@ 2020-05-21 13:15       ` Adhemerval Zanella via Libc-alpha
  2020-05-21 13:30         ` Szabolcs Nagy
  0 siblings, 1 reply; 45+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-21 13:15 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha



On 21/05/2020 10:09, Szabolcs Nagy wrote:
> The 05/21/2020 09:50, Adhemerval Zanella via Libc-alpha wrote:
>> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
>>
>>> +@smallexample
>>> +if (__libc_single_threaded)
>>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>>> +else
>>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>>> +@end smallexample
>>
>> Shouldn't the access to __libc_single_threaded be atomic itself
>> (at least with relaxed semantic)?
> 
> not if we guarantee that this object can only be
> written while the process is single threaded.
> 
> (e.g. an exiting detached thread cannot update it
> even if only one thread left.. because that may
> concurrently read it)
> 

OK, so I think we should outline that atomic operations are not required
to access this object and that once __libc_single_threaded is set to 0 it
will continue to indicate a non-single-threaded process even if threads
are joined or a detached thread finishes.


* Re: [PATCH 1/2] Add the __libc_single_threaded variable
  2020-05-21 13:07   ` Szabolcs Nagy
@ 2020-05-21 13:16     ` Florian Weimer via Libc-alpha
  2020-05-21 13:26       ` Szabolcs Nagy
  0 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-21 13:16 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 05/20/2020 20:12, Florian Weimer via Libc-alpha wrote:
>> The variable is placed in libc.so, and it can be true only in
>> an outer libc, not libcs loaded via dlmopen or static dlopen.
>> Since thread creation from inner namespaces does not work,
>> pthread_create can update __libc_single_threaded directly.
>> 
>> Using __libc_early_init and its initial flag, implementation of this
>> variable is very straightforward.  A future version may reset the flag
>> during fork (but not in an inner namespace), or after joining all
>> threads except one.
>> ---
> ...
>> +* The GNU C Library now provides the header file <sys/single_threaded.h>
>> +  which declares the variable __libc_single_threaded.  Applications are
>> +  encouraged to use this variable for single-thread optimizations,
>> +  instead of weak references to symbols historically defined in
>> +  libpthread.
>
> i wonder if the new header can be included into
> threads.h and pthread.h, and a feature macro added
> for it, so users can avoid doing a header check.

Isn't __has_include in current compilers sufficient for that?

The advantage of the current approach is that libstdc++ headers can use
__has_include on the new headers, without pulling in other headers which
declare identifiers which are not in implementation namespace and also
not expected to be used by C++ standard headers.
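
As a rough sketch of what such a check could look like in user code
(the macro HAVE_SYS_SINGLE_THREADED_H and the function
process_single_threaded are hypothetical names, not something the
patch provides):

```c
/* Probe for the header only if the compiler supports __has_include;
   the nested #if keeps older preprocessors from seeing the operator.  */
#if defined __has_include
# if __has_include (<sys/single_threaded.h>)
#  include <sys/single_threaded.h>
#  define HAVE_SYS_SINGLE_THREADED_H 1
# endif
#endif

/* Returns 1 only if the C library reports that the process is
   definitely single-threaded; 0 means "multi-threaded or unknown".  */
static inline int
process_single_threaded (void)
{
#ifdef HAVE_SYS_SINGLE_THREADED_H
  return __libc_single_threaded != 0;
#else
  /* Assumption: no header, so conservatively report "unknown".  */
  return 0;
#endif
}
```

Callers that get 0 simply keep using their fully synchronized code
path, so the fallback costs performance but not correctness.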

Thanks,
Florian



* Re: [PATCH 1/2] Add the __libc_single_threaded variable
  2020-05-21 13:16     ` Florian Weimer via Libc-alpha
@ 2020-05-21 13:26       ` Szabolcs Nagy
  0 siblings, 0 replies; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-21 13:26 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 05/21/2020 15:16, Florian Weimer wrote:
> * Szabolcs Nagy:
> > The 05/20/2020 20:12, Florian Weimer via Libc-alpha wrote:
> >> The variable is placed in libc.so, and it can be true only in
> >> an outer libc, not libcs loaded via dlmopen or static dlopen.
> >> Since thread creation from inner namespaces does not work,
> >> pthread_create can update __libc_single_threaded directly.
> >> 
> >> Using __libc_early_init and its initial flag, implementation of this
> >> variable is very straightforward.  A future version may reset the flag
> >> during fork (but not in an inner namespace), or after joining all
> >> threads except one.
> >> ---
> > ...
> >> +* The GNU C Library now provides the header file <sys/single_threaded.h>
> >> +  which declares the variable __libc_single_threaded.  Applications are
> >> +  encouraged to use this variable for single-thread optimizations,
> >> +  instead of weak references to symbols historically defined in
> >> +  libpthread.
> >
> > i wonder if the new header can be included into
> > threads.h and pthread.h, and a feature macro added
> > for it, so users can avoid doing a header check.
> 
> Isn't __has_include in current compilers sufficient for that?
> 
> The advantage of the current approach is that libstdc++ headers can use
> __has_include on the new headers, without pulling in other headers which
> declare identifiers which are not in implementation namespace and also
> not expected to be used by C++ standard headers.

ok i guess __has_include is fine (assuming the macro expansion
bugs are fixed)


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:15       ` Adhemerval Zanella via Libc-alpha
@ 2020-05-21 13:30         ` Szabolcs Nagy
  2020-05-21 13:44           ` Florian Weimer via Libc-alpha
  2020-05-21 13:56           ` Adhemerval Zanella via Libc-alpha
  0 siblings, 2 replies; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-21 13:30 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

The 05/21/2020 10:15, Adhemerval Zanella wrote:
> On 21/05/2020 10:09, Szabolcs Nagy wrote:
> > The 05/21/2020 09:50, Adhemerval Zanella via Libc-alpha wrote:
> >> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
> >>
> >>> +@smallexample
> >>> +if (__libc_single_threaded)
> >>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
> >>> +else
> >>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
> >>> +@end smallexample
> >>
> >> Shouldn't the access to __libc_single_threaded be atomic itself
> >> (at least with relaxed semantic)?
> > 
> > not if we guarantee that this object can only be
> > written while the process is single threaded.
> > 
> > (e.g. an exiting detached thread cannot update it
> > even if only one thread left.. because that may
> > concurrently read it)
> > 
> 
> OK, so I think we should outline that atomic operations are not required
> to access this object and that once __libc_single_threaded is set to 0 it
> will continue to indicate a non-single-threaded process even if threads
> are joined or a detached thread finishes.

what's wrong with pthread_join updating it?
(other than the already mentioned dlmopened
libc case)

if only one thread left that is doing the join
then there cannot be concurrent access or any
memory ordering problem, the process is single
threaded.


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:30         ` Szabolcs Nagy
@ 2020-05-21 13:44           ` Florian Weimer via Libc-alpha
  2020-05-21 13:58             ` Adhemerval Zanella via Libc-alpha
  2020-05-22 10:01             ` Szabolcs Nagy
  2020-05-21 13:56           ` Adhemerval Zanella via Libc-alpha
  1 sibling, 2 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-21 13:44 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 05/21/2020 10:15, Adhemerval Zanella wrote:
>> On 21/05/2020 10:09, Szabolcs Nagy wrote:
>> > The 05/21/2020 09:50, Adhemerval Zanella via Libc-alpha wrote:
>> >> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
>> >>
>> >>> +@smallexample
>> >>> +if (__libc_single_threaded)
>> >>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>> >>> +else
>> >>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>> >>> +@end smallexample
>> >>
>> >> Shouldn't the access to __libc_single_threaded be atomic itself
>> >> (at least with relaxed semantic)?
>> > 
>> > not if we guarantee that this object can only be
>> > written while the process is single threaded.
>> > 
>> > (e.g. an exiting detached thread cannot update it
>> > even if only one thread left.. because that may
>> > concurrently read it)
>> > 
>> 
>> OK, so I think we should outline that atomic operations are not required
>> to access this object and that once __libc_single_threaded is set to 0 it
>> will continue to indicate a non-single-threaded process even if threads
>> are joined or a detached thread finishes.
>
> what's wrong with pthread_join updating it?

It's tricky to do it correctly if there are two remaining threads, one of
them the one being joined, the other one a detached thread.  A
straightforward implementation merely looking at __nptl_nthreads before
returning from pthread_join would not perform the required
synchronization on the detached thread exit.

Thanks,
Florian



* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:30         ` Szabolcs Nagy
  2020-05-21 13:44           ` Florian Weimer via Libc-alpha
@ 2020-05-21 13:56           ` Adhemerval Zanella via Libc-alpha
  1 sibling, 0 replies; 45+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-21 13:56 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha



On 21/05/2020 10:30, Szabolcs Nagy wrote:
> The 05/21/2020 10:15, Adhemerval Zanella wrote:
>> On 21/05/2020 10:09, Szabolcs Nagy wrote:
>>> The 05/21/2020 09:50, Adhemerval Zanella via Libc-alpha wrote:
>>>> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
>>>>
>>>>> +@smallexample
>>>>> +if (__libc_single_threaded)
>>>>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>>>>> +else
>>>>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>>>>> +@end smallexample
>>>>
>>>> Shouldn't the access to __libc_single_threaded be atomic itself
>>>> (at least with relaxed semantic)?
>>>
>>> not if we guarantee that this object can only be
>>> written while the process is single threaded.
>>>
>>> (e.g. an exiting detached thread cannot update it
>>> even if only one thread left.. because that may
>>> concurrently read it)
>>>
>>
>> OK, so I think we should outline that atomic operations are not required
>> to access this object and that once __libc_single_threaded is set to 0 it
>> will continue to indicate a non-single-threaded process even if threads
>> are joined or a detached thread finishes.
> 
> what's wrong with pthread_join updating it?
> (other than the already mentioned dlmopened
> libc case)
> 

I don't think it is a problem, but we should document the current semantic
(which does not update the value on pthread_join or detached thread exit).

> if only one thread left that is doing the join
> then there cannot be concurrent access or any
> memory ordering problem, the process is single
> threaded.
> 

I am not sure it is valid under C11 atomic semantics to access an object
that is concurrently updated without proper atomic semantics; however,
afaiu for this specific case it should only result in a missed optimization
(using the mt path instead of the single-threaded one).


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:44           ` Florian Weimer via Libc-alpha
@ 2020-05-21 13:58             ` Adhemerval Zanella via Libc-alpha
  2020-05-21 14:03               ` Florian Weimer via Libc-alpha
  2020-05-22 10:01             ` Szabolcs Nagy
  1 sibling, 1 reply; 45+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-21 13:58 UTC (permalink / raw)
  To: Florian Weimer, Szabolcs Nagy; +Cc: libc-alpha



On 21/05/2020 10:44, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
>> The 05/21/2020 10:15, Adhemerval Zanella wrote:
>>> On 21/05/2020 10:09, Szabolcs Nagy wrote:
>>>> The 05/21/2020 09:50, Adhemerval Zanella via Libc-alpha wrote:
>>>>> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
>>>>>
>>>>>> +@smallexample
>>>>>> +if (__libc_single_threaded)
>>>>>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>>>>>> +else
>>>>>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>>>>>> +@end smallexample
>>>>>
>>>>> Shouldn't the access to __libc_single_threaded be atomic itself
>>>>> (at least with relaxed semantic)?
>>>>
>>>> not if we guarantee that this object can only be
>>>> written while the process is single threaded.
>>>>
>>>> (e.g. an exiting detached thread cannot update it
>>>> even if only one thread left.. because that may
>>>> concurrently read it)
>>>>
>>>
>>> OK, so I think we should outline that atomic operations are not required
>>> to access this object and that once __libc_single_threaded is set to 0 it
>>> will continue to indicate a non-single-threaded process even if threads
>>> are joined or a detached thread finishes.
>>
>> what's wrong with pthread_join updating it?
> 
> It's tricky to do it correctly if there are two remaining threads, one of
> them the one being joined, the other one a detached thread.  A
> straightforward implementation merely looking at __nptl_nthreads before
> returning from pthread_join would not perform the required
> synchronization on the detached thread exit.

Couldn't we accomplish this by making __libc_single_threaded count the total
number of threads and having pthread_create/pthread_exit/detached thread exit
update it atomically?


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:58             ` Adhemerval Zanella via Libc-alpha
@ 2020-05-21 14:03               ` Florian Weimer via Libc-alpha
  0 siblings, 0 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-21 14:03 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: libc-alpha

* Adhemerval Zanella:

> On 21/05/2020 10:44, Florian Weimer wrote:
>> * Szabolcs Nagy:
>> 
>>> The 05/21/2020 10:15, Adhemerval Zanella wrote:
>>>> On 21/05/2020 10:09, Szabolcs Nagy wrote:
>>>>> The 05/21/2020 09:50, Adhemerval Zanella via Libc-alpha wrote:
>>>>>> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
>>>>>>
>>>>>>> +@smallexample
>>>>>>> +if (__libc_single_threaded)
>>>>>>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>>>>>>> +else
>>>>>>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>>>>>>> +@end smallexample
>>>>>>
>>>>>> Shouldn't the access to __libc_single_threaded be atomic itself
>>>>>> (at least with relaxed semantic)?
>>>>>
>>>>> not if we guarantee that this object can only be
>>>>> written while the process is single threaded.
>>>>>
>>>>> (e.g. an exiting detached thread cannot update it
>>>>> even if only one thread left.. because that may
>>>>> concurrently read it)
>>>>>
>>>>
>>>> OK, so I think we should outline that atomic operations are not required
>>>> to access this object and that once __libc_single_threaded is set to 0 it
>>>> will continue to indicate a non-single-threaded process even if threads
>>>> are joined or a detached thread finishes.
>>>
>>> what's wrong with pthread_join updating it?
>> 
>> It's tricky to do it correctly if there are two remaining threads, one of
>> them the one being joined, the other one a detached thread.  A
>> straightforward implementation merely looking at __nptl_nthreads before
>> returning from pthread_join would not perform the required
>> synchronization on the detached thread exit.
>
> Couldn't we accomplish this by making __libc_single_threaded count the total
> number of threads and having pthread_create/pthread_exit/detached thread exit
> update it atomically?

We already have __nptl_nthreads as a global thread count, but it is
currently decremented too early (on the exiting thread).  As I tried to
explain, we cannot decrement it on the exiting thread itself because it
would not give us the desired synchronization, particularly not with any
kernel actions that happen afterwards.

Thanks,
Florian



* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:14     ` Florian Weimer via Libc-alpha
@ 2020-05-21 14:32       ` Adhemerval Zanella via Libc-alpha
  2020-06-03 15:48         ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 45+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-21 14:32 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-alpha



On 21/05/2020 10:14, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-alpha:
> 
>> On 20/05/2020 15:12, Florian Weimer via Libc-alpha wrote:
>>
>>> +@smallexample
>>> +if (__libc_single_threaded)
>>> +  atomic_fetch_add (&reference_count, 1, memory_order_relaxed);
>>> +else
>>> +  atomic_fetch_add (&reference_count, 1, memory_order_acq_rel);
>>> +@end smallexample
>>
>> Shouldn't the access to __libc_single_threaded be atomic itself
>> (at least with relaxed semantic)?
> 
> Good question.  In the current implementation, it is not needed because
> the variable is never written again once the process is multi-threaded.
> 
> We must retain relaxed MO access as a valid use of this variable.  A
> future implementation may set __libc_single_threaded to true after
> detecting that the process has become single-threaded again.  But I
> think this requires that the last remaining thread synchronizes in some
> way with the exit of the other, second-to-last remaining thread.  And
> that in turn means that no explicit MO is needed for the variable read.
> 
> I'm going to add this to the manual as an implementation note, after the
> first example:
> 
> @c Note: No memory order on __libc_single_threaded.  The
> @c implementation must ensure that exit of the critical
> @c (second-to-last) thread happens-before setting
> @c __libc_single_threaded to true.  Otherwise, acquire MO might be
> @c needed for reading the variable in some scenarios, and that would
> @c completely defeat its purpose.

The comment is sound, but I still think we should properly document
that this initial version does not attempt to update
__libc_single_threaded on pthread_join or detached thread exit, and maybe
also add the brief explanation you gave on why this semantic was chosen
(to avoid requiring a stricter MO).

> 
> For detached thread exits, this kind of synchronization may not be
> easily obtainable in all cases.  I don't think we can do it on the
> on-thread exit path because the kernel will perform certain actions
> afterwards (like robust mutex updates), no matter how late we do it.  I
> guess we could perhaps piggy-back on the stack reclamation mechanism.

It seems that robust mutex updates are indeed a problem, but I am not
sure if the CLONE_CHILD_CLEARTID clear helps here.  It signals the thread
is done with the memory synchronization, but the stack cache is not
really updated.  Maybe an extra clone3 flag?



* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 13:44           ` Florian Weimer via Libc-alpha
  2020-05-21 13:58             ` Adhemerval Zanella via Libc-alpha
@ 2020-05-22 10:01             ` Szabolcs Nagy
  2020-05-22 10:05               ` Florian Weimer via Libc-alpha
  1 sibling, 1 reply; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-22 10:01 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 05/21/2020 15:44, Florian Weimer wrote:
> * Szabolcs Nagy:
> > what's wrong with pthread_join updating it?
> 
> It's tricky to do it correctly if there are two remaining threads, one of
> them the one being joined, the other one a detached thread.  A
> straightforward implementation merely looking at __nptl_nthreads before
> returning from pthread_join would not perform the required
> synchronization on the detached thread exit.

i'm trying to understand this, but don't see
what's wrong if the last thread is detached.

do you mean user code in atexit handlers?
or synchronization in libc? what is libc
synchronizing with in a single detached thread?

so far i don't see why __libc_single_threaded
cannot go back to true once it was false
(there may be usability issues but i need
to look at some example usage to see)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 10:01             ` Szabolcs Nagy
@ 2020-05-22 10:05               ` Florian Weimer via Libc-alpha
  2020-05-22 10:54                 ` Szabolcs Nagy
  0 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-22 10:05 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 05/21/2020 15:44, Florian Weimer wrote:
>> * Szabolcs Nagy:
>> > what's wrong with pthread_join updating it?
>> 
>> It's tricky do it correctly if there are two remaining threads, one of
>> them the one being joined, the other one a detached thread.  A
>> straightforward implementation merely looking at __nptl_nthreads before
>> returning from pthread_join would not perform the required
>> synchronization on the detached thread exit.
>
> i'm trying to understand this, but don't see
> what's wrong if the last thread is detached.

Sorry, I meant three remaining threads in total, i.e., two more threads
in addition to the one thread that keeps going after the other two
exited, and may use __libc_single_threaded in the future.

Clearer now?

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 10:05               ` Florian Weimer via Libc-alpha
@ 2020-05-22 10:54                 ` Szabolcs Nagy
  2020-05-22 11:08                   ` Florian Weimer via Libc-alpha
  2020-05-22 15:07                   ` Rich Felker
  0 siblings, 2 replies; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-22 10:54 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

The 05/22/2020 12:05, Florian Weimer wrote:
> * Szabolcs Nagy:
> 
> > The 05/21/2020 15:44, Florian Weimer wrote:
> >> * Szabolcs Nagy:
> >> > what's wrong with pthread_join updating it?
> >> 
> >> It's tricky do it correctly if there are two remaining threads, one of
> >> them the one being joined, the other one a detached thread.  A
> >> straightforward implementation merely looking at __nptl_nthreads before
> >> returning from pthread_join would not perform the required
> >> synchronization on the detached thread exit.
> >
> > i'm trying to understand this, but don't see
> > what's wrong if the last thread is detached.
> 
> Sorry, I meant three reamining threads in total, i.e., two more threads
> in addition to the one thread that keeps going after the other two
> exited, and may use __libc_single_threaded in the future.
> 
> Clearer now?

hm so a detached thread is concurrently exiting with
a pthread_join which sees a decremented __nptl_nthreads
but the detached thread has not actually exited yet.

i think glibc can issue a memory barrier syscall before
decrementing __nptl_nthreads in a detached thread, this
means if pthread_join observes __nptl_nthreads==1
then user memory accesses in the detached thread are
synchronized with non-atomic memory accesses after
pthread_join returns. (i.e. __nptl_nthreads==1 should
mean at all times that as far as user code is concerned
the process is single threaded even if some detached
thread is still hanging around)

i think __libc_single_threaded should be possible to
update in pthread_join with the above change, in
which case we need not document that it stays false
forever, so we can change this in the future.
(unless somebody finds usecases where a false->true
transition would cause problems)
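
For reference, a minimal sketch of what "issue a memory barrier syscall"
could look like on Linux.  The helper names (membarrier_call,
try_global_barrier) are made up for illustration and are not glibc
internals; the sketch only shows the raw system call and the fallback
when the kernel or a seccomp filter refuses it.

```c
#define _GNU_SOURCE
#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw system call (hypothetical helper).  */
static int
membarrier_call (int cmd, unsigned int flags)
{
  return (int) syscall (SYS_membarrier, cmd, flags, 0);
}

/* Hypothetical helper: returns 1 if a barrier covering all running
   threads was issued, 0 if the facility is unavailable (old kernel,
   seccomp filter, unsupported configuration).  */
static int
try_global_barrier (void)
{
  int supported = membarrier_call (MEMBARRIER_CMD_QUERY, 0);
  if (supported < 0 || !(supported & MEMBARRIER_CMD_SHARED))
    return 0;
  return membarrier_call (MEMBARRIER_CMD_SHARED, 0) == 0;
}
```

A caller that gets 0 back would have to keep treating the process as
multi-threaded.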


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 10:54                 ` Szabolcs Nagy
@ 2020-05-22 11:08                   ` Florian Weimer via Libc-alpha
  2020-05-22 15:07                   ` Rich Felker
  1 sibling, 0 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-22 11:08 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: libc-alpha

* Szabolcs Nagy:

> The 05/22/2020 12:05, Florian Weimer wrote:
>> * Szabolcs Nagy:
>> 
>> > The 05/21/2020 15:44, Florian Weimer wrote:
>> >> * Szabolcs Nagy:
>> >> > what's wrong with pthread_join updating it?
>> >> 
>> >> It's tricky do it correctly if there are two remaining threads, one of
>> >> them the one being joined, the other one a detached thread.  A
>> >> straightforward implementation merely looking at __nptl_nthreads before
>> >> returning from pthread_join would not perform the required
>> >> synchronization on the detached thread exit.
>> >
>> > i'm trying to understand this, but don't see
>> > what's wrong if the last thread is detached.
>> 
>> Sorry, I meant three reamining threads in total, i.e., two more threads
>> in addition to the one thread that keeps going after the other two
>> exited, and may use __libc_single_threaded in the future.
>> 
>> Clearer now?
>
> hm so a detached thread is concurrently exiting with
> a pthread_join which sees a decremented __nptl_nthreads
> but the detached thread has not actually exited yet.

Correct.

> i think glibc can issue a memory barrier syscall before
> decrementing __nptl_nthreads in a detached thread, this
> means if pthread_join observes __nptl_nthreads==1
> then user memory accesses in the detached thread are
> synchronized with non-atomic memory accesses after
> pthread_join returns. (i.e. __nptl_nthreads==1 should
> mean at all times that as far as user code is concerned
> the process is single threaded even if some detached
> thread is still hanging around)

This depends on the extent to which kernel actions after thread exit are
visible to the last remaining thread.  I think it would be safer to
require that the stack must have been reclaimed.  From a high-level
perspective, we have a similar synchronization issue with stack
reclamation, and it should be possible to reuse the same mechanism for
that.

> i think __libc_single_threaded should be possible to
> update in pthread_join with the above change, in
> which case we need not document that it stays false
> forever, so we can change this in the future.

Yes, I expect future such changes, which is why I mentioned saving the
value of __libc_single_threaded for consistent execution.  The other
obvious case for optimization is fork, which is easier to implement: it
is only necessary to detect that the libc in question is an outer libc.
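
A minimal sketch of "saving the value for consistent execution",
assuming a caller that elides a mutex.  demo_single_threaded is a
stand-in for __libc_single_threaded (normally declared in
<sys/single_threaded.h>) so that the sketch builds without the new
header; counter_lock, counter and counter_increment are made-up names.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for __libc_single_threaded (assumption, see above).  */
static char demo_single_threaded = 1;

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static long
counter_increment (void)
{
  /* Read the flag once.  The lock and unlock decisions must agree
     even if the flag changes while the critical section runs, for
     example because code called from it creates a thread.  */
  bool need_lock = !demo_single_threaded;

  if (need_lock)
    pthread_mutex_lock (&counter_lock);
  long result = ++counter;
  if (need_lock)
    pthread_mutex_unlock (&counter_lock);
  return result;
}
```

Testing the flag a second time before the unlock instead of using the
saved copy could unlock a mutex that was never locked, or skip an
unlock that is needed.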

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 10:54                 ` Szabolcs Nagy
  2020-05-22 11:08                   ` Florian Weimer via Libc-alpha
@ 2020-05-22 15:07                   ` Rich Felker
  2020-05-22 16:14                     ` Rich Felker
  1 sibling, 1 reply; 45+ messages in thread
From: Rich Felker @ 2020-05-22 15:07 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Florian Weimer, libc-alpha

On Fri, May 22, 2020 at 11:54:58AM +0100, Szabolcs Nagy wrote:
> The 05/22/2020 12:05, Florian Weimer wrote:
> > * Szabolcs Nagy:
> > 
> > > The 05/21/2020 15:44, Florian Weimer wrote:
> > >> * Szabolcs Nagy:
> > >> > what's wrong with pthread_join updating it?
> > >> 
> > >> It's tricky do it correctly if there are two remaining threads, one of
> > >> them the one being joined, the other one a detached thread.  A
> > >> straightforward implementation merely looking at __nptl_nthreads before
> > >> returning from pthread_join would not perform the required
> > >> synchronization on the detached thread exit.
> > >
> > > i'm trying to understand this, but don't see
> > > what's wrong if the last thread is detached.
> > 
> > Sorry, I meant three reamining threads in total, i.e., two more threads
> > in addition to the one thread that keeps going after the other two
> > exited, and may use __libc_single_threaded in the future.
> > 
> > Clearer now?
> 
> hm so a detached thread is concurrently exiting with
> a pthread_join which sees a decremented __nptl_nthreads
> but the detached thread has not actually exited yet.

In principle this is no big deal as long as the exiting thread cannot
make any further actions where its existence causes an observable
effect on users of __libc_single_threaded. (For this purpose, I think
you actually need to define what uses are valid, though; see setxid
remarks below.) If it makes a problem for pthread_join that's an
implementation detail that should be fixable. The bigger issue is
memory synchronization.

> i think glibc can issue a memory barrier syscall before
> decrementing __nptl_nthreads in a detached thread, this
> means if pthread_join observes __nptl_nthreads==1
> then user memory accesses in the detached thread are
> synchronized with non-atomic memory accesses after
> pthread_join returns. (i.e. __nptl_nthreads==1 should
> mean at all times that as far as user code is concerned
> the process is single threaded even if some detached
> thread is still hanging around)

This still has consequences for setxid safety which is why musl now
fully synchronizes the existing threads list. But if you're not using
the thread count for that, it's not an issue. Indeed I think
SYS_membarrier is a solution here, but if it's not supported or
blocked by seccomp then __libc_single_threaded must not be made true
again at this time.

> i think __libc_single_threaded should be possible to
> update in pthread_join with the above change, in
> which case we need not document that it stays false
> forever, so we can change this in the future.
> (unless somebody finds usecases where a false->true
> transition would cause problems)

pthread_join is not unique here. In principle __libc_single_threaded
can be made true again any time a non-AS-safe function observes a
thread count of 1 after performing a memory barrier. There may be
other internal-locking situations where it's beneficial to update it
so that future locks can be elided.

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 15:07                   ` Rich Felker
@ 2020-05-22 16:14                     ` Rich Felker
  2020-05-22 16:36                       ` Adhemerval Zanella via Libc-alpha
  2020-05-22 17:02                       ` Florian Weimer via Libc-alpha
  0 siblings, 2 replies; 45+ messages in thread
From: Rich Felker @ 2020-05-22 16:14 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Florian Weimer, libc-alpha

On Fri, May 22, 2020 at 11:07:20AM -0400, Rich Felker wrote:
> On Fri, May 22, 2020 at 11:54:58AM +0100, Szabolcs Nagy wrote:
> > The 05/22/2020 12:05, Florian Weimer wrote:
> > > * Szabolcs Nagy:
> > > 
> > > > The 05/21/2020 15:44, Florian Weimer wrote:
> > > >> * Szabolcs Nagy:
> > > >> > what's wrong with pthread_join updating it?
> > > >> 
> > > >> It's tricky do it correctly if there are two remaining threads, one of
> > > >> them the one being joined, the other one a detached thread.  A
> > > >> straightforward implementation merely looking at __nptl_nthreads before
> > > >> returning from pthread_join would not perform the required
> > > >> synchronization on the detached thread exit.
> > > >
> > > > i'm trying to understand this, but don't see
> > > > what's wrong if the last thread is detached.
> > > 
> > > Sorry, I meant three reamining threads in total, i.e., two more threads
> > > in addition to the one thread that keeps going after the other two
> > > exited, and may use __libc_single_threaded in the future.
> > > 
> > > Clearer now?
> > 
> > hm so a detached thread is concurrently exiting with
> > a pthread_join which sees a decremented __nptl_nthreads
> > but the detached thread has not actually exited yet.
> 
> In principle this is no big deal as long as the exiting thread cannot
> make any further actions where its existence causes an observable
> effect on users of __libc_single_threaded. (For this purpose, I think
> you actually need to define what uses are valid, though; see setxid
> remarks below.) If it makes a problem for pthread_join that's an
> implementation detail that should be fixable. The bigger issue is
> memory synchronization.
> 
> > i think glibc can issue a memory barrier syscall before
> > decrementing __nptl_nthreads in a detached thread, this
> > means if pthread_join observes __nptl_nthreads==1
> > then user memory accesses in the detached thread are
> > synchronized with non-atomic memory accesses after
> > pthread_join returns. (i.e. __nptl_nthreads==1 should
> > mean at all times that as far as user code is concerned
> > the process is single threaded even if some detached
> > thread is still hanging around)
> 
> This still has consequences for setxid safety which is why musl now
> fully synchronizes the existing threads list. But if you're not using
> the thread count for that, it's not an issue. Indeed I think
> SYS_membarrier is a solution here, but if it's not supported or
> blocked by seccomp then __libc_single_threaded must not be made true
> again at this time.

Ugh, SYS_membarrier is *not* a solution here. The problem is far
worse, because the user of __libc_single_threaded potentially lacks
*compiler barriers* too.

Consider something like:

	if (!__libc_single_threaded) { lock(); need_unlock=1; }
	x = *p;
	if (need_unlock) unlock();
	/* ... */
	if (!__libc_single_threaded) { lock(); need_unlock=1; }
	x = *p;
	if (need_unlock) unlock();

Here, in the case where __libc_single_threaded is true the second time
around, there is no (memory or compiler) acquire barrier between the
first access to *p and the second. Thus the compiler can (and actually
does! I don't have a minimal PoC but musl actually just hit a bug very
close to this) omit the second load from memory and use the cached
value, which may be incorrect because the exiting thread modified it.

This could potentially be avoided with complex contracts about
barriers needed to use __libc_single_threaded, but it seems highly
error-prone.

Note that the issue becomes much easier to hit with a sort of "pretest
not under lock, re-check with lock" idiom of the form:

	x = *p;
	if (predicate(x)) {
		if (!__libc_single_threaded) { lock(); need_unlock=1; }
		x = *p;
		/* ... */
		if (need_unlock) unlock();
	}

Unlike the above, this one does not depend on the release barrier in
unlock() not also being a compiler acquire barrier.

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 16:14                     ` Rich Felker
@ 2020-05-22 16:36                       ` Adhemerval Zanella via Libc-alpha
  2020-05-22 17:02                       ` Florian Weimer via Libc-alpha
  1 sibling, 0 replies; 45+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-05-22 16:36 UTC (permalink / raw)
  To: libc-alpha



On 22/05/2020 13:14, Rich Felker wrote:
> On Fri, May 22, 2020 at 11:07:20AM -0400, Rich Felker wrote:
>> On Fri, May 22, 2020 at 11:54:58AM +0100, Szabolcs Nagy wrote:
>>> The 05/22/2020 12:05, Florian Weimer wrote:
>>>> * Szabolcs Nagy:
>>>>
>>>>> The 05/21/2020 15:44, Florian Weimer wrote:
>>>>>> * Szabolcs Nagy:
>>>>>>> what's wrong with pthread_join updating it?
>>>>>>
>>>>>> It's tricky do it correctly if there are two remaining threads, one of
>>>>>> them the one being joined, the other one a detached thread.  A
>>>>>> straightforward implementation merely looking at __nptl_nthreads before
>>>>>> returning from pthread_join would not perform the required
>>>>>> synchronization on the detached thread exit.
>>>>>
>>>>> i'm trying to understand this, but don't see
>>>>> what's wrong if the last thread is detached.
>>>>
>>>> Sorry, I meant three reamining threads in total, i.e., two more threads
>>>> in addition to the one thread that keeps going after the other two
>>>> exited, and may use __libc_single_threaded in the future.
>>>>
>>>> Clearer now?
>>>
>>> hm so a detached thread is concurrently exiting with
>>> a pthread_join which sees a decremented __nptl_nthreads
>>> but the detached thread has not actually exited yet.
>>
>> In principle this is no big deal as long as the exiting thread cannot
>> make any further actions where its existence causes an observable
>> effect on users of __libc_single_threaded. (For this purpose, I think
>> you actually need to define what uses are valid, though; see setxid
>> remarks below.) If it makes a problem for pthread_join that's an
>> implementation detail that should be fixable. The bigger issue is
>> memory synchronization.
>>
>>> i think glibc can issue a memory barrier syscall before
>>> decrementing __nptl_nthreads in a detached thread, this
>>> means if pthread_join observes __nptl_nthreads==1
>>> then user memory accesses in the detached thread are
>>> synchronized with non-atomic memory accesses after
>>> pthread_join returns. (i.e. __nptl_nthreads==1 should
>>> mean at all times that as far as user code is concerned
>>> the process is single threaded even if some detached
>>> thread is still hanging around)
>>
>> This still has consequences for setxid safety which is why musl now
>> fully synchronizes the existing threads list. But if you're not using
>> the thread count for that, it's not an issue. Indeed I think
>> SYS_membarrier is a solution here, but if it's not supported or
>> blocked by seccomp then __libc_single_threaded must not be made true
>> again at this time.
> 
> Uhg, SYS_membarrier is *not* a solution here. The problem is far
> worse, because the user of __libc_single_threaded potentially lacks
> *compiler barriers* too.
> 
> Consider something like:
> 
> 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
> 	x = *p;
> 	if (need_unlock) unlock();
> 	/* ... */
> 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
> 	x = *p;
> 	if (need_unlock) unlock();
> 
> Here, in the case where __libc_single_threaded is true the second time
> around, there is no (memory or compiler) acquire barrier between the
> first access to *p and the second. Thus the compiler can (and actually
> does! I don't have a minimal PoC but musl actually just hit a bug very
> close to this) omit the second load from memory, and uses the cached
> value, which may be incorrect because the exiting thread modified it.

Does it help to enforce a relaxed atomic MO on __libc_single_threaded
access in this example?

> 
> This could potentially be avoided with complex contracts about
> barriers needed to use __libc_single_threaded, but it seems highly
> error-prone.
> 
> Note that the issue becomes much easier to hit with a sort of "pretest
> not under lock, re-check with lock" idiom of the form:
> 
> 	x = *p;
> 	if (predicate(x)) {
> 		if (!__libc_single_threaded) { lock(); need_unlock=1; }
> 		x = *p;
> 		/* ... */
> 		if (need_unlock) unlock();
> 	}
> 
> Unlike the above, this one does not depend on the release barrier in
> unlock() not also being a compiler acquire barrier.
> 
> Rich
> 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 16:14                     ` Rich Felker
  2020-05-22 16:36                       ` Adhemerval Zanella via Libc-alpha
@ 2020-05-22 17:02                       ` Florian Weimer via Libc-alpha
  2020-05-22 17:18                         ` Florian Weimer via Libc-alpha
  2020-05-22 17:28                         ` Rich Felker
  1 sibling, 2 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-22 17:02 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

>> This still has consequences for setxid safety which is why musl now
>> fully synchronizes the existing threads list. But if you're not using
>> the thread count for that, it's not an issue. Indeed I think
>> SYS_membarrier is a solution here, but if it's not supported or
>> blocked by seccomp then __libc_single_threaded must not be made true
>> again at this time.
>
> Uhg, SYS_membarrier is *not* a solution here. The problem is far
> worse, because the user of __libc_single_threaded potentially lacks
> *compiler barriers* too.
>
> Consider something like:
>
> 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
> 	x = *p;
> 	if (need_unlock) unlock();
> 	/* ... */
> 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
> 	x = *p;
> 	if (need_unlock) unlock();
>
> Here, in the case where __libc_single_threaded is true the second time
> around, there is no (memory or compiler) acquire barrier between the
> first access to *p and the second. Thus the compiler can (and actually
> does! I don't have a minimal PoC but musl actually just hit a bug very
> close to this) omit the second load from memory, and uses the cached
> value, which may be incorrect because the exiting thread modified it.
>
> This could potentially be avoided with complex contracts about
> barriers needed to use __libc_single_threaded, but it seems highly
> error-prone.

Well, yes.  It's clearly a data race if the implementation sets
__libc_single_threaded directly from an exiting thread.  I don't see a
way around that.

Our discussion focused on the problem that observing a thread count of 1
in pthread_join does not necessarily mean that it is safe to assume at
this point that the process is single-threaded, in glibc's
implementation that uses a simple __nptl_nthreads counter decremented on
the thread itself.  This does not cause a low-level data race directly,
but is potentially still incorrect (I'm not quite sure yet).

In glibc, we annotate many functions with __attribute__ ((leaf)),
implicitly via __THROW.  None of these functions may reset
__libc_single_threaded.  I expect that many compilers have a built-in
list of standard functions they treat as leaf functions.  This means
that these functions cannot write in practice to __libc_single_threaded
(or any other global variable apart from errno).  Not following this
rule would result in undefined behavior, similar to an actual data race
in the memory model.

A compiler cannot treat pthread_create as a leaf function, so the simple
implementation of __libc_single_threaded I posted should be fine in this
regard.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 17:02                       ` Florian Weimer via Libc-alpha
@ 2020-05-22 17:18                         ` Florian Weimer via Libc-alpha
  2020-05-22 17:28                         ` Rich Felker
  1 sibling, 0 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-22 17:18 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Florian Weimer:

> * Rich Felker:
>
>>> This still has consequences for setxid safety which is why musl now
>>> fully synchronizes the existing threads list. But if you're not using
>>> the thread count for that, it's not an issue. Indeed I think
>>> SYS_membarrier is a solution here, but if it's not supported or
>>> blocked by seccomp then __libc_single_threaded must not be made true
>>> again at this time.
>>
>> Uhg, SYS_membarrier is *not* a solution here. The problem is far
>> worse, because the user of __libc_single_threaded potentially lacks
>> *compiler barriers* too.
>>
>> Consider something like:
>>
>> 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
>> 	x = *p;
>> 	if (need_unlock) unlock();
>> 	/* ... */
>> 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
>> 	x = *p;
>> 	if (need_unlock) unlock();
>>
>> Here, in the case where __libc_single_threaded is true the second time
>> around, there is no (memory or compiler) acquire barrier between the
>> first access to *p and the second. Thus the compiler can (and actually
>> does! I don't have a minimal PoC but musl actually just hit a bug very
>> close to this) omit the second load from memory, and uses the cached
>> value, which may be incorrect because the exiting thread modified it.
>>
>> This could potentially be avoided with complex contracts about
>> barriers needed to use __libc_single_threaded, but it seems highly
>> error-prone.
>
> Well, yes.  It's clearly a data race if the implementation sets
> __libc_single_threaded directly from an exiting thread.  I don't see a
> way around that.
>
> Our discussion focused on the problem that observing a thread count of 1
> in pthread_join does not necessarily mean that it is safe to assume at
> this point that the process is single-threaded, in glibc's
> implementation that uses a simple __nptl_nthreads counter decremented on
> the thread itself.  This does not cause a low-level data race directly,
> but is potentially still incorrect (I'm not quite sure yet).
>
> In glibc, we annotate many functions with __attribute__ ((leaf)),
> implicitly via __THROW.  None of these functions may reset
> __libc_single_threaded.  I expect that many compilers have a built-in
> list of standard functions they treat as leaf functions.  This means
> that these functions cannot write in practice to __libc_single_threaded
> (or any other global variable apart from errno).  Not following this
> rule would result in undefined behavior, similar to an actual data race
> in the memory model.
>
> A compiler cannot treat pthread_create as a leaf function, so the simple
> implementation of __libc_single_threaded I posted should be fine in this
> regard.

Sorry, the leaf attribute is not what I'm looking for here.  It has this
effect only for static variables in translation units that contain
functions whose address is taken.  What I'm after is more like the pure
attribute.

Clang does it for malloc in C++ mode, but not in C mode:

#include <stdlib.h>

extern int a;
extern void *p;

int
f (void)
{
  int a1 = a;
  p = malloc (1);
  return a1 - a;
}

turns into:

_Z1fv:                                  # @_Z1fv
# %bb.0:
	pushq	%rax
	movl	$1, %edi
	callq	malloc
	movq	%rax, p(%rip)
	xorl	%eax, %eax
	popq	%rcx
	retq

So the code is compiled as if it were written like this, consistent with
eliding the reload:

int
f (void)
{
  p = malloc (1);
  return 0;
}

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 17:02                       ` Florian Weimer via Libc-alpha
  2020-05-22 17:18                         ` Florian Weimer via Libc-alpha
@ 2020-05-22 17:28                         ` Rich Felker
  2020-05-22 17:40                           ` Florian Weimer via Libc-alpha
  1 sibling, 1 reply; 45+ messages in thread
From: Rich Felker @ 2020-05-22 17:28 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Fri, May 22, 2020 at 07:02:01PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> >> This still has consequences for setxid safety which is why musl now
> >> fully synchronizes the existing threads list. But if you're not using
> >> the thread count for that, it's not an issue. Indeed I think
> >> SYS_membarrier is a solution here, but if it's not supported or
> >> blocked by seccomp then __libc_single_threaded must not be made true
> >> again at this time.
> >
> > Uhg, SYS_membarrier is *not* a solution here. The problem is far
> > worse, because the user of __libc_single_threaded potentially lacks
> > *compiler barriers* too.
> >
> > Consider something like:
> >
> > 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
> > 	x = *p;
> > 	if (need_unlock) unlock();
> > 	/* ... */
> > 	if (!__libc_single_threaded) { lock(); need_unlock=1; }
> > 	x = *p;
> > 	if (need_unlock) unlock();
> >
> > Here, in the case where __libc_single_threaded is true the second time
> > around, there is no (memory or compiler) acquire barrier between the
> > first access to *p and the second. Thus the compiler can (and actually
> > does! I don't have a minimal PoC but musl actually just hit a bug very
> > close to this) omit the second load from memory, and uses the cached
> > value, which may be incorrect because the exiting thread modified it.
> >
> > This could potentially be avoided with complex contracts about
> > barriers needed to use __libc_single_threaded, but it seems highly
> > error-prone.
> 
> Well, yes.  It's clearly a data race if the implementation sets
> __libc_single_threaded directly from an exiting thread.  I don't see a
> way around that.
> 
> Our discussion focused on the problem that observing a thread count of 1
> in pthread_join does not necessarily mean that it is safe to assume at
> this point that the process is single-threaded, in glibc's
> implementation that uses a simple __nptl_nthreads counter decremented on
> the thread itself.  This does not cause a low-level data race directly,
> but is potentially still incorrect (I'm not quite sure yet).

pthread_join necessarily has an acquire barrier (this is a fundamental
requirement of the interface contract; join is acquiring the results
of the thread) so under some weak assumptions on unsynchronized memory
access (e.g. non-tearing, not seeing a value that wasn't stored
sometime between the last and next acquire barriers on the observer's
side) I think observing it from pthread_join is safe.

On the other hand I'm skeptical of the utility. In a program that
only makes small use of threads, the join may happen long after the
thread exits, during which time many operations may have been slowed
down by inability to skip locks.

> In glibc, we annotate many functions with __attribute__ ((leaf)),
> implicitly via __THROW.  None of these functions may reset
> __libc_single_threaded.

I don't think leaf (at least the gcc attribute leaf) is the actual
issue here; it's more complicated. Nothing about leaf forbids stores
to global variables. It just means the compiler can assume it has a
fuller picture for escape analysis. But any update to
__libc_single_threaded would require acquire barriers (e.g., after
acquiring a lock, using the value of __nptl_nthreads to infer that
future lock cycles can be skipped), and these barriers would in turn
preclude any invalid transformation by the compiler. (Leaf does not
negate barriers.)
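
A rough sketch of the kind of update meant here, with made-up names
throughout: demo_nthreads stands in for the internal thread counter,
demo_single_threaded for __libc_single_threaded, and demo_thread_exit
for the point where an exiting thread decrements the counter.  It
deliberately ignores the question of what the kernel may still write
on behalf of the exiting thread afterwards.

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int demo_nthreads = 2;	/* Stand-in thread counter.  */
static char demo_single_threaded = 0;	/* Stand-in for the flag.  */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical exit path: release, so that the exiting thread's
   earlier stores are ordered before the decrement.  */
static void
demo_thread_exit (void)
{
  atomic_fetch_sub_explicit (&demo_nthreads, 1, memory_order_release);
}

/* Some non-AS-safe operation that takes an internal lock.  */
static void
demo_locked_operation (void)
{
  pthread_mutex_lock (&demo_lock);
  /* The acquire load pairs with the release decrement above.  Only
     the surviving thread ever writes the flag.  */
  if (atomic_load_explicit (&demo_nthreads, memory_order_acquire) == 1)
    demo_single_threaded = 1;
  /* ... the actual work would go here ...  */
  pthread_mutex_unlock (&demo_lock);
}
```

Both the mutex lock and the acquire load are barriers the compiler
has to respect, with or without the leaf attribute.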

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 17:28                         ` Rich Felker
@ 2020-05-22 17:40                           ` Florian Weimer via Libc-alpha
  2020-05-22 17:49                             ` Rich Felker
  0 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-22 17:40 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

>> Our discussion focused on the problem that observing a thread count of 1
>> in pthread_join does not necessarily mean that it is safe to assume at
>> this point that the process is single-threaded, in glibc's
>> implementation that uses a simple __nptl_nthreads counter decremented on
>> the thread itself.  This does not cause a low-level data race directly,
>> but is potentially still incorrect (I'm not quite sure yet).
>
> pthread_join necessarily has an acquire barrier (this is a fundamental
> requirement of the interface contract; join is acquiring the results
> of the thread) so under some weak assumptions on unsynchronized memory
> access (e.g. non-tearing, not seeing a value that wasn't stored
> sometime between the last and next acquire barriers on the observer's
> side) I think observing it from pthread_join is safe.

Because of the meaning of the variable, it is *completely* safe if there
are no detached threads, without any further assumptions.

With detached threads and pthread_join observing a thread count of 1 (as
decreased during thread exit), the validity of setting
__libc_single_threaded depends on whether the kernel offers something
that causes a memory write on thread exit.  I know of at least two such
facilities: the TID variable and robust mutexes.  Therefore, I'm
inclined to think that further such facilities could be added by the
kernel, in ways not observable to glibc, and we had better make sure
that we have full synchronization with the thread exit if we
implemented the pthread_join optimization for __libc_single_threaded
*today*, although it might not actually be needed.

> I don't think leaf (at least the gcc attribute leaf) is the actual
> issue here

Yes, please see the follow-up.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 17:40                           ` Florian Weimer via Libc-alpha
@ 2020-05-22 17:49                             ` Rich Felker
  2020-05-22 19:22                               ` Szabolcs Nagy
  0 siblings, 1 reply; 45+ messages in thread
From: Rich Felker @ 2020-05-22 17:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Fri, May 22, 2020 at 07:40:19PM +0200, Florian Weimer via Libc-alpha wrote:
> * Rich Felker:
> 
> >> Our discussion focused on the problem that observing a thread count of 1
> >> in pthread_join does not necessarily mean that it is safe to assume at
> >> this point that the process is single-threaded, in glibc's
> >> implementation that uses a simple __nptl_nthreads counter decremented on
> >> the thread itself.  This does not cause a low-level data race directly,
> >> but is potentially still incorrect (I'm not quite sure yet).
> >
> > pthread_join necessarily has an acquire barrier (this is a fundamental
> > requirement of the interface contract; join is acquiring the results
> > of the thread) so under some weak assumptions on unsynchronized memory
> > access (e.g. non-tearing, not seeing a value that wasn't stored
> > sometime between the last and next acquire barriers on the observer's
> > side) I think observing it from pthread_join is safe.
> 
> Because of the meaning of the variable, it is *completely* safe if there
> are no detached threads, without any further assumptions.
> 
> With detached threads an pthread_join observing a thread count of 1 (as
> decreased during thread exit), the validity of setting
> __libc_single_threaded depends on whether the kernel offers something
> that causes a memory write on thread exit.  I know of at least two such

I don't follow. Why do you care about the kernel entity exiting here?
You should only care about having a release barrier before the update
to the count, so that seeing the updated count guarantees seeing any
changes to memory made by the exiting detached thread.
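
A minimal sketch of that release/acquire pairing, using C11 atomics.
The names live_threads, shared_result, exiting_thread_path and
only_one_thread_left are hypothetical stand-ins (live_threads plays
the role of a counter like __nptl_nthreads); this is not the glibc
implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a thread counter such as __nptl_nthreads. */
atomic_int live_threads = 2;

/* Ordinary (non-atomic) data written by the exiting detached thread. */
int shared_result;

/* Exit path of the detached thread: finish all writes first, then
   decrement the counter with release ordering. */
void exiting_thread_path(void)
{
    /* Sanity check for the sketch: the caller is not the last thread. */
    assert(atomic_load_explicit(&live_threads, memory_order_relaxed) > 1);
    shared_result = 42;
    atomic_fetch_sub_explicit(&live_threads, 1, memory_order_release);
}

/* Observer: the acquire load pairs with the release decrement, so a
   return value of 1 also means shared_result is visible. */
int only_one_thread_left(void)
{
    return atomic_load_explicit(&live_threads, memory_order_acquire) == 1;
}
```

Seeing the decremented count through the acquire load is what lets
the observer rely on the exiting thread's earlier writes.  The sketch
says nothing about writes the kernel might perform afterwards.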

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 17:49                             ` Rich Felker
@ 2020-05-22 19:22                               ` Szabolcs Nagy
  2020-05-22 19:53                                 ` Rich Felker
  0 siblings, 1 reply; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-22 19:22 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, libc-alpha

The 05/22/2020 13:49, Rich Felker wrote:
> On Fri, May 22, 2020 at 07:40:19PM +0200, Florian Weimer via Libc-alpha wrote:
> > * Rich Felker:
> > 
> > >> Our discussion focused on the problem that observing a thread count of 1
> > >> in pthread_join does not necessarily mean that it is safe to assume at
> > >> this point that the process is single-threaded, in glibc's
> > >> implementation that uses a simple __nptl_nthreads counter decremented on
> > >> the thread itself.  This does not cause a low-level data race directly,
> > >> but is potentially still incorrect (I'm not quite sure yet).
> > >
> > > pthread_join necessarily has an acquire barrier (this is a fundamental
> > > requirement of the interface contract; join is acquiring the results
> > > of the thread) so under some weak assumptions on unsynchronized memory
> > > access (e.g. non-tearing, not seeing a value that wasn't stored
> > > sometime between the last and next acquire barriers on the observer's
> > > side) I think observing it from pthread_join is safe.
> > 
> > Because of the meaning of the variable, it is *completely* safe if there
> > are no detached threads, without any further assumptions.
> > 
> > With detached threads an pthread_join observing a thread count of 1 (as
> > decreased during thread exit), the validity of setting
> > __libc_single_threaded depends on whether the kernel offers something
> > that causes a memory write on thread exit.  I know of at least two such
> 
> I don't follow. Why do you care about the kernel entity exiting here?
> You should only care about having a release barrier before the update
> to the count, so that seeing the updated count guarantees seeing any
> changes to memory made by the exiting detached thread.

kernel entity matters if it writes user memory after
the release barrier such that user code may observe it.
(although that would likely break conformance or other
properties too, not just single thread checks).

another example is observing the detached thread via
kernel apis like /proc/task: user code may expect to
see a single os task when __libc_single_threaded is set.

so i think the safest implementation never sets
__libc_single_threaded back to true and second safest
is one that only sets it back to true in pthread_join
when there were no detached threads (or if using some
os api it can verify that there really is only one
kernel thread).

if we want to allow kernel entities to be around
but still tell the user when "as far as the libc
is concerned there is only one thread", then i
think __libc_single_threaded needs to be an extern
call (that acts as a compiler barrier).
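
a rough sketch of the extern-call variant, assuming a hypothetical
accessor hypothetical_single_threaded_p (no such function exists in
glibc).  in a real libc the accessor would be defined in a separate
translation unit, so the call is opaque to the compiler at the call
site; it is defined inline here only to keep the example
self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* In a real libc this flag would be private to the library. */
static bool single_threaded_flag = true;

/* Hypothetical accessor.  When defined in another translation unit,
   the compiler cannot assume the result is unchanged between calls. */
bool hypothetical_single_threaded_p(void)
{
    return single_threaded_flag;
}

/* Hypothetical internal hook, called before a new thread starts. */
void hypothetical_note_thread_created(void)
{
    single_threaded_flag = false;
}

/* Caller side: ask libc on every use instead of reading a variable. */
void counter_increment(long *counter)
{
    assert(counter != NULL);
    if (hypothetical_single_threaded_p())
        ++*counter;
    else
        __atomic_fetch_add(counter, 1, __ATOMIC_RELAXED);
}
```

the price is a function call on every check, which is the cost the
plain variable was meant to avoid.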

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 19:22                               ` Szabolcs Nagy
@ 2020-05-22 19:53                                 ` Rich Felker
  2020-05-23  6:49                                   ` Szabolcs Nagy
  0 siblings, 1 reply; 45+ messages in thread
From: Rich Felker @ 2020-05-22 19:53 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Florian Weimer, libc-alpha

On Fri, May 22, 2020 at 08:22:50PM +0100, Szabolcs Nagy wrote:
> The 05/22/2020 13:49, Rich Felker wrote:
> > On Fri, May 22, 2020 at 07:40:19PM +0200, Florian Weimer via Libc-alpha wrote:
> > > * Rich Felker:
> > > 
> > > >> Our discussion focused on the problem that observing a thread count of 1
> > > >> in pthread_join does not necessarily mean that it is safe to assume at
> > > >> this point that the process is single-threaded, in glibc's
> > > >> implementation that uses a simple __nptl_nthreads counter decremented on
> > > >> the thread itself.  This does not cause a low-level data race directly,
> > > >> but is potentially still incorrect (I'm not quite sure yet).
> > > >
> > > > pthread_join necessarily has an acquire barrier (this is a fundamental
> > > > requirement of the interface contract; join is acquiring the results
> > > > of the thread) so under some weak assumptions on unsynchronized memory
> > > > access (e.g. non-tearing, not seeing a value that wasn't stored
> > > > sometime between the last and next acquire barriers on the observer's
> > > > side) I think observing it from pthread_join is safe.
> > > 
> > > Because of the meaning of the variable, it is *completely* safe if there
> > > are no detached threads, without any further assumptions.
> > > 
> > > With detached threads an pthread_join observing a thread count of 1 (as
> > > decreased during thread exit), the validity of setting
> > > __libc_single_threaded depends on whether the kernel offers something
> > > that causes a memory write on thread exit.  I know of at least two such
> > 
> > I don't follow. Why do you care about the kernel entity exiting here?
> > You should only care about having a release barrier before the update
> > to the count, so that seeing the updated count guarantees seeing any
> > changes to memory made by the exiting detached thread.
> 
> kernel entity matters if it writes user memory after
> the release barrier such that user code may observe it.
> (although that would likely break conformance or other
> properties too, not just single thread checks).
> 
> another example is observing the detached thread via
> kernel apis like /proc/task: user code may expect to
> see a single os task when __libc_single_threaded is set.

Indeed, this is why I said the precise meaning/contract for what the
consumer of __libc_single_threaded is allowed to do needs to be clear
-- see also the new thread "How would we make linux man pages
authoritative?" and Michael Kerrisk's comments on how hidden behaviors
become de facto contracts if you don't clearly specify otherwise.

> so i think the safest implementation never sets
> __libc_single_threaded back to true and second safest
> is one that only sets it back to true in pthread_join
> when there were no detached threads (or if using some
> os api it can verify that there really is only one
> kernel thread).
> 
> if we want to allow kernel entities to be around
> but still tell the user when "as far as the libc
> is concerned there is only one thread", then i
> think __libc_single_threaded needs to be an extern
> call (that acts as a compiler barrier).

Do relaxed order atomics provide a compiler barrier? If so, I think
SYS_membarrier combined with one could be sufficient. However, using
an atomic type for a public interface like this is a bit of a
compatibility thorn (requires compiler and std level supporting it).

Of course the same could be achieved with a requirement to use a manual
compiler barrier (dummy asm) but I think it's error-prone to assume
the application would do it correctly.
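
For reference, a sketch of what such a manual compiler barrier could
look like on the application side.  It assumes GCC or Clang (the empty
asm with a "memory" clobber is an extension); COMPILER_BARRIER,
example_single_threaded and ref_with_barrier are hypothetical names,
with example_single_threaded standing in for the libc variable.

```c
#include <assert.h>
#include <stddef.h>

/* GCC/Clang extension: an empty asm statement with a "memory" clobber.
   The compiler may not keep memory values cached across it.  It emits
   no instruction and is not a CPU memory barrier. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Stand-in for the libc variable, renamed to avoid clashing with it. */
char example_single_threaded = 1;

void ref_with_barrier(unsigned long *count)
{
    assert(count != NULL);
    COMPILER_BARRIER();            /* force a fresh load of the flag */
    if (example_single_threaded)
        ++*count;
    else
        __atomic_fetch_add(count, 1, __ATOMIC_RELAXED);
}
```

Every caller would have to remember the barrier before each check,
which is where the error-proneness comes from.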

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-22 19:53                                 ` Rich Felker
@ 2020-05-23  6:49                                   ` Szabolcs Nagy
  2020-05-23 16:02                                     ` Rich Felker
  0 siblings, 1 reply; 45+ messages in thread
From: Szabolcs Nagy @ 2020-05-23  6:49 UTC (permalink / raw)
  To: Rich Felker; +Cc: Florian Weimer, libc-alpha

* Rich Felker <dalias@libc.org> [2020-05-22 15:53:26 -0400]:
> On Fri, May 22, 2020 at 08:22:50PM +0100, Szabolcs Nagy wrote:
> > so i think the safest implementation never sets
> > __libc_single_threaded back to true and second safest
> > is one that only sets it back to true in pthread_join
> > when there were no detached threads (or if using some
> > os api it can verify that there really is only one
> > kernel thread).
> > 
> > if we want to allow kernel entities to be around
> > but still tell the user when "as far as the libc
> > is concerned there is only one thread", then i
> > think __libc_single_threaded needs to be an extern
> > call (that acts as a compiler barrier).
> 
> Do relaxed order atomics provide a compiler barrier? If so, I think
> SYS_membarrier combined with one could be sufficient However using
> atomic type for a public interface like this is a bit of a
> compatibility thorn (requires compiler and std level supporting it).
> 
> Of course the same could be achieved with requirement to use a manual
> compiler barrier (dummy asm) but I think it's error-prone to assume
> the application would do it correctly.

no, relaxed atomics do not order unrelated memory accesses.

but a trick like you proposed for the musl internal need_locks
may work: __libc_single_threaded can go back to 1 when there
is only one thread left executing user code *and* that thread
called a libc function that has an acquire barrier.
(if thread exit has a release barrier then this should work)

this can allow earlier single threaded detection than only
considering pthread_join: e.g. stdio, malloc etc may do a
check and update the global after an acquire barrier, however
the compiler must not cache globals across libc calls for this
to work.
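
a sketch of that lazy update, under stated assumptions: all names
(sketch_nthreads, sketch_single_threaded, sketch_libc_entry, ...) are
hypothetical, and the acquire ordering is placed on the counter load
itself so that it pairs directly with the release decrement on thread
exit.  this is not how musl or glibc implement it.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_int sketch_nthreads = 1;    /* hypothetical thread counter */
char sketch_single_threaded = 1;   /* hypothetical public flag    */

/* Called before the new thread starts running. */
void sketch_thread_created(void)
{
    sketch_single_threaded = 0;
    atomic_fetch_add_explicit(&sketch_nthreads, 1, memory_order_relaxed);
}

/* Called on the exiting thread, after all of its user-visible writes. */
void sketch_thread_exited(void)
{
    assert(atomic_load_explicit(&sketch_nthreads, memory_order_relaxed) > 1);
    atomic_fetch_sub_explicit(&sketch_nthreads, 1, memory_order_release);
}

/* Called from libc functions such as stdio or malloc entry points:
   re-check the count with acquire ordering and restore the flag. */
void sketch_libc_entry(void)
{
    if (!sketch_single_threaded
        && atomic_load_explicit(&sketch_nthreads, memory_order_acquire) == 1)
        sketch_single_threaded = 1;
}
```

the flag only goes back to 1 inside a libc call, so a caller that
keeps an old copy of it across the call merely stays on the slower
multi-threaded path.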


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-23  6:49                                   ` Szabolcs Nagy
@ 2020-05-23 16:02                                     ` Rich Felker
  2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
  2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
  0 siblings, 2 replies; 45+ messages in thread
From: Rich Felker @ 2020-05-23 16:02 UTC (permalink / raw)
  To: Szabolcs Nagy; +Cc: Florian Weimer, libc-alpha

On Sat, May 23, 2020 at 08:49:41AM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@libc.org> [2020-05-22 15:53:26 -0400]:
> > On Fri, May 22, 2020 at 08:22:50PM +0100, Szabolcs Nagy wrote:
> > > so i think the safest implementation never sets
> > > __libc_single_threaded back to true and second safest
> > > is one that only sets it back to true in pthread_join
> > > when there were no detached threads (or if using some
> > > os api it can verify that there really is only one
> > > kernel thread).
> > > 
> > > if we want to allow kernel entities to be around
> > > but still tell the user when "as far as the libc
> > > is concerned there is only one thread", then i
> > > think __libc_single_threaded needs to be an extern
> > > call (that acts as a compiler barrier).
> > 
> > Do relaxed order atomics provide a compiler barrier? If so, I think
> > SYS_membarrier combined with one could be sufficient However using
> > atomic type for a public interface like this is a bit of a
> > compatibility thorn (requires compiler and std level supporting it).
> > 
> > Of course the same could be achieved with requirement to use a manual
> > compiler barrier (dummy asm) but I think it's error-prone to assume
> > the application would do it correctly.
> 
> no, relaxed atomics do not order unrelated memory accesses.
> 
> but a trick like you proposed for the musl internal need_locks
> may work: __libc_single_threaded can go back to 1 when there
> is only one thread left executing user code *and* that thread
> called a libc function that has an acquire barrier.
> (if thread exit has a release barrier then this should work)

Thread exit is necessarily a release barrier if the thread is
joinable, has robust mutexes, or any of a number of other conditions
(including anything where the child exit futex address is non-null).

In addition any barrier in userspace before the kernel task exit is
fine too.

> this can allow earlier single threaded detection than only
> considering pthread_join: e.g. stdio, malloc etc may do a
> check and update the global after an acquire barrier, however
> the compiler must not cache globals across libc calls for this
> to work.

It can't cache globals across non-pure functions whose definitions it
can't see (and if it saw the definition it would know the global is
modified). malloc is something of a special case where clang treats it
not as a function but having "pure malloc semantics", but even then I
don't think it matters if it caches it; at worst you see the old value
of __libc_single_threaded (false) rather than the new one (true) and
that direction is safe.

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-23 16:02                                     ` Rich Felker
  2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
@ 2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
  1 sibling, 0 replies; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-25  8:08 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

>> this can allow earlier single threaded detection than only
>> considering pthread_join: e.g. stdio, malloc etc may do a
>> check and update the global after an acquire barrier, however
>> the compiler must not cache globals across libc calls for this
>> to work.
>
> It can't cache globals across non-pure functions whose definitions it
> cant't see (and if it saw the definition it would know the global is
> modified).

All 

> malloc is something of a special case where clang treats it
> not as a function but having "pure malloc semantics", but even then I
> don't think it matters if it caches it; at worst you see the old value
> of __libc_single_threaded (false) rather than the new one (true) and
> that direction is safe.
>
> Rich


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-23 16:02                                     ` Rich Felker
@ 2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
  2020-05-25 17:21                                         ` Rich Felker
  2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
  1 sibling, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-25  8:08 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

>> this can allow earlier single threaded detection than only
>> considering pthread_join: e.g. stdio, malloc etc may do a
>> check and update the global after an acquire barrier, however
>> the compiler must not cache globals across libc calls for this
>> to work.
>
> It can't cache globals across non-pure functions whose definitions it
> cant't see (and if it saw the definition it would know the global is
> modified).

Sorry about that, hit C-c C-c while I thought I was in a terminal. 8-/

For most standard C functions, it is well-known to which global
variables (if any) they write.  Of course, compilers exploit this fact.

> malloc is something of a special case where clang treats it
> not as a function but having "pure malloc semantics", but even then I
> don't think it matters if it caches it;

And of course malloc is the most common example of a standard function
that has observable side effects beyond those specified in the standard:
most implementations have a statistics interface.

> at worst you see the old value of __libc_single_threaded (false)
> rather than the new one (true) and that direction is safe.

It's still a data race.  The compiler can easily generate invalid code
if it incorrectly assumes that __libc_single_threaded remains stable.  I
don't know if Clang will do this.  But I think the C library
implementation should be rather conservative here.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
@ 2020-05-25 17:21                                         ` Rich Felker
  2020-05-27 11:54                                           ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 45+ messages in thread
From: Rich Felker @ 2020-05-25 17:21 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Mon, May 25, 2020 at 10:08:37AM +0200, Florian Weimer via Libc-alpha wrote:
> * Rich Felker:
> 
> >> this can allow earlier single threaded detection than only
> >> considering pthread_join: e.g. stdio, malloc etc may do a
> >> check and update the global after an acquire barrier, however
> >> the compiler must not cache globals across libc calls for this
> >> to work.
> >
> > It can't cache globals across non-pure functions whose definitions it
> > cant't see (and if it saw the definition it would know the global is
> > modified).
> 
> Sorry about that, hit C-c C-c while I thought I was in a terminal. 8-/
> 
> For most standard C functions, it is well-known to which global
> variables (if any) they write.  Of course, compilers exploit this fact.
> 
> > malloc is something of a special case where clang treats it
> > not as a function but having "pure malloc semantics", but even then I
> > don't think it matters if it caches it;
> 
> And of course malloc is the most common example of a standard function
> that has observable side effects beyond those specified in the standard:
> most implementations have a statistics interface.
> 
> > at worst you see the old value of __libc_single_threaded (false)
> > rather than the new one (true) and that direction is safe.
> 
> It's still a data race.  The compiler can easily generate invalid code
> if it incorrectly assumes that __libc_single_threaded remains stable.  I
> don't know if Clang will do this.  But I think the C library
> implementation should be rather conservative here.

If this is an issue, and even regardless of whether it is, I think the
type of __libc_single_threaded should be volatile qualified. This
ensures that it cannot be cached in any way that might be invalid.
That's not just a hack; it's the correct way to model that the value
is able to change asynchronously (such as by an operation that the
compiler would otherwise assume can't have side effects).
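
A small sketch of the compiler-side effect of the qualifier.  The
names example_flag_volatile and count_single_threaded_reads are
hypothetical; example_flag_volatile stands in for a volatile-qualified
__libc_single_threaded.

```c
#include <assert.h>

/* Stand-in for a volatile-qualified __libc_single_threaded. */
volatile char example_flag_volatile = 1;

/* Each evaluation of the flag is a separate access to the object, so
   the compiler cannot hoist the load out of the loop. */
int count_single_threaded_reads(int n)
{
    assert(n >= 0);
    int hits = 0;
    for (int i = 0; i < n; i++)
        if (example_flag_volatile)
            hits++;
    return hits;
}
```

Without the qualifier the compiler would be free to load the flag
once before the loop and reuse that value for every iteration.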

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-25 17:21                                         ` Rich Felker
@ 2020-05-27 11:54                                           ` Florian Weimer via Libc-alpha
  2020-05-27 15:36                                             ` Rich Felker
  0 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-05-27 11:54 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

> On Mon, May 25, 2020 at 10:08:37AM +0200, Florian Weimer via Libc-alpha wrote:
>> * Rich Felker:
>> 
>> >> this can allow earlier single threaded detection than only
>> >> considering pthread_join: e.g. stdio, malloc etc may do a
>> >> check and update the global after an acquire barrier, however
>> >> the compiler must not cache globals across libc calls for this
>> >> to work.
>> >
>> > It can't cache globals across non-pure functions whose definitions it
>> > cant't see (and if it saw the definition it would know the global is
>> > modified).
>> 
>> Sorry about that, hit C-c C-c while I thought I was in a terminal. 8-/
>> 
>> For most standard C functions, it is well-known to which global
>> variables (if any) they write.  Of course, compilers exploit this fact.
>> 
>> > malloc is something of a special case where clang treats it
>> > not as a function but having "pure malloc semantics", but even then I
>> > don't think it matters if it caches it;
>> 
>> And of course malloc is the most common example of a standard function
>> that has observable side effects beyond those specified in the standard:
>> most implementations have a statistics interface.
>> 
>> > at worst you see the old value of __libc_single_threaded (false)
>> > rather than the new one (true) and that direction is safe.
>> 
>> It's still a data race.  The compiler can easily generate invalid code
>> if it incorrectly assumes that __libc_single_threaded remains stable.  I
>> don't know if Clang will do this.  But I think the C library
>> implementation should be rather conservative here.
>
> If this is an issue, and even regardless of whether it is, I think the
> type of __libc_single_threaded should be volatile qualified. This
> ensures that it cannot be cached in any way that might be invalid.
> That's not just a hack; it's the correct way to model that the value
> is able to change asynchronously (such as by an operation that the
> compiler would otherwise assume can't have side effects).

I think it makes more sense not to declare the object as volatile and
make sure that only libc functions which imply the required barrier
write to __libc_single_threaded.  For instance, I expect that this will
allow compilers to generate tighter code around multiple (implied)
reference count updates.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-27 11:54                                           ` Florian Weimer via Libc-alpha
@ 2020-05-27 15:36                                             ` Rich Felker
  2020-06-03 15:00                                               ` Florian Weimer via Libc-alpha
  0 siblings, 1 reply; 45+ messages in thread
From: Rich Felker @ 2020-05-27 15:36 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Wed, May 27, 2020 at 01:54:01PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> > On Mon, May 25, 2020 at 10:08:37AM +0200, Florian Weimer via Libc-alpha wrote:
> >> * Rich Felker:
> >> 
> >> >> this can allow earlier single threaded detection than only
> >> >> considering pthread_join: e.g. stdio, malloc etc may do a
> >> >> check and update the global after an acquire barrier, however
> >> >> the compiler must not cache globals across libc calls for this
> >> >> to work.
> >> >
> >> > It can't cache globals across non-pure functions whose definitions it
> >> > cant't see (and if it saw the definition it would know the global is
> >> > modified).
> >> 
> >> Sorry about that, hit C-c C-c while I thought I was in a terminal. 8-/
> >> 
> >> For most standard C functions, it is well-known to which global
> >> variables (if any) they write.  Of course, compilers exploit this fact.
> >> 
> >> > malloc is something of a special case where clang treats it
> >> > not as a function but having "pure malloc semantics", but even then I
> >> > don't think it matters if it caches it;
> >> 
> >> And of course malloc is the most common example of a standard function
> >> that has observable side effects beyond those specified in the standard:
> >> most implementations have a statistics interface.
> >> 
> >> > at worst you see the old value of __libc_single_threaded (false)
> >> > rather than the new one (true) and that direction is safe.
> >> 
> >> It's still a data race.  The compiler can easily generate invalid code
> >> if it incorrectly assumes that __libc_single_threaded remains stable.  I
> >> don't know if Clang will do this.  But I think the C library
> >> implementation should be rather conservative here.
> >
> > If this is an issue, and even regardless of whether it is, I think the
> > type of __libc_single_threaded should be volatile qualified. This
> > ensures that it cannot be cached in any way that might be invalid.
> > That's not just a hack; it's the correct way to model that the value
> > is able to change asynchronously (such as by an operation that the
> > compiler would otherwise assume can't have side effects).
> 
> I think it makes more sense not to declare the object as volatile and
> make sure that only libc functions which imply the required barrier
> write to __libc_single_threaded.  For instance, I expect that this will
> allow compilers to generate tighter code around multiple (implied)
> reference count updates.

It really should be volatile just because whatever you make it is ABI,
and there might be good reasons to want to make it updated
"asynchronously" with respect to the compiler's model in the future.
If you don't make it volatile now you can't do that in the future.

The cost of volatile is rather trivial; compiler can still cache
address of it, and loading from a cache line you recently loaded from,
and to which no stores are happening, is virtually free.

Rich

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-27 15:36                                             ` Rich Felker
@ 2020-06-03 15:00                                               ` Florian Weimer via Libc-alpha
  2020-06-03 17:11                                                 ` Rich Felker
  0 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-06-03 15:00 UTC (permalink / raw)
  To: Rich Felker; +Cc: libc-alpha

* Rich Felker:

>> I think it makes more sense not to declare the object as volatile and
>> make sure that only libc functions which imply the required barrier
>> write to __libc_single_threaded.  For instance, I expect that this will
>> allow compilers to generate tighter code around multiple (implied)
>> reference count updates.
>
> It really should be volatile just because whatever you make it is ABI,
> and there might be good reasons to want to make it updated
> "asynchronously" with respect to the compiler's model in the future.
> If you don't make it volatile now you can't do that in the future.

volatile does not mean that it is safe to update it asynchronously.
Some barriers are still needed.  And if we need acquire MO for the
access, then the purpose of the variable is completely defeated.

Is it really a significant restriction on implementations if they can
only set __libc_single_threaded to true from functions which clearly
imply an acquire barrier?  I don't think so.  pthread_join qualifies,
and that seems the most important case.

> The cost of volatile is rather trivial; compiler can still cache
> address of it, and loading from a cache line you recently loaded from,
> and to which no stores are happening, is virtually free.

There is a code size impact because the compiler can no longer merge
several atomic/non-atomic alternatives.  Consider this example, which I
think is not too unrealistic (hopefully, std::shared_ptr will
eventually compile down to something similar):

#include <stddef.h>

struct refcounted
{
  size_t count;
};

extern char __libc_single_threaded;

static inline
void ref (struct refcounted *p)
{
  if (__libc_single_threaded)
    ++p->count;
  else
    __atomic_fetch_add (&p->count, 1, __ATOMIC_RELAXED);
}


void f1 (struct refcounted *a, struct refcounted *b);

void
f2 (struct refcounted *a, struct refcounted *b)
{
  /* f1 takes ownership of both objects, but the caller of f2 is
     expected to retain (shared) ownership as well. */
  ref (a);
  ref (b);
  f1 (a, b);
}

As written, this turns into:

f2:
	cmpb	$0, __libc_single_threaded(%rip)
	je	.L2
	addq	$1, (%rdi)
.L3:
	addq	$1, (%rsi)
	jmp	f1
.L2:
	lock addq	$1, (%rdi)
	cmpb	$0, __libc_single_threaded(%rip)
	jne	.L3
	lock addq	$1, (%rsi)
	jmp	f1

The jump back to .L3 is due to the current reluctance to optimize
relaxed MO accesses, due to the brokenness of the C++ memory model
(although it would be quite safe here, I assume).

With volatile, it's this instead:

f2:
	movzbl	__libc_single_threaded(%rip), %eax
	testb	%al, %al
	je	.L2
	addq	$1, (%rdi)
	movzbl	__libc_single_threaded(%rip), %eax
	testb	%al, %al
	je	.L4
.L7:
	addq	$1, (%rsi)
	jmp	f1
.L2:
	lock addq	$1, (%rdi)
	movzbl	__libc_single_threaded(%rip), %eax
	testb	%al, %al
	jne	.L7
.L4:
	lock addq	$1, (%rsi)
	jmp	f1

So two checks in the non-atomic case, plus a load for the volatile
access.

Thanks,
Florian


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-05-21 14:32       ` Adhemerval Zanella via Libc-alpha
@ 2020-06-03 15:48         ` Florian Weimer via Libc-alpha
  2020-06-03 17:52           ` Adhemerval Zanella via Libc-alpha
  0 siblings, 1 reply; 45+ messages in thread
From: Florian Weimer via Libc-alpha @ 2020-06-03 15:48 UTC (permalink / raw)
  To: Adhemerval Zanella; +Cc: Adhemerval Zanella via Libc-alpha

* Adhemerval Zanella:

>> I'm going to add this to the manual as an implementation note, after the
>> first example:
>> 
>> @c Note: No memory order on __libc_single_threaded.  The
>> @c implementation must ensure that exit of the critical
>> @c (second-to-last) thread happens-before setting
>> @c __libc_single_threaded to true.  Otherwise, acquire MO might be
>> @c needed for reading the variable in some scenarios, and that would
>> @c completely defeat its purpose.
>
> The comment is sound, but I still think we should properly document
> that this initial version does not attempt to update
> __libc_single_threaded on pthread_join or detached thread exit, and
> maybe also the brief explanation you added on why these semantics
> were chosen (to avoid requiring a stricter MO).

I'm concerned that if we make the implementation too transparent,
programmers will read the explanation and say, “gosh, I better assign
true to __libc_single_threaded after I joined the last thread”.  That's
not something we want to encourage.

>> For detached thread exits, this kind of synchronization may not be
>> easily obtainable in all cases.  I don't think we can do it on the
>> on-thread exit path because the kernel will perform certain actions
>> afterwards (like robust mutex updates), no matter how late we do it.  I
>> guess we could perhaps piggy-back on the stack reclamation mechanism.
>
> It seems that robust mutex updates are indeed a problem, but I am not
> sure that the CLONE_CHILD_CLEARTID clear helps here.  It signals that
> the thread is done with the memory synchronization, but the stack
> cache is not really updated.  Maybe an extra clone3 flag?

I thought we might piggy-back on the work that free_stacks does.  But
the code is sufficiently convoluted that I'm no longer sure.

Thanks,
Florian



* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-06-03 15:00                                               ` Florian Weimer via Libc-alpha
@ 2020-06-03 17:11                                                 ` Rich Felker
  0 siblings, 0 replies; 45+ messages in thread
From: Rich Felker @ 2020-06-03 17:11 UTC (permalink / raw)
  To: Florian Weimer; +Cc: libc-alpha

On Wed, Jun 03, 2020 at 05:00:01PM +0200, Florian Weimer wrote:
> * Rich Felker:
> 
> >> I think it makes more sense not to declare the object as volatile and
> >> make sure that only libc functions which imply the required barrier
> >> write to __libc_single_threaded.  For instance, I expect that this will
> >> allow compilers to generate tighter code around multiple (implied)
> >> reference count updates.
> >
> > It really should be volatile just because whatever you make it is ABI,
> > and there might be good reasons to want to make it updated
> > "asynchronously" with respect to the compiler's model in the future.
> > If you don't make it volatile now you can't do that in the future.
> 
> volatile does not mean that it is safe to update it asynchronously.
> Some barriers are still needed.

If by "asynchronously" we just mean "some code running in the same
task context which the compiler isn't aware of, like a signal handler
or internals of malloc or of a pure function" then yes volatile is
sufficient. This is basically the definition of volatile.

If you mean modification from other threads, then you need additional
synchronization to ensure that the change is seen, but if you're ok
with a change not being seen (roughly the same as relaxed order) then
volatile suffices as a valid way to model it. (The potential change is
an asynchronous change by the hardware, where the hardware in question
happens to be the cache/memory coherency implementation.)
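
A minimal sketch of the same-context case, assuming a hypothetical flag flag_changed and helper observe_after_signal (neither is part of the patch): the handler modifies the object behind the compiler's back, and volatile is what forces the read after raise to reload it.

```c
#include <signal.h>

/* Hypothetical flag; only the handler below writes to it.  */
static volatile sig_atomic_t flag_changed;

static void
on_signal (int sig)
{
  (void) sig;
  flag_changed = 1;
}

/* Raise a signal in the current thread and return the flag value
   observed afterwards, or -1 if the handler could not be installed.  */
int
observe_after_signal (void)
{
  void (*old_handler) (int) = signal (SIGINT, on_signal);
  if (old_handler == SIG_ERR)
    return -1;

  flag_changed = 0;
  raise (SIGINT);               /* The handler runs before raise returns.  */
  int seen = flag_changed;      /* Volatile forces a fresh load here.  */

  signal (SIGINT, old_handler); /* Restore the previous disposition.  */
  return seen;
}
```

This covers only the single-task case; it says nothing about visibility across threads, which is the second case described above.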

> And if we need acquire MO for the
> access, then the purpose of the variable is completely defeated.

No. Reading a stale value (false instead of true) is perfectly fine.
The other direction cannot happen because the change from true to
false can fundamentally happen only in a single-threaded context. And
if you don't want to read a stale value, SYS_membarrier performed by
the thread changing the value ensures that you won't. But that's not
needed.
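
To make the stale-read argument concrete, here is a small sketch in which the value a reader observed is passed in explicitly. The parameter seen_single and the name ref_seen are assumptions for illustration only; real code would read the flag itself.

```c
#include <stddef.h>

struct refcounted
{
  size_t count;
};

/* seen_single models the flag value this reader happened to observe.
   A stale "false" only selects the slower atomic path; the resulting
   count is the same.  */
void
ref_seen (struct refcounted *p, char seen_single)
{
  if (seen_single)
    ++p->count;
  else
    __atomic_fetch_add (&p->count, 1, __ATOMIC_RELAXED);
}
```

The harmful direction would be observing true while other threads exist, and that is the case ruled out above because the change to false is made while the process is still single-threaded.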

> Is it really a significant restriction on implementations if they can
> only set __libc_single_threaded to true from functions which clearly
> imply an acquire barrier?  I don't think so.  pthread_join qualifies,
> and that seems the most important case.

Again, doing it only from pthread_join will completely omit the
optimization in any program that waits to join until it has no further
work to do, which is a very reasonable thing to do.

> > The cost of volatile is rather trivial; compiler can still cache
> > address of it, and loading from a cache line you recently loaded from,
> > and to which no stores are happening, is virtually free.
> 
> There is a code size impact because the compiler can no longer merge
> several atomic/non-atomic alternatives.  Consider this example, which I
> think it is not too unrealistic (hopefully, std::shared_ptr will
> eventually compile down to something similar):
> 
> #include <stddef.h>
> 
> struct refcounted
> {
>   size_t count;
> };
> 
> extern char __libc_single_threaded;
> 
> static inline
> void ref (struct refcounted *p)
> {
>   if (__libc_single_threaded)
>     ++p->count;
>   else
>     __atomic_fetch_add (&p->count, 1, __ATOMIC_RELAXED);
> }
> 
> 
> void f1 (struct refcounted *a, struct refcounted *b);
> 
> void
> f2 (struct refcounted *a, struct refcounted *b)
> {
>   /* f1 takes ownership of both objects, but the caller of f2 is
>      expected to retain (shared) ownership as well. */
>   ref (a);
>   ref (b);
>   f1 (a, b);
> }
> 
> As written, this turns into:
> 
> f2:
> 	cmpb	$0, __libc_single_threaded(%rip)
> 	je	.L2
> 	addq	$1, (%rdi)
> .L3:
> 	addq	$1, (%rsi)
> 	jmp	f1
> .L2:
> 	lock addq	$1, (%rdi)
> 	cmpb	$0, __libc_single_threaded(%rip)
> 	jne	.L3
> 	lock addq	$1, (%rsi)
> 	jmp	f1
> 
> The jump back to .L3 is due to the current reluctance to optimize
> relaxed MO accesses, due to the brokenness of the C++ memory model
> (although it would be quite safe here, I assume).
> 
> With volatile, it's this instead:
> 
> f2:
> 	movzbl	__libc_single_threaded(%rip), %eax
> 	testb	%al, %al
> 	je	.L2
> 	addq	$1, (%rdi)
> 	movzbl	__libc_single_threaded(%rip), %eax
> 	testb	%al, %al
> 	je	.L4
> .L7:
> 	addq	$1, (%rsi)
> 	jmp	f1
> .L2:
> 	lock addq	$1, (%rdi)
> 	movzbl	__libc_single_threaded(%rip), %eax
> 	testb	%al, %al
> 	jne	.L7
> .L4:
> 	lock addq	$1, (%rsi)
> 	jmp	f1
> 
> So two checks in the non-atomic case, plus a load for the volatile
> access.

I see; indeed for something on this small a scale, it *might* matter.
Has anyone done any measurement to determine whether it does?

Rich


* Re: [PATCH 2/2] manual: Document __libc_single_threaded
  2020-06-03 15:48         ` Florian Weimer via Libc-alpha
@ 2020-06-03 17:52           ` Adhemerval Zanella via Libc-alpha
  0 siblings, 0 replies; 45+ messages in thread
From: Adhemerval Zanella via Libc-alpha @ 2020-06-03 17:52 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Adhemerval Zanella via Libc-alpha



On 03/06/2020 12:48, Florian Weimer wrote:
> * Adhemerval Zanella:
> 
>>> I'm going to add this to the manual as an implementation note, after the
>>> first example:
>>>
>>> @c Note: No memory order on __libc_single_threaded.  The
>>> @c implementation must ensure that exit of the critical
>>> @c (second-to-last) thread happens-before setting
>>> @c __libc_single_threaded to true.  Otherwise, acquire MO might be
>>> @c needed for reading the variable in some scenarios, and that would
>>> @c completely defeat its purpose.
>>
>> The comment is sound, but I still think we should properly document
>> that this initial version does not attempt to update
>> __libc_single_threaded on pthread_join or detached thread exit, and
>> maybe also the brief explanation you added on why these semantics
>> were chosen (to avoid requiring a stricter MO).
> 
> I'm concerned that if we make the implementation too transparent,
> programmers will read the explanation and say, “gosh, I better assign
> true to __libc_single_threaded after I joined the last thread”.  That's
> not something we want to encourage.

In fact, from your discussion with Rich I think we really should be
transparent about its semantics: why we have chosen the qualifiers
(whether we use volatile or not), how glibc updates it internally, and
the expected usage patterns (for instance, programs are not expected
to change its value).

> 
>>> For detached thread exits, this kind of synchronization may not be
>>> easily obtainable in all cases.  I don't think we can do it on the
>>> on-thread exit path because the kernel will perform certain actions
>>> afterwards (like robust mutex updates), no matter how late we do it.  I
>>> guess we could perhaps piggy-back on the stack reclamation mechanism.
>>
>> It seems that robust mutex updates are indeed a problem, but I am not
>> sure that the CLONE_CHILD_CLEARTID clear helps here.  It signals that
>> the thread is done with the memory synchronization, but the stack
>> cache is not really updated.  Maybe an extra clone3 flag?
> 
> I thought we might piggy-back on the work that free_stacks does.  But
> the code is sufficiently convoluted that I'm no longer sure.

I am not sure that the free_stacks pattern is really helpful without
some memory synchronization, such as SYS_membarrier.  My idea for the
clone3 flag would be similar to CLONE_CHILD_{CLEAR,SET}TID, where the
kernel does the heavy lifting of the memory synchronization to update
the total number of threads.



end of thread, other threads:[~2020-06-03 17:52 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
2020-05-20 18:12 [PATCH 0/2] Add __libc_single_threaded Florian Weimer via Libc-alpha
2020-05-20 18:12 ` [PATCH 1/2] Add the __libc_single_threaded variable Florian Weimer via Libc-alpha
2020-05-21 13:07   ` Szabolcs Nagy
2020-05-21 13:16     ` Florian Weimer via Libc-alpha
2020-05-21 13:26       ` Szabolcs Nagy
2020-05-20 18:12 ` [PATCH 2/2] manual: Document __libc_single_threaded Florian Weimer via Libc-alpha
2020-05-21  7:52   ` Michael Kerrisk (man-pages) via Libc-alpha
2020-05-21 12:17     ` Florian Weimer via Libc-alpha
2020-05-21 11:18   ` Szabolcs Nagy
2020-05-21 12:16     ` Florian Weimer via Libc-alpha
2020-05-21 12:50   ` Adhemerval Zanella via Libc-alpha
2020-05-21 13:09     ` Szabolcs Nagy
2020-05-21 13:15       ` Adhemerval Zanella via Libc-alpha
2020-05-21 13:30         ` Szabolcs Nagy
2020-05-21 13:44           ` Florian Weimer via Libc-alpha
2020-05-21 13:58             ` Adhemerval Zanella via Libc-alpha
2020-05-21 14:03               ` Florian Weimer via Libc-alpha
2020-05-22 10:01             ` Szabolcs Nagy
2020-05-22 10:05               ` Florian Weimer via Libc-alpha
2020-05-22 10:54                 ` Szabolcs Nagy
2020-05-22 11:08                   ` Florian Weimer via Libc-alpha
2020-05-22 15:07                   ` Rich Felker
2020-05-22 16:14                     ` Rich Felker
2020-05-22 16:36                       ` Adhemerval Zanella via Libc-alpha
2020-05-22 17:02                       ` Florian Weimer via Libc-alpha
2020-05-22 17:18                         ` Florian Weimer via Libc-alpha
2020-05-22 17:28                         ` Rich Felker
2020-05-22 17:40                           ` Florian Weimer via Libc-alpha
2020-05-22 17:49                             ` Rich Felker
2020-05-22 19:22                               ` Szabolcs Nagy
2020-05-22 19:53                                 ` Rich Felker
2020-05-23  6:49                                   ` Szabolcs Nagy
2020-05-23 16:02                                     ` Rich Felker
2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
2020-05-25 17:21                                         ` Rich Felker
2020-05-27 11:54                                           ` Florian Weimer via Libc-alpha
2020-05-27 15:36                                             ` Rich Felker
2020-06-03 15:00                                               ` Florian Weimer via Libc-alpha
2020-06-03 17:11                                                 ` Rich Felker
2020-05-25  8:08                                       ` Florian Weimer via Libc-alpha
2020-05-21 13:56           ` Adhemerval Zanella via Libc-alpha
2020-05-21 13:14     ` Florian Weimer via Libc-alpha
2020-05-21 14:32       ` Adhemerval Zanella via Libc-alpha
2020-06-03 15:48         ` Florian Weimer via Libc-alpha
2020-06-03 17:52           ` Adhemerval Zanella via Libc-alpha
