From mboxrd@z Thu Jan  1 00:00:00 1970
To: libc-alpha@sourceware.org
Subject: [PATCH v2 3/6] nptl: Optimization by not incrementing wrefs in pthread_cond_wait
Date: Tue, 7 Sep 2021 22:52:36 -0400
Message-Id: <20210908025239.1326480-4-malteskarupke@web.de>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20210908025239.1326480-1-malteskarupke@web.de>
References: <20210908025239.1326480-1-malteskarupke@web.de>
MIME-Version: 1.0
List-Id: Libc-alpha mailing list
From: Malte Skarupke via Libc-alpha
Reply-To: Malte Skarupke
Cc: Malte Skarupke, triegel@redhat.com, malteskarupke@fastmail.fm

After I broadened the scope of grefs, it covered mostly the same scope
as wrefs, so the duplicate atomic increment/decrement was unnecessary.
In this patch I remove the increment/decrement of wrefs.

One exception is the case when pthread_cancel is handled: the
interaction between __condvar_cleanup_waiting and pthread_cond_destroy
is complicated and requires both variables. So, to preserve the
existing behavior, I now increment/decrement wrefs in
__condvar_cleanup_waiting.

Another change is that quiesce_and_switch_g1 now clears the
wake-request flag that it sets. The flag used to be cleared when the
last waiter in the old group left pthread_cond_wait. The problem was
that this could race with the new pthread_cond_destroy behavior: the
leaving thread could allow pthread_cond_destroy to finish and then
modify the wake-request flag after the destroy. This was pointed out
in the review of an earlier version of this patch, and the fix is to
have quiesce_and_switch_g1 clear its own flag.
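For reference, the counting idiom that __g_refs implements after this
change can be sketched in isolation. The following is an illustrative
sketch only, with invented names (ref_acquire, ref_release,
destroy_wait) and a raw Linux futex syscall standing in for glibc's
internal wrappers; it is not the glibc code itself:

#include <stdatomic.h>
#include <limits.h>
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Bit 0 of the counter is the wake-request flag; the remaining bits
   count active users, so users add and remove 2.  */

static void
futex_wake_all (atomic_uint *addr)
{
  syscall (SYS_futex, addr, FUTEX_WAKE, INT_MAX, NULL, NULL, 0);
}

static void
ref_acquire (atomic_uint *refs)
{
  /* Acquire MO, pairing with the release decrement in ref_release.  */
  atomic_fetch_add_explicit (refs, 2, memory_order_acquire);
}

static void
ref_release (atomic_uint *refs)
{
  /* Release MO so that a destroyer's acquire operation sees all of our
     prior accesses.  A prior value of 3 means the wake-request flag was
     set and we were the last user.  */
  if (atomic_fetch_sub_explicit (refs, 2, memory_order_release) == 3)
    futex_wake_all (refs);
}

static void
destroy_wait (atomic_uint *refs)
{
  /* Set the wake-request flag and sleep until the count drains.
     Acquire MO synchronizes with the users' release decrements.  */
  unsigned int r
    = atomic_fetch_or_explicit (refs, 1, memory_order_acquire) | 1;
  while (r != 1)
    {
      syscall (SYS_futex, refs, FUTEX_WAIT, r, NULL, NULL, 0);
      r = atomic_load_explicit (refs, memory_order_acquire);
    }
}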
---
 nptl/nptl-printers.py          |  5 ++-
 nptl/nptl_lock_constants.pysym |  2 +-
 nptl/pthread_cond_broadcast.c  |  9 ++--
 nptl/pthread_cond_common.c     | 17 ++++----
 nptl/pthread_cond_destroy.c    | 30 ++++++++++----
 nptl/pthread_cond_signal.c     | 22 ++++++----
 nptl/pthread_cond_wait.c       | 75 ++++++++++++----------------------
 7 files changed, 81 insertions(+), 79 deletions(-)

diff --git a/nptl/nptl-printers.py b/nptl/nptl-printers.py
index 464a9c253c..eddd62b234 100644
--- a/nptl/nptl-printers.py
+++ b/nptl/nptl-printers.py
@@ -313,6 +313,7 @@ class ConditionVariablePrinter(object):
 
         data = cond['__data']
         self.wrefs = data['__wrefs']
+        self.grefs = data['__g_refs']
         self.values = []
 
         self.read_values()
@@ -350,8 +351,10 @@ class ConditionVariablePrinter(object):
         are waiting for it.
         """
 
+        num_readers_g0 = self.grefs[0] >> PTHREAD_COND_GREFS_SHIFT
+        num_readers_g1 = self.grefs[1] >> PTHREAD_COND_GREFS_SHIFT
         self.values.append(('Threads known to still execute a wait function',
-                            self.wrefs >> PTHREAD_COND_WREFS_SHIFT))
+                            num_readers_g0 + num_readers_g1))
 
     def read_attributes(self):
         """Read the condvar's attributes."""
diff --git a/nptl/nptl_lock_constants.pysym b/nptl/nptl_lock_constants.pysym
index ade4398e0c..2141cfa1f0 100644
--- a/nptl/nptl_lock_constants.pysym
+++ b/nptl/nptl_lock_constants.pysym
@@ -50,7 +50,7 @@ PTHREAD_COND_SHARED_MASK          __PTHREAD_COND_SHARED_MASK
 PTHREAD_COND_CLOCK_MONOTONIC_MASK __PTHREAD_COND_CLOCK_MONOTONIC_MASK
 COND_CLOCK_BITS
 -- These values are hardcoded:
-PTHREAD_COND_WREFS_SHIFT          3
+PTHREAD_COND_GREFS_SHIFT          1
 
 -- Rwlock attributes
 PTHREAD_RWLOCK_PREFER_READER_NP
diff --git a/nptl/pthread_cond_broadcast.c b/nptl/pthread_cond_broadcast.c
index f1275b2f15..2ae16cbb8f 100644
--- a/nptl/pthread_cond_broadcast.c
+++ b/nptl/pthread_cond_broadcast.c
@@ -40,10 +40,13 @@ ___pthread_cond_broadcast (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_broadcast, 1, cond);
 
-  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
-  if (wrefs >> 3 == 0)
+  /* See pthread_cond_signal for why relaxed MO is enough here.  */
+  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
+  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
+  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
 
   __condvar_acquire_lock (cond, private);
 
diff --git a/nptl/pthread_cond_common.c b/nptl/pthread_cond_common.c
index c35b9ef03a..001dbd41e2 100644
--- a/nptl/pthread_cond_common.c
+++ b/nptl/pthread_cond_common.c
@@ -366,18 +366,15 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
      __g_signals, which will prevent waiters from blocking using a futex on
      __g_signals and also notifies them that the group is closed.  As a
      result, they will eventually remove their group reference, allowing us
-     to close switch group roles.  */
+     to close and switch group roles.  */
 
   /* First, set the closed flag on __g_signals.  This tells waiters that are
      about to wait that they shouldn't do that anymore.  This basically
      serves as an advance notificaton of the upcoming change to __g1_start;
      waiters interpret it as if __g1_start was larger than their waiter
      sequence position.  This allows us to change __g1_start after waiting
-     for all existing waiters with group references to leave, which in turn
-     makes recovery after stealing a signal simpler because it then can be
-     skipped if __g1_start indicates that the group is closed (otherwise,
-     we would have to recover always because waiters don't know how big their
-     groups are).  Relaxed MO is fine.  */
+     for all existing waiters with group references to leave.
+     Relaxed MO is fine.  */
   atomic_fetch_or_relaxed (cond->__data.__g_signals + g1, 1);
 
   /* Wait until there are no group references anymore.  The fetch-or operation
@@ -419,10 +416,10 @@ __condvar_quiesce_and_switch_g1 (pthread_cond_t *cond, uint64_t wseq,
 	  r = atomic_load_relaxed (cond->__data.__g_refs + g1);
 	}
     }
-  /* Acquire MO so that we synchronize with the release operation that waiters
-     use to decrement __g_refs and thus happen after the waiters we waited
-     for.  */
-  atomic_thread_fence_acquire ();
+  /* Clear the wake-request flag.  Acquire MO so that we synchronize with the
+     release operation that waiters use to decrement __g_refs and thus happen
+     after the waiters we waited for.  */
+  atomic_fetch_and_acquire (cond->__data.__g_refs + g1, ~(unsigned int)1);
 
   /* Update __g1_start, which finishes closing this group.  The value we add
      will never be negative because old_orig_size can only be zero when we
diff --git a/nptl/pthread_cond_destroy.c b/nptl/pthread_cond_destroy.c
index 81ee958dcb..e14ec89adf 100644
--- a/nptl/pthread_cond_destroy.c
+++ b/nptl/pthread_cond_destroy.c
@@ -37,22 +37,36 @@
    signal or broadcast calls.
    Thus, we can assume that all waiters that are still accessing the condvar
    have been woken.  We wait until they have confirmed to have woken up by
-   decrementing __wrefs.  */
+   decrementing __g_refs.  */
 int
 __pthread_cond_destroy (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_destroy, 1, cond);
 
-  /* Set the wake request flag.  We could also spin, but destruction that is
-     concurrent with still-active waiters is probably neither common nor
-     performance critical.  Acquire MO to synchronize with waiters confirming
-     that they finished.  */
-  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4);
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
+  for (unsigned g = 0; g < 2; ++g)
+    {
+      while (true)
+	{
+	  /* Set the wake request flag.  We could also spin, but destruction
+	     that is concurrent with still-active waiters is probably neither
+	     common nor performance critical.  Acquire MO to synchronize with
+	     waiters confirming that they finished.  */
+	  unsigned r = atomic_fetch_or_acquire (cond->__data.__g_refs + g, 1);
+	  r |= 1;
+	  if (r == 1)
+	    break;
+	  futex_wait_simple (cond->__data.__g_refs + g, r, private);
+	}
+    }
+
+  /* Same as above, except to synchronize with canceled threads.  This wake
+     flag never gets cleared, so it's enough to set it once.  */
+  unsigned int wrefs = atomic_fetch_or_acquire (&cond->__data.__wrefs, 4) | 4;
   while (wrefs >> 3 != 0)
    {
      futex_wait_simple (&cond->__data.__wrefs, wrefs, private);
-      /* See above.  */
      wrefs = atomic_load_acquire (&cond->__data.__wrefs);
    }
   /* The memory the condvar occupies can now be reused.  */
diff --git a/nptl/pthread_cond_signal.c b/nptl/pthread_cond_signal.c
index 171193b13e..2043d8ac64 100644
--- a/nptl/pthread_cond_signal.c
+++ b/nptl/pthread_cond_signal.c
@@ -36,13 +36,21 @@ ___pthread_cond_signal (pthread_cond_t *cond)
 {
   LIBC_PROBE (cond_signal, 1, cond);
 
-  /* First check whether there are waiters.  Relaxed MO is fine for that for
-     the same reasons that relaxed MO is fine when observing __wseq (see
-     below).  */
-  unsigned int wrefs = atomic_load_relaxed (&cond->__data.__wrefs);
-  if (wrefs >> 3 == 0)
+  /* First check whether there are waiters.  Relaxed MO is fine for that, and
+     it doesn't matter that there are two separate loads.  It could only
+     matter if another thread is calling pthread_cond_wait at the same time
+     as this function, but then there is no happens-before relationship with
+     this thread, and the caller can't tell which call came first.  If we miss
+     a waiter, the caller would have to assume that pthread_cond_signal got
+     called before pthread_wait, and they have no way of telling otherwise.
+     If they do have a way of telling then there is a happens-before
+     relationship and we're guaranteed to see the waiter here.  */
+  unsigned int grefs0 = atomic_load_relaxed (cond->__data.__g_refs);
+  unsigned int grefs1 = atomic_load_relaxed (cond->__data.__g_refs + 1);
+  if ((grefs0 >> 1) == 0 && (grefs1 >> 1) == 0)
     return 0;
-  int private = __condvar_get_private (wrefs);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
 
   __condvar_acquire_lock (cond, private);
 
@@ -51,7 +59,7 @@ ___pthread_cond_signal (pthread_cond_t *cond)
      1) We can pick any position that is allowed by external happens-before
         constraints.  In particular, if another __pthread_cond_wait call
         happened before us, this waiter must be eligible for being woken by
-        us.  The only way do establish such a happens-before is by signaling
+        us.  The only way to establish such a happens-before is by signaling
         while having acquired the mutex associated with the condvar and
         ensuring that the signal's critical section happens after the waiter.
         Thus, the mutex ensures that we see that waiter's __wseq increase.
diff --git a/nptl/pthread_cond_wait.c b/nptl/pthread_cond_wait.c
index fb31090e26..dde007475c 100644
--- a/nptl/pthread_cond_wait.c
+++ b/nptl/pthread_cond_wait.c
@@ -43,19 +43,6 @@ struct _condvar_cleanup_buffer
 };
 
 
-/* Decrease the waiter reference count.  */
-static void
-__condvar_confirm_wakeup (pthread_cond_t *cond, int private)
-{
-  /* If destruction is pending (i.e., the wake-request flag is nonzero) and we
-     are the last waiter (prior value of __wrefs was 1 << 3), then wake any
-     threads waiting in pthread_cond_destroy.  Release MO to synchronize with
-     these threads.  Don't bother clearing the wake-up request flag.  */
-  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
-    futex_wake (&cond->__data.__wrefs, INT_MAX, private);
-}
-
-
 /* Cancel waiting after having registered as a waiter previously.  SEQ is our
    position and G is our group index.
    The goal of cancellation is to make our group smaller if that is still
@@ -151,14 +138,7 @@ __condvar_dec_grefs (pthread_cond_t *cond, unsigned int g, int private)
   /* Release MO to synchronize-with the acquire load in
      __condvar_quiesce_and_switch_g1.  */
   if (atomic_fetch_add_release (cond->__data.__g_refs + g, -2) == 3)
-    {
-      /* Clear the wake-up request flag before waking up.  We do not need more
-         than relaxed MO and it doesn't matter if we apply this for an aliased
-         group because we wake all futex waiters right after clearing the
-         flag.  */
-      atomic_fetch_and_relaxed (cond->__data.__g_refs + g, ~(unsigned int) 1);
-      futex_wake (cond->__data.__g_refs + g, INT_MAX, private);
-    }
+    futex_wake (cond->__data.__g_refs + g, INT_MAX, private);
 }
 
 /* Clean-up for cancellation of waiters waiting for normal signals.  We cancel
@@ -172,6 +152,15 @@ __condvar_cleanup_waiting (void *arg)
   pthread_cond_t *cond = cbuffer->cond;
   unsigned g = cbuffer->wseq & 1;
 
+  /* Normally we are not allowed to touch cond anymore after calling
+     __condvar_dec_grefs, because pthread_cond_destroy looks at __g_refs to
+     determine when all waiters have woken.  Since we will do more work in
+     this function, we are using an extra channel to communicate to
+     pthread_cond_destroy that it is not allowed to finish yet: We
+     increment the refcount starting at the fourth bit on __wrefs.  Relaxed
+     MO is enough.  The synchronization happens because __condvar_dec_grefs
+     uses release MO.  */
+  atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
   __condvar_dec_grefs (cond, g, cbuffer->private);
 
   __condvar_cancel_waiting (cond, cbuffer->wseq >> 1, g, cbuffer->private);
@@ -183,7 +172,12 @@ __condvar_cleanup_waiting (void *arg)
      conservatively.  */
   futex_wake (cond->__data.__g_signals + g, 1, cbuffer->private);
 
-  __condvar_confirm_wakeup (cond, cbuffer->private);
+  /* If destruction is pending (i.e., the wake-request flag is nonzero) and we
+     are the last waiter (prior value of __wrefs was 1 << 3), then wake any
+     threads waiting in pthread_cond_destroy.  Release MO to synchronize with
+     these threads.  Don't bother clearing the wake-up request flag.  */
+  if ((atomic_fetch_add_release (&cond->__data.__wrefs, -8) >> 2) == 3)
+    futex_wake (&cond->__data.__wrefs, INT_MAX, cbuffer->private);
 
   /* XXX If locking the mutex fails, should we just stop execution?  This
      might be better than silently ignoring the error.  */
@@ -287,20 +281,21 @@ __condvar_cleanup_waiting (void *arg)
    __g1_orig_size: Initial size of G1
      * The two least-significant bits represent the condvar-internal lock.
      * Only accessed while having acquired the condvar-internal lock.
-   __wrefs: Waiter reference counter.
+   __wrefs: Flags and count of waiters who called pthread_cancel.
      * Bit 2 is true if waiters should run futex_wake when they remove the
       last reference.  pthread_cond_destroy uses this as futex word.
     * Bit 1 is the clock ID (0 == CLOCK_REALTIME, 1 == CLOCK_MONOTONIC).
     * Bit 0 is true iff this is a process-shared condvar.
-     * Simple reference count used by both waiters and pthread_cond_destroy.
-     (If the format of __wrefs is changed, update nptl_lock_constants.pysym
-      and the pretty printers.)
+     * Simple reference count used by __condvar_cleanup_waiting and pthread_cond_destroy.
+     (If the format of __wrefs is changed, update the pretty printers.)
   For each of the two groups, we have:
   __g_refs: Futex waiter reference count.
     * LSB is true if waiters should run futex_wake when they remove the
       last reference.
     * Reference count used by waiters concurrently with signalers that have
       acquired the condvar-internal lock.
+     (If the format of __g_refs is changed, update nptl_lock_constants.pysym
+      and the pretty printers.)
   __g_signals: The number of signals that can still be consumed.
     * Used as a futex word by waiters.  Used concurrently by waiters and
       signalers.
@@ -329,18 +324,6 @@ __condvar_cleanup_waiting (void *arg)
    sufficient because if a waiter can see a sufficiently large value, it could
    have also consume a signal in the waiters group.
 
-   Waiters try to grab a signal from __g_signals without holding a reference
-   count, which can lead to stealing a signal from a more recent group after
-   their own group was already closed.  They cannot always detect whether they
-   in fact did because they do not know when they stole, but they can
-   conservatively add a signal back to the group they stole from; if they
-   did so unnecessarily, all that happens is a spurious wake-up.  To make this
-   even less likely, __g1_start contains the index of the current g2 too,
-   which allows waiters to check if there aliasing on the group slots; if
-   there wasn't, they didn't steal from the current G1, which means that the
-   G1 they stole from must have been already closed and they do not need to
-   fix anything.
-
    It is essential that the last field in pthread_cond_t is __g_signals[1]:
    The previous condvar used a pointer-sized field in pthread_cond_t, so a
    PTHREAD_COND_INITIALIZER from that condvar implementation might only
@@ -405,16 +388,14 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
   unsigned int g = wseq & 1;
   uint64_t seq = wseq >> 1;
 
-  /* Increase the waiter reference count.  Relaxed MO is sufficient because
-     we only need to synchronize when decrementing the reference count.  */
-  unsigned int flags = atomic_fetch_add_relaxed (&cond->__data.__wrefs, 8);
-  int private = __condvar_get_private (flags);
 
   /* Acquire a group reference and use acquire MO for that so that we
      synchronize with the dummy read-modify-write in
      __condvar_quiesce_and_switch_g1 if we read from the same group.  This will
      make us see the closed flag on __g_signals that designates a concurrent
      attempt to reuse the group's slot.  */
   atomic_fetch_add_acquire (cond->__data.__g_refs + g, 2);
+  unsigned int flags = atomic_load_relaxed (&cond->__data.__wrefs);
+  int private = __condvar_get_private (flags);
 
   /* Now that we are registered as a waiter, we can release the mutex.
      Waiting on the condvar must be atomic with releasing the mutex, so if
@@ -428,7 +409,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
     {
       __condvar_dec_grefs (cond, g, private);
       __condvar_cancel_waiting (cond, seq, g, private);
-      __condvar_confirm_wakeup (cond, private);
       return err;
     }
 
@@ -480,8 +460,8 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
 	  /* No signals available after spinning, so prepare to block.  First
 	     check the closed flag on __g_signals that designates a
 	     concurrent attempt to reuse the group's slot.  We use acquire MO for
-	     the __g_signals check to make the __g1_start check work (see
-	     above).  */
+	     the __g_signals check to make sure we read the current value of
+	     __g1_start (see above).  */
 	  if (((atomic_load_acquire (cond->__data.__g_signals + g) & 1) != 0)
 	      || (seq < (__condvar_load_g1_start_relaxed (cond) >> 1)))
 	    {
@@ -522,9 +502,7 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
     }
 
   /* Try to grab a signal.  Use acquire MO so that we see an up-to-date value
-     of __g1_start below (see spinning above for a similar case).  In
-     particular, if we steal from a more recent group, we will also see a
-     more recent __g1_start below.  */
+     of __g1_start when spinning above.  */
   while (!atomic_compare_exchange_weak_acquire (cond->__data.__g_signals + g,
 						&signals, signals - 2));
 
@@ -534,7 +512,6 @@ __pthread_cond_wait_common (pthread_cond_t *cond, pthread_mutex_t *mutex,
      that before acquiring the mutex to allow for execution of
      pthread_cond_destroy while having acquired the mutex.  */
   __condvar_dec_grefs (cond, g, private);
-  __condvar_confirm_wakeup (cond, private);
 
  acquire_lock:
   /* Woken up; now re-acquire the mutex.  If this doesn't fail, return RESULT,
-- 
2.25.1
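
As context for why destruction must wait for __g_refs to drain: POSIX
allows a condition variable to be destroyed as soon as no thread is
blocked on it, for example immediately after a broadcast, even though
the woken waiters may still be executing the tail of pthread_cond_wait.
A minimal stand-alone illustration of that pattern (not part of this
patch) looks like this:

#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t c = PTHREAD_COND_INITIALIZER;
static int ready;

static void *
waiter (void *arg)
{
  pthread_mutex_lock (&m);
  while (!ready)
    /* The thread may still be returning from this call when the main
       thread runs pthread_cond_destroy below.  */
    pthread_cond_wait (&c, &m);
  pthread_mutex_unlock (&m);
  return NULL;
}

int
main (void)
{
  pthread_t t;
  pthread_create (&t, NULL, waiter, NULL);

  pthread_mutex_lock (&m);
  ready = 1;
  pthread_cond_broadcast (&c);
  pthread_mutex_unlock (&m);

  /* Legal per POSIX: all waiters have been woken, so the condvar may be
     destroyed even though the waiter may not have left pthread_cond_wait
     yet.  This is the window that the reference counts (formerly __wrefs,
     now __g_refs) protect.  */
  pthread_cond_destroy (&c);

  pthread_join (t, NULL);
  return 0;
}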