From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
To: Florian Weimer <fweimer@redhat.com>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v2 04/19] nptl: Do not use pthread set_tid_address as state synchronization (BZ #19951)
Date: Thu, 26 Aug 2021 11:58:14 -0300 [thread overview]
Message-ID: <1f026797-7be8-bf9d-323f-6ac7539752a1@linaro.org> (raw)
In-Reply-To: <87y28o7ake.fsf@oldenburg.str.redhat.com>
On 26/08/2021 07:41, Florian Weimer wrote:
> * Adhemerval Zanella:
>
>> The use after free described in BZ#19951 is due the use of two different
>> PD fields, 'joinid' and 'cancelhandling', to describe the thread state
>> to synchronize the calls of pthread_join(), pthread_detach(),
>> pthread_exit(), and normal thread exit.
>>
>> And any state change potentially requires to check for both fields
>> atomically to handle partial state (such as pthread_join() with a
>> cancellation handler to issue a 'joinstate' field rollback).
>>
>> This patch uses a different PD member with 4 possible states (JOINABLE,
>> DETACHED, EXITING, and EXITED) instead of pthread 'tid' field:
>>
>> 1. On pthread_create() the inital state is set either to JOINABLE or
>> DETACHED depending of the pthread attribute used.
>>
>> 2. On pthread_detach(), a CAS is issued on the state. If the CAS
>> fails it means that thread is already detached (DETACHED) or is
>> being terminated (EXITING). For former an EINVAL is returned,
>> while for latter pthread_detach() should be reponsible to join the
>> thread (and deallocate any internal resources).
>>
>> 3. On pthread_create() exit phase (reached either if the thread
>
> Suggest: “In the exit phase of the wrapper function for the thread start
> routine (reached …”
Ack.
>
>> function has returned, pthread_exit() has being called, or
>> cancellation handled has been acted upon) we issue a CAS on state
>> to set to EXITING mode. If the thread is previously on DETACHED
>> mode it will be the thread itself responsible to deallocate any
>
> Suggest: “the thread itself is responsible for arranging the
> deallocation of any resource”
>
> (detached threads cannot immediately deallocate themselves)
Ack.
>
>> resource, otherwise the threads needs to be joined.
>
> “the thread[] needs to be joined”
Ack.
>
>
>> 4. The clear_tid_field on 'clone' call is changed to set the new
>> 'state' field on thread exit (EXITED). This state ins only
>> reached at thread termination.
>>
>> 4. The pthread_join() implementation is now simpler: the futex wait
>> is done directly on thread state and there is no need to reset it
>> in case of timeout (since the state is now set either by
>> pthread_detach() or by the kernel on process termination).
>
> Duplicate 4.
Ack.
>
>> The race condition on pthread_detach is avoided with only one atomic
>> operation on PD state: once the mode is set to THREAD_STATE_DETACHED
>> it is up to thread itself to deallocate its memory (done on the exit
>> phase at pthread_create()).
>
> See above regarding thread self-deallocation.
>
> The design as described above looks sound to me, those are just nits.
Right, should I change this paragraph as well (it is not clear the
suggestion here).
>
>> This change trigger an invalid C11 thread tests: it crates a thread,
>> which detaches itself, and after a timeout the creating thread checks
>> if the join fails. The issue is once thrd_join() is called the thread
>> lifetime has already expired. The test is changed so the sleep is done
>> by the thread itself, so the creating thread will try to join a valid
>> thread.
>
> This could be a separate commit.
Ack.
>
>> diff --git a/nptl/nptl-stack.h b/nptl/nptl-stack.h
>> index 8bcfde7ec5..c6b2b3078f 100644
>> --- a/nptl/nptl-stack.h
>> +++ b/nptl/nptl-stack.h
>> @@ -32,7 +32,7 @@ extern size_t __nptl_stack_cache_maxsize attribute_hidden;
>> static inline bool
>> __nptl_stack_in_use (struct pthread *pd)
>> {
>> - return pd->tid <= 0;
>> + return atomic_load_relaxed (&pd->joinstate) == THREAD_STATE_EXITED;
>> }
>
> Maybe this should be an acquire load, to synchronize with the thread
> exit as communicated by the kernel?
It seems ok, I also added the comment:
/* It synchronize with the thread exit as communicated by the kernel with
set_tid_address set at clone(). */
>
>> diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
>> index 38f637c5cf..67e00ef007 100644
>> --- a/nptl/pthread_cancel.c
>> +++ b/nptl/pthread_cancel.c
>> @@ -63,7 +63,8 @@ __pthread_cancel (pthread_t th)
>> volatile struct pthread *pd = (volatile struct pthread *) th;
>>
>> /* Make sure the descriptor is valid. */
>> - if (INVALID_TD_P (pd))
>> + int state = atomic_load_acquire (&pd->joinstate);
>> + if (state == THREAD_STATE_EXITED || state == THREAD_STATE_EXITING)
>> /* Not a valid thread handle. */
>> return ESRCH;
>
> This is fixed in patch 08/19, so it's okay.
>
>> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
>> index 08e5189ad6..763e32bc3e 100644
>> --- a/nptl/pthread_create.c
>> +++ b/nptl/pthread_create.c
>> @@ -286,7 +286,7 @@ static int create_thread (struct pthread *pd,
>> const struct
>> @@ -351,13 +351,16 @@ start_thread (void *arg)
>> and free any resource prior return to the pthread_create caller. */
>> setup_failed = pd->setup_failed == 1;
>> if (setup_failed)
>> - pd->joinid = NULL;
>> + pd->joinstate = THREAD_STATE_JOINABLE;
>>
>> /* And give it up right away. */
>> lll_unlock (pd->lock, LLL_PRIVATE);
>>
>> if (setup_failed)
>> - goto out;
>> + {
>> + pd->tid = 0;
>> + goto out;
>> + }
>> }
>
> What's the advantage of setting pd->tid here and below in start_thread?
We don't really need to clear the tid on setup_failed case in fact, since
in this case no pthread_t will be returned to caller. I remove it.
>
> It destroys information that could be useful for debugging purposes,
> answering the question what the TID was before the thread exited.
>
>> @@ -481,9 +484,20 @@ start_thread (void *arg)
>> the breakpoint reports TD_THR_RUN state rather than TD_THR_ZOMBIE. */
>> atomic_bit_set (&pd->cancelhandling, EXITING_BIT);
>>
>> - if (__glibc_unlikely (atomic_decrement_and_test (&__nptl_nthreads)))
>> - /* This was the last thread. */
>> - exit (0);
>> +
>> + /* CONCURRENCY NOTES:
>> +
>> + Concurrent pthread_detach() will either set state to
>> + THREAD_STATE_DETACHED or wait the thread to terminate. The exiting state
>
> “wait [for] the thread to terminate”
Ack.
>
>> + set here is set so a pthread_join() wait until all the required cleanup
>> + steps are done.
>> +
>> + The 'joinstate' field will be used to determine who is responsible to
>> + call __nptl_free_tcb below. */
>> +
>> + unsigned int joinstate = THREAD_STATE_JOINABLE;
>> + atomic_compare_exchange_weak_acquire (&pd->joinstate, &joinstate,
>> + THREAD_STATE_EXITING);
>
> I think you need a strong CAS here. We don't have, so you'll have to
> add a loop.
Yeah, it seems right. I changed to:
unsigned int prevstate;
while (!atomic_compare_exchange_weak_acquire (&pd->joinstate, &prevstate,
THREAD_STATE_EXITING))
prevstate = atomic_load_relaxed (&pd->joinstate);
>
>> diff --git a/nptl/pthread_detach.c b/nptl/pthread_detach.c
>> index ac50db9b0e..97d292cab8 100644
>> --- a/nptl/pthread_detach.c
>> +++ b/nptl/pthread_detach.c
>> @@ -26,32 +26,24 @@ ___pthread_detach (pthread_t th)
>
>> + For the case the thread is being terminated (THREAD_STATE_EXITING),
>> + pthread_detach() will responsible to clean up the stack. */
>> +
>> + int curstate = THREAD_STATE_JOINABLE;
>> + if (!atomic_compare_exchange_weak_acquire (&pd->joinstate, &curstate,
>> + THREAD_STATE_DETACHED))
>
> This should be a strong CAS, so this needs a loop, I think.
I changed to:
unsigned int prevstate = atomic_load_relaxed (&pd->joinstate);
do
{
if (prevstate != THREAD_STATE_JOINABLE)
return EINVAL;
return __pthread_join (th, 0);
}
while (!atomic_compare_exchange_weak_acquire (&pd->joinstate, &prevstate,
THREAD_STATE_DETACHED));
>
>> diff --git a/nptl/pthread_join_common.c b/nptl/pthread_join_common.c
>> index 7303069316..428ed35101 100644
>> --- a/nptl/pthread_join_common.c
>> +++ b/nptl/pthread_join_common.c
>> @@ -22,113 +22,73 @@
>> #include <time.h>
>> #include <futex-internal.h>
>>
>> -static void
>> -cleanup (void *arg)
>> +/* Check for a possible deadlock situation where the threads are waiting for
>> + each other to finish. Note that this is a "may" error. To be 100% sure we
>> + catch this error we would have to lock the data structures but it is not
>> + necessary. In the unlikely case that two threads are really caught in this
>> + situation they will deadlock. It is the programmer's problem to figure
>> + this out. */
>> +static inline bool
>> +check_for_deadlock (int state, struct pthread *pd)
>> {
>> - /* If we already changed the waiter ID, reset it. The call cannot
>> - fail for any reason but the thread not having done that yet so
>> - there is no reason for a loop. */
>> struct pthread *self = THREAD_SELF;
>> - atomic_compare_exchange_weak_acquire (&arg, &self, NULL);
>> + return ((pd == self
>> + || (atomic_load_acquire (&self->joinstate) == THREAD_STATE_DETACHED
>> + && (pd->cancelhandling
>> + & (CANCELED_BITMASK | EXITING_BITMASK
>> + | TERMINATED_BITMASK)) == 0))
>> + && !(self->cancelstate == PTHREAD_CANCEL_ENABLE
>> + && (pd->cancelhandling & (CANCELED_BITMASK | EXITING_BITMASK
>> + | TERMINATED_BITMASK))
>> + == CANCELED_BITMASK));
>> }
>
> The state argument to check_for_deadlock appears to be unused.
I removed it.
>
>> + /* pthread_join() is a cancellation entrypoint and we use the same
>> + rationale for pthread_timedjoin_np().
>> +
>> + The kernel notifies a process which uses CLONE_CHILD_CLEARTID via
>> + a memory zerouing and futex wake up when the process terminates.
>
> “memory zero[]ing and futex wake[-]up”
Ack.
>
>> #if __TIMESIZE == 64
>> diff --git a/nptl/pthread_tryjoin.c b/nptl/pthread_tryjoin.c
>> index fd938e8780..726f2abc68 100644
>> --- a/nptl/pthread_tryjoin.c
>> +++ b/nptl/pthread_tryjoin.c
>> @@ -22,15 +22,17 @@
>> int
>> __pthread_tryjoin_np (pthread_t threadid, void **thread_return)
>> {
>> - /* Return right away if the thread hasn't terminated yet. */
>> - struct pthread *pd = (struct pthread *) threadid;
>> - if (pd->tid != 0)
>> - return EBUSY;
>> + /* The joinable state (THREAD_STATE_JOINABLE) is straigthforward since the
>> + thread hasn't finished yet and trying to join might block.
>> + Both detached (THREAD_STATE_DETACHED) and exiting (THREAD_STATE_EXITING)
>> + might also result in a possible blocking call: a detached thread might
>> + change its state to exiting and a exiting thread my take some time to
>> + exit (and thus let the kernel set the state to THREAD_STATE_EXITED). */
>
> pthread_tryjoin_np on a thread which is THREAD_STATE_DETACHED is
> invalid, so that case doesn't matter, I think.
I changed the comment to:
/* The joinable state (THREAD_STATE_JOINABLE) is straigthforward since the
thread hasn't finished yet and trying to join might block.
The exiting thread (THREAD_STATE_EXITING) also mgith result in ablocking
call: a detached thread might change its state to exiting and a exiting
thread my take some time to exit (and thus let the kernel set the state
to THREAD_STATE_EXITED). */
>
>> + struct pthread *pd = (struct pthread *) threadid;
>> + return atomic_load_acquire (&pd->joinstate) != THREAD_STATE_EXITED
>> + ? EBUSY
>> + : __pthread_clockjoin_ex (threadid, thread_return, 0, NULL);
>> }
>
> But the code is correct, I believe.
>
>> diff --git a/sysdeps/pthread/tst-thrd-detach.c b/sysdeps/pthread/tst-thrd-detach.c
>> index c844767748..e1906a0e10 100644
>> --- a/sysdeps/pthread/tst-thrd-detach.c
>> +++ b/sysdeps/pthread/tst-thrd-detach.c
>
>> - if (thrd_join (id, NULL) == thrd_success)
>> - FAIL_EXIT1 ("thrd_join succeed where it should fail");
>> + TEST_COMPARE (thrd_join (id, NULL), thrd_error);
>
> This is still a user-after-free bug, right?
Indeed, I think it would be better to just remove this test.
next prev parent reply other threads:[~2021-08-26 14:58 UTC|newest]
Thread overview: 68+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-23 19:50 [PATCH v2 00/19] Fix various NPTL synchronization issues Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 01/19] nptl: Fix tst-cancel7 and tst-cancelx7 race condition (BZ #14232) Adhemerval Zanella via Libc-alpha
2021-08-26 9:33 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 02/19] nptl: Set cancellation type and state on pthread_exit Adhemerval Zanella via Libc-alpha
2021-08-26 9:38 ` Florian Weimer via Libc-alpha
2021-08-26 9:42 ` Florian Weimer via Libc-alpha
2021-08-26 11:56 ` Adhemerval Zanella via Libc-alpha
2021-08-26 11:52 ` Adhemerval Zanella via Libc-alpha
2021-08-26 12:08 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 03/19] nptl: Handle robust PI mutexes for !__ASSUME_SET_ROBUST_LIST Adhemerval Zanella via Libc-alpha
2021-08-26 9:42 ` Florian Weimer via Libc-alpha
2021-08-26 12:14 ` Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 04/19] nptl: Do not use pthread set_tid_address as state synchronization (BZ #19951) Adhemerval Zanella via Libc-alpha
2021-08-26 10:41 ` Florian Weimer via Libc-alpha
2021-08-26 14:58 ` Adhemerval Zanella via Libc-alpha [this message]
2021-08-26 15:06 ` Florian Weimer via Libc-alpha
2021-08-26 16:16 ` Adhemerval Zanella via Libc-alpha
2021-08-30 10:42 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 05/19] nptl: Move setxid flag out of cancelhandling Adhemerval Zanella via Libc-alpha
2021-08-26 11:34 ` Florian Weimer via Libc-alpha
2021-08-26 15:11 ` Adhemerval Zanella via Libc-alpha
2021-08-26 15:21 ` Florian Weimer via Libc-alpha
2021-08-26 16:39 ` Adhemerval Zanella via Libc-alpha
2021-08-30 11:27 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 06/19] nptl: Replace struct thread cancelhandling field Adhemerval Zanella via Libc-alpha
2021-08-26 14:34 ` Florian Weimer via Libc-alpha
2021-08-26 16:48 ` Adhemerval Zanella via Libc-alpha
2021-08-30 10:36 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 07/19] support: Add support_wait_for_thread_exit Adhemerval Zanella via Libc-alpha
2021-08-26 9:31 ` Florian Weimer via Libc-alpha
2021-08-26 16:49 ` Adhemerval Zanella via Libc-alpha
2021-08-30 11:46 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 08/19] nptl: pthread_kill, pthread_cancel should fail after exit (bug 19193) Adhemerval Zanella via Libc-alpha
2021-08-26 10:03 ` Florian Weimer via Libc-alpha
2021-08-26 16:49 ` Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 09/19] nptl: Fix race between pthread_kill and thread exit (bug 12889) Adhemerval Zanella via Libc-alpha
2021-08-26 14:23 ` Florian Weimer via Libc-alpha
2021-08-26 17:06 ` Adhemerval Zanella via Libc-alpha
2021-08-30 9:25 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 10/19] nptl: Use tidlock when accessing TID on pthread_getaffinity_np Adhemerval Zanella via Libc-alpha
2021-08-26 14:24 ` Florian Weimer via Libc-alpha
2021-08-26 17:29 ` Adhemerval Zanella via Libc-alpha
2021-08-30 9:30 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 11/19] nptl: Use tidlock when accessing TID on pthread_setaffinity Adhemerval Zanella via Libc-alpha
2021-08-26 14:25 ` Florian Weimer via Libc-alpha
2021-08-26 17:31 ` Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 12/19] nptl: Use tidlock when accessing TID on pthread_getcpuclockid Adhemerval Zanella via Libc-alpha
2021-08-26 14:27 ` Florian Weimer via Libc-alpha
2021-08-26 17:41 ` Adhemerval Zanella via Libc-alpha
2021-08-30 9:34 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 13/19] nptl: Use tidlock when accessing TID on pthread_getschedparam Adhemerval Zanella via Libc-alpha
2021-08-26 15:00 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 14/19] nptl: Use tidlock when accessing TID on pthread_setschedparam Adhemerval Zanella via Libc-alpha
2021-08-26 14:35 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 15/19] nptl: Use tidlock when accessing TID on pthread_getname_np Adhemerval Zanella via Libc-alpha
2021-08-26 14:38 ` Florian Weimer via Libc-alpha
2021-08-26 17:45 ` Adhemerval Zanella via Libc-alpha
2021-08-30 9:37 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 16/19] nptl: Use tidlock when accessing TID on pthread_setname_np Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 17/19] nptl: Use tidlock when accessing TID on pthread_sigqueue Adhemerval Zanella via Libc-alpha
2021-08-26 14:43 ` Florian Weimer via Libc-alpha
2021-08-26 17:49 ` Adhemerval Zanella via Libc-alpha
2021-08-30 9:26 ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 18/19] nptl: Use tidlock when accessing TID on pthread_setschedprio Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 19/19] nptl: Remove INVALID_TD_P Adhemerval Zanella via Libc-alpha
2021-08-26 9:30 ` Florian Weimer via Libc-alpha
2021-08-26 14:47 ` [PATCH v2 00/19] Fix various NPTL synchronization issues Florian Weimer via Libc-alpha
2021-08-26 18:19 ` Adhemerval Zanella via Libc-alpha
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/libc/involved.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1f026797-7be8-bf9d-323f-6ac7539752a1@linaro.org \
--to=libc-alpha@sourceware.org \
--cc=adhemerval.zanella@linaro.org \
--cc=fweimer@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).