unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Florian Weimer via Libc-alpha <libc-alpha@sourceware.org>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: libc-alpha@sourceware.org
Subject: Re: [PATCH v2 04/19] nptl: Do not use pthread set_tid_address as state synchronization (BZ #19951)
Date: Thu, 26 Aug 2021 12:41:05 +0200	[thread overview]
Message-ID: <87y28o7ake.fsf@oldenburg.str.redhat.com> (raw)
In-Reply-To: <20210823195047.543237-5-adhemerval.zanella@linaro.org> (Adhemerval Zanella's message of "Mon, 23 Aug 2021 16:50:32 -0300")

* Adhemerval Zanella:

> The use after free described in BZ#19951 is due the use of two different
> PD fields, 'joinid' and 'cancelhandling', to describe the thread state
> to synchronize the calls of pthread_join(), pthread_detach(),
> pthread_exit(), and normal thread exit.
>
> And any state change potentially requires to check for both fields
> atomically to handle partial state (such as pthread_join() with a
> cancellation handler to issue a 'joinstate' field rollback).
>
> This patch uses a different PD member with 4 possible states (JOINABLE,
> DETACHED, EXITING, and EXITED) instead of pthread 'tid' field:
>
>   1. On pthread_create() the inital state is set either to JOINABLE or
>      DETACHED depending of the pthread attribute used.
>
>   2. On pthread_detach(), a CAS is issued on the state.  If the CAS
>      fails it means that thread is already detached (DETACHED) or is
>      being terminated (EXITING).  For former an EINVAL is returned,
>      while for latter pthread_detach() should be reponsible to join the
>      thread (and deallocate any internal resources).
>
>   3. On pthread_create() exit phase (reached either if the thread

Suggest: “In the exit phase of the wrapper function for the thread start
routine (reached …”

>      function has returned, pthread_exit() has being called, or
>      cancellation handled has been acted upon) we issue a CAS on state
>      to set to EXITING mode.  If the thread is previously on DETACHED
>      mode it will be the thread itself responsible to deallocate any

Suggest: “the thread itself is responsible for arranging the
deallocation of any resource”

(detached threads cannot immediately deallocate themselves)

>      resource, otherwise the threads needs to be joined.

“the thread[] needs to be joined”


>   4. The clear_tid_field on 'clone' call is changed to set the new
>      'state' field on thread exit (EXITED).  This state ins only
>      reached at thread termination.
>
>   4. The pthread_join() implementation is now simpler: the futex wait
>      is done directly on thread state and there is no need to reset it
>      in case of timeout (since the state is now set either by
>      pthread_detach() or by the kernel on process termination).

Duplicate 4.

> The race condition on pthread_detach is avoided with only one atomic
> operation on PD state: once the mode is set to THREAD_STATE_DETACHED
> it is up to thread itself to deallocate its memory (done on the exit
> phase at pthread_create()).

See above regarding thread self-deallocation.

The design as described above looks sound to me, those are just nits.

> This change trigger an invalid C11 thread tests: it crates a thread,
> which detaches itself, and after a timeout the creating thread checks
> if the join fails.  The issue is once thrd_join() is called the thread
> lifetime has already expired.  The test is changed so the sleep is done
> by the thread itself, so the creating thread will try to join a valid
> thread.

This could be a separate commit.

> diff --git a/nptl/nptl-stack.h b/nptl/nptl-stack.h
> index 8bcfde7ec5..c6b2b3078f 100644
> --- a/nptl/nptl-stack.h
> +++ b/nptl/nptl-stack.h
> @@ -32,7 +32,7 @@ extern size_t __nptl_stack_cache_maxsize attribute_hidden;
>  static inline bool
>  __nptl_stack_in_use (struct pthread *pd)
>  {
> -  return pd->tid <= 0;
> +  return atomic_load_relaxed (&pd->joinstate) == THREAD_STATE_EXITED;
>  }

Maybe this should be an acquire load, to synchronize with the thread
exit as communicated by the kernel?

> diff --git a/nptl/pthread_cancel.c b/nptl/pthread_cancel.c
> index 38f637c5cf..67e00ef007 100644
> --- a/nptl/pthread_cancel.c
> +++ b/nptl/pthread_cancel.c
> @@ -63,7 +63,8 @@ __pthread_cancel (pthread_t th)
>    volatile struct pthread *pd = (volatile struct pthread *) th;
>  
>    /* Make sure the descriptor is valid.  */
> -  if (INVALID_TD_P (pd))
> +  int state = atomic_load_acquire (&pd->joinstate);
> +  if (state == THREAD_STATE_EXITED || state == THREAD_STATE_EXITING)
>      /* Not a valid thread handle.  */
>      return ESRCH;

This is fixed in patch 08/19, so it's okay.

> diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
> index 08e5189ad6..763e32bc3e 100644
> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
> @@ -286,7 +286,7 @@ static int create_thread (struct pthread *pd,
> const struct
> @@ -351,13 +351,16 @@ start_thread (void *arg)
>           and free any resource prior return to the pthread_create caller.  */
>        setup_failed = pd->setup_failed == 1;
>        if (setup_failed)
> -	pd->joinid = NULL;
> +	pd->joinstate = THREAD_STATE_JOINABLE;
>  
>        /* And give it up right away.  */
>        lll_unlock (pd->lock, LLL_PRIVATE);
>  
>        if (setup_failed)
> -	goto out;
> +	{
> +	  pd->tid = 0;
> +	  goto out;
> +	}
>      }

What's the advantage of setting pd->tid here and below in start_thread?

It destroys information that could be useful for debugging purposes,
answering the question what the TID was before the thread exited.

> @@ -481,9 +484,20 @@ start_thread (void *arg)
>       the breakpoint reports TD_THR_RUN state rather than TD_THR_ZOMBIE.  */
>    atomic_bit_set (&pd->cancelhandling, EXITING_BIT);
>  
> -  if (__glibc_unlikely (atomic_decrement_and_test (&__nptl_nthreads)))
> -    /* This was the last thread.  */
> -    exit (0);
> +
> +  /* CONCURRENCY NOTES:
> +
> +     Concurrent pthread_detach() will either set state to
> +     THREAD_STATE_DETACHED or wait the thread to terminate.  The exiting state

“wait [for] the thread to terminate”

> +     set here is set so a pthread_join() wait until all the required cleanup
> +     steps are done.
> +
> +     The 'joinstate' field will be used to determine who is responsible to
> +     call __nptl_free_tcb below.  */
> +
> +  unsigned int joinstate = THREAD_STATE_JOINABLE;
> +  atomic_compare_exchange_weak_acquire (&pd->joinstate, &joinstate,
> +					THREAD_STATE_EXITING);

I think you need a strong CAS here.  We don't have, so you'll have to
add a loop.

> diff --git a/nptl/pthread_detach.c b/nptl/pthread_detach.c
> index ac50db9b0e..97d292cab8 100644
> --- a/nptl/pthread_detach.c
> +++ b/nptl/pthread_detach.c
> @@ -26,32 +26,24 @@ ___pthread_detach (pthread_t th)

> +     For the case the thread is being terminated (THREAD_STATE_EXITING),
> +     pthread_detach() will responsible to clean up the stack.  */
> +
> +  int curstate = THREAD_STATE_JOINABLE;
> +  if (!atomic_compare_exchange_weak_acquire (&pd->joinstate, &curstate,
> +					     THREAD_STATE_DETACHED))

This should be a strong CAS, so this needs a loop, I think.

> diff --git a/nptl/pthread_join_common.c b/nptl/pthread_join_common.c
> index 7303069316..428ed35101 100644
> --- a/nptl/pthread_join_common.c
> +++ b/nptl/pthread_join_common.c
> @@ -22,113 +22,73 @@
>  #include <time.h>
>  #include <futex-internal.h>
>  
> -static void
> -cleanup (void *arg)
> +/* Check for a possible deadlock situation where the threads are waiting for
> +   each other to finish.  Note that this is a "may" error.  To be 100% sure we
> +   catch this error we would have to lock the data structures but it is not
> +   necessary.  In the unlikely case that two threads are really caught in this
> +   situation they will deadlock.  It is the programmer's problem to figure
> +   this out.  */
> +static inline bool
> +check_for_deadlock (int state, struct pthread *pd)
>  {
> -  /* If we already changed the waiter ID, reset it.  The call cannot
> -     fail for any reason but the thread not having done that yet so
> -     there is no reason for a loop.  */
>    struct pthread *self = THREAD_SELF;
> -  atomic_compare_exchange_weak_acquire (&arg, &self, NULL);
> +  return ((pd == self
> +	   || (atomic_load_acquire (&self->joinstate) == THREAD_STATE_DETACHED
> +	       && (pd->cancelhandling
> +		   & (CANCELED_BITMASK | EXITING_BITMASK
> +		      | TERMINATED_BITMASK)) == 0))
> +	   && !(self->cancelstate == PTHREAD_CANCEL_ENABLE
> +		&& (pd->cancelhandling & (CANCELED_BITMASK | EXITING_BITMASK
> +					  | TERMINATED_BITMASK))
> +		== CANCELED_BITMASK));
>  }

The state argument to check_for_deadlock appears to be unused.

> +      /* pthread_join() is a cancellation entrypoint and we use the same
> +         rationale for pthread_timedjoin_np().
> +
> +	 The kernel notifies a process which uses CLONE_CHILD_CLEARTID via
> +	 a memory zerouing and futex wake up when the process terminates.

“memory zero[]ing and futex wake[-]up”

>  #if __TIMESIZE == 64
> diff --git a/nptl/pthread_tryjoin.c b/nptl/pthread_tryjoin.c
> index fd938e8780..726f2abc68 100644
> --- a/nptl/pthread_tryjoin.c
> +++ b/nptl/pthread_tryjoin.c
> @@ -22,15 +22,17 @@
>  int
>  __pthread_tryjoin_np (pthread_t threadid, void **thread_return)
>  {
> -  /* Return right away if the thread hasn't terminated yet.  */
> -  struct pthread *pd = (struct pthread *) threadid;
> -  if (pd->tid != 0)
> -    return EBUSY;
> +  /* The joinable state (THREAD_STATE_JOINABLE) is straigthforward since the
> +     thread hasn't finished yet and trying to join might block.
> +     Both detached (THREAD_STATE_DETACHED) and exiting (THREAD_STATE_EXITING)
> +     might also result in a possible blocking call: a detached thread might
> +     change its state to exiting and a exiting thread my take some time to
> +     exit (and thus let the kernel set the state to THREAD_STATE_EXITED).  */

pthread_tryjoin_np on a thread which is THREAD_STATE_DETACHED is
invalid, so that case doesn't matter, I think.

> +  struct pthread *pd = (struct pthread *) threadid;
> +  return atomic_load_acquire (&pd->joinstate) != THREAD_STATE_EXITED
> +	 ? EBUSY
> +	 : __pthread_clockjoin_ex (threadid, thread_return, 0, NULL);
>  }

But the code is correct, I believe.

> diff --git a/sysdeps/pthread/tst-thrd-detach.c b/sysdeps/pthread/tst-thrd-detach.c
> index c844767748..e1906a0e10 100644
> --- a/sysdeps/pthread/tst-thrd-detach.c
> +++ b/sysdeps/pthread/tst-thrd-detach.c

> -  if (thrd_join (id, NULL) == thrd_success)
> -    FAIL_EXIT1 ("thrd_join succeed where it should fail");
> +  TEST_COMPARE (thrd_join (id, NULL), thrd_error);

This is still a user-after-free bug, right?

Thanks,
Florian


  reply	other threads:[~2021-08-26 10:41 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-23 19:50 [PATCH v2 00/19] Fix various NPTL synchronization issues Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 01/19] nptl: Fix tst-cancel7 and tst-cancelx7 race condition (BZ #14232) Adhemerval Zanella via Libc-alpha
2021-08-26  9:33   ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 02/19] nptl: Set cancellation type and state on pthread_exit Adhemerval Zanella via Libc-alpha
2021-08-26  9:38   ` Florian Weimer via Libc-alpha
2021-08-26  9:42     ` Florian Weimer via Libc-alpha
2021-08-26 11:56       ` Adhemerval Zanella via Libc-alpha
2021-08-26 11:52     ` Adhemerval Zanella via Libc-alpha
2021-08-26 12:08       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 03/19] nptl: Handle robust PI mutexes for !__ASSUME_SET_ROBUST_LIST Adhemerval Zanella via Libc-alpha
2021-08-26  9:42   ` Florian Weimer via Libc-alpha
2021-08-26 12:14     ` Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 04/19] nptl: Do not use pthread set_tid_address as state synchronization (BZ #19951) Adhemerval Zanella via Libc-alpha
2021-08-26 10:41   ` Florian Weimer via Libc-alpha [this message]
2021-08-26 14:58     ` Adhemerval Zanella via Libc-alpha
2021-08-26 15:06       ` Florian Weimer via Libc-alpha
2021-08-26 16:16         ` Adhemerval Zanella via Libc-alpha
2021-08-30 10:42           ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 05/19] nptl: Move setxid flag out of cancelhandling Adhemerval Zanella via Libc-alpha
2021-08-26 11:34   ` Florian Weimer via Libc-alpha
2021-08-26 15:11     ` Adhemerval Zanella via Libc-alpha
2021-08-26 15:21       ` Florian Weimer via Libc-alpha
2021-08-26 16:39         ` Adhemerval Zanella via Libc-alpha
2021-08-30 11:27           ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 06/19] nptl: Replace struct thread cancelhandling field Adhemerval Zanella via Libc-alpha
2021-08-26 14:34   ` Florian Weimer via Libc-alpha
2021-08-26 16:48     ` Adhemerval Zanella via Libc-alpha
2021-08-30 10:36       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 07/19] support: Add support_wait_for_thread_exit Adhemerval Zanella via Libc-alpha
2021-08-26  9:31   ` Florian Weimer via Libc-alpha
2021-08-26 16:49     ` Adhemerval Zanella via Libc-alpha
2021-08-30 11:46       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 08/19] nptl: pthread_kill, pthread_cancel should fail after exit (bug 19193) Adhemerval Zanella via Libc-alpha
2021-08-26 10:03   ` Florian Weimer via Libc-alpha
2021-08-26 16:49     ` Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 09/19] nptl: Fix race between pthread_kill and thread exit (bug 12889) Adhemerval Zanella via Libc-alpha
2021-08-26 14:23   ` Florian Weimer via Libc-alpha
2021-08-26 17:06     ` Adhemerval Zanella via Libc-alpha
2021-08-30  9:25       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 10/19] nptl: Use tidlock when accessing TID on pthread_getaffinity_np Adhemerval Zanella via Libc-alpha
2021-08-26 14:24   ` Florian Weimer via Libc-alpha
2021-08-26 17:29     ` Adhemerval Zanella via Libc-alpha
2021-08-30  9:30       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 11/19] nptl: Use tidlock when accessing TID on pthread_setaffinity Adhemerval Zanella via Libc-alpha
2021-08-26 14:25   ` Florian Weimer via Libc-alpha
2021-08-26 17:31     ` Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 12/19] nptl: Use tidlock when accessing TID on pthread_getcpuclockid Adhemerval Zanella via Libc-alpha
2021-08-26 14:27   ` Florian Weimer via Libc-alpha
2021-08-26 17:41     ` Adhemerval Zanella via Libc-alpha
2021-08-30  9:34       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 13/19] nptl: Use tidlock when accessing TID on pthread_getschedparam Adhemerval Zanella via Libc-alpha
2021-08-26 15:00   ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 14/19] nptl: Use tidlock when accessing TID on pthread_setschedparam Adhemerval Zanella via Libc-alpha
2021-08-26 14:35   ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 15/19] nptl: Use tidlock when accessing TID on pthread_getname_np Adhemerval Zanella via Libc-alpha
2021-08-26 14:38   ` Florian Weimer via Libc-alpha
2021-08-26 17:45     ` Adhemerval Zanella via Libc-alpha
2021-08-30  9:37       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 16/19] nptl: Use tidlock when accessing TID on pthread_setname_np Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 17/19] nptl: Use tidlock when accessing TID on pthread_sigqueue Adhemerval Zanella via Libc-alpha
2021-08-26 14:43   ` Florian Weimer via Libc-alpha
2021-08-26 17:49     ` Adhemerval Zanella via Libc-alpha
2021-08-30  9:26       ` Florian Weimer via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 18/19] nptl: Use tidlock when accessing TID on pthread_setschedprio Adhemerval Zanella via Libc-alpha
2021-08-23 19:50 ` [PATCH v2 19/19] nptl: Remove INVALID_TD_P Adhemerval Zanella via Libc-alpha
2021-08-26  9:30   ` Florian Weimer via Libc-alpha
2021-08-26 14:47 ` [PATCH v2 00/19] Fix various NPTL synchronization issues Florian Weimer via Libc-alpha
2021-08-26 18:19   ` Adhemerval Zanella via Libc-alpha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87y28o7ake.fsf@oldenburg.str.redhat.com \
    --to=libc-alpha@sourceware.org \
    --cc=adhemerval.zanella@linaro.org \
    --cc=fweimer@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).