From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-bounces@sourceware.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: AS3215 2.6.0.0/16
X-Spam-Status: No, score=-4.2 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI,
	RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham
	autolearn_force=no version=3.4.2
Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id 390CF1F5AE
	for <e@80x24.org>; Fri, 14 May 2021 06:51:40 +0000 (UTC)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 18086393BC2F;
	Fri, 14 May 2021 06:51:39 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 18086393BC2F
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1620975099;
	bh=Zjh7o+4GAHlois24AfYX8eIyJhrwG3POldKCM7G6T/k=;
	h=References:In-Reply-To:Date:Subject:To:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=ukj0R8PFYE3YnAF3NDn0vc5+kSEbw/WS53ZTFLvNEUHwyMavqkpeiqRWV+3FGkFxt
	 Myhbuyttt8s2KD1dpG0vQyY5Y0iofiB/ZvN1Dih8ALGYHbW7X1CHSCYCAAMty1w7H+
	 xJFgvADq4CKoKjd0um6R1O3jNgEyOsYbdUBMKoDQ=
Received: from mail-pj1-x1032.google.com (mail-pj1-x1032.google.com
 [IPv6:2607:f8b0:4864:20::1032])
 by sourceware.org (Postfix) with ESMTPS id AA0D1393BC3D
 for <libc-alpha@sourceware.org>; Fri, 14 May 2021 06:51:34 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org AA0D1393BC3D
Received: by mail-pj1-x1032.google.com with SMTP id
 o17-20020a17090a9f91b029015cef5b3c50so1202882pjp.4
 for <libc-alpha@sourceware.org>; Thu, 13 May 2021 23:51:34 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20161025;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=r2/iEWFA+McjsmNd+vsEHCv8dxa7dAL43V5kkl4ioVw=;
 b=Kehvlz5D8REoJysIsoV3Ff/zxAIdic0EgYijwRwxIU/ZbKBr5O9SJE5UD+qjPTqgQ4
 5WYw0+6XF+Sa0KhTovCcE/pGKPda8sgLmwvdlkkWBe2w8K7XXd+bXXrViJ+5Ph1LO1XN
 iP0wbS6LDyJ/IWGIHXpjwX3SqMlm/F062ewF4M4guDGXpPGNiGA1khwSTfw61wOxJ1pq
 dvGq5pnPtjvFpZFaQzL13U0y9iDY+bxc8lHaIYSpSCN8NOYvYiLP28Ohj67uY8arpDUi
 ZHFJAE+ArEUafAowUzAG2uXizroQTSDnXDSHsVX4iIkcDh5c6SYzA00493vC9J8ik8U1
 wJXQ==
X-Gm-Message-State: AOAM531W94XMj9NxGeuByZ2LxiQQ654MRdx5/ah0KAbhJZqYypXTvpH5
 rPYbzIv23Upe4YjgRilFp3zLPiMu8Fd2OB/W/pCvT7cbhF0=
X-Google-Smtp-Source: ABdhPJzKjy8aqrNxe7kZgGAiTWbjAu0nxKDK7KHaGFOnH3TyGgvffI7NDp41PQB3H/RluY36T5whgou/dFNIcNGZ3UU=
X-Received: by 2002:a17:902:b687:b029:eb:6491:b3f7 with SMTP id
 c7-20020a170902b687b02900eb6491b3f7mr44310697pls.38.1620975093571; Thu, 13
 May 2021 23:51:33 -0700 (PDT)
MIME-Version: 1.0
References: <CAPjTjwvPr_Jq8B-mvmpgjaX__G1qutJTQjnSP0yeDbfT46nLGA@mail.gmail.com>
 <20210426192729.1745682-1-vitalybuka@google.com>
 <92662aba-5d35-7840-0fc5-98497ede7afb@linaro.org>
In-Reply-To: <92662aba-5d35-7840-0fc5-98497ede7afb@linaro.org>
Date: Thu, 13 May 2021 23:50:56 -0700
Message-ID: <CAPjTjwvyEr=BMfEUTo3b8PZS3kmdXMtQoq=8xkCaieOJ5DpsJw@mail.gmail.com>
Subject: Re: [PATCH] stdlib: Fix data race in __run_exit_handlers
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Content-Type: text/plain; charset="UTF-8"
X-Content-Filtered-By: Mailman/MimeDel 2.1.29
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
From: Vitaly Buka via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Vitaly Buka <vitalybuka@google.com>
Cc: GLIBC Devel <libc-alpha@sourceware.org>
Errors-To: libc-alpha-bounces@sourceware.org
Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org>

Thank you. These improvements look good to me. Please push it.

On Thu, 13 May 2021 at 06:15, Adhemerval Zanella <
adhemerval.zanella@linaro.org> wrote:

>
>
> On 26/04/2021 16:27, Vitaly Buka via Libc-alpha wrote:
> > Fixes https://sourceware.org/bugzilla/show_bug.cgi?id=27749
> >
> > Keep __exit_funcs_lock almost all the time and unlock it only to execute
> > callbacks. This fixed two issues.
> >
> > 1. f->func.cxa was modified outside the lock with rare data race like:
> >       thread 0: __run_exit_handlers unlock __exit_funcs_lock
> >       thread 1: __internal_atexit locks __exit_funcs_lock
> >       thread 0: f->flavor = ef_free;
> >       thread 1: sees ef_free and use it as new
> >       thread 1: new->func.cxa.fn = (void (*) (void *, int)) func;
> >       thread 1: new->func.cxa.arg = arg;
> >       thread 1: new->flavor = ef_cxa;
> >       thread 0: cxafct = f->func.cxa.fn;  // it's wrong fn!
> >       thread 0: cxafct (f->func.cxa.arg, status);  // it's wrong arg!
> >       thread 0: goto restart;
> >       thread 0: call the same exit_function again as it's ef_cxa
>
> Ok, the small window between fetching the function pointer and argument
> from the list is triggering a race condition.
>
> >
> > 2. Don't unlock in main while loop after *listp = cur->next. If *listp
> >    is NULL and __exit_funcs_done is false another thread may fail in
> >    __new_exitfn on assert (l != NULL):
> >        thread 0: *listp = cur->next;  // It can be the last: *listp =
> NULL.
> >        thread 0: __libc_lock_unlock
> >        thread 1: __libc_lock_lock in __on_exit
> >        thread 1: __new_exitfn
> >        thread 1: if (__exit_funcs_done)  // false: thread 0 isn't there
> yet.
> >        thread 1: l = *listp
> >        thread 1: moves one and crashes on assert (l != NULL);
>
> Yeah, this is tricky but it does look correct.  I guess the lock/unlock
> during the loop was added to give a chance to concurrent
> __cxa_atexit / on_exit to have a chance to add a new callback, but it
> also only complicates things as you noted.  We might try to fix it on the
> __new_exitfn (to avoid the assert), but I see the current approach of
> locking the list and only unlocking while running the callback is the
> right approach.
>
> The patch look ok in general, I added some comments below.  I have
> adjusted the patch based on my comments [1], if you are ok with them
> I can push it upstream.
>
> [1]
> https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/bz27749-atexit-fix
>
> >
> > The test needs multiple iterations to consistently fail without the fix.
> > ---
> >  stdlib/Makefile                |   4 +-
> >  stdlib/exit.c                  |  28 ++++++---
> >  stdlib/test-cxa_atexit-race2.c | 110 +++++++++++++++++++++++++++++++++
> >  3 files changed, 131 insertions(+), 11 deletions(-)
> >  create mode 100644 stdlib/test-cxa_atexit-race2.c
> >
> > diff --git a/stdlib/Makefile b/stdlib/Makefile
> > index b3b30ab73e..f5755a1654 100644
> > --- a/stdlib/Makefile
> > +++ b/stdlib/Makefile
> > @@ -81,7 +81,8 @@ tests               := tst-strtol tst-strtod testmb
> testrand testsort testdiv   \
> >                  tst-width-stdint tst-strfrom tst-strfrom-locale
>   \
> >                  tst-getrandom tst-atexit tst-at_quick_exit
>  \
> >                  tst-cxa_atexit tst-on_exit test-atexit-race
>   \
> > -                test-at_quick_exit-race test-cxa_atexit-race
>  \
> > +                test-at_quick_exit-race test-cxa_atexit-race
>  \
> > +                test-cxa_atexit-race2
>   \
> >                  test-on_exit-race test-dlclose-exit-race
>  \
> >                  tst-makecontext-align test-bz22786 tst-strtod-nan-sign \
> >                  tst-swapcontext1 tst-setcontext4 tst-setcontext5 \
> > @@ -100,6 +101,7 @@ endif
> >  LDLIBS-test-atexit-race = $(shared-thread-library)
> >  LDLIBS-test-at_quick_exit-race = $(shared-thread-library)
> >  LDLIBS-test-cxa_atexit-race = $(shared-thread-library)
> > +LDLIBS-test-cxa_atexit-race2 = $(shared-thread-library)
> >  LDLIBS-test-on_exit-race = $(shared-thread-library)
> >  LDLIBS-tst-canon-bz26341 = $(shared-thread-library)
> >
> > diff --git a/stdlib/exit.c b/stdlib/exit.c
> > index bed82733ad..f095b38ab3 100644
> > --- a/stdlib/exit.c
> > +++ b/stdlib/exit.c
> > @@ -45,6 +45,8 @@ __run_exit_handlers (int status, struct
> exit_function_list **listp,
> >      if (run_dtors)
> >        __call_tls_dtors ();
> >
> > +  __libc_lock_lock (__exit_funcs_lock);
> > +
> >    /* We do it this way to handle recursive calls to exit () made by
> >       the functions registered with `atexit' and `on_exit'. We call
> >       everyone on the list and use the status value in the last
>
> Ok, it avoids the second race condition.
>
> > @@ -53,8 +55,6 @@ __run_exit_handlers (int status, struct
> exit_function_list **listp,
> >      {
> >        struct exit_function_list *cur;
> >
> > -      __libc_lock_lock (__exit_funcs_lock);
> > -
> >      restart:
> >        cur = *listp;
> >
>
> I think there is no need use the goto anymore, since there is no need
> to unlock the lock within the loop (the goto can be just a continue).
>
> > @@ -63,7 +63,6 @@ __run_exit_handlers (int status, struct
> exit_function_list **listp,
> >         /* Exit processing complete.  We will not allow any more
> >            atexit/on_exit registrations.  */
> >         __exit_funcs_done = true;
> > -       __libc_lock_unlock (__exit_funcs_lock);
> >         break;
> >       }
> >
>
> Ok, there is no need to unlock on break anymore.
>
> > @@ -72,44 +71,52 @@ __run_exit_handlers (int status, struct
> exit_function_list **listp,
> >         struct exit_function *const f = &cur->fns[--cur->idx];
> >         const uint64_t new_exitfn_called = __new_exitfn_called;
> >
> > -       /* Unlock the list while we call a foreign function.  */
> > -       __libc_lock_unlock (__exit_funcs_lock);
> >         switch (f->flavor)
> >           {
> >             void (*atfct) (void);
> >             void (*onfct) (int status, void *arg);
> >             void (*cxafct) (void *arg, int status);
> > +           void *arg;
> >
> >           case ef_free:
> >           case ef_us:
> >             break;
> >           case ef_on:
> >             onfct = f->func.on.fn;
> > +           arg = f->func.on.arg;
> >  #ifdef PTR_DEMANGLE
> >             PTR_DEMANGLE (onfct);
> >  #endif
> > -           onfct (status, f->func.on.arg);
> > +           /* Unlock the list while we call a foreign function.  */
> > +           __libc_lock_unlock (__exit_funcs_lock);
> > +           onfct (status, arg);
> > +           __libc_lock_lock (__exit_funcs_lock);
> >             break;
> >           case ef_at:
> >             atfct = f->func.at;
>
> Ok.
>
> >  #ifdef PTR_DEMANGLE
> >             PTR_DEMANGLE (atfct);
> >  #endif
> > +           /* Unlock the list while we call a foreign function.  */
> > +           __libc_lock_unlock (__exit_funcs_lock);
> >             atfct ();
> > +           __libc_lock_lock (__exit_funcs_lock);
> >             break;
>
> Ok.
>
> >           case ef_cxa:
> >             /* To avoid dlclose/exit race calling cxafct twice (BZ
> 22180),
> >                we must mark this function as ef_free.  */
> >             f->flavor = ef_free;
> >             cxafct = f->func.cxa.fn;
> > +           arg = f->func.cxa.arg;
> >  #ifdef PTR_DEMANGLE
> >             PTR_DEMANGLE (cxafct);
> >  #endif
> > -           cxafct (f->func.cxa.arg, status);
> > +           /* Unlock the list while we call a foreign function.  */
> > +           __libc_lock_unlock (__exit_funcs_lock);
> > +           cxafct (arg, status);
> > +           __libc_lock_lock (__exit_funcs_lock);
> >             break;
> >           }
> > -       /* Re-lock again before looking at global state.  */
> > -       __libc_lock_lock (__exit_funcs_lock);
> >
> >         if (__glibc_unlikely (new_exitfn_called != __new_exitfn_called))
> >           /* The last exit function, or another thread, has registered
>
> Ok.
>
> > @@ -123,9 +130,10 @@ __run_exit_handlers (int status, struct
> exit_function_list **listp,
> >          allocate element.  */
> >       free (cur);
> >
> > -      __libc_lock_unlock (__exit_funcs_lock);
>
> Just remove the extra newline below as well.
>
> >      }
> >
> > +  __libc_lock_unlock (__exit_funcs_lock);
> > +
> >    if (run_list_atexit)
> >      RUN_HOOK (__libc_atexit, ());
> >
>
> Ok.
>
> > diff --git a/stdlib/test-cxa_atexit-race2.c
> b/stdlib/test-cxa_atexit-race2.c
> > new file mode 100644
> > index 0000000000..d8c3d418e7
> > --- /dev/null
> > +++ b/stdlib/test-cxa_atexit-race2.c
> > @@ -0,0 +1,110 @@
> > +/* Support file for atexit/exit, etc. race tests.
>
> I think it would be good to add a reference to the bug report.
>
> > +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> > +   This file is part of the GNU C Library.
> > +
> > +   The GNU C Library is free software; you can redistribute it and/or
> > +   modify it under the terms of the GNU Lesser General Public
> > +   License as published by the Free Software Foundation; either
> > +   version 2.1 of the License, or (at your option) any later version.
> > +
> > +   The GNU C Library is distributed in the hope that it will be useful,
> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> > +   Lesser General Public License for more details.
> > +
> > +   You should have received a copy of the GNU Lesser General Public
> > +   License along with the GNU C Library; if not, see
> > +   <https://www.gnu.org/licenses/>.  */
> > +
> > +/* This file must be run from within a directory called "stdlib".  */
>
> I don't think this true.
>
> > +
> > +/* The atexit/exit, at_quick_exit/quick_exit, __cxa_atexit/exit, etc.
> exhibited
> > +   data race while calling destructors.
> > +
> > +   This test registers destructors from the background thread, and
> checks that
> > +   the same destructor is not called more than once.  */
> > +
> > +#include <stdatomic.h>
> > +#include <stdio.h>
> > +#include <stdlib.h>
> > +#include <support/xthread.h>
> > +#include <sys/wait.h>
> > +#include <unistd.h>
> > +
> > +static atomic_int registered;
> > +static atomic_int todo = 100000;
> > +
> > +static void
> > +atexit_cb (void *arg)
> > +{
> > +  atomic_fetch_sub (&registered, 1);
> > +  static void *prev;
> > +  if (arg == prev)
> > +    {
> > +      printf ("%p\n", arg);
> > +      abort ();
>
> Use FAIL_EXIT1 here.
>
> > +    }
> > +  prev = arg;
> > +
> > +  while (atomic_load (&todo) > 0 && atomic_load (&registered) < 100)
> > +    ;
> > +}
> > +
> > +int __cxa_atexit (void (*func) (void *), void *arg, void *d);
> > +
> > +static void *
> > +thread_func (void *arg)
> > +{
> > +  void *cb_arg = NULL;
> > +  while (atomic_load (&todo) > 0)
>
> Add a open bracket here.
>
> > +    if (atomic_load (&registered) < 10000)
> > +      {
> > +        int n = 10;
> > +        for (int i = 0; i < n; ++i)
> > +          __cxa_atexit (&atexit_cb, ++cb_arg, 0);
> > +        atomic_fetch_add (&registered, n);
> > +        atomic_fetch_sub (&todo, n);
> > +      }
> > +  return 0;
>
> Use NULL here.
>
> > +}
> > +
> > +static void
>
> I would add a _Noreturn here.
>
> > +test_and_exit (void)
> > +{
> > +  pthread_attr_t attr;
> > +
> > +  xpthread_attr_init (&attr);
> > +  xpthread_attr_setdetachstate (&attr, 1);
> > +
> > +  xpthread_create (&attr, thread_func, NULL);
> > +  xpthread_attr_destroy (&attr);
> > +  while (!atomic_load (&registered))
>
> Check for 0 here (unless the return value is a bool the type check
> should be explicit).
>
> > +    ;
> > +  exit (0);
> > +}
> > +
> > +static int
> > +do_test (void)
> > +{
> > +  for (int i = 0; i < 20; ++i)
> > +    {
> > +      for (int i = 0; i < 10; ++i)
> > +        if (fork () == 0)
>
> Use xfork.
>
> > +          test_and_exit ();
> > +
> > +      int status;
> > +      while (wait (&status) > 0)
> > +        {
> > +          if (!WIFEXITED (status))
>
> I prefer if we limit the number of wait call to check for invalid
> return codes:
>
>   for (int i = 0; i < 10; ++i)
>     {
>       int status;
>       xwaitpid (0, &status, 0);
>       if (!WIFEXITED (status))
>         FAIL_EXIT1 ("Failed iterations %d", i);
>       TEST_COMPARE (WEXITSTATUS (status), 0);
>     }
>
> > +            {
> > +              printf ("Failed interation %d\n", i);
> > +              abort ();
>
> Use FAIL_EXIT1 here.
>
> > +            }
> > +        }
> > +    }
> > +
> > +  exit (0);
>
> There is no need to add an exit here.
>
> > +}
> > +
> > +#define TEST_FUNCTION do_test
> > +#include <support/test-driver.c>
> >
>