From: Adhemerval Zanella via Libc-alpha
Reply-To: Adhemerval Zanella
To: Vitaly Buka, libc-alpha@sourceware.org
Subject: Re: [PATCH] stdlib: Fix data race in __run_exit_handlers
Date: Thu, 13 May 2021 10:15:39 -0300
Message-ID: <92662aba-5d35-7840-0fc5-98497ede7afb@linaro.org>
In-Reply-To: <20210426192729.1745682-1-vitalybuka@google.com>
References: <20210426192729.1745682-1-vitalybuka@google.com>
List-Id: Libc-alpha mailing list

On 26/04/2021 16:27, Vitaly Buka via Libc-alpha wrote:
> Fixes https://sourceware.org/bugzilla/show_bug.cgi?id=27749
>
> Keep __exit_funcs_lock almost all the time and unlock it only to execute
> callbacks. This fixed two issues.
>
> 1. f->func.cxa was modified outside the lock with rare data race like:
>     thread 0: __run_exit_handlers unlock __exit_funcs_lock
>     thread 1: __internal_atexit locks __exit_funcs_lock
>     thread 0: f->flavor = ef_free;
>     thread 1: sees ef_free and use it as new
>     thread 1: new->func.cxa.fn = (void (*) (void *, int)) func;
>     thread 1: new->func.cxa.arg = arg;
>     thread 1: new->flavor = ef_cxa;
>     thread 0: cxafct = f->func.cxa.fn;  // it's wrong fn!
>     thread 0: cxafct (f->func.cxa.arg, status);  // it's wrong arg!
>     thread 0: goto restart;
>     thread 0: call the same exit_function again as it's ef_cxa

Ok, the small window between unlocking and fetching the function pointer
and argument from the list is what triggers the race condition.

>
> 2. Don't unlock in main while loop after *listp = cur->next. If *listp
>    is NULL and __exit_funcs_done is false another thread may fail in
>    __new_exitfn on assert (l != NULL):
>     thread 0: *listp = cur->next;  // It can be the last: *listp = NULL.
>     thread 0: __libc_lock_unlock
>     thread 1: __libc_lock_lock in __on_exit
>     thread 1: __new_exitfn
>     thread 1: if (__exit_funcs_done)  // false: thread 0 isn't there yet.
>     thread 1: l = *listp
>     thread 1: moves one and crashes on assert (l != NULL);

Yeah, this is tricky, but it does look correct. I guess the lock/unlock
inside the loop was added to give concurrent __cxa_atexit / on_exit calls
a chance to add a new callback, but as you noted it only complicates
things. We might try to fix it in __new_exitfn (to avoid the assert), but
I think locking the list and unlocking only while running the callback is
the right approach.

The patch looks ok in general; I added some comments below. I have
adjusted the patch based on my comments [1]. If you are ok with them, I
can push it upstream.
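To make the first race concrete, here is a minimal sketch of the locking discipline, written against plain pthreads rather than glibc internals. The names struct slot, register_handler, run_handlers and count_cb are hypothetical stand-ins, and the fixed-size table only approximates the real linked list. The point is that the slot is marked free and its function pointer and argument are copied to locals while the lock is held, and the lock is dropped only around the call itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for glibc's struct exit_function; not the real type.  */
enum { SLOT_FREE, SLOT_USED };

struct slot
{
  int state;
  void (*fn) (void *arg, int status);
  void *arg;
};

#define MAX_SLOTS 32
static struct slot slots[MAX_SLOTS];
static size_t nslots;
static pthread_mutex_t slots_lock = PTHREAD_MUTEX_INITIALIZER;

/* Register FN with ARG.  Returns 0 on success, -1 if the table is full.  */
int
register_handler (void (*fn) (void *, int), void *arg)
{
  int ret = -1;
  pthread_mutex_lock (&slots_lock);
  if (nslots < MAX_SLOTS)
    {
      slots[nslots].fn = fn;
      slots[nslots].arg = arg;
      slots[nslots].state = SLOT_USED;
      ++nslots;
      ret = 0;
    }
  pthread_mutex_unlock (&slots_lock);
  return ret;
}

/* Run the handlers in reverse registration order and return how many
   were called.  The lock is held except while a handler executes.  */
int
run_handlers (int status)
{
  int called = 0;
  pthread_mutex_lock (&slots_lock);
  while (nslots > 0)
    {
      struct slot *s = &slots[--nslots];
      if (s->state != SLOT_USED)
        continue;

      /* Still under the lock: release the slot and take private copies,
         so a concurrent registration cannot change what we call.  */
      s->state = SLOT_FREE;
      void (*fn) (void *, int) = s->fn;
      void *arg = s->arg;
      assert (fn != NULL);

      /* Unlock only around the foreign function.  */
      pthread_mutex_unlock (&slots_lock);
      fn (arg, status);
      pthread_mutex_lock (&slots_lock);
      ++called;
    }
  pthread_mutex_unlock (&slots_lock);
  return called;
}

/* Example handler: increments the int that ARG points to.  */
void
count_cb (void *arg, int status)
{
  (void) status;
  ++*(int *) arg;
}
```

Because nslots is re-read under the lock on every iteration, a handler that registers another handler while it runs is still picked up by the same loop.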

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/bz27749-atexit-fix

>
> The test needs multiple iterations to consistently fail without the fix.
> ---
>  stdlib/Makefile                |   4 +-
>  stdlib/exit.c                  |  28 ++++++---
>  stdlib/test-cxa_atexit-race2.c | 110 +++++++++++++++++++++++++++++++++
>  3 files changed, 131 insertions(+), 11 deletions(-)
>  create mode 100644 stdlib/test-cxa_atexit-race2.c
>
> diff --git a/stdlib/Makefile b/stdlib/Makefile
> index b3b30ab73e..f5755a1654 100644
> --- a/stdlib/Makefile
> +++ b/stdlib/Makefile
> @@ -81,7 +81,8 @@ tests := tst-strtol tst-strtod testmb testrand testsort testdiv \
>                     tst-width-stdint tst-strfrom tst-strfrom-locale \
>                     tst-getrandom tst-atexit tst-at_quick_exit \
>                     tst-cxa_atexit tst-on_exit test-atexit-race \
> -                   test-at_quick_exit-race test-cxa_atexit-race \
> +                   test-at_quick_exit-race test-cxa_atexit-race \
> +                   test-cxa_atexit-race2 \
>                     test-on_exit-race test-dlclose-exit-race \
>                     tst-makecontext-align test-bz22786 tst-strtod-nan-sign \
>                     tst-swapcontext1 tst-setcontext4 tst-setcontext5 \
> @@ -100,6 +101,7 @@ endif
>  LDLIBS-test-atexit-race = $(shared-thread-library)
>  LDLIBS-test-at_quick_exit-race = $(shared-thread-library)
>  LDLIBS-test-cxa_atexit-race = $(shared-thread-library)
> +LDLIBS-test-cxa_atexit-race2 = $(shared-thread-library)
>  LDLIBS-test-on_exit-race = $(shared-thread-library)
>  LDLIBS-tst-canon-bz26341 = $(shared-thread-library)
>
> diff --git a/stdlib/exit.c b/stdlib/exit.c
> index bed82733ad..f095b38ab3 100644
> --- a/stdlib/exit.c
> +++ b/stdlib/exit.c
> @@ -45,6 +45,8 @@ __run_exit_handlers (int status, struct exit_function_list **listp,
>      if (run_dtors)
>        __call_tls_dtors ();
>
> +  __libc_lock_lock (__exit_funcs_lock);
> +
>    /* We do it this way to handle recursive calls to exit () made by
>       the functions registered with `atexit' and `on_exit'. We call
>       everyone on the list and use the status value in the last

Ok, it avoids the second race condition.
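
As an illustration of why holding the lock across the whole loop closes the second race, here is a small sketch, assuming a simplified list where registration only bumps a counter. The names struct block, try_register and drain are hypothetical and are not glibc code. The list head becoming NULL and the done flag becoming true happen inside one critical section, so a registering thread can never observe a NULL head while done is still false.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the exit-function list: a chain of blocks,
   each one just counting how many entries it holds.  */
struct block
{
  struct block *next;
  int used;
};

static struct block initial_block;
static struct block *head = &initial_block;
static bool done;
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true if the registration was accepted, false once draining
   has completed.  */
bool
try_register (void)
{
  bool ok = false;
  pthread_mutex_lock (&list_lock);
  if (!done)
    {
      /* The invariant at stake: while DONE is false, HEAD is not NULL.  */
      assert (head != NULL);
      ++head->used;
      ok = true;
    }
  pthread_mutex_unlock (&list_lock);
  return ok;
}

/* Consumes every entry and returns how many there were.  */
int
drain (void)
{
  int consumed = 0;
  pthread_mutex_lock (&list_lock);
  while (true)
    {
      struct block *cur = head;
      if (cur == NULL)
        {
          /* Same critical section as the HEAD update below.  */
          done = true;
          break;
        }
      consumed += cur->used;
      cur->used = 0;
      /* No unlock between this store and the DONE store above.  */
      head = cur->next;
    }
  pthread_mutex_unlock (&list_lock);
  return consumed;
}
```

If drain released the lock right after updating head, try_register could run in that gap with done still false and trip the assert, which is the failure described in the commit message.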

> @@ -53,8 +55,6 @@ __run_exit_handlers (int status, struct exit_function_list **listp,
>      {
>        struct exit_function_list *cur;
>
> -      __libc_lock_lock (__exit_funcs_lock);
> -
>      restart:
>        cur = *listp;
>

I think there is no need for the goto anymore, since the lock no longer
has to be released within the loop (the goto can be just a continue).

> @@ -63,7 +63,6 @@ __run_exit_handlers (int status, struct exit_function_list **listp,
>            /* Exit processing complete. We will not allow any more
>               atexit/on_exit registrations. */
>            __exit_funcs_done = true;
> -          __libc_lock_unlock (__exit_funcs_lock);
>            break;
>          }
>

Ok, there is no need to unlock on break anymore.

> @@ -72,44 +71,52 @@ __run_exit_handlers (int status, struct exit_function_list **listp,
>            struct exit_function *const f = &cur->fns[--cur->idx];
>            const uint64_t new_exitfn_called = __new_exitfn_called;
>
> -          /* Unlock the list while we call a foreign function. */
> -          __libc_lock_unlock (__exit_funcs_lock);
>            switch (f->flavor)
>              {
>                void (*atfct) (void);
>                void (*onfct) (int status, void *arg);
>                void (*cxafct) (void *arg, int status);
> +              void *arg;
>
>              case ef_free:
>              case ef_us:
>                break;
>              case ef_on:
>                onfct = f->func.on.fn;
> +              arg = f->func.on.arg;
>  #ifdef PTR_DEMANGLE
>                PTR_DEMANGLE (onfct);
>  #endif
> -              onfct (status, f->func.on.arg);
> +              /* Unlock the list while we call a foreign function. */
> +              __libc_lock_unlock (__exit_funcs_lock);
> +              onfct (status, arg);
> +              __libc_lock_lock (__exit_funcs_lock);
>                break;
>              case ef_at:
>                atfct = f->func.at;

Ok.

>  #ifdef PTR_DEMANGLE
>                PTR_DEMANGLE (atfct);
>  #endif
> +              /* Unlock the list while we call a foreign function. */
> +              __libc_lock_unlock (__exit_funcs_lock);
>                atfct ();
> +              __libc_lock_lock (__exit_funcs_lock);
>                break;

Ok.

>              case ef_cxa:
>                /* To avoid dlclose/exit race calling cxafct twice (BZ 22180),
>                   we must mark this function as ef_free. */
>                f->flavor = ef_free;
>                cxafct = f->func.cxa.fn;
> +              arg = f->func.cxa.arg;
>  #ifdef PTR_DEMANGLE
>                PTR_DEMANGLE (cxafct);
>  #endif
> -              cxafct (f->func.cxa.arg, status);
> +              /* Unlock the list while we call a foreign function. */
> +              __libc_lock_unlock (__exit_funcs_lock);
> +              cxafct (arg, status);
> +              __libc_lock_lock (__exit_funcs_lock);
>                break;
>              }
> -          /* Re-lock again before looking at global state. */
> -          __libc_lock_lock (__exit_funcs_lock);
>
>            if (__glibc_unlikely (new_exitfn_called != __new_exitfn_called))
>              /* The last exit function, or another thread, has registered

Ok.

> @@ -123,9 +130,10 @@ __run_exit_handlers (int status, struct exit_function_list **listp,
>             allocate element. */
>          free (cur);
>
> -      __libc_lock_unlock (__exit_funcs_lock);

Just remove the extra newline below as well.

>      }
>
> +  __libc_lock_unlock (__exit_funcs_lock);
> +
>    if (run_list_atexit)
>      RUN_HOOK (__libc_atexit, ());
>

Ok.

> diff --git a/stdlib/test-cxa_atexit-race2.c b/stdlib/test-cxa_atexit-race2.c
> new file mode 100644
> index 0000000000..d8c3d418e7
> --- /dev/null
> +++ b/stdlib/test-cxa_atexit-race2.c
> @@ -0,0 +1,110 @@
> +/* Support file for atexit/exit, etc. race tests.

I think it would be good to add a reference to the bug report.

> +   Copyright (C) 2017-2021 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   . */
> +
> +/* This file must be run from within a directory called "stdlib". */

I don't think this is true.

> +
> +/* The atexit/exit, at_quick_exit/quick_exit, __cxa_atexit/exit, etc. exhibited
> +   data race while calling destructors.
> +
> +   This test registers destructors from the background thread, and checks that
> +   the same destructor is not called more than once. */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +static atomic_int registered;
> +static atomic_int todo = 100000;
> +
> +static void
> +atexit_cb (void *arg)
> +{
> +  atomic_fetch_sub (&registered, 1);
> +  static void *prev;
> +  if (arg == prev)
> +    {
> +      printf ("%p\n", arg);
> +      abort ();

Use FAIL_EXIT1 here.

> +    }
> +  prev = arg;
> +
> +  while (atomic_load (&todo) > 0 && atomic_load (&registered) < 100)
> +    ;
> +}
> +
> +int __cxa_atexit (void (*func) (void *), void *arg, void *d);
> +
> +static void *
> +thread_func (void *arg)
> +{
> +  void *cb_arg = NULL;
> +  while (atomic_load (&todo) > 0)

Add an open bracket here.

> +    if (atomic_load (&registered) < 10000)
> +      {
> +        int n = 10;
> +        for (int i = 0; i < n; ++i)
> +          __cxa_atexit (&atexit_cb, ++cb_arg, 0);
> +        atomic_fetch_add (&registered, n);
> +        atomic_fetch_sub (&todo, n);
> +      }
> +  return 0;

Use NULL here.

> +}
> +
> +static void

I would add a _Noreturn here.

> +test_and_exit (void)
> +{
> +  pthread_attr_t attr;
> +
> +  xpthread_attr_init (&attr);
> +  xpthread_attr_setdetachstate (&attr, 1);
> +
> +  xpthread_create (&attr, thread_func, NULL);
> +  xpthread_attr_destroy (&attr);
> +  while (!atomic_load (&registered))

Compare against 0 explicitly here (unless the value is a bool, the check
should be explicit).

> +    ;
> +  exit (0);
> +}
> +
> +static int
> +do_test (void)
> +{
> +  for (int i = 0; i < 20; ++i)
> +    {
> +      for (int i = 0; i < 10; ++i)
> +        if (fork () == 0)

Use xfork.

> +          test_and_exit ();
> +
> +      int status;
> +      while (wait (&status) > 0)
> +        {
> +          if (!WIFEXITED (status))

I would prefer that we limit the number of wait calls and check for
invalid return codes:

  for (int i = 0; i < 10; ++i)
    {
      int status;
      xwaitpid (0, &status, 0);
      if (!WIFEXITED (status))
        FAIL_EXIT1 ("Failed iterations %d", i);
      TEST_COMPARE (WEXITSTATUS (status), 0);
    }

> +            {
> +              printf ("Failed interation %d\n", i);
> +              abort ();

Use FAIL_EXIT1 here.

> +            }
> +        }
> +    }
> +
> +  exit (0);

There is no need to add an exit here.

> +}
> +
> +#define TEST_FUNCTION do_test
> +#include
>
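
For readers without the glibc test-support headers at hand, the bounded wait loop suggested above can be sketched with plain POSIX calls. This is an illustration under assumptions, not the code in the branch: run_children, child_ok and child_fail are hypothetical names, the sketch reports failures through its return value where the real test would use FAIL_EXIT1 and TEST_COMPARE, and it reaps with waitpid (-1, ...) rather than pid 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork NCHILDREN processes that each run CHILD_FN and terminate with its
   return value, then reap exactly as many children as were forked.
   Returns the number of children that did not exit normally with status
   0, or -1 if fork or waitpid itself failed.  */
int
run_children (int nchildren, int (*child_fn) (void))
{
  assert (nchildren >= 0);

  int forked = 0;
  for (int i = 0; i < nchildren; ++i)
    {
      pid_t pid = fork ();
      if (pid == -1)
        break;
      if (pid == 0)
        /* The real test calls exit here so that the registered handlers
           run; the sketch uses _exit to stay independent of stdio.  */
        _exit (child_fn ());
      ++forked;
    }

  /* One wait per forked child, so a missing or extra child is noticed
     instead of being hidden by an open-ended wait loop.  */
  int failures = 0;
  for (int i = 0; i < forked; ++i)
    {
      int status;
      if (waitpid (-1, &status, 0) == -1)
        return -1;
      if (!WIFEXITED (status) || WEXITSTATUS (status) != 0)
        ++failures;
    }
  return forked == nchildren ? failures : -1;
}

/* Example children for illustration only.  */
int
child_ok (void)
{
  return 0;
}

int
child_fail (void)
{
  return 3;
}
```

Counting the waits also means a child killed by a signal, for example by abort, shows up as a failure for a specific iteration rather than only ending the loop.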