From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-return-92245-e=80x24.org@sourceware.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: AS31976 209.132.180.0/23
X-Spam-Status: No, score=-0.2 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED,
	DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_MED,
	SPF_HELO_PASS,SPF_PASS shortcircuit=no autolearn=ham autolearn_force=no
	version=3.4.0
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id F0F52200B9
	for <e@80x24.org>; Tue,  8 May 2018 17:11:49 +0000 (UTC)
DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id
	:list-unsubscribe:list-subscribe:list-archive:list-post
	:list-help:sender:subject:to:references:from:message-id:date
	:mime-version:in-reply-to:content-type
	:content-transfer-encoding; q=dns; s=default; b=DVS/GBoWP5Xbp0PV
	y1uxVGweq12WNxnJoI1+7WuS0usn0Bb760VTxi7QRYVDd8513x1JE35vsm2LdS5M
	3ejABTNS/+WAzGFtYurfX8+TrrWKH4mCPlVZbsRERN8NEx/+h8COb2nQn+lv8KfQ
	8BL5gTZqlp2b5MFKimDWfst8Fno=
DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id
	:list-unsubscribe:list-subscribe:list-archive:list-post
	:list-help:sender:subject:to:references:from:message-id:date
	:mime-version:in-reply-to:content-type
	:content-transfer-encoding; s=default; bh=Kq41TEN3HDGUa0CKaRKel6
	ufZ4s=; b=WYMv2YBmnEOQSrYSl+GNq/bWZ4u4ke3Ti0eVCT6xAbKlYdN+qPvmDd
	RNAsFJhh0K5NP69my3SfiF/JnokN1dUhIhG4VIflG/mqZSoujVAmRIpQp6jQxfXb
	XuH8R8heuP+h3A/odPfmbMuB1AdhMF8U78bfbqEzVeZblbs3c9/GA=
Received: (qmail 40441 invoked by alias); 8 May 2018 17:11:47 -0000
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-e=80x24.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Sender: libc-alpha-owner@sourceware.org
Received: (qmail 40030 invoked by uid 89); 8 May 2018 17:11:46 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-HELO: mail-qt0-f179.google.com
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:subject:to:references:from:openpgp:autocrypt
         :message-id:date:user-agent:mime-version:in-reply-to
         :content-language:content-transfer-encoding;
        bh=oDLmJA0hFtFK7UwUN7+Qznit/nGYo/+79gRrg3CWvJ0=;
        b=cg+fbtP6UIehZ2rGzXOBuEwrhUlsmIG0qkZkQSvvtpWHYywB33KzaqnRrzBMFpvkFu
         EleZtf6Gte6kA82fRrmjpeG9Yg8AwEdb4J4HHOk71NJImba+rRiiYsL0fR2HarGvy/qO
         n4xOdbYLAbnF2Sy+X5y9kITj6UhNQfs0Ylz8xmmngTs7bCi43QMlGeOOJU3VekK2o2fx
         3CDPVFavK6XzShxpEEeWlmknWgP2nkI/SD/MXuy6D+ecCej8Roti0uGkxAoPfd+B3+4q
         ndlBbiZ8En3euOC1eR5/ziHUxO0xPV1104+BPI5wMPnfhj7/zHvcO2JFOckfwmrfG5xI
         gS5g==
X-Gm-Message-State: ALQs6tCvE3nS0NHqUFc761/NattS4XSYDSo40Mklu6eHnONjm1STGWgH
	7FoFelhvgj6I53juLvu4SZgeaMWm5PI=
X-Google-Smtp-Source: AB8JxZqFAy17DHeIO9MRJubqDH1uv9GqRwPPQVQ0tSCVwDCuosLblHvUFaN+E4dN3aBd19sXpenwFw==
X-Received: by 2002:ac8:3af:: with SMTP id t47-v6mr38302936qtg.117.1525799499317;
        Tue, 08 May 2018 10:11:39 -0700 (PDT)
Subject: Re: [PATCH v2 03/21] nptl: Fix Race conditions in pthread
 cancellation (BZ#12683)
To: Zack Weinberg <zackw@panix.com>, libc-alpha@sourceware.org
References: <20180507024909.5598-1-zackw@panix.com>
 <20180507024909.5598-3-zackw@panix.com>
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Openpgp: preference=signencrypt
Autocrypt: addr=adhemerval.zanella@linaro.org; prefer-encrypt=mutual; keydata=
 xsFNBFcVGkoBEADiQU2x/cBBmAVf5C2d1xgz6zCnlCefbqaflUBw4hB/bEME40QsrVzWZ5Nq
 8kxkEczZzAOKkkvv4pRVLlLn/zDtFXhlcvQRJ3yFMGqzBjofucOrmdYkOGo0uCaoJKPT186L
 NWp53SACXguFJpnw4ODI64ziInzXQs/rUJqrFoVIlrPDmNv/LUv1OVPKz20ETjgfpg8MNwG6
 iMizMefCl+RbtXbIEZ3TE/IaDT/jcOirjv96lBKrc/pAL0h/O71Kwbbp43fimW80GhjiaN2y
 WGByepnkAVP7FyNarhdDpJhoDmUk9yfwNuIuESaCQtfd3vgKKuo6grcKZ8bHy7IXX1XJj2X/
 BgRVhVgMHAnDPFIkXtP+SiarkUaLjGzCz7XkUn4XAGDskBNfbizFqYUQCaL2FdbW3DeZqNIa
 nSzKAZK7Dm9+0VVSRZXP89w71Y7JUV56xL/PlOE+YKKFdEw+gQjQi0e+DZILAtFjJLoCrkEX
 w4LluMhYX/X8XP6/C3xW0yOZhvHYyn72sV4yJ1uyc/qz3OY32CRy+bwPzAMAkhdwcORA3JPb
 kPTlimhQqVgvca8m+MQ/JFZ6D+K7QPyvEv7bQ7M+IzFmTkOCwCJ3xqOD6GjX3aphk8Sr0dq3
 4Awlf5xFDAG8dn8Uuutb7naGBd/fEv6t8dfkNyzj6yvc4jpVxwARAQABzUlBZGhlbWVydmFs
 IFphbmVsbGEgTmV0dG8gKExpbmFybyBWUE4gS2V5KSA8YWRoZW1lcnZhbC56YW5lbGxhQGxp
 bmFyby5vcmc+wsF3BBMBCAAhBQJXFRpKAhsDBQsJCAcDBRUKCQgLBRYCAwEAAh4BAheAAAoJ
 EKqx7BSnlIjv0e8P/1YOYoNkvJ+AJcNUaM5a2SA9oAKjSJ/M/EN4Id5Ow41ZJS4lUA0apSXW
 NjQg3VeVc2RiHab2LIB4MxdJhaWTuzfLkYnBeoy4u6njYcaoSwf3g9dSsvsl3mhtuzm6aXFH
 /Qsauav77enJh99tI4T+58rp0EuLhDsQbnBic/ukYNv7sQV8dy9KxA54yLnYUFqH6pfH8Lly
 sTVAMyi5Fg5O5/hVV+Z0Kpr+ZocC1YFJkTsNLAW5EIYSP9ftniqaVsim7MNmodv/zqK0IyDB
 GLLH1kjhvb5+6ySGlWbMTomt/or/uvMgulz0bRS+LUyOmlfXDdT+t38VPKBBVwFMarNuREU2
 69M3a3jdTfScboDd2ck1u7l+QbaGoHZQ8ZNUrzgObltjohiIsazqkgYDQzXIMrD9H19E+8fw
 kCNUlXxjEgH/Kg8DlpoYJXSJCX0fjMWfXywL6ZXc2xyG/hbl5hvsLNmqDpLpc1CfKcA0BkK+
 k8R57fr91mTCppSwwKJYO9T+8J+o4ho/CJnK/jBy1pWKMYJPvvrpdBCWq3MfzVpXYdahRKHI
 ypk8m4QlRlbOXWJ3TDd/SKNfSSrWgwRSg7XCjSlR7PNzNFXTULLB34sZhjrN6Q8NQZsZnMNs
 TX8nlGOVrKolnQPjKCLwCyu8PhllU8OwbSMKskcD1PSkG6h3r0AqzsFNBFcVGkoBEACgAdbR
 Ck+fsfOVwT8zowMiL3l9a2DP3Eeak23ifdZG+8Avb/SImpv0UMSbRfnw/N81IWwlbjkjbGTu
 oT37iZHLRwYUFmA8fZX0wNDNKQUUTjN6XalJmvhdz9l71H3WnE0wneEM5ahu5V1L1utUWTyh
 VUwzX1lwJeV3vyrNgI1kYOaeuNVvq7npNR6t6XxEpqPsNc6O77I12XELic2+36YibyqlTJIQ
 V1SZEbIy26AbC2zH9WqaKyGyQnr/IPbTJ2Lv0dM3RaXoVf+CeK7gB2B+w1hZummD21c1Laua
 +VIMPCUQ+EM8W9EtX+0iJXxI+wsztLT6vltQcm+5Q7tY+HFUucizJkAOAz98YFucwKefbkTp
 eKvCfCwiM1bGatZEFFKIlvJ2QNMQNiUrqJBlW9nZp/k7pbG3oStOjvawD9ZbP9e0fnlWJIsj
 6c7pX354Yi7kxIk/6gREidHLLqEb/otuwt1aoMPg97iUgDV5mlNef77lWE8vxmlY0FBWIXuZ
 yv0XYxf1WF6dRizwFFbxvUZzIJp3spAao7jLsQj1DbD2s5+S1BW09A0mI/1DjB6EhNN+4bDB
 SJCOv/ReK3tFJXuj/HbyDrOdoMt8aIFbe7YFLEExHpSk+HgN05Lg5TyTro8oW7TSMTk+8a5M
 kzaH4UGXTTBDP/g5cfL3RFPl79ubXwARAQABwsFfBBgBCAAJBQJXFRpKAhsMAAoJEKqx7BSn
 lIjvI/8P/jg0jl4Tbvg3B5kT6PxJOXHYu9OoyaHLcay6Cd+ZrOd1VQQCbOcgLFbf4Yr+rE9l
 mYsY67AUgq2QKmVVbn9pjvGsEaz8UmfDnz5epUhDxC6yRRvY4hreMXZhPZ1pbMa6A0a/WOSt
 AgFj5V6Z4dXGTM/lNManr0HjXxbUYv2WfbNt3/07Db9T+GZkpUotC6iknsTA4rJi6u2ls0W9
 1UIvW4o01vb4nZRCj4rni0g6eWoQCGoVDk/xFfy7ZliR5B+3Z3EWRJcQskip/QAHjbLa3pml
 xAZ484fVxgeESOoaeC9TiBIp0NfH8akWOI0HpBCiBD5xaCTvR7ujUWMvhsX2n881r/hNlR9g
 fcE6q00qHSPAEgGr1bnFv74/1vbKtjeXLCcRKk3Ulw0bY1OoDxWQr86T2fZGJ/HIZuVVBf3+
 gaYJF92GXFynHnea14nFFuFgOni0Mi1zDxYH/8yGGBXvo14KWd8JOW0NJPaCDFJkdS5hu0VY
 7vJwKcyHJGxsCLU+Et0mryX8qZwqibJIzu7kUJQdQDljbRPDFd/xmGUFCQiQAncSilYOcxNU
 EMVCXPAQTteqkvA+gNqSaK1NM9tY0eQ4iJpo+aoX8HAcn4sZzt2pfUB9vQMTBJ2d4+m/qO6+
 cFTAceXmIoFsN8+gFN3i8Is3u12u8xGudcBPvpoy4OoG
Message-ID: <6e32b5d0-8470-a5e4-6add-030cfb34cb24@linaro.org>
Date: Tue, 8 May 2018 14:11:34 -0300
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101
 Thunderbird/52.7.0
MIME-Version: 1.0
In-Reply-To: <20180507024909.5598-3-zackw@panix.com>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit


On 06/05/2018 23:49, Zack Weinberg wrote:

Hi Zack,

First, thank you for the throughfully review on this patch.

> On 26 Feb 2018, Adhemerval Zanella wrote:
>> This patches fixes some race conditions in NPTL cancellation code by
>> redefining how cancellable syscalls are defined and handled.  Current
>> approach is to enable asynchronous cancellation prior to making the syscall
>> and restore the previous cancellation type once the syscall returns.
>>
>> As decribed in BZ#12683, this approach shows 2 important problems:
>>
>>   1. Cancellation can act after the syscall has returned from kernel, but
>>      before userspace saves the return value.  It might result in a resource
>>      leak if the syscall allocated a resource or a side effect (partial
>>      read/write), and there is no way to program handle it with cancellation
>>      handlers.
>>
>>   2. If a signal is handled while the thread is blocked at a cancellable
>>      syscall, the entire signal handler runs with asynchronous cancellation
>>      enabled.  This can lead to issues if the signal handler call functions
>>      which are async-signal-safe but not async-cancel-safe.
>>
>> For cancellation to work correctly, there are 5 points at which the
>> cancellation signal could arrive:
>>
>>   1. Before the final "testcancel" and before the syscall is made.
>>   2. Between the "testcancel" and the syscall.
>>   3. While the syscall is blocked and no side effects have yet taken place.
>>   4. While the syscall is blocked but with some side effects already having
>>      taken place (e.g. a partial read or write).
>>   5. After the syscall has returned.
>>
>> And GLIBC wants to act on cancellation in cases 1, 2, and 3 but not in case
>> 4 or 5.  The proposed solution follows:
>>
>>   * Handling case 1 is trivial: do a conditional branch based on whether the
>>     thread has received a cancellation request;
>>   * Case 2 can be caught by the signal handler determining that the saved
>>     program counter (from the ucontext_t) is in some address range beginning
>>     just before the "testcancel" and ending with the syscall instruction.
>>   * In this case, except for certain syscalls that ALWAYS fail with EINTR
>>     even for non-interrupting signals, the kernel will reset the program
>>     counter to point at the syscall instruction during signal handling, so
>>     that the syscall is restarted when the signal handler returns. So, from
>>     the signal handler's standpoint, this looks the same as case 2, and thus
>>     it's taken care of.
>>   * In this case, the kernel cannot restart the syscall; when it's
>>     interrupted by a signal, the kernel must cause the syscall to return
>>     with whatever partial result it obtained (e.g. partial read or write).
>>   * In this case, the saved program counter points just after the syscall
>>     instruction, so the signal handler won't act on cancellation.
>>     This one is equal to 4. since the program counter is past the syscall
>>     instruction already.
> 
> To be 100% clear here, you are fixing important problem #1 by having
> deferred cancellation _not_ trigger in cases 4 and 5, because the
> signal handler observes that the PC is past the syscall instruction?

Yes.

> 
>> Another case that needs handling is syscalls that fail with EINTR even
>> when the signal handler is non-interrupting. In this case, the syscall
>> wrapper code can just check the cancellation flag when the errno result
>> is EINTR, and act on cancellation if it's set.
> 
> Which system calls are affected by this rule, and is it actually
> correct to have cancellation trigger when they fail with EINTR? 

The best description I have is from man pages related to signal [1] in
the 'Interruption of system calls and library functions by signal handlers'
part.

[1] http://man7.org/linux/man-pages/man7/signal.7.html

And I think cancellation should happen int his case because it falls on 
the semantic that another thread explicit called pthread_cancel, it is
the underlying system behaviour that is not allowing libc to explicit 
cancel the thread.

> Can we be certain that the EINTR was due to the cancellation signal rather
> than some other signal delivered first?  (At least for sigsuspend it
> is possible to have two signals be delivered simultaneously, if they
> were both blocked and pending, and both become unblocked due to the
> sigsuspend.  The kernel pushes two signal stack frames.  It would
> reduce the odds of this being a problem if the SIGCANCEL handler
> masked all other signals while running, but I'm not sure that will
> work with it jumping directly to __pthread_unwind under some
> conditions, and also I don't think it _completely_ eliminates the
> possibility; a second signal could be delivered right after sigreturn
> unmasks it, without returning to the caller of the syscall stub first.)

By inspecting the CANCELED_BIT in thread cancelhandling mask:

nptl/libc-cancellation.c:

 50   if ((result == -EINTR)
 51       && (pd->cancelhandling & CANCELED_BITMASK)
 52       && !(pd->cancelhandling & CANCELSTATE_BITMASK))
 53     __syscall_do_cancel ();

Which is atomic set on pthread_cancel:

nptl/pthread_cancel.c:

 40 
 41   THREAD_ATOMIC_BIT_SET (pd, cancelhandling, CANCELED_BIT);
 42 

So for the case where the thread is indeed cancelled by pthread_cancel,
cancelhandling should have CANCELED bit set and if kernel deliver other
signal first and cancellation signal handler it not called, the code on
libc-cancellation should handle it.

> 
>>   1. Remove the enable_asynccancel/disable_asynccancel function usage in
>>      syscall definition and instead make them call a common symbol that will
>>      check if cancellation is enabled (__syscall_cancel at
>>      nptl/libc-cancellation.c), call the arch-specific cancellable
>>      entry-point (__syscall_cancel_arch) and cancel the thread when required.
>>
>>   2. Provide a arch-specific symbol that contains global markers. These
>>      markers will be used in SIGCANCEL handler to check if the interruption
>>      has been called in a valid syscall and if the syscalls has been
>>      completed or not.
>>      A default version is provided (sysdeps/unix/sysv/linux/syscall_cancel.c),
>>      however the markers may not be set on correct expected places depeding
>>      of how INTERNAL_SYSCALL_NCS is implemented by the underlying architecture.
>>      In this case arch-specific implementation should be provided.
>>
>>   3. Rewrite SIGCANCEL asynchronous handler to check for both cancelling type
>>      and if current IP from signal handler falls between the global markes
>>      and act accordingly (sigcancel_handler at nptl/nptl-init.c).
>>
>>   4. Adjust nptl/pthread_cancel.c to send an signal instead of acting
>>      directly. This avoid synchronization issues when updating the
>>      cancellation status and also focus the logic on signal handler and
>>      cancellation syscall code.
>>
>>   5. Adjust pthread code to replace CANCEL_ASYNC/CANCEL_RESET calls to
>>      appropriated cancelable futex syscalls.
>>
>>   6. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to
>>      appropriated cancelable syscalls.
>>
>>   7. Adjust 'lowlevellock-futex.h' arch-specific implementations to provide
>>      cancelable futex calls (used in libpthread code).
> 
> The overall design seems sound to me.
> 
>> This patch adds the proposed changes to NPTL.  The code leaves all the ports
>> broken without further patches in the list.
> 
> Can we find an ordering of the patches in which the tree is never
> broken at an intermediate stage?  Perhaps you could introduce the generic
> __syscall_cancel(_arch) first, but not change any of the existing
> cancellation logic until after all of the port-specific code was in
> place?

I think I can try to split this one two or more patches and make the actual fix 
tied with the first architecture changed (x86_64 currently). The x86_64 its the
only architectures that provides a *cancellation* re-implementation in assembly
(originally as optimization (?)), and I think we split it on a preliminary
clean-up patch. But I am not sure I can make the tree stable if we just apply
some architectures fixes, instead of all of them.

> 
> Below, all the changes which I erased and didn't comment on seem OK to
> me.  I am generally pleased to see many system call wrappers get
> simpler in the process of fixing this bug.
> 
>> --- a/io/creat.c
>> +++ b/io/creat.c
> ...
>> -
>> -/* __open handles cancellation.  */
>> -LIBC_CANCEL_HANDLED ();
> 
> It might be clearer to split patch 2 into two patches, one to remove
> tst-cancel-wrappers.sh and one to make all the other testing changes,
> and then shift all of the removals of LIBC_CANCEL_HANDLED into the
> same patch that removes tst-cancel-wrappers.sh.

This sounds reasonable, I will change it.

> 
>> --- a/nptl/descr.h
>> +++ b/nptl/descr.h
> ...
>> +  /* Bit set if threads is canceled.  */
> 
> Grammar: "Bit set if thread is canceled."
> 
> (For 100% proper English, it would be "This bit is set if this thread
> has been cancelled", but all of the other comments are clipped as
> well, so the above is fine.)

Ack.

> 
>> --- a/nptl/libc-cancellation.c
>> +++ b/nptl/libc-cancellation.c
> ...
>> +#include <setjmp.h>
>> +#include <stdlib.h>
> 
> Why are these additional #includes necessary?

They are not, I will remove them.

> 
> ...
>> +  volatile struct pthread *pd = (volatile struct pthread *) self;
> 
> Why is it necessary for this pointer to be volatile?  It appears that
> we only ever access pd->cancelhandling; which loads need to be
> forced to go to memory, and should they also be atomic?

If I recall correctly this came from a previous review, however I can't
recall why exactly it was required. Thinking twice I see no reason in
fact to use volatile here.  I will remove it.

> 
>> +  /* If cancellation is not enabled, call the syscall directly.  */
>> +  if (pd->cancelhandling & CANCELSTATE_BITMASK)
> 
> It is unfortunate that the one-bit flag that means "cancellation is
> disabled" is named CANCELSTATE_BITMASK instead of, I dunno,
> CANCEL_DISABLED_BITMASK maybe.  You didn't introduce that, but please
> consider a follow-up patch that renames CANCELSTATE_BIT(MASK) and also
> CANCELTYPE_BIT(MASK) [the latter to "CANCEL_ASYNC_BIT(MASK)" perhaps].

In fact in my bz12683 repo I have 4 more patches that refactor the cancellation
mask altogether [1].

[1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/azanella/bz12683

> 
>> +  if ((result == -EINTR)
>> +      && (pd->cancelhandling & CANCELED_BITMASK)
>> +      && !(pd->cancelhandling & CANCELSTATE_BITMASK))
>> +    __syscall_do_cancel ();
> 
> I thought you said this treatment was only applied to system calls
> that _always_ fail with EINTR, even when interrupted by SA_RESTART
> signals.  This is doing it for system calls interrupted by
> non-SA_RESTART signals as well.  You also change SIGCANCEL to be a
> SA_RESTART signal in this patch, but it still could happen that we
> have SIGCANCEL and some other non-SA_RESTART signal arrive
> simultaneously enough that they interrupt the same system call, and
> then it fails with EINTR and everything is confused.
> 
> I think you need to check over the entire design with "what if
> SIGCANCEL and some other signal are delivered as nested interruptions
> to the same system call" in mind, in fact.  The dangerous case that I
> see is when the _other_ signal causes the system call to be
> interrupted after producing some side effects.  Can the SIGCANCEL
> handler know that this has happened or is about to happen?

It is still handling only the cancellation case, since it checks the
cancelhandling bit that is set only if a thread calls pthread_cancel.
My understanding is nested interruptions should not be a problem: either
cancel sighandler is activated first which cancels the thread if syscall
does not have a side-effect or the syscall returns with EINTR and
with a cancellation pending (cancelhandling & CANCELED_BITMASK).

However I think we can improve the cancel handling (and potentially
simplifying the synchronization code on cancelhandling mask) by masking
all signals while handling SIGCANCEL.

> 
>> --- a/nptl/nptl-init.c
>> +++ b/nptl/nptl-init.c
>>  sigcancel_handler (int sig, siginfo_t *si, void *ctx)
>>  {
>> +  INTERNAL_SYSCALL_DECL (err);
>> +  pid_t pid = INTERNAL_SYSCALL_CALL (getpid, err);
>> +
>>    /* Safety check.  It would be possible to call this function for
>>       other signals and send a signal from another process.  This is not
>>       correct and might even be a security problem.  Try to catch as
>>       many incorrect invocations as possible.  */
>>    if (sig != SIGCANCEL
>> -      || si->si_pid != __getpid()
>> +      || si->si_pid != pid
>>        || si->si_code != SI_TKILL)
> 
> Why is this change necessary?

This code was before getpid cache removal (c579f48edba8) and I did not catch
it on my rebases.  I will remove it.

> 
>> +  volatile struct pthread *pd = (volatile struct pthread *) self;
> 
> Same question as above: why does this need to be volatile, does it
> actually do the job, should we instead be using atomics?

Same as before, I will remove it.

> 
>> +  /* Add SIGCANCEL on ignored sigmask to avoid the handler to be called
>> +     again.  */
>> +  sigset_t *set = UCONTEXT_SIGMASK (ctx);
>> +  __sigaddset (set, SIGCANCEL);
> 
> Screwing with the signal mask that will be restored by sigreturn seems
> unsafe.  Could we instead return early if CANCELED_BITMASK and/or
> EXITING_BITMASK has already been set?
> 
>> +  /* Check if asynchronous cancellation mode is set and if interrupted
>                                                        ^^^
> This 'and' should be 'or'.

Ack.

> 
>> +      THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);
>> +      THREAD_SETMEM (self, result, PTHREAD_CANCELED);
> 
> I think these ought to be done inside __do_cancel, since several other
> places may call __do_cancel.

Indeed and the 'THREAD_SETMEM (self, result, PTHREAD_CANCELED)' here is superflous
(since __do_cancel already does it).

> 
>> +      INTERNAL_SYSCALL_CALL (rt_sigprocmask, err, SIG_SETMASK, set, NULL,
>> +                          _NSIG / 8);
> 
> See above in re screwing with the signal mask.
> 
>> +      __do_cancel ();
>> +      return;
>>      }
>>  }
> 
> This 'return' should be unnecessary on two counts: we're about to fall
> off the end of the function anyway, and __do_cancel ends by calling
> __pthread_unwind which does not return.

Indeed and __do_cancel is already marked as noreturn.  I will remove it.

> 
> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
> ...
>> -  /* Make sure we get no more cancellations.  */
>> -  THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);
>> +  /* Make sure we get no more cancellations by clearing the cancel
>> +     state.  */
>> +  int oldval = THREAD_GETMEM (self, cancelhandling);
>> +  while (1)
>> +    {
>> +      int newval = (oldval | CANCELSTATE_BITMASK);
>> +      newval &= ~(CANCELTYPE_BITMASK);
> 
> Disabled cancellation state is specified to take precedence over
> asynchronous cancellation type, so it should be enough to atomically
> set CANCELSTATE_BIT, at which point the CAS loop would be unnecessary.

Right, this seems unnecessary.

> 
>> --- a/nptl/pthread_cancel.c
>> +++ b/nptl/pthread_cancel.c
> ...
>> +  /* Avoid signaling when thread attempts cancel itself (pthread_kill
>> +     is expensive).  */
>> +  if (pd == THREAD_SELF && !(pd->cancelhandling & CANCELTYPE_BITMASK))
>> +    return 0;
> 
> It is not obvious why this is correct.  The answer is that
> pthread_cancel is not itself a cancellation point, so, when
> cancellation is deferred, a self-cancel is _not_ supposed to take
> effect immediately; the signal handler would have no effect, and the
> next cancellation point will check the flag anyway.  I suggest instead
> 
>     if (pd == THREAD_SELF)
>       {
>          if (pd->cancelhandling & CANCELTYPE_BITMASK)
>            __do_cancel ();
>          /* pthread_cancel is not itself a cancellation point;
>             the next cancellation point will check the flag anyway,
>             so there is no need to send the cancellation signal.  */
>         return 0;
>      }
>      return __pthread_kill (th, SIGCANCEL);

I tried you suggestion, but it seems to trigger a regression which I think it
is unrelated to the patch:

>>> info threads
  Id   Target Id         Frame
* 1    LWP 28018 "ld-linux-x86-64" __GI___syscall_cancel_arch (ch=ch@entry=0x7ffff7ff2a48, nr=nr@entry=202, a1=a1@entry=140737345771984, a2=a2@entry=0, a3=28023, a4=a4@entry=0, a5=0, a6=0) at ../sysdeps/unix/sysv/linux/syscall_cancel.c:63
  3    LWP 28023 "ld-linux-x86-64" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  5    LWP 28025 "ld-linux-x86-64" __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  7    LWP 28027 "ld-linux-x86-64" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  9    LWP 28029 "ld-linux-x86-64" __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  11   LWP 28031 "ld-linux-x86-64" __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  13   LWP 28033 "ld-linux-x86-64" __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  15   LWP 28035 "ld-linux-x86-64" __lll_unlock_wake () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:371
  17   LWP 28037 "ld-linux-x86-64" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  19   LWP 28039 "ld-linux-x86-64" 0x00007ffff7bc3d95 in __GI___pthread_mutex_lock (mutex=0x7ffff7ffd990 <_rtld_local+2352>) at ../nptl/pthread_mutex_lock.c:113
  21   LWP 28041 "ld-linux-x86-64" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135

The process seems to stuck on unwind operation:

#4  0x00007ffff71f9a13 in uw_frame_state_for (context=context@entry=0x7ffff71e9c20, fs=fs@entry=0x7ffff71e9a70) at /home/azanella/Projects/glibc/ports/src/gcc/libgcc/unwind-dw2.c:1249
#5  0x00007ffff71fac80 in uw_init_context_1 (context=context@entry=0x7ffff71e9c20, outer_cfa=outer_cfa@entry=0x7ffff71e9e50, outer_ra=0x7ffff7bca390 <__GI___pthread_unwind+64>) at /home/azanella/Projects/glibc/ports/src/gcc/libgcc/unwind-dw2.c:1578
#6  0x00007ffff71fb426 in _Unwind_ForcedUnwind (exc=0x7ffff71ead70, stop=stop@entry=0x7ffff7bca220 <unwind_stop>, stop_argument=0x7ffff71e9e80) at /home/azanella/Projects/glibc/ports/src/gcc/libgcc/unwind.inc:201
#7  0x00007ffff7bca390 in __GI___pthread_unwind (buf=<optimized out>) at unwind.c:121
#8  0x00007ffff7bc8b41 in __do_cancel () at pthreadP.h:302
#9  __pthread_cancel (th=140737339369216) at pthread_cancel.c:60
#10 0x000000000040174d in ?? ()
#11 0x0000000000000000 in ?? ()

It does not happen on powerpc or sparc as far I can check. I did not dig into, but
I think it would be safer to use current mechanism to pthread_cancel always by
signaling the thread with pthread_kill and we can revise this later why is
failing with your suggestion on x86.

> 
>> --- a/nptl/pthread_create.c
>> +++ b/nptl/pthread_create.c
> ...
>>    /* If the parent was running cancellation handlers while creating
>>       the thread the new thread inherited the signal mask.  Reset the
>>       cancellation signal mask.  */
> 
> If the cancellation handler exited early when EXITING_BIT was set, and
> it didn't mess with the signal mask, we wouldn't need to do this, either.
> 
>> -       int oldtype = CANCEL_ASYNC ();
>> +       int ct;
>> +       __pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &ct);
> 
> I don't understand the logic in this part of the code at all so I am
> going to trust you that this is the correct change.  It seems like
> it's using SIGCANCEL for something other than cancellation, which
> seems unwise and maybe should be changed in a follow-up.

I think we can follow up this on subsequent patches, since the change 
is just to adjust CANCEL_ASYNC macro removal.

> 
>> --- a/nptl/pthread_join_common.c
>> +++ b/nptl/pthread_join_common.c
> ...
>> -      int oldtype = CANCEL_ASYNC ();
>> +      int ct;
>> +      __pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &ct);
>>
>>        if (abstime != NULL)
>>       result = lll_timedwait_tid (pd->tid, abstime);
>>        else
>>       lll_wait_tid (pd->tid);
>>
>> -      CANCEL_RESET (oldtype);
>> +      __pthread_setcanceltype (ct, NULL);
> 
> Maybe we should have lll_(timed)wait_tid_cancel rather than this?

Both lll_timedwait_tid and lll_wait_tid are already a cancellation point, the
idea here is to just to enable asynchronous cancellation since pthread_join
is a cancellation entrypoint (and nptl/libc-cancellation.c call the syscall
directly if cancellation is disabled).

> 
>> --- a/sysdeps/unix/sysdep.h
>> +++ b/sysdeps/unix/sysdep.h
> ...
>> +/* The loader does not need to handle thread cancellation, use direct
>> +   syscall instead.  */
> 
> A stronger assertion would be better:
> 
> /* The loader should never make cancellable system calls.  Remap them
>    to direct system calls.  */

Ack, I changed to your suggestion.

> 
> Also, is this hack still necessary after I went through and made the
> loader explicitly use _nocancel operations?

I need to actually check, but I presume that once the patch go in, loader build
would not require to pull objects that might pull the libc-cancellation.

> 
>> --- a/sysdeps/unix/sysv/linux/pthread_kill.c
>> +++ b/sysdeps/unix/sysv/linux/pthread_kill.c
> ...
>> -  /* Disallow sending the signal we use for cancellation, timers,
>> -     for the setxid implementation.  */
>> -  if (signo == SIGCANCEL || signo == SIGTIMER || signo == SIGSETXID)
>> +  /* Disallow sending the signal we use for setxid implementation.  */
>> +  if (signo == SIGSETXID)
>>      return EINVAL;
> 
> Now applications can call pthread_kill(thr, SIGCANCEL) themselves,
> can't they?  That seems unsafe.  Also this has since been changed to
> use __is_internal_signal, I think.  We need an __internal_pthread_kill
> or something.

Yes, I will change to call an internal symbol.

> 
>> --- a/sysdeps/unix/sysv/linux/socketcall.h
>> +++ b/sysdeps/unix/sysv/linux/socketcall.h
> ...
>> +#define __SOCKETCALL_CANCEL1(__name, __a1) \
>> +  SYSCALL_CANCEL_NCS (socketcall, __name, \
>> +     ((long int [1]) { (long int) __a1 }))
> 
> Blech, which architectures still require us to use socketcall?

The one I recall, i386, m68k, s390, and sparc (some socket operations are wire-up
in recent kernels for some architectures).

> 
>> --- /dev/null
>> +++ b/sysdeps/unix/sysv/linux/syscall_cancel.c
> ...
>> +#define ADD_LABEL(__label)		\
>> +  asm volatile (			\
>> +    ".global " __label "\t\n"		\
>> +    ".type " __label ",\%function\t\n"       \
>> +    __label ":\n");
> 
> This makes me extremely nervous.  Specifically, I do not trust the
> compiler to not move these around, even though they're volatile.  I
> would actually prefer an .S file for every architecture.
> 
> Do we have a concrete reason to believe they really will always be
> where they're supposed to be?  "It compiled fine with the compiler I
> have right now" is not enough, I'm afraid.

For the architectures that I am using the C implementation (riscv, mips64, 
nios2, m68k, alpha, ia64, arm, and aarch64) two different gcc version
generates the expected code (gcc6 and gcc7). My understanding is the
volatile asm should not move the ADD_LABEL macro other C declaration,
but I give you that depending on how the architecture defines
INTERNAL_SYSCALL_NCS_CALL, the labels might no in right position.

I created it to easier the port adjustments (since for some architectures
I don't have much experience with ABI), but I give you I don't have a 
strong preference here. The main advantage I see is it simplifies a lot the
required boilerplate on some ABIs for PIC/non-PIC code.

> 
>> --- a/sysdeps/unix/sysv/linux/sysdep.h
>> +++ b/sysdeps/unix/sysv/linux/sysdep.h
> ...
>> +/* Check error from cancellable syscall and set errno accordingly.
>> +   Linux uses a negative return value to indicate syscall errors
>> +   and since version 2.1 the return value of a system call might be
>> +   negative even if the call succeeded (e.g., the `lseek' system call
>> +   might return a large offset).
>> +   Current contract is kernel make sure the no syscall returns a value
>> +   in -1 .. -4095 as a valid result so we can savely test with -4095.  */
>> +#define SYSCALL_CANCEL_RET(__ret)		\
>> +  ({						\
>> +    if (__ret > -4096UL)			\
>> +      {						\
>> +	__set_errno (-__ret);			\
>> +	__ret = -1;				\
>> +      }						\
>> +    __ret;					\
>> +   })
> 
> Don't we already have this somewhere under another name?

Not really, we have the INTERNAL_SYSCALL_ERROR_P macro which check the
error from the INTERNAL_SYSCALL_DECL (err).  The issues is some some kernel
abis (alpha and sparc for instance) either it does check the return code 
(alpha) or it expects the expecial kernel abi register (sparc). I added this
one for consistency.