From mboxrd@z Thu Jan 1 00:00:00 1970
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
Sender: libc-alpha-owner@sourceware.org
Authentication-Results: sourceware.org; auth=none
X-HELO: mx0a-001b2d01.pphosted.com
Subject: Re: PPC64 libmvec sincos/sincosf ABI
From: Bill Schmidt
To: Wilco Dijkstra, "'GNU C Library'", "tnggil@protonmail.com", Joseph Myers
Cc: nd, Tulio Magno Quites Machado Filho
References:
Message-ID: <1730b1dd-9495-2f04-cbc7-4a957bc20a6a@linux.ibm.com>
Date: Thu, 8 Aug 2019 13:48:41 -0500
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:60.0) Gecko/20100101 Thunderbird/60.8.0
MIME-Version: 1.0
In-Reply-To:
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

On 8/8/19 10:25 AM, Bill Schmidt wrote:
> On 8/6/19 12:42 PM, Wilco Dijkstra wrote:
>> Hi,
>>
>>> 1. What is the best vector ABI (best performance) for sincos on PPC64?
>>> That may be a function of the particular vector instructions available on
>>> PPC64; the best choice of ABI on PPC64 need not correspond to the best
>>> choice on x86_64.
>> I don't think it is related to the target - the fastest ABI is one that avoids
>> unnecessary work. For example scalar sincos is slow due to the inefficient
>> ABI which forces the results through memory (fixing that gives a 50% speedup).
>>
>> Similarly for the vector ABI I think returning 2 vectors in registers will be the
>> fastest option in all cases. The actual vector instructions shouldn't affect the
>> ABI beyond the vector widths that can be supported.
>>
>> Wilco
>>
> Let me jump in here to answer a general question that I think Bert has
> had for a while.
>
> For the PPC64LE ABI, we should be returning everything through registers
> wherever possible.  The ABI supports multiple return values of the same
> type (up to 8 vector return values, for example), using the same
> registers used for passing parameters.  For simplicity in this example,
> I'll use the AltiVec-style types (vector double), but this works
> identically if you use more generically defined vector types.
>
> #include <altivec.h>
>
> struct sincosret
> {
>     vector double sinvals;
>     vector double cosvals;
> };
>
> struct sincosret
> mysincos (vector double a)
> {
>     struct sincosret scr;
>     scr.sinvals = a+a;  // May be slightly incorrect
>     scr.cosvals = a*a;  // Ditto
>     return scr;
> }
>
> This will result in the values being returned in VR2 and VR3:
>
>     xvmuldp 35,34,34
>     xvadddp 34,34,34
>     blr
>
> This is preferable to returning values indirectly through memory, which
> on older POWER processors can result in stalls from the store and load
> being too close together and possibly executed out of order.  The cost
> is pretty much negligible compared to the cost of computing sin/cos, but
> we might as well do it the best way that the ABI provides.

Important caveat to the above.  This is the ELFv2 ABI, used for
little-endian.  For the older ELFv1 ABI, the returned values will still
go through memory.  This doesn't restrict us from supporting ELFv1, but
we just won't get the benefit.

Thanks,
Bill

>
> Now, as I've said elsewhere, dealing with sincos in the -mveclibabi
> framework in GCC may be less than straightforward, due to the different
> description of the output types, but perhaps AArch64 has already laid
> some groundwork here.  I'm not up to date on the pending patches.
>
> Hope this helps,
> Bill
>
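
For anyone who wants to compare the two call shapes discussed in this
thread on their own target, here is a minimal sketch.  It uses the
generic GCC/Clang vector extension instead of the AltiVec types, since
the message above notes the approach works identically with generically
defined vector types.  The names v2df, sincos_by_value and
sincos_by_pointer are hypothetical, and the arithmetic is the same
placeholder used in mysincos, not a real sin/cos.  Whether the struct
really comes back in vector registers depends on the target ABI (ELFv2
yes, ELFv1 no, per the caveat above), so inspect the generated assembly
rather than assuming it.

```c
/* Generic vector of two doubles (GCC/Clang vector extension).
   v2df is a hypothetical name chosen for this sketch.  */
typedef double v2df __attribute__ ((vector_size (16)));

struct sincosret
{
  v2df sinvals;
  v2df cosvals;
};

/* Shape 1: both results returned by value in a struct.  On an ABI that
   returns such structs in vector registers, no store/load is needed.
   The arithmetic is a placeholder, as in the original example.  */
struct sincosret
sincos_by_value (v2df a)
{
  struct sincosret scr;
  scr.sinvals = a + a;  /* placeholder for sin */
  scr.cosvals = a * a;  /* placeholder for cos */
  return scr;
}

/* Shape 2: results written through pointers, mirroring the scalar
   sincos interface.  The caller has to load the values back from
   memory after the call.  */
void
sincos_by_pointer (v2df a, v2df *sinp, v2df *cosp)
{
  *sinp = a + a;  /* placeholder for sin */
  *cosp = a * a;  /* placeholder for cos */
}
```

Compiling this with -O2 -S and comparing the bodies of the two functions
shows whether the by-value variant avoids the stores on a given target.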