To: Adhemerval Zanella, "Paul A. Clarke"
Subject: Re: [PATCH v3 5/7] math: Remove powerpc e_hypot
Date: Thu, 11 Nov 2021 19:48:06 +0000
References: <20211101202059.1026032-1-adhemerval.zanella@linaro.org> <20211101202059.1026032-6-adhemerval.zanella@linaro.org> <20211109192800.GA4930@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com> <37a5bc8c-a9ec-952d-427e-62632f7f7a0a@linaro.org> <384b240c-29c3-af14-05e6-951f00178cff@linaro.org>
In-Reply-To: <384b240c-29c3-af14-05e6-951f00178cff@linaro.org>
List-Id: Libc-alpha mailing list
From: Wilco Dijkstra via Libc-alpha
Reply-To: Wilco Dijkstra
Cc: Tulio Magno Quites Machado Filho, "libc-alpha@sourceware.org"

Hi Adhemerval,

>>> Another option is to use the powerpc implementation which favor FP over integer
>>> as the default one.
>>
>> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
>> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.
>
> This should not be worse than the current default (the powerpc one is essentially
> the same as default using FP operations).

The generic version carefully computes x * x + y * y with higher accuracy so that
the sqrt stays below 1.0ULP. The powerpc version doesn't and so goes over 1.0ULP.

>> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
>> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.
>
> Do you mean besides the optimized nan/inf checks? I can check if it helps on
> powerpc.

Yes. I avoid the unnecessary checks at the end by doing everything in the 3 main
cases. The division can be made independent of the sqrt so they run in parallel on
modern cores.

However we can do even better with FMA and remove the division entirely by
special casing the difficult case where x and y are really close. This has only 3.5%
higher latency than the powerpc version, so that's the fastest option below 1.0ULP.
I'll see whether it could work without FMA too and send you something to benchmark
if it passes the testsuite.

Cheers,
Wilco