From: Adhemerval Zanella via Libc-alpha
Reply-To: Adhemerval Zanella
Date: Thu, 11 Nov 2021 17:54:43 -0300
Subject: Re: [PATCH v3 5/7] math: Remove powerpc e_hypot
To: Wilco Dijkstra, "Paul A. Clarke"
Cc: Tulio Magno Quites Machado Filho, "libc-alpha@sourceware.org"
List-Id: Libc-alpha mailing list
References: <20211101202059.1026032-1-adhemerval.zanella@linaro.org>
 <20211101202059.1026032-6-adhemerval.zanella@linaro.org>
 <20211109192800.GA4930@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
 <37a5bc8c-a9ec-952d-427e-62632f7f7a0a@linaro.org>
 <384b240c-29c3-af14-05e6-951f00178cff@linaro.org>

On 11/11/2021 16:48, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>>>> Another option is to use the powerpc implementation which favor FP over integer
>>>> as the default one.
>>>
>>> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
>>> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.
>>
>> This should not be worse than the current default (the powerpc one is essentially
>> the same as default using FP operations).
>
> The generic version carefully computes x * x + y * y with higher accuracy so that
> the sqrt stays below 1.0ULP. The powerpc version doesn't and so goes over 1.0ULP.

For *hypotf* they are essentially the same; the powerpc one just tries to
optimize the isinf/isnan checks because of the FP->GPR hazards.
I think there is no current justification for TEST_INF_NAN; it would be
better to use your suggestion for the default algorithm and just remove the
powerpc one:

  if (!isfinite (x) || !isfinite (y))
    {
      a = x;
      b = y;
      if ((isinf (x) || isinf (y))
          && !issignaling_inline (x) && !issignaling_inline (y))
        return INFINITY;
      return x + y;
    }

>
>>> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
>>> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.
>>
>> Do you mean besides the optimized nan/inf checks? I can check if it helps on
>> powerpc.
>
> Yes. I avoid the unnecessary checks at the end by doing everything in the 3 main
> cases. The division can be made independent of the sqrt so they run in parallel on
> modern cores.
>
> However we can do even better with FMA and remove the division entirely by
> special casing the difficult case where x and y are really close. This has only 3.5%
> higher latency than the powerpc version, so that's the fastest option below 1.0ULP.
> I'll see whether it could work without FMA too and send you something to benchmark
> if it passes the testsuite.

The original paper does have a version that uses FMA, but it aims to be
correctly rounded:

  double h2 = h * h;
  double ax2 = ax * ax;
  h -= (__builtin_fma (-ay, ay, h2 - ax2)
        + __builtin_fma (h, h, -h2)
        - __builtin_fma (ax, ax, -ax2)) / (2.0 * h);
  return h * scale;

However, at least on recent x86_64 I did not see much improvement over the
non-FMA version. Maybe we can come up with a version that is not necessarily
correctly rounded but can leverage FMA when __FP_FAST_FMA is defined.
(Also, this version does not fully pass the testsuite: it triggers some
underflow exceptions that I did not investigate.)
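
For reference, the non-finite check proposed earlier in this message could be
sketched in standard C roughly as below. This is only an illustration, not
the glibc code: the helper names hypot_needs_special_case and hypot_nonfinite
are hypothetical, the type is double for simplicity, and the signaling-NaN
test is left out because issignaling_inline is a glibc-internal helper that
is not available outside the tree.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: true when at least one input is Inf or NaN,
   i.e. when the fast path for finite values must be skipped.  */
static bool
hypot_needs_special_case (double x, double y)
{
  return !isfinite (x) || !isfinite (y);
}

/* Hypothetical helper: result for the non-finite case only.
   An infinite operand wins over a NaN, so hypot (Inf, NaN) is +Inf;
   otherwise at least one input is a NaN and x + y propagates it.
   Assumption: signaling NaNs are not treated specially here, unlike
   the snippet in the message, which checks issignaling_inline.  */
static double
hypot_nonfinite (double x, double y)
{
  if (isinf (x) || isinf (y))
    return INFINITY;
  return x + y;
}
```

A caller would test hypot_needs_special_case first and only then fall through
to the scaled sqrt computation, so the finite path pays for a single branch.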