From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
"Paul A. Clarke" <pc@us.ibm.com>
Cc: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>,
"libc-alpha@sourceware.org" <libc-alpha@sourceware.org>
Subject: Re: [PATCH v3 5/7] math: Remove powerpc e_hypot
Date: Thu, 11 Nov 2021 17:54:43 -0300 [thread overview]
Message-ID: <d4770cd3-5fc3-ba8d-b21d-f6793b4077a5@linaro.org> (raw)
In-Reply-To: <VE1PR08MB5599C30360944D549AB9699383949@VE1PR08MB5599.eurprd08.prod.outlook.com>
On 11/11/2021 16:48, Wilco Dijkstra wrote:
> Hi Adhemerval,
>
>>>> Another option is to use the powerpc implementation which favor FP over integer
>>>> as the default one.
>>>
>>> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
>>> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.
>>
>> This should not be worse than the current default (the powerpc one is essentially
>> the same as default using FP operations).
>
> The generic version carefully computes x * x + y * y with higher accuracy so that
> the sqrt stays below 1.0ULP. The powerpc version doesn't and so goes over 1.0ULP.
For *hypotf* they are essentially the same, powerpc one just tries to optimize
the isinf/isnan because of the FP->GRP hazards. I think there is not current
justification for the TEST_INF_NAN, it would be better to use your suggestion
of on default algorithm and just remove powerpc one:
if (!isfinite(x) || !isfinite(y))
{
a = x; b = y;
if ((isinf (x) || isinf (y))
&& !issignaling_inline (x) && !issignaling_inline (y))
return INFINITY;
return x + y;
}
>
>>> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
>>> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.
>>
>> Do you mean besides the optimized nan/inf checks? I can check if it helps on
>> powerpc.
>
> Yes. I avoid the unnecessary checks at the end by doing everything in the 3 main
> cases. The division can be made independent of the sqrt so they run in parallel on
> modern cores.
>
> However we can do even better with FMA and remove the division entirely by
> special casing the difficult case where x and y are really close. This has only 3.5%
> higher latency than the powerpc version, so that's the fastest option below 1.0ULP.
> I'll see whether it could work without FMA too and send you something to benchmark
> if it passes the testsuite.
The original paper does have a version that uses fma, but it aims to be correctly
rounded:
double h2 = h * h;
double ax2 = ax * ax;
h -= (__builtin_fma (-ay, ay, h2 - ax2)
+ __builtin_fma (h, h, -h2)
- __builtin_fma (ax, ax, -ax2)) / (2.0 * h);
return h * scale;
However, at least on recent x86_64 I did not see much improvement over no fma
version. Maybe we can come up with a version that might not be correctly
rounded that can leverage the fma for __FP_FAST_FMA.
(Also this version does not fully pass the testsuite, it trigger some underflow
exceptions that I did not investigate).
next prev parent reply other threads:[~2021-11-11 20:55 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-01 20:20 [PATCH v3 0/7] Improve hypot() Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 1/7] math: Simplify hypotf implementation Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 2/7] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 3/7] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 4/7] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 5/7] math: Remove powerpc e_hypot Adhemerval Zanella via Libc-alpha
2021-11-09 19:28 ` Paul A. Clarke via Libc-alpha
2021-11-10 14:34 ` Wilco Dijkstra via Libc-alpha
2021-11-10 14:43 ` Paul A. Clarke via Libc-alpha
2021-11-10 14:47 ` Adhemerval Zanella via Libc-alpha
2021-11-11 17:05 ` Wilco Dijkstra via Libc-alpha
2021-11-11 17:13 ` Adhemerval Zanella via Libc-alpha
2021-11-11 19:48 ` Wilco Dijkstra via Libc-alpha
2021-11-11 20:54 ` Adhemerval Zanella via Libc-alpha [this message]
2021-11-01 20:20 ` [PATCH v3 6/7] i386: Move hypot implementation to C Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 7/7] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella via Libc-alpha
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/libc/involved.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d4770cd3-5fc3-ba8d-b21d-f6793b4077a5@linaro.org \
--to=libc-alpha@sourceware.org \
--cc=Wilco.Dijkstra@arm.com \
--cc=adhemerval.zanella@linaro.org \
--cc=pc@us.ibm.com \
--cc=tuliom@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).