unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
	"Paul A. Clarke" <pc@us.ibm.com>
Cc: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>,
	"libc-alpha@sourceware.org" <libc-alpha@sourceware.org>
Subject: Re: [PATCH v3 5/7] math: Remove powerpc e_hypot
Date: Thu, 11 Nov 2021 17:54:43 -0300	[thread overview]
Message-ID: <d4770cd3-5fc3-ba8d-b21d-f6793b4077a5@linaro.org> (raw)
In-Reply-To: <VE1PR08MB5599C30360944D549AB9699383949@VE1PR08MB5599.eurprd08.prod.outlook.com>



On 11/11/2021 16:48, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>>>> Another option is to use the powerpc implementation which favor FP over integer
>>>> as the default one.
>>>
>>> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
>>> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.
>>
>> This should not be worse than the current default (the powerpc one is essentially
>> the same as default using FP operations).
> 
> The generic version carefully computes x * x + y * y with higher accuracy so that
> the sqrt stays below 1.0ULP. The powerpc version doesn't and so goes over 1.0ULP.

For *hypotf* they are essentially the same, powerpc one just tries to optimize
the isinf/isnan because of the FP->GRP hazards.  I think there is not current 
justification for the TEST_INF_NAN, it would be better to use your suggestion
of on default algorithm and just remove powerpc one:

  if (!isfinite(x) || !isfinite(y))
     {
       a = x; b = y;
       if ((isinf (x) || isinf (y))
	  && !issignaling_inline (x) && !issignaling_inline (y))
	return INFINITY;
       return x + y;
     }

> 
>>> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
>>> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.
>>
>> Do you mean besides the optimized nan/inf checks? I can check if it helps on
>> powerpc.
> 
> Yes. I avoid the unnecessary checks at the end by doing everything in the 3 main
> cases. The division can be made independent of the sqrt so they run in parallel on
> modern cores.
> 
> However we can do even better with FMA and remove the division entirely by 
> special casing the difficult case where x and y are really close. This has only 3.5%
> higher latency than the powerpc version, so that's the fastest option below 1.0ULP.
> I'll see whether it could work without FMA too and send you something to benchmark
> if it passes the testsuite.

The original paper does have a version that uses fma, but it aims to be correctly
rounded:

  double h2 = h * h;
  double ax2 = ax * ax;
  h -= (__builtin_fma (-ay, ay, h2 - ax2)
       + __builtin_fma (h, h, -h2)
       - __builtin_fma (ax, ax, -ax2)) / (2.0 * h);
  return h * scale;

However, at least on recent x86_64 I did not see much improvement over no fma
version. Maybe we can come up with a version that might not be correctly
rounded that can leverage the fma for __FP_FAST_FMA.

(Also this version does not fully pass the testsuite, it trigger some underflow
exceptions that I did not investigate).
  

  reply	other threads:[~2021-11-11 20:55 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-01 20:20 [PATCH v3 0/7] Improve hypot() Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 1/7] math: Simplify hypotf implementation Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 2/7] math: Use an improved algorithm for hypot (dbl-64) Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 3/7] math: Use an improved algorithm for hypotl (ldbl-96) Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 4/7] math: Use an improved algorithm for hypotl (ldbl-128) Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 5/7] math: Remove powerpc e_hypot Adhemerval Zanella via Libc-alpha
2021-11-09 19:28   ` Paul A. Clarke via Libc-alpha
2021-11-10 14:34     ` Wilco Dijkstra via Libc-alpha
2021-11-10 14:43       ` Paul A. Clarke via Libc-alpha
2021-11-10 14:47       ` Adhemerval Zanella via Libc-alpha
2021-11-11 17:05         ` Wilco Dijkstra via Libc-alpha
2021-11-11 17:13           ` Adhemerval Zanella via Libc-alpha
2021-11-11 19:48             ` Wilco Dijkstra via Libc-alpha
2021-11-11 20:54               ` Adhemerval Zanella via Libc-alpha [this message]
2021-11-01 20:20 ` [PATCH v3 6/7] i386: Move hypot implementation to C Adhemerval Zanella via Libc-alpha
2021-11-01 20:20 ` [PATCH v3 7/7] math: Remove the error handling wrapper from hypot and hypotf Adhemerval Zanella via Libc-alpha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d4770cd3-5fc3-ba8d-b21d-f6793b4077a5@linaro.org \
    --to=libc-alpha@sourceware.org \
    --cc=Wilco.Dijkstra@arm.com \
    --cc=adhemerval.zanella@linaro.org \
    --cc=pc@us.ibm.com \
    --cc=tuliom@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).