From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-bounces+e=80x24.org@sourceware.org>
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net
X-Spam-Level: 
X-Spam-ASN: AS3215 2.6.0.0/16
X-Spam-Status: No, score=-6.1 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,MAILING_LIST_MULTI,NICE_REPLY_A,
	RCVD_IN_DNSWL_MED,SPF_HELO_PASS,SPF_PASS,TVD_SUBJ_WIPE_DEBT
	shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2
Received: from sourceware.org (server2.sourceware.org [IPv6:2620:52:3:1:0:246e:9693:128c])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256)
	(No client certificate requested)
	by dcvr.yhbt.net (Postfix) with ESMTPS id 4DAF21F953
	for <e@80x24.org>; Thu, 11 Nov 2021 20:55:09 +0000 (UTC)
Received: from server2.sourceware.org (localhost [IPv6:::1])
	by sourceware.org (Postfix) with ESMTP id 02FEB385AC3E
	for <e@80x24.org>; Thu, 11 Nov 2021 20:55:08 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 02FEB385AC3E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1636664108;
	bh=q9Axq9we9Fu7t0RAlXziuVhTM1f4OMWZK5jjW8dnJS0=;
	h=Date:Subject:To:References:In-Reply-To:List-Id:List-Unsubscribe:
	 List-Archive:List-Post:List-Help:List-Subscribe:From:Reply-To:Cc:
	 From;
	b=hm/exO7omMr7tRoeAILa1cc2m4nHHiEtwyfh20uQvXkhe7J04Yj6SLYiqsE4F1bAZ
	 hd1V1UWnPVV4sI4IVXeW+zMG5rUYRa6esN58hFbcSyEZRzyz3D9zQpeXXSelQm9Izx
	 hgQ2hPhuA0GS1ejKo6D4VZMZCff+ztUUjy3BetQs=
Received: from mail-ua1-x932.google.com (mail-ua1-x932.google.com
 [IPv6:2607:f8b0:4864:20::932])
 by sourceware.org (Postfix) with ESMTPS id CA1EA385BF80
 for <libc-alpha@sourceware.org>; Thu, 11 Nov 2021 20:54:46 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CA1EA385BF80
Received: by mail-ua1-x932.google.com with SMTP id b17so14660116uas.0
 for <libc-alpha@sourceware.org>; Thu, 11 Nov 2021 12:54:46 -0800 (PST)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:message-id:date:mime-version:user-agent:subject
 :content-language:to:cc:references:from:in-reply-to
 :content-transfer-encoding;
 bh=q9Axq9we9Fu7t0RAlXziuVhTM1f4OMWZK5jjW8dnJS0=;
 b=SGVsTksYw3ghBgJnG1hg21Hi8MxtwBS2PXu4/ZoBRll+eTVb07ex12DC+veXwd6oYa
 NP012EXFSKKm8Ae2yxs5QW4vCj61SDzQz0S+XDIS8IoHsNTPYrsdtx6J7yY2R2C+ruQ8
 MXcFfaCCBf4qaCkXmUVXmLZvDJAeE2IwMI7oUFZL0kuYtNnyvkjwmZ31Tiz1kiax3Qcr
 zGpMUNjyqC1+iHavV5Y87WOXKB6hWoy38Q4+OOKJMhrnqVzQONKyqIOXxnb7mRQjJ4uw
 1j+FhFCbap8uweAnk2UyF1yMG8ufd0LNmBLaMbZ4ehB6Nn5XK7hf7BkPzykJu4G2v71b
 9EvQ==
X-Gm-Message-State: AOAM532BfzTLXu1Zs+7faCQwIiELG6m3FGrBX3jFi73DX1MO7Biz+zH1
 D5r8PREx5lRb+gSwn7hHYuivLFHrPwu04w==
X-Google-Smtp-Source: ABdhPJz4S+nw2QETb7hTgTEmnpq1YiWtDmYVp9slwnz3O6GawByq6AOt/d/juWdMF91od0IXOVofbw==
X-Received: by 2002:ab0:1c02:: with SMTP id a2mr14142188uaj.115.1636664086061; 
 Thu, 11 Nov 2021 12:54:46 -0800 (PST)
Received: from ?IPV6:2804:431:c7cb:55a:48f2:1d0b:8ae8:643a?
 ([2804:431:c7cb:55a:48f2:1d0b:8ae8:643a])
 by smtp.gmail.com with ESMTPSA id j21sm2773575vkn.4.2021.11.11.12.54.44
 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128);
 Thu, 11 Nov 2021 12:54:45 -0800 (PST)
Message-ID: <d4770cd3-5fc3-ba8d-b21d-f6793b4077a5@linaro.org>
Date: Thu, 11 Nov 2021 17:54:43 -0300
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.2.1
Subject: Re: [PATCH v3 5/7] math: Remove powerpc e_hypot
Content-Language: en-US
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>, "Paul A. Clarke" <pc@us.ibm.com>
References: <20211101202059.1026032-1-adhemerval.zanella@linaro.org>
 <20211101202059.1026032-6-adhemerval.zanella@linaro.org>
 <20211109192800.GA4930@li-24c3614c-2adc-11b2-a85c-85f334518bdb.ibm.com>
 <VE1PR08MB5599A27A3DE03FC5902CDF4F83939@VE1PR08MB5599.eurprd08.prod.outlook.com>
 <37a5bc8c-a9ec-952d-427e-62632f7f7a0a@linaro.org>
 <VE1PR08MB5599992C2E504845EE76799783939@VE1PR08MB5599.eurprd08.prod.outlook.com>
 <384b240c-29c3-af14-05e6-951f00178cff@linaro.org>
 <VE1PR08MB5599C30360944D549AB9699383949@VE1PR08MB5599.eurprd08.prod.outlook.com>
In-Reply-To: <VE1PR08MB5599C30360944D549AB9699383949@VE1PR08MB5599.eurprd08.prod.outlook.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
From: Adhemerval Zanella via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Cc: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>,
 "libc-alpha@sourceware.org" <libc-alpha@sourceware.org>
Errors-To: libc-alpha-bounces+e=80x24.org@sourceware.org
Sender: "Libc-alpha" <libc-alpha-bounces+e=80x24.org@sourceware.org>


On 11/11/2021 16:48, Wilco Dijkstra wrote:
> Hi Adhemerval,
> 
>>>> Another option is to use the powerpc implementation which favor FP over integer
>>>> as the default one.
>>>
>>> That is the fastest implementation. It is less accurate though (~1.04ULP with FMA
>>> and ~1.21ULP without FMA), so I'm not sure that would be acceptable.
>>
>> This should not be worse than the current default (the powerpc one is essentially
>> the same as default using FP operations).
> 
> The generic version carefully computes x * x + y * y with higher accuracy so that
> the sqrt stays below 1.0ULP. The powerpc version doesn't and so goes over 1.0ULP.

For *hypotf* they are essentially the same; the powerpc one just tries to optimize
the isinf/isnan checks because of the FP->GPR hazards.  I think there is no current
justification for TEST_INF_NAN, so it would be better to apply your suggestion
to the default algorithm and just remove the powerpc one:

  if (!isfinite(x) || !isfinite(y))
     {
       a = x; b = y;
       if ((isinf (x) || isinf (y))
	  && !issignaling_inline (x) && !issignaling_inline (y))
	return INFINITY;
       return x + y;
     }

> 
>>> I did some quick optimizations on the new algorithm, on Neoverse N1 my fastest
>>> version is less than 10% slower than the powerpc version, and has ~0.94 ULP error.
>>
>> Do you mean besides the optimized nan/inf checks? I can check if it helps on
>> powerpc.
> 
> Yes. I avoid the unnecessary checks at the end by doing everything in the 3 main
> cases. The division can be made independent of the sqrt so they run in parallel on
> modern cores.
> 
> However we can do even better with FMA and remove the division entirely by 
> special casing the difficult case where x and y are really close. This has only 3.5%
> higher latency than the powerpc version, so that's the fastest option below 1.0ULP.
> I'll see whether it could work without FMA too and send you something to benchmark
> if it passes the testsuite.

The original paper does have a version that uses fma, but it aims to be correctly
rounded:

  double h2 = h * h;
  double ax2 = ax * ax;
  h -= (__builtin_fma (-ay, ay, h2 - ax2)
       + __builtin_fma (h, h, -h2)
       - __builtin_fma (ax, ax, -ax2)) / (2.0 * h);
  return h * scale;

However, at least on recent x86_64, I did not see much improvement over the
non-FMA version.  Maybe we can come up with a version that is not correctly
rounded but can leverage the FMA when __FP_FAST_FMA is defined.

(Also, this version does not fully pass the testsuite: it triggers some underflow
exceptions that I did not investigate.)