From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
Newsgroups: gmane.comp.lib.glibc.alpha
Subject: Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than
 5x on sparc/x86.
Date: Tue, 13 Feb 2018 19:18:15 -0600
Message-ID: <b3eca297-3aaf-6d71-c556-71b739bb2d1e@oracle.com>
References: <DB6PR0801MB2053765A646D2DCBE02B666883F30@DB6PR0801MB2053.eurprd08.prod.outlook.com>
NNTP-Posting-Host: blaine.gmane.org
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: blaine.gmane.org 1518570991 16571 195.159.176.226 (14 Feb 2018 01:16:31 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Wed, 14 Feb 2018 01:16:31 +0000 (UTC)
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:52.0) Gecko/20100101
 Thunderbird/52.6.0
To: libc-alpha@sourceware.org
Original-X-From: libc-alpha-return-90292-glibc-alpha=m.gmane.org@sourceware.org Wed Feb 14 02:16:27 2018
Return-path: <libc-alpha-return-90292-glibc-alpha=m.gmane.org@sourceware.org>
Envelope-to: glibc-alpha@blaine.gmane.org
Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id: <libc-alpha.sourceware.org>
List-Unsubscribe: <mailto:libc-alpha-unsubscribe-glibc-alpha=m.gmane.org@sourceware.org>
List-Subscribe: <mailto:libc-alpha-subscribe@sourceware.org>
List-Archive: <http://sourceware.org/ml/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-help@sourceware.org>, <http://sourceware.org/ml/#faqs>
Original-Sender: libc-alpha-owner@sourceware.org
Authentication-Results: sourceware.org; auth=none
In-Reply-To: <DB6PR0801MB2053765A646D2DCBE02B666883F30@DB6PR0801MB2053.eurprd08.prod.outlook.com>
Xref: news.gmane.org gmane.comp.lib.glibc.alpha:82640
Archived-At: <http://permalink.gmane.org/gmane.comp.lib.glibc.alpha/82640>

I note that Szabolcs is proposing to modify ieee754_exp() to
remove the slow path. Since my proposed patch also makes
substantial changes to ieee754_exp(), it makes sense to
apply only one of these patches. I've collected some data
comparing the two patches for your consideration.

I've labeled the current code "Slow path", Szabolcs's version "No Slow path",
and my version "Patrick's exp()".


Comparisons between Slow path, No Slow path, and Patrick's exp()

Accuracy:
The existing code is assumed accurate, with 0 ulp diffs.  Removing the
slow path produces 1 error in the current "make check" test suite.
Running ten million numbers in each rounding mode shows that removing
the slow path yields, on average, only 4-5 results with a 1 ulp diff per
ten million tests. That is still extremely accurate.

I also measured how often the slow path was taken for those same ten
million values. It was taken approximately 135 times per ten million
tests, but it usually returned the same value as the fast path.  The
counts are slightly different for different rounding modes.

Patrick's exp() also gets only 1 error in the current "make check" test
suite, on the same test value as the "no slow path" code. It gets
approximately 16000 1 ulp diffs per 10 million tests (about 0.16% of
inputs), which is well above the "no slow path" rate but still
relatively rare.
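
As an illustration of the kind of counting described above, here is a
minimal sketch in C. It is not the actual test program. It assumes that
a "1 ulp diff" means the two results are adjacent representable doubles,
and the names ordered_bits, ulp_distance and count_ulp_diffs are
hypothetical. The caller is assumed to have filled ref[] with results
from the existing code and got[] with results from the candidate code
for the same inputs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a double onto an integer scale where adjacent representable
   values differ by exactly 1 (NaNs are not handled). */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    /* Negative doubles are stored sign-magnitude; mirror them so the
       mapping is monotonic and -0.0 lands on the same point as +0.0. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Distance between a and b, counted in representable doubles. */
static uint64_t ulp_distance(double a, double b)
{
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia >= ib ? (uint64_t)ia - (uint64_t)ib
                    : (uint64_t)ib - (uint64_t)ia;
}

/* Count entries where got[i] differs from ref[i], and report the
   largest distance seen through *max_ulp. */
static size_t count_ulp_diffs(const double *ref, const double *got,
                              size_t n, uint64_t *max_ulp)
{
    size_t diffs = 0;
    assert(max_ulp != NULL);
    *max_ulp = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t d = ulp_distance(ref[i], got[i]);
        if (d != 0)
            diffs++;
        if (d > *max_ulp)
            *max_ulp = d;
    }
    return diffs;
}
```

To cover the non-default rounding modes, the results could be generated
after a call to fesetround() from <fenv.h>, with the counting itself
left unchanged.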


Performance:

            sparc (nsec)                 x86 (nsec)
         slow  no slow  patrick      slow  no slow  patrick
max     17584      710      873      5158      299      275
min       399      398       96        15       15       15
mean     5497      538      419      1333       28       24
Repeated runs show about 2% variance for identical tests.
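
A minimal sketch of how min/max/mean figures like those above could be
gathered. It is illustrative only and not the actual test harness. The
names timing_stats, summarize and time_one_input are hypothetical, and
the use of clock_gettime() with CLOCK_MONOTONIC and of averaging over
"reps" calls per input are assumptions.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

struct timing_stats { double min_ns, max_ns, mean_ns; };

/* Reduce n per-input timings (in nsec) to min, max and mean. */
struct timing_stats summarize(const double *ns, size_t n)
{
    assert(n > 0);
    struct timing_stats s = { ns[0], ns[0], 0.0 };
    for (size_t i = 0; i < n; i++) {
        if (ns[i] < s.min_ns) s.min_ns = ns[i];
        if (ns[i] > s.max_ns) s.max_ns = ns[i];
        s.mean_ns += ns[i];
    }
    s.mean_ns /= (double)n;
    return s;
}

/* Average cost in nsec of one call fn(x), measured over reps calls so
   that the timer overhead is spread across many calls. */
double time_one_input(double (*fn)(double), double x, int reps)
{
    struct timespec t0, t1;
    volatile double sink = 0.0;   /* keeps the calls from being removed */
    assert(reps > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        sink = fn(x);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sink;
    double elapsed = (double)(t1.tv_sec - t0.tv_sec) * 1e9
                   + (double)(t1.tv_nsec - t0.tv_nsec);
    return elapsed / reps;
}
```

A caller would fill an array with time_one_input(exp, x[i], reps) for
each test input and pass it to summarize(). Repeating each input makes
later calls hit warm caches, so a one-off slow path cost would need
reps set to 1 and a timer with low enough overhead.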

Notes: Removing the slow path is a huge performance win
on this set of values.
Compared with the "no slow path" code, Patrick's version of exp() is
28% faster on Sparc and 14% faster on x86.

In addition, the existing code (both the "slow" and "no slow" versions)
uses data tables totaling 13808 bytes for interpolation. Patrick's
version uses data tables totaling 3168 bytes. It is hard
to predict what impact the extra 10K bytes might have on
real applications' use of L1 and L2 cache on various architectures.
Patrick's version could be modified to use larger data tables
to improve accuracy with no loss of performance in the glibc tests,
but it would not approach the "no slow" accuracy levels.

Summary:
Both the "no slow path" and "Patrick's exp()" show major performance
gains with relatively rare 1 ulp differences in results. The "no slow
path" has the advantage of errors being extremely rare while
"Patrick's exp()" has the advantage of being 14-28% faster.

Any thoughts on general principles for deciding which patch
to accept, given that both seem much better than the existing code?

- Patrick McGehearty


On 2/8/2018 5:40 AM, Wilco Dijkstra wrote:
> Hi Patrick,
>
>> Has there been a serious discussion in the past of to what degree
>> of accuracy glibc/libm should support other rounding modes than
>> round-to-nearest?  If a consensus decision were made that
>> other rounding modes were allowed slightly greater ulp diffs,
>> we could remove all the rounding mode checks and get
>> faster code. Failing that consensus, I don't see how we
>> can bypass the rounding mode checks for the generic code.
> There have been various discussions, but nothing conclusive. I believe the
> rounding mode changes can be removed from all the key math functions if we
> accept 1 extra ULP in non-nearest rounding modes. As Szabolcs mentioned
> there are some round-to-int idioms used by math functions which rely on a
> specific rounding mode, but we can fix those.
>
> If rounding errors in the more complex functions go up (some are very
> sensitive to ULP), we could consider adding the rounding mode changes there -
> that means you only do it where absolutely necessary, and also in cases where
> the relative overhead is much lower.
>
> Or alternatively we could agree that we don't have a requirement to optimize
> math functions for absolute best possible ULP with different rounding modes,
> and accept larger ULP errors.
>
>> I'll look into comparing removing the slow path on Sparc and
>> x86, including running my own "10 million values" test to
>> get a sense of how frequently the slow path is triggered
>> and what the largest relative error that test observes.
>> I'll also run timing tests.
> Yes I noticed that even when the slow path doesn't trigger, it has a significant
> overhead (log is 18% faster without the slow paths). Note we'll likely post patches
> for removing slow paths in exp, pow as well.
>
> Wilco
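
For readers unfamiliar with the round-to-int idioms mentioned in the
quoted text, here is a minimal sketch of one common idiom of that kind.
That this particular idiom is among the ones meant is an assumption,
and the name round_via_shift is hypothetical. Adding and then
subtracting 1.5 * 2^52 forces the fraction bits of x to be rounded
away, and that rounding follows whatever rounding mode is currently in
effect.

```c
#include <assert.h>

/* Round x to an integral value by adding and subtracting 1.5 * 2^52.
   Only valid for |x| < 2^51. */
double round_via_shift(double x)
{
    const double shift = 0x1.8p52;
    assert(x > -0x1p51 && x < 0x1p51);
    /* The sum has no fraction bits left, so x is rounded to an integer
       here according to the current rounding mode. The volatile keeps
       the compiler from simplifying the expression away. */
    volatile double t = x + shift;
    return t - shift;   /* this subtraction is exact */
}
```

In round-to-nearest this gives round-half-to-even, so
round_via_shift(2.5) is 2.0. If the mode were changed with fesetround()
to FE_UPWARD, the same code would behave like ceil(), and with
FE_DOWNWARD or FE_TOWARDZERO like floor(), which is why code using such
an idiom needs either a known rounding mode or a fix-up step.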