From: Patrick McGehearty
Newsgroups: gmane.comp.lib.glibc.alpha
Subject: Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Date: Tue, 13 Feb 2018 19:18:15 -0600
To: libc-alpha@sourceware.org

I note Szabolcs is proposing to modify ieee754_exp() to remove the
slow path. Since my proposed patch contains substantial changes to
ieee754_exp(), it makes sense to accept only one of these patches.
I've done some data collection comparing the patches for your
consideration. I've labeled the current code "Slow path", Szabolcs's
version "No Slow path", and my version "Patrick's exp()".

Comparisons between Slow path, No Slow path, and Patrick's exp()

Accuracy:
Existing code is assumed accurate with 0 ulp diffs.

Removing the slow path gets 1 error on the current "make check"
test suite. Running ten million numbers with each rounding mode
shows removing the slow path only gives an average of 4-5 1 ulp
diffs per ten million tests. That is still extremely accurate.
I also measured how often the slow path was taken for those same
ten million values.
It was taken approximately 135 times per ten million tests, but it
usually returns the same value as the fast path. The counts are
slightly different for different rounding modes.

Patrick's exp() also only gets 1 error on the current "make check"
test suite, on the same test value as the "no slow path" code.
It gets approximately 16000 1 ulp diffs per ten million tests, which
is somewhat higher than the "no slow path" code but still
relatively rare.

Performance:
            sparc (nsec)                    x86 (nsec)
          slow  no slow  patrick        slow  no slow  patrick
max      17584      710      873        5158      299      275
min        399      398       96          15       15       15
mean      5497      538      419        1333       28       24

Repeated runs show about 2% variance for identical tests.

Notes:
Removing the slow path is a huge performance win on this set of
values. Patrick's version of exp() is 28% faster on Sparc and
14% faster on x86 than the "no slow path" version.

In addition, the existing code ("slow" and "no slow" versions)
uses data tables with 13808 bytes for interpolation.
Patrick's version uses data tables with 3168 bytes for interpolation.
It is hard to predict what impact the extra 10K bytes might have on
real applications' usage of L1 and L2 cache on various architectures.
Patrick's version could be modified to use larger data tables to
improve accuracy with no loss of performance in the glibc tests, but
it would not approach the "no slow" accuracy levels.

Summary:
Both the "no slow path" and "Patrick's exp()" show major performance
gains with relatively rare 1 ulp differences in results.
The "no slow path" has the advantage of errors being extremely rare,
while "Patrick's exp()" has the advantage of being 14-28% faster.

Any thoughts on general principles for deciding which patch to
accept, given both seem much better than the existing code?
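
For readers who want to reproduce counts like the "1 ulp diffs per ten million tests" above, here is a minimal sketch of one way such a tally could be gathered. It is not the harness used for the numbers in this message. `candidate_exp` is a hypothetical stand-in that deliberately nudges a few results by one ulp so the counter has something to find; in a real comparison `ref_fn` and `test_fn` would be the two exp() builds under test, and the inputs would be the ten million test values.

```python
import math
import struct


def ordered_bits(x):
    """Map a finite double to an integer so that adjacent doubles differ by 1."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # Negative doubles have the sign bit set; mirror them below zero so the
    # integer ordering matches the floating-point ordering (-0.0 maps to 0).
    return bits if bits >= 0 else -(bits & 0x7FFFFFFFFFFFFFFF)


def ulp_diff(a, b):
    """Number of representable doubles between a and b (0 if identical)."""
    return abs(ordered_bits(a) - ordered_bits(b))


def next_up(x):
    """Next double above a positive finite x (sufficient for exp results)."""
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    return struct.unpack("<d", struct.pack("<q", bits + 1))[0]


def count_ulp_diffs(ref_fn, test_fn, inputs):
    """Return {ulp distance: count} over the inputs where the results differ."""
    histogram = {}
    for x in inputs:
        d = ulp_diff(ref_fn(x), test_fn(x))
        if d:
            histogram[d] = histogram.get(d, 0) + 1
    return histogram


def candidate_exp(x):
    """Hypothetical candidate: math.exp, off by one ulp on a few inputs."""
    y = math.exp(x)
    return next_up(y) if round(x * 100) % 250 == 0 else y


inputs = [i / 100 for i in range(-500, 500)]
print(count_ulp_diffs(math.exp, candidate_exp, inputs))
```

Comparing bit patterns rather than relative error keeps the tally in units of ulps directly, which matches how the accuracy figures above are stated.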

- Patrick McGehearty

On 2/8/2018 5:40 AM, Wilco Dijkstra wrote:
> Hi Patrick,
>
>> Has there been a serious discussion in the past of to what degree
>> of accuracy glibc/libm should support other rounding modes than
>> round-to-nearest? If a consensus decision were made that
>> other rounding modes were allowed slightly greater ulp diffs,
>> we could remove all the rounding mode checks and get
>> faster code. Failing that consensus, I don't see how we
>> can bypass the rounding mode checks for the generic code.
> There have been various discussions, but nothing conclusive. I believe the
> rounding mode changes can be removed from all the key math functions if we
> accept 1 extra ULP in non-nearest rounding modes. As Szabolcs mentioned
> there are some round-to-int idioms used by math functions which rely on a
> specific rounding mode, but we can fix those.
>
> If rounding errors in the more complex functions go up (some are very
> sensitive to ULP), we could consider adding the rounding mode changes there -
> that means you only do it where absolutely necessary, and also in cases where
> the relative overhead is much lower.
>
> Or alternatively we could agree that we don't have a requirement to optimize
> math functions for absolute best possible ULP with different rounding modes,
> and accept larger ULP errors.
>
>> I'll look into comparing removing the slow path on Sparc and
>> x86, including running my own "10 million values" test to
>> get a sense of how frequently the slow path is triggered
>> and what the largest relative error that test observes.
>> I'll also run timing tests.
> Yes I noticed that even when the slow path doesn't trigger, it has a significant
> overhead (log is 18% faster without the slow paths). Note we'll likely post patches
> for removing slow paths in exp, pow as well.
>
> Wilco
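
To illustrate the remark about round-to-int idioms that rely on a specific rounding mode, here is a small sketch. It assumes the idiom in question resembles the common "add and subtract 1.5 * 2^52" trick; the message does not say which idioms are meant. Python cannot change the hardware rounding mode, so the double-precision addition is emulated with exact fractions. The names `SHIFT`, `ROUNDERS`, and `shift_round` are illustrative only.

```python
import math
from fractions import Fraction

# Assumed shift constant: 1.5 * 2**52. Adding it to a small x pushes the sum
# into [2**52, 2**53), where consecutive doubles are exactly 1 apart.
SHIFT = 1.5 * 2.0 ** 52

# How the sum would be rounded to a double under each IEEE rounding mode.
# The sum is positive, so rounding toward zero behaves like rounding downward.
ROUNDERS = {
    "nearest": round,          # Fraction rounds half to even
    "upward": math.ceil,
    "downward": math.floor,
    "towardzero": math.trunc,
}


def shift_round(x, mode):
    """Emulate (x + SHIFT) - SHIFT in double arithmetic under a rounding mode.

    Valid for |x| < 2**51, where rounding the sum to a double is the same
    as rounding it to an integer.
    """
    exact_sum = Fraction(x) + Fraction(SHIFT)
    rounded_sum = ROUNDERS[mode](exact_sum)
    # The subtraction is exact in double arithmetic, so no further rounding.
    return float(rounded_sum - int(SHIFT))


for mode in ROUNDERS:
    print(mode, shift_round(2.25, mode), shift_round(-2.75, mode))
```

Under this emulation the idiom yields round-to-nearest-integer only in the nearest mode; in the directed modes it turns into a ceiling or floor, which is why code using such an idiom either sets the rounding mode first or has to tolerate a different reduced argument.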