From: Patrick McGehearty
Newsgroups: gmane.comp.lib.glibc.alpha
Subject: Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Date: Wed, 7 Feb 2018 13:19:50 -0600
To: libc-alpha@sourceware.org
References: <1517262265-79445-1-git-send-email-patrick.mcgehearty@oracle.com> <361bac88-5538-227f-b6dc-76416178192c@arm.com>
In-Reply-To: <361bac88-5538-227f-b6dc-76416178192c@arm.com>

On 2/2/2018 8:40 AM, Szabolcs Nagy wrote:
> On 29/01/18 21:44, Patrick McGehearty wrote:
>> New with this version:
>> Adds updates sparc and x86_64 libm-test-ulps files (1 ulp for
>> various exp tests). Rewrite of full comment to reflect current
>> state of patch.
>>
>> Summary of patch rationale
>>
>> These changes will be active for all platforms that don't provide
>> their own exp() routines. They will also be active for ieee754
>> versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
>> erf.
>>
>> Typical performance gains are 2x on Sparc s7 and 5x on x86_64.
>> The former code included a slow path to assure no 1 ulp errors
>> that could be 50-200 times slower than the normal path.
>> Informal testing suggests perhaps 1 in 200 values might invoke
>> the slow path.
>>
>> Using the glibc_perf tests:
>>         sparc (nsec)     x86 (nsec)
>>         old     new     old     new
>> max   18180     936    4863     275
>> min     399      96      15      15
>> mean   5499     419    1336      24
>>
>
> i tested this patch on aarch64 against the current code
> with the slow path removed and the later was about 10%
> faster on both my throughput and latency benchmarks.
> (i also removed the rounding mode settings in both cases
> as that can be avoided at least on aarch64)
>

Removing the rounding mode settings certainly makes a big difference
in performance. When I started, my base code did not include the
rounding mode changes. Unfortunately, that also generated many more
test failures in the non-round-to-nearest settings, including
failures in other ieee754 functions that use ieee754_exp().
Most platforms do not have a way to avoid the rounding mode settings,
which makes that option unavailable for most users of the generic
IEEE754 code.

Has there been a serious discussion in the past about the degree of
accuracy to which glibc/libm should support rounding modes other than
round-to-nearest? If a consensus decision were made that other
rounding modes are allowed slightly greater ulp diffs, we could
remove all the rounding mode checks and get faster code.
Failing that consensus, I don't see how we can bypass the rounding
mode checks for the generic code.

> so i suggest just removing the slow path first, which
> should have good enough error rate and similar performance.

I'll look into removing the slow path on Sparc and x86 and comparing
the results, including running my own "10 million values" test to get
a sense of how frequently the slow path is triggered and what the
largest relative error observed by that test is. I'll also run timing
tests. It may be a little while before I have something to report.

> i did some testing and i think it's possible to do the
> common case >30% faster with similar table size and around
> 0.501 ulp error, with a slower path for values close to
> overflow/underflow (at least on aarch64, which has
> convert-to-nearest-int instruction that does not depend on
> rounding mode, i'll see if it can be done in a generic way)

- patrick