From: Szabolcs Nagy
Newsgroups: gmane.comp.lib.glibc.alpha
Subject: Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Date: Thu, 22 Feb 2018 19:22:03 +0000
To: Patrick McGehearty, libc-alpha@sourceware.org
Cc: nd@arm.com

On 14/02/18 01:18, Patrick McGehearty wrote:
> I note Szabolcs is proposing to modify ieee754_exp() to
> remove the Slow path. Since my proposed patch contains
> substantial changes to ieee754_exp(), it makes sense to
> only make one of these patches. I've done some data
> collection comparing the patches for your consideration.
>
> I've labeled the current code "Slow path", Szabolcs version "No Slow path"
> and my version "Patrick's exp()".
>
>
> Comparisons between Slow path, No Slow path, and Patrick's exp()
>
> Accuracy:
> Existing code is assumed accurate with 0 ulp diffs.  Removing the slow
> path gets 1 error on the current "make check" test suite.  Running ten
> million numbers with each rounding mode shows removing the slow path
> only gives an average of 4-5 1 ulp diffs per ten million tests. That is
> extremely accurate still.
>
> I also measured how often the slow path was taken for those same ten
> million values. It was approximately 135 times per ten million tests
> but usually returns the same value as the fast path.  The counts are
> slightly different for different rounding modes.
>
> Patrick's exp() also only gets 1 error on the current "make check" test suite,
> the same test value as the "no slow path" code. It gets approximately
> 16000 1 ulp diffs per 10 million tests which is somewhat higher
> than the "no slow path" code but still relatively rare.
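
for reference, one way the "1 ulp diffs" above could be counted. this is
only a sketch under assumptions: the mail does not say how the numbers
were collected, and ordered_bits, ulp_diff and count_ulp_diffs are
hypothetical names, not glibc functions. it compares two arrays of
results (reference vs. candidate) by their distance in representable
doubles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: map a double to an integer that grows
   monotonically with the value, so adjacent doubles differ by 1.  */
static int64_t ordered_bits(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    /* Negative doubles have the sign bit set; mirror them below zero.  */
    return i < 0 ? INT64_MIN - i : i;
}

/* Distance between two non-NaN doubles, in units in the last place.  */
static uint64_t ulp_diff(double a, double b)
{
    assert(a == a && b == b);   /* NaN has no meaningful ulp distance */
    int64_t ia = ordered_bits(a), ib = ordered_bits(b);
    return ia >= ib ? (uint64_t)ia - (uint64_t)ib
                    : (uint64_t)ib - (uint64_t)ia;
}

/* Count the entries where the candidate result differs from the
   reference result by at least one ulp.  */
static size_t count_ulp_diffs(const double *ref, const double *got, size_t n)
{
    size_t diffs = 0;
    for (size_t k = 0; k < n; k++)
        if (ulp_diff(ref[k], got[k]) != 0)
            diffs++;
    return diffs;
}
```

with ref[] filled from the existing exp and got[] from a patched one on
the same inputs, count_ulp_diffs would give a per-run count comparable
in spirit to the figures quoted above, assuming the existing code is the
0 ulp reference as the quoted text does.
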
>
>
> Performance:
>
>            sparc (nsec)                x86 (nsec)
>        slow  no slow  patrick     slow  no slow  patrick
> max   17584      710      873     5158      299      275
> min     399      398       96       15       15       15
> mean   5497      538      419     1333       28       24
> Repeated runs show about 2% variance for identical tests.
>
> Notes: Removing the slow path is a huge performance win
> on this set of values.
> Patrick's version of exp() is 28% faster on Sparc and 14% faster on x86.
>
> In addition, the existing code ("slow" and "no slow" versions) uses
> data tables with 13808 bytes for interpolation. Patrick's version
> uses data tables with 3168 bytes for interpolation. It is hard
> to predict what impact the extra 10K bytes might have on
> real applications' usage of L1 and L2 cache on various architectures.
> Patrick's version could be modified to use larger data tables
> to improve accuracy with no loss of performance in the glibc tests
> but they would not approach the "no slow" accuracy levels.
>

did some more work on exp.

the 'patrick' version uses different methods for small values
(< 3/2 ln2) and larger ones. previously i benchmarked with large
values; on those the current glibc code (no slow) is actually
faster than patrick on aarch64. when i benchmark with small values
(i suspect that's more common in practice) then the patrick version
is reasonably fast.

i use a single method (nsz exp): on larger inputs it's about a 30%
latency improvement compared to noslow and patrick, on small values
i get a tiny bit better latency than patrick (2-3%). however that
relies on having a single instruction, rounding mode independent
toint (aarch64). when i change the code to be portable then it is
slower on small values compared to patrick (almost 10%), on large
values it's still about 25% faster.
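
to illustrate the toint point, a minimal sketch of the two flavours.
this is not the code from any of the patches (the mail does not show
it); asuint64, toint_direct and toint_shift are hypothetical names, and
the shift trick is just one common portable way to round to an integer.
variant A assumes the compiler turns round()/lround() into single
instructions, as is typical on aarch64; variant B assumes |z| < 2^31 and
that double arithmetic is evaluated in double precision.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the bit pattern of a double.  */
static inline uint64_t asuint64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Variant A: round() and lround() round to nearest, ties away from
   zero, whatever the current rounding mode is.  Assumption: the
   target has an instruction for each; elsewhere they may end up as
   libm calls.  */
static inline int64_t toint_direct(double z, double *kd)
{
    *kd = round(z);
    return lround(z);
}

/* Variant B: portable shift trick.  Adding 1.5 * 2^52 pushes the
   fraction bits of z out of the mantissa, so the addition itself
   does the rounding, in the current rounding mode.  */
static inline int64_t toint_shift(double z, double *kd)
{
    const double shift = 0x1.8p52;
    assert(z > -0x1p31 && z < 0x1p31);
    double t = z + shift;
    *kd = t - shift;
    /* The low mantissa bits of t hold the integer in two's complement.  */
    return (int64_t)(asuint64(t) - asuint64(shift));
}
```

in the default round-to-nearest mode both variants agree except on
ties: for z = 2.5 variant A gives 3 and variant B gives 2 (ties to
even). with the rounding mode set to upward, variant B would give 3
for z = 2.3 while variant A still gives 2, which is the rounding mode
dependence mentioned above.
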

so i think i have something that's good for aarch64 and i think
it may be an improvement on all targets compared to noslow, but
it's not better than the patrick version for small values on most
targets.

(i removed the rounding mode settings from patrick, noslow and nsz.
that should be valid for nsz exp and i think for patrick too; i
don't remember why the rounding mode changes were needed there.)

it needs a bit more work still before i can post something.

> Summary:
> Both the "no slow path" and "Patrick's exp()" show major performance
> gains with relatively rare 1 ulp differences in results. The "no slow
> path" has the advantage of errors being extremely rare while
> "Patrick's exp()" has the advantage of being 14-28% faster.
>
> Any thoughts on general principles on how to decide which patch
> to accept, given both seem much better than the existing code?
>
> - Patrick McGehearty
>