unofficial mirror of libc-alpha@sourceware.org
From: Szabolcs Nagy <szabolcs.nagy@arm.com>
To: Patrick McGehearty <patrick.mcgehearty@oracle.com>,
	libc-alpha@sourceware.org
Cc: nd@arm.com
Subject: Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Date: Thu, 22 Feb 2018 19:22:03 +0000
Message-ID: <aecb644f-2c8d-3443-2155-1aacae5e2c4b@arm.com>
In-Reply-To: <b3eca297-3aaf-6d71-c556-71b739bb2d1e@oracle.com>

On 14/02/18 01:18, Patrick McGehearty wrote:
> I note Szabolcs is proposing to modify ieee754_exp() to
> remove the Slow path. Since my proposed patch contains
> substantial changes to ieee754_exp(), it makes sense to
> only make one of these patches. I've done some data
> collection comparing the patches for your consideration.
> 
> I've labeled the current code "Slow path", Szabolcs version "No Slow path"
> and my version "Patrick's exp()".
> 
> 
> Comparisons between Slow path, No Slow path, and Patrick's exp()
> 
> Accuracy:
> Existing code is assumed accurate with 0 ulp diffs.  Removing the slow
> path gets 1 error on the current "make check" test suite.  Running ten
> million numbers with each rounding mode shows removing the slow path
> only gives an average of 4-5 1 ulp diffs per ten million tests. That is
> extremely accurate still.
> 
> I also measured how often the slow path was taken for those same ten
> million values. It was approximately 135 times per ten million tests
> but usually returns the same value as the fast path.  The counts are
> slightly different for different rounding modes.
> 
> Patrick's exp() also only gets 1 error on the current "make check" test suite,
> the same test value as the "no slow path" code. It gets approximately
> 16000 1 ulp diffs per 10 million tests which is somewhat higher
> than the "no slow path" code but still relatively rare.
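
The "1 ulp diff" counts above can be reproduced with a small helper along these lines. This is a minimal sketch, not the harness used for the numbers quoted here: the names `ordered`, `ulp_diff` and `count_diffs` are hypothetical, and it assumes IEEE 754 binary64 doubles and finite, non-NaN results.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a double to an integer so that adjacent doubles map to adjacent
   integers (assumes IEEE 754 binary64; NaNs are not handled). */
static int64_t ordered(double x)
{
    int64_t i;
    memcpy(&i, &x, sizeof i);
    /* Negative doubles have the sign bit set; flip them so the mapping
       is monotonic and -0.0 lands on the same value as +0.0. */
    return i < 0 ? INT64_MIN - i : i;
}

/* Distance between a and b in units in the last place. */
static uint64_t ulp_diff(double a, double b)
{
    int64_t ia = ordered(a), ib = ordered(b);
    return ia >= ib ? (uint64_t)ia - (uint64_t)ib
                    : (uint64_t)ib - (uint64_t)ia;
}

/* Count how many results differ from the reference results at all. */
static size_t count_diffs(const double *ref, const double *got, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (ulp_diff(ref[i], got[i]) != 0)
            count++;
    return count;
}
```

Here `ref` would hold the results of the existing code and `got` the results of the version under test, for the same inputs and the same rounding mode.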
> 
> 
> Performance:
> 
>                sparc (nsec)                 x86 (nsec)
>         slow  no slow  patrick      slow  no slow  patrick
> max    17584      710      873      5158      299      275
> min      399      398       96        15       15       15
> mean    5497      538      419      1333       28       24
> Repeated runs show about 2% variance for identical tests.
> 
> Notes: Removing the slow path is a huge performance win
> on this set of values.
> Compared with the "no slow path" code, Patrick's version of exp()
> is 28% faster on sparc and 14% faster on x86 (by mean time).
> 
> In addition, the existing code ("slow" and "no slow" versions) uses
> data tables with 13808 bytes for interpolation. Patrick's version
> uses data tables with 3168 bytes for interpolation. It is hard
> to predict what impact the extra 10K bytes might have on
> real applications' usage of L1 and L2 cache on various architectures.
> Patrick's version could be modified to use larger data tables
> to improve accuracy with no loss of performance in the glibc tests,
> but it would not approach the "no slow" accuracy levels.
> 

did some more work on exp.

the 'patrick' version uses different methods for small values
(< 3/2 ln2) and larger ones.

previously i benchmarked with large values; on those the current
glibc code (no slow) is actually faster than patrick on aarch64.

when i benchmark with small values (i suspect that's more common
in practice) then the patrick version is reasonably fast.

i use a single method (nsz exp): on larger inputs it's about a 30%
latency improvement compared to noslow and patrick; on small
values i get slightly better latency than patrick (2-3%).

however that relies on having a single-instruction toint that is
independent of the rounding mode (aarch64). when i change the code
to be portable it is slower on small values compared to patrick
(almost 10%); on large values it's still about 25% faster.
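
For context, one common portable way to do the toint step is the "shift" trick sketched below. This is an illustrative sketch only, not the nsz exp code: the names `SHIFT`, `as_u64` and `toint_shift` are hypothetical, it assumes IEEE 754 binary64 doubles, and it is only valid for |z| < 2^51. Unlike a dedicated instruction, the result follows the current rounding mode (ties to even in the default mode).

```c
#include <stdint.h>
#include <string.h>

/* 1.5 * 2^52: for |z| < 2^51 the sum z + SHIFT lies in [2^52, 2^53),
   where doubles are spaced exactly 1 apart, so the addition itself
   rounds z to an integer. */
static const double SHIFT = 0x1.8p52;

/* Reinterpret the bits of a double as an unsigned 64-bit integer. */
static uint64_t as_u64(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Returns z rounded to an integer as a double and stores the same
   integer in *ki. Rounding follows the current rounding mode. */
static double toint_shift(double z, int64_t *ki)
{
    /* volatile forces the sum to be rounded to double precision even
       on targets that evaluate with excess precision. */
    volatile double kd = z + SHIFT;
    /* The rounded integer is the offset of the bit pattern from SHIFT. */
    *ki = (int64_t)as_u64(kd) - (int64_t)as_u64(SHIFT);
    return kd - SHIFT;
}
```

The extra add, subtract and bit reinterpretation are the kind of overhead that a single-instruction conversion avoids, which matters most when the rest of the computation is short.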

so i think i have something that's good for aarch64 and i think
it may be an improvement on all targets compared to noslow,
but it's not better than patrick version for small values on
most targets.

(i removed the rounding mode settings from patrick, noslow and nsz.
that should be valid for nsz exp and i think for patrick too;
i don't remember why the rounding mode changes were needed there.)

it needs a bit more work still before i can post something.

> Summary:
> Both the "no slow path" and "Patrick's exp()" show major performance
> gains with relatively rare 1 ulp differences in results. The "no slow
> path" has the advantage of errors being extremely rare while
> "Patrick's exp()" has the advantage of being 14-28% faster.
> 
> Any thoughts on general principles on how to decide which patch
> to accept, given both seem much better than the existing code?
> 
> - Patrick McGehearty
> 

