unofficial mirror of libc-alpha@sourceware.org
* glibc benchmarks' results can be unreliable for short runtimes (on Aarch64)
@ 2019-06-21  7:20 Anton Youdkevitch
  0 siblings, 0 replies; 3+ messages in thread
From: Anton Youdkevitch @ 2019-06-21  7:20 UTC
  To: libc-alpha

Folks,

Recently I was doing an optimized implementation of memcpy/memmove
for TX2. While running internal microbenchmarks I noticed that for the
"fast" benchmarks (~10ms runtime) the results vary quite significantly
across runs (5%-20%). It is possible to find two runs that show my
implementation actually significantly worsened the performance. Also
there are (quite common) cases when the "baseline" implementation
gets worse and the "tested" implementation gets better (or vice versa)
across the runs.

The first solution to this that comes to mind is to increase the runtime
for the "fast" benchmarks. If I increase bench-memcpy runtime 32x (the
actual runtime for TX2 would be ~2s) the results for a particular
implementation are always within a 5% range. The effect where one
benchmark gains and another loses across different runs, while less
significant, still remains.

So, are there any reasons not to bump up the runtime of the
"fast" benchmarks to 1s-2s?

-- 
   Thanks,
   Anton


* RE: glibc benchmarks' results can be unreliable for short runtimes (on Aarch64)
@ 2019-06-21 12:01 Wilco Dijkstra
  2019-06-24  7:52 ` Anton Youdkevitch
  0 siblings, 1 reply; 3+ messages in thread
From: Wilco Dijkstra @ 2019-06-21 12:01 UTC
  To: Anton Youdkevitch, libc-alpha@sourceware.org; +Cc: nd

Hi Anton,

> Recently I was doing an optimized implementation of memcpy/memmove
> for TX2. While running internal microbenchmarks I noticed that for the
> "fast" benchmarks (~10ms runtime) the results vary quite significantly
> across runs (5%-20%). It is possible to find two runs that show my
> implementation actually significantly worsened the performance. Also
> there are (quite common) cases when the "baseline" implementation
> gets worse and the "tested" implementation gets better (or vice versa)
> across the runs.

Yes, this is certainly possible for any short-running benchmark, which is
why I recently increased the minimum iteration count by 128x. I ran
it on a fixed-frequency server and got quite stable results. However, if your
CPU does frequency scaling, then 10ms is likely too short for consistent results.

> The first solution to this that comes to mind is to increase the runtime
> for the "fast" benchmarks. If I increase bench-memcpy runtime 32x (the
> actual runtime for TX2 would be ~2s) the results for a particular
> implementation are always within 5% range. The effect of one benchmark
> gains and another one loses for different runs while not as significant still
> remains.
>
> So, are there any reasons not to bump up the runtime of the
> "fast" benchmarks to 1s-2s?

1 second per benchmark sounds reasonable; however, if you just increase
INNER_LOOP_ITERS a lot, various benchmarks will become far too
slow. So you may need to move them to INNER_LOOP_ITERS_MEDIUM
or something similar. If you use "time $(run-bench)" in the benchtests makefile,
it prints the time for each benchmark.

Wilco


* Re: glibc benchmarks' results can be unreliable for short runtimes (on Aarch64)
  2019-06-21 12:01 glibc benchmarks' results can be unreliable for short runtimes (on Aarch64) Wilco Dijkstra
@ 2019-06-24  7:52 ` Anton Youdkevitch
  0 siblings, 0 replies; 3+ messages in thread
From: Anton Youdkevitch @ 2019-06-24  7:52 UTC
  To: Wilco Dijkstra, libc-alpha@sourceware.org; +Cc: nd

Wilco,

On 6/21/2019 2:01 PM, Wilco Dijkstra wrote:
> Hi Anton,
>> Recently I was doing an optimized implementation of memcpy/memmove for 
>> TX2. While running internal microbenchmarks I noticed that for the 
>> "fast" benchmarks (~10ms runtime) the results vary quite 
>> significantly across runs (5%-20%). It is possible to find two runs 
>> that show my implementation actually significantly worsened the 
>> performance. Also there are (quite common) cases when the "baseline" 
>> implementation gets worse and the "tested" implementation gets better 
>> (or vice versa) across the runs. 
> Yes this is certainly possible for any short running benchmark, which 
> is why I recently increased the minimum iteration count 128 times. I 
> ran it on a fixed frequency server and got quite stable results. 
> However if your CPU does frequency scaling then 10ms is likely too 
> short for consistent results.
I think we can assume frequency throttling to be the general rule
these days.

>> The first solution to this that comes to mind is to increase the 
>> runtime for the "fast" benchmarks. If I increase bench-memcpy runtime 
>> 32x (the actual runtime for TX2 would be ~2s) the results for a 
>> particular implementation are always within 5% range. The effect of 
>> one benchmark gains and another one loses for different runs while 
>> not as significant still remains. So, are there any reasons not to 
>> bump up the runtime of the "fast" benchmarks to 1s-2s? 
> 1 second per benchmark sounds reasonable, however if you just increase 
> INNER_LOOP_ITERS a lot then various benchmarks will become way too 
> slow. So you may need to move them to INNER_LOOP_ITERS_MEDIUM or 
> something similar. If you use "time $(run-bench)" in the benchtests 
> makefile it prints out the time for each benchmark.

OK, I understand this, thanks. I will use INNER_LOOP_ITERS_MEDIUM then.

-- 
   Thanks,
   Anton

