Re: [PATCH v2] Loosen the limits of time/tst-cpuclock1.

unofficial mirror of libc-alpha@sourceware.org
 help / color / mirror / Atom feed

From: Stefan Liebler via Libc-alpha <libc-alpha@sourceware.org>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	libc-alpha@sourceware.org, "Carlos O'Donell" <carlos@redhat.com>
Subject: Re: [PATCH v2] Loosen the limits of time/tst-cpuclock1.
Date: Fri, 23 Oct 2020 10:59:43 +0200	[thread overview]
Message-ID: <730e513b-2650-b6a2-98c3-cd99f1e35b6b@linux.ibm.com> (raw)
In-Reply-To: <a57ca219-5775-1c56-5a72-4b67f87168c4@linaro.org>

On 10/21/20 2:58 PM, Adhemerval Zanella wrote:
> 
> 
> On 19/10/2020 11:47, Stefan Liebler via Libc-alpha wrote:
>> Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f
>> "Fix time/tst-cpuclock1 intermitent failures" (2020-07-11),
>> this test fails quite often on s390x/s390 with one/multiple of those:
>> "before - after" / "nanosleep time" / "dead - after" ourside reasonable range.
>>
>> On a zVM/kvm guest the CPUs are shared between multiple guests.
>> And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars.
>> The defined CPUs for a lpar/zVM-system could also have lower weights compared
>> to other lpars which let the steal time further grow.
>>
>> Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple
>> times, e.g. with different GCCs, on various lpars or zVM guests at the same time.
>> During this time, I've run the test for 13500 times and obvserved the following fails:
>> ~600x "before - after"
>> ~60x "nanosleep time"
>> ~70x "dead - after"
>>
>> I've also observed a lot of "before - after" fails on a intel kvm-guest while
>> building/testing glibc on it.
>>
>> The mentioned commit has tighten the limits of valid tv_nsec ranges:
>> "before - after" (expected: 500000000):
>> - 100000000 ... 600000000
>> + 450000000 ... 550000000
>>
>> "nanosleep time" (expected: 100000000):
>> - 100000000 ... 200000000
>> + 090000000 ... 120000000
>>
>> "dead - after" (expected: 100000000):
>> -           ... 200000000
>> + 090000000 ... 120000000
>>
>> The test itself forks a child process which chew_cpu (user- and kernel-space).
>> The parent process sleeps with nanosleep(0.5s) and measures the child_clock time:
>> diff = after - before
>> With much workload on the machine, the child won't make much progess
>> and it can fall much beyond the minimum limit. Thus this check is now removed!
> 
> Ok.
> 
>>
>> Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s):
>> diff = afterns - after
>> The test currently also allows 0.9 * 0.1s.  As this would be an error, the
>> hard limit of 1.0 * 0.1s is now used as minimum border!
>> Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s.
>> Therefore the upper limit is set to 2.0 which was also used before the
>> mentioned commit.
> 
> This is still tricky and add heuristic values that might fail depending
> of the architecture/kernel.  Wouldn't be better to follow Carlos suggestion
> and strip down from the test all the time related checks and only keep 
> the functional interfaces of the ABI:
> 
>   * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM
>   * clock_getcpuclockid vs. valid child
>   * clock_gettime of dead child where clock is no longer valid
> 
Sure, I can just remove this last time related check. This would be the
following diff and the title of the new commit would be "Remove timing
related checks of time/tst-cpuclock1":
diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
index cc08150654..f40b590111 100644
--- a/time/tst-cpuclock1.c
+++ b/time/tst-cpuclock1.c
@@ -26,7 +26,6 @@
 #include <signal.h>
 #include <stdint.h>
 #include <sys/wait.h>
-#include <support/timespec.h>

 /* This function is intended to rack up both user and system time.  */
 static void
@@ -163,21 +162,6 @@ do_test (void)
          printf ("live PID %d after sleep => %ju.%.9ju\n",
                  child, (uintmax_t) afterns.tv_sec,
                  (uintmax_t) afterns.tv_nsec);
-
-         /* As the sleep is based on the child clock, the diff should never
-            be less than the specified sleeptime.  Otherwise this is an
error.
-            The upper bound is quite high in order to get no failure if
running
-            with high cpu usage and/or on virtualized environments with
shared
-            CPUs.  */
-         struct timespec diff;
-         diff = timespec_sub (support_timespec_normalize (afterns),
-                              support_timespec_normalize (before));
-         if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 2.0))
-           {
-             printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
-                     (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-             result = 1;
-           }
        }
     }


> And then maybe we can add *another* test that might evaluate timings report
> as the tests was originally intended?
> 
Thus we would have time/tst-cpuclock1 which just performs functional checks
and time/tst-cpuclock1-timings which additionally performs the just
removed timing checks?

The timing checks could be enabled by setting a macro:
tst-cpuclock1-timings.c:
#define ENABLE_TIMING_CHECKS 1
#include <tst-cpuclock1.c>

But then time/tst-cpuclock1-timings would fail as often as the current
time/tst-cpuclock1 test if run on systems with high cpu-load /
virtualized CPUs. Should the valid limits be adjusted? If yes, which
limits should be used?
I think at least the "before - after" check which compares different
clocks should be removed.

Bye,
Stefan

next prev parent reply	other threads:[~2020-10-23  8:59 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-10-19 14:47 [PATCH v2] Loosen the limits of time/tst-cpuclock1 Stefan Liebler via Libc-alpha
2020-10-21 12:58 ` Adhemerval Zanella via Libc-alpha
2020-10-23  8:59   ` Stefan Liebler via Libc-alpha [this message]
2020-10-23 12:03     ` Adhemerval Zanella via Libc-alpha
2020-10-23 13:09       ` Stefan Liebler via Libc-alpha

find likely ancestor, descendant, or conflicting patches for this message:
dfblob:cc0815065 dfblob:f40b59011
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/libc/involved.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=730e513b-2650-b6a2-98c3-cd99f1e35b6b@linux.ibm.com \
    --to=libc-alpha@sourceware.org \
    --cc=adhemerval.zanella@linaro.org \
    --cc=carlos@redhat.com \
    --cc=stli@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).