From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <libc-alpha-bounces@sourceware.org>
Subject: Re: [PATCH v2] Loosen the limits of time/tst-cpuclock1.
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
 libc-alpha@sourceware.org, "Carlos O'Donell" <carlos@redhat.com>
References: <20201019144715.3236886-1-stli@linux.ibm.com>
 <a57ca219-5775-1c56-5a72-4b67f87168c4@linaro.org>
Message-ID: <730e513b-2650-b6a2-98c3-cd99f1e35b6b@linux.ibm.com>
Date: Fri, 23 Oct 2020 10:59:43 +0200
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101
 Thunderbird/68.12.0
MIME-Version: 1.0
In-Reply-To: <a57ca219-5775-1c56-5a72-4b67f87168c4@linaro.org>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-BeenThere: libc-alpha@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Libc-alpha mailing list <libc-alpha.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/libc-alpha/>
List-Post: <mailto:libc-alpha@sourceware.org>
List-Help: <mailto:libc-alpha-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/libc-alpha>,
 <mailto:libc-alpha-request@sourceware.org?subject=subscribe>
From: Stefan Liebler via Libc-alpha <libc-alpha@sourceware.org>
Reply-To: Stefan Liebler <stli@linux.ibm.com>
Errors-To: libc-alpha-bounces@sourceware.org
Sender: "Libc-alpha" <libc-alpha-bounces@sourceware.org>

On 10/21/20 2:58 PM, Adhemerval Zanella wrote:
> 
> 
> On 19/10/2020 11:47, Stefan Liebler via Libc-alpha wrote:
>> Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f
>> "Fix time/tst-cpuclock1 intermitent failures" (2020-07-11),
>> this test fails quite often on s390x/s390 with one or more of these:
>> "before - after" / "nanosleep time" / "dead - after" outside reasonable range.
>>
>> On a zVM/kvm guest the CPUs are shared between multiple guests.
>> And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars.
>> The defined CPUs for a lpar/zVM-system could also have lower weights compared
>> to other lpars, which lets the steal time grow further.
>>
>> Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple
>> times, e.g. with different GCCs, on various lpars or zVM guests at the same time.
>> During this time, I've run the test 13500 times and observed the following failures:
>> ~600x "before - after"
>> ~60x "nanosleep time"
>> ~70x "dead - after"
>>
>> I've also observed a lot of "before - after" failures on an Intel kvm guest while
>> building/testing glibc on it.
>>
>> The mentioned commit has tightened the limits of valid tv_nsec ranges:
>> "before - after" (expected: 500000000):
>> - 100000000 ... 600000000
>> + 450000000 ... 550000000
>>
>> "nanosleep time" (expected: 100000000):
>> - 100000000 ... 200000000
>> + 090000000 ... 120000000
>>
>> "dead - after" (expected: 100000000):
>> -           ... 200000000
>> + 090000000 ... 120000000
>>
>> The test itself forks a child process which runs chew_cpu (user- and kernel-space).
>> The parent process sleeps with nanosleep(0.5s) and measures the child_clock time:
>> diff = after - before
>> With much workload on the machine, the child won't make much progress
>> and it can fall far below the minimum limit. Thus this check is now removed!
> 
> Ok.
> 
>>
>> Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s):
>> diff = afterns - after
>> The test currently also allows 0.9 * 0.1s.  As this would be an error, the
>> hard limit of 1.0 * 0.1s is now used as minimum border!
>> Depending on the workload, the maximum limit can exceed the 1.2 * 0.1s.
>> Therefore the upper limit is set to 2.0 which was also used before the
>> mentioned commit.
> 
> This is still tricky and adds heuristic values that might fail depending
> on the architecture/kernel.  Wouldn't it be better to follow Carlos' suggestion
> and strip all the time-related checks from the test, keeping only
> the functional interfaces of the ABI:
> 
>   * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM
>   * clock_getcpuclockid vs. valid child
>   * clock_gettime of dead child where clock is no longer valid
> 
Sure, I can just remove this last time-related check. The diff would be
as follows, and the title of the new commit would be "Remove timing
related checks of time/tst-cpuclock1":
diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
index cc08150654..f40b590111 100644
--- a/time/tst-cpuclock1.c
+++ b/time/tst-cpuclock1.c
@@ -26,7 +26,6 @@
 #include <signal.h>
 #include <stdint.h>
 #include <sys/wait.h>
-#include <support/timespec.h>

 /* This function is intended to rack up both user and system time.  */
 static void
@@ -163,21 +162,6 @@ do_test (void)
          printf ("live PID %d after sleep => %ju.%.9ju\n",
                  child, (uintmax_t) afterns.tv_sec,
                  (uintmax_t) afterns.tv_nsec);
-
-         /* As the sleep is based on the child clock, the diff should never
-            be less than the specified sleeptime.  Otherwise this is an error.
-            The upper bound is quite high in order to get no failure if running
-            with high cpu usage and/or on virtualized environments with shared
-            CPUs.  */
-         struct timespec diff;
-         diff = timespec_sub (support_timespec_normalize (afterns),
-                              support_timespec_normalize (before));
-         if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 2.0))
-           {
-             printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
-                     (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-             result = 1;
-           }
        }
     }
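
For illustration only, here is a minimal standalone sketch of what the
remaining functional checks amount to once all timing checks are gone.
It is not the code in time/tst-cpuclock1.c: check_cpuclock_functional
and its return values (0 = ok, 1 = failure, 77 = unsupported) are
hypothetical, and it assumes a Linux system where the CPU clock of a
reaped child can no longer be read.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper, not part of glibc's test suite.
   Returns 0 on success, 1 on a functional failure and 77 if process
   CPU clocks are not usable here (ENOSYS / EPERM).  */
int
check_cpuclock_functional (void)
{
  pid_t child = fork ();
  if (child < 0)
    return 1;
  if (child == 0)
    {
      /* Child: idle until the parent kills it.  */
      for (;;)
        pause ();
    }

  int result = 0;
  clockid_t child_clock;

  /* clock_getcpuclockid returns an error number, not -1/errno.  */
  int e = clock_getcpuclockid (child, &child_clock);
  if (e == ENOSYS || e == EPERM)
    result = 77;
  else if (e != 0)
    result = 1;
  else
    {
      /* The clock of a live child must be readable.  */
      struct timespec ts;
      if (clock_gettime (child_clock, &ts) != 0)
        result = 1;
    }

  /* Terminate and reap the child in every case.  */
  kill (child, SIGKILL);
  if (waitpid (child, NULL, 0) != child)
    return 1;

  if (result == 0)
    {
      /* The clock of the dead child must no longer be valid.  */
      struct timespec ts;
      if (clock_gettime (child_clock, &ts) == 0)
        result = 1;

      /* Looking up a clock for the dead PID must fail as well.  */
      clockid_t dead_clock;
      if (clock_getcpuclockid (child, &dead_clock) == 0)
        result = 1;
    }

  return result;
}
```

None of these checks depends on how much CPU time the child actually
received, so steal time or a loaded machine should not influence the
result.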


> And then maybe we can add *another* test that evaluates the reported
> timings, as the test was originally intended?
> 
Thus we would have time/tst-cpuclock1 which just performs functional checks
and time/tst-cpuclock1-timings which additionally performs the just
removed timing checks?

The timing checks could be enabled by setting a macro:
tst-cpuclock1-timings.c:
#define ENABLE_TIMING_CHECKS 1
#include <tst-cpuclock1.c>
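
As a rough sketch of how the shared source could guard the timing check
with that macro (check_sleep_ns is a hypothetical helper that works on
plain nanosecond values instead of struct timespec; the 1.0 and 2.0
factors are the ones from the removed check):

```c
#include <stdio.h>

/* tst-cpuclock1-timings.c would define this to 1 before including the
   shared source; the plain functional test leaves it at 0.  */
#ifndef ENABLE_TIMING_CHECKS
# define ENABLE_TIMING_CHECKS 0
#endif

/* Hypothetical helper: always prints the measured value, but returns 1
   (test failure) only when timing checks are compiled in and diff_ns
   lies outside [1.0, 2.0] * sleeptime_ns.  */
int
check_sleep_ns (long long sleeptime_ns, long long diff_ns)
{
  printf ("clock_nanosleep diff: %lld ns (requested %lld ns)\n",
          diff_ns, sleeptime_ns);
#if ENABLE_TIMING_CHECKS
  if (diff_ns < sleeptime_ns || diff_ns > 2 * sleeptime_ns)
    {
      printf ("nanosleep time outside reasonable range\n");
      return 1;
    }
#endif
  return 0;
}
```

With the default of 0 the value is only reported; building the same
file with ENABLE_TIMING_CHECKS set to 1 turns the report into a check.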

But then time/tst-cpuclock1-timings would fail as often as the current
time/tst-cpuclock1 test if run on systems with high CPU load or
virtualized CPUs. Should the valid limits be adjusted? If yes, which
limits should be used?
I think at least the "before - after" check which compares different
clocks should be removed.
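
To make the question about limits more concrete, the following sketch
puts the old and new tv_nsec ranges quoted above into a table and checks
a value against both. The in_range helper, its inclusive bounds and the
sample values in the usage note are assumptions for illustration, not
measurements and not the glibc support code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Limits in nanoseconds, copied from the ranges quoted earlier in this
   mail.  A missing lower bound is represented as 0.  */
struct limits
{
  const char *name;
  long long old_lo, old_hi;   /* before commit 04deeaa9ea74 */
  long long new_lo, new_hi;   /* after commit 04deeaa9ea74 */
};

static const struct limits cpuclock_limits[] =
{
  { "before - after", 100000000LL, 600000000LL, 450000000LL, 550000000LL },
  { "nanosleep time", 100000000LL, 200000000LL,  90000000LL, 120000000LL },
  { "dead - after",           0LL, 200000000LL,  90000000LL, 120000000LL },
};

/* Hypothetical helper; treating both bounds as inclusive is an
   assumption of this sketch.  */
static bool
in_range (long long value, long long lo, long long hi)
{
  return value >= lo && value <= hi;
}

/* Print which of the two limit sets accepts a given value.  */
void
report (const struct limits *l, long long value_ns)
{
  printf ("%s: %lld ns, old limits: %s, new limits: %s\n",
          l->name, value_ns,
          in_range (value_ns, l->old_lo, l->old_hi) ? "ok" : "FAIL",
          in_range (value_ns, l->new_lo, l->new_hi) ? "ok" : "FAIL");
}
```

For example, report (&cpuclock_limits[0], 300000000LL) uses a made-up
"before - after" value of 0.3s, which the old limits accept and the new
limits reject.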

Bye,
Stefan