Subject: Re: [PATCH v2] Loosen the limits of time/tst-cpuclock1.
To: Adhemerval Zanella, libc-alpha@sourceware.org, "Carlos O'Donell"
References: <20201019144715.3236886-1-stli@linux.ibm.com>
Message-ID: <730e513b-2650-b6a2-98c3-cd99f1e35b6b@linux.ibm.com>
Date: Fri, 23 Oct 2020 10:59:43 +0200
From: Stefan Liebler via Libc-alpha
Reply-To: Stefan Liebler

On 10/21/20 2:58 PM, Adhemerval Zanella wrote:
>
>
> On 19/10/2020 11:47, Stefan Liebler via Libc-alpha wrote:
>> Starting with the commit 04deeaa9ea74b0679dfc9d9155a37b6425f19a9f
>> "Fix time/tst-cpuclock1 intermitent failures" (2020-07-11),
>> this test fails quite often on s390x/s390 with one or more of these:
>> "before - after" / "nanosleep time" / "dead - after" outside reasonable range.
>>
>> On a zVM/kvm guest the CPUs are shared between multiple guests.
>> And even on the lpar (kvm host) the CPUs are usually shared between multiple lpars.
>> The defined CPUs for an lpar/zVM-system could also have lower weights compared
>> to other lpars, which lets the steal time grow further.
>>
>> Usually I build (-j$(nproc)) and test (PARALLELMFLAGS="-j$(nproc)") glibc multiple
>> times, e.g. with different GCCs, on various lpars or zVM guests at the same time.
>> During this time, I've run the test 13500 times and observed the following fails:
>> ~600x "before - after"
>> ~60x "nanosleep time"
>> ~70x "dead - after"
>>
>> I've also observed a lot of "before - after" fails on an intel kvm-guest while
>> building/testing glibc on it.
>>
>> The mentioned commit has tightened the limits of valid tv_nsec ranges:
>> "before - after" (expected: 500000000):
>> - 100000000 ... 600000000
>> + 450000000 ... 550000000
>>
>> "nanosleep time" (expected: 100000000):
>> - 100000000 ... 200000000
>> + 090000000 ... 120000000
>>
>> "dead - after" (expected: 100000000):
>> -           ... 200000000
>> + 090000000 ... 120000000
>>
>> The test itself forks a child process which runs chew_cpu (user- and kernel-space).
>> The parent process sleeps with nanosleep(0.5s) and measures the child_clock time:
>> diff = after - before
>> With much workload on the machine, the child won't make much progress
>> and the diff can fall far below the minimum limit. Thus this check is now removed!
>
> Ok.
>
>>
>> Afterwards the parent process sleeps with clock_nanosleep (child_clock, 0.1s):
>> diff = afterns - after
>> The test currently also allows 0.9 * 0.1s. As this would be an error, the
>> hard limit of 1.0 * 0.1s is now used as the lower bound!
>> Depending on the workload, the diff can exceed the upper limit of 1.2 * 0.1s.
>> Therefore the upper limit is set to 2.0, which was also used before the
>> mentioned commit.
>
> This is still tricky and adds heuristic values that might fail depending
> on the architecture/kernel. Wouldn't it be better to follow Carlos' suggestion
> and strip all the time related checks from the test and only keep
> the functional interfaces of the ABI:
>
> * clock_getcpuclockid vs. ENOSYS / ESRCH / EPERM
> * clock_getcpuclockid vs. valid child
> * clock_gettime of dead child where clock is no longer valid
>

Sure, I can just remove this last time related check.
This would be the following diff and the title of the new commit would be
"Remove timing related checks of time/tst-cpuclock1":

diff --git a/time/tst-cpuclock1.c b/time/tst-cpuclock1.c
index cc08150654..f40b590111 100644
--- a/time/tst-cpuclock1.c
+++ b/time/tst-cpuclock1.c
@@ -26,7 +26,6 @@
 #include
 #include
 #include
-#include
 
 /* This function is intended to rack up both user and system time.  */
 static void
@@ -163,21 +162,6 @@ do_test (void)
           printf ("live PID %d after sleep => %ju.%.9ju\n",
                   child, (uintmax_t) afterns.tv_sec,
                   (uintmax_t) afterns.tv_nsec);
-
-          /* As the sleep is based on the child clock, the diff should never
-             be less than the specified sleeptime.  Otherwise this is an error.
-             The upper bound is quite high in order to get no failure if running
-             with high cpu usage and/or on virtualized environments with shared
-             CPUs.  */
-          struct timespec diff;
-          diff = timespec_sub (support_timespec_normalize (afterns),
-                               support_timespec_normalize (before));
-          if (!support_timespec_check_in_range (sleeptime, diff, 1.0, 2.0))
-            {
-              printf ("nanosleep time %ju.%.9ju outside reasonable range\n",
-                      (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
-              result = 1;
-            }
         }
     }
 

> And then maybe we can add *another* test that might evaluate the timing reports
> as the test was originally intended?
>

Thus we would have time/tst-cpuclock1, which just performs functional checks,
and time/tst-cpuclock1-timings, which additionally performs the just removed
timing checks?
The timing checks could be enabled by setting a macro:
tst-cpuclock1-timings.c:
#define ENABLE_TIMING_CHECKS 1
#include

But then time/tst-cpuclock1-timings would fail as often as the current
time/tst-cpuclock1 test if run on systems with high cpu-load / virtualized CPUs.
Should the valid limits be adjusted? If yes, which limits should be used?
I think at least the "before - after" check which compares different clocks
should be removed.

Bye,
Stefan
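
For illustration only, here is a minimal sketch of how the macro gate mentioned above could look. It is not the actual patch: the helper names sketch_timespec_ns, sketch_in_range and sketch_check_sleep are hypothetical, and the ratio check is only an assumption modelled on how support_timespec_check_in_range is called in the removed hunk (expected value, observed value, lower and upper factor).

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Assumption: a wrapper such as tst-cpuclock1-timings.c defines this
   to 1 before including the test; the default is functional-only.  */
#ifndef ENABLE_TIMING_CHECKS
# define ENABLE_TIMING_CHECKS 0
#endif

/* Hypothetical helper: convert a timespec to nanoseconds.  */
static int64_t
sketch_timespec_ns (struct timespec ts)
{
  return (int64_t) ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical helper: return 1 if OBSERVED / EXPECTED lies within
   [LOWER, UPPER], else 0.  EXPECTED must not be zero.  */
static int
sketch_in_range (struct timespec expected, struct timespec observed,
                 double lower, double upper)
{
  int64_t expected_ns = sketch_timespec_ns (expected);
  assert (expected_ns != 0);
  double ratio = (double) sketch_timespec_ns (observed) / (double) expected_ns;
  return lower <= ratio && ratio <= upper;
}

/* Return 1 on failure, 0 otherwise.  Without ENABLE_TIMING_CHECKS the
   measured time is only reported, never judged.  */
static int
sketch_check_sleep (struct timespec sleeptime, struct timespec diff)
{
  printf ("nanosleep time %ju.%.9ju\n",
          (uintmax_t) diff.tv_sec, (uintmax_t) diff.tv_nsec);
  if (ENABLE_TIMING_CHECKS && !sketch_in_range (sleeptime, diff, 1.0, 2.0))
    {
      puts ("nanosleep time outside reasonable range");
      return 1;
    }
  return 0;
}
```

With the default of 0, sketch_check_sleep never fails the test, which matches the functional-only variant; a timings variant would define the macro to 1 and would then still need limits that hold up on loaded or virtualized systems.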