From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-4.0 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,RCVD_IN_DNSWL_HI shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 38C82211B3 for ; Wed, 5 Dec 2018 21:36:29 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727575AbeLEVg2 (ORCPT ); Wed, 5 Dec 2018 16:36:28 -0500 Received: from cloud.peff.net ([104.130.231.41]:60240 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1727257AbeLEVg2 (ORCPT ); Wed, 5 Dec 2018 16:36:28 -0500 Received: (qmail 10713 invoked by uid 109); 5 Dec 2018 21:36:28 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.94) with SMTP; Wed, 05 Dec 2018 21:36:28 +0000 Authentication-Results: cloud.peff.net; auth=none Received: (qmail 3313 invoked by uid 111); 5 Dec 2018 21:35:55 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.94) with (ECDHE-RSA-AES256-GCM-SHA384 encrypted) SMTP; Wed, 05 Dec 2018 16:35:55 -0500 Authentication-Results: peff.net; auth=none Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Wed, 05 Dec 2018 16:36:26 -0500 Date: Wed, 5 Dec 2018 16:36:26 -0500 From: Jeff King To: SZEDER =?utf-8?B?R8OhYm9y?= Cc: git@vger.kernel.org Subject: Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load Message-ID: <20181205213625.GD19936@sigill.intra.peff.net> References: <20181204163457.15717-1-szeder.dev@gmail.com> <20181204163457.15717-4-szeder.dev@gmail.com> <20181205054408.GE12284@sigill.intra.peff.net> <20181205103454.GJ30222@szeder.dev> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <20181205103454.GJ30222@szeder.dev> Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Wed, Dec 05, 2018 at 11:34:54AM +0100, SZEDER Gábor wrote: > > Just a quick reply to this one point for now: > > On Wed, Dec 05, 2018 at 12:44:09AM -0500, Jeff King wrote: > > On Tue, Dec 04, 2018 at 05:34:57PM +0100, SZEDER Gábor wrote: > > > + job_nr=0 > > > + while test $job_nr -lt "$job_count" > > > + do > > > + wait > > > + job_nr=$(($job_nr + 1)) > > > + done > > > > Do we need to loop? Calling "wait" with no arguments should wait for all > > children. > > It should, but in dash, ksh, ksh93, Bash v4.2 or older it doesn't seem > to do so, at least not on my machine, and I get back my shell prompt > right after hitting ctrl-C or the first failed test, and then get the > progress output from the remaining jobs later. That's weird. I run my stress script under dash with a single "wait", and it handles the failing case just fine. And switching to a single "wait" in your script works for me. But the ^C case is interesting. Try running your script under "sh -x" and hitting ^C. The signal interrupts the first wait. In my script (or yours modified to use a single wait), we then proceed immediately to the "exit", and get output from the subshells after we've exited. But in your script with the loop, the first wait is interrupted, and then the _second_ wait (in the second loop iteration) picks up all of the exiting processes. The subsequent waits are all noops that return immediately, because there are no processes left. So the right number of waits is either "1" or "2". Looping means we do too many (which is mostly a harmless noop) or too few (in the off chance that you have only a single job ;) ). So it works out in practice. But I think a more clear way to do it is to simply wait in the signal trap, to cover the case when our original wait was aborted. I.e., something like this: diff --git a/t/test-lib.sh b/t/test-lib.sh index 9b7f687396..5e0c41626f 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -98,7 +98,7 @@ done,*) mkdir -p "$(dirname "$TEST_RESULTS_BASE")" stressfail="$TEST_RESULTS_BASE.stress-failed" rm -f "$stressfail" - trap 'echo aborted >"$stressfail"' TERM INT HUP + trap 'echo aborted >"$stressfail"; wait' TERM INT HUP job_nr=0 while test $job_nr -lt "$job_count" @@ -128,12 +128,7 @@ done,*) job_nr=$(($job_nr + 1)) done - job_nr=0 - while test $job_nr -lt "$job_count" - do - wait - job_nr=$(($job_nr + 1)) - done + wait if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted" then which behaves well for me in all cases. > Bash 4.3 or later are strange: I get back the shell prompt immediately > after ctrl-C as well, so it doesn't appear to be waiting for all > remaining jobs to finish either, but! I don't get any of the progress > output from those jobs to mess up my next command. Interesting. My bash 4.4 seems to behave the same as dash. It almost sounds like the SIGINT is getting passed to the subshells for you. Probably not really worth digging into, though. In case anybody is playing with this and needed to simulate a stress failure, here's what I used: diff --git a/t/t0000-basic.sh b/t/t0000-basic.sh index b6566003dd..a370cd9977 100755 --- a/t/t0000-basic.sh +++ b/t/t0000-basic.sh @@ -1171,4 +1171,11 @@ test_expect_success 'very long name in the index handled sanely' ' test $len = 4098 ' +test_expect_success 'roll those dice' ' + case "$(openssl rand -base64 1)" in + z*) + return 1 + esac +' + test_done