Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

From: Jeff King <peff@peff.net>
To: "SZEDER Gábor" <szeder.dev@gmail.com>
Cc: git@vger.kernel.org
Subject: Re: [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load
Date: Wed, 5 Dec 2018 16:56:21 -0500	[thread overview]
Message-ID: <20181205215621.GE19936@sigill.intra.peff.net> (raw)
In-Reply-To: <20181205140106.GM30222@szeder.dev>

On Wed, Dec 05, 2018 at 03:01:06PM +0100, SZEDER Gábor wrote:

> > >   - Make '--stress' imply '--verbose-log' and discard the test's
> > >     standard ouput and error; dumping the output of several parallel
> > >     tests to the terminal would create a big ugly mess.
> > 
> > Makes sense. My script just redirects the output to a file, but it
> > predates --verbose-log.
> > 
> > My script always runs with "-x". I guess it's not that hard to add it if
> > you want, but it is annoying to finally get a failure after several
> > minutes only to realize that your are missing some important
> > information. ;)
> > 
> > Ditto for "-i", which leaves more readable output (you know the
> > interesting part is at the end of the log), and leaves a trash directory
> > state that is more likely to be useful for inspecting.
> 
> I wanted to imply only the bare minimum of options that are necessary
> to prevent the tests from stomping on each other.  Other than that I
> agree and wouldn't run '--stress' without '-x -i' myself, and at one
> point considered to recommend '-x -i' in the description of
> '--stress'.

Yeah, it probably is the right thing to make the options as orthogonal
as possible, and let the user decide what they want. I was just
wondering if I could replace my "stress" script with this. But I think
I'd still want it to remember to use a sane set of options (including
"--root" in my case).

> > > 'wait' for all parallel jobs before exiting (either because a failure
> > > was found or because the user lost patience and aborted the stress
> > > test), allowing the still running tests to finish.  Otherwise the "OK
> > > X.Y" progress output from the last iteration would likely arrive after
> > > the user got back the shell prompt, interfering with typing in the
> > > next command.  OTOH, this waiting might induce a considerable delay
> > > between hitting ctrl-C and the test actually exiting; I'm not sure
> > > this is the right tradeoff.
> > 
> > If there is a delay, it's actually quite annoying to _not_ wait; you
> > start doing something in the terminal, and then a bunch of extra test
> > statuses print at a random time.
> 
> At least the INT/TERM/etc. handler should say something like "Waiting
> for background jobs to finish", to let the user know why it doesn't
> exit right away.

That seems reasonable. There's also no point in waiting for them to
finish if we know we're aborting anyway. Could we just kill them all?

I guess it's a little tricky, because $! is going to give us the pid of
each subshell. We actually want to kill the test sub-process. That takes
a few contortions, but the output is nice (every sub-job immediately
says "ABORTED ...", and the final process does not exit until the whole
tree is done):

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 9b7f687396..357dead3ff 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -98,8 +98,9 @@ done,*)
 	mkdir -p "$(dirname "$TEST_RESULTS_BASE")"
 	stressfail="$TEST_RESULTS_BASE.stress-failed"
 	rm -f "$stressfail"
-	trap 'echo aborted >"$stressfail"' TERM INT HUP
+	trap 'echo aborted >"$stressfail"; kill $job_pids 2>/dev/null; wait' TERM INT HUP
 
+	job_pids=
 	job_nr=0
 	while test $job_nr -lt "$job_count"
 	do
@@ -108,10 +109,13 @@ done,*)
 			GIT_TEST_STRESS_JOB_NR=$job_nr
 			export GIT_TEST_STRESS_STARTED GIT_TEST_STRESS_JOB_NR
 
+			trap 'kill $test_pid 2>/dev/null' TERM
+
 			cnt=0
 			while ! test -e "$stressfail"
 			do
-				if $TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1
+				$TEST_SHELL_PATH "$0" "$@" >/dev/null 2>&1 & test_pid=$!
+				if wait
 				then
 					printf >&2 "OK   %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
 				elif test -f "$stressfail" &&
@@ -124,16 +128,11 @@ done,*)
 				fi
 				cnt=$(($cnt + 1))
 			done
-		) &
+		) & job_pids="$job_pids $!"
 		job_nr=$(($job_nr + 1))
 	done
 
-	job_nr=0
-	while test $job_nr -lt "$job_count"
-	do
-		wait
-		job_nr=$(($job_nr + 1))
-	done
+	wait
 
 	if test -f "$stressfail" && test "$(cat "$stressfail")" != "aborted"
 	then

> > > +				elif test -f "$stressfail" &&
> > > +				     test "$(cat "$stressfail")" = "aborted"
> > > +				then
> > > +					printf >&2 "ABORTED %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
> > > +				else
> > > +					printf >&2 "FAIL %2d.%d\n" $GIT_TEST_STRESS_JOB_NR $cnt
> > > +					echo $GIT_TEST_STRESS_JOB_NR >>"$stressfail"
> > > +				fi
> > 
> > Hmm. What happens if we see a failure _and_ there's an abort? That's
> > actually pretty plausible if you see a failure whiz by, and you want to
> > abort the remaining scripts because they take a long time to run.
> > 
> > If the abort happens first, then the goal is to assume any errors are
> > due to the abort.
> 
> Yeah, that's why I added this ABORTED case.  When I hit ctrl-C, a
> couple of the tests tend to fail, but that's not the rare failure we
> are hoping to find when stress testing, so printing FAIL would be
> misleading.  The test script exits with 1 all the same, so we can't
> tell the difference, but since rare failures are, well, rare, this
> failure is most likely due to the abort.

With the patch as posted I actually see most of them still say "OK",
because the SIGINT does not make it to them (it would be a different
story if I killed the whole process group).

But with the modification above, I get "ABORTED" for most, which is what
I'd want. And it should be pretty race-free, since the "aborted" file is
put into place before triggering the signal cascade.

> > If the fail happens first (which I think is the more likely case), then
> > the abort handler will overwrite the file with "aborted", and we'll
> > forget that there was a real failure. This works in my script because of
> > the symlinking (if a real failure symlinked $fail to a directory, then
> > the attempt to write "aborted" will just be a noop).
> 
> OK, I think the abort handler should append, not overwrite, and then
> the file's contents should be checked e.g. with a case statement
> looking for *aborted*.
> 
> Or use two separate files for signaling abort and test failure.

Two separate files seems a lot simpler to me.

> > In my experience, outputting the failed log saves a step (especially
> > with "-i"), since seeing the failed test and its output is often
> > sufficient.
> 
> I just don't like when test output, especially with '-x', is dumped on
> my terminal :)

Don't you then use your terminal to display the file? ;) (I'm kidding, I
assume you load it in "less" or whatever, which is distinct).

I wish this could be optional somehow, though. I really prefer the dump.
I guess if I continue to use my script as a wrapper, it can dump for me
(after digging in the $stressfail file, I guess).

> > I'm also sad to lose the symlink to the failed trash directory, which
> > saves cutting and pasting when you have to inspect it. But I guess we
> > can't rely on symlinks everywhere.
> 
> You can tab-complete most of the trash directory, and then there are
> only those 1-2-3 digits of the job number that you have to type.  That
> worked out well enough for me so far (but I've only used it for a few
> days, so...).
> 
> We could rename the trash directory of the failed test to something
> short, sweet, and common, but that could be racy in the unlikely case
> that two tests fail at the same time.  And I like the 'trash
> directory.' prefix, because then 'make clean' will delete that one,
> too, without modifying 't/Makefile' (which is also the reason for
> adding the 'stress-N' suffix instead of a prefix, and putting the fail
> file into the 'test-results' directory).

We could rename it to "trash directory.xxx.failed", which is at least
predictable. That can even be done atomically, though I think it's a
little tricky to do portably with shell tools. I guess I can continue to
do the symlink thing in my wrapper script.

-Peff

next prev parent reply	other threads:[~2018-12-05 21:56 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-04 16:34 [RFC PATCH 0/3] test-lib: add the '--stress' option to help reproduce occasional failures in flaky tests SZEDER Gábor
2018-12-04 16:34 ` [PATCH 1/3] test-lib: consolidate naming of test-results paths SZEDER Gábor
2018-12-05  4:57   ` Jeff King
2018-12-04 16:34 ` [PATCH 2/3] test-lib-functions: introduce the 'test_set_port' helper function SZEDER Gábor
2018-12-05  5:17   ` Jeff King
2018-12-05 12:20     ` SZEDER Gábor
2018-12-05 21:59       ` Jeff King
2018-12-04 16:34 ` [RFC PATCH 3/3] test-lib: add the '--stress' option to run a test repeatedly under load SZEDER Gábor
2018-12-04 17:04   ` Ævar Arnfjörð Bjarmason
2018-12-04 17:37     ` SZEDER Gábor
2018-12-05  5:46     ` Jeff King
2018-12-04 18:11   ` Ævar Arnfjörð Bjarmason
2018-12-05  5:50     ` Jeff King
2018-12-05 12:07     ` SZEDER Gábor
2018-12-05 14:01       ` Ævar Arnfjörð Bjarmason
2018-12-05 14:39         ` SZEDER Gábor
2018-12-05 19:59           ` Ævar Arnfjörð Bjarmason
2018-12-05  5:44   ` Jeff King
2018-12-05 10:34     ` SZEDER Gábor
2018-12-05 21:36       ` Jeff King
2018-12-06  0:22         ` Junio C Hamano
2018-12-06  5:35           ` Jeff King
2018-12-06  6:41             ` Junio C Hamano
2018-12-06 22:56         ` SZEDER Gábor
2018-12-07  1:03           ` Jeff King
2018-12-05 14:01     ` SZEDER Gábor
2018-12-05 21:56       ` Jeff King [this message]
2018-12-06 23:10         ` SZEDER Gábor
2018-12-07  1:14           ` Jeff King
2018-12-09 22:56 ` [PATCH v2 0/7] test-lib: add the '--stress' option to help reproduce occasional failures in flaky tests SZEDER Gábor
2018-12-09 22:56   ` [PATCH v2 1/7] test-lib: translate SIGTERM and SIGHUP to an exit SZEDER Gábor
2018-12-11 10:57     ` Jeff King
2018-12-09 22:56   ` [PATCH v2 2/7] test-lib: parse some --options earlier SZEDER Gábor
2018-12-11 11:09     ` Jeff King
2018-12-11 12:42       ` SZEDER Gábor
2018-12-17 21:44         ` Jeff King
2018-12-30 19:04           ` SZEDER Gábor
2019-01-03  4:53             ` Jeff King
2018-12-09 22:56   ` [PATCH v2 3/7] test-lib: consolidate naming of test-results paths SZEDER Gábor
2018-12-09 22:56   ` [PATCH v2 4/7] test-lib: set $TRASH_DIRECTORY earlier SZEDER Gábor
2018-12-09 22:56   ` [PATCH v2 5/7] test-lib: extract Bash version check for '-x' tracing SZEDER Gábor
2018-12-09 22:56   ` [PATCH v2 6/7] test-lib-functions: introduce the 'test_set_port' helper function SZEDER Gábor
2018-12-09 22:56   ` [PATCH v2 7/7] test-lib: add the '--stress' option to run a test repeatedly under load SZEDER Gábor
2018-12-10  1:34     ` [PATCH] fixup! " SZEDER Gábor
2018-12-11 11:16   ` [PATCH v2 0/7] test-lib: add the '--stress' option to help reproduce occasional failures in flaky tests Jeff King
2018-12-30 19:16   ` [PATCH v3 0/8] " SZEDER Gábor
2018-12-30 19:16     ` [PATCH v3 1/8] test-lib: translate SIGTERM and SIGHUP to an exit SZEDER Gábor
2018-12-30 19:16     ` [PATCH v3 2/8] test-lib: parse options in a for loop to keep $@ intact SZEDER Gábor
2018-12-30 19:16     ` [PATCH v3 3/8] test-lib: parse command line options earlier SZEDER Gábor
2018-12-30 19:16     ` [PATCH v3 4/8] test-lib: consolidate naming of test-results paths SZEDER Gábor
2018-12-30 19:16     ` [PATCH v3 5/8] test-lib: set $TRASH_DIRECTORY earlier SZEDER Gábor
2018-12-30 22:44       ` SZEDER Gábor
2018-12-30 22:48         ` [PATCH v3.1 " SZEDER Gábor
2018-12-30 19:16     ` [PATCH v3 6/8] test-lib: extract Bash version check for '-x' tracing SZEDER Gábor
2018-12-31 17:14       ` Carlo Arenas
2018-12-30 19:16     ` [PATCH v3 7/8] test-lib-functions: introduce the 'test_set_port' helper function SZEDER Gábor
2018-12-30 19:16     ` [PATCH v3 8/8] test-lib: add the '--stress' option to run a test repeatedly under load SZEDER Gábor
2019-01-05  1:08     ` [PATCH v4 0/8] test-lib: add the '--stress' option to help reproduce occasional failures in flaky tests SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 1/8] test-lib: translate SIGTERM and SIGHUP to an exit SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 2/8] test-lib: extract Bash version check for '-x' tracing SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 3/8] test-lib: parse options in a for loop to keep $@ intact SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 4/8] test-lib: parse command line options earlier SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 5/8] test-lib: consolidate naming of test-results paths SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 6/8] test-lib: set $TRASH_DIRECTORY earlier SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 7/8] test-lib-functions: introduce the 'test_set_port' helper function SZEDER Gábor
2019-01-05  1:08       ` [PATCH v4 8/8] test-lib: add the '--stress' option to run a test repeatedly under load SZEDER Gábor
2019-01-07  8:49       ` [PATCH v4 0/8] test-lib: add the '--stress' option to help reproduce occasional failures in flaky tests Jeff King

find likely ancestor, descendant, or conflicting patches for this message:
dfblob:9b7f68739 dfblob:357dead3f
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20181205215621.GE19936@sigill.intra.peff.net \
    --to=peff@peff.net \
    --cc=git@vger.kernel.org \
    --cc=szeder.dev@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).