[PATCH] test-lib: add ability to cap the runtime of tests

git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed

* [PATCH] test-lib: add ability to cap the runtime of tests
@ 2017-06-03 22:13 Ævar Arnfjörð Bjarmason
  2017-06-04  0:31 ` Junio C Hamano
  0 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-03 22:13 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason

Add a GIT_TEST_TIMEOUT environment variable which optimistically sets
an approximate upper limit on how long any one test is allowed to
run.

Once the timeout is exceeded all remaining tests are skipped, no
attempt is made to stop a long running test in-progress, or deal with
the edge case of the epoch changing the epoch from under us by
e.g. ntpd.

On my machine median runtime for a test is around 150ms, but 8 tests
take more than 10 seconds to run, with t3404-rebase-interactive.sh
taking 18 seconds.

On a machine with a large number of cores these long-tail tests become
the limiting factor in how long it takes to run the entire test suite,
even if it's run with "prove --state=slow,save". This is because the
first long-running tests started at the very beginning will still be
running by the time the rest of the test suite finishes.

Speeding up the test suite by simply cataloging and skipping tests
that take longer than N seconds is a hassle to maintain, and entirely
skips some tests which would be nice to at least partially run,
e.g. instead of entirely skipping t3404-rebase-interactive.sh we can
run it for N seconds and get at least some "git rebase -i" test
coverage in a fast test run.

On a 56 core Xeon E5-2690 v4 @ 2.60GHz the runtime for the test suite
is cut in half with GIT_TEST_TIMEOUT=10 under prove -j56
t[0-9]*.sh. Approximate time to run all the tests with various
GIT_TEST_TIMEOUT settings[1]:

    N/A     30s
    20      20s
    10      15s
    5       12s
    1       12s

Setting a timeout lower than 5-10 seconds or so starts to reach
diminishing returns, e.g. t7063-status-untracked-cache.sh always takes
at least 6 seconds to run since it's blocking on a single
"update-index --test-untracked-cache" command. So there's room for
improvement, but this simple facility gives us most of the benefits.

The number of tests on the aforementioned machine which are run with
the various timeouts[2]:

    N/A     16261
    20      16037
    10      14972
    5       13952
    1       8409

While running with a timeout of 10 seconds cuts the runtime in half,
over 92% of the tests are still run. The test coverage is higher than
that number indicates, just taking into account the many similar tests
t0027-auto-crlf.sh runs brings it up to 95%.

1. for t in '' 20 10; do printf "%s\t" $t && (time GIT_TEST_TIMEOUT=$t prove -j$(parallel --number-of-cores) --state=slow,save t[0-9]*.sh) 2>&1 | grep ^real | grep -E -o '[0-9].*'; done
2. for t in '' 20 10 5 1; do printf "%s\t" $t && (time GIT_TEST_TIMEOUT=$t prove -j$(parallel --number-of-cores) --state=slow,save -v t[0-9]*.sh) 2>&1 | grep -e ^real -e '^1\.\.' | sed 's/^1\.\.//' | awk '{s+=$1} END {print s}'; done

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 t/test-lib.sh | 20 +++++++++++++++++++-
 1 file changed, 19 insertions(+), 1 deletion(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 4936725c67..0353e73873 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -15,6 +15,13 @@
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see http://www.gnu.org/licenses/ .

+# If we have a set max runtime record the startup time before anything
+# else is done.
+if test -n "$GIT_TEST_TIMEOUT"
+then
+	TEST_STARTUP_TIME=$(date +%s)
+fi
+
 # Test the binaries we have just built.  The tests are kept in
 # t/ subdirectory and are run in 'trash directory' subdirectory.
 if test -z "$TEST_DIRECTORY"
@@ -689,11 +696,22 @@ test_skip () {
 		to_skip=t
 		skipped_reason="--run"
 	fi
+	if test -n "$GIT_TEST_TIMEOUT" &&
+		test "$(($(date +%s) - $TEST_STARTUP_TIME))" -ge "$GIT_TEST_TIMEOUT"
+	then
+		to_skip=all
+		skipped_reason="Exceeded GIT_TEST_TIMEOUT in runtime"
+	fi

 	case "$to_skip" in
-	t)
+	all|t)
 		say_color skip >&3 "skipping test: $@"
 		say_color skip "ok $test_count # skip $1 ($skipped_reason)"
+		if test "$to_skip" = all
+		then
+			skip_all="$skipped_reason"
+			test_done
+		fi
 		: true
 		;;
 	*)
-- 
2.13.0.506.g27d5fe0cd

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-03 22:13 [PATCH] test-lib: add ability to cap the runtime of tests Ævar Arnfjörð Bjarmason
@ 2017-06-04  0:31 ` Junio C Hamano
  2017-06-04  7:29   ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 12+ messages in thread
From: Junio C Hamano @ 2017-06-04  0:31 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: git

Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Speeding up the test suite by simply cataloging and skipping tests
> that take longer than N seconds is a hassle to maintain, and entirely
> skips some tests which would be nice to at least partially run,
> e.g. instead of entirely skipping t3404-rebase-interactive.sh we can
> run it for N seconds and get at least some "git rebase -i" test
> coverage in a fast test run.

I'd be more supportive to the former approach in the longer run for
two reasons.

Is it even safe to stop a test in the middle?  Won't we leave
leftover server processes, for example?

    I see start_httpd at least sets up "trap" to call stop_httpd
    when the shell exits, so HTTP testing via lib-httpd.sh may be
    safe.  I do not know about other network-y tests, though.

Granted, when a test fails, we already have the same problem, but
then we'd go in and investigate, and the first thing we notice would
be that the old leftover server instance is holding onto the port to
prevent the attempt to re-run the test from running, which then we'd
kill.  But with this option, the user is not even made aware of
tests being killed in the middle.

> While running with a timeout of 10 seconds cuts the runtime in half,
> over 92% of the tests are still run. The test coverage is higher than
> that number indicates, just taking into account the many similar tests
> t0027-auto-crlf.sh runs brings it up to 95%.

I certainly understand that but in the longer term, I'd prefer the
approach to call out an overly large test.  That will hopefully
motivate us to split it (or speed up the thing) to help folks on
many-core machines.

I am afraid that the proposed change will disincentivize that by
sweeping the problematic ones under the rug.  Perhaps you can
collect what tests are terminated in the middle because they run for
too long and show the list of them at the end, or something?

Also, I thought that it was a no-no to say "to_skil=all" with
skipped-reason in the middle of a test when the test is run under
prove?

Oh, by the way, is "date +%s" even portable?  I thought not.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-04  0:31 ` Junio C Hamano
@ 2017-06-04  7:29   ` Ævar Arnfjörð Bjarmason
  2017-06-05  1:55     ` Junio C Hamano
  2017-06-05 13:17     ` Lars Schneider
  0 siblings, 2 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-04  7:29 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List

On Sun, Jun 4, 2017 at 2:31 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>
>> Speeding up the test suite by simply cataloging and skipping tests
>> that take longer than N seconds is a hassle to maintain, and entirely
>> skips some tests which would be nice to at least partially run,
>> e.g. instead of entirely skipping t3404-rebase-interactive.sh we can
>> run it for N seconds and get at least some "git rebase -i" test
>> coverage in a fast test run.
>
> I'd be more supportive to the former approach in the longer run for
> two reasons.
>
> Is it even safe to stop a test in the middle?  Won't we leave
> leftover server processes, for example?
>
>     I see start_httpd at least sets up "trap" to call stop_httpd
>     when the shell exits, so HTTP testing via lib-httpd.sh may be
>     safe.  I do not know about other network-y tests, though.

When this flag is in effect and you run into the timeout the code is
semantically equivalent to not running subsequent test_expect_*
blocks, things like the trap in lib-httpd.sh will still run, so will
test_when_finished.

Unless we have some test killing a daemon in a test_expect_success
block later in the test this'll work as intended.

> Granted, when a test fails, we already have the same problem, but
> then we'd go in and investigate, and the first thing we notice would
> be that the old leftover server instance is holding onto the port to
> prevent the attempt to re-run the test from running, which then we'd
> kill.  But with this option, the user is not even made aware of
> tests being killed in the middle.
>
>> While running with a timeout of 10 seconds cuts the runtime in half,
>> over 92% of the tests are still run. The test coverage is higher than
>> that number indicates, just taking into account the many similar tests
>> t0027-auto-crlf.sh runs brings it up to 95%.
>
> I certainly understand that but in the longer term, I'd prefer the
> approach to call out an overly large test.  That will hopefully
> motivate us to split it (or speed up the thing) to help folks on
> many-core machines.

The reason I didn't document this in t/README was because I thought it
made sense to have this as a mostly hidden feature that end users
wouldn't be tempted to fiddle with, but would be useful to someone
doing git development.

Realistically I'm going to submit this patch, I'm not going to take
the much bigger project of refactoring the entire test suite so that
no test runs under N second, and of course any such refactoring can
only aim for a fixed instead of dynamic N.

The point of this change is that I can replace running e.g. "prove
t[0-9]*{grep,log}*.sh" with just running the full test suite every
time, since 30s is noticeably slow during regular hacking but once
it's down to 15s it's perceptively fast enough.

Reading between the lines in your reply, I think you're afraid that
regular users just testing git out will start using this, as opposed
to power user developers who understand the trade-offs. I think that's
mostly mitigated by not documenting it in t/README, but I could amend
the patch to add some scary commend to test-lib.sh as well.

> I am afraid that the proposed change will disincentivize that by
> sweeping the problematic ones under the rug.  Perhaps you can
> collect what tests are terminated in the middle because they run for
> too long and show the list of them at the end, or something?

This change incentivizes  me to be regularly running a larger % of the
full test suite.

Collecting the skipped ones is easy enough to do with a grep + for
loop, so I don't think it's worth making the implementation more
complex to occasionally answer the question of how many tests were
skipped due to running into the timeout:

$ rm .prove; for t in 20 10 5 1; do printf "%s\t" $t && (time
GIT_TEST_TIMEOUT=$t prove -j$(parallel --number-of-cores)
--state=slow,save -v t[0-9]*.sh) 2>&1 | grep -c "Exceeded
GIT_TEST_TIMEOUT"; done
rm: cannot remove ‘.prove’: No such file or directory
20      4
10      36
5       80
1       509

Of course that gives you "how many tests had skipped tests", now how
many test_expect_* blocks were skipped. An earlier WIP version of this
did the former, but e.g. running the rest of t0027-auto-crlf.sh took
many seconds just do spew out hundreds/thousands of lines in a shell
loop emitting "skip" lines, so I went with the to_skip=all
implementation.

> Also, I thought that it was a no-no to say "to_skil=all" with
> skipped-reason in the middle of a test when the test is run under
> prove?

TAP has two main modes of operation, you can either declare that
you're going to run N tests in advance and then you must run N, this
makes prove report progress on your tests as they run.

Or you can just run in a mode where you stream out however many tests
you're going to run as you go along, and then print "1..NUM_TESTS" at
the end.

We use the latter, so we can abort the entire test suite at any time
with test_done, that's what this change does.

> Oh, by the way, is "date +%s" even portable?  I thought not.

The lib-git-p4.sh lib says not, and shells out to python's time() is a
workaround, I could replace this with perl -e 'print time', but
thought it wasn't worth bothering with for an obscure optional feature
like this.

Since 6a9d16a0a8 ("filter-branch: add passed/remaining seconds on
progress", 2015-09-07) git-filter-branch relies on `date +%s`.

I suspect Solaris users are just setting a GNU/updated toolpath in
their $PATH, and worrying about this isn't worth bothering with,
especially for this sort of thing.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-04  7:29   ` Ævar Arnfjörð Bjarmason
@ 2017-06-05  1:55     ` Junio C Hamano
  2017-06-05  5:48       ` Christian Couder
  2017-06-07 10:24       ` Jeff King
  2017-06-05 13:17     ` Lars Schneider
  1 sibling, 2 replies; 12+ messages in thread
From: Junio C Hamano @ 2017-06-05  1:55 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Git Mailing List

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

>> I certainly understand that but in the longer term, I'd prefer the
>> approach to call out an overly large test.  That will hopefully
>> motivate us to split it (or speed up the thing) to help folks on
>> many-core machines.
>
> The reason I didn't document this in t/README was because I thought it
> made sense to have this as a mostly hidden feature that end users
> wouldn't be tempted to fiddle with, but would be useful to someone
> doing git development.
>
> Realistically I'm going to submit this patch, I'm not going to take
> the much bigger project of refactoring the entire test suite so that
> no test runs under N second, and of course any such refactoring can
> only aim for a fixed instead of dynamic N.

I do not expect any single person to tackle the splitting.  I just
wished that a patch inspired by this patch (or better yet, a new
version of this patch) made the tail end of "make test" output to
read like this:

   ...
   [18:32:44] t9400-git-cvsserver-server.sh ...... ok    18331 ms
   [18:32:49] t9402-git-cvsserver-refs.sh ........ ok    22902 ms
   [18:32:49] t9200-git-cvsexportcommit.sh ....... ok    25163 ms
   [18:32:51]
   All tests successful.
   Files=785, Tests=16928, 122 wallclock secs ( ...
   Result: PASS

   * The following tests took longer than 15 seconds to run.  We
     may want to look into splitting them into smaller files.

   t3404-rebase-interactive.sh ...    19 secs
   t9001-send-email.sh ...........    22 secs
   t9402-git-cvsserver-refs.sh ...    22 secs
   t9200-git-cvsexportcommit.sh ..    25 secs

when the hidden feature is _not_ used, so that wider set of people
will be forced to see that some tests take inordinate amount of
time, and entice at least some of them to look into it.

> Collecting the skipped ones is easy enough to do with a grep + for
> loop, so I don't think it's worth making the implementation more
> complex to occasionally answer the question of how many tests were
> skipped due to running into the timeout

"Easy enough" and "made to stand out so _NO_ effort is needed to
see" are very different things.

> Or you can just run in a mode where you stream out however many tests
> you're going to run as you go along, and then print "1..NUM_TESTS" at
> the end.
>
> We use the latter, so we can abort the entire test suite at any time
> with test_done, that's what this change does.

cf. bf4b7219 ("test-lib.sh: Add check for invalid use of 'skip_all'
facility", 2012-09-01)

>> Oh, by the way, is "date +%s" even portable?  I thought not.
>
> The lib-git-p4.sh lib says not, and shells out to python's time() is a
> workaround, I could replace this with perl -e 'print time', but
> thought it wasn't worth bothering with for an obscure optional feature
> like this.

Let's do the right thing, because doing so is easy.

I personally think that filter-branch being broken is not noticed
only because it is not very often used, as opposed to that we want
to encourage those who are following along with us, especially those
who are on minority platforms, to run our tests every day.  Let's
not spread sloppyness unnecessarily.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-05  1:55     ` Junio C Hamano
@ 2017-06-05  5:48       ` Christian Couder
  2017-06-05 18:56         ` Stefan Beller
  2017-06-07 10:24       ` Jeff King
  1 sibling, 1 reply; 12+ messages in thread
From: Christian Couder @ 2017-06-05  5:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Mon, Jun 5, 2017 at 3:55 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:
>
>> Realistically I'm going to submit this patch, I'm not going to take
>> the much bigger project of refactoring the entire test suite so that
>> no test runs under N second, and of course any such refactoring can
>> only aim for a fixed instead of dynamic N.
>
> I do not expect any single person to tackle the splitting.  I just
> wished that a patch inspired by this patch (or better yet, a new
> version of this patch) made the tail end of "make test" output to
> read like this:
>
>    ...
>    [18:32:44] t9400-git-cvsserver-server.sh ...... ok    18331 ms
>    [18:32:49] t9402-git-cvsserver-refs.sh ........ ok    22902 ms
>    [18:32:49] t9200-git-cvsexportcommit.sh ....... ok    25163 ms
>    [18:32:51]
>    All tests successful.
>    Files=785, Tests=16928, 122 wallclock secs ( ...
>    Result: PASS
>
>    * The following tests took longer than 15 seconds to run.  We
>      may want to look into splitting them into smaller files.
>
>    t3404-rebase-interactive.sh ...    19 secs
>    t9001-send-email.sh ...........    22 secs
>    t9402-git-cvsserver-refs.sh ...    22 secs
>    t9200-git-cvsexportcommit.sh ..    25 secs
>
> when the hidden feature is _not_ used, so that wider set of people
> will be forced to see that some tests take inordinate amount of
> time, and entice at least some of them to look into it.

I wonder if splitting tests would make a good GSoC microproject for next year.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-04  7:29   ` Ævar Arnfjörð Bjarmason
  2017-06-05  1:55     ` Junio C Hamano
@ 2017-06-05 13:17     ` Lars Schneider
  2017-06-05 18:15       ` Ævar Arnfjörð Bjarmason
  1 sibling, 1 reply; 12+ messages in thread
From: Lars Schneider @ 2017-06-05 13:17 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason; +Cc: Junio C Hamano, Git Mailing List


> On 04 Jun 2017, at 09:29, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
> 
> On Sun, Jun 4, 2017 at 2:31 AM, Junio C Hamano <gitster@pobox.com> wrote:
>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>> 
>>> Speeding up the test suite by simply cataloging and skipping tests
>>> that take longer than N seconds is a hassle to maintain, and entirely
>>> skips some tests which would be nice to at least partially run,
>>> e.g. instead of entirely skipping t3404-rebase-interactive.sh we can
>>> run it for N seconds and get at least some "git rebase -i" test
>>> coverage in a fast test run.
>> 
>> I'd be more supportive to the former approach in the longer run for
>> two reasons.
>> 
>> Is it even safe to stop a test in the middle?  Won't we leave
>> leftover server processes, for example?
>> 
>>    I see start_httpd at least sets up "trap" to call stop_httpd
>>    when the shell exits, so HTTP testing via lib-httpd.sh may be
>>    safe.  I do not know about other network-y tests, though.
> 
> When this flag is in effect and you run into the timeout the code is
> semantically equivalent to not running subsequent test_expect_*
> blocks, things like the trap in lib-httpd.sh will still run, so will
> test_when_finished.
> 
> Unless we have some test killing a daemon in a test_expect_success
> block later in the test this'll work as intended.
> 
>> Granted, when a test fails, we already have the same problem, but
>> then we'd go in and investigate, and the first thing we notice would
>> be that the old leftover server instance is holding onto the port to
>> prevent the attempt to re-run the test from running, which then we'd
>> kill.  But with this option, the user is not even made aware of
>> tests being killed in the middle.
>> 
>>> While running with a timeout of 10 seconds cuts the runtime in half,
>>> over 92% of the tests are still run. The test coverage is higher than
>>> that number indicates, just taking into account the many similar tests
>>> t0027-auto-crlf.sh runs brings it up to 95%.
>> 
>> I certainly understand that but in the longer term, I'd prefer the
>> approach to call out an overly large test.  That will hopefully
>> motivate us to split it (or speed up the thing) to help folks on
>> many-core machines.
> 
> The reason I didn't document this in t/README was because I thought it
> made sense to have this as a mostly hidden feature that end users
> wouldn't be tempted to fiddle with, but would be useful to someone
> doing git development.
> 
> Realistically I'm going to submit this patch, I'm not going to take
> the much bigger project of refactoring the entire test suite so that
> no test runs under N second, and of course any such refactoring can
> only aim for a fixed instead of dynamic N.
> 
> The point of this change is that I can replace running e.g. "prove
> t[0-9]*{grep,log}*.sh" with just running the full test suite every
> time, since 30s is noticeably slow during regular hacking but once
> it's down to 15s it's perceptively fast enough.
> 
> Reading between the lines in your reply, I think you're afraid that
> regular users just testing git out will start using this, as opposed
> to power user developers who understand the trade-offs. I think that's
> mostly mitigated by not documenting it in t/README, but I could amend
> the patch to add some scary commend to test-lib.sh as well.

Maybe I am wrong here, but I would be very surprised if a "regular user"
who does not dive into the Git source code would run the tests at all.

Plus, wouldn't it make sense to mark tests that run longer than 10sec 
on average hardware with "GIT_TEST_LONG"? Wouldn't that solve your
problem in a nice way?

We could use TravisCI as baseline for "average hardware":
https://travis-ci.org/git/git/jobs/239451918

- Lars



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-05 13:17     ` Lars Schneider
@ 2017-06-05 18:15       ` Ævar Arnfjörð Bjarmason
  2017-06-05 19:03         ` Stefan Beller
  0 siblings, 1 reply; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-05 18:15 UTC (permalink / raw)
  To: Lars Schneider; +Cc: Junio C Hamano, Git Mailing List

On Mon, Jun 5, 2017 at 3:17 PM, Lars Schneider <larsxschneider@gmail.com> wrote:
>
>> On 04 Jun 2017, at 09:29, Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote:
>>
>> On Sun, Jun 4, 2017 at 2:31 AM, Junio C Hamano <gitster@pobox.com> wrote:
>>> Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:
>>>
>>>> Speeding up the test suite by simply cataloging and skipping tests
>>>> that take longer than N seconds is a hassle to maintain, and entirely
>>>> skips some tests which would be nice to at least partially run,
>>>> e.g. instead of entirely skipping t3404-rebase-interactive.sh we can
>>>> run it for N seconds and get at least some "git rebase -i" test
>>>> coverage in a fast test run.
>>>
>>> I'd be more supportive to the former approach in the longer run for
>>> two reasons.
>>>
>>> Is it even safe to stop a test in the middle?  Won't we leave
>>> leftover server processes, for example?
>>>
>>>    I see start_httpd at least sets up "trap" to call stop_httpd
>>>    when the shell exits, so HTTP testing via lib-httpd.sh may be
>>>    safe.  I do not know about other network-y tests, though.
>>
>> When this flag is in effect and you run into the timeout the code is
>> semantically equivalent to not running subsequent test_expect_*
>> blocks, things like the trap in lib-httpd.sh will still run, so will
>> test_when_finished.
>>
>> Unless we have some test killing a daemon in a test_expect_success
>> block later in the test this'll work as intended.
>>
>>> Granted, when a test fails, we already have the same problem, but
>>> then we'd go in and investigate, and the first thing we notice would
>>> be that the old leftover server instance is holding onto the port to
>>> prevent the attempt to re-run the test from running, which then we'd
>>> kill.  But with this option, the user is not even made aware of
>>> tests being killed in the middle.
>>>
>>>> While running with a timeout of 10 seconds cuts the runtime in half,
>>>> over 92% of the tests are still run. The test coverage is higher than
>>>> that number indicates, just taking into account the many similar tests
>>>> t0027-auto-crlf.sh runs brings it up to 95%.
>>>
>>> I certainly understand that but in the longer term, I'd prefer the
>>> approach to call out an overly large test.  That will hopefully
>>> motivate us to split it (or speed up the thing) to help folks on
>>> many-core machines.
>>
>> The reason I didn't document this in t/README was because I thought it
>> made sense to have this as a mostly hidden feature that end users
>> wouldn't be tempted to fiddle with, but would be useful to someone
>> doing git development.
>>
>> Realistically I'm going to submit this patch, I'm not going to take
>> the much bigger project of refactoring the entire test suite so that
>> no test runs under N second, and of course any such refactoring can
>> only aim for a fixed instead of dynamic N.
>>
>> The point of this change is that I can replace running e.g. "prove
>> t[0-9]*{grep,log}*.sh" with just running the full test suite every
>> time, since 30s is noticeably slow during regular hacking but once
>> it's down to 15s it's perceptively fast enough.
>>
>> Reading between the lines in your reply, I think you're afraid that
>> regular users just testing git out will start using this, as opposed
>> to power user developers who understand the trade-offs. I think that's
>> mostly mitigated by not documenting it in t/README, but I could amend
>> the patch to add some scary commend to test-lib.sh as well.
>
> Maybe I am wrong here, but I would be very surprised if a "regular user"
> who does not dive into the Git source code would run the tests at all.
>
> Plus, wouldn't it make sense to mark tests that run longer than 10sec
> on average hardware with "GIT_TEST_LONG"? Wouldn't that solve your
> problem in a nice way?
>
> We could use TravisCI as baseline for "average hardware":
> https://travis-ci.org/git/git/jobs/239451918

There would be no point in marking the tests that take over 10s on
average hardware, and it's busywork to maintain those markers.

Also, as the numbers quoted in my patch show 10s is an arbitrary sweet
spot on that particular box, maybe it's 20s on some other hardware, or
5s if someone produces a 84 core box.

This patch is only something someone who has exceptional hardware
would care about. These long running tests in question don't have much
impact on the total CPU time spent on the test suite, the problem,
such as it is, is that they're not split up into more files, and thus
in a sufficiently beefy machine the entirety of the rest of the
parallel test suite can be over by the time these stragglers finish.

That's never going to be a problem on a less beefy machine with
--state=slow,save, since the 30s test is going to be long over by the
time the rest of the tests run.

Cutting down on these long tail tests allows me to e.g. replace this:

    git rebase -i --exec '(make -j56 all && cd t && prove -j56 <some
limited glob>)'

With a glob that runs the entire test suite, with the rebase only
taking marginally longer in most cases while getting much better test
coverage than I'd otherwise bother with.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-05  5:48       ` Christian Couder
@ 2017-06-05 18:56         ` Stefan Beller
  0 siblings, 0 replies; 12+ messages in thread
From: Stefan Beller @ 2017-06-05 18:56 UTC (permalink / raw)
  To: Christian Couder
  Cc: Junio C Hamano, Ævar Arnfjörð Bjarmason,
	Git Mailing List

On Sun, Jun 4, 2017 at 10:48 PM, Christian Couder
<christian.couder@gmail.com> wrote:
>> when the hidden feature is _not_ used, so that wider set of people
>> will be forced to see that some tests take inordinate amount of
>> time, and entice at least some of them to look into it.
>
> I wonder if splitting tests would make a good GSoC microproject for next year.

That is an interesting thought.

It scales pretty well IMHO.
(Have a good student that works quickly
-> we'll have very modular tests at the end,
have a student doing a-ok
-> at least parts are converted and upstreamed)

On the other hand it requires knowledge of a lot of things
(shell scripting, different operating systems come to mind..)

I like the idea.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-05 18:15       ` Ævar Arnfjörð Bjarmason
@ 2017-06-05 19:03         ` Stefan Beller
  2017-06-05 20:37           ` Ævar Arnfjörð Bjarmason
  0 siblings, 1 reply; 12+ messages in thread
From: Stefan Beller @ 2017-06-05 19:03 UTC (permalink / raw)
  To: Ævar Arnfjörð Bjarmason
  Cc: Lars Schneider, Junio C Hamano, Git Mailing List

> That's never going to be a problem on a less beefy machine with
> --state=slow,save, since the 30s test is going to be long over by the
> time the rest of the tests run.
>
> Cutting down on these long tail tests allows me to e.g. replace this:
>
>     git rebase -i --exec '(make -j56 all && cd t && prove -j56 <some
> limited glob>)'
>
> With a glob that runs the entire test suite, with the rebase only
> taking marginally longer in most cases while getting much better test
> coverage than I'd otherwise bother with.

I wonder if this functionality is rather best put into prove?

Also prove doesn't know which tests are "interesting",
e.g. if you were working on interactive rebase, then you really
want the longest test to be run in full?

And this "judge by time, not by interest" doesn't bode well with
me.

I have a non-beefy machine such that this particular problem
doesn't apply to me, but instead the whole test suite takes just
long to run.

For that I reduce testing intelligently, i.e. I know where I am
working on, so I run only some given tests (in case of
submodules I'd go with "prove t74*") which would also fix
your issue IIUC?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-05 19:03         ` Stefan Beller
@ 2017-06-05 20:37           ` Ævar Arnfjörð Bjarmason
  0 siblings, 0 replies; 12+ messages in thread
From: Ævar Arnfjörð Bjarmason @ 2017-06-05 20:37 UTC (permalink / raw)
  To: Stefan Beller; +Cc: Lars Schneider, Junio C Hamano, Git Mailing List

On Mon, Jun 5, 2017 at 9:03 PM, Stefan Beller <sbeller@google.com> wrote:
>> That's never going to be a problem on a less beefy machine with
>> --state=slow,save, since the 30s test is going to be long over by the
>> time the rest of the tests run.
>>
>> Cutting down on these long tail tests allows me to e.g. replace this:
>>
>>     git rebase -i --exec '(make -j56 all && cd t && prove -j56 <some
>> limited glob>)'
>>
>> With a glob that runs the entire test suite, with the rebase only
>> taking marginally longer in most cases while getting much better test
>> coverage than I'd otherwise bother with.
>
> I wonder if this functionality is rather best put into prove?

It would be nice to have a general facility to abort & kill tests
based on some criteria as they're run by Test::Harness, but making
that work reliably with all the edge cases prove needs to deal with
(tens/hundreds of thousands of test suites) is a much bigger project
than this.

> Also prove doesn't know which tests are "interesting",
> e.g. if you were working on interactive rebase, then you really
> want the longest test to be run in full?

If I were hacking rebase or another feature which has such a long
running test then the long running test without the timeout would be
part of my "regular" testing.

The point of this feature is that most tests aren't like that, then
you can use this and do the full test suite every time.

> And this "judge by time, not by interest" doesn't bode well with
> me.

They're not mutually exclusive.

> I have a non-beefy machine such that this particular problem
> doesn't apply to me, but instead the whole test suite takes just
> long to run.
>
> For that I reduce testing intelligently, i.e. I know where I am
> working on, so I run only some given tests (in case of
> submodules I'd go with "prove t74*") which would also fix
> your issue IIUC?

No, because even when you're working on e.g. "grep" something you're
doing occasionally breaks in some completely unrelated test because it
happens to cover an aspect of grep which is not part of the main
tests.

I ran into this recently while hacking the wildmatch() implementation.
There's dozens of tests all over the test suite that'll break in
subtle ways if wildmatch() breaks, often in cases where the main
wildmatch test is still passing.

Running the whole thing, even in a limited timeout fashion, has a much
higher chance of catching whatever I've screwed up earlier, before I
do an occasional full test suite run. Running the tests in 10 or 15s
is a much shorter time to wait for during a edit/compile/test cycle.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-05  1:55     ` Junio C Hamano
  2017-06-05  5:48       ` Christian Couder
@ 2017-06-07 10:24       ` Jeff King
  2017-06-08  0:59         ` Ramsay Jones
  1 sibling, 1 reply; 12+ messages in thread
From: Jeff King @ 2017-06-07 10:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Ævar Arnfjörð Bjarmason, Git Mailing List

On Mon, Jun 05, 2017 at 10:55:42AM +0900, Junio C Hamano wrote:

> I do not expect any single person to tackle the splitting.  I just
> wished that a patch inspired by this patch (or better yet, a new
> version of this patch) made the tail end of "make test" output to
> read like this:
> 
>    ...
>    [18:32:44] t9400-git-cvsserver-server.sh ...... ok    18331 ms
>    [18:32:49] t9402-git-cvsserver-refs.sh ........ ok    22902 ms
>    [18:32:49] t9200-git-cvsexportcommit.sh ....... ok    25163 ms
>    [18:32:51]
>    All tests successful.
>    Files=785, Tests=16928, 122 wallclock secs ( ...
>    Result: PASS
> 
>    * The following tests took longer than 15 seconds to run.  We
>      may want to look into splitting them into smaller files.
> 
>    t3404-rebase-interactive.sh ...    19 secs
>    t9001-send-email.sh ...........    22 secs
>    t9402-git-cvsserver-refs.sh ...    22 secs
>    t9200-git-cvsexportcommit.sh ..    25 secs
> 
> when the hidden feature is _not_ used, so that wider set of people
> will be forced to see that some tests take inordinate amount of
> time, and entice at least some of them to look into it.

If you use "prove", it already records this information, and it can
print it with "--timer".

I don't use that myself, though. What's much more interesting (but which
I haven't found a way to get prove to do out-of-the-box) is to show the
longest tests after the fact. I gave a perl snippet to do so in

  http://public-inbox.org/git/20161019205638.m3ytxozzmeh47ml2@sigill.intra.peff.net/

I've actually played with this splitting before, but on my quad-core
(plus hyperthreading) box, I could never get it to make any improvement
once --state=slow was used. The longest test for me is 28s, but the
whole suite takes 50s to run. The slow tests get front-loaded, and then
by the end we have lots of little tests to hand out to each processor
and they all stay busy.

The main difference with Ævar's run is that he has a huge number of
processors. So I'm in favor of more splitting, but I also doubt that
even most Git developers would see any improvement. Or maybe people
really do have monstrous boxes. I dunno.

The biggest changes I've seen in my runs are:

  1. Use "prove --state=slow,save"; even with a few processors it makes
     a big difference.

  2. Point --root at a RAM disk.

  3. I started using Michael's git-test[1], which I have testing each
     commit on the current branch in the background[2]. That gives me an
     early warning when there's a failure (I have it play a sad trombone
     sound, since it's in a minimized terminal), and when I do ask the
     tests to run, quite often it can answer "all tests pass" out of
     the cache.

-Peff

[1] https://github.com/mhagger/git-test

[2] My hacky script is at https://github.com/peff/git/blob/meta/ci

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH] test-lib: add ability to cap the runtime of tests
  2017-06-07 10:24       ` Jeff King
@ 2017-06-08  0:59         ` Ramsay Jones
  0 siblings, 0 replies; 12+ messages in thread
From: Ramsay Jones @ 2017-06-08  0:59 UTC (permalink / raw)
  To: Jeff King, Junio C Hamano
  Cc: Ævar Arnfjörð Bjarmason, Git Mailing List



On 07/06/17 11:24, Jeff King wrote:
> On Mon, Jun 05, 2017 at 10:55:42AM +0900, Junio C Hamano wrote:
> 
> If you use "prove", it already records this information, and it can
> print it with "--timer".

Yes I have

    DEFAULT_TEST_TARGET=prove
    GIT_PROVE_OPTS='--timer'

in my config.mak and run the test: 'make test >test-out 2>&1'.

> I don't use that myself, though. What's much more interesting (but which
> I haven't found a way to get prove to do out-of-the-box) is to show the
> longest tests after the fact.

So, I have a perl script that can give me that info, like so:

On Linux, (current master branch), the ten slowest tests:

$ ./test-times.pl -n 10 test-out
   1   22.269s   t9001-send-email.sh
   2   15.222s   t7063-status-untracked-cache.sh
   3   10.965s   t9500-gitweb-standalone-no-errors.sh
   4    9.978s   t3404-rebase-interactive.sh
   5    9.270s   t3421-rebase-topology-linear.sh
   6    8.099s   t0027-auto-crlf.sh
   7    6.056s   t7112-reset-submodule.sh
   8    5.937s   t7610-mergetool.sh
   9    5.711s   t5572-pull-submodule.sh
  10    5.229s   t6030-bisect-porcelain.sh
$ 

... and those taking 10s or more:

$ ./test-times.pl -t 10 test-out
   1   22.269s   t9001-send-email.sh
   2   15.222s   t7063-status-untracked-cache.sh
   3   10.965s   t9500-gitweb-standalone-no-errors.sh
$ 

However, on cygwin (v2.13.0 test output file), the ten slowest:

$ ./test-times.pl -n 10 test-out
   1  385.057s   t3404-rebase-interactive.sh
   2  339.694s   t3421-rebase-topology-linear.sh
   3  224.854s   t7610-mergetool.sh
   4  211.230s   t6030-bisect-porcelain.sh
   5  180.297s   t3425-rebase-topology-merges.sh
   6  179.446s   t3903-stash.sh
   7  177.600s   t9001-send-email.sh
   8  175.278s   t5572-pull-submodule.sh
   9  160.422s   t2013-checkout-submodule.sh
  10  158.615s   t1013-read-tree-submodule.sh
$ 

... and (some of) those taking 10s or more:

$ ./test-times.pl -t 10 test-out
   1  385.057s   t3404-rebase-interactive.sh
   2  339.694s   t3421-rebase-topology-linear.sh
   3  224.854s   t7610-mergetool.sh
   4  211.230s   t6030-bisect-porcelain.sh

...

 175   10.389s   t4300-merge-tree.sh
 176   10.324s   t4213-log-tabexpand.sh
 177   10.143s   t3034-merge-recursive-rename-options.sh
 178   10.108s   t1410-reflog.sh
$ 

[On linux a complete run is about 400s and on cygwin about 3hours!]

ATB,
Ramsay Jones


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2017-06-08  0:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-03 22:13 [PATCH] test-lib: add ability to cap the runtime of tests Ævar Arnfjörð Bjarmason
2017-06-04  0:31 ` Junio C Hamano
2017-06-04  7:29   ` Ævar Arnfjörð Bjarmason
2017-06-05  1:55     ` Junio C Hamano
2017-06-05  5:48       ` Christian Couder
2017-06-05 18:56         ` Stefan Beller
2017-06-07 10:24       ` Jeff King
2017-06-08  0:59         ` Ramsay Jones
2017-06-05 13:17     ` Lars Schneider
2017-06-05 18:15       ` Ævar Arnfjörð Bjarmason
2017-06-05 19:03         ` Stefan Beller
2017-06-05 20:37           ` Ævar Arnfjörð Bjarmason

Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).