git@vger.kernel.org mailing list mirror (one of many)
From: Jeff King <peff@peff.net>
To: Eric Sunshine <sunshine@sunshineco.com>
Cc: "Eric Wong" <e@80x24.org>,
	"Eric Sunshine via GitGitGadget" <gitgitgadget@gmail.com>,
	"Git List" <git@vger.kernel.org>,
	"Elijah Newren" <newren@gmail.com>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>,
	"Fabian Stelzer" <fs@gigacodes.de>,
	"Johannes Schindelin" <Johannes.Schindelin@gmx.de>
Subject: Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel
Date: Tue, 6 Sep 2022 19:26:59 -0400
Message-ID: <YxfXQ0IJjq/FT2Uh@coredump.intra.peff.net>
In-Reply-To: <CAPig+cSx661-HEr3JcAD5MuYfgHviGQ1cSAftkgw6gj2FgTQVg@mail.gmail.com>

On Tue, Sep 06, 2022 at 06:52:26PM -0400, Eric Sunshine wrote:

> On Tue, Sep 6, 2022 at 6:35 PM Eric Wong <e@80x24.org> wrote:
> > Eric Sunshine via GitGitGadget <gitgitgadget@gmail.com> wrote:
> > > +unless ($Config{useithreads} && eval {
> > > +     require threads; threads->import();
> >
> > Fwiw, the threads(3perl) manpage has this since 2014:
> >
> >        The use of interpreter-based threads in perl is officially discouraged.
> 
> Thanks for pointing this out. I did see that, but as no better
> alternative was offered, and since I did want this to work on Windows,
> I went with it.

I did some timings the other night, and I found something quite curious
with the thread stuff.

Here's a hyperfine run of "make" in the t/ directory before any of your
patches. It uses "prove" to do parallelism under the hood:

  Benchmark 1: make
    Time (mean ± σ):     68.895 s ±  0.840 s    [User: 620.914 s, System: 428.498 s]
    Range (min … max):   67.943 s … 69.531 s    3 runs

So that gives us a baseline. Now the first thing I wondered is how bad
it would be to just run chainlint.pl once per script. So I applied up to
that patch:

  Benchmark 1: make
    Time (mean ± σ):     71.289 s ±  1.302 s    [User: 673.300 s, System: 417.912 s]
    Range (min … max):   69.788 s … 72.120 s    3 runs

I was quite surprised that it made things slower! It's nice that we're
only calling it once per script instead of once per test, but it seems
the startup overhead of the script is really high.

And since in this mode we're only feeding it one script at a time, I
tried reverting the "chainlint.pl: validate test scripts in parallel"
commit. And indeed, now things are much faster:

  Benchmark 1: make
    Time (mean ± σ):     61.544 s ±  3.364 s    [User: 556.486 s, System: 384.001 s]
    Range (min … max):   57.660 s … 63.490 s    3 runs

And you can see the same thing just running chainlint by itself:

  $ time perl chainlint.pl /dev/null
  real	0m0.069s
  user	0m0.042s
  sys	0m0.020s

  $ git revert HEAD^{/validate.test.scripts.in.parallel}
  $ time perl chainlint.pl /dev/null
  real	0m0.014s
  user	0m0.010s
  sys	0m0.004s

I didn't track down the source of the slowness. Maybe it's loading extra
modules, or maybe it's opening /proc/cpuinfo, or maybe it's the thread
setup. But it's a surprising slowdown.
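
One way to narrow down which of those candidates is responsible would be to time each piece in isolation. The helper below is only a sketch: the name mean_us is made up here, and it assumes a date(1) that understands %N (GNU coreutils does). No results are claimed for it.

```shell
# mean_us N CMD...: run CMD N times and print the mean wall-clock
# time per run in microseconds.
# Assumption: date(1) supports %N (nanoseconds), as GNU date does.
mean_us() {
    n=$1
    shift
    start=$(date +%s%N)
    i=0
    while [ "$i" -lt "$n" ]
    do
        "$@" >/dev/null 2>&1
        i=$((i + 1))
    done
    end=$(date +%s%N)
    echo $(( (end - start) / n / 1000 ))
}

# Possible comparisons (hypothetical invocations, run from t/):
#   mean_us 50 perl -e 1                    # bare interpreter startup
#   mean_us 50 perl -Mthreads -e 1          # plus loading threads.pm
#   mean_us 50 perl chainlint.pl /dev/null  # the whole script
```

Averaging over many runs matters here because the difference being chased is only a few tens of milliseconds per invocation.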

Now of course your intent is to do a single repo-wide invocation. And
that is indeed a bit faster. Here it is without the parallel code:

  Benchmark 1: make
    Time (mean ± σ):     61.727 s ±  2.140 s    [User: 507.712 s, System: 377.753 s]
    Range (min … max):   59.259 s … 63.074 s    3 runs

The wall-clock time didn't improve much, but the CPU time did. Restoring
the parallel code does improve the wall-clock time a bit, but at the
cost of some extra CPU:

  Benchmark 1: make
    Time (mean ± σ):     59.029 s ±  2.851 s    [User: 515.690 s, System: 380.369 s]
    Range (min … max):   55.736 s … 60.693 s    3 runs

which makes sense. If I do a with/without of just "make test-chainlint"
(with the parallel code first, then without it), the parallelism is
buying about three seconds of wall-clock:

  Benchmark 1: make test-chainlint
    Time (mean ± σ):     900.1 ms ± 102.9 ms    [User: 12049.8 ms, System: 79.7 ms]
    Range (min … max):   704.2 ms … 994.4 ms    10 runs

  Benchmark 1: make test-chainlint
    Time (mean ± σ):      3.778 s ±  0.042 s    [User: 3.756 s, System: 0.023 s]
    Range (min … max):    3.706 s …  3.833 s    10 runs
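
To make the CPU-versus-wall-clock trade-off easier to see, the User and System figures from the hyperfine runs quoted in this message can be summed per configuration. This is just arithmetic on the numbers above; the labels are informal shorthand invented for this sketch, not names from the series.

```shell
# Total CPU seconds (User + System) for each full "make" run quoted above.
# Labels are informal shorthand, not names from the patch series.
cpu_totals() {
    awk '{ printf "%-20s %9.3f\n", $1, $2 + $3 }' <<\EOF
baseline 620.914 428.498
per-script+threads 673.300 417.912
per-script 556.486 384.001
single-run 507.712 377.753
single-run+threads 515.690 380.369
EOF
}

# Wall-clock and CPU ratios for "make test-chainlint", in milliseconds:
# threaded run (900.1 wall, 12049.8 + 79.7 CPU) vs. serial run
# (3778 wall, 3756 + 23 CPU).
chainlint_ratios() {
    awk 'BEGIN {
        printf "wall: serial/threaded = %.1f\n", 3778 / 900.1
        printf "cpu:  threaded/serial = %.1f\n", (12049.8 + 79.7) / (3756 + 23)
    }'
}

cpu_totals
chainlint_ratios
```

For the single repo-wide invocation this gives 885.465 CPU-seconds without threads against 896.059 with them, so roughly 10.6 extra CPU-seconds for about 2.7 seconds less mean wall-clock. For "make test-chainlint" alone, the threaded run is about 4.2x faster in wall-clock terms while using about 3.2x the CPU.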

I'm not sure what it all means. For Linux, I think I'd be just as happy
with a single non-parallelized test-chainlint run for each file. But
maybe on Windows the startup overhead is worse? OTOH, the whole test run
is so much slower there that one process per script is not going to be
that much in relative terms either way.

And if we did cache the results and avoid extra invocations via "make",
then we'd want all the parallelism to move there anyway.
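
As a rough illustration of the caching idea (a sketch only: the function name, the stamp-file layout, and the linter argument are all hypothetical, and nothing like this is in the series), each script could get a stamp file that records a successful lint, with the linter skipped while the stamp is up to date:

```shell
# lint_if_stale STAMP LINTER FILE
# Run LINTER on FILE only when FILE is newer than STAMP (or STAMP is
# missing), and refresh STAMP after a successful lint.
# Assumption: the shell's test builtin supports -nt (bash, dash, and ksh
# do, though it is not required by POSIX).
lint_if_stale() {
    stamp=$1
    linter=$2
    file=$3
    if [ ! -e "$stamp" ] || [ "$file" -nt "$stamp" ]
    then
        "$linter" "$file" && touch "$stamp"
    fi
}
```

With one stamp per script, the decision about what to run in parallel falls to whatever drives the per-file rules (for example "make -j"), rather than to threads inside the linter.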

Maybe that gives you more food for thought about whether perl's "use
threads" is worth having.

-Peff
