From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS3215 2.6.0.0/16 X-Spam-Status: No, score=-3.6 required=3.0 tests=AWL,BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,T_SCC_BODY_TEXT_LINE shortcircuit=no autolearn=ham autolearn_force=no version=3.4.2 Received: from out1.vger.email (out1.vger.email [IPv6:2620:137:e000::1:20]) by dcvr.yhbt.net (Postfix) with ESMTP id 59A8D1F54E for ; Thu, 1 Sep 2022 12:48:36 +0000 (UTC) Authentication-Results: dcvr.yhbt.net; dkim=pass (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b="FI+o6a62"; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S233448AbiIAMsJ (ORCPT ); Thu, 1 Sep 2022 08:48:09 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:59426 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S233380AbiIAMsD (ORCPT ); Thu, 1 Sep 2022 08:48:03 -0400 Received: from mail-ej1-x629.google.com (mail-ej1-x629.google.com [IPv6:2a00:1450:4864:20::629]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id A89B6760E2 for ; Thu, 1 Sep 2022 05:47:57 -0700 (PDT) Received: by mail-ej1-x629.google.com with SMTP id fy31so34043082ejc.6 for ; Thu, 01 Sep 2022 05:47:56 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:mime-version:message-id:in-reply-to :user-agent:references:date:subject:cc:to:from:from:to:cc:subject :date; bh=D581l7UUt6eFrdoF4f38yhOjYFa8LBXo4SPqcx2o1+I=; b=FI+o6a62xZbxbkSFncVN7TjR1Cwdt4fIPxHhJZh2NPChPtpeSSAWyRSjM9b6Jbrhwj HJWQdVVb3S3f8myFzk7XGHrfuH9EJdN5uzC9ELARsgDCp2F7e/V2x1JcNhcQYi5yxYmD 6lYxRMn5dtvY7xr2BUreXA85mDVT9yxk2SYai6Rr5Fz5mSCANUetf8ZW8kj1tvQdjBmD 2eNc/Kjm3jvruUPATlHCD52ranvg97labDcRlqVnrZ/n6k5jkpatSlw7ZJaiypaU75q1 XdRF82hYffr9ZA9ke9hZYXiGt8fE8wbkpDnl8tSIyeC7TJY6TdoKRD65Xl/E67YX1Jow xB2w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:message-id:in-reply-to :user-agent:references:date:subject:cc:to:from:x-gm-message-state :from:to:cc:subject:date; bh=D581l7UUt6eFrdoF4f38yhOjYFa8LBXo4SPqcx2o1+I=; b=p7OlEgYA3tAWpC6neCjD8vuPaLmtJ2EVYW8jUHQMvUjvxEUN5kmSAW8u/7BnW77ZrJ 91a5+iTOh92htweuOzHlDrmjGb0myRAlvw3pK8og2rBWcgdViqCkk8vnoe6p5nq0vv4W 6CDKboIHzPrP7NdhJZU3lN32ZaBHwXqox263MaBk8Cg3Xb1JoGDX+XK02HArWjlzMSD9 VxkvYMXKBFEClRsPQ+Zikp3gTsKl9Gvi75AxkSuBHHgD7e8LP+mG+1e4xe6CISK/GaDk v+LyDUhjWrvUyuxyhyWkvgj4Sc4o+hCZT1YSPcoKQw/cO5w8b3JpLjf6e2x7TbN3+3jO WE+Q== X-Gm-Message-State: ACgBeo34TN9xblyUx393wTN5dWWmE1/95mdJBXVqzw2eL3+S0wJKWVx1 lled7CUX+h9/x55DQ9V6qRciCg+7PDIlPg== X-Google-Smtp-Source: AA6agR6Z4NyOR+xRHSypPc3bNzzUGKqmFYgmfRFLWaFytSKf9MupdbA6vCWFA6V+GM1cfD+W+n6Z0Q== X-Received: by 2002:a17:907:2ceb:b0:741:6115:a57 with SMTP id hz11-20020a1709072ceb00b0074161150a57mr15976995ejc.423.1662036474699; Thu, 01 Sep 2022 05:47:54 -0700 (PDT) Received: from gmgdl (dhcp-077-248-183-071.chello.nl. [77.248.183.71]) by smtp.gmail.com with ESMTPSA id d13-20020a170906304d00b0073c8d4c9f38sm8575791ejd.177.2022.09.01.05.47.54 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 01 Sep 2022 05:47:54 -0700 (PDT) Received: from avar by gmgdl with local (Exim 4.96) (envelope-from ) id 1oTjbt-000AGm-1J; Thu, 01 Sep 2022 14:47:53 +0200 From: =?utf-8?B?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason To: Eric Sunshine via GitGitGadget Cc: git@vger.kernel.org, Jeff King , Elijah Newren , Fabian Stelzer , Johannes Schindelin , Eric Sunshine Subject: Re: [PATCH 06/18] chainlint.pl: validate test scripts in parallel Date: Thu, 01 Sep 2022 14:36:57 +0200 References: <62fc652eb47a4df83d88a197e376f28dbbab3b52.1661992197.git.gitgitgadget@gmail.com> User-agent: Debian GNU/Linux bookworm/sid; Emacs 27.1; mu4e 1.7.12 In-reply-to: <62fc652eb47a4df83d88a197e376f28dbbab3b52.1661992197.git.gitgitgadget@gmail.com> Message-ID: <220901.86bkrzjm6e.gmgdl@evledraar.gmail.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Thu, Sep 01 2022, Eric Sunshine via GitGitGadget wrote: > From: Eric Sunshine > > Although chainlint.pl has undergone a good deal of optimization during > its development -- increasing in speed significantly -- parsing and > validating 1050+ scripts and 16500+ tests via Perl is not exactly > instantaneous. However, perceived performance can be improved by taking > advantage of the fact that there is no interdependence between test > scripts or test definitions, thus parsing and validating can be done in > parallel. The number of available cores is determined automatically but > can be overridden via the --jobs option. Per your CL: =09 =C3=86var offered some sensible comments[2,3] about optimizing the Makefil= e rules related to chainlint, but those optimizations are not tackled here for a f= ew reasons: (1) this series is already quite long, (2) I'd like to keep the series focused on its primary goal of installing a new and improved linter, (3) these patches do not make the Makefile situation any worse[4], and (4) those optimizations can easily be done atop this series[5]. I have been running with those t/Makefile changesg locally, but didn't submit them. FWIW that's here: https://github.com/git/git/compare/master...avar:git:avar/t-Makefile-use-d= ependency-graph-for-check-chainlint Which I'm not entirely sure I'm happy about, and it's jeust about the chainlint tests, but... > +sub ncores { > + # Windows > + return $ENV{NUMBER_OF_PROCESSORS} if exists($ENV{NUMBER_OF_PROCESSORS}); > + # Linux / MSYS2 / Cygwin / WSL > + do { local @ARGV=3D'/proc/cpuinfo'; return scalar(grep(/^processor\s*:/= , <>)); } if -r '/proc/cpuinfo'; > + # macOS & BSD > + return qx/sysctl -n hw.ncpu/ if $^O =3D~ /(?:^darwin$|bsd)/; > + return 1; > +} > + > sub show_stats { > my ($start_time, $stats) =3D @_; > my $walltime =3D $interval->($start_time); > @@ -621,7 +633,9 @@ sub exit_code { > Getopt::Long::Configure(qw{bundling}); > GetOptions( > "emit-all!" =3D> \$emit_all, > + "jobs|j=3Di" =3D> \$jobs, > "stats|show-stats!" =3D> \$show_stats) or die("option error\n"); > +$jobs =3D ncores() if $jobs < 1; >=20=20 > my $start_time =3D $getnow->(); > my @stats; > @@ -633,6 +647,40 @@ unless (@scripts) { > exit; > } >=20=20 > -push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_); = })); > +unless ($Config{useithreads} && eval { > + require threads; threads->import(); > + require Thread::Queue; Thread::Queue->import(); > + 1; > + }) { > + push(@stats, check_script(1, sub { shift(@scripts); }, sub { print(@_);= })); > + show_stats($start_time, \@stats) if $show_stats; > + exit(exit_code(\@stats)); > +} > + > +my $script_queue =3D Thread::Queue->new(); > +my $output_queue =3D Thread::Queue->new(); > + > +sub next_script { return $script_queue->dequeue(); } > +sub emit { $output_queue->enqueue(@_); } > + > +sub monitor { > + while (my $s =3D $output_queue->dequeue()) { > + print($s); > + } > +} > + > +my $mon =3D threads->create({'context' =3D> 'void'}, \&monitor); > +threads->create({'context' =3D> 'list'}, \&check_script, $_, \&next_scri= pt, \&emit) for 1..$jobs; > + > +$script_queue->enqueue(@scripts); > +$script_queue->end(); > + > +for (threads->list()) { > + push(@stats, $_->join()) unless $_ =3D=3D $mon; > +} > + > +$output_queue->end(); > +$mon->join(); Maybe I'm misunderstanding this whole thing, but this really seems like the wrong direction in an otherwise fantastic direction of a series. I.e. it's *great* that we can do chain-lint without needing to actually execute the *.sh file, this series adds a lint parser that can parse those *.sh "at rest". But in your 16/18 you then do: =09 +if test "${GIT_TEST_CHAIN_LINT:-1}" !=3D 0 +then + "$PERL_PATH" "$TEST_DIRECTORY/chainlint.pl" "$0" || + BUG "lint error (see '?!...!? annotations above)" +fi =09 I may just be missing something here, but why not instead just borrow what I did for "lint-docs" in 8650c6298c1 (doc lint: make "lint-docs" non-.PHONY, 2021-10-15)? I.e. if we can run against t0001-init.sh or whatever *once* to see if it chain-lints OK then surely we could have a rule like: t0001-init.sh.chainlint-ok: t0001-init.sh perl chainlint.pl $< >$@ Then whenever you change t0001-init.sh we refresh that t0001-init.sh.chainlint-ok, if the chainlint.pl exits non-zero we'll fail to make it, and will unlink that t0001-init.sh.chainlint-ok. That way you wouldn't need any parallelism in the Perl script, because you'd have "make" take care of it, and the common case of re-testing where the speed matters would be that we woudln't need to run this at all, or would only re-run it for the test scripts that changed. (Obviously a "real" implementation would want to create that ".ok" file in t/.build/chainlint" or whatever) A drawback is that you'd probably be slower on the initial run, as you'd spwn N chainlint.pl. You could use $? instead of $< to get around that, but that requires some re-structuring, and I've found it to generally not be worth it. It would also have the drawback that a: ./t0001-init.sh wouldn't run the chain-lint, but this would: make T=3Dt0001-init.sh But if want the former to work we could carry some "GIT_TEST_VIA_MAKEFILE" variable or whatever, and only run the test-via-test-lib.sh if it isn't set.