From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "Jeff King" <peff@peff.net>,
git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
"Johannes Sixt" <j6t@kdbg.org>,
"Øystein Walle" <oystwa@gmail.com>
Subject: Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
Date: Thu, 21 Oct 2021 01:14:37 +0200 [thread overview]
Message-ID: <211021.86v91rmftn.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YXCKqAEwtwFozWk6@nand.local>
On Wed, Oct 20 2021, Taylor Blau wrote:
> On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
>> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > This series is based off an off-hand comment I made about making the
>> > cmdlist.sh faster, in the meantime much of the same methods are
>> > already cooking in "next" for the "lint-docs" target.
>> >
>> > See 7/8 for the main performance numbers, along the way I stole some
>> > patches from Johannes Sixt who'd worked on optimizing the script
>> > before, which compliment this new method of generating this file by
>> > piggy-backing more on GNU make for managing a dependency graph for us.
>>
>> I still think this is a much more complicated and error-prone approach
>> than just making the script faster. I know we can't rely on perl, but
>> could we use it optimistically?
Jeff: Just in terms of error prone both of these implementations will
accept bad input that's being caught in 8/8 of this series.
We accept a lot of bad input now, ending up with some combinations of
bad output or compile errors if you screw with the input *.txt files. I
think I've addressed all of those in this series.
If you mean the general concept of making a "foo.gen" from a "foo.txt"
as an intermediate with make as a way to get to "many-foo.h" I don't
really see how it's error prone conceptually. You get error checking
each step of the way, and it encourages logic that's simpler each step
of the way.
> I'll take credit for this terrible idea of using Perl when available.
>
> But I don't think we even need to, since we could just rely on Awk. That
> has all the benefits you described while still avoiding the circular
> dependency on libgit.a. But the killer feature is that we don't have to
> rely on two implementations, the lesser-used of which is likely to
> bitrot over time.
>
> The resulting awk is a little ugly, because of the nested-ness. I'm also
> no awk-spert, but I think that something like the below gets the job
> done.
>
> It also has the benefit of being slightly faster than the equivalent
> Perl implementation, for whatever those extra ~9 ms are worth ;).
>
> Benchmark #1: sh generate-cmdlist.sh command-list.txt
> Time (mean ± σ): 25.3 ms ± 5.3 ms [User: 31.1 ms, System: 8.3 ms]
> Range (min … max): 15.5 ms … 31.7 ms 95 runs
>
> Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
> Time (mean ± σ): 34.9 ms ± 9.8 ms [User: 41.0 ms, System: 6.9 ms]
> Range (min … max): 22.4 ms … 54.8 ms 64 runs
>
> Summary
> 'sh generate-cmdlist.sh command-list.txt' ran
> 1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>
> ---
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..39338ef1cc 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -64,12 +64,19 @@ print_command_list () {
> echo "static struct cmdname_help command_list[] = {"
>
> command_list "$1" |
> - while read cmd rest
> - do
> - printf " { \"$cmd\", $(get_synopsis $cmd), 0"
> - printf " | CAT_%s" $(echo "$rest" | get_category_line)
> - echo " },"
> - done
> + awk '{
> + f="Documentation/" $1 ".txt"
> + while((getline line<f) > 0) {
> + if (match(line, "^" $1 " - ")) {
> + syn=substr(line, RLENGTH+1)
> + printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
> + for (i=2; i<=NF; i++) {
> + printf " | CAT_%s", $i
> + }
> + print " },"
> + }
> + }
> + }'
> echo "};"
> }
Per Eric's Sunshine's upthread comments an awk and Perl implementation
were both considered before[1].
I also care a bit about the timings of the from-scratch build, but I
think they're way less interesting than a partial build.
I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
with/without this awk version the difference in runtime is within the
error bars. I.e. making the loop faster isn't necessary. It's better to
get to a point where make can save you from doing all/most of the work
by checking modification times, rather than making an O(n) loop faster.
The only reason there's even a loop there is because it's used by the
cmake logic in contrib/* (how we've ended up with a hard dependency in
contrib is another matter...).
I'm also interested in (and have WIP patches for) simplifying things
more generally in the Makefile. Once we have a file exploded out has
just the synopsis line that can be used to replace what's now in
Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
"man git".
There's subtle dependency issues there as well, and just having a
one-off solution for the the command-list.h doesn't get us closer to
addressing that sibling implementation.
In terms of future Makefile work I was hoping to get this in, untangle
some of the complexity between the inter-dependency of Makefile &
Documentation/Makefile (eventually just merging the two, and leaving a
stub in Documentation/Makefile). I've also got a working implementation
for getting rid of all of the "FORCE" dependencies (except the version
one).
1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/
next prev parent reply other threads:[~2021-10-20 23:34 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28 ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30 ` Johannes Sixt
2021-06-25 8:34 ` Ævar Arnfjörð Bjarmason
2021-06-25 9:01 ` Ævar Arnfjörð Bjarmason
2021-06-29 2:13 ` Jeff King
2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42 ` Jeff King
2021-10-21 16:25 ` Jeff King
2021-10-20 18:39 ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45 ` Jeff King
2021-10-21 18:24 ` Junio C Hamano
2021-10-21 22:46 ` Øystein Walle
2021-10-20 18:39 ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
2021-10-20 21:31 ` Taylor Blau
2021-10-20 23:14 ` Ævar Arnfjörð Bjarmason [this message]
2021-10-20 23:46 ` Jeff King
2021-10-21 0:48 ` Ævar Arnfjörð Bjarmason
2021-10-21 2:20 ` Taylor Blau
2021-10-22 12:37 ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34 ` Jeff King
2021-10-21 22:34 ` Junio C Hamano
2021-10-22 10:51 ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31 ` Jeff King
2021-10-22 20:50 ` Ævar Arnfjörð Bjarmason
2021-10-21 5:39 ` Eric Sunshine
2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29 ` Junio C Hamano
2021-10-25 21:22 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26 ` Junio C Hamano
2021-10-22 19:36 ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39 ` Jeff King
2021-10-22 19:36 ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46 ` Jeff King
2021-10-25 17:52 ` Jeff King
2021-10-22 19:36 ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19 ` Junio C Hamano
2021-10-23 22:26 ` Junio C Hamano
2021-10-22 19:36 ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26 ` Junio C Hamano
2021-10-22 21:20 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34 ` Junio C Hamano
2021-10-25 16:57 ` Jeff King
2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07 ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45 ` Junio C Hamano
2021-11-06 4:26 ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18 ` Junio C Hamano
2021-11-05 14:08 ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47 ` Junio C Hamano
2021-11-06 4:23 ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17 ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29 5:04 ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39 ` Øystein Walle
2021-06-25 0:11 ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53 ` Junio C Hamano
2021-07-03 11:58 ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48 ` Junio C Hamano
2021-07-03 1:05 ` Felipe Contreras
2021-07-03 12:03 ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56 ` Felipe Contreras
2021-07-05 19:38 ` Junio C Hamano
2021-07-06 22:25 ` Felipe Contreras
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=211021.86v91rmftn.gmgdl@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=j6t@kdbg.org \
--cc=me@ttaylorr.com \
--cc=oystwa@gmail.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).