git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Taylor Blau <me@ttaylorr.com>
Cc: "Jeff King" <peff@peff.net>,
	git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Johannes Sixt" <j6t@kdbg.org>,
	"Øystein Walle" <oystwa@gmail.com>
Subject: Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
Date: Thu, 21 Oct 2021 01:14:37 +0200	[thread overview]
Message-ID: <211021.86v91rmftn.gmgdl@evledraar.gmail.com> (raw)
In-Reply-To: <YXCKqAEwtwFozWk6@nand.local>


On Wed, Oct 20 2021, Taylor Blau wrote:

> On Wed, Oct 20, 2021 at 04:35:38PM -0400, Jeff King wrote:
>> On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > This series is based off an off-hand comment I made about making the
>> > cmdlist.sh faster, in the meantime much of the same methods are
>> > already cooking in "next" for the "lint-docs" target.
>> >
>> > See 7/8 for the main performance numbers, along the way I stole some
>> > patches from Johannes Sixt who'd worked on optimizing the script
>> > before, which compliment this new method of generating this file by
>> > piggy-backing more on GNU make for managing a dependency graph for us.
>>
>> I still think this is a much more complicated and error-prone approach
>> than just making the script faster. I know we can't rely on perl, but
>> could we use it optimistically?

Jeff: Just in terms of error prone both of these implementations will
accept bad input that's being caught in 8/8 of this series.

We accept a lot of bad input now, ending up with some combinations of
bad output or compile errors if you screw with the input *.txt files. I
think I've addressed all of those in this series.

If you mean the general concept of making a "foo.gen" from a "foo.txt"
as an intermediate with make as a way to get to "many-foo.h" I don't
really see how it's error prone conceptually. You get error checking
each step of the way, and it encourages logic that's simpler each step
of the way.

> I'll take credit for this terrible idea of using Perl when available.
>
> But I don't think we even need to, since we could just rely on Awk. That
> has all the benefits you described while still avoiding the circular
> dependency on libgit.a. But the killer feature is that we don't have to
> rely on two implementations, the lesser-used of which is likely to
> bitrot over time.
>
> The resulting awk is a little ugly, because of the nested-ness. I'm also
> no awk-spert, but I think that something like the below gets the job
> done.
>
> It also has the benefit of being slightly faster than the equivalent
> Perl implementation, for whatever those extra ~9 ms are worth ;).
>
> Benchmark #1: sh generate-cmdlist.sh command-list.txt
>   Time (mean ± σ):      25.3 ms ±   5.3 ms    [User: 31.1 ms, System: 8.3 ms]
>   Range (min … max):    15.5 ms …  31.7 ms    95 runs
>
> Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
>   Time (mean ± σ):      34.9 ms ±   9.8 ms    [User: 41.0 ms, System: 6.9 ms]
>   Range (min … max):    22.4 ms …  54.8 ms    64 runs
>
> Summary
>   'sh generate-cmdlist.sh command-list.txt' ran
>     1.38 ± 0.49 times faster than 'sh generate-cmdlist.sh.old command-list.txt'
>
> ---
>
> diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
> index a1ab2b1f07..39338ef1cc 100755
> --- a/generate-cmdlist.sh
> +++ b/generate-cmdlist.sh
> @@ -64,12 +64,19 @@ print_command_list () {
>  	echo "static struct cmdname_help command_list[] = {"
>
>  	command_list "$1" |
> -	while read cmd rest
> -	do
> -		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
> -		printf " | CAT_%s" $(echo "$rest" | get_category_line)
> -		echo " },"
> -	done
> +	awk '{
> +		f="Documentation/" $1 ".txt"
> +		while((getline line<f) > 0) {
> +			if (match(line, "^" $1 " - ")) {
> +				syn=substr(line, RLENGTH+1)
> +				printf "\t{ \"%s\", N_(\"%s\"), 0", $1, syn
> +				for (i=2; i<=NF; i++) {
> +					printf " | CAT_%s", $i
> +				}
> +				print " },"
> +			}
> +		}
> +	}'
>  	echo "};"
>  }

Per Eric's Sunshine's upthread comments an awk and Perl implementation
were both considered before[1].

I also care a bit about the timings of the from-scratch build, but I
think they're way less interesting than a partial build.

I.e. I think if you e.g. touch Documentation/git-a*.txt with this series
with/without this awk version the difference in runtime is within the
error bars. I.e. making the loop faster isn't necessary. It's better to
get to a point where make can save you from doing all/most of the work
by checking modification times, rather than making an O(n) loop faster.

The only reason there's even a loop there is because it's used by the
cmake logic in contrib/* (how we've ended up with a hard dependency in
contrib is another matter...).

I'm also interested in (and have WIP patches for) simplifying things
more generally in the Makefile. Once we have a file exploded out has
just the synopsis line that can be used to replace what's now in
Documentation/cmd-list.perl, i.e. those summary blurbs also end up in
"man git".

There's subtle dependency issues there as well, and just having a
one-off solution for the the command-list.h doesn't get us closer to
addressing that sibling implementation.

In terms of future Makefile work I was hoping to get this in, untangle
some of the complexity between the inter-dependency of Makefile &
Documentation/Makefile (eventually just merging the two, and leaving a
stub in Documentation/Makefile). I've also got a working implementation
for getting rid of all of the "FORCE" dependencies (except the version
one).

1. https://lore.kernel.org/git/CAPig+cSzKoOzU-zPOZqfNpPYBFpcWqvDP3mwLvAn5WkiNW0UMw@mail.gmail.com/

  reply	other threads:[~2021-10-20 23:34 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30   ` Johannes Sixt
2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
2021-06-29  2:13       ` Jeff King
2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42             ` Jeff King
2021-10-21 16:25               ` Jeff King
2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45             ` Jeff King
2021-10-21 18:24               ` Junio C Hamano
2021-10-21 22:46             ` Øystein Walle
2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
2021-10-20 21:31             ` Taylor Blau
2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason [this message]
2021-10-20 23:46                 ` Jeff King
2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
2021-10-21  2:20                     ` Taylor Blau
2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34                     ` Jeff King
2021-10-21 22:34                       ` Junio C Hamano
2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31                         ` Jeff King
2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
2021-10-21  5:39                 ` Eric Sunshine
2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29               ` Junio C Hamano
2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26                   ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39               ` Jeff King
2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46               ` Jeff King
2021-10-25 17:52                 ` Jeff King
2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19               ` Junio C Hamano
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34             ` Junio C Hamano
2021-10-25 16:57             ` Jeff King
2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45                 ` Junio C Hamano
2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18                     ` Junio C Hamano
2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47                 ` Junio C Hamano
2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29  5:04   ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39   ` Øystein Walle
2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53   ` Junio C Hamano
2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48       ` Junio C Hamano
2021-07-03  1:05   ` Felipe Contreras
2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56       ` Felipe Contreras
2021-07-05 19:38       ` Junio C Hamano
2021-07-06 22:25         ` Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=211021.86v91rmftn.gmgdl@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=me@ttaylorr.com \
    --cc=oystwa@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).