git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: Jeff King <peff@peff.net>
To: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Cc: git@vger.kernel.org, "Junio C Hamano" <gitster@pobox.com>,
	"Johannes Sixt" <j6t@kdbg.org>,
	"Øystein Walle" <oystwa@gmail.com>
Subject: Re: [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN
Date: Wed, 20 Oct 2021 16:35:38 -0400	[thread overview]
Message-ID: <YXB9moNHRj+nrnX9@coredump.intra.peff.net> (raw)
In-Reply-To: <cover-0.8-00000000000-20211020T183533Z-avarab@gmail.com>

On Wed, Oct 20, 2021 at 08:39:51PM +0200, Ævar Arnfjörð Bjarmason wrote:

> This series is based off an off-hand comment I made about making the
> cmdlist.sh faster, in the meantime much of the same methods are
> already cooking in "next" for the "lint-docs" target.
> 
> See 7/8 for the main performance numbers, along the way I stole some
> patches from Johannes Sixt who'd worked on optimizing the script
> before, which compliment this new method of generating this file by
> piggy-backing more on GNU make for managing a dependency graph for us.

I still think this is a much more complicated and error-prone approach
than just making the script faster. I know we can't rely on perl, but
could we use it optimistically?

The proof-of-concept below on top of your patch 6 does two things:

  - observes that there is no need for get_category_line in the loop; it
    is just sorting and de-duping the bitfields, but since we just OR
    them together, neither of those things matters

  - uses perl to open each individual doc file to get the synopsis. It
    _feels_ like this should be something that sed or awk could do, but
    it is beyond me. However, speculatively trying perl is an easy win,
    and we can fall back to the shell loop.

Here are my timings:

Benchmark #1: sh generate-cmdlist.sh command-list.txt
  Time (mean ± σ):      40.4 ms ±  18.1 ms    [User: 44.9 ms, System: 7.1 ms]
  Range (min … max):    20.3 ms …  65.5 ms    10 runs

Benchmark #2: sh generate-cmdlist.sh.old command-list.txt
  Time (mean ± σ):      1.414 s ±  0.038 s    [User: 1.641 s, System: 0.863 s]
  Range (min … max):    1.344 s …  1.451 s    10 runs

Summary
  'sh generate-cmdlist.sh command-list.txt' ran
   34.96 ± 15.66 times faster than 'sh generate-cmdlist.sh.old command-list.txt'

I hate having fallbacks, because the seldom-used version may bitrot. I'm
tempted to just write that loop in C, but there's a circular dependency
with using any of libgit.a (even though it's really only the git
porcelain that cares about command-list.h, it goes into help.o which
goes into libgit.a. We could break that dependency if we wanted,
though). If we can do it in awk, that may be worthwhile.

But either way, I think this is superior to trying to parallelize the
Makefile:

  - it actually uses less CPU, rather than just trying to improve
    wall-clock time by using more cores

  - there's little chance of having some subtle dependency problem

Parallelizing makes a lot of sense to me when the operation is truly
expensive. But in this case it's literally just opening a file, and the
only reason it's slow is because we spawn a ton of processes to do it.

---
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index a1ab2b1f07..f922eebe23 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -63,11 +63,23 @@ define_category_names () {
 print_command_list () {
 	echo "static struct cmdname_help command_list[] = {"
 
+	# try perl first, as we can do it all in one process
+	command_list "$1" |
+	perl -ne '
+		my ($cmd, @rest) = split;
+		open(my $fh, "<", "Documentation/$cmd.txt");
+		while (<$fh>) {
+			next unless /^$cmd - (.*)/;
+			print "{ \"$cmd\", N_(\"$1\"), 0";
+			print " | CAT_$_" for (@rest);
+			print " },\n";
+		}
+	' ||
 	command_list "$1" |
 	while read cmd rest
 	do
 		printf "	{ \"$cmd\", $(get_synopsis $cmd), 0"
-		printf " | CAT_%s" $(echo "$rest" | get_category_line)
+		printf " | CAT_%s" $rest
 		echo " },"
 	done
 	echo "};"

  parent reply	other threads:[~2021-10-20 20:35 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30   ` Johannes Sixt
2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason
2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
2021-06-29  2:13       ` Jeff King
2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42             ` Jeff King
2021-10-21 16:25               ` Jeff King
2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45             ` Jeff King
2021-10-21 18:24               ` Junio C Hamano
2021-10-21 22:46             ` Øystein Walle
2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35           ` Jeff King [this message]
2021-10-20 21:31             ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Taylor Blau
2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
2021-10-20 23:46                 ` Jeff King
2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
2021-10-21  2:20                     ` Taylor Blau
2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34                     ` Jeff King
2021-10-21 22:34                       ` Junio C Hamano
2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31                         ` Jeff King
2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
2021-10-21  5:39                 ` Eric Sunshine
2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29               ` Junio C Hamano
2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26                   ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39               ` Jeff King
2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46               ` Jeff King
2021-10-25 17:52                 ` Jeff King
2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19               ` Junio C Hamano
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34             ` Junio C Hamano
2021-10-25 16:57             ` Jeff King
2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45                 ` Junio C Hamano
2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18                     ` Junio C Hamano
2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47                 ` Junio C Hamano
2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29  5:04   ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39   ` Øystein Walle
2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53   ` Junio C Hamano
2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48       ` Junio C Hamano
2021-07-03  1:05   ` Felipe Contreras
2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56       ` Felipe Contreras
2021-07-05 19:38       ` Junio C Hamano
2021-07-06 22:25         ` Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXB9moNHRj+nrnX9@coredump.intra.peff.net \
    --to=peff@peff.net \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=oystwa@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).