From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>, "Jeff King" <peff@peff.net>,
"Johannes Sixt" <j6t@kdbg.org>,
"Øystein Walle" <oystwa@gmail.com>,
"Eric Sunshine" <sunshine@sunshineco.com>,
"Taylor Blau" <me@ttaylorr.com>,
"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: [PATCH v3 09/10] generate-cmdlist.sh: replace "grep" invocation with a shell version
Date: Fri, 5 Nov 2021 15:08:07 +0100
Message-ID: <patch-v3-09.10-e2702bcc1d0-20211105T135058Z-avarab@gmail.com>
In-Reply-To: <cover-v3-00.10-00000000000-20211105T135058Z-avarab@gmail.com>
Replace the "grep" we run to exclude certain programs from the
generated output with a pure-shell loop that strips out the comments,
and sees if the "cmd" we're reading is on a list of excluded
programs. This uses a trick similar to test_have_prereq() in
test-lib-functions.sh.
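The membership test described above can be sketched in isolation. This is only an illustration of the colon-delimited "case" trick, not code from the patch: the is_excluded() helper and the git-foo/git-bar/git-baz names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not part of the patch: succeeds if the name in
# $1 appears in the colon-delimited list $2, which must start and end
# with ":" so that only whole names can match.
is_excluded () {
	case "$2" in
	*":$1:"*)
		return 0
		;;
	*)
		return 1
		;;
	esac
}

# Build the list the way the patch does: start with ":" and append
# "name:" for each excluded program (names here are made up).
exclude_programs=:
for prog in git-foo git-bar
do
	exclude_programs="$exclude_programs$prog:"
done

for cmd in git-foo git-fo git-baz
do
	if is_excluded "$cmd" "$exclude_programs"
	then
		echo "$cmd: excluded"
	else
		echo "$cmd: kept"
	fi
done
```

Because every name is wrapped in ":" on both sides, a prefix such as "git-fo" does not match "git-foo", and no process is spawned for the check.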
On my *nix system this makes things quite a bit slower compared to
HEAD~:
'sh generate-cmdlist.sh.old command-list.txt' ran
1.56 ± 0.11 times faster than 'sh generate-cmdlist.sh command-list.txt'
18.00 ± 0.19 times faster than 'sh generate-cmdlist.sh.master command-list.txt'
But when I tried running generate-cmdlist.sh 100 times in CI I found
that it helped across the board even on OSX & Linux. I tried testing
it in CI with this ad-hoc few-liner:
for i in $(seq -w 0 11 | sort -nr)
do
git show HEAD~$i:generate-cmdlist.sh >generate-cmdlist-HEAD$i.sh &&
git add generate-cmdlist* &&
cp t/t0000-generate-cmdlist.sh t/t00$i-generate-cmdlist.sh || : &&
perl -pi -e "s/HEAD0/HEAD$i/g" t/t00$i-generate-cmdlist.sh &&
git add t/t00*.sh
done && git commit -m"generated it"
Here HEAD~02 and the t0002* file refer to this change, and HEAD~03
and the t0003* file to the preceding commit. The relevant results were:
linux-gcc:
[12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU)
[12:05:30] t0003-generate-cmdlist.sh .. ok 32 ms ( 0.00 usr 0.00 sys + 2.66 cusr 1.81 csys = 4.47 CPU)
osx-gcc:
[11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
[11:58:16] t0003-generate-cmdlist.sh .. ok 92127 ms ( 0.02 usr 0.01 sys + 22.54 cusr 14.27 csys = 36.84 CPU)
vs-test:
[12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
[12:03:20] t0003-generate-cmdlist.sh .. ok 32 s ( 0.00 usr 0.02 sys + 13.25 cusr 26.10 csys = 39.37 CPU)
I.e. even on *nix running 100 of these in a loop was up to ~2x faster
in absolute runtime. I suspect it's due to factors that are exacerbated
in CI, e.g. much slower process startup due to some platform
limits, or a slower FS.
The "cut -d" change here is because we're not emitting the
40-character aligned output anymore, i.e. we'll get the output from
command_list() now, not an as-is line from command-list.txt.
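To make the difference concrete, here is a small sketch comparing column-based and field-based extraction of the categories. The normalize(), categories_by_column() and categories_by_field() helpers are hypothetical stand-ins for the relevant pipeline stages, and the sample lines are constructed for illustration rather than copied from command-list.txt.

```shell
#!/bin/sh
# Sample input: one line padded so the categories start at a fixed
# column, and one line with the same content but a single space.
aligned=$(printf '%-40s%s' git-add 'mainporcelain worktree')
unaligned='git-add mainporcelain worktree'

# Stand-in for the new command_list(): "read" splits on runs of
# whitespace, so echoing "$cmd $rest" collapses the padding.
normalize () {
	while read cmd rest
	do
		echo "$cmd $rest"
	done
}

# Old style: relies on the categories starting at a fixed column.
categories_by_column () {
	cut -c 40- | tr ' ' '\012' | grep -v '^$'
}

# New style: relies only on a single space after the command name.
categories_by_field () {
	normalize | cut -d' ' -f2- | tr ' ' '\012' | grep -v '^$'
}

echo "column-based, aligned input:"
printf '%s\n' "$aligned" | categories_by_column

echo "column-based, unaligned input:"
printf '%s\n' "$unaligned" | categories_by_column || echo "(nothing found)"

echo "field-based, unaligned input:"
printf '%s\n' "$unaligned" | categories_by_field
```

The column-based variant finds nothing once the padding changes, while the field-based variant gives the same categories for both inputs.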
This also makes the parsing more reliable, as we can now tweak the
whitespace alignment without breaking this parser. Let's reword a
now-inaccurate comment in "command-list.txt" describing that previous
alignment limitation. We'll still need the "### command-list [...]"
line due to the "Documentation/cmd-list.perl" logic added in
11c6659d85d (command-list: prepare machinery for upcoming "common
groups" section, 2015-05-21).
There was a proposed change subsequent to this one[3] which continued
moving more logic into the "command_list()" function, i.e. it replaced
the "cut | tr | grep" chain in "category_list()" with an argument to
"command_list()".
That change might have had a bit of an effect, but not as much as the
preceding commit, so I decided to drop it. The relevant performance
numbers from it were:
linux-gcc:
[12:05:33] t0001-generate-cmdlist.sh .. ok 13 ms ( 0.00 usr 0.00 sys + 3.33 cusr 2.78 csys = 6.11 CPU)
[12:05:33] t0002-generate-cmdlist.sh .. ok 14 ms ( 0.00 usr 0.00 sys + 3.64 cusr 3.09 csys = 6.73 CPU)
osx-gcc:
[11:58:03] t0001-generate-cmdlist.sh .. ok 78416 ms ( 0.02 usr 0.01 sys + 11.78 cusr 6.22 csys = 18.03 CPU)
[11:58:04] t0002-generate-cmdlist.sh .. ok 80081 ms ( 0.02 usr 0.02 sys + 17.80 cusr 10.07 csys = 27.91 CPU)
vs-test:
[12:03:20] t0001-generate-cmdlist.sh .. ok 34 s ( 0.00 usr 0.03 sys + 12.42 cusr 19.55 csys = 32.00 CPU)
[12:03:14] t0002-generate-cmdlist.sh .. ok 30 s ( 0.02 usr 0.00 sys + 13.14 cusr 26.19 csys = 39.35 CPU)
As above, HEAD~2 and t0002* are testing the code in this commit (so
the t0002* lines are the same as above), but HEAD~1 and t0001* are
testing that dropped change in [3].
1. https://lore.kernel.org/git/cover-v2-00.10-00000000000-20211022T193027Z-avarab@gmail.com/
2. https://lore.kernel.org/git/patch-v2-08.10-83318d6c0da-20211022T193027Z-avarab@gmail.com/
3. https://lore.kernel.org/git/patch-v2-10.10-e10a43756d1-20211022T193027Z-avarab@gmail.com/
---
command-list.txt | 2 +-
generate-cmdlist.sh | 24 ++++++++++++++++++++----
2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/command-list.txt b/command-list.txt
index 04cde20c3da..675c28f0bd0 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -43,7 +43,7 @@
# specified here, which can only have "guide" attribute and nothing
# else.
#
-### command list (do not change this line, also do not change alignment)
+### command list (do not change this line)
# command name category [category] [category]
git-add mainporcelain worktree
git-am mainporcelain
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9b7d6aea629..cfe0454d1de 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -6,12 +6,28 @@ die () {
 }
 
 command_list () {
-	eval "grep -ve '^#' $exclude_programs" <"$1"
+	while read cmd rest
+	do
+		case "$cmd" in
+		"#"* | '')
+			# Ignore comments and allow empty lines
+			continue
+			;;
+		*)
+			case "$exclude_programs" in
+			*":$cmd:"*)
+				;;
+			*)
+				echo "$cmd $rest"
+				;;
+			esac
+		esac
+	done <"$1"
 }
 
 category_list () {
 	command_list "$1" |
-	cut -c 40- |
+	cut -d' ' -f2- |
 	tr ' ' '\012' |
 	grep -v '^$' |
 	LC_ALL=C sort -u
@@ -69,11 +85,11 @@ print_command_list () {
 	echo "};"
 }
 
-exclude_programs=
+exclude_programs=:
 while test "--exclude-program" = "$1"
 do
 	shift
-	exclude_programs="$exclude_programs -e \"^$1 \""
+	exclude_programs="$exclude_programs$1:"
 	shift
 done
 
--
2.34.0.rc1.721.ga0c1db665bc