From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Johannes Sixt <j6t@kdbg.org>
Cc: Jeff King <peff@peff.net>, Git List <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>,
Emily Shaffer <emilyshaffer@google.com>,
Eric Sunshine <sunshine@sunshineco.com>,
Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: Why the Makefile is so eager to re-build & re-link
Date: Fri, 25 Jun 2021 10:34:20 +0200 [thread overview]
Message-ID: <87r1gqxqxn.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <fb23a23e-13be-14a8-4fbe-5ca2b4bcdb52@kdbg.org>
On Thu, Jun 24 2021, Johannes Sixt wrote:
> Am 24.06.21 um 17:16 schrieb Jeff King:
>> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>> * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>>> you touch a Documentation/git-*.txt we need to re-generate these, and
>>> since their mtime changes we re-compile and re-link all the way up to
>>> libgit and our other tools.
>>>
>>> I think the best solution here is to make the generate-*.sh
>>> shellscripts faster (just one takes ~300ms of nested shellscripting,
>>> just to grep out the first few lines of every git-*.txt, in e.g. Perl
>>> or a smarter awk script this would be <5ms).
>>
>> Yeah, I think Eric mentioned he had looked into doing this in perl, but
>> we weren't entirely happy with the dependency. Here's another really odd
>> thing I noticed:
>>
>> $ time sh ./generate-cmdlist.sh command-list.txt >one
>> real 0m1.323s
>> user 0m1.531s
>> sys 0m0.834s
>>
>> $ time sh -x ./generate-cmdlist.sh command-list.txt >two
>> [a bunch of trace output]
>> real 0m0.513s
>> user 0m0.754s
>> sys 0m0.168s
>>
>> $ cmp one two
>> [no output]
>>
>> Er, what? Running with "-x" makes it almost 3 times faster to generate
>> the same output? I'd have said this is an anomaly, but it's repeatable
>> (and swapping the order produces the same output, so it's not some weird
>> priming thing). And then to top it all off, redirecting the trace is
>> slow again:
>>
>> $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
>> real 0m1.363s
>> user 0m1.538s
>> sys 0m0.902s
>>
>> A little mini-mystery that I think I may leave unsolved for now.
>
> Strange, really. Reminds me of the case that the `read` built-in must
> read input byte by byte if stdin is not connected to a (searchable) file.
>
> I have two patches that speed up generate-cmdlist.sh a bit:
>
> generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
> (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03)
> generate-cmdlist.sh: spawn fewer processes
> (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304)
>
> that I can submit (give me some time, though) or interested parties
> could pick up.
Interesting, but I think rather than micro-optimizing the O(n) loop it
makes more sense to turn it into a series of O(1) in -j parallel,
i.e. actually use the make dependency graph for this as I suggested in:
https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/
Something like the hacky throwaway patch that follows. Now when you
touch a file in Documentation/git-*.txt you re-make just that file
chain, which gets assembled into the command-list.h:
$ touch Documentation/git-add.txt; time make -k -j $(nproc) git 2>&1
GIT_VERSION = 2.32.0.94.g87ef2a6b7ed.dirty
GEN build/Documentation/git-add.txt.cmdlist.in
CC version.o
GEN build/Documentation/git-add.txt.cmdlist
GEN command-list.h
CC help.o
AR libgit.a
LINK git
Those build/* files are, respectively, the relevant line corresponding
to the *.txt file from command-list.txt:
$ cat build/Documentation/git-add.txt.cmdlist.in
git-add mainporcelain worktree
And then the line we want in command-list.h at the end:
$ cat build/Documentation/git-add.txt.cmdlist
{ "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree },
The build process in then just a matter of keeping those files
up-to-date, and having "make" create the command-list.h is just a matter
of:
cat build/Documentation/*.cmdlist
Well, with the categories still being an O(n) affair, but the cost of
generating those is trivial.
I think that aside from optimizing for speed it just makes things
clearer, if you modify e.g. git-add.txt you see a clear chain of output
from make about how we generate output from that, and then make the
command-list.h.
Now we'd just show a mysterious command-list.h entry. Whenever you have
"make" create one file from many it becomes hard to see at a glance
what's going on.
I'd be curious to see what how that performs e.g. on Windows, on Linux I
get e.g. (warm cache):
$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
GEN build/Documentation/git-apply.txt.cmdlist.in
GEN build/Documentation/git-annotate.txt.cmdlist.in
GEN build/Documentation/git-am.txt.cmdlist.in
GEN build/Documentation/git-add.txt.cmdlist.in
GEN build/Documentation/git-archimport.txt.cmdlist.in
GEN build/Documentation/git-archive.txt.cmdlist.in
GEN build/Documentation/git-apply.txt.cmdlist
GEN build/Documentation/git-annotate.txt.cmdlist
GEN build/Documentation/git-am.txt.cmdlist
GEN build/Documentation/git-add.txt.cmdlist
GEN build/Documentation/git-archimport.txt.cmdlist
GEN build/Documentation/git-archive.txt.cmdlist
GEN command-list.h
real 0m0.214s
user 0m0.196s
sys 0m0.071s
Doing the same on master takes around 600ms for me at best:
$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
GEN command-list.h
real 0m0.611s
user 0m0.756s
sys 0m0.112s
It's even faster when I have to make all of them (my -j is = 8), or
around 450ms after a "touch Documentation/git-*.txt".
We have ~170 lines we process in command-list.txt, I'd think on Windows
you'd get much better results, instead of optimizing those number of
"sort | uniq" to the same number of "sort -u" the common case is just
running 1-5 of those, as that's all the *.txt files you changed, then
just "cat-ing" the full set.
diff --git a/.gitignore b/.gitignore
index 311841f9bed..9b365395496 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,6 +13,7 @@
/GIT-USER-AGENT
/GIT-VERSION-FILE
/bin-wrappers/
+/build/
/git
/git-add
/git-add--interactive
diff --git a/Makefile b/Makefile
index c3565fc0f8f..5e845bd0f69 100644
--- a/Makefile
+++ b/Makefile
@@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt
$(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \
>$@+ && mv $@+ $@
-command-list.h: generate-cmdlist.sh command-list.txt
+build/Documentation:
+ $(QUIET_GEN)mkdir -p build/Documentation
+.PRECIOUS: build/Documentation/git%.txt.cmdlist.in
+build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt
+ $(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \
+ then \
+ # For e.g. git-init-db, which has a *.txt file, but no \
+ # command-list.h entry \
+ >$@; \
+ fi
+build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in
+ $(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@
-command-list.h: $(wildcard Documentation/git*.txt)
- $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
+COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt))
+
+COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES))
+command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN)
+ $(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \
$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
- command-list.txt >$@+ && mv $@+ $@
+ command-list.txt >$@+ && \
+ echo "static struct cmdname_help command_list[] = {" >>$@+ && \
+ cat build/Documentation/*.txt.cmdlist >>$@+ && \
+ echo "};" >>$@+ && \
+ mv $@+ $@
SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..f80266d5138 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -1,5 +1,8 @@
#!/bin/sh
+mode=$1
+shift
+
die () {
echo "$@" >&2
exit 1
@@ -61,8 +64,6 @@ define_category_names () {
}
print_command_list () {
- echo "static struct cmdname_help command_list[] = {"
-
command_list "$1" |
while read cmd rest
do
@@ -73,6 +74,9 @@ print_command_list () {
done
echo " },"
done
+}
+
+end_print_command_list () {
echo "};"
}
@@ -84,6 +88,12 @@ do
shift
done
+if test "$mode" = "--tail"
+then
+ print_command_list "$1"
+ exit 0
+fi
+
echo "/* Automatically generated by generate-cmdlist.sh */
struct cmdname_help {
const char *name;
@@ -94,5 +104,3 @@ struct cmdname_help {
define_categories "$1"
echo
define_category_names "$1"
-echo
-print_command_list "$1"
next prev parent reply other threads:[~2021-06-25 8:57 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28 ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30 ` Johannes Sixt
2021-06-25 8:34 ` Ævar Arnfjörð Bjarmason [this message]
2021-06-25 9:01 ` Ævar Arnfjörð Bjarmason
2021-06-29 2:13 ` Jeff King
2021-10-20 18:39 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39 ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42 ` Jeff King
2021-10-21 16:25 ` Jeff King
2021-10-20 18:39 ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45 ` Jeff King
2021-10-21 18:24 ` Junio C Hamano
2021-10-21 22:46 ` Øystein Walle
2021-10-20 18:39 ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35 ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
2021-10-20 21:31 ` Taylor Blau
2021-10-20 23:14 ` Ævar Arnfjörð Bjarmason
2021-10-20 23:46 ` Jeff King
2021-10-21 0:48 ` Ævar Arnfjörð Bjarmason
2021-10-21 2:20 ` Taylor Blau
2021-10-22 12:37 ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34 ` Jeff King
2021-10-21 22:34 ` Junio C Hamano
2021-10-22 10:51 ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31 ` Jeff King
2021-10-22 20:50 ` Ævar Arnfjörð Bjarmason
2021-10-21 5:39 ` Eric Sunshine
2021-10-22 19:36 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29 ` Junio C Hamano
2021-10-25 21:22 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26 ` Junio C Hamano
2021-10-22 19:36 ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36 ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39 ` Jeff King
2021-10-22 19:36 ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46 ` Jeff King
2021-10-25 17:52 ` Jeff King
2021-10-22 19:36 ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19 ` Junio C Hamano
2021-10-23 22:26 ` Junio C Hamano
2021-10-22 19:36 ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26 ` Junio C Hamano
2021-10-22 21:20 ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34 ` Junio C Hamano
2021-10-25 16:57 ` Jeff King
2021-11-05 14:07 ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07 ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45 ` Junio C Hamano
2021-11-06 4:26 ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18 ` Junio C Hamano
2021-11-05 14:08 ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47 ` Junio C Hamano
2021-11-06 4:23 ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08 ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17 ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29 5:04 ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39 ` Øystein Walle
2021-06-25 0:11 ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53 ` Junio C Hamano
2021-07-03 11:58 ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48 ` Junio C Hamano
2021-07-03 1:05 ` Felipe Contreras
2021-07-03 12:03 ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56 ` Felipe Contreras
2021-07-05 19:38 ` Junio C Hamano
2021-07-06 22:25 ` Felipe Contreras
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: http://vger.kernel.org/majordomo-info.html
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87r1gqxqxn.fsf@evledraar.gmail.com \
--to=avarab@gmail.com \
--cc=Johannes.Schindelin@gmx.de \
--cc=emilyshaffer@google.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=j6t@kdbg.org \
--cc=me@ttaylorr.com \
--cc=peff@peff.net \
--cc=sunshine@sunshineco.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://80x24.org/mirrors/git.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).