git@vger.kernel.org mailing list mirror (one of many)
 help / color / mirror / code / Atom feed
From: "Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
To: Johannes Sixt <j6t@kdbg.org>
Cc: Jeff King <peff@peff.net>, Git List <git@vger.kernel.org>,
	Junio C Hamano <gitster@pobox.com>, Taylor Blau <me@ttaylorr.com>,
	Emily Shaffer <emilyshaffer@google.com>,
	Eric Sunshine <sunshine@sunshineco.com>,
	Johannes Schindelin <Johannes.Schindelin@gmx.de>
Subject: Re: Why the Makefile is so eager to re-build & re-link
Date: Fri, 25 Jun 2021 10:34:20 +0200	[thread overview]
Message-ID: <87r1gqxqxn.fsf@evledraar.gmail.com> (raw)
In-Reply-To: <fb23a23e-13be-14a8-4fbe-5ca2b4bcdb52@kdbg.org>


On Thu, Jun 24 2021, Johannes Sixt wrote:

> Am 24.06.21 um 17:16 schrieb Jeff King:
>> On Thu, Jun 24, 2021 at 03:16:48PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>>  * {command,config}-list.h (and in-flight, my hook-list.h): Every time
>>>    you touch a Documentation/git-*.txt we need to re-generate these, and
>>>    since their mtime changes we re-compile and re-link all the way up to
>>>    libgit and our other tools.
>>>
>>>    I think the best solution here is to make the generate-*.sh
>>>    shellscripts faster (just one takes ~300ms of nested shellscripting,
>>>    just to grep out the first few lines of every git-*.txt, in e.g. Perl
>>>    or a smarter awk script this would be <5ms).
>> 
>> Yeah, I think Eric mentioned he had looked into doing this in perl, but
>> we weren't entirely happy with the dependency. Here's another really odd
>> thing I noticed:
>> 
>>   $ time sh ./generate-cmdlist.sh command-list.txt >one
>>   real	0m1.323s
>>   user	0m1.531s
>>   sys	0m0.834s
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two
>>   [a bunch of trace output]
>>   real	0m0.513s
>>   user	0m0.754s
>>   sys	0m0.168s
>> 
>>   $ cmp one two
>>   [no output]
>> 
>> Er, what? Running with "-x" makes it almost 3 times faster to generate
>> the same output? I'd have said this is an anomaly, but it's repeatable
>> (and swapping the order produces the same output, so it's not some weird
>> priming thing). And then to top it all off, redirecting the trace is
>> slow again:
>> 
>>   $ time sh -x ./generate-cmdlist.sh command-list.txt >two 2>/dev/null
>>   real	0m1.363s
>>   user	0m1.538s
>>   sys	0m0.902s
>> 
>> A little mini-mystery that I think I may leave unsolved for now.
>
> Strange, really. Reminds me of the case that the `read` built-in must
> read input byte by byte if stdin is not connected to a (searchable) file.
>
> I have two patches that speed up generate-cmdlist.sh a bit:
>
> generate-cmdlist.sh: replace for loop by printf's auto-repeat feature
> (https://github.com/j6t/git/commit/b6d05f653bede727bc001f299b57969f62d3bc03)
> generate-cmdlist.sh: spawn fewer processes
> (https://github.com/j6t/git/commit/fd8721ee8fae06d7b584fa5166f32bf5521ca304)
>
> that I can submit (give me some time, though) or interested parties
> could pick up.

Interesting, but I think rather than micro-optimizing the O(n) loop it
makes more sense to turn it into a series of O(1) in -j parallel,
i.e. actually use the make dependency graph for this as I suggested in:
https://lore.kernel.org/git/87wnqiyejg.fsf@evledraar.gmail.com/

Something like the hacky throwaway patch that follows. Now when you
touch a file in Documentation/git-*.txt you re-make just that file
chain, which gets assembled into the command-list.h:
	
	$ touch Documentation/git-add.txt; time make -k -j $(nproc) git 2>&1
	GIT_VERSION = 2.32.0.94.g87ef2a6b7ed.dirty
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    CC version.o
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN command-list.h
	    CC help.o
	    AR libgit.a
	    LINK git

Those build/* files are, respectively, the relevant line corresponding
to the *.txt file from command-list.txt:

    $ cat build/Documentation/git-add.txt.cmdlist.in
    git-add                                 mainporcelain           worktree

And then the line we want in command-list.h at the end:

    $ cat build/Documentation/git-add.txt.cmdlist
        { "git-add", N_("Add file contents to the index"), 0 | CAT_mainporcelain | CAT_worktree },

The build process in then just a matter of keeping those files
up-to-date, and having "make" create the command-list.h is just a matter
of:

    cat build/Documentation/*.cmdlist

Well, with the categories still being an O(n) affair, but the cost of
generating those is trivial.

I think that aside from optimizing for speed it just makes things
clearer, if you modify e.g. git-add.txt you see a clear chain of output
from make about how we generate output from that, and then make the
command-list.h.

Now we'd just show a mysterious command-list.h entry. Whenever you have
"make" create one file from many it becomes hard to see at a glance
what's going on.

I'd be curious to see what how that performs e.g. on Windows, on Linux I
get e.g. (warm cache):
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN build/Documentation/git-apply.txt.cmdlist.in
	    GEN build/Documentation/git-annotate.txt.cmdlist.in
	    GEN build/Documentation/git-am.txt.cmdlist.in
	    GEN build/Documentation/git-add.txt.cmdlist.in
	    GEN build/Documentation/git-archimport.txt.cmdlist.in
	    GEN build/Documentation/git-archive.txt.cmdlist.in
	    GEN build/Documentation/git-apply.txt.cmdlist
	    GEN build/Documentation/git-annotate.txt.cmdlist
	    GEN build/Documentation/git-am.txt.cmdlist
	    GEN build/Documentation/git-add.txt.cmdlist
	    GEN build/Documentation/git-archimport.txt.cmdlist
	    GEN build/Documentation/git-archive.txt.cmdlist
	    GEN command-list.h
	
	real    0m0.214s
	user    0m0.196s
	sys     0m0.071s
	
Doing the same on master takes around 600ms for me at best:
	
	$ touch Documentation/git-a*.txt; time make -k -j $(nproc) command-list.h 2>&1
	    GEN command-list.h
	
	real    0m0.611s
	user    0m0.756s
	sys     0m0.112s

It's even faster when I have to make all of them (my -j is = 8), or
around 450ms after a "touch Documentation/git-*.txt".

We have ~170 lines we process in command-list.txt, I'd think on Windows
you'd get much better results, instead of optimizing those number of
"sort | uniq" to the same number of "sort -u" the common case is just
running 1-5 of those, as that's all the *.txt files you changed, then
just "cat-ing" the full set.

diff --git a/.gitignore b/.gitignore
index 311841f9bed..9b365395496 100644
--- a/.gitignore
+++ b/.gitignore
@@ -13,6 +13,7 @@
 /GIT-USER-AGENT
 /GIT-VERSION-FILE
 /bin-wrappers/
+/build/
 /git
 /git-add
 /git-add--interactive
diff --git a/Makefile b/Makefile
index c3565fc0f8f..5e845bd0f69 100644
--- a/Makefile
+++ b/Makefile
@@ -2231,12 +2231,30 @@ config-list.h: Documentation/*config.txt Documentation/config/*.txt
 	$(QUIET_GEN)$(SHELL_PATH) ./generate-configlist.sh \
 		>$@+ && mv $@+ $@
 
-command-list.h: generate-cmdlist.sh command-list.txt
+build/Documentation:
+	$(QUIET_GEN)mkdir -p build/Documentation
+.PRECIOUS: build/Documentation/git%.txt.cmdlist.in
+build/Documentation/git%.txt.cmdlist.in: Documentation/git%.txt
+	$(QUIET_GEN)if ! grep -w $(patsubst build/Documentation/%.txt.cmdlist.in,%,$@) command-list.txt >$@; \
+	then \
+		# For e.g. git-init-db, which has a *.txt file, but no \
+		# command-list.h entry \
+		>$@; \
+	fi
+build/Documentation/git%.txt.cmdlist: build/Documentation/git%.txt.cmdlist.in
+	$(QUIET_GEN)./generate-cmdlist.sh --tail $< >$@
 
-command-list.h: $(wildcard Documentation/git*.txt)
-	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh \
+COMMAND_LIST_H_FILES = $(filter-out Documentation/git.txt,$(wildcard Documentation/git*.txt))
+
+COMMAND_LIST_GEN = $(patsubst Documentation/%.txt,build/Documentation/%.txt.cmdlist,$(COMMAND_LIST_H_FILES))
+command-list.h: build/Documentation generate-cmdlist.sh command-list.txt $(COMMAND_LIST_GEN)
+	$(QUIET_GEN)$(SHELL_PATH) ./generate-cmdlist.sh --header \
 		$(patsubst %,--exclude-program %,$(EXCLUDED_PROGRAMS)) \
-		command-list.txt >$@+ && mv $@+ $@
+		command-list.txt >$@+ && \
+	echo "static struct cmdname_help command_list[] = {" >>$@+ && \
+	cat build/Documentation/*.txt.cmdlist >>$@+ && \
+	echo "};" >>$@+ && \
+	mv $@+ $@
 
 SCRIPT_DEFINES = $(SHELL_PATH_SQ):$(DIFF_SQ):$(GIT_VERSION):\
 	$(localedir_SQ):$(NO_CURL):$(USE_GETTEXT_SCHEME):$(SANE_TOOL_PATH_SQ):\
diff --git a/generate-cmdlist.sh b/generate-cmdlist.sh
index 9dbbb08e70a..f80266d5138 100755
--- a/generate-cmdlist.sh
+++ b/generate-cmdlist.sh
@@ -1,5 +1,8 @@
 #!/bin/sh
 
+mode=$1
+shift
+
 die () {
 	echo "$@" >&2
 	exit 1
@@ -61,8 +64,6 @@ define_category_names () {
 }
 
 print_command_list () {
-	echo "static struct cmdname_help command_list[] = {"
-
 	command_list "$1" |
 	while read cmd rest
 	do
@@ -73,6 +74,9 @@ print_command_list () {
 		done
 		echo " },"
 	done
+}
+
+end_print_command_list () {
 	echo "};"
 }
 
@@ -84,6 +88,12 @@ do
 	shift
 done
 
+if test "$mode" = "--tail"
+then
+	print_command_list "$1"
+	exit 0
+fi
+
 echo "/* Automatically generated by generate-cmdlist.sh */
 struct cmdname_help {
 	const char *name;
@@ -94,5 +104,3 @@ struct cmdname_help {
 define_categories "$1"
 echo
 define_category_names "$1"
-echo
-print_command_list "$1"

  reply	other threads:[~2021-06-25  8:57 UTC|newest]

Thread overview: 87+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-24 13:16 Why the Makefile is so eager to re-build & re-link Ævar Arnfjörð Bjarmason
2021-06-24 15:16 ` Jeff King
2021-06-24 15:28   ` Ævar Arnfjörð Bjarmason
2021-06-24 21:30   ` Johannes Sixt
2021-06-25  8:34     ` Ævar Arnfjörð Bjarmason [this message]
2021-06-25  9:01       ` Ævar Arnfjörð Bjarmason
2021-06-29  2:13       ` Jeff King
2021-10-20 18:39         ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 1/8] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 2/8] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 3/8] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 4/8] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 5/8] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-20 18:39           ` [PATCH 6/8] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-21 14:42             ` Jeff King
2021-10-21 16:25               ` Jeff King
2021-10-20 18:39           ` [PATCH 7/8] Makefile: stop having command-list.h depend on a wildcard Ævar Arnfjörð Bjarmason
2021-10-21 14:45             ` Jeff King
2021-10-21 18:24               ` Junio C Hamano
2021-10-21 22:46             ` Øystein Walle
2021-10-20 18:39           ` [PATCH 8/8] Makefile: assert correct generate-cmdlist.sh output Ævar Arnfjörð Bjarmason
2021-10-20 20:35           ` [PATCH 0/8] Makefile: make command-list.h 2-5x as fast with -jN Jeff King
2021-10-20 21:31             ` Taylor Blau
2021-10-20 23:14               ` Ævar Arnfjörð Bjarmason
2021-10-20 23:46                 ` Jeff King
2021-10-21  0:48                   ` Ævar Arnfjörð Bjarmason
2021-10-21  2:20                     ` Taylor Blau
2021-10-22 12:37                       ` Ævar Arnfjörð Bjarmason
2021-10-21 14:34                     ` Jeff King
2021-10-21 22:34                       ` Junio C Hamano
2021-10-22 10:51                       ` Ævar Arnfjörð Bjarmason
2021-10-22 18:31                         ` Jeff King
2021-10-22 20:50                           ` Ævar Arnfjörð Bjarmason
2021-10-21  5:39                 ` Eric Sunshine
2021-10-22 19:36           ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-10-25 18:29               ` Junio C Hamano
2021-10-25 21:22                 ` Ævar Arnfjörð Bjarmason
2021-10-25 21:26                   ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-10-22 19:36             ` [PATCH v2 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-10-25 16:39               ` Jeff King
2021-10-22 19:36             ` [PATCH v2 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-10-25 16:46               ` Jeff King
2021-10-25 17:52                 ` Jeff King
2021-10-22 19:36             ` [PATCH v2 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-10-23 22:19               ` Junio C Hamano
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 19:36             ` [PATCH v2 10/10] generate-cmdlist.sh: replace "cut", "tr" and "grep" with pure-shell Ævar Arnfjörð Bjarmason
2021-10-23 22:26               ` Junio C Hamano
2021-10-22 21:20             ` [PATCH v2 00/10] Makefile: make generate-cmdlist.sh much faster Taylor Blau
2021-10-23 22:34             ` Junio C Hamano
2021-10-25 16:57             ` Jeff King
2021-11-05 14:07             ` [PATCH v3 00/10] generate-cmdlist.sh: make it (and "make") run faster Ævar Arnfjörð Bjarmason
2021-11-05 14:07               ` [PATCH v3 01/10] command-list.txt: sort with "LC_ALL=C sort" Ævar Arnfjörð Bjarmason
2021-11-05 22:45                 ` Junio C Hamano
2021-11-06  4:26                   ` Ævar Arnfjörð Bjarmason
2021-11-08 19:18                     ` Junio C Hamano
2021-11-05 14:08               ` [PATCH v3 02/10] generate-cmdlist.sh: trivial whitespace change Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 03/10] generate-cmdlist.sh: spawn fewer processes Ævar Arnfjörð Bjarmason
2021-11-05 22:47                 ` Junio C Hamano
2021-11-06  4:23                   ` Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 04/10] generate-cmdlist.sh: don't call get_categories() from category_list() Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 05/10] generate-cmdlist.sh: run "grep | sort", not "sort | grep" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 06/10] generate-cmdlist.sh: replace for loop by printf's auto-repeat feature Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 07/10] generate-cmdlist.sh: stop sorting category lines Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 08/10] generate-cmdlist.sh: do not shell out to "sed" Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 09/10] generate-cmdlist.sh: replace "grep' invocation with a shell version Ævar Arnfjörð Bjarmason
2021-11-05 14:08               ` [PATCH v3 10/10] generate-cmdlist.sh: don't parse command-list.txt thrice Ævar Arnfjörð Bjarmason
2021-06-25 21:17   ` Why the Makefile is so eager to re-build & re-link Felipe Contreras
2021-06-29  5:04   ` Eric Sunshine
2021-06-24 23:35 ` Øystein Walle
2021-06-24 23:39   ` Øystein Walle
2021-06-25  0:11   ` Ævar Arnfjörð Bjarmason
2021-07-02 11:58 ` [PATCH] Documentation/Makefile: don't re-build on 'git version' changes Ævar Arnfjörð Bjarmason
2021-07-02 15:53   ` Junio C Hamano
2021-07-03 11:58     ` Ævar Arnfjörð Bjarmason
2021-07-05 19:48       ` Junio C Hamano
2021-07-03  1:05   ` Felipe Contreras
2021-07-03 12:03     ` Ævar Arnfjörð Bjarmason
2021-07-03 18:56       ` Felipe Contreras
2021-07-05 19:38       ` Junio C Hamano
2021-07-06 22:25         ` Felipe Contreras

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: http://vger.kernel.org/majordomo-info.html

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87r1gqxqxn.fsf@evledraar.gmail.com \
    --to=avarab@gmail.com \
    --cc=Johannes.Schindelin@gmx.de \
    --cc=emilyshaffer@google.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=j6t@kdbg.org \
    --cc=me@ttaylorr.com \
    --cc=peff@peff.net \
    --cc=sunshine@sunshineco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://80x24.org/mirrors/git.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).